Deciding how to improve the performance

To improve on this, we have the following options:

  • Add more data: Maybe there is just not enough data for the learning algorithm, in which case we should simply add more training data.
  • Play with the model complexity: Maybe the model is not complex enough? Or maybe it is already too complex? If it is not complex enough, we could decrease k so that it takes fewer nearest neighbors into account and is thus better at predicting non-smooth data. If it is too complex, we could increase k to achieve the opposite.
  • Modify the feature space: Maybe we do not have the right set of features? We could be missing some important aspect of the posts. Or should we remove some of our current features in case some features are aliasing others?
  • Change the model: Maybe kNN isn't a good fit for our use case; maybe it will never be capable of achieving good prediction performance, no matter how complex we allow it to be and how sophisticated the feature space becomes.
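
To make the model-complexity option concrete, here is a minimal sketch of how one might compare training and test error for several values of k. It is not the classifier or the dataset used in this chapter: the one-dimensional synthetic data, the 20 percent label noise, and the helper names knn_predict, error_rate, and make_data are all assumptions made purely for illustration.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Return the majority label among the k training points closest to query."""
    nearest = sorted(train, key=lambda p: (p[0] - query) ** 2)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def error_rate(train, data, k):
    """Fraction of points in data that kNN (fitted on train) misclassifies."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in data)
    return wrong / len(data)


def make_data(n, rng, noise=0.2):
    """Toy data (an assumption): label is 1 if x > 0.5, flipped with probability noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


rng = random.Random(0)  # fixed seed so that runs are repeatable
train = make_data(200, rng)
test = make_data(200, rng)

# Odd values of k avoid tied votes in this two-class problem.
results = {}
for k in (1, 5, 15, 45):
    results[k] = (error_rate(train, train, k), error_rate(train, test, k))
    print("k=%2d  train error=%.3f  test error=%.3f" % (k, *results[k]))
```

With k=1, every training point is its own nearest neighbor, so the training error is zero even though a fifth of the labels are noise. Watching how the two error columns move apart or together as k changes is one way to judge whether the model is currently too complex or too simple before committing to any of the options above.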

Stuck at this point, people often pick one of these options at random and try it out, hoping to find the golden configuration by chance. We could do the same here, but it would surely take longer than making informed decisions. Let's take the informed route, for which we need to introduce the bias-variance tradeoff.
