Once you have the basics down, some simple tricks will help you build good models faster:
Time series training sets are often generated with a time shift or lag. Discovering this lag can help you in Kaggle competitions that hide the source of the data, such as the Santander Value Prediction competition (www.kaggle.com/c/santander-value-prediction-challenge/discussion/61394).
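A minimal sketch of how you might hunt for such a lag: pandas' `shift` generates time-shifted copies of a series, so you can scan a range of candidate lags and keep the one whose shifted feature correlates best with the target. The synthetic sine-wave data and the 3-step lag here are illustrative assumptions, not from any real competition.

```python
import numpy as np
import pandas as pd

# Synthetic feature series (stand-in for a real time series column)
s = pd.Series(np.sin(np.linspace(0, 20, 200)))

# Pretend the target was generated from the feature with a 3-step lead
target = s.shift(-3).dropna()  # assumed, hidden lag of 3

# Scan candidate lags and keep the one with the highest correlation;
# pandas aligns on the index and ignores NaN pairs in .corr()
best_lag = max(range(10), key=lambda k: s.shift(-k).corr(target))
print(best_lag)  # recovers the lag of 3
```

Once you know the lag, you can re-align your features and target before training instead of letting the model struggle with mis-shifted rows.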
Hyperparameters are all the values that determine the performance of your pipeline, including the model type and how it’s configured. These can be things such as how many neurons and layers are in a neural network or the value of alpha in a sklearn.linear_model.Ridge regressor. Hyperparameters also include the values that govern any preprocessing steps, like the tokenizer type, any list of words that are ignored, the minimum and maximum document frequency for the TF-IDF vocabulary, whether or not to use a lemmatizer, the TF-IDF normalization approach, and so on.
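A sketch of where those hyperparameters live in a scikit-learn pipeline. The particular values shown are illustrative defaults, not recommendations:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(
        stop_words='english',  # list of words that are ignored
        min_df=2,              # minimum document frequency for the vocabulary
        max_df=0.9,            # maximum document frequency
        norm='l2',             # TF-IDF normalization approach
    )),
    ('model', Ridge(alpha=1.0)),  # regularization strength alpha
])

# Every keyword argument above is a tunable hyperparameter, and
# get_params() exposes them all under 'step__param' names:
params = pipeline.get_params()
```

Because scikit-learn exposes preprocessing and model settings uniformly through `get_params()` (e.g. `tfidf__min_df`, `model__alpha`), a single search tool can tune the whole pipeline at once.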
Hyperparameter tuning can be a slow process, because each experiment requires you to train and validate a new model. So it pays to reduce your dataset size to a minimum representative sample while you’re searching a broad range of hyperparameters. When your search gets close to the final model that you think is going to meet your needs, you can increase the dataset size to use as much of the data as you need.
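One simple way to get that minimum representative sample, sketched here with scikit-learn's `train_test_split` on placeholder data (the array sizes and the 10% fraction are assumptions you'd adjust for your own dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-ins for your real features and labels
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
y = rng.normal(size=10_000)

# Keep a random 10% for the broad hyperparameter search;
# the discarded 90% comes back for the final training runs
X_small, _, y_small, _ = train_test_split(
    X, y, train_size=0.1, random_state=0)
```

For classification problems you would also pass `stratify=y` so the small sample preserves the class balance of the full dataset.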
Tuning the hyperparameters of your pipeline is how you improve the performance of your model. Automating the hyperparameter tuning can give you more time to spend reading books like this or visualizing and analyzing your results. You can still guide the tuning with your intuition by setting the hyperparameter ranges to try.
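A minimal sketch of intuition-guided automated tuning with scikit-learn's `RandomizedSearchCV`: you pick the range (here a log-uniform distribution over alpha, an assumed range for illustration) and the machine does the searching. The synthetic regression data is a placeholder for your own:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# Placeholder data standing in for your real training set
X, y = make_regression(n_samples=200, n_features=10,
                       noise=0.5, random_state=0)

search = RandomizedSearchCV(
    Ridge(),
    # Your intuition sets the range; the search does the guessing
    param_distributions={'alpha': loguniform(1e-3, 1e3)},
    n_iter=20,
    cv=3,
    random_state=0,
)
search.fit(X, y)
best_alpha = search.best_params_['alpha']
```

After fitting, `search.cv_results_` holds every trial's scores, which you can visualize to see where in the range the good models cluster before narrowing the search.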
The most efficient algorithms for hyperparameter tuning are (from best to worst) Bayesian search, genetic algorithms, random search, and grid search.
But any algorithm that lets your computer do this searching at night while you sleep is better than manually guessing new hyperparameter values one by one.