Machine learning comes with many caveats. Many are specific to the particular model being implemented, but a few assumptions apply to virtually every machine learning model:
The data has been preprocessed and cleaned. Almost no machine learning model will tolerate dirty data containing missing values or raw categorical values. Use dummy variables to encode categories, and fill or drop missing values, to handle these discrepancies.
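As a minimal sketch of both techniques, the following Python snippet uses pandas (an assumed choice; the text does not name a library) on a small, made-up dataset. It fills a missing numeric value with the column mean, drops a row that is still incomplete, and converts a categorical column into dummy variables with `pd.get_dummies`. The column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: one missing age, one missing income, and a categorical column
df = pd.DataFrame({
    "age": [25, np.nan, 47, 33],
    "city": ["London", "Paris", "London", "Tokyo"],
    "income": [40_000, 52_000, np.nan, 61_000],
})

# Filling: replace the missing age with the mean of the observed ages
df["age"] = df["age"].fillna(df["age"].mean())

# Dropping: discard any rows that still contain missing values
df = df.dropna()

# Dummy variables: one 0/1 column per city category
df = pd.get_dummies(df, columns=["city"], dtype=int)

print(df)
```

Whether to fill or drop is a judgment call: filling keeps the row but can distort the column's distribution, while dropping throws data away.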
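If the goal is to find relationships between variables, there is an assumption that some relationship actually exists. This assumption is particularly important because many machine learning models take it very seriously: they are not able to tell you that there might be no relationship at all.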
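To see why this matters, here is a sketch (in Python with NumPy, an assumed choice) that fits a straight line to two variables generated independently of each other. The fit returns a slope and an intercept without complaint; only a diagnostic such as R^2, which the analyst has to compute and interpret, reveals that x explains essentially none of y.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# x and y are drawn independently, so by construction there is no relationship
x = rng.normal(size=1_000)
y = rng.normal(size=1_000)

# Ordinary least squares still produces a fitted line
slope, intercept = np.polyfit(x, y, deg=1)

# R^2: the fraction of the variance in y explained by the fitted line
predictions = slope * x + intercept
r_squared = 1 - np.sum((y - predictions) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"slope={slope:.4f}, intercept={intercept:.4f}, R^2={r_squared:.4f}")
```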
A human is still needed to interpret the results. The machine is very smart but has a hard time putting things into context. The output of most models is a series of numbers and metrics that attempt to quantify how well the model did; it is up to a human to put these metrics into perspective and communicate the results to an audience.
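A small illustration of why metrics need context, written in plain Python with hypothetical labels: on an imbalanced dataset, a "model" that always predicts the majority class scores 95% accuracy, which sounds impressive until a human notices that it never finds a single positive case. Accuracy and recall are used here only as example metrics.

```python
# Hypothetical labels: 95 negative cases (0) and 5 positive cases (1)
y_true = [0] * 95 + [1] * 5

# A "model" that ignores its input and always predicts the majority class
y_pred = [0] * len(y_true)

# Accuracy: fraction of all predictions that are correct
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall: fraction of the actual positives that the model found
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")  # accuracy=0.95, recall=0.00
```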
Most models are sensitive to noisy, irrelevant data: they get confused when you include data that doesn't make sense. For example, if you are looking for relationships in economic data from around the world and one of your columns is the puppy adoption rate in each capital city, that column is unlikely to be relevant and will likely confuse the model.
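The effect of irrelevant columns can be sketched with NumPy (again an assumed choice) on synthetic data: one feature genuinely drives the target, and 20 extra columns of pure noise play the role of the puppy adoption rates. With only 30 training rows, adding the noise columns can only raise the training R^2, because least squares can always use them to fit the training rows more closely, yet the model typically does worse on fresh data. The sample sizes, column counts, and helper names are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def r_squared(X, y, coef):
    """Fraction of the variance in y explained by the linear model X @ coef."""
    residuals = y - X @ coef
    return 1 - residuals @ residuals / np.sum((y - y.mean()) ** 2)

def make_data(n_rows, n_noise_cols):
    # One genuinely useful feature: y depends on x with a true slope of 3
    x = rng.normal(size=(n_rows, 1))
    y = 3 * x[:, 0] + rng.normal(size=n_rows)
    # Irrelevant columns: pure noise, unrelated to y
    noise = rng.normal(size=(n_rows, n_noise_cols))
    intercept = np.ones((n_rows, 1))
    return np.hstack([intercept, x, noise]), y

X_train, y_train = make_data(n_rows=30, n_noise_cols=20)
X_test, y_test = make_data(n_rows=1_000, n_noise_cols=20)

# Clean model: intercept plus the useful feature (first two columns only)
coef_clean, *_ = np.linalg.lstsq(X_train[:, :2], y_train, rcond=None)
# Noisy model: the same columns plus the 20 irrelevant ones
coef_noisy, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_clean = r_squared(X_train[:, :2], y_train, coef_clean)
train_noisy = r_squared(X_train, y_train, coef_noisy)
test_clean = r_squared(X_test[:, :2], y_test, coef_clean)
test_noisy = r_squared(X_test, y_test, coef_noisy)

print(f"train R^2: clean={train_clean:.3f}, noisy={train_noisy:.3f}")
print(f"test  R^2: clean={test_clean:.3f}, noisy={test_noisy:.3f}")
```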
These assumptions come up again and again in machine learning. They are all extremely important, yet novice data scientists often ignore them.