We can use the analogies to evaluate the impact of different parameter settings. The following results stand out (see detailed results in the models folder and a training sketch after this list):
- Negative sampling outperforms hierarchical softmax while also training faster
- The Skip-Gram architecture outperforms CBOW for a given objective function
- Different min_count settings, which exclude rarer tokens from the vocabulary, have a smaller impact; the midpoint value of 50 performs best
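A minimal sketch of how such a comparison could be run with gensim's Word2Vec; it assumes a pre-tokenized corpus iterable named `sentences` and an analogy file `questions-words.txt` in the standard word2vec format (both names are placeholders, not from the source):

```python
from gensim.models import Word2Vec

# Architecture/objective combinations to compare (sg=1: Skip-Gram, sg=0: CBOW;
# hs=1: hierarchical softmax; negative>0: negative sampling)
configs = [
    dict(sg=1, hs=0, negative=5),   # Skip-Gram + negative sampling
    dict(sg=1, hs=1, negative=0),   # Skip-Gram + hierarchical softmax
    dict(sg=0, hs=0, negative=5),   # CBOW + negative sampling
    dict(sg=0, hs=1, negative=0),   # CBOW + hierarchical softmax
]

for params in configs:
    model = Word2Vec(sentences=sentences,   # assumed tokenized corpus
                     vector_size=300,       # embedding dimensionality
                     window=5,              # context window size
                     min_count=50,          # ignore rarer tokens
                     workers=4,
                     **params)
    # evaluate_word_analogies returns (overall accuracy, per-section results)
    score, sections = model.wv.evaluate_word_analogies('questions-words.txt')
    print(params, f'analogy accuracy: {score:.2%}')
```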
Further experiments with the best-performing Skip-Gram model, using negative sampling and a min_count of 50, show the following (a parameter sweep is sketched after this list):
- Context windows smaller than five reduce performance
- A higher negative sampling rate improves performance at the expense of slower training
- Larger vectors improve performance, with a size of 600 yielding the best accuracy at 38.5%
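The follow-up sweep could be sketched as a grid search around the best Skip-Gram configuration. The specific grid values below are illustrative assumptions, and `sentences` and `questions-words.txt` are the same placeholders as above:

```python
from itertools import product

from gensim.models import Word2Vec

windows = [3, 5, 8]        # context window sizes around the default of 5
negatives = [5, 10, 20]    # negative samples drawn per positive example
sizes = [100, 300, 600]    # embedding dimensionalities

results = {}
for window, negative, size in product(windows, negatives, sizes):
    model = Word2Vec(sentences=sentences,
                     sg=1,              # Skip-Gram architecture
                     hs=0,              # negative sampling, not softmax
                     negative=negative,
                     window=window,
                     vector_size=size,
                     min_count=50,      # best-performing frequency cutoff
                     workers=4)
    score, _ = model.wv.evaluate_word_analogies('questions-words.txt')
    results[(window, negative, size)] = score
    print(f'window={window}, negative={negative}, size={size}: {score:.1%}')
```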