One interesting thing to experiment with is how you decide which words count as distinct. For instance, should Free, free, and FREE be treated as the same word? And what about punctuation: is "free" the same word as "free!"?
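As a rough illustration of one possible answer, here is a minimal Python sketch that treats words as identical once they are lowercased and stripped of surrounding punctuation. It is separate from the article's sample code, and the helper names `normalize` and `count_words` are hypothetical, chosen only for this example. Whether merging these variants is actually the right call depends on your data: in some settings an all-caps FREE carries information that a lowercase free does not.

```python
import string
from collections import Counter


def normalize(word: str) -> str:
    """Lowercase a token and strip leading/trailing ASCII punctuation."""
    return word.strip(string.punctuation).lower()


def count_words(text: str, normalized: bool = True) -> Counter:
    """Count whitespace-separated tokens, optionally normalizing each first."""
    tokens = text.split()
    if normalized:
        tokens = [normalize(t) for t in tokens]
    # Drop tokens that were pure punctuation and became empty strings.
    return Counter(t for t in tokens if t)


sample = "Free money! Get your free gift. FREE, free, Free..."

raw = count_words(sample, normalized=False)    # every variant is its own word
merged = count_words(sample, normalized=True)  # variants collapse together
```

Comparing `raw` and `merged` on your own text is a quick way to see how much the choice changes the vocabulary. Note that `string.punctuation` only covers ASCII punctuation, so typographic quotes and similar characters would need extra handling.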
Please note that the sample code is written for clarity rather than for performance. There are some obvious, simple changes that could drastically improve its performance.