Research papers and talks

One of the best ways to gain a deep understanding of a topic is to try to repeat the experiments of researchers and then modify them in some way. That’s how the best professors and mentors “teach” their students: they simply encourage them to try to duplicate the results of the researchers they’re interested in. And you can’t help but tweak an approach if you spend enough time trying to get it to work for you.

Vector space models and semantic search

  • Semantic Vector Encoding and Similarity Search Using Fulltext Search Engines (https://arxiv.org/pdf/1706.00957.pdf)—Jan Rygl et al. were able to use a conventional inverted index to implement efficient semantic search for all of Wikipedia.
  • Learning Low-Dimensional Metrics (https://papers.nips.cc/paper/7002-learning-low-dimensional-metrics.pdf)—Lalit Jain et al. were able to incorporate human judgement into pairwise distance metrics, which can be used for better decision-making and unsupervised clustering of word vectors and topic vectors. For example, recruiters can use this to steer a content-based recommendation engine that matches resumes with job descriptions.
  • RAND-WALK: A latent variable model approach to word embeddings (https://arxiv.org/pdf/1502.03520.pdf) by Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski—Explains the latest (2016) understanding of the “vector-oriented reasoning” of Word2vec and other word vector space models, particularly analogy questions (see the short gensim sketch after this list)
  • Efficient Estimation of Word Representations in Vector Space (https://arxiv.org/pdf/1301.3781.pdf) by Tomas Mikolov, Greg Corrado, Kai Chen, and Jeffrey Dean at Google, Sep 2013—First publication of the Word2vec model, including an implementation in C++ and pretrained models using a Google News corpus
  • Distributed Representations of Words and Phrases and their Compositionality (https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) by Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean at Google—Describes refinements to the Word2vec model that improved its accuracy, including subsampling and negative sampling
  • From Distributional to Semantic Similarity (https://www.era.lib.ed.ac.uk/bitstream/handle/1842/563/IP030023.pdf)—2003 Ph.D. thesis by James Richard Curran; lots of classic information retrieval (full-text search) research, including TF-IDF normalization and PageRank techniques for web search
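
The “vector-oriented reasoning” described in the RAND-WALK and Word2vec papers above is easy to try for yourself. Here is a minimal sketch, assuming you have gensim installed and have downloaded the pretrained Google News vectors mentioned above; the file path is a placeholder for wherever you saved the model:

  # Word-vector analogy arithmetic with gensim (illustrative sketch only)
  from gensim.models import KeyedVectors

  # Placeholder path: point this at your copy of the Google News model
  wv = KeyedVectors.load_word2vec_format(
      'GoogleNews-vectors-negative300.bin.gz', binary=True)

  # "king" - "man" + "woman" should land near "queen"
  print(wv.most_similar(positive=['king', 'woman'], negative=['man'], topn=3))

  # Cosine similarity between two word vectors
  print(wv.similarity('Seattle', 'Portland'))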

Finance

Question answering systems

Deep learning

LSTMs and RNNs

We had a lot of difficulty understanding the terminology and architecture of LSTMs, so here we’ve gathered the most-cited references to let the authors “vote” on the right way to talk about LSTMs. The state of the Wikipedia page on LSTMs (and its Talk page discussion) is a pretty good indication of the lack of consensus about what LSTM means:

  • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (https://arxiv.org/pdf/1406.1078.pdf) by Cho et al.—Explains how the contents of the memory cells in an LSTM layer can be used as an embedding that encodes a variable-length sequence, which can then be decoded into a new sequence of potentially different length, translating or transcoding one sequence into another (see the Keras sketch after this list).
  • Reinforcement Learning with Long Short-Term Memory (https://papers.nips.cc/paper/1953-reinforcement-learning-with-long-short-term-memory.pdf) by Bram Bakker—Application of LSTMs to planning and anticipation cognition with demonstrations of a network that can solve the T-maze navigation problem and an advanced pole-balancing (inverted pendulum) problem.
  • Supervised Sequence Labelling with Recurrent Neural Networks (https://mediatum.ub.tum.de/doc/673554/file.pdf)—Thesis by Alex Graves, with advisor B. Brugge; a detailed explanation of the mathematics for the exact gradient for LSTMs as first proposed by Hochreiter and Schmidhuber in 1997, though Graves doesn’t rigorously define terms like CEC or LSTM block/cell.
  • Theano LSTM documentation (http://deeplearning.net/tutorial/lstm.html) by Pierre Luc Carrier and Kyunghyun Cho—Diagram and discussion to explain the LSTM implementation in Theano and Keras.
  • Learning to Forget: Continual Prediction with LSTM (http://mng.bz/4v5V) by Felix A. Gers, Jürgen Schmidhuber, and Fred Cummins—Uses nonstandard notation for layer inputs (yin) and outputs (yout) and internal hidden state (h). All math and diagrams are “vectorized.”
  • Sequence to Sequence Learning with Neural Networks (http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf) by Ilya Sutskever, Oriol Vinyals, and Quoc V. Le at Google.
  • Understanding LSTM Networks (http://colah.github.io/posts/2015-08-Understanding-LSTMs) 2015 blog post by Christopher Olah—Lots of good diagrams, plus discussion/feedback from readers.
  • Long Short-Term Memory (http://www.bioinf.jku.at/publications/older/2604.pdf) by Sepp Hochreiter and Jürgen Schmidhuber, 1997—Original paper on LSTMs, with outdated terminology and an inefficient implementation, but a detailed mathematical derivation.
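
To make the encoder-decoder idea from Cho et al. and Sutskever et al. concrete, here is a minimal sketch, assuming the Keras API bundled with TensorFlow; it is not any paper’s exact architecture, and the vocabulary sizes and layer width below are hypothetical placeholders. One LSTM reads the source sequence, and its final hidden and memory-cell states become the fixed-length embedding that initializes a second, decoding LSTM:

  # Sketch of an LSTM encoder-decoder (sequence-to-sequence) model in Keras
  from tensorflow.keras.layers import Input, LSTM, Dense
  from tensorflow.keras.models import Model

  src_vocab, tgt_vocab, hidden = 5000, 5000, 256   # hypothetical sizes

  # Encoder: read the (one-hot) source sequence, keep only its final states
  encoder_inputs = Input(shape=(None, src_vocab))
  _, state_h, state_c = LSTM(hidden, return_state=True)(encoder_inputs)

  # Decoder: generate the target sequence, starting from the encoder's states
  decoder_inputs = Input(shape=(None, tgt_vocab))
  decoder_seq = LSTM(hidden, return_sequences=True)(
      decoder_inputs, initial_state=[state_h, state_c])
  decoder_outputs = Dense(tgt_vocab, activation='softmax')(decoder_seq)

  model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
  model.compile(optimizer='rmsprop', loss='categorical_crossentropy')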