Appendix F. Locality sensitive hashing

In chapter 4, you learned how to create topic vectors with hundreds of real-valued (floating-point) dimensions. In chapter 6, you learned how to create word vectors that also have hundreds of dimensions. Even though you can do useful math operations on these vectors, you cannot search them as quickly as you can search discrete vectors or strings. Most databases don’t have efficient indexing schemes for vectors of more than four dimensions.[1] To use word vectors and document topic vectors efficiently, you need a search index that can quickly find the nearest neighbors of any given vector.

[1] Some advanced databases such as PostgreSQL can index higher-dimensional vectors, but efficiency drops quickly with dimensionality.

You need such an index to convert the results of vector math back into a word or set of words, because the resulting vector is almost never an exact match for the vector of any English word. You also need it for semantic search. This appendix shows one approach based on locality sensitive hashing (LSH).
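The appendix's own implementation isn't shown in this excerpt, so here is a hedged, minimal sketch of one common LSH family for cosine similarity: random-hyperplane hashing (often called SimHash). Each hash bit records which side of a random hyperplane a vector falls on, so vectors separated by a small angle tend to land in the same bucket. At query time only the vectors sharing a bucket are compared exactly, instead of scanning every vector. The class `HyperplaneLSH`, the helper `cosine`, and the parameters `num_bits` and `num_tables` are hypothetical names chosen for illustration; they are not from any library, and the random data stands in for real word or topic vectors.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Exact cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class HyperplaneLSH:
    """Hypothetical random-hyperplane LSH index (a sketch, not a library API)."""

    def __init__(self, dim, num_bits=8, num_tables=4, seed=0):
        rng = random.Random(seed)
        # One set of num_bits random Gaussian hyperplane normals per hash table
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.vectors = {}

    def _hash(self, planes, vec):
        # Each bit is 1 if vec lies on the positive side of that hyperplane
        bits = 0
        for plane in planes:
            side = sum(p * x for p, x in zip(plane, vec)) >= 0
            bits = (bits << 1) | int(side)
        return bits

    def add(self, key, vec):
        # Store the raw vector and file its key in one bucket per table
        self.vectors[key] = vec
        for planes, table in zip(self.planes, self.tables):
            table[self._hash(planes, vec)].append(key)

    def query(self, vec, k=1):
        # Candidates are keys that share a bucket with vec in any table
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._hash(planes, vec), []))
        # Re-rank only the candidates by exact cosine similarity
        ranked = sorted(
            candidates,
            key=lambda key: cosine(vec, self.vectors[key]),
            reverse=True,
        )
        return ranked[:k]


# Usage: index 1,000 random 50-dimensional vectors (stand-ins for word vectors)
rng = random.Random(42)
data = {i: [rng.gauss(0.0, 1.0) for _ in range(50)] for i in range(1000)}
index = HyperplaneLSH(dim=50, num_bits=8, num_tables=4)
for key, vec in data.items():
    index.add(key, vec)

# The first result is 42, because a vector always hashes into its own buckets
print(index.query(data[42], k=3))
```

The two knobs trade speed against recall. More bits per table make buckets smaller, so fewer exact comparisons are needed, but near neighbors are more likely to be split apart. More tables give a near neighbor more chances to collide with the query. Because LSH is approximate, `query` can miss the true nearest neighbor or return an empty list; a real system might fall back to a brute-force scan in that case.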
