Searching, scanning, and querying

A search can be inefficient if we examine all of the objects in a database, and apply a filter. We'd prefer to work with a more focused subset of items. We'll look at how we can create more useful indices in the Creating indexes to improve efficiency section. The fallback plan of brute-force scanning all the objects, however, always works. For searches that occur rarely, the computation required to create a more efficient index may not be worth the time saved. 

Don't panic; searching, scanning, and querying are synonyms. We'll use the terms interchangeably.

When a child class has an independent-style key, we can scan a shelf for all instances of some Child class using an iterator over the keys. Here's a generator expression that locates all the children:

children = (shelf[k] 
for k in shelf.keys()
if k.startswith("Child:"))

This looks at every single key in the shelf to pick the subset that begins with "Child:". We can build on this to apply more criteria by using a more complex generator expression:

children_by_title = (c 
for c in children
if c.startswith("Child:") and c.title == "some title")

We've used a nested generator expression to expand on the initial children query, adding criteria. Nested generator expressions such as this are remarkably efficient in Python. This does not make two scans of the database. It's a single scan with two conditions. Each result from the inner generator feeds the outer generator to build the result.

When a child class has a dependent-style key, we can search the shelf for children of a specific parent using an iterator with a more complex matching rule. Here's a generator expression that locates all the children of a given parent:

children_of = (shelf[k] 
for k in shelf.keys()
if k.startswith(parent+":Child:"))

This dependent-style key structure makes it particularly easy to remove a parent and all children in a simple loop:

query = (key
for key in shelf.keys()
if key.startswith(parent))
for k in query: del shelf[k]

When using hierarchical "Parent:pid:Child:cid" keys, we do have to be careful when separating parents from their children. With this multi-part key, we'll see lots of object keys that start with "Parent:pid". One of these keys will be the proper parent, simply "Parent:pid". The other keys will be children with "Parent:pid:Child:cid". We have three kinds of conditions that we'll often use for these brute-force searches:

  • key.startswith(f"Parent:{pid}"): Finds a union of parents and children; this isn't a common requirement.
  • key.startswith(f"Parent:{pid}:Child:"): Finds children of the given parent. An alternative to startswith() is a regular expression, such as r"^(Parent:d+):(Child:d+)$", to match the keys.
  • key.startswith(f"Parent:{pid}") and ":Child:" not in key: Finds parents, excluding any children. An alternative is to use a regular expression, such as r"^Parent:d+$", to match the keys.

All of these queries can be optimized by building indices to restrict the search space to more meaningful subsets.

Let's take a look at how to design an access layer for shelve.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset