Adding yet more index maintenance

Clearly, the index maintenance aspect of a shelf can grow. With our simple data model, we could easily add more top-level indexes for tags, dates, and titles of Posts. Here's another access-layer implementation that defines two indexes for Blogs. One index simply lists the keys for Blog entries. The other index provides keys based on the Blog title. We'll assume that the titles are not unique, so each title can map to several keys. We'll present this access layer in three parts. Here's the create part of the CRUD processing:

from pathlib import Path

class Access4(Access3):

    def new(self, path: Path) -> None:
        super().new(path)
        # Start with an empty title index: title -> list of Blog keys.
        self.database["_Index:Blog_Title"] = dict()

    def create_blog(self, blog: Blog) -> Blog:
        super().create_blog(blog)
        # Fetch the index, update the copy, then store it back on the shelf.
        blog_title_dict = self.database["_Index:Blog_Title"]
        blog_title_dict.setdefault(blog.title, [])
        blog_title_dict[blog.title].append(blog._id)
        self.database["_Index:Blog_Title"] = blog_title_dict
        return blog

We added yet another index. In this example, there is a dict that maps each title string to a list of Blog keys. If every title is unique, each of these lists will contain a single key. If the titles are not unique, a title shared by several Blogs will map to a list of all of their keys.

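When we add a Blog instance, we also update the title index. The title index requires us to get the existing dict from the shelf, append to the list of keys mapped to the Blog's title, and then put the dict back onto the shelf. The final assignment matters: without writeback, the shelf hands us an unpickled copy of the dict, so changes made to that copy are not saved until we store it again.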
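The need to put the dict back can be seen in a small experiment. The following is a minimal sketch, not part of the access layer: it assumes a scratch shelf opened with the default writeback=False, and the helper name add_to_title_index and the key "Blog:1" are hypothetical.

```python
import shelve
import tempfile
from pathlib import Path


def add_to_title_index(database, title: str, blog_id: str) -> None:
    """Hypothetical helper: fetch the index, update the copy, store it back."""
    index = database["_Index:Blog_Title"]  # an unpickled copy of the stored dict
    index.setdefault(title, []).append(blog_id)
    database["_Index:Blog_Title"] = index  # re-pickle the updated copy


with tempfile.TemporaryDirectory() as tmp:
    with shelve.open(str(Path(tmp) / "demo")) as db:
        db["_Index:Blog_Title"] = {}

        # Mutating the fetched copy in place is NOT saved without writeback.
        db["_Index:Blog_Title"].setdefault("Travel", []).append("Blog:1")
        lost = dict(db["_Index:Blog_Title"])

        # Fetch, update, and assign back: the change is saved.
        add_to_title_index(db, "Travel", "Blog:1")
        kept = dict(db["_Index:Blog_Title"])

print(lost)  # {}
print(kept)  # {'Travel': ['Blog:1']}
```

The in-place update disappears because each lookup on the shelf unpickles a fresh object. Only an explicit assignment writes the modified dict back.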

An update to the Blog object might involve changing the title attribute of the Blog. If there is a title change, this will lead to a complex pair of updates:

  1. Remove the old title from the index. Since each title has a list of keys, this operation removes one key from the list. If the list is now empty, the entire title entry can be removed from the dictionary.
  2. Add the new title to the index. This echoes the operation shown for adding a new Blog object.
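The two steps above can be sketched as a function that works on the title index alone. This is an illustration under stated assumptions rather than the access layer's actual update method: the name move_title_entry is hypothetical, and the index is taken to be a plain dict that the caller has fetched from the shelf and will store back afterward.

```python
def move_title_entry(
    index: dict[str, list[str]], blog_id: str, old_title: str, new_title: str
) -> None:
    """Hypothetical helper: re-file one Blog key under a new title."""
    if old_title == new_title:
        return  # no title change, so the index is already correct

    # Step 1: remove the key from the old title's list.
    keys = index.get(old_title, [])
    if blog_id in keys:
        keys.remove(blog_id)
    if not keys:
        # The list is now empty, so drop the whole title entry.
        index.pop(old_title, None)

    # Step 2: add the key under the new title, as in create_blog().
    index.setdefault(new_title, []).append(blog_id)


index = {"Travel": ["Blog:1", "Blog:2"], "Cooking": ["Blog:3"]}

move_title_entry(index, "Blog:3", "Cooking", "Travel")
print(index)  # {'Travel': ['Blog:1', 'Blog:2', 'Blog:3']}

move_title_entry(index, "Blog:1", "Travel", "Sailing")
print(index)  # {'Travel': ['Blog:2', 'Blog:3'], 'Sailing': ['Blog:1']}
```

With a shelf that is opened without writeback, the caller would still need to assign the modified dict back to the "_Index:Blog_Title" key for the change to be saved.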

Is this additional complexity needed? The only way to be sure is to gather actual performance details for the queries actually used by an application. Maintaining an index has a cost on every create, update, and delete. Using an index saves time on each query that would otherwise require a search. There's a fine balance between these, and it often requires some data gathering and experimentation to determine the optimal use of a shelf.
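One way to start gathering such data is a small timing experiment. The sketch below rests on several assumptions: Blogs are stood in for by plain dicts with a title field, the keys follow a "Blog:n" pattern, and the functions find_by_scan and find_by_index are hypothetical. The numbers it prints depend on the dbm backend and the data volume, so they only indicate how a real measurement might be set up.

```python
import shelve
import tempfile
import timeit
from pathlib import Path


def find_by_scan(db, title: str) -> list[str]:
    """Examine every Blog entry on the shelf, unpickling each one."""
    return sorted(
        key for key in db
        if key.startswith("Blog:") and db[key]["title"] == title
    )


def find_by_index(db, title: str) -> list[str]:
    """Fetch the title index once and look up the list of keys."""
    return sorted(db["_Index:Blog_Title"].get(title, []))


with tempfile.TemporaryDirectory() as tmp:
    with shelve.open(str(Path(tmp) / "timing")) as db:
        # Build 200 stand-in Blogs that share 50 distinct titles.
        index: dict[str, list[str]] = {}
        for n in range(200):
            key, title = f"Blog:{n}", f"Title {n % 50}"
            db[key] = {"title": title}
            index.setdefault(title, []).append(key)
        db["_Index:Blog_Title"] = index

        scan_result = find_by_scan(db, "Title 7")
        index_result = find_by_index(db, "Title 7")

        scan_time = timeit.timeit(lambda: find_by_scan(db, "Title 7"), number=20)
        index_time = timeit.timeit(lambda: find_by_index(db, "Title 7"), number=20)

print(f"scan:  {scan_time:.4f}s for 20 queries")
print(f"index: {index_time:.4f}s for 20 queries")
```

This measures only the query side. A fuller experiment would also time create_blog() with and without index maintenance, since that is where the index imposes its cost.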

Let's take a look at the writeback alternative to index updates.
