Filtering CSV rows with an iterator

We can refactor the previous load example to iterate through the Blog objects rather than constructing a list of them. This allows us to scan a large CSV file and pick out just the relevant Blog and Post rows without holding everything in memory. This function is a generator that yields each individual Blog instance separately:

import csv
from typing import Iterator, TextIO

def blog_iter(source: TextIO) -> Iterator[Blog]:
    rdr = csv.reader(source)
    header = next(rdr)
    assert header == ["__class__", "title", "date", "title", "rst_text", "tags"]
    blog = None
    for r in rdr:
        if r[0] == "Blog":
            if blog:
                yield blog
            blog = blog_builder(r)
        elif r[0] == "Post":
            post = post_builder(r)
            blog.append(post)
    if blog:
        yield blog

This blog_iter() function creates each Blog object and appends its Post objects to it. Each time a "Blog" header row appears, the previous Blog is complete and can be yielded. At the end of the file, the final Blog object must also be yielded. If we want the large list of Blog instances, we can use the following code:
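The blog_builder() and post_builder() functions, along with the Blog and Post classes, come from the previous example and are not shown here. A minimal sketch of what they might look like, assuming the column layout in the header row above (Blog rows fill only the title column; Post rows fill the date, title, rst_text, and tags columns, with tags as a comma-separated string):

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    date: str
    title: str
    rst_text: str
    tags: list[str]

@dataclass
class Blog:
    title: str
    entries: list[Post] = field(default_factory=list)

    def append(self, post: Post) -> None:
        self.entries.append(post)

def blog_builder(row: list[str]) -> Blog:
    # Column 1 holds the Blog title; the Post columns are empty in a Blog row.
    return Blog(title=row[1])

def post_builder(row: list[str]) -> Post:
    # Columns 2-5 hold the Post fields; here tags are assumed to be
    # a comma-separated string.
    return Post(date=row[2], title=row[3], rst_text=row[4],
                tags=row[5].split(","))
```

The exact builders depend on how the rows were dumped; the point is that each builder consumes one flat CSV row and reconstructs one object.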

with (Path.cwd() / "data" / "ch10_blog3.csv").open() as source:
    blogs = list(blog_iter(source))

This will use the iterator to build a list of Blogs in the rare cases that we actually want the entire sequence in memory. We can use the following to process each Blog individually, rendering it to create reST files:

with (Path.cwd() / "data" / "ch10_blog3.csv").open() as source:
    for blog in blog_iter(source):
        with open(blog.title + ".rst", "w") as rst_file:
            render(blog, rst_file)

We used the blog_iter() function to read each blog. After being read, it can be rendered into an .rst format file. A separate process can run rst2html.py to convert each blog into HTML.

We can easily add a filter to process only selected Blog instances. Rather than simply rendering all the Blog instances, we can add an if statement to decide which Blogs should be rendered.
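Such a filter can live in a small helper generator that sits between blog_iter() and the rendering loop. The matching_blogs() name and the title-pattern criterion here are hypothetical; any predicate on the Blog attributes would work the same way:

```python
import re
from typing import Iterable, Iterator

def matching_blogs(blogs: Iterable, pattern: str) -> Iterator:
    """Yield only the blogs whose title matches the given regular expression."""
    rx = re.compile(pattern)
    return (blog for blog in blogs if rx.match(blog.title))
```

This composes cleanly with the earlier loop: for blog in matching_blogs(blog_iter(source), r"^Travel"): renders only the matching blogs, and because both stages are generators, non-matching blogs are discarded without ever being fully processed.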

Let's see how to dump and load joined rows into a CSV file.
