Getting raw data

The raw_data() function is similar to the example shown in Chapter 3, Functions, Iterators, and Generators. We included some important changes. Here's what we're using for this application:

from Chapter_3.ch03_ex5 import (
    series, head_map_filter, row_iter)
from typing import (
    NamedTuple, Callable, List, Tuple, Iterable, Dict, Any)

RawPairIter = Iterable[Tuple[float, float]]

class Pair(NamedTuple):
    x: float
    y: float

pairs: Callable[[RawPairIter], List[Pair]] = (
    lambda source: list(Pair(*row) for row in source)
)

def raw_data() -> Dict[str, List[Pair]]:
    with open("Anscombe.txt") as source:
        data = tuple(head_map_filter(row_iter(source)))
        mapping = {
            id_str: pairs(series(id_num, data))
            for id_num, id_str in enumerate(
                ['I', 'II', 'III', 'IV'])
        }
    return mapping

The raw_data() function opens the local data file and applies the row_iter() function to parse each line of the file into a row of separate items. We applied the head_map_filter() function to remove the heading from the file. The result is a tuple-of-list structure, which is assigned to the variable data. This handles parsing the input into a useful structure. Each series in the returned mapping is a list of Pair instances; Pair is a subclass of the NamedTuple class, with two fields that have float as their type hints.
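The parsing stage can be sketched without the Chapter 3 module. This is a minimal sketch under stated assumptions, not the book's implementation: row_iter_sketch(), float_or_none(), and head_map_filter_sketch() are hypothetical stand-ins, the input is assumed to be tab-delimited, and the heading is assumed to be any row that cannot be converted to float values. SAMPLE_TEXT is a small in-memory substitute for the data file.

```python
import csv
import io
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical in-memory stand-in for the data file: one heading row, two data rows.
SAMPLE_TEXT = (
    "x\ty\tx\ty\n"
    "10.0\t8.04\t10.0\t9.14\n"
    "8.0\t6.95\t8.0\t8.14\n"
)

def row_iter_sketch(source: Iterable[str]) -> Iterator[List[str]]:
    """Split each tab-delimited line into a list of strings (assumed format)."""
    return csv.reader(source, delimiter="\t")

def float_or_none(row: List[str]) -> Optional[List[float]]:
    """Convert a row to floats; return None for rows that are not numeric."""
    try:
        return [float(item) for item in row]
    except ValueError:
        return None

def head_map_filter_sketch(rows: Iterable[List[str]]) -> Iterator[List[float]]:
    """Map rows to floats, then filter out the non-numeric heading rows."""
    converted = map(float_or_none, rows)
    return (row for row in converted if row is not None)

with io.StringIO(SAMPLE_TEXT) as source:
    # Materialize inside the with block, as raw_data() does, so the
    # lazy iterators are consumed before the source is closed.
    data: Tuple[List[float], ...] = tuple(
        head_map_filter_sketch(row_iter_sketch(source)))
```

Here, data is a tuple of lists of float values, which matches the tuple-of-list shape described above. Building the tuple inside the with statement matters because the iterators are lazy and would otherwise try to read from a closed file.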

We used a dictionary comprehension to build the mapping from id_str to the pairs assembled from the results of the series() function. The series() function extracts (x, y) pairs from the input document. In the document, each series occupies two adjacent columns. The series named I is in columns zero and one; given a series number, the series() function extracts the relevant pair of columns.
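To make the column arithmetic concrete, here is a minimal sketch of how such an extraction could work. The series_sketch() function is a hypothetical stand-in, not the series() function from Chapter 3; it assumes that series number n occupies columns 2n and 2n+1 and that the rows already hold float values. The two sample rows use the commonly published values of Anscombe's quartet.

```python
from typing import Iterable, Iterator, List, Tuple

def series_sketch(
        n: int, rows: Iterable[List[float]]
) -> Iterator[Tuple[float, float]]:
    """Yield the (x, y) pair for series n from columns 2n and 2n+1."""
    for row in rows:
        yield row[2 * n], row[2 * n + 1]

# Two sample rows, each with eight columns: four series of (x, y) values.
rows = (
    [10.0, 8.04, 10.0, 9.14, 10.0, 7.46, 8.0, 6.58],
    [8.0, 6.95, 8.0, 8.14, 8.0, 6.77, 8.0, 5.76],
)

series_I = list(series_sketch(0, rows))    # columns 0 and 1
series_II = list(series_sketch(1, rows))   # columns 2 and 3
series_IV = list(series_sketch(3, rows))   # columns 6 and 7
```

Because enumerate() numbers the names 'I' through 'IV' from zero, the id_num value in the dictionary comprehension can be used directly as the series number.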

The pairs() function is created as a lambda object because it's a small function with a single parameter. It consumes a generator expression to build a list of the desired NamedTuple objects from the sequence of anonymous tuples created by the series() function.
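For comparison, the same behavior can be written as an ordinary def statement. This is a sketch only; the name pairs_def() is hypothetical and is introduced here for illustration. The Pair class is repeated so that the example stands alone.

```python
from typing import Iterable, List, NamedTuple, Tuple

class Pair(NamedTuple):
    x: float
    y: float

def pairs_def(source: Iterable[Tuple[float, float]]) -> List[Pair]:
    """Build a list of Pair objects from an iterable of two-tuples."""
    return [Pair(*row) for row in source]

# Anonymous tuples, as an extraction step might produce them.
sample = [(10.0, 8.04), (8.0, 6.95)]
result = pairs_def(sample)
first = result[0]
```

The def form leaves room for a docstring and is easier to find in a traceback; the lambda form keeps a one-expression function next to its type hint. Either way, the fields of each item can be read by name, for example first.x and first.y.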

Since the output from the raw_data() function is a mapping, we can do something like the following example to pick a specific series by name:

>>> raw_data()['I']
[Pair(x=10.0, y=8.04), Pair(x=8.0, y=6.95), ...

Given a key, for example, 'I', the series is a list of Pair objects that have the x, y values for each item in the series.
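The following sketch shows how a series of Pair objects might be used once it has been picked from the mapping. The sample_mapping dictionary is a hypothetical stand-in for the result of raw_data(); it holds only the first two pairs of series I shown in the preceding output, so the computed mean describes this sample, not the full series.

```python
from statistics import mean
from typing import Dict, List, NamedTuple

class Pair(NamedTuple):
    x: float
    y: float

# Hypothetical stand-in for raw_data(), truncated to two pairs of series I.
sample_mapping: Dict[str, List[Pair]] = {
    "I": [Pair(10.0, 8.04), Pair(8.0, 6.95)],
}

series_I = sample_mapping["I"]

# Attribute access pulls out one field from every Pair.
xs = [p.x for p in series_I]
ys = [p.y for p in series_I]

# A Pair is still a tuple, so it can be unpacked positionally.
x0, y0 = series_I[0]

mean_x = mean(xs)
```

Named fields make expressions such as p.x self-describing, while tuple unpacking remains available where the positional form is more convenient.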
