Programming in the large

Here's an example that shows us why we shouldn't put unique, working code into the __main__.py module. We'll show you a quick hypothetical example based on extending an existing package.

Imagine that we have a generic statistical package, named analysis, with a top-level __main__.py module. This implements a command-line interface that will compute descriptive statistics of a given CSV file. This application has a command-line API as follows:

python3 -m analysis -c 10 some_file.csv 

This command uses a -c option to specify which column to analyze. The input filename is provided as a positional argument on the command line.

Let's assume, further, that we have a terrible design problem. We've defined a high-level function, analyze(), in the analysis/__main__.py module. Here's an outline of the __main__.py module:

import argparse
from analysis import some_algorithm

def analyze(config: argparse.Namespace) -> None: ...

def main(argv: List[str] = sys.argv[1:]) -> None: ...

if __name__ == "__main__":
main(sys.argv[1:])

The analysis package includes a __main__.py module. This module does more than just simply run functions and classes defined elsewhere. It also includes a unique, reusable function definition, analyze(). This is not a problem until we try to reuse elements of the analysis package.

Our goal is to combine this with our Blackjack simulation. Because of the design error here, this won't work out well. We might think we can do this:

import analysis 
import simulation 
import types 

def sim_and_analyze(): with simulation.Build_Config() as config_sim: config_sim.outputfile = "some_file.csv"
s = simulation.Simulate()
s.configure(config_sim) s.run() config_stats = types.SimpleNamespace(
column=10, input="some_file.csv") analysis.analyze(config_stats)

We tried to use analysis.analyze(), assuming that the useful analyze() function was part of a simple module. The Python naming rules make it appear as though analysis is a module with a function named analyze(). Most of the time, the implementation details of the module and package structure don't matter for successful use. This, however, is an example of reuse and Programming In The Large, where the structure needs to be transparent.

This kind of simple composition was made needlessly difficult by defining a function in __main__. We want to avoid being forced to do this:

def analyze(column, filename): 
    import subprocess 
    subprocess.run(
["python3", "-m", "stats", "-c", column, filename])

We shouldn't need to create composite Python applications via the command-line API. In order to create a sensible composition of the existing applications, we might be forced to refactor analysis/__main__.py to remove any definitions from this module and push them up into the package as a whole.

The next section shows how to design long-running applications.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset