Now that the package structure is in place, we can start adding the actual code. For starters, we copy and paste the wiki.py code from Chapter 7, Scraping Data from the Web with Beautiful Soup 4. As we want to keep the code for both collecting and cleaning in the same package, it makes sense to create two sub-folders—collect and parse. The scraping code from Chapter 7 will go into the former. For now, we will create two files—battles.py and fronts.py—in the collect folder. Upon import, Python will map them to dotted paths such as wikiwwii.collect.battles, giving access to all the functions and variables they contain.
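To make the folder-to-import mapping concrete, here is a minimal sketch that builds a miniature version of the layout in a temporary directory and imports from it. The get_battles function is a hypothetical stand-in—the real battles.py holds the scraping code from Chapter 7:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate the package skeleton: each folder with an __init__.py
# becomes one segment of the dotted import path.
root = Path(tempfile.mkdtemp())
collect = root / "wikiwwii" / "collect"
collect.mkdir(parents=True)
(root / "wikiwwii" / "__init__.py").touch()
(collect / "__init__.py").touch()

# A placeholder module; the real battles.py contains the scraper.
(collect / "battles.py").write_text(
    "def get_battles():\n    return ['placeholder']\n"
)

# Once the root folder is on sys.path, the file is importable
# under its dotted name.
sys.path.insert(0, str(root))
battles = importlib.import_module("wikiwwii.collect.battles")
print(battles.get_battles())  # ['placeholder']
```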
Next, we add the code for cleaning in a similar fashion. However, most of the cleaning code is stored in the 1_data_cleaning.ipynb notebook. Of course, we could run a Jupyter server and copy and paste the cells into Visual Studio Code (VS Code), but there is a better option. Instead, open the command palette (Ctrl/Cmd + Shift + P by default), select Python: Import Jupyter Notebook, and pick our notebook. As you'll see, VS Code converts the file into a normal script, marking each cell boundary with a comment.
VS Code even allows you to run the converted cells interactively, step by step. Moreover, it supports converting the file back into a notebook, which is handy when you need to tweak a notebook from within VS Code.
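A converted notebook looks like an ordinary Python script in which each cell is delimited by a `# %%` comment (markdown cells become `# %% [markdown]` blocks). The cleaning step below is purely illustrative, not the actual contents of 1_data_cleaning.ipynb:

```python
# %% [markdown]
# # Data cleaning
# Each notebook cell becomes a comment-delimited block
# that can still be run interactively in VS Code.

# %%
# First cell: raw values as they might come off the page.
raw = ["  London ", "PARIS", "berlin"]

# %%
# Second cell: strip whitespace and normalize capitalization.
clean = [city.strip().title() for city in raw]
print(clean)  # ['London', 'Paris', 'Berlin']
```

Because the markers are plain comments, the file runs as a regular script as well.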
Here is the folder tree after we moved the actual code:
wikiwwii
├── README.md
├── pyproject.toml
├── tests
│   ├── __init__.py
│   └── test_wikiwwii.py
└── wikiwwii
    ├── __init__.py
    ├── collect
    │   ├── __init__.py
    │   ├── battles.py
    │   └── fronts.py
    └── parse
        ├── __init__.py
        ├── belligerents.py
        ├── casualties.py
        ├── dates.py
        ├── geocode.py
        └── qa.py