Now that the package structure is in place, we can start adding the actual code. For starters, we copy and paste the wiki.py code from Chapter 7, Scraping Data from the Web with Beautiful Soup 4. As we want to keep the code for both collecting and cleaning in the same package, it makes sense to create two sub-folders—collect and parse. The scraping code from Chapter 7 will go into the former. For now, we will create two files—battles.py and fronts.py—in the collect folder. Upon import, Python will map them to dotted paths such as wikiwwii.collect.battles, giving access to all the functions and variables they contain.
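To make the folder-to-import mapping concrete, here is a minimal sketch that builds a miniature version of the layout in a temporary directory and imports from it. The get_battles function is a hypothetical stand-in—the real battles.py holds the scraping code from Chapter 7:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate the package skeleton: each folder with an __init__.py
# becomes one segment of the dotted import path.
root = Path(tempfile.mkdtemp())
collect = root / "wikiwwii" / "collect"
collect.mkdir(parents=True)
(root / "wikiwwii" / "__init__.py").touch()
(collect / "__init__.py").touch()

# A placeholder module; the real battles.py contains the scraper.
(collect / "battles.py").write_text(
    "def get_battles():\n    return ['placeholder']\n"
)

# Once the root folder is on sys.path, the file is importable
# under its dotted name.
sys.path.insert(0, str(root))
battles = importlib.import_module("wikiwwii.collect.battles")
print(battles.get_battles())  # ['placeholder']
```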
Next, we add the code for cleaning in a similar fashion. However, most of the cleaning code is stored in the 1_data_cleaning.ipynb notebook. Of course, we could run a Jupyter server and copy and paste the cells into Visual Studio Code (VS Code), but there is a better option. Instead, open the command palette (Ctrl/Cmd + Shift + P by default), select Python: Import Jupyter Notebook, and pick our notebook. As you'll see, VS Code converts the file into a normal script, marking each cell boundary with a comment.
VS Code even allows you to run the converted cells interactively, step by step. Moreover, it supports converting the file back into a notebook, which is handy when you need to tweak a notebook from within VS Code.
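A converted notebook looks like an ordinary Python script in which each cell is delimited by a `# %%` comment (markdown cells become `# %% [markdown]` blocks). The cleaning step below is purely illustrative, not the actual contents of 1_data_cleaning.ipynb:

```python
# %% [markdown]
# # Data cleaning
# Each notebook cell becomes a comment-delimited block
# that can still be run interactively in VS Code.

# %%
# First cell: raw values as they might come off the page.
raw = ["  London ", "PARIS", "berlin"]

# %%
# Second cell: strip whitespace and normalize capitalization.
clean = [city.strip().title() for city in raw]
print(clean)  # ['London', 'Paris', 'Berlin']
```

Because the markers are plain comments, the file runs as a regular script as well.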
Here is the folder tree after we moved the actual code:
wikiwwii
├── README.md
├── pyproject.toml
├── tests
│   ├── __init__.py
│   └── test_wikiwwii.py
└── wikiwwii
    ├── __init__.py
    ├── collect
    │   ├── __init__.py
    │   ├── battles.py
    │   └── fronts.py
    └── parse
        ├── __init__.py
        ├── belligerents.py
        ├── casualties.py
        ├── dates.py
        ├── geocode.py
        └── qa.py