Key information

Key information is stored right below the map and includes dates, locations, and outcomes. Let's call this section the main one. As you can see in the developer console, this section is laid out as a table with two columns: the first holds the keys (metric names), and the second holds the corresponding values. This is very similar to how dictionaries are structured, so let's write a generic converter from a two-column table to a dictionary. In the following snippet, we traverse the rows, using the text of each header cell (th) as a key and the text of the matching data cell (td) as its value:

def _table_to_dict(table):
    result = {}
    for row in table.find_all('tr'):
        # Header cell (th) becomes the key, data cell (td) the value
        result[row.th.text] = row.td.get_text().strip()

    return result
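
The converter above assumes that every row has both a th and a td cell; a row missing either one would raise an AttributeError. As a minimal sketch of a more forgiving variant, the following self-contained example runs on a small hand-written table. The names SAMPLE_HTML and table_to_dict_safe, and the sample rows themselves, are made up for illustration and are not part of the original code:

```python
from bs4 import BeautifulSoup

# Hand-written sample table (an assumption, not real page content)
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><td> 26–27 May 1941 </td></tr>
  <tr><th>Result</th><td>Axis victory</td></tr>
  <tr><td colspan="2">A row without a header cell</td></tr>
</table>
"""


def table_to_dict_safe(table):
    """Hypothetical variant of _table_to_dict that skips malformed rows."""
    result = {}
    for row in table.find_all('tr'):
        # Skip rows that lack either a header cell or a data cell
        if row.th is None or row.td is None:
            continue
        result[row.th.get_text().strip()] = row.td.get_text().strip()
    return result


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
info = table_to_dict_safe(soup.table)
print(info)
```

Whether skipping such rows silently is the right choice depends on the task; logging them instead might be preferable if you need to know how much was dropped.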

Now, we can select the section and parse it. Again, we could find all the rows in the info card and select this section by its position, but that approach will fail if a page has a different number of sections or a different order. In contrast to the previous task, we cannot tolerate this now; with dozens of links to process, we have to write robust code that can work with any structure. So, instead of relying on order, let's search by content, say, for all sections containing the Location string. In the following code snippet, we do precisely that: traverse the top-level rows of the info card (each of which wraps one section) and keep only those with the word Location inside. Assuming that there is only one such section per page, we then take the first one and transform it into a dictionary:

def _get_main_info(table):
    # Keep only the top-level rows whose text mentions 'Location'
    rows = table.tbody.find_all('tr', recursive=False)
    main = [el for el in rows if 'Location' in el.get_text()][0]
    return {'main': _table_to_dict(main)}
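
Note that indexing with [0] raises an IndexError on a page where no section mentions Location. The following is a rough sketch of the same content-based search that returns an empty dictionary in that case instead. It is self-contained and runs on a hand-written info card; SAMPLE_CARD, find_section, and section_to_dict are hypothetical names introduced here, and the sample rows are placeholders:

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for an info card: each top-level row wraps
# one nested two-column table (an assumed, simplified layout)
SAMPLE_CARD = """
<table class="infobox"><tbody>
  <tr><td><table><tbody>
    <tr><th>Date</th><td>26–27 May 1941</td></tr>
    <tr><th>Location</th><td>Halfaya Pass, Egypt</td></tr>
  </tbody></table></td></tr>
  <tr><td><table><tbody>
    <tr><th>Other</th><td>placeholder</td></tr>
  </tbody></table></td></tr>
</tbody></table>
"""


def find_section(card, keyword):
    """Return the first top-level row mentioning keyword, or None."""
    rows = card.tbody.find_all('tr', recursive=False)
    return next((row for row in rows if keyword in row.get_text()), None)


def section_to_dict(section):
    """Convert a section's nested rows to a dict; empty if no section."""
    if section is None:
        return {}
    return {
        row.th.get_text().strip(): row.td.get_text().strip()
        for row in section.find_all('tr')
        if row.th is not None and row.td is not None
    }


card = BeautifulSoup(SAMPLE_CARD, 'html.parser').table
main = section_to_dict(find_section(card, 'Location'))
missing = section_to_dict(find_section(card, 'Casualties'))
print(main)
print(missing)
```

Returning an empty dictionary keeps a long scraping run going when one page is unusual, at the cost of hiding the gap; raising an explicit error is the better choice if a missing section should stop the run.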

Now, let's test it on the table we pulled from the Operation Skorpion page. As you can see here, it seems to work:

>>> _get_main_info(table)
{'main': {'Date': '26–27 May 1941', 'Location': 'Halfaya Pass, Egypt31°30′N 25°11′E\ufeff / \ufeff31.500°N 25.183°E\ufeff / 31.500; 25.183Coordinates: 31°30′N 25°11′E\ufeff / \ufeff31.500°N 25.183°E\ufeff / 31.500; 25.183', 'Result': 'Axis victory', 'Territorial': 'Axis re-captured Halfaya Pass'}}

Of course, the location string is a mess, but we should resist the temptation to parse it right now – all parsing should be done once the data is collected! Also, as you'll see, many pages won't have geocoordinates, so our attempt to parse those would fail in any case. Next, let's collect the supplementary information from each page.
