Writing the file

Finally, we have all the data we wanted, in more-or-less good condition. Let's store it in CSV format. Other formats would work as well. For example, the pickle format preserves all the data types and properties of the dataframe (we wouldn't need to convert dates from strings again), but it is a binary format that can't be read or edited by hand, and it carries security risks: loading a pickle file from an untrusted source can execute arbitrary code. CSV, on the other hand, can be opened in a text editor or in something like Excel, edited, and saved again if you notice factual errors in the data or anything else that is easier to correct manually.
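To make the trade-off concrete, here is a minimal sketch of a round trip through both formats. The toy dataframe, its column names, and the temporary file names are made up for illustration and are not part of our dataset:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy dataframe with hypothetical columns, standing in for the real dataset
df = pd.DataFrame({
    "name": ["Battle A", "Battle B"],
    "start": pd.to_datetime(["1941-06-22", "1942-08-23"]),
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "toy.csv"
    pkl_path = Path(tmp) / "toy.pkl"

    df.to_csv(csv_path, index=False)
    df.to_pickle(pkl_path)

    from_csv = pd.read_csv(csv_path)        # dates come back as plain strings
    from_pickle = pd.read_pickle(pkl_path)  # only unpickle files you trust

csv_is_datetime = pd.api.types.is_datetime64_any_dtype(from_csv["start"])
pickle_is_datetime = pd.api.types.is_datetime64_any_dtype(from_pickle["start"])
print(csv_is_datetime, pickle_is_datetime)  # False True
```

The CSV copy loses the datetime type unless we ask read_csv to parse the column again, while the pickle copy comes back exactly as it was saved.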

In the following code block, we export our dataframe to a CSV file; we only need to specify a relative path to where we want the file to be. The index=None argument is optional: it ensures that the index (a generic range of numbers in our case) won't be written. The documented Boolean form, index=False, has the same effect:

new_dataset.to_csv('./data/EF_battles.csv', index=None)
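Two practical details are worth sketching here. First, to_csv does not create missing folders, so the target directory has to exist before we write. Second, dates come back from a CSV file as strings unless we pass parse_dates when reading. The example below assumes a toy stand-in for new_dataset with a hypothetical start column, and it writes to a temporary folder rather than the real ./data directory:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for new_dataset; the column names are hypothetical
new_dataset = pd.DataFrame({
    "name": ["Battle A", "Battle B"],
    "start": pd.to_datetime(["1941-06-22", "1942-08-23"]),
})

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "data"
    out_dir.mkdir(parents=True, exist_ok=True)  # to_csv will not create the folder for us
    out_path = out_dir / "EF_battles.csv"

    new_dataset.to_csv(out_path, index=False)

    # Read the file back, asking pandas to parse the date column again
    restored = pd.read_csv(out_path, parse_dates=["start"])

# No extra index column was written, so the columns match the original
columns_match = list(restored.columns) == list(new_dataset.columns)
dates_restored = pd.api.types.is_datetime64_any_dtype(restored["start"])
print(columns_match, dates_restored)
```

Reading the file back right after writing it is a cheap way to confirm that nothing was lost or shifted on the way to disk.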

With that, our data is processed, converted, checked, and stored as a new CSV file. We're now ready to move on and (finally) analyze the data we obtained.

Given the sensitivity of the subject, we cross-checked the main values manually, row by row, and did indeed have to correct a few of them. This work cannot be completely automated. The corrected version is stored as EF_battles_corrected.csv and will be used in all further chapters that refer to the WWII dataset.
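The checking itself has to be done by hand, but listing what was changed afterward can be scripted. The following is a minimal sketch, assuming pandas 1.1 or later (for DataFrame.compare) and two dataframes with identical rows and columns. The toy values and the troops column are invented for illustration and do not come from the real files:

```python
import pandas as pd

# Hypothetical original and manually corrected versions of the same table
original = pd.DataFrame({
    "name": ["Battle A", "Battle B", "Battle C"],
    "troops": [100, 200, 300],
})
corrected = pd.DataFrame({
    "name": ["Battle A", "Battle B", "Battle C"],
    "troops": [100, 250, 300],
})

# compare() keeps only the cells that differ:
# "self" holds the original value, "other" the corrected one
diff = original.compare(corrected)
print(diff)
```

Only the second row shows up in the result, with the old and new values side by side, which gives a compact log of the manual corrections.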