File structure - best practices to store your data

The file structure for this chapter was outlined in the logistics section, but I will go over it again here in order to make a few points. It is convenient to keep all of your data and code in the same folder so that you can easily keep track of where everything is for a particular project. I like to go a step further and put all of the data in another folder within the project folder. The following is the directory structure I will be using in this chapter:

There is a practical reason for the separate data folder beyond mental organization. In Atom and many other text editors, accidentally opening a data file that is too large can crash the text editor. Keeping the data files in an entirely separate directory helps you avoid accidentally clicking on and opening a data file instead of a code file.

The exact directory structure that you choose will vary depending on the specifics of the project. For example, I sometimes subdivide my data folder when working with multiple data sources. I also sometimes add a separate directory for my code when a project uses several code files.
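As a rough illustration of the layout described above, the following Python sketch creates a project folder with a separate data folder, optional subfolders for each data source, and a folder for code. The folder names `data` and `code` and the function name `make_project_dirs` are assumptions made for this example, not names taken from this chapter.

```python
from pathlib import Path


def make_project_dirs(project_root, data_sources=()):
    """Create a project folder with separate data and code subfolders.

    The names "data" and "code" are hypothetical; use whatever fits
    your project.
    """
    root = Path(project_root)
    data_dir = root / "data"
    code_dir = root / "code"

    # exist_ok=True makes the function safe to run more than once.
    data_dir.mkdir(parents=True, exist_ok=True)
    code_dir.mkdir(parents=True, exist_ok=True)

    # Optionally subdivide the data folder, one subfolder per source.
    for source in data_sources:
        (data_dir / source).mkdir(exist_ok=True)

    return data_dir, code_dir
```

Calling `make_project_dirs("my_project", ["survey", "census"])` would create `my_project/data/survey`, `my_project/data/census`, and `my_project/code`. The source names here are placeholders.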

You may also hear of something called version control (for example, Git). Version control is a method of keeping track of changes to a code base. I won't be using version control in this book, but if you do find yourself needing it for a data wrangling project, note that it is important not to include large data files in the tracking. Keeping all of the data files in a separate location makes this easier.
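If you do use Git, one common way to keep a data folder out of the tracking is to list it in a `.gitignore` file at the top of the project. The sketch below assumes the data folder is named `data` and sits directly inside the project folder. The function name `ignore_data_dir` is hypothetical.

```python
from pathlib import Path


def ignore_data_dir(project_root, data_dir="data"):
    """Add the data folder to the project's .gitignore if it is not listed.

    Assumes the data folder sits directly inside the project folder.
    """
    gitignore = Path(project_root) / ".gitignore"
    entry = f"{data_dir}/"  # a trailing slash matches a directory

    # Keep any entries that are already in the file.
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []

    if entry not in lines:
        lines.append(entry)
        gitignore.write_text("\n".join(lines) + "\n")

    return gitignore
```

Because the whole folder is ignored with a single entry, new data files added later are excluded without any further changes.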