Understanding Jupyter

Finally, there is Jupyter. We're familiar with this tool already, as it proved invaluable for teaching and learning Python on simple examples, but it especially shines for data science. Given its rich media and visualization capabilities, Jupyter is an excellent environment for data analysis. It allows quick iteration and experimentation, and supports Markdown documentation and rich media: images, plots, interactive widgets, video, and so on. Of course, Jupyter is 100% open source and free.

Jupyter is also language agnostic. Many languages can be used with Jupyter through separate kernels, including Ruby, C, Rust, R, and many more. It also supports third-party plugins, for example, Leaflet and Mapbox viewers for GeoJSON files or the Vega data visualization viewer. Another advantage is that GitHub renders Jupyter Notebooks properly, so you can read other people's notebooks right in the repository without running your own server.

On top of that, Jupyter can be spawned remotely using JupyterHub, on one server or even on a cluster of machines via Kubernetes or similar orchestration software. Hence, it is a perfect engine for remote work (if, for example, your data is too big to fit on one machine or cannot be transferred for security reasons). It is also a great environment for teaching, as it helps teachers ensure that all students have the same environment and start on an equal footing. Finally, it has proven to be a good tool for writing code-related books: this way, all code is executable and can be tested.

Recently, notebooks have started to gain traction as a sort of interactive log: large data-driven companies, such as Netflix, realized they can parametrize notebooks and run them via a pipeline scheduler (Apache Airflow, for example; more on pipelines in Chapter 17, Let's Build a Dashboard). Once the pipeline has run (or failed), the executed notebook can be stored as an artifact with all of its warnings, printed samples, and plots. Some data visualization libraries and techniques also embed snippets of data within the notebook, so the stored plots remain interactive.
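To make "parametrizing a notebook" more concrete, here is a minimal sketch using only the standard library. A notebook file (.ipynb) is plain JSON; the convention popularized by papermill is to tag one cell with "parameters" (holding defaults) and, at run time, insert a new cell right after it that overrides those defaults. The helper names below (make_notebook, code_cell, inject_parameters, run_cells) are hypothetical and are not papermill's API, and run_cells is a toy that simply calls exec on each cell in one namespace instead of talking to a real Jupyter kernel.

```python
import json


def make_notebook(cells):
    """Return a minimal nbformat v4 notebook as a plain dict (hypothetical helper)."""
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def code_cell(source, tags=()):
    """Build a code cell; tags live in the cell metadata, as in real notebooks."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": list(tags)},
        "outputs": [],
        "source": source,
    }


def inject_parameters(nb, params):
    """Insert an overriding cell right after the cell tagged 'parameters'.

    Mimics the papermill convention; falls back to the top of the notebook
    if no cell carries the 'parameters' tag.
    """
    lines = [f"{name} = {value!r}\n" for name, value in params.items()]
    injected = code_cell("".join(lines), tags=["injected-parameters"])
    cells = nb["cells"]
    for i, cell in enumerate(cells):
        if "parameters" in cell["metadata"].get("tags", []):
            cells.insert(i + 1, injected)
            break
    else:
        cells.insert(0, injected)
    return nb


def run_cells(nb):
    """Toy executor: exec every code cell in one shared namespace (no kernel, no outputs)."""
    namespace = {}
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            exec(cell["source"], namespace)
    return namespace


# Usage: a two-cell "report" notebook with defaults, re-run with new parameters.
nb = make_notebook([
    code_cell("date = '2019-01-01'\nthreshold = 0.5\n", tags=["parameters"]),
    code_cell("report = f'{date}: threshold={threshold}'\n"),
])
nb = inject_parameters(nb, {"date": "2019-06-30", "threshold": 0.9})
ns = run_cells(nb)
print(ns["report"])  # 2019-06-30: threshold=0.9

# The result is still ordinary .ipynb JSON, so it could be stored as a pipeline artifact.
ipynb_text = json.dumps(nb, indent=1)
```

Because the defaults stay in the notebook and the overrides are appended as a separate, tagged cell, a stored run shows exactly which values it was executed with. A scheduler only needs to pass a different dictionary on each run.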

Jupyter is a great environment for research and constantly adds more functionality. Here, we're finishing our exposé; it is now time to get our hands dirty with data!
