Python is a widely used programming language that has been around for over 20 years. Among many other things, it is popular for its simplicity and dynamic typing: the built-in type(datum) call reports the type of a data object at runtime. Its concise syntax lets programmers express ideas in very few lines of code. Python supports multiple programming paradigms, including functional, object-oriented, and procedural styles.
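As a small illustration of dynamic typing, the following sketch rebinds one name to values of different types and asks type() what it currently holds. The helper name describe is ours and is used only for illustration:

```python
def describe(datum):
    """Return the name of the runtime type of datum."""
    return type(datum).__name__


value = 42
first = describe(value)

value = "forty-two"   # the same name now refers to a string
second = describe(value)

value = [4, 2]        # and now to a list
third = describe(value)

print(first, second, third)
```

No type declarations are needed at any point; the type belongs to the object, not to the name.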
Python interpreters are available for almost every operating system in use. Its built-in data structures, combined with dynamic binding, make it attractive as a high-level glue language for quickly connecting existing components. Even in distributed applications, Python is used as glue in conjunction with tools such as Hive to get results quickly and efficiently. Being powerful and popular in the software development community, Python needs an interactive environment in which to create, edit, test, debug, and run programs.
An integrated development environment (IDE) is a software application that provides a comprehensive and powerful set of tools to build applications for target systems that run Windows, Linux, or Mac OS operating systems. These tools provide a single and consistent integrated environment and are designed to maximize productivity. There are many choices of IDE for Python programming. The details will be discussed in the following section of this chapter. In addition, we will discuss the following topics:
Analyzing and visualizing data requires several software tools: a text editor to write code (preferably with syntax highlighting), additional tools and libraries to run and test the code, and perhaps another set of tools to present the results. An IDE has many advantages. Some notable ones are as follows:
Python 3.x is not backward compatible with the 2.x series, which is why Python 2.7 is still in use. In this book, we will use Python 2.7 and will not focus on Python 3.x. Writing code that works across versions is beyond the scope of this book, and we recommend that you search for information on that topic. Some IDE tools have specific instructions for using both versions. In some cases, the code may have to be written a little differently.
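To give a flavor of what "written a little differently" can mean, here is a minimal sketch of one common pattern: branching on sys.version_info so that the same file runs on both Python 2.7 and Python 3. The names PY2, text_type, and to_text are our own illustrative choices, not part of any library:

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 (this name exists only on Python 2)
else:
    text_type = str


def to_text(value, encoding="utf-8"):
    """Return value as a text string on both Python 2 and Python 3."""
    if isinstance(value, bytes):
        # On Python 2, bytes is an alias for str, so byte strings are decoded
        return value.decode(encoding)
    return text_type(value)


print(to_text(b"matplotlib") + " " + to_text(2.7))
```

Compatibility libraries cover many more cases than this; the sketch only shows the general idea of isolating version differences in one place.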
Before discussing Python IDEs further, it is important to consider the different ways to display interactive data visualizations. There are many options, but here we will consider only two popular tools:
In 2001, Fernando Perez began working on IPython, an enhanced interactive Python shell with improvements such as history caching, profiles, object information, and session logging. Originally focused on interactive computing in Python, the project later added support for other languages, including Julia, R, and Ruby. Some features, such as automatic parenthesizing and tab completion, are small timesavers that make the shell much more usable. In standard Python, tab completion requires importing a few modules, whereas IPython offers it by default.
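For comparison, the following sketch shows roughly what enabling tab completion by hand looks like in a plain interpreter such as Python 2.7, using the standard rlcompleter and readline modules. The readline module may be missing on some platforms, so the import is guarded; the namespace entry sample_value is a made-up name used only to exercise the completer:

```python
import rlcompleter

try:
    import readline
except ImportError:
    readline = None  # for example, on Windows without a readline replacement

if readline is not None:
    # Bind the Tab key to completion for the interactive session
    readline.parse_and_bind("tab: complete")

# The completer can also be queried directly, independent of the keyboard
completer = rlcompleter.Completer({"sample_value": 1})
first_match = completer.complete("sample_v", 0)
print(first_match)
```

IPython performs the equivalent setup for you, which is why completion works as soon as the shell starts.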
IPython provides the following rich set of tools for Python scripting:
The following table briefly describes the four most helpful IPython commands:

| Command | Description |
|---|---|
| ? | This specifies the introduction and overview of IPython's features |
| %quickref | This denotes quick reference |
| help | This specifies Python's help |
| object? | This gives information about identifiers |
The IPython notebook is a web-based interactive computational environment. Here, you can merge code, mathematics, and plotting into a single document.
IPython (http://ipython.scipy.org/) provides an enhanced interactive Python shell and is highly recommended mostly because data analysis and visualization are interactive in nature. IPython is supported on most platforms. Some added features that come with IPython are:
An example that was run on IPython is shown in the following screenshot. To learn more about IPython and the IPython notebook, refer to http://nbviewer.ipython.org.
Plotly is an online analytics and data visualization tool that provides online graphing, analytics, and statistical tools for better collaboration. This tool was built using Python with a user interface that uses JavaScript and a visualization library that uses D3.js, HTML, and CSS. Plotly includes the scientific graphic libraries for many languages, such as Arduino, Julia, MATLAB, Perl, Python, and R. For an example source of Plotly, refer to https://plot.ly/~etpinard/84/fig-31a-hans-roslings-bubble-chart-for-the-year-2007/.
The following is the famous example of a bubble chart that shows GDP per capita around the globe.
Plotly provides a convenient way to convert plots from matplotlib to Plotly, as shown in the following code (assuming that you have a Plotly account and have signed in with your credentials):

    import plotly.plotly as py  # in Plotly 4 and later, this module is chart_studio.plotly
    import matplotlib.pyplot as plt

    # auto sign-in with credentials or use py.sign_in()
    mpl_fig_obj = plt.figure()
    # code for creating matplotlib plot
    py.plot_mpl(mpl_fig_obj)
The following are some of the popular Python IDEs that are available today:
PyCharm is a popular IDE with a rich feature set, and its community edition is free. At the time of writing, the current release is the PyCharm 4.0.6 community edition, available for download at https://www.jetbrains.com/pycharm/download. Shortcut reference cards are available for Mac, Linux, and Windows. Dr. Pedro Kroger has written an elaborate description of PyCharm at http://pedrokroger.net/getting-started-pycharm-python-ide/, which you can refer to for more details. Among its many interesting features, the code wizard and the NumPy array viewer are shown in the following screenshot:
Polar projection can be done quickly, as shown in the preceding screenshot, and the creation of an array of random samples is shown in the following screenshot:
A similar random sample is created in a different IDE (such as Spyder); here is an example:

    rand_4 = np.random.random_sample((2,2,2,2))-1
    array([[[[-0.6565232 , -0.2920045 ],
             [-0.45976502, -0.70469325]],
            [[-0.80218558, -0.77538009],
             [-0.34687551, -0.42498698]]],
           [[[-0.60869175, -0.9553122 ],
             [-0.05888953, -0.70585856]],
            [[-0.69856656, -0.21664848],
             [-0.29017137, -0.61972867]]]])
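The same experiment can be reproduced as a standalone script. This sketch assumes NumPy is installed; the values differ on every run because no seed is set, so only the shape and the value range are checked:

```python
import numpy as np

# random_sample draws from the half-open interval [0.0, 1.0);
# subtracting 1 shifts every value into [-1.0, 0.0)
rand_4 = np.random.random_sample((2, 2, 2, 2)) - 1

print(rand_4.shape)
print(rand_4.min() >= -1.0, rand_4.max() < 0.0)
```

The shape tuple (2, 2, 2, 2) requests a four-dimensional array with 16 elements, which is why the output is nested four brackets deep.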
PyDev is a plugin for the Eclipse IDE. In other words, rather than building a new IDE from scratch, its developers wrote a plugin so that Python programmers can reuse the functionality that Eclipse already provides. PyDev supports code refactoring, graphical debugging, an interactive console, code analysis, and code folding.
You can install PyDev as a plugin for Eclipse or install LiClipse, an advanced Eclipse distribution. LiClipse adds support not only for Python, but also for languages such as CoffeeScript, JavaScript, Django templates, and so on.
PyDev comes preinstalled in LiClipse, but it requires Java 7 to be installed first. For the complete installation steps, you can refer to http://pydev.org/manual_101_install.html.
IEP is another Python IDE. It offers tools similar to those in other IDEs, and its interface will look familiar if you have used typical Microsoft Windows applications.
IEP is a cross-platform Python IDE aimed at interactivity and introspection, which makes it very suitable for scientific computing. Its practical design is aimed at simplicity and efficiency.
IEP consists of two main components, the editor and the shell, and uses a set of pluggable tools to help the programmer in various ways. Some example tools are source structure, project manager, interactive help, and workspace. Some key features are as follows:
The following screenshot shows how you can use two different versions of Python in the same IDE:
Some people do not consider IEP an IDE, but it serves the purpose of writing, editing, and running Python programs. It supports multiple Python shells simultaneously, which makes it a very productive tool for someone who wants to work interactively with more than one GUI toolkit, such as PySide, PyQt4, GTK, and Tk.
IEP is written in (pure) Python 3 and uses the Qt GUI toolkit, but it can be used to execute code on any Python version available. You can download IEP from http://www.iep-project.org/downloads.html.
Enthought Canopy has a free version released under a BSD-style license, which comes with GraphCanvas, SciMath, and Chaco as plotting tools, among several other libraries. Like all IDEs, it has a text editor. It also has an IPython console, which is useful for running code and visualizing results. In addition, it comes with a graphical package manager. When Canopy is launched, it offers a choice of Editor, Package Manager, and Doc Browser. You may also want to try the training materials, as shown in the following screenshot:
In addition to its general development tools, Canopy integrates the IPython notebook and provides convenient functions for creating data visualizations. Like most IDEs, it has an editor, a file browser, and an IPython console. There is also a status display that shows the current editing status. These components of the Canopy IDE mainly perform the following:
The numbers highlighted in the following screenshot mark the IDE components just described. The file browser and Python panes can be dragged and dropped to different positions in the code editor window or outside its borders. When a pane is dragged, the location where it can dock is highlighted in blue, as shown in the following screenshot:
The documentation is organized via a browser called Canopy Documentation Browser, which is accessible from the Help menu. This includes the links to documentation for some commonly used Python packages.
One significant feature of the Documentation Browser is that it provides easy access to the sample code presented in the documentation. When a user right-clicks on a sample code box, a context menu is displayed. From there, you can select the Copy code option to copy the contents of the code block into Canopy's copy-and-paste buffer for use in an editor.
Canopy comes in several different products for individuals, and the free version is called Canopy Express with approximately 100 core packages. This free version is a useful tool for easy Python development for scientific and analytic computing. You can download this at https://store.enthought.com/downloads/ after selecting the target operating system as one of Windows, Linux, or Mac OS.
One of the challenges in a Python development environment is that managing the packages of many different libraries and tools can be a very time-consuming and daunting task. This is what their Documentation Browser looks like.
Canopy has a package manager that can be used to discover the Python packages available with Canopy and decide which additional ones to install and which ones to remove. There is a convenient search interface to find and install any available packages and to revert to the previous states of packages.
Canopy uses a Python capability to determine the Python packages that are available. When Canopy starts, it looks for packages first in the virtual environment and displays them, as shown in the following screenshot:
The numbered highlighted areas of the IDE are:
Anaconda is one of the most popular Python distributions used by the community. It comes with a long list of packages that are already compiled and integrated. It is built around a core component called conda (explained in detail later), and you can install or update Python packages using either conda or pip.
Anaconda is a free collection of powerful packages for Python that enables large-scale data management, analysis, and visualization for business intelligence, scientific analysis, engineering, machine learning, and more.
Anaconda includes the Scientific PYthon Development EnviRonment (Spyder), which has an IPython viewer as well. In addition, IPython can be launched as a GUI or a web-based notebook. The most convenient aspect is that you can install Python in a home directory without touching the system-installed Python. Not all packages are ready to work with Python 3 yet; therefore, it is better to use Python 2 with these IDEs. Anaconda is based on the conda package manager and has two important components: conda and spyder.
The following screenshot appears when Anaconda is launched. This gives users several options that include the IPython console, the IPython notebook, the Spyder IDE, and glueviz:
Spyder is a Python development environment that comes with the following components:
The code editor and the IPython console are shown in the following screenshot:
Conda is a command-line tool for managing environments and Python packages, used in place of pip. It lets you query and search for packages, create new environments if necessary, and install and update Python packages in existing conda environments. It also keeps track of dependencies between packages and platform specifics, helping you to create working environments from different combinations of packages. To check which version of conda is running, you can enter the following command (in my environment, it shows version 3.10.1):

    $ conda --version
    conda 3.10.1
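If you want to perform the same check from a script, a minimal sketch using only the standard library might look like the following. The function name conda_version is ours, and the sketch assumes Python 3.7 or later; it returns None when no conda executable is found on the PATH rather than raising an error:

```python
import shutil
import subprocess


def conda_version(executable="conda"):
    """Return the version string reported by conda, or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return None
    # Some releases write the version to stderr instead of stdout
    output = (result.stdout or result.stderr).strip()
    return output or None


print(conda_version())
```

On a machine without Anaconda, this simply prints None, which makes it usable as a quick precondition check before calling other conda commands.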
A conda environment is a filesystem directory that contains a specific collection of conda packages. As a concrete example, you may want to have one environment that provides NumPy 1.7 and another environment that provides NumPy 1.6 for legacy testing; conda makes this kind of mixing and matching easy. To begin using an environment, simply set the PATH variable to point to its bin directory.
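The PATH adjustment described above can be sketched in a few lines. This assumes the Unix-like layout mentioned in the text, where executables live in the environment's bin directory; the function name path_with_environment and the environment path in the usage line are hypothetical:

```python
import os


def path_with_environment(env_dir, current_path=None):
    """Return a PATH string with env_dir's bin directory placed first."""
    if current_path is None:
        current_path = os.environ.get("PATH", "")
    bin_dir = os.path.join(env_dir, "bin")
    # Drop empty entries and any earlier occurrence of the same directory
    entries = [p for p in current_path.split(os.pathsep) if p and p != bin_dir]
    return os.pathsep.join([bin_dir] + entries)


# Hypothetical environment location, shown only as an example
print(path_with_environment("/Users/MacBook/anaconda/envs/legacy"))
```

Because the shell searches PATH from left to right, putting the environment's bin directory first means its python and its packages are found before any system-wide installation.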
Let's take a look at an example of how to install a package called SciPy with conda. Assuming that you have installed Anaconda correctly and conda is available in the running path, enter the following command to install SciPy:

    $ conda install scipy
    Fetching package metadata: ....
    Solving package specifications: .
    Package plan for installation in environment /Users/MacBook/anaconda:
    The following packages will be downloaded:
        package                    |            build
        ---------------------------|-----------------
        flask-0.10.1               |           py27_1         129 KB
        itsdangerous-0.23          |           py27_0          16 KB
        jinja2-2.7.1               |           py27_0         307 KB
        markupsafe-0.18            |           py27_0          19 KB
        werkzeug-0.9.3             |           py27_0         385 KB
    The following packages will be linked:
        package                    |            build
        ---------------------------|-----------------
        flask-0.10.1               |           py27_1
        itsdangerous-0.23          |           py27_0
        jinja2-2.7.1               |           py27_0
        markupsafe-0.18            |           py27_0
        python-2.7.5               |                2
        readline-6.2               |                1
        sqlite-3.7.13              |                1
        tk-8.5.13                  |                1
        werkzeug-0.9.3             |           py27_0
        zlib-1.2.7                 |                1
    Proceed ([y]/n)?
Note that any dependencies of the package being installed are recognized, downloaded, and linked automatically. To install or update any Python package, use one of the following commands:

    conda install <package name>
    conda update <package name>
Here is an example of a package update from the command line using conda (to update matplotlib):

    conda update matplotlib
    Fetching package metadata: ....
    Solving package specifications: .
    Package plan for installation in environment /Users/MacBook/anaconda:
    The following packages will be downloaded:
        package                    |            build
        ---------------------------|-----------------
        freetype-2.5.2             |                0         691 KB
        conda-env-2.1.4            |           py27_0          15 KB
        numpy-1.9.2                |           py27_0         2.9 MB
        pyparsing-2.0.3            |           py27_0          63 KB
        pytz-2015.2                |           py27_0         175 KB
        setuptools-15.0            |           py27_0         436 KB
        conda-3.10.1               |           py27_0         164 KB
        python-dateutil-2.4.2      |           py27_0         219 KB
        matplotlib-1.4.3           |       np19py27_1        40.9 MB
        ------------------------------------------------------------
                                               Total:        45.5 MB
    The following NEW packages will be INSTALLED:
        python-dateutil: 2.4.2-py27_0
    The following packages will be UPDATED:
        conda:           3.10.0-py27_0    --> 3.10.1-py27_0
        conda-env:       2.1.3-py27_0     --> 2.1.4-py27_0
        freetype:        2.4.10-1         --> 2.5.2-0
        matplotlib:      1.4.2-np19py27_0 --> 1.4.3-np19py27_1
        numpy:           1.9.1-py27_0     --> 1.9.2-py27_0
        pyparsing:       2.0.1-py27_0     --> 2.0.3-py27_0
        pytz:            2014.9-py27_0    --> 2015.2-py27_0
        setuptools:      14.3-py27_0      --> 15.0-py27_0
    Proceed ([y]/n)?
In order to check the packages that are installed using Anaconda, navigate to the command line and enter the following command to quickly display the list of all the packages installed in the default environment:
conda list
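A similar listing can be produced from inside Python itself. The sketch below is not how conda works internally; it uses the standard importlib.metadata module (Python 3.8 or later, an assumption on our part) to report the distributions visible to the running interpreter. The function name installed_packages is ours:

```python
from importlib import metadata


def installed_packages():
    """Return sorted (name, version) pairs for distributions visible here."""
    pairs = set()
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
            version = dist.version
        except Exception:
            # Skip distributions whose metadata cannot be read
            continue
        if name:
            pairs.add((name, version))
    return sorted(pairs, key=lambda pair: pair[0].lower())


for name, version in installed_packages()[:10]:
    print(name, version)
```

Unlike conda list, this only sees Python distributions and knows nothing about non-Python packages that conda may also manage.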
In addition, you can always install a package by the usual means, for example, with pip install or from the source using a setup.py file. Although conda is the preferred packaging tool, there is nothing special about Anaconda that prevents the use of a standard Python packaging tool (such as pip).
IPython is not required, but it is highly recommended. IPython should be installed after Python, GNU Readline, and PyReadline are installed; Anaconda and Canopy do this by default. A number of Python packages are used throughout the examples in this book, and the following section provides an updated list of them.
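Before working through the examples, it can be handy to check which of these optional pieces are present. The following sketch uses importlib.util.find_spec from the standard library; the helper name available_modules is ours, and the module names passed to it are simply the ones mentioned above:

```python
import importlib.util


def available_modules(names):
    """Map each top-level module name to True if it can be located for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = available_modules(["IPython", "readline", "pyreadline"])
for name, found in sorted(status.items()):
    print(name, "available" if found else "missing")
```

The check only locates the modules and does not import them, so it runs quickly and has no side effects.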