Installing the Data Science Toolbox

The Data Science Toolbox (DST) is a virtual environment based on Ubuntu for data analysis using Python and R. Since DST is a virtual environment, we can install it on various operating systems. We will install DST locally, which requires VirtualBox and Vagrant. VirtualBox is a virtual machine application originally created by Innotek GmbH in 2007. Vagrant is a wrapper around virtual machine applications such as VirtualBox created by Mitchell Hashimoto.

Getting ready

You need to have in the order of 2 to 3 GB free for VirtualBox, Vagrant, and DST itself. This may vary by operating system.

How to do it...

Installing DST requires the following steps:

  1. Install VirtualBox by downloading an installer for your operating system and architecture from https://www.virtualbox.org/wiki/Downloads (retrieved July 2015) and running it. I installed VirtualBox 4.3.28-100309 myself, but you can just install whatever the most recent VirtualBox version at the time is.
  2. Install Vagrant by downloading an installer for your operating system and architecture from https://www.vagrantup.com/downloads.html (retrieved July 2015). I installed Vagrant 1.7.2 and again you can install a more recent version if available.
  3. Create a directory to hold the DST and navigate to it with a terminal. Run the following command:
    $ vagrant init data-science-toolbox/dst
    

    The first command creates a VagrantFile configuration file. Most of the content is commented out, but the file does contain links to documentation that might be useful. The second command creates the DST and initiates a download that could take a couple of minutes.

  4. Connect to the virtual environment as follows (on Windows use putty):
    $ vagrant ssh
    
  5. View the preinstalled Python packages with the following command:
    vagrant@data-science-toolbox:~$ pip freeze
    

    The list is quite long; in my case it contained 32 packages. The DST Python version as of July 2015 was 2.7.6.

  6. When you are done with the DST, log out and suspend (you can also halt it completely) the VM:
    vagrant@data-science-toolbox:~$ logout
    Connection to 127.0.0.1 closed.
    $ vagrant suspend
    ==> default: Saving VM state and suspending execution...
    

How it works...

Virtual machines (VMs) emulate computers in software. VirtualBox is an application that creates and manages VMs. VirtualBox stores its VMs in your home folder, and this particular VM takes about 2.2 GB of storage.

Ubuntu is an open source Linux operating system, and we are allowed by its license to create virtual machines. Ubuntu has several versions; we can get more info with the lsb_release command:

vagrant@data-science-toolbox:~$ lsb_release -a
No LSB modules are available.
Distributor ID:    Ubuntu
Description:    Ubuntu 14.04 LTS
Release:    14.04
Codename:    trusty

Vagrant used to only work with VirtualBox, but currently it also supports VMware, KVM, Docker, and Amazon EC2. Vagrant calls virtual machines boxes. Some of these boxes are available for everyone at http://www.vagrantbox.es/ (retrieved July 2015).

See also

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset