Useful resources

There is a vast number of resources on data analysis online, many of them focused on Python. I have compiled a few of them here and hope that they will be useful to you. The resources are grouped into sections, and each entry has a short description and a link where you can find more information.

General resources

General links to Python-related resources:

Continuum Analytics

https://www.continuum.io

Makers of the Anaconda Python distribution. On their web page, you can find documentation and support.

Python and IPython

https://python.org and http://ipython.org

There's really no need for an explanation. We owe a great deal to these two projects.

Jupyter Notebook

https://jupyter.org

The Jupyter Notebook project web page where you can find more information, documentation, and help.

Python weekly newsletter

http://www.pythonweekly.com

A weekly e-mail newsletter that makes it easier to keep up to date with what is going on in the world of Python.

Stack Overflow

http://stackoverflow.com

A question-and-answer site for basically every programming topic. If you search online for any kind of Python programming problem, chances are high that you will land on one of their pages. Register and ask or answer a question!

Enthought

https://www.enthought.com

Makers of Enthought Canopy, which, like Anaconda, is a full Python distribution. Enthought also offers many courses and training programs for anyone interested.

PyPI

https://pypi.python.org/pypi

The Python Package Index, a repository of most Python packages and the default index where pip looks for packages.

SciPy Toolkits

https://www.scipy.org/scikits.html

The portal for the SciPy Toolkits (SciKits), add-on packages affiliated with SciPy. scikit-learn is one such SciKit.

GitHub

https://github.com

A hosting service for code that uses the popular Git version control system to keep track of changes. You can register and upload your own code for free. The code can be in Python or any other programming language.

Packages

This is a list of useful Python packages. Most of them can be installed via the conda or pip packaging systems.
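Before installing anything with conda or pip, it can help to check which of the packages below are already available in your environment. The following is a minimal sketch using only the standard library function `importlib.util.find_spec`. The `PACKAGES` mapping and the `installed_status` helper are illustrative names of my own, and the import names are assumptions based on common usage (for example, scikit-learn is imported as `sklearn`); adjust them to the versions you actually use.

```python
import importlib.util

# Hypothetical mapping from package name to its (assumed) import name.
# Import names sometimes differ from package names, e.g. scikit-learn -> sklearn.
PACKAGES = {
    "PyMC": "pymc",
    "emcee": "emcee",
    "scikit-learn": "sklearn",
    "AstroML": "astroML",
    "OpenAI Gym": "gym",
    "Quandl": "quandl",
    "Seaborn": "seaborn",
}


def installed_status(packages):
    """Return a dict mapping each package name to True if its module can be found."""
    # find_spec returns None for a top-level module that is not installed,
    # without actually importing it.
    return {name: importlib.util.find_spec(module) is not None
            for name, module in packages.items()}


if __name__ == "__main__":
    for name, ok in installed_status(PACKAGES).items():
        print(f"{name:15s} {'installed' if ok else 'missing'}")
```

Anything reported as missing can then be installed with conda or pip.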

PyMC

https://pymc-devs.github.io/pymc/

Alternatively, https://github.com/pymc-devs/pymc

A package for Bayesian statistical modeling and inference in Python; used in Chapter 6, Bayesian Methods, of this book.

emcee

http://dan.iel.fm/emcee/

An MCMC (Markov chain Monte Carlo) package for Bayesian inference; an alternative to PyMC.

scikit-learn

http://scikit-learn.org

A package for machine learning and data analysis in Python; used in Chapter 7, Supervised and Unsupervised Learning, of this book.

AstroML

http://www.astroml.org/

A package for machine learning, focusing on astronomical applications.

OpenAI Gym

https://gym.openai.com/

An open-source toolkit for developing and testing reinforcement learning algorithms.

Quandl

https://www.quandl.com/

A hub for financial and economic data. Quandl provides a Python API that you can install and use to access large amounts of data.

Seaborn

https://stanford.edu/~mwaskom/software/seaborn/

A package for statistical data visualization in Python, built on top of matplotlib. It provides several high-level plotting functions that matplotlib itself does not offer.

Data repositories

Here, I list some of the data repositories that are available online.

UCI Machine Learning Repository

http://archive.ics.uci.edu/ml

The dataset repository of the Center for Machine Learning and Intelligent Systems at the University of California, Irvine, aimed at machine learning problems.

WHO - Global Health Observatory data repository

http://apps.who.int/gho/data/node.home

A large database of key health-related data from around the world.

Eurostat

http://ec.europa.eu/eurostat

A database of key statistics on all the countries of the European Union.

NTSB

http://www.ntsb.gov

The web page of the National Transportation Safety Board, which provides statistics on automotive, rail, aviation, and marine accidents in the USA.

OpenData by Socrata

https://opendata.socrata.com

A large collection of datasets (for example, worldwide airline accident statistics) that are easy to explore and search.

General Social Survey (USA)

http://gss.norc.org

Recurring surveys of American society, with open, downloadable datasets and an online data exploration tool.

CDC

http://www.cdc.gov/datastatistics/

The Centers for Disease Control and Prevention (CDC) provides a large amount of public data on various diseases and health-related statistics.

Open Data Inception (2,500+ sources)

http://opendatainception.io

A map showing the locations of open data portals, with links to each of them.

Data.gov.in

https://data.gov.in

The Indian government's public data portal. It contains a rich and broad set of publicly available data to practice your data analysis skills on.

Census.gov

http://www.census.gov

The United States Census Bureau conducts surveys and collects data on a wide range of topics in the USA.

Data.europa

https://data.europa.eu/euodp

The European Union Open Data Portal provides a single point of access to open data published by the institutions and bodies of the EU.

Visualization of data

The following resources are useful for data visualization (Seaborn, listed previously under Packages, also belongs here).

FiveThirtyEight

http://fivethirtyeight.com/

A great source of inspiration for data visualization. The site presents statistical analyses of data from around the world, together with clear visual presentations of the results.

Plotly

https://plot.ly

An online platform for data analysis and visualization. Its Python library is now open source and free to use when self-hosted.

mpld3

http://mpld3.github.io/

A package that turns matplotlib plots into interactive, D3-based graphics in the browser for others to explore.
