There is a vast number of resources on data analysis online, many of them focused on Python. I have compiled a few here and hope that they will be of use to you. The resources are grouped into sections, and each entry has a short description and a link where you can find more information.
General links to Python-related resources:
Continuum Analytics
Makers of the Anaconda Python distribution. On their web page, you can find documentation and support.
Python and IPython
https://python.org and http://ipython.org
There's really no need for an explanation. We owe a great deal to these two projects.
Jupyter Notebook
The Jupyter Notebook project web page where you can find more information, documentation, and help.
Python weekly newsletter
A weekly (e-mail) newsletter to make it easier to keep up to date on what is going on in the world of Python.
Stack Overflow
A question and answer page for basically everything. If you search online for any kind of Python programming problem, chances are high that you will land on one of their web pages. Register and ask or answer a question!
Enthought
Makers of Enthought Canopy, which, like the Anaconda distribution, is a full Python distribution. Enthought also offers many courses and training for anyone interested.
PyPI
A repository of most Python packages and the first place that pip looks for packages.
Scipy-toolkits
https://www.scipy.org/scikits.html
The portal for the SciPy Toolkits (Scikits), affiliated packages for SciPy. scikit-learn is one such Scikit package.
GitHub
A repository for code that uses the famous Git versioning system to keep track of changes to the code. You can register and upload your own code for free as long as you make the code public. The code can be in Python or any other programming language.
This is a list of useful Python packages. Most of them can be installed via the conda or pip packaging systems.
PyMC
https://pymc-devs.github.io/pymc/
Alternatively, https://github.com/pymc-devs/pymc
A package for Bayesian inference/modeling analysis in Python; used in Chapter 6, Bayesian Methods, of this book.
emcee
An alternative to PyMC, an MCMC package for Bayesian inference.
scikit-learn
A tool for machine learning data analysis with Python; used in Chapter 7, Supervised and Unsupervised Learning, of this book.
AstroML
A package for machine learning, focusing on astronomical applications.
OpenAI Gym
An open and publicly released toolkit to develop and test reinforcement learning algorithms.
Quandl
A hub for financial and economic data. They provide a Python API that you can install and use to access large amounts of data.
Seaborn
https://stanford.edu/~mwaskom/software/seaborn/
A package for statistical data visualization with Python. It has a few unique plotting functions that have not yet made it into the matplotlib package.
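Before installing anything from the list above, it can help to see which packages are already available in your environment. The following is a minimal sketch using only the standard library. The import names and pip names in the `PACKAGES` mapping are assumptions for illustration (they can differ between package versions), and `find_missing` is a hypothetical helper, not part of any of the listed packages.

```python
import importlib.util

# Import name -> pip name for a few of the packages listed above.
# These names are assumptions for illustration; check each project's
# documentation for the name that matches the version you want.
PACKAGES = {
    "pymc": "pymc",
    "emcee": "emcee",
    "sklearn": "scikit-learn",
    "astroML": "astroML",
    "seaborn": "seaborn",
}


def find_missing(packages):
    """Return the pip names of packages whose import name cannot be found."""
    missing = []
    for import_name, pip_name in packages.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(import_name) is None:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = find_missing(PACKAGES)
    if missing:
        print("Not installed; try: pip install " + " ".join(missing))
    else:
        print("All listed packages are importable.")
```

If you use the Anaconda distribution, the same package names can usually be passed to conda install instead of pip install, although not every package is available in the default conda channels.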
Here, I list some of the data repositories that are available online.
UCI Machine Learning Repository
The University of California Irvine, Center for Machine Learning and Intelligent Systems repository of datasets, which is targeted at machine learning problems.
WHO - Global Health Observatory data repository
http://apps.who.int/gho/data/node.home
A large database of key health-related data from the whole world.
Eurostat
A database for various key statistics on all the countries in the European Union.
NTSB
The National Transportation Safety Board web page, which hosts a statistics database on automotive, rail, aviation, and marine accidents in the USA.
OpenData by Socrata
A large collection of datasets (for example, worldwide airline accident statistics) that is easy to explore and search.
General Social Survey (USA)
Yearly surveys in the USA, with open and downloadable datasets and an online data exploration tool.
CDC
http://www.cdc.gov/datastatistics/
The Centers for Disease Control and Prevention (CDC) has a lot of public data available on various diseases and health-related statistics.
Open Data Inception (+2500 sources)
A map showing the locations of open data resources, with links to each.
Data.gov.in
The Indian government public data portal. It contains a rich and broad set of publicly available data to practice your data analysis skills.
Census.gov
The United States Census Bureau conducts surveys and collects data on various topics in the USA.
Data.europa
The European Union Open Data Portal provides a single point of access to data from all the EU countries.
The following is a list of some resources that are useful for visualization (Seaborn, listed previously, also belongs here).
Fivethirtyeight
A great inspiration when it comes to the visualization of data. The site publishes statistical analyses and data presentations on topics from around the world.
Plotly
Data analysis and visualization done online. Their tool for Python is now open source and free to use when self-hosted.
mpld3
Create interactive Python plots and export to the browser for others to explore.