Sometimes, the data is directly available as a URL. In such cases, read_csv can be directly used to read from these URLs:
import pandas as pd
pd.read_csv('http://bit.ly/2cLzoxH').head()
Alternatively, to fetch data from a URL, we can use a couple of Python standard-library packages that we haven't used so far: csv and urllib. It would suffice to know that csv provides a range of methods for handling CSV files and that urllib is used to navigate to and access information from a URL. Here is how we can do this:
import csv
import urllib.request

url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
response = urllib.request.urlopen(url)
# The response yields bytes, so decode each line before handing it to csv.reader
lines = (line.decode('utf-8') for line in response)
cr = csv.reader(lines)
for row in cr:
    print(row)
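Since csv.reader accepts any iterable of text lines, the parsing step above can be tried without a network call by feeding it an in-memory sample. The two lines below are a hypothetical extract in the iris.data format, used purely for illustration:

```python
import csv

# Hypothetical sample lines mimicking the iris.data format
sample = [
    '5.1,3.5,1.4,0.2,Iris-setosa',
    '4.9,3.0,1.4,0.2,Iris-setosa',
]

# csv.reader splits each line on commas into a list of strings
rows = list(csv.reader(sample))
print(rows[0])  # ['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']
```

Note that every field comes back as a string; numeric conversion is left to the caller, which is one reason read_csv is more convenient for tabular data.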
AWS S3 is a popular file-sharing and storage repository on the web. Many enterprises store their business operations data as files on S3; these files need to be read and processed directly or moved to a database. Python allows us to read files directly from S3, as shown in the following code.
With Python 3.4 and above, the s3fs package can be used alongside pandas to read files directly from S3. An AWS config file needs to be placed in the current working directory. The bucket name, as well as the path and filename, need to be passed for reading:
import os
import pandas as pd
from s3fs.core import S3FileSystem

# Point s3fs at the AWS config file in the current working directory
os.environ['AWS_CONFIG_FILE'] = 'aws_config.ini'

s3 = S3FileSystem(anon=False)
key = 'path/to/your-csv.csv'
bucket = 'your-bucket-name'

df = pd.read_csv(s3.open('{}/{}'.format(bucket, key), mode='rb'))
A DataFrame can be written to a CSV file and saved directly in S3 as follows:
import s3fs

# Serialize the DataFrame to CSV text, then encode it to bytes
bytes_to_write = df.to_csv(None).encode()

# Here, key and secret are your AWS access key ID and secret access key
fs = s3fs.S3FileSystem(key=key, secret=secret)
with fs.open('s3://bucket/path/to/file.csv', 'wb') as f:
    f.write(bytes_to_write)
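The df.to_csv(None).encode() step can be checked locally before involving S3: passing None makes to_csv return the CSV content as a string rather than writing a file, and .encode() turns that string into the bytes that S3 expects. A minimal sketch with a toy DataFrame (the column names are illustrative):

```python
import pandas as pd

# Toy DataFrame standing in for the data to be uploaded
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# to_csv(None) returns the CSV as a string instead of writing to a file
csv_text = df.to_csv(None)
bytes_to_write = csv_text.encode()

print(type(bytes_to_write))      # <class 'bytes'>
print(csv_text.splitlines()[0])  # ,a,b  (blank index header, then columns)
```

By default, to_csv writes the DataFrame index as the first column; pass index=False if the index should not appear in the uploaded file.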