Sometimes, the data is directly available as a URL. In such cases, read_csv can be directly used to read from these URLs:
import pandas as pd
pd.read_csv('http://bit.ly/2cLzoxH').head()
Alternatively, to fetch data from a URL, we can use a couple of Python standard-library packages that we haven't used so far: csv and urllib. It would suffice to know that csv provides a range of methods for handling CSV files and that urllib is used to navigate to and access information from a URL. Here is how we can do this:
import csv
import urllib.request

url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
response = urllib.request.urlopen(url)
# The response yields bytes, so decode each line before handing it to csv.reader
lines = (line.decode('utf-8') for line in response)
cr = csv.reader(lines)
for row in cr:
    print(row)
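Since csv.reader accepts any iterable of text lines, the parsing step above can be tried without a network call by feeding it an in-memory sample. The two lines below are a hypothetical extract in the iris.data format, used purely for illustration:

```python
import csv

# Hypothetical sample lines mimicking the iris.data format
sample = [
    '5.1,3.5,1.4,0.2,Iris-setosa',
    '4.9,3.0,1.4,0.2,Iris-setosa',
]

# csv.reader splits each line on commas into a list of strings
rows = list(csv.reader(sample))
print(rows[0])  # ['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']
```

Note that every field comes back as a string; numeric conversion is left to the caller, which is one reason read_csv is more convenient for tabular data.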
AWS S3 is a popular file-sharing and storage repository on the web. Many enterprises store their business operations data as files on S3; these files need to be read and processed directly or moved to a database. Python allows us to read files directly from S3, as shown in the following code.
With Python 3.4 and above, the s3fs package can be used alongside pandas to read files directly from S3. An AWS config file needs to be placed in the current working directory. The bucket name, as well as the path and filename, need to be passed for reading:
import os
import pandas as pd
from s3fs.core import S3FileSystem

# Point s3fs at the AWS config file in the current working directory
os.environ['AWS_CONFIG_FILE'] = 'aws_config.ini'

s3 = S3FileSystem(anon=False)
key = 'path/to/your-csv.csv'
bucket = 'your-bucket-name'

df = pd.read_csv(s3.open('{}/{}'.format(bucket, key), mode='rb'))
A DataFrame can be written to a CSV file and saved directly in S3 as follows:
import s3fs

# Serialize the DataFrame to CSV text, then encode it to bytes
bytes_to_write = df.to_csv(None).encode()

# Here, key and secret are your AWS access key ID and secret access key
fs = s3fs.S3FileSystem(key=key, secret=secret)
with fs.open('s3://bucket/path/to/file.csv', 'wb') as f:
    f.write(bytes_to_write)
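The df.to_csv(None).encode() step can be checked locally before involving S3: passing None makes to_csv return the CSV content as a string rather than writing a file, and .encode() turns that string into the bytes that S3 expects. A minimal sketch with a toy DataFrame (the column names are illustrative):

```python
import pandas as pd

# Toy DataFrame standing in for the data to be uploaded
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# to_csv(None) returns the CSV as a string instead of writing to a file
csv_text = df.to_csv(None)
bytes_to_write = csv_text.encode()

print(type(bytes_to_write))      # <class 'bytes'>
print(csv_text.splitlines()[0])  # ,a,b  (blank index header, then columns)
```

By default, to_csv writes the DataFrame index as the first column; pass index=False if the index should not appear in the uploaded file.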