Reading a SAS/Stata file

Pandas can read two SAS file formats: SAS XPORT files (.xpt) and SAS data files (.sas7bdat).

The read_sas() function reads both formats and returns the contents as a DataFrame. Here, a SAS data file is read and displayed as a pandas DataFrame:

import pandas as pd

df = pd.read_sas('sample.sas7bdat')
df

This results in the following output:

Output of read_sas
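When it is given a file name, read_sas() infers the format from the extension: .xpt means XPORT and .sas7bdat means SAS data file. The following sketch shows that inference failing in two cases: a file name with any other extension, and an in-memory buffer. In both cases pandas raises ValueError before it opens anything. The file name "export.dat" and the empty buffer are made up for illustration. For such inputs, pass format='xport' or format='sas7bdat' explicitly.

```python
import io

import pandas as pd

# "export.dat" is a hypothetical file name. Neither source is ever opened, because
# read_sas() raises ValueError while it is still trying to infer the format.
sources = {"unusual extension": "export.dat", "in-memory buffer": io.BytesIO(b"")}

errors = {}
for label, source in sources.items():
    try:
        pd.read_sas(source)        # no format= argument, so pandas must infer it
    except ValueError as err:
        errors[label] = str(err)

for label, message in errors.items():
    print(f"{label}: {message}")

# With an explicit format, such sources are accepted, for example:
#     pd.read_sas(buffer, format="sas7bdat")
```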

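The chunksize and iterator arguments let you read a SAS file in pieces rather than all at once. Either one makes read_sas() return a reader object instead of a DataFrame, and with chunksize each piece holds at most that many rows. The SAS data file used earlier has 51 records. Read with a chunksize of 10, it is split into six groups: five groups of 10 rows and a final group of 1 row, as shown in the following code: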

# Each chunk is a DataFrame with at most 10 rows
rdr = pd.read_sas('sample.sas7bdat', chunksize=10)
for chunk in rdr:
    print(chunk.shape)

Take a look at the following output:

Output of read_sas with chunksize
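The six groups follow from the arithmetic 51 = 5 × 10 + 1. Every chunk holds chunksize rows except the last one, which holds the remainder. Pandas cannot create a SAS file to experiment with, so the sketch below makes the same point with read_csv, which accepts a chunksize argument with the same meaning. The 51-row table is made up for illustration and only stands in for sample.sas7bdat.

```python
import io

import pandas as pd

# Hypothetical 51-row table built in memory (a stand-in for the SAS file,
# since pandas cannot write SAS formats).
csv_text = "id,value\n" + "".join(f"{i},{i * 0.5}\n" for i in range(51))

# Read the table 10 rows at a time and record the shape of each chunk.
with pd.read_csv(io.StringIO(csv_text), chunksize=10) as reader:
    shapes = [chunk.shape for chunk in reader]

print(shapes)  # [(10, 2), (10, 2), (10, 2), (10, 2), (10, 2), (1, 2)]
```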

Note that pandas can only read SAS files; it cannot write them.

Pandas also supports both reading and writing Stata files. Stata supports only a limited set of data types: int8, int16, int32, float32, float64, and strings of 244 or fewer characters. When pandas writes a Stata data file, it converts columns of other types to a supported type where possible. For example, bool columns are stored as int8, and int64 columns are stored as int32 when their values fit and as float64 otherwise.
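These conversions can be observed with a round trip through a temporary file. The following is a sketch that assumes the default to_stata() settings (format version 114, which stores strings as fixed-width fields). The column and file names are illustrative only. The sketch also shows that a string longer than 244 characters is rejected with ValueError in this format rather than silently truncated.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical columns whose dtypes Stata does not support directly.
df = pd.DataFrame({
    "small": np.array([1, 2, 3], dtype="int64"),    # int64 values that fit in int32
    "big": np.array([2**40, 0, 1], dtype="int64"),  # int64 values beyond the int32 range
    "flag": [True, False, True],                    # bool
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "types_demo.dta"             # hypothetical file name
    df.to_stata(path, write_index=False)            # default format version 114
    roundtrip = pd.read_stata(path)

print(roundtrip.dtypes)  # inspect the type each column was stored as

# Version 114 stores strings as fixed-width fields of at most 244 characters,
# so writing a 245-character string raises ValueError.
long_string_rejected = False
with tempfile.TemporaryDirectory() as tmp:
    try:
        pd.DataFrame({"text": ["x" * 245]}).to_stata(
            Path(tmp) / "long_demo.dta", write_index=False
        )
    except ValueError as err:
        long_string_rejected = True
        print("ValueError:", err)
```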

Let's read a Stata data file using pandas:

df = pd.read_stata('sample.dta')
df

Take a look at the following output:

Output of read_stata

The read_stata() function also accepts chunksize and iterator arguments for reading data in smaller groups. Other useful arguments of read_stata() include:

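convert_categoricals: Converts columns that carry Stata value labels into pandas categorical columns (True by default)
index_col: Names the column to use as the index of the resulting DataFrame
convert_missing: If False (the default), missing values are returned as NaN; if True, columns containing missing values are returned with the object data type, and the missing entries are represented as StataMissingValue objects
columns: The columns to select from the dataset (all columns by default)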
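The arguments above can be tried together in a sketch. It assumes pandas 1.2 or later, so that the chunked reader can be used as a context manager. Because pandas can write .dta files, the sketch first writes a hypothetical 51-row dataset to a temporary file. The file name, column names, and data are made up for illustration. The categorical column is written as Stata value labels, so convert_categoricals has something to convert.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from pandas.io.stata import StataMissingValue

# Hypothetical 51-row dataset: an id, a labelled category, and a score with one gap.
score = np.linspace(0.0, 1.0, 51)
score[0] = np.nan
source = pd.DataFrame({
    "id": np.arange(51, dtype="int32"),
    "grade": pd.Categorical(["low", "mid", "high"] * 17),
    "score": score,
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "args_demo.dta"          # hypothetical file name
    source.to_stata(path, write_index=False)    # categories become Stata value labels

    # columns + index_col: keep two columns and make "id" the index.
    subset = pd.read_stata(path, columns=["id", "grade"], index_col="id")

    # convert_categoricals=False: return the stored integer codes, not the labels.
    codes = pd.read_stata(path, convert_categoricals=False)

    # convert_missing=True: missing values come back as StataMissingValue objects.
    raw_missing = pd.read_stata(path, convert_missing=True)

    # chunksize: the returned reader yields DataFrames of at most 10 rows.
    with pd.read_stata(path, chunksize=10) as reader:
        chunk_rows = [len(chunk) for chunk in reader]

print(subset.head(3))
print(codes["grade"].head(3).tolist())
print(raw_missing["score"].iloc[0])
print(chunk_rows)
```

Reading the file four times keeps each argument's effect separate. In practice the arguments can be combined in a single read_stata() call.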
