Reading a SAS/Stata file

pandas can read two SAS file formats: SAS XPORT transport files (.xpt) and SAS data files (.sas7bdat).

The read_sas() function helps read SAS files. Here, a SAS data file has been read and displayed as a pandas dataframe:

import pandas as pd

df = pd.read_sas('sample.sas7bdat')
df

This results in the following output:

Output of read_sas

The chunksize and iterator arguments allow the SAS file to be read incrementally, a group of rows at a time, rather than all at once. If the SAS data file used earlier is read with a chunksize of 10, its 51 records are divided into six groups (five groups of 10 rows and a final group of 1 row), as shown in the following code:

rdr = pd.read_sas('sample.sas7bdat', chunksize=10)
for chunk in rdr:
    print(chunk.shape)

Take a look at the following output:

Output of read_sas with chunksize

Note, however, that pandas can only read SAS files; it cannot write them.

pandas can also read and write Stata data files. Stata files support only a limited set of data types: int8, int16, int32, float32, float64, and strings of 244 characters or fewer. When writing a Stata data file through pandas, columns of unsupported types are converted to a supported type where possible.
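As a minimal sketch of this type conversion, the following example writes a small DataFrame to a temporary .dta file with to_stata() and reads it back. The column names, the values, and the temporary file name are assumptions made for illustration; they are not part of the sample data used elsewhere in this section. An int64 column is not a Stata type, so it should come back as a narrower integer type when its values fit:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical data: int64 is not a Stata type, so pandas has to convert it
df_out = pd.DataFrame({
    "id": np.array([1, 2, 3], dtype="int64"),
    "score": np.array([0.5, 1.5, 2.5], dtype="float64"),
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "types_demo.dta")
    # write_index=False keeps the DataFrame index out of the file
    df_out.to_stata(path, write_index=False)
    df_back = pd.read_stata(path)

print(df_out.dtypes)   # dtypes before writing
print(df_back.dtypes)  # dtypes after the round trip through the .dta file
```

Comparing the two printed dtype listings shows which columns were converted on the way out.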

Let's read a Stata data file using pandas:

df = pd.read_stata('sample.dta')
df

Take a look at the following output:

Output of read_stata

The read_stata() function also has chunksize and iterator arguments to read data in smaller groups. The following arguments are also available in read_stata():

  • convert_categoricals: Converts columns that carry Stata value labels into the pandas categorical data type
  • index_col: Identifies the column to be defined as an index
  • convert_missing: Specifies whether to represent missing values as NaN or with a Stata missing value object
  • columns: Columns to select from the dataset
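The arguments listed above can be tried out with a short, self-contained sketch. Because sample.dta is not reproduced here, the example first writes a hypothetical 51-row DataFrame to a temporary .dta file; the column names, values, and file name are assumptions for illustration only. It then reads the file back with columns, index_col, and convert_categoricals, and finally iterates over it with a chunksize of 10:

```python
import os
import tempfile

import pandas as pd

# Hypothetical data: 51 rows, mirroring the record count used earlier
df_out = pd.DataFrame({
    "id": range(51),
    "group": pd.Categorical(["a", "b", "c"] * 17),
    "value": [i * 0.5 for i in range(51)],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "reader_demo.dta")
    df_out.to_stata(path, write_index=False)

    # Read two columns only, use "id" as the index, keep the value labels
    labelled = pd.read_stata(path, columns=["id", "group"],
                             index_col="id", convert_categoricals=True)

    # Same selection, but keep the underlying integer codes instead
    coded = pd.read_stata(path, columns=["id", "group"],
                          index_col="id", convert_categoricals=False)

    # With chunksize set, read_stata returns a reader that yields DataFrames
    with pd.read_stata(path, chunksize=10) as reader:
        chunk_shapes = [chunk.shape for chunk in reader]

print(labelled.dtypes)
print(coded.dtypes)
print(chunk_shapes)
```

As with the SAS example, the 51 rows come back as five chunks of 10 rows and a final chunk of 1 row. Using the reader in a with block closes the underlying file once iteration is finished.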