Pandas can read two SAS file formats: SAS XPORT files (.xpt) and SAS data files (.sas7bdat).
The read_sas() function reads both formats. Here, a SAS data file is read and displayed as a pandas DataFrame:
import pandas as pd

df = pd.read_sas('sample.sas7bdat')
df
This results in the following output:
The chunksize and iterator arguments allow the SAS file to be read in groups of a fixed number of rows. If the SAS data file that was used earlier is read with a chunksize of 10, its 51 records are divided into six groups (five groups of 10 rows and a final group of 1 row), as shown in the following code:
rdr = pd.read_sas('sample.sas7bdat', chunksize=10)
for chunk in rdr:
    print(chunk.shape)
Take a look at the following output:
However, pandas cannot write files in either SAS format.
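The chunk arithmetic described above can be checked with a small sketch. The helper names expected_chunk_sizes and sas_chunk_sizes are hypothetical and not part of pandas, and the with form of read_sas() assumes pandas 1.2 or later, where the reader can be used as a context manager. The file read is skipped unless sample.sas7bdat exists in the working directory.

```python
import os

import pandas as pd


def expected_chunk_sizes(n_rows, chunksize):
    """Row counts of the chunks produced when n_rows are read chunksize rows at a time."""
    full, remainder = divmod(n_rows, chunksize)
    return [chunksize] * full + ([remainder] if remainder else [])


def sas_chunk_sizes(path, chunksize):
    """Read a SAS file in chunks and return the row count of each chunk."""
    # Assumption: pandas >= 1.2, where the SAS reader supports "with".
    with pd.read_sas(path, chunksize=chunksize) as reader:
        return [len(chunk) for chunk in reader]


# 51 rows read 10 at a time: five full chunks and one chunk with a single row.
print(expected_chunk_sizes(51, 10))  # [10, 10, 10, 10, 10, 1]

# Only runs if the sample file from the text is available locally.
if os.path.exists("sample.sas7bdat"):
    print(sas_chunk_sizes("sample.sas7bdat", 10))
```

Using the reader inside a with block closes the underlying file handle as soon as the loop finishes, which is useful when many large files are processed in sequence.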
Pandas also supports reading and writing Stata files. Stata supports only a limited set of data types: int8, int16, int32, float32, float64, and strings of 244 or fewer characters. When a Stata data file is written through pandas, columns of other types are converted to a supported type where possible.
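The type conversion on writing can be observed with a short round trip. This is a minimal sketch: the column names and the roundtrip.dta file name are made up for illustration, and the exact dtype that comes back may depend on the pandas version (an int64 column whose values fit is typically stored as a smaller integer type).

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative data: int64 is not a Stata type, float64 is.
df = pd.DataFrame({
    "count": np.array([1, 2, 3], dtype="int64"),
    "score": np.array([0.5, 1.5, 2.5], dtype="float64"),
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "roundtrip.dta")
    # write_index=False keeps the default RangeIndex out of the file.
    df.to_stata(path, write_index=False)
    back = pd.read_stata(path)

# Compare the dtypes before writing and after reading back.
print(df.dtypes)
print(back.dtypes)
```

The values survive the round trip unchanged; only the storage type of the integer column is adjusted to one that Stata can represent.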
Let's read a Stata data file using pandas:
df = pd.read_stata('sample.dta')
df
Take a look at the following output:
The read_stata() function also accepts chunksize and iterator arguments to read data in smaller groups. The following arguments are also available to the Stata reader:
- convert_categoricals: Converts columns that carry Stata value labels into the categorical data type
- index_col: Identifies the column to be used as the index
- convert_missing: Specifies whether missing values are represented as NaN (False, the default) or as Stata missing value objects (True)
- columns: Columns to select from the dataset
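Some of the arguments listed above can be tried on a small file created on the fly. This is a sketch under stated assumptions: the id, height, and weight columns and the sample_demo.dta file name are invented for the example, and the data is generated only so that the code runs without an existing Stata file.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative data set with 51 rows, written to a temporary Stata file.
df = pd.DataFrame({
    "id": np.arange(1, 52, dtype="int32"),
    "height": np.linspace(150.0, 200.0, 51),
    "weight": np.linspace(50.0, 100.0, 51),
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_demo.dta")
    df.to_stata(path, write_index=False)

    # columns selects a subset; index_col must be one of the selected columns.
    subset = pd.read_stata(path, columns=["id", "height"], index_col="id")

    # chunksize returns a reader that yields DataFrames of up to 10 rows.
    with pd.read_stata(path, chunksize=10) as reader:
        chunk_rows = [len(chunk) for chunk in reader]

print(subset.head())
print(chunk_rows)  # [10, 10, 10, 10, 10, 1]
```

Selecting columns at read time avoids loading variables that are never used, which matters most for wide survey files.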