Importing data into pandas from a flat file

Because healthcare data often comes in flat file formats, such as .csv or .fwf, it is important to know the read_csv() and read_fwf() functions, which import data into pandas from these two formats, respectively. Both functions take the location of the flat file (typically its full path) as their one mandatory argument, along with more than a dozen optional arguments that specify, among other things, the data types of the columns, the header rows, and the columns to include in the DataFrame (a full listing of the function arguments is available online). It is often easiest to import all the columns as string types and convert them to other data types later on. In the following example, a DataFrame called pt_data is created by using the read_csv() function to read in a flat .csv file that contains one header row (row #0):

pt_data = pd.read_csv(data_full_path, header=0, dtype='str')
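The import-as-strings-then-convert approach described above can be sketched as follows. This is a minimal illustration rather than a prescribed workflow: the column names (pt_id, age, admit_date) and the sample rows are hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = "pt_id,age,admit_date\n001,54,2016-01-03\n002,67,2016-02-11\n"

# Read every column as a string so that values such as '001'
# keep their leading zeros.
pt_data = pd.read_csv(io.StringIO(csv_text), header=0, dtype=str)

# Convert selected columns to more specific types afterwards.
pt_data["age"] = pd.to_numeric(pt_data["age"])
pt_data["admit_date"] = pd.to_datetime(pt_data["admit_date"])
```

Reading identifiers as strings first is one way to avoid pandas silently turning a code such as 001 into the integer 1.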

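Because fixed-width files have no explicit character separator, the read_fwf() function is usually given an additional argument, widths, which is a list of integers specifying the width of each column. The length of widths should match the number of columns in the file. As an alternative, the colspecs argument takes a list of tuples specifying where each column starts and ends, expressed as half-open intervals (the start position is included, the end position is not). If neither argument is supplied, pandas attempts to infer the column boundaries from the first rows of the file, so stating them explicitly is the safer choice: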

pt_data = pd.read_fwf(source, widths=data_widths, header=None, dtype='str')
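To show how widths and colspecs relate, here is a small sketch that reads the same fixed-width records both ways. The record layout (a 3-character ID, a 2-character age, and a 10-character date) and the variable names are assumptions made for illustration, and an in-memory buffer is used in place of a real file.

```python
import io

import pandas as pd

# Hypothetical fixed-width records: 3-char ID, 2-char age, 10-char date.
fwf_text = "001542016-01-03\n002672016-02-11\n"

# Option 1: give the width of each consecutive column.
data_widths = [3, 2, 10]
by_widths = pd.read_fwf(
    io.StringIO(fwf_text), widths=data_widths, header=None, dtype=str
)

# Option 2: give half-open (start, end) character positions per column.
data_colspecs = [(0, 3), (3, 5), (5, 15)]
by_colspecs = pd.read_fwf(
    io.StringIO(fwf_text), colspecs=data_colspecs, header=None, dtype=str
)
```

For contiguous columns the two forms describe the same layout; colspecs is the more flexible of the two when some character ranges in each record should be skipped.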