Merging the HVBP tables

Let's try joining two of the tables:

df_master = dfs[files_of_interest[0]].merge(
dfs[files_of_interest[1]],
on='Provider Number',
how='left',

    copy=False
)

print(df_master.shape)

The output is as follows:

(2808, 43)

Our merge appears to have worked since the number of columns in df_master is the sum of the columns of first two DataFrames, minus one (the on column is not copied). Let's look at the columns of the new DataFrame:

print(df_master.columns)

The output is as follows:

Index(['Provider Number', 'Hospital Name_x', 'Address_x', 'City_x', 'State_x',
       'Zip Code', 'County Name_x',
       'Unweighted Normalized Clinical Care Domain Score',
       'Weighted Normalized Clinical Care Domain Score',
       'Unweighted Patient and Caregiver Centered Experience of Care/Care Coordination Domain Score',
       'Weighted Patient and Caregiver Centered Experience of Care/Care Coordination Domain Score',
       'Unweighted Normalized Safety Domain Score',
       'Weighted Safety Domain Score',
       'Unweighted Normalized Efficiency and Cost Reduction Domain Score',
       'Weighted Efficiency and Cost Reduction Domain Score',
       'Total Performance Score', 'Hospital Name_y', 'Address_y', 'City_y',
       'State_y', 'ZIP Code', 'County Name_y',
       'MORT-30-AMI Achievement Threshold', 'MORT-30-AMI Benchmark',
       'MORT-30-AMI Baseline Rate', 'MORT-30-AMI Performance Rate',
       'MORT-30-AMI Achievement Points', 'MORT-30-AMI Improvement Points',
       'MORT-30-AMI Measure Score', 'MORT-30-HF Achievement Threshold',
       'MORT-30-HF Benchmark', 'MORT-30-HF Baseline Rate',
       'MORT-30-HF Performance Rate', 'MORT-30-HF Achievement Points',
       'MORT-30-HF Improvement Points', 'MORT-30-HF Measure Score',
       'MORT-30-PN Achievement Threshold', 'MORT-30-PN Benchmark',
       'MORT-30-PN Baseline Rate', 'MORT-30-PN Performance Rate',
       'MORT-30-PN Achievement Points', 'MORT-30-PN Improvement Points',
       'MORT-30-PN Measure Score'],
      dtype='object')

The duplicate columns (Hospital Name, Address, City, and so on) had suffixes _x and _y added to their names in the joined table, to indicate which table they are from, confirming that the merge worked.

Let's use a for loop to join the three remaining tables to df_master:

for df in dfs.values():
df.columns = [col if col not in ['Provider_Number'] else 'Provider Number'
for col in df.columns]

for num in [2,3,4]:
df_master = df_master.merge(
dfs[files_of_interest[num]],
on='Provider Number',
how='left',
copy=False
)

print(df_master.shape)

The output is as follows:

(2808, 191)

In this cell, first we use a loop to rename all columns from Provider_Number to Provider Number, so we can join the tables cleanly.

Then we use a loop to join each remaining table to df_master. The number of columns in the resulting table equals the sum of the columns of the tables, minus four.

To confirm that the merge worked, we can print the columns of our new table:

for column in df_master.columns:
print(column)

Scrolling through the output confirms that all of the columns from the five tables are present.

We leave it to you to perform additional analyses on the dataset, using the code examples from the Comparing dialysis facilities section.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset