Query Set #4d – aggregating cardiac diagnoses using COUNT

In Query Sets #4b and #4c, we binned and then aggregated three diagnostic codes using the + operator on the column names individually. However, we may be interested in binning and aggregating dozens, hundreds, or even thousands of diagnostic codes. The method of Query Sets #4b and #4c quickly becomes impractical for large aggregations.

Here, we use the COUNT function and a supplemental table to aggregate the diagnostic codes listed in the table. We first use a CREATE TABLE statement to create a CARDIAC_DX table. The format of this CREATE TABLE statement is a bit different than that of Query Set #1. In that example, we simply created a table by copying columns from an existing table. Here, we create the table from scratch by including parentheses and the column name, variable type, and NOT NULL statement enclosed in parentheses. If there were more than one column, they would be separated by commas within the parentheses.

After creating the table, we insert our three diagnostic codes into it using an INSERT statement: I50.9, I10, and I20.9. Then we add a column to our MORT_FINAL_2 table called Num_cardiac_dx_v2.

The final query updates the Num_cardiac_dx_v2 column by adding the number of codes from the table that are present in the Pri_dx_icd or Sec_dx_icd column. It accomplishes that by using a SELECT-FROM-WHERE block for each column, inside the original UPDATE statement. Therefore, this type of query is called a nested query. Within each SELECT block, the COUNT(*) statement simply returns the number of rows of the resulting query as an integer. So for example, in Visit #10001, there is a cardiac code in the Pri_dx_icd column and there is also one match in the Sec_dx_icd column. The first SELECT block would return a value of 1, since the query without COUNT would have returned a table with 1 row. By wrapping COUNT around *, 1 is returned as an integer. The second SELECT block also detects a match and returns a value of 1. The + operator makes 2 the final result. By comparing the Num_cardiac_dx and Num_cardiac_dx_2 columns, we see the result is exactly the same. So, which method is better? For small, simple aggregations, the first method is easier, because one simply has to make a column for each code and then aggregate them in a single statement with the + operator. However, in practice, you may wish to edit which codes are aggregated together to create features quite frequently. In this case, the second method is easier:

sqlite> CREATE TABLE CARDIAC_DX(
Dx_icd TEXT NOT NULL);

sqlite> INSERT INTO CARDIAC_DX (Dx_icd)
VALUES ('I50.9'),('I10'),('I20.9');

sqlite> ALTER TABLE MORT_FINAL_2 ADD COLUMN Num_cardiac_dx_v2 INTEGER;

sqlite> UPDATE MORT_FINAL_2 SET Num_cardiac_dx_v2 =
(SELECT COUNT(*)
FROM CARDIAC_DX AS C
WHERE MORT_FINAL_2.Pri_dx_icd = C.Dx_icd) +
(SELECT COUNT(*)
FROM CARDIAC_DX AS C
WHERE MORT_FINAL_2.Sec_dx_icd = C.Dx_icd);
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset