Chapter 3: Implementing the CDISC SDTM with Base SAS

This chapter provides an illustrated example of how you can implement the CDISC SDTM using Base SAS programming and the SDTM metadata from the previous chapter. Various SAS macros are used to help you easily create SDTM datasets. Several domain conversion program examples are presented.

More Information

Appendix A - Source Data Programs

Appendix B - SDTM Metadata

Base SAS Macros and Tools for SDTM Conversions

Many of the tasks involved in transforming the clinical source data into the CDISC SDTM are repetitive in nature. The SAS Macro Language is an excellent tool for automating these repetitive tasks. This section of the chapter provides several SAS macros and a SAS format library program that will be used in the individual SDTM transformations in the next section. The SAS code detailed in this section relies heavily on the metadata files defined in the previous chapter. If you have other repetitive SDTM tasks, you can write additional SAS macros beyond the ones provided here.

A basic flow of the SDTM creation process would follow the Extract-Transform-Load (ETL) paradigm of data warehousing. You first need to get your data, and then manipulate it to meet your needs. Finally, you store the data where you need it. The following table summarizes the SAS macros that help with the data transformation process, in rough chronological order.

Table 3.1: Base SAS Programs for SDTM Conversions

SAS Program or Macro	Purpose
make_codelist_formats.sas	This program takes the controlled terminology and codelist information from the SDTM metadata in Chapter 2 and creates a permanent SAS format library from it. That format library is used later in the data transformation process, when source data values are mapped to target SDTM data values with the PUT function in a DATA step.
make_empty_dataset.sas	This macro creates an empty dataset shell based on the SDTM metadata from Chapter 2, which defines the variables that belong in a given SDTM domain; creating this shell is the beginning of the transformation step. The macro also creates a macro variable that can be used at the load step to keep only the variables that the metadata requires.
make_dtc_date.sas	This macro creates an ISO8601 date string from SAS date or datetime component variables. It is used during the DATA step transformation process.
make_sdtm_dy.sas	This macro creates an SDTM study day (--DY) variable when given two SDTM --DTC dates. It is used during the DATA step transformation process.
make_sort_order.sas	This macro creates a macro variable that can be used in the sort process for the final data load. The variable is based on the required sort order as specified in the metadata definition from Chapter 2. It is generally useful at the data loading stage after the data transformation has taken place.

Creating an SDTM Codelist SAS Format Catalog

Controlled terminology is a critical component of the SDTM. SDTM controlled terminology is used within the SDTM datasets, and it is also included in the define.xml file. In this chapter, we need that controlled terminology to be applied to the source datasets in order to map the data records properly to the SDTM records. The codelist metadata described in the previous chapter is used here to create a permanent SAS format library for the SDTM conversions that follow. Here is the SAS program that creates the SAS format library for you.

*---------------------------------------------------------------*;
* make_codelist_formats.sas creates a permanent SAS format library
* stored to the libref LIBRARY from the CODELISTS tab of the
* SDTM_METADATA.xls metadata file. The permanent format library
* that is created contains formats that are named like this:
*   CODELISTNAME_SOURCEDATASET_SOURCEVARIABLE
* where CODELISTNAME is the name of the SDTM codelist,
* SOURCEDATASET is the name of the source SAS dataset and
* SOURCEVARIABLE is the name of the source SAS variable.
*---------------------------------------------------------------*;
proc import
    datafile="SDTM_METADATA.xls"
    out=formatdata
    dbms=excelcs
    replace;
    sheet="CODELISTS";
run;

** make a proc format control dataset out of the SDTM metadata;
data source.formatdata;
    set formatdata(drop=type);
    where sourcedataset ne "" and sourcevalue ne "";

    keep fmtname start end label type;
    length fmtname $ 32 start end $ 16 label $ 200 type $ 1;

    fmtname = compress(codelistname || "_" || sourcedataset
              || "_" || sourcevariable);
    start = left(sourcevalue);
    end = left(sourcevalue);
    label = left(codedvalue);

    if upcase(sourcetype) = "NUMBER" then
        type = "N";
    else if upcase(sourcetype) = "CHARACTER" then
        type = "C";
run;

** create a SAS format library to be used in SDTM conversions;
proc format
    library=library
    cntlin=source.formatdata
    fmtlib;
run;

This program assumes that you have defined a SAS libref called library to store these permanent SAS formats. The name library is used because SAS searches the WORK and LIBRARY librefs for formats by default, so you do not have to specify FMTSEARCH explicitly in an OPTIONS statement.

After you run the program, if you look at the resulting SAS format entry from the source TRT variable in source dataset DEMOGRAPHIC, you will see this:

FORMAT NAME: ARM_DEMOGRAPHIC_TRT LENGTH: 19 MIN LENGTH: 1 MAX LENGTH: 40 DEFAULT LENGTH: 19 FUZZ: STD
START    END      LABEL (VER. 9.4 22 NOV 2015:07:11:36)
0        0        Placebo
1        1        Analgezia HCL 30 mg

This gives us the ARM_DEMOGRAPHIC_TRT format that can be used to map the DEMOGRAPHIC TRT variable into the ARM variable in the SDTM DM dataset.
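To see the CNTLIN mechanism on its own, the following minimal sketch builds a one-format control dataset by hand, using the two ARM_DEMOGRAPHIC_TRT values shown above as illustrative rows, and then applies the format with the PUT function. It does not read the Chapter 2 metadata: the control rows are typed in directly, and the format is stored in the WORK library rather than LIBRARY so that the sketch runs in any session. The dataset names WORK.CTL and WORK.ARM_CHECK are hypothetical.

```sas
** hypothetical control dataset: one row per source value;
data work.ctl;
    length fmtname $ 32 start end $ 16 label $ 200 type $ 1;
    fmtname = "ARM_DEMOGRAPHIC_TRT";
    type    = "N";   ** numeric source variable, so a numeric format;
    start = "0"; end = "0"; label = "Placebo";             output;
    start = "1"; end = "1"; label = "Analgezia HCL 30 mg"; output;
run;

** build the format in the WORK catalog from the control dataset;
proc format library=work cntlin=work.ctl;
run;

** map a numeric source value to the SDTM ARM text;
data work.arm_check;
    trt = 1;
    length arm $ 40;
    arm = put(trt, arm_demographic_trt.);
run;
```

Because WORK and LIBRARY are both searched for formats by default, the same PUT call works unchanged whether the format lives in WORK, as here, or in the permanent LIBRARY catalog created by make_codelist_formats.sas.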

Creating an Empty SDTM Domain Dataset

Because we went to great lengths to define the SDTM domain-level metadata, it makes sense that we leverage that in our program code. The following SAS macro reads that domain-level metadata spreadsheet and creates an empty SAS dataset that you can populate. Here is that SAS macro program and details about how it works.

*----------------------------------------------------------------*;
* make_empty_dataset.sas creates a zero record dataset based on a
* dataset metadata spreadsheet. The dataset created is called
* EMPTY_** where "**" is the name of the dataset. This macro also
* creates a global macro variable called **KEEPSTRING that holds
* the dataset variables desired and listed in the order they
* should appear. [The variable order is dictated by VARNUM in the
* metadata spreadsheet.]
* MACRO PARAMETERS:
* metadatafile = the MS Excel file containing VARIABLE_METADATA
* dataset = the dataset or domain name you want to extract
*-----------------------------------------------------------------------*;
%macro make_empty_dataset(metadatafile=,dataset=);
    %local i;

    proc import ❶
        datafile="&metadatafile"
        out=_temp
        dbms=excelcs
        replace;
        sheet="VARIABLE_METADATA";
    run;

    ** sort the dataset by expected specified variable order; ❷
    proc sort
        data=_temp;
        where domain = "&dataset";
        by varnum;
    run;

    ** create keepstring macro variable and load metadata
    ** information into macro variables; ❸
    %global &dataset.KEEPSTRING;
    data _null_;
        set _temp nobs=nobs end=eof;

        if _n_=1 then
            call symput("vars", compress(put(nobs,3.)));

        call symputx('var' || compress(put(_n_, 3.)), variable);
        call symputx('label' || compress(put(_n_, 3.)), label);
        call symputx('length' || compress(put(_n_, 3.)),
                     put(length, 3.));

        ** valid ODM types include TEXT, INTEGER, FLOAT, DATETIME,
        ** DATE, TIME and map to SAS numeric or character;
        if upcase(type) in ("INTEGER", "FLOAT") then
            call symputx('type' || compress(put(_n_, 3.)), "");
        else if upcase(type) in ("TEXT", "DATE", "DATETIME",
                                 "TIME") then
            call symputx('type' || compress(put(_n_, 3.)), "$");
        else
            put "ERR" "OR: not using a valid ODM type. " type=;

        ** create **KEEPSTRING macro variable; ❹
        length keepstring $ 32767;
        retain keepstring;
        keepstring = compress(keepstring) || "|" || left(variable);
        if eof then
            call symputx(upcase(compress("&dataset"||'KEEPSTRING')),
                         left(trim(translate(keepstring," ","|"))));
    run;

    ** create a 0-observation template data set used for assigning
    ** variable attributes to the actual data sets; ❺
    data EMPTY_&dataset;
        %do i=1 %to &vars;
            attrib &&var&i label="&&label&i"
                length=&&type&i.&&length&i...
            ;
            %if &&type&i=$ %then
                retain &&var&i '';
            %else
                retain &&var&i .;
            ; /* terminates the RETAIN statement generated above */
        %end;
        if 0;
    run;

%mend make_empty_dataset;

When the %make_empty_dataset macro is executed, the EMPTY_** dataset is created, and the **KEEPSTRING global macro variable is defined.

❶ The Microsoft Excel metadata file used here is the domain-level metadata file described in the previous chapter; the macro reads its VARIABLE_METADATA tab.

❷ The domain variables are sorted by the variable order specified in the VARNUM variable. This sort order is also used to order the variables in the **KEEPSTRING global macro variable.

❸ This DATA step loads the domain metadata that we need into VAR*, LABEL*, LENGTH*, and TYPE* macro variables for each variable in the domain, to be used in the next step.

❹ This section is responsible for defining the **KEEPSTRING global macro variable, which will be used in the actual domain creation code later.

❺ This DATA step defines the WORK.EMPTY_** dataset, which is the shell of the domain that we will populate later.
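The template technique behind EMPTY_** is easiest to see in a small standalone case. In this sketch, the dataset names are hypothetical and a single attribute stands in for the full metadata: a zero-observation template declares USUBJID as $25 with a label, and a source dataset carries a shorter USUBJID. Because the template is listed first in the SET statement, its length and label are the ones that the output dataset receives.

```sas
** a zero-observation template, built the same way as EMPTY_**;
data work.empty_demo;
    attrib usubjid label="Unique Subject Identifier" length=$25;
    retain usubjid '';
    if 0;   ** never true, so no observations are written;
run;

** hypothetical source data whose USUBJID is only 5 bytes long;
data work.src_demo;
    length usubjid $ 5;
    usubjid = "101";
run;

** listing the template first makes its attributes win;
data work.demo;
    set work.empty_demo work.src_demo;
run;
```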

Suppose you submit the macro like this for the DM domain:

%make_empty_dataset(metadatafile=SDTM_METADATA.xls,dataset=DM)

proc contents
    data=work.empty_dm;
run;

Running PROC CONTENTS on the resulting empty dataset in the WORK library produces the following output:

The CONTENTS Procedure

Data Set Name        WORK.EMPTY_DM          Observations          0
Member Type          DATA                   Variables             23
Engine               V9                     Indexes               0
Created              11/27/2015 10:42:47    Observation Length    392
Last Modified        11/27/2015 10:42:47    Deleted Observations  0
Protection                                  Compressed            NO
Data Set Type                               Sorted                NO
Label
Data Representation  WINDOWS_64
Encoding             wlatin1 Western (Windows)

Engine/Host Dependent Information

Data Set Page Size            65536
Number of Data Set Pages      1
First Data Page               1
Max Obs per Page              167
Obs in First Data Page        0
Number of Data Set Repairs    0
ExtendObsCounter              YES
Filename                      empty_dm.sas7bdat
Release Created               9.0401M2
Host Created                  X64_7PRO

Alphabetic List of Variables and Attributes

 #  Variable  Type  Len  Label
22  ACTARM    Char   40  Description of Actual Arm
21  ACTARMCD  Char    8  Actual Arm Code
15  AGE       Num     8  Age
16  AGEU      Char   10  Age Units
20  ARM       Char   40  Description of Planned Arm
19  ARMCD     Char    8  Planned Arm Code
14  BRTHDTC   Char   16  Date/Time of Birth
23  COUNTRY   Char    3  Country
 2  DOMAIN    Char    2  Domain Abbreviation
11  DTHDTC    Char   16  Date/Time of Death
12  DTHFL     Char    2  Subject Death Flag
18  RACE      Char   80  Race
 6  RFENDTC   Char   16  Subject Reference End Date/Time
 9  RFICDTC   Char   16  Date/Time of Informed Consent
10  RFPENDTC  Char   16  Date/Time of End of Participation
 5  RFSTDTC   Char   16  Subject Reference Start Date/Time
 8  RFXENDTC  Char   16  Date/Time of Last Study Treatment
 7  RFXSTDTC  Char   16  Date/Time of First Study Treatment
17  SEX       Char    2  Sex
13  SITEID    Char    7  Study Site Identifier
 1  STUDYID   Char   15  Study Identifier
 4  SUBJID    Char    7  Subject Identifier for the Study
 3  USUBJID   Char   25  Unique Subject Identifier

Creating an SDTM --DTC Date Variable

Dates and datetimes in the SDTM are all presented as ISO8601 character text strings. These date strings can be created from partial date information. Here is a SAS macro program that creates an SDTM --DTC date variable for you within a DATA step when given the numeric parts of a SAS date or datetime.

*---------------------------------------------------------------*;
* make_dtc_date.sas is a SAS macro that creates a SDTM --DTC date
* within a SAS datastep when provided the pieces of the date in
* separate SAS variables.
* NOTE: This macro must have SAS OPTIONS MISSING = ' ' set before
* it is called to handle missing date parts properly.
*
* MACRO PARAMETERS:
* dtcdate = SDTM --DTC date variable desired
* year = year variable
* month = month variable
* day = day variable
* hour = hour variable
* minute = minute variable
* second = second variable
*---------------------------------------------------------------*;
%macro make_dtc_date(dtcdate=, year=., month=., day=.,
                     hour=., minute=., second=.);

    ** in a series of if-then-else statements, determine where the
    ** smallest unit of date and time is present and then construct a DTC
    ** date based on the non-missing date variables.;
    if (&second ne .) then
        &dtcdate = put(&year,z4.) || "-" || put(&month,z2.) || "-"
                   || put(&day,z2.) || "T" || put(&hour,z2.) || ":"
                   || put(&minute,z2.) || ":" || put(&second,z2.);
    else if (&minute ne .) then
        &dtcdate = put(&year,z4.) || "-" || put(&month,z2.) || "-"
                   || put(&day,z2.) || "T" || put(&hour,z2.) || ":"
                   || put(&minute,z2.);
    else if (&hour ne .) then
        &dtcdate = put(&year,z4.) || "-" || put(&month,z2.) || "-"
                   || put(&day,z2.) || "T" || put(&hour,z2.);
    else if (&day ne .) then
        &dtcdate = put(&year,z4.) || "-" || put(&month,z2.) || "-"
                   || put(&day,z2.);
    else if (&month ne .) then
        &dtcdate = put(&year,z4.) || "-" || put(&month,z2.);
    else if (&year ne .) then
        &dtcdate = put(&year,z4.);
    else if (&year = .) then
        &dtcdate = "";

    ** remove duplicate blanks and replace space with a dash;
    &dtcdate = translate(trim(compbl(&dtcdate)),'-',' ');

%mend make_dtc_date;

A sample call of this SAS macro for the EX domain might look like this:

options missing = ' ';

data ex;
    %make_dtc_date(dtcdate=exstdtc, year=startyy,
                   month=startmm, day=startdd)
    %make_dtc_date(dtcdate=exendtc, year=endyy,
                   month=endmm, day=enddd,
                   hour=endhh, minute=endmi, second=endss)
run;

These calls to %make_dtc_date would create the variable EXSTDTC in the resulting EX dataset in the YYYY-MM-DD format and EXENDTC in the YYYY-MM-DDTHH:MM:SS format.
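The MISSING=' ' requirement matters because a missing date part is then formatted as blanks, which the final COMPBL and TRANSLATE step turns into a dash. The following standalone sketch does not call the macro; it repeats the macro's day-level expression for a hypothetical record whose month is unknown, so you can see the dash-filled partial date that results.

```sas
options missing = ' ';

data work.partial;
    year  = 2015;
    month = .;   /* unknown month */
    day   = 15;
    /* same expression as the day-level branch of the make_dtc_date macro */
    exstdtc = put(year,z4.) || "-" || put(month,z2.) || "-"
              || put(day,z2.);
    /* the missing month is formatted as two blanks; COMPBL keeps one */
    /* blank and TRANSLATE turns that blank into a dash              */
    exstdtc = translate(trim(compbl(exstdtc)),'-',' ');
run;
```

The resulting value is 2015---15, with the dash standing in for the unknown month.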

Creating an SDTM Study Day Variable

Throughout the SDTM, you will find that you need to create “study day” or SDTM --DY variables. Because the mechanics of doing this are the same everywhere, the task lends itself to using standardized SAS macro code. Here is the SAS macro program that creates an SDTM --DY variable for you.

*---------------------------------------------------------------*;
* make_sdtm_dy.sas is a SAS macro that takes two SDTM --DTC dates
* and calculates a SDTM study day (--DY) variable. It must be used
* in a datastep that has both the REFDATE and DATE variables
* specified in the macro parameters below.
* MACRO PARAMETERS:
* refdate = --DTC baseline date to calculate the --DY from.
*           This should be DM.RFSTDTC for SDTM --DY variables.
* date = --DTC date to calculate the --DY to. The variable
*        associated with the --DY variable.
*---------------------------------------------------------------*;
%macro make_sdtm_dy(refdate=RFSTDTC,date=);

    ** the --DY variable name is the --DTC name with DTC replaced
    ** by DY (for example, LBDTC becomes LBDY). Only complete
    ** dates (at least 10 characters) get a study day;
    if length(&date) >= 10 and length(&refdate) >= 10 then
        do;
            if input(substr(&date,1,10),yymmdd10.) >=
               input(substr(&refdate,1,10),yymmdd10.) then
                %upcase(%substr(&date,1,%length(&date)-3))DY =
                    input(substr(&date,1,10),yymmdd10.) -
                    input(substr(&refdate,1,10),yymmdd10.) + 1;
            else
                %upcase(%substr(&date,1,%length(&date)-3))DY =
                    input(substr(&date,1,10),yymmdd10.) -
                    input(substr(&refdate,1,10),yymmdd10.);
        end;

%mend make_sdtm_dy;

A sample call of this SAS macro for the LB domain might look like this:

data lb;
    merge lb(in=inlb) target.dm(keep=usubjid rfstdtc);
    by usubjid;
    if inlb;
    %make_sdtm_dy(date=lbdtc)
run;

That macro call to %make_sdtm_dy would create the variable LBDY in the resulting LB dataset.
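The study day rule has no day 0: a date on or after RFSTDTC gets the difference plus 1, and an earlier date gets the plain (negative) difference. The following standalone sketch repeats that arithmetic outside the macro for three hypothetical lab dates around an RFSTDTC of 2015-01-10, which yields study days 3, 1, and -2.

```sas
data work.dy_demo;
    length rfstdtc lbdtc $ 10;
    rfstdtc = "2015-01-10";   ** hypothetical reference start date;
    input lbdtc $;
    ref = input(rfstdtc, yymmdd10.);
    dt  = input(lbdtc, yymmdd10.);
    ** no study day 0: on or after the reference date, add 1;
    if dt >= ref then lbdy = dt - ref + 1;
    else lbdy = dt - ref;
    drop ref dt;
    datalines;
2015-01-12
2015-01-10
2015-01-08
;
run;
```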

Sorting the Final SDTM Domain Dataset

The Table of Contents section of the define.xml file includes a field that defines how a domain is sorted. The following SAS macro takes the metadata for that sort order and creates a global SAS macro variable called **SORTSTRING, where ** is the domain of interest. Keep in mind that this sort sequence is also likely to be what defines the --SEQ variable in the SDTM dataset. Here is that SAS macro program and details about how it works.

*----------------------------------------------------------------*;
* make_sort_order.sas creates a global macro variable called
* **SORTSTRING, where ** is the name of the dataset, that holds
* the sort order specified by the KEYSEQUENCE metadata for that
* dataset.
* MACRO PARAMETERS:
* metadatafile = the file containing the dataset metadata
* dataset = the dataset or domain name
*----------------------------------------------------------------*;
%macro make_sort_order(metadatafile=,dataset=);

    proc import ❶
        datafile="&metadatafile"
        out=_temp
        dbms=excelcs
        replace;
        sheet="VARIABLE_METADATA";
    run;

    proc sort
        data=_temp;
        where keysequence ne . and domain="&dataset";
        by keysequence;
    run;

    ** create **SORTSTRING macro variable; ❷
    %global &dataset.SORTSTRING;
    data _null_;
        set _temp end=eof;
        length domainkeys $ 200;
        retain domainkeys '';
        ** VARIABLE is character, so append it directly;
        domainkeys = trim(domainkeys) || ' ' || trim(variable);
        if eof then
            call symputx(compress("&dataset" || "SORTSTRING"), domainkeys);
    run;

%mend make_sort_order;

When the %make_sort_order macro is executed, the **SORTSTRING global macro variable is created.

❶ The Excel metadata here is the VARIABLE_METADATA metadata tab described in the previous chapter.

❷ The **SORTSTRING variable is created in this step, which is used in the final dataset sorting when the actual domain is created. This is based on the order specified in the KEYSEQUENCE column in the VARIABLE_METADATA tab of the metadata spreadsheet.
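If you prefer, the same space-separated list can be built with the INTO clause of PROC SQL. The following sketch is an alternative to the DATA step inside %make_sort_order rather than the code used in this chapter, and it uses a small hypothetical stand-in for the VARIABLE_METADATA tab so that it runs on its own.

```sas
** hypothetical extract of the VARIABLE_METADATA tab;
data work.varmeta;
    length domain $ 8 variable $ 8;
    input domain $ variable $ keysequence;
    datalines;
DM STUDYID 1
DM USUBJID 2
DM AGE .
LB STUDYID 1
;
run;

** concatenate the key variables in KEYSEQUENCE order;
proc sql noprint;
    select variable
        into :DMSORTSTRING separated by ' '
        from work.varmeta
        where domain = "DM" and not missing(keysequence)
        order by keysequence;
quit;

%put DMSORTSTRING=&DMSORTSTRING;
```

SEPARATED BY inserts exactly one blank between values, so no TRIM or COMPRESS logic is needed.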

Building SDTM Datasets

Now that you have your metadata store and your SAS macro library at hand, you can build the SDTM domains in Base SAS. In this section, we look at creating six different types of SDTM data. First, we create the special-purpose DM domain, because its reference start date is needed to create the study day (--DY) variables in the other domain datasets. Then we create supplemental qualifier, findings, events, and interventions datasets, and finally some trial design model datasets.

Building the Special-Purpose DM and SUPPDM Domains

The following SAS code builds the DM and SUPPDM datasets from our source clinical trial data found in Appendix A, “Source Data Programs.” DM and SUPPDM represent the demographics domain in the SDTM. The following program assumes that the source datasets can be found under the libref source and that the permanent SAS formats can be found under the libref library.

*---------------------------------------------------------------*;
* DM.sas creates the SDTM DM and SUPPDM datasets and saves them
* as permanent SAS datasets to the target libref.
*---------------------------------------------------------------*;

**** CREATE EMPTY DM DATASET CALLED EMPTY_DM; ❶
%make_empty_dataset(metadatafile=SDTM_METADATA.xls,dataset=DM)

**** GET FIRST AND LAST DOSE DATE FOR RFSTDTC AND RFENDTC; ❷
proc sort
    data=source.dosing(keep=subject startdt enddt)
    out=dosing;
    by subject startdt;
run;

**** FIRSTDOSE=FIRST DOSING AND LASTDOSE=LAST DOSING;
data dosing;
    set dosing;
    by subject;

    retain firstdose lastdose;

    if first.subject then
        do;
            firstdose = .;
            lastdose = .;
        end;

    firstdose = min(firstdose,startdt,enddt);
    lastdose = max(lastdose,startdt,enddt);

    if last.subject;
run;

**** GET DEMOGRAPHICS DATA;
proc sort
    data=source.demographic
    out=demographic;
    by subject;
run;

**** MERGE DEMOGRAPHICS AND FIRST DOSE DATE;
data demog_dose;
    merge demographic
          dosing;
    by subject;
run;

**** DERIVE THE MAJORITY OF SDTM DM VARIABLES; ❸
options missing = ' ';
data dm;
    set EMPTY_DM
        demog_dose(rename=(race=_race));

    studyid = 'XYZ123';
    domain = 'DM';
    usubjid = left(uniqueid); ❹
    subjid = put(subject,3.);
    rfstdtc = put(firstdose,yymmdd10.);
    rfendtc = put(lastdose,yymmdd10.);
    rfxstdtc = put(firstdose,yymmdd10.);
    rfxendtc = put(lastdose,yymmdd10.);
    rficdtc = put(icdate,yymmdd10.);
    rfpendtc = put(lastdoc,yymmdd10.);
    dthfl = 'N';
    siteid = substr(subjid,1,1) || "00";
    brthdtc = put(dob,yymmdd10.);
    age = floor ((intck('month',dob,firstdose) -
                  (day(firstdose) < day(dob))) / 12);
    if age ne . then
        ageu = 'YEARS';
    sex = put(gender,sex_demographic_gender.); ❺
    race = put(_race,race_demographic_race.);
    armcd = put(trt,armcd_demographic_trt.);
    arm = put(trt,arm_demographic_trt.);
    actarmcd = put(trt,armcd_demographic_trt.);
    actarm = put(trt,arm_demographic_trt.);
    country = "USA";
run;

**** DEFINE SUPPDM FOR OTHER RACE AND RANDOMIZATION DATE;
**** CREATE EMPTY SUPPDM DATASET CALLED EMPTY_SUPPDM; ❻
%make_empty_dataset(metadatafile=SDTM_METADATA.xls,dataset=SUPPDM)

data suppdm;
    set EMPTY_SUPPDM
        dm;

    keep &SUPPDMKEEPSTRING; ❼

    **** OUTPUT OTHER RACE AS A SUPPDM VALUE; ❽
    if orace ne '' then
        do;
            rdomain = 'DM';
            qnam = 'RACEOTH';
            qlabel = 'Race, Other';
            qval = left(orace);
            qorig = 'CRF Page 1';
            output;
        end;

    **** OUTPUT RANDOMIZATION DATE AS SUPPDM VALUE;
    if randdt ne . then
        do;
            rdomain = 'DM';
            qnam = 'RANDDTC';
            qlabel = 'Randomization Date';
            qval = left(put(randdt,yymmdd10.));
            qorig = 'CRF Page 1';
            output;
        end;
run;

**** SORT DM ACCORDING TO METADATA AND SAVE PERMANENT DATASET; ❾
%make_sort_order(metadatafile=SDTM_METADATA.xls,dataset=DM)

proc sort
    data=dm(keep = &DMKEEPSTRING)
    out=target.dm;
    by &DMSORTSTRING;
run;

**** SORT SUPPDM ACCORDING TO METADATA AND SAVE PERMANENT DATASET;
%make_sort_order(metadatafile=SDTM_METADATA.xls,dataset=SUPPDM)

proc sort
    data=suppdm
    out=target.suppdm;
    by &SUPPDMSORTSTRING;
run;

When the DM.sas program has been run, the DM and SUPPDM SDTM domain datasets are saved to the target libref.

❶ The first step in this program creates the empty SDTM DM dataset called EMPTY_DM, based on the domain metadata spreadsheet.

❷ The SDTM DM variables RFSTDTC and RFENDTC are defined here as the first day of study medication dosing and the last day of dosing, respectively. In this example, there are no dosing times, just dates. Keep in mind that the use of the variables RFSTDTC and RFENDTC as defined here, based on dosing dates, is just one way of doing this. See the “SDTM Implementation Guide” for additional details about how you can define RFSTDTC and RFENDTC.

❸ This is where the empty dataset EMPTY_DM, which carries the defining variable attributes, is set with the source data, and where the bulk of the SDTM DM variables are defined. Note the RENAME= data set option in the SET statement, which is required when a source variable has the same name as a target variable but conflicting variable attributes (here, the numeric source RACE and the character SDTM RACE). Also note that EMPTY_DM is set first so that its variable attributes are maintained for the DATA step regardless of the contents of demog_dose.

❹ In these examples, you see that there is a source variable called UNIQUEID in the source datasets, which is supposed to uniquely identify a patient within and across studies. In most clinical databases, a truly unique patient identifier UNIQUEID does not exist, and, in practice, many trials are submitted where USUBJID is set equal to SUBJID.

❺ For SEX, RACE, ARM, and ARMCD variables, the associated formats were created in the make_codelist_formats.sas program in the prior section.

❻ The %make_empty_dataset macro is called again. This time it defines the dataset structure for the SUPPDM dataset.

❼ The SUPPDMKEEPSTRING macro variable created by %make_empty_dataset lists only the SDTM variables that we want in SUPPDM.

❽ The supplemental qualifiers created here in SUPPDM are for other race and randomization date, and this chunk of code creates those qualifier records. IDVAR and IDVARVAL are not assigned here, so they are left null, which is appropriate for DM because DM has one record per subject. For other parent domains, you typically have to define IDVAR and IDVARVAL explicitly. IDVAR names the parent domain variable, and IDVARVAL holds the value of that variable that can be used to join or merge the supplemental qualifier records back onto the parent domain. Often, IDVAR=--SEQ and IDVARVAL is the sequence number, because --SEQ identifies a single record and is therefore the most exacting identifier that you can have in the SDTM. You should define IDVAR to be the highest-level identifier that you can, in order to be clear about the qualifying relationship to the parent domain.

❾ At this point in the program, we call the %make_sort_order macro twice to get the DMSORTSTRING and SUPPDMSORTSTRING macro variables defined. Then we use that to sort and save our final DM and SUPPDM domain datasets.
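The AGE assignment in DM.sas computes completed years between the birth date and the first dose date: it counts month boundaries with INTCK, subtracts one month when the first dose falls on an earlier day of the month than the birth day, and divides by 12 before truncating with FLOOR. The following sketch, using a hypothetical birth date, evaluates the same expression on the day before and on the day of the subject's 35th birthday.

```sas
data work.age_demo;
    dob = '15MAR1980'd;   ** hypothetical birth date;
    do firstdose = '14MAR2015'd, '15MAR2015'd;
        ** same expression as the AGE assignment in DM.sas;
        age = floor((intck('month',dob,firstdose) -
                     (day(firstdose) < day(dob))) / 12);
        output;
    end;
    format dob firstdose date9.;
run;
```

On 14MAR2015, INTCK returns 420 months, one month is subtracted because the 14th precedes the 15th, and 419/12 truncates to 34; on 15MAR2015, 420/12 gives exactly 35.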

Building the LB Findings Domain

The following SAS code builds the LB (laboratory) dataset from the source clinical trial data found in “Appendix A - Source Data Programs.” The following program assumes that the source datasets can be found under the libref source and that the permanent SAS formats can be found under the libref library.

*---------------------------------------------------------------*;
* LB.sas creates the SDTM LB dataset and saves it
* as a permanent SAS dataset to the target libref.
*---------------------------------------------------------------*;

**** CREATE EMPTY LB DATASET CALLED EMPTY_LB;
%make_empty_dataset(metadatafile=SDTM_METADATA.xls,dataset=LB) ❶

**** DERIVE THE MAJORITY OF SDTM LB VARIABLES;
options missing = ' ';
data lb;
    set EMPTY_LB
        source.labs;

    studyid = 'XYZ123';
    domain = 'LB';
    usubjid = left(uniqueid);
    lbcat = put(labcat,$lbcat_labs_labcat.); ❷
    lbtest = put(labtest,$lbtest_labs_labtest.);
    lbtestcd = put(labtest,$lbtestcd_labs_labtest.);
    lborres = left(put(nresult,best.));
    lborresu = left(colunits);
    lbornrlo = left(put(lownorm,best.));
    lbornrhi = left(put(highnorm,best.));

    **** create standardized results; ❸
    lbstresc = lborres;
    lbstresn = nresult;
    lbstresu = lborresu;
    lbstnrlo = lownorm;
    lbstnrhi = highnorm;

    **** urine glucose adjustment;
    if lbtest = 'Glucose' and lbcat = 'URINALYSIS' then
        do;
            lborres = left(put(nresult,uringluc_labs_labtest.));
            lbornrlo = left(put(lownorm,uringluc_labs_labtest.));
            lbornrhi = left(put(highnorm,uringluc_labs_labtest.));
            lbstresc = lborres;
            lbstresn = .;
            lbstnrlo = .;
            lbstnrhi = .;
        end;

    **** assign the reference range indicator;
    if lbtestcd = 'GLUC' and lbcat = 'URINALYSIS' and
       lborres = 'POSITIVE' then
        lbnrind = 'HIGH';
    else if lbtestcd = 'GLUC' and lbcat = 'URINALYSIS' and
            lborres = 'NEGATIVE' then
        lbnrind = 'NORMAL';
    else if lbstnrlo ne . and lbstresn ne . and
            round(lbstresn,.0000001) < round(lbstnrlo,.0000001) then
        lbnrind = 'LOW';
    else if lbstnrhi ne . and lbstresn ne . and
            round(lbstresn,.0000001) > round(lbstnrhi,.0000001) then
        lbnrind = 'HIGH';
    else if lbstnrhi ne . and lbstresn ne . then
        lbnrind = 'NORMAL';

    visitnum = month;
    visit = put(month,visit_labs_month.);

    if visit = 'Baseline' then ❹
        lbblfl = 'Y';
    else
        lbblfl = ' ';

    if visitnum < 0 then
        epoch = 'SCREENING';
    else
        epoch = 'TREATMENT';

    lbdtc = put(labdate,yymmdd10.);
run;

proc sort
    data=lb;
    by usubjid;
run;

**** CREATE SDTM STUDYDAY VARIABLES; ❺
data lb;
    merge lb(in=inlb) target.dm(keep=usubjid rfstdtc);
    by usubjid;

    if inlb;

    %make_sdtm_dy(date=lbdtc)
run;

**** CREATE SEQ VARIABLE;
proc sort
    data=lb;
    by studyid usubjid lbcat lbtestcd visitnum;
run;

data lb;
    retain &LBKEEPSTRING; ❻
    set lb(drop=lbseq);
    by studyid usubjid lbcat lbtestcd visitnum;

    if not (first.visitnum and last.visitnum) then
        put "WARN" "ING: key variables do not define a unique record. "
            usubjid=;

    retain lbseq 0;
    lbseq = lbseq + 1;

    label lbseq = "Sequence Number";
run;

**** SORT LB ACCORDING TO METADATA AND SAVE PERMANENT DATASET; ❼
%make_sort_order(metadatafile=SDTM_METADATA.xls,dataset=LB)

proc sort
    data=lb(keep = &LBKEEPSTRING)
    out=target.lb;
    by &LBSORTSTRING;
run;

When the LB.sas program has been run, the LB SDTM domain dataset is saved to the target libref.

❶ The first step in this program creates the empty SDTM LB dataset called EMPTY_LB based on the domain metadata spreadsheet. That is then set with the source labs dataset, and variable population begins.

❷ Note here that these are character SAS formats that are created by the make_codelist_formats.sas program and that hold the controlled terminology for LBCAT, LBTEST, and LBTESTCD.

❸ This is a very simplistic example in which the collected results are mapped directly to the standardized results. In practice, that is unlikely. Ideally you will be given source data, perhaps in the CDISC LAB format, that already provides both the collected and the standardized results. If not, mapping collected lab data into standardized results can be a labor-intensive task.

❹ LBBLFL is your baseline lab flag. Once again, the approach taken here is fairly simplistic in that records where VISIT='Baseline' are flagged as the baseline record. In practice, this might not be the case. For example, a given patient might not have a VISIT='Baseline' record, and you need to go to that patient’s screening visit to pull an appropriate baseline record for flagging. For reasons like this, some in the industry would argue that the flagging of baseline records is a task best left to the analysis datasets in ADaM.

❺ At this point, the LB file is mostly built, and now we want to create the LBDY variable. To create LBDY, we must compare the dates stored in variables RFSTDTC and LBDTC. The DM and draft LB datasets are merged, and the %make_sdtm_dy macro is called to define LBDY.

❻ This DATA step is designed to create LBSEQ, and it has some interesting features. First, we need to DROP the old LBSEQ variable and re-create it. LBSEQ already exists, with missing values, on the input dataset because it was defined in EMPTY_LB, and a variable read by the SET statement is overwritten with the incoming value on every iteration, which would defeat the retained counter. Because we drop and re-create LBSEQ, we need the RETAIN statement that lists &LBKEEPSTRING to ensure that the variables appear in LB from left to right in the expected order, and we need the LABEL statement to restore the label for LBSEQ. The warning message in the DATA step flags any record whose key variables do not identify it uniquely. Keep in mind that although --SEQ and KEYSEQUENCE (found in the VARIABLE_METADATA metadata tab) are probably analogous for any domain, they do not have to be. It is very possible that the KEYSEQUENCE list of variables is not granular enough to define --SEQ. Because each value of --SEQ must identify a unique record, it might require a more comprehensive list of variables than what is found in KEYSEQUENCE.

❼ The LB program ends with getting the prescribed sort order from the %make_sort_order macro and saving the final LB dataset. The final dataset is sorted and contains only the SDTM variables that we want.
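To see why ❻ drops LBSEQ before re-creating it, consider this standalone sketch with a hypothetical draft dataset in which SEQ already exists with missing values, as it does when a domain is built from its EMPTY_** template. It restarts the counter for each subject, as XP.sas does later in this chapter, and uses a sum statement, which retains SEQ implicitly.

```sas
** hypothetical draft domain: SEQ already exists but is missing;
data work.draft;
    length usubjid $ 8;
    input usubjid $ visitnum;
    seq = .;
    datalines;
A 1
A 2
B 1
;
run;

data work.numbered;
    set work.draft(drop=seq);   ** drop it so SET cannot overwrite it;
    by usubjid;
    if first.usubjid then seq = 0;
    seq + 1;                    ** sum statement: SEQ is retained;
run;
```

If you remove the DROP= option, SET overwrites SEQ with the missing input value on every iteration, and because the sum statement treats a missing value as 0, every record gets SEQ=1.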

Building a Custom XP Findings Domain

The following SAS code builds the XP dataset from our source clinical trial data found in “Appendix A - Source Data Programs.” Because there is no "headache pain" domain already defined in the SDTM, XP is a user-generated domain designed to capture the pain efficacy measurements for this study. For more information about how to build customized user-written domains, see Section 2.6 of the “CDISC SDTM Implementation Guide.” It is worth noting that there are new therapeutic area CDISC standards that extend the general SDTM into specific disease implementations, and there is in fact a draft pain assessment SDTM standard. However, that standard doesn’t specifically address pain intensity, and we wanted to illustrate the creation of a custom domain here, so the pain standard is not used here. The following program assumes that the source datasets can be found under the libref source and that the permanent SAS formats can be found under the libref library.

*---------------------------------------------------------------*;
* XP.sas creates the SDTM XP dataset and saves it
* as a permanent SAS dataset to the target libref.
*---------------------------------------------------------------*;

**** CREATE EMPTY XP DATASET CALLED EMPTY_XP;
%make_empty_dataset(metadatafile=SDTM_METADATA.xls,dataset=XP) ❶

proc format;
    value pain
        0='None'
        1='Mild'
        2='Moderate'
        3='Severe';
run;

**** DERIVE THE MAJORITY OF SDTM XP VARIABLES;
options missing = ' ';
data xp;
    set EMPTY_XP
        source.pain;
    studyid = 'XYZ123';
    domain = 'XP';
    usubjid = left(uniqueid);
    xptest = 'Pain Score'; ❷
    xptestcd = 'XPPAIN';
    epoch = 'TREATMENT';

    **** transpose pain data; ❸
    array dates {3} randomizedt month3dt month6dt;
    array scores {3} painbase pain3mo pain6mo;
    do i = 1 to 3;
        visitnum = i - 1;
        visit = put(visitnum,visit_labs_month.);
        if visit = 'Baseline' then
            xpblfl = 'Y';
        else
            xpblfl = ' ';
        if scores{i} ne . then
            do;
                xporres = left(put(scores{i},pain.));
                xpstresc = xporres;
                xpstresn = scores{i};
                xpdtc = put(dates{i},yymmdd10.);
                output;
            end;
    end; **** end of DO I = 1 TO 3 loop;
run;

proc sort
    data=xp;
    by usubjid;
run;

**** CREATE SDTM STUDYDAY VARIABLES; ❹
data xp;
    merge xp(in=inxp) target.dm(keep=usubjid rfstdtc);
    by usubjid;
    if inxp;
    %make_sdtm_dy(date=xpdtc)
run;

**** CREATE SEQ VARIABLE; ❺
proc sort
    data=xp;
    by studyid usubjid xptestcd visitnum;
run;

data xp;
    retain &XPKEEPSTRING;
    set xp(drop=xpseq);
    by studyid usubjid xptestcd visitnum;
    if not (first.visitnum and last.visitnum) then
        put "WARN" "ING: key variables do not define a unique record. "
            usubjid=;
    retain xpseq;
    if first.usubjid then
        xpseq = 1;
    else
        xpseq = xpseq + 1;
    label xpseq = "Sequence Number";
run;

**** SORT XP ACCORDING TO METADATA AND SAVE PERMANENT DATASET; ❻
%make_sort_order(metadatafile=SDTM_METADATA.xls,dataset=XP)

proc sort
    data=xp(keep = &XPKEEPSTRING)
    out=target.xp;
    by &XPSORTSTRING;
run;

When the XP.sas program has been run, the XP SDTM domain dataset is saved to the target libref.

❶ The first step in this program creates the empty SDTM XP dataset called EMPTY_XP based on the domain metadata spreadsheet. EMPTY_XP is then combined with the source PAIN dataset in the SET statement, and the XP variables are populated.

❷ Because Pain Score is the only test collected in this domain, XPTEST and XPTESTCD can be set with simple assignment statements.

❸ This section of the DATA step transposes the data so that one record is output per visit. Visits with a missing pain score produce no record.

❹ At this point, the XP dataset is mostly built. Now we want to create the XPDY variable, which requires comparing RFSTDTC with XPDTC. The DM and draft XP datasets are merged, and the %make_sdtm_dy macro is called to define XPDY.

❺ XPSEQ is created here in the same way that LBSEQ was created for the LB domain.

❻ The XP program ends with getting the prescribed sort order from the %make_sort_order macro and saving the final XP dataset. The final dataset is sorted and contains only the SDTM variables that we want.
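The transposition in ❸ turns one source row per subject into one XP record per visit, skips visits with no pain score, and applies the PAIN format to get XPORRES. The Python sketch below reproduces that logic for a single source row so that the expected output can be checked by hand. The visit labels other than 'Baseline' are placeholders, because the VISIT_LABS_MONTH format lives in the format library and is not shown here; the source variable names follow the SAS program, and transpose_pain is a hypothetical name.

```python
from datetime import date

# Same mapping as the PAIN format in XP.sas.
PAIN_FORMAT = {0: "None", 1: "Mild", 2: "Moderate", 3: "Severe"}
# Placeholder labels standing in for the VISIT_LABS_MONTH format (assumed).
VISIT_LABELS = {0: "Baseline", 1: "Month 3", 2: "Month 6"}


def transpose_pain(row):
    """Return one XP-style record per visit that has a non-missing score."""
    dates = [row["RANDOMIZEDT"], row["MONTH3DT"], row["MONTH6DT"]]
    scores = [row["PAINBASE"], row["PAIN3MO"], row["PAIN6MO"]]
    records = []
    for visitnum, (visit_date, score) in enumerate(zip(dates, scores)):
        if score is None:  # like: if scores{i} ne . then do; ... output; end;
            continue
        visit = VISIT_LABELS[visitnum]
        records.append({
            "USUBJID": row["UNIQUEID"].strip(),
            "XPTESTCD": "XPPAIN",
            "VISITNUM": visitnum,
            "VISIT": visit,
            "XPBLFL": "Y" if visit == "Baseline" else "",
            "XPORRES": PAIN_FORMAT[score],
            "XPSTRESC": PAIN_FORMAT[score],
            "XPSTRESN": score,
            # yymmdd10. writes YYYY-MM-DD; a missing date becomes blank.
            "XPDTC": visit_date.isoformat() if visit_date else "",
        })
    return records
```

Feeding in a row with a baseline score of 3, a month 3 score of 1, and no month 6 score yields two records, the first flagged as baseline with XPORRES="Severe".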

Building the AE Events Domain

The following SAS code builds the AE adverse events domain dataset from our source clinical trial data found in “Appendix A - Source Data Programs.” The following program assumes that the source datasets can be found under the libref source and that the permanent SAS formats can be found under the libref library.

*---------------------------------------------------------------*;
* AE.sas creates the SDTM AE dataset and saves it
* as a permanent SAS dataset to the target libref.
*---------------------------------------------------------------*;

**** CREATE EMPTY AE DATASET CALLED EMPTY_AE; ❶
%make_empty_dataset(metadatafile=SDTM_METADATA.xls,dataset=AE)

**** DERIVE THE MAJORITY OF SDTM AE VARIABLES;
options missing = ' ';
data ae;
    set EMPTY_AE
        source.adverse(rename=(aerel=_aerel aesev=_aesev));
    studyid = 'XYZ123';
    domain = 'AE';
    usubjid = left(uniqueid);
    aeterm = left(aetext);
    aedecod = left(prefterm);
    aeptcd = left(ptcode);
    aellt = left(llterm);
    aelltcd = left(lltcode);
    aehlt = left(hlterm);
    aehltcd = left(hltcode);
    aehlgt = left(hlgterm);
    aehlgtcd = left(hlgtcod);
    aesoc = left(bodysys);
    aebodsys = aesoc;
    aebdsycd = left(soccode);
    aesoccd = aebdsycd;
    aesev = put(_aesev,aesev_adverse_aesev.); ❷
    aeacn = put(aeaction,acn_adverse_aeaction.);
    aerel = put(_aerel,aerel_adverse_aerel.);
    aeser = put(serious,$ny_adverse_serious.);
    aestdtc = put(aestart,yymmdd10.);
    aeendtc = put(aeend,yymmdd10.);
    epoch = 'TREATMENT';
    if aeser = 'Y' then
        aeslife = 'Y';
run;

proc sort
    data=ae;
    by usubjid;
run;

**** CREATE SDTM STUDYDAY VARIABLES; ❸
data ae;
    merge ae(in=inae) target.dm(keep=usubjid rfstdtc);
    by usubjid;
    if inae;
    %make_sdtm_dy(date=aestdtc);
    %make_sdtm_dy(date=aeendtc);
run;

**** CREATE SEQ VARIABLE; ❹
proc sort
    data=ae;
    by studyid usubjid aedecod aestdtc aeendtc;
run;

data ae;
    retain &AEKEEPSTRING;
    set ae(drop=aeseq);
    by studyid usubjid aedecod aestdtc aeendtc;
    if not (first.aeendtc and last.aeendtc) then
        put "WARN" "ING: key variables do not define a unique record. "
            usubjid=;
    **** restart numbering for each subject, as in the other domains;
    retain aeseq;
    if first.usubjid then
        aeseq = 1;
    else
        aeseq = aeseq + 1;
    label aeseq = "Sequence Number";
run;

**** SORT AE ACCORDING TO METADATA AND SAVE PERMANENT DATASET; ❺
%make_sort_order(metadatafile=SDTM_METADATA.xls,dataset=AE)

proc sort
    data=ae(keep = &AEKEEPSTRING)
    out=target.ae;
    by &AESORTSTRING;
run;

When the AE.sas program has been run, the AE SDTM domain dataset is saved to the target libref.

❶ The first step in this program creates the empty SDTM AE dataset called EMPTY_AE based on the domain metadata spreadsheet.

❷ These are the SAS formats created by the make_codelist_formats.sas program; they hold the controlled terminology for AESEV, AEACN, AEREL, and AESER. Because the source ADVERSE dataset already contains variables named AESEV and AEREL, they are renamed to _AESEV and _AEREL in the SET statement so that the SDTM variables of the same names can be derived from them.

❸ The %make_sdtm_dy macro is called twice here to create AESTDY and AEENDY.

❹ The sequence variable AESEQ is created in a fashion similar to what was done for previous domains.

❺ The program calls %make_sort_order here and saves the permanent domain dataset as was done before.
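Callout ❸ relies on %make_sdtm_dy, which is defined earlier in the chapter. The SDTMIG defines --DY relative to RFSTDTC with no day 0: the reference start date is Day 1, the day after it is Day 2, and the day before it is Day -1. The following Python sketch applies that rule and returns None (missing) when either date is partial, because a study day cannot be computed from a partial date. It illustrates the SDTMIG rule rather than reproducing the macro's source, and the function name sdtm_dy is hypothetical.

```python
from datetime import date


def sdtm_dy(dtc, rfstdtc):
    """Return the SDTM study day of `dtc` relative to `rfstdtc`.

    Both arguments are ISO 8601 strings; only the date part (first 10
    characters) is used. Returns None if either date is missing or partial.
    """
    if not dtc or not rfstdtc or len(dtc) < 10 or len(rfstdtc) < 10:
        return None
    event = date.fromisoformat(dtc[:10])
    ref = date.fromisoformat(rfstdtc[:10])
    diff = (event - ref).days
    # No day 0: dates on or after the reference date count from 1.
    return diff + 1 if diff >= 0 else diff
```

Calling it once for AESTDTC and once for AEENDTC mirrors the two %make_sdtm_dy calls that produce AESTDY and AEENDY.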

Building the EX Exposure Interventions Domain

The following SAS code builds the EX exposure domain dataset from our source clinical trial data found in “Appendix A - Source Data Programs.” The following program assumes that the source datasets can be found under the libref source and that the permanent SAS formats can be found under the libref library.

*---------------------------------------------------------------*;
* EX.sas creates the SDTM EX dataset and saves it
* as a permanent SAS dataset to the target libref.
*---------------------------------------------------------------*;

**** CREATE EMPTY EX DATASET CALLED EMPTY_EX;
%make_empty_dataset(metadatafile=SDTM_METADATA.xls,dataset=EX)

**** DERIVE THE MAJORITY OF SDTM EX VARIABLES;
options missing = ' ';
data ex;
    set EMPTY_EX
        source.dosing;
    studyid = 'XYZ123';
    domain = 'EX';
    usubjid = left(uniqueid);
    exdose = dailydose;
    exdostot = dailydose;
    exdosu = 'mg';
    exdosfrm = 'TABLET, COATED';
    %make_dtc_date(dtcdate=exstdtc, year=startyy, month=startmm, ❶
                   day=startdd)
    %make_dtc_date(dtcdate=exendtc, year=endyy, month=endmm,
                   day=enddd)
run;

proc sort
    data=ex;
    by usubjid;
run;

**** CREATE SDTM STUDYDAY VARIABLES AND INSERT EXTRT;
data ex;
    merge ex(in=inex) target.dm(keep=usubjid rfstdtc arm);
    by usubjid;
    if inex;
    %make_sdtm_dy(date=exstdtc);
    %make_sdtm_dy(date=exendtc);
    **** in this simplistic case all subjects received the
    **** treatment they were randomized to; ❷
    extrt = arm;
run;

**** CREATE SEQ VARIABLE;
proc sort
    data=ex;
    by studyid usubjid extrt exstdtc;
run;

data ex;
    retain &EXKEEPSTRING;
    set ex(drop=exseq);
    by studyid usubjid extrt exstdtc;
    if not (first.exstdtc and last.exstdtc) then
        put "WARN" "ING: key variables do not define a unique"
            " record. " usubjid=;
    retain exseq;
    if first.usubjid then
        exseq = 1;
    else
        exseq = exseq + 1;
    label exseq = "Sequence Number";
run;

**** SORT EX ACCORDING TO METADATA AND SAVE PERMANENT DATASET;
%make_sort_order(metadatafile=SDTM_METADATA.xls,dataset=EX)

proc sort
    data=ex(keep = &EXKEEPSTRING)
    out=target.ex;
    by &EXSORTSTRING;
run;

When the EX.sas program has been run, the EX SDTM domain dataset is saved to the target libref. The EX domain is created with the same methods that were used for the previous domains.

❶ To keep things simple, the source dates we have used so far have all been complete dates. Here, we use the %make_dtc_date SAS macro to create EXSTDTC and EXENDTC from partial dates. Look at the first record, for the patient with USUBJID="UNI712": only the start year is present, so EXSTDTC="2010" and EXSTDY is missing, because a study day cannot be calculated from a partial date. In the second record, the year and month of the dosing stop date are present, so EXENDTC="2010-12" and EXENDY is missing.

❷ The way that EXTRT is derived here is perhaps overly simplistic. EXTRT should contain the name of the treatment that the subject actually received. Here we assume that each subject received the treatment to which they were randomized, which is stored in the DM variable ARM.
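The %make_dtc_date macro in ❶ builds an ISO 8601 date from separate year, month, and day fields and keeps only the components that are present. A Python sketch of one reasonable rule follows: stop at the first missing component, so a year and month give "2010-12" and a year alone gives "2010". This right-truncation rule is an assumption about the macro's behavior, and make_dtc_date here is a hypothetical Python function, not the SAS macro. The SDTMIG also describes representing a missing intermediate component (for example, a known year and day with an unknown month) with hyphens, which this sketch does not produce.

```python
def make_dtc_date(year=None, month=None, day=None):
    """Return an ISO 8601 date string truncated at the first missing part."""
    if year is None:
        return ""  # no year: the --DTC value is left blank
    dtc = f"{year:04d}"
    if month is None:
        return dtc
    dtc += f"-{month:02d}"
    if day is None:
        return dtc
    return f"{dtc}-{day:02d}"
```

Because the result is shorter than ten characters whenever the day is missing, a study day routine that requires a full date leaves EXSTDY or EXENDY missing for these records.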

Building Trial Design Model (TDM) Domains

The following SAS code builds the TA, TD, TE, TI, TS, and TV datasets. These TDM domains consist entirely of trial metadata and contain no actual patient data. Because of this, you typically will not find this data in your clinical data management system. For this chapter, this metadata was entered manually into an Excel file called trialdesign.xls. Until clinical trial data systems mature significantly, you might find that you have to create this study metadata manually as well. See Chapter 13 for a discussion of the related Protocol Representation Model. Two other TDM datasets, SE and SV, are based on subject clinical data, but they are not presented or created here.

*---------------------------------------------------------------*;
* TDM.sas creates the SDTM TA, TD, TE, TI, TS, and TV datasets and
* saves them as permanent SAS datasets to the target libref.
*---------------------------------------------------------------*;

**** CREATE EMPTY TA DATASET CALLED EMPTY_TA; ❶
%make_empty_dataset(metadatafile=SDTM_METADATA.xls,dataset=TA)

proc import ❷
    datafile="trialdesign.xls"
    out=ta
    dbms=excelcs
    replace;
    sheet='TA';
run;

**** SET EMPTY DOMAIN WITH ACTUAL DATA; ❸
data ta;
    set EMPTY_TA
        ta;
run;

**** SORT DOMAIN ACCORDING TO METADATA AND SAVE PERMANENT DATASET;
%make_sort_order(metadatafile=SDTM_METADATA.xls,dataset=TA) ❹

proc sort
    data=ta(keep = &TAKEEPSTRING)
    out=target.ta;
    by &TASORTSTRING;
run;

**** CREATE EMPTY TD DATASET CALLED EMPTY_TD;
%make_empty_dataset(metadatafile=SDTM_METADATA.xls,dataset=TD)

proc import
    datafile="trialdesign.xls"
    out=td
    dbms=excelcs
    replace;
    sheet='TD';
run;

**** SET EMPTY DOMAIN WITH ACTUAL DATA;
data td;
    set EMPTY_TD
        td;
run;

**** SORT DOMAIN ACCORDING TO METADATA AND SAVE PERMANENT DATASET;
%make_sort_order(metadatafile=SDTM_METADATA.xls,dataset=TD)

proc sort
    data=td(keep = &TDKEEPSTRING)
    out=target.td;
    by &TDSORTSTRING;
run;

**** CREATE EMPTY TE DATASET CALLED EMPTY_TE;
%make_empty_dataset(metadatafile=SDTM_METADATA.xls,dataset=TE)

proc import
    datafile="trialdesign.xls"
    out=te
    dbms=excelcs
    replace;
    sheet='TE';
run;

**** SET EMPTY DOMAIN WITH ACTUAL DATA;
data te;
    set EMPTY_TE
        te;
run;

**** SORT DOMAIN ACCORDING TO METADATA AND SAVE PERMANENT DATASET;
%make_sort_order(metadatafile=SDTM_METADATA.xls,dataset=TE)

proc sort
    data=te(keep = &TEKEEPSTRING)
    out=target.te;
    by &TESORTSTRING;
run;

**** CREATE EMPTY TI DATASET CALLED EMPTY_TI;
%make_empty_dataset(metadatafile=SDTM_METADATA.xls,dataset=TI)

proc import
    datafile="trialdesign.xls"
    out=ti
    dbms=excelcs
    replace;
    sheet='TI';
run;

**** SET EMPTY DOMAIN WITH ACTUAL DATA;
data ti;
    set EMPTY_TI
        ti;
run;

**** SORT DOMAIN ACCORDING TO METADATA AND SAVE PERMANENT DATASET;
%make_sort_order(metadatafile=SDTM_METADATA.xls,dataset=TI)

proc sort
    data=ti(keep = &TIKEEPSTRING)
    out=target.ti;
    by &TISORTSTRING;
run;

**** CREATE EMPTY TS DATASET CALLED EMPTY_TS;
%make_empty_dataset(metadatafile=SDTM_METADATA.xls,dataset=TS)

proc import
    datafile="trialdesign.xls"
    out=ts
    dbms=excelcs
    replace;
    sheet='TS';
run;

**** SET EMPTY DOMAIN WITH ACTUAL DATA;
data ts;
    set EMPTY_TS
        ts;
run;

**** SORT DOMAIN ACCORDING TO METADATA AND SAVE PERMANENT DATASET;
%make_sort_order(metadatafile=SDTM_METADATA.xls,dataset=TS)

proc sort
    data=ts(keep = &TSKEEPSTRING)
    out=target.ts;
    by &TSSORTSTRING;
run;

**** CREATE EMPTY TV DATASET CALLED EMPTY_TV;
%make_empty_dataset(metadatafile=SDTM_METADATA.xls,dataset=TV)

proc import
    datafile="trialdesign.xls"
    out=tv
    dbms=excelcs
    replace;
    sheet='TV';
run;

**** SET EMPTY DOMAIN WITH ACTUAL DATA;
data tv;
    set EMPTY_TV
        tv;
run;

**** SORT DOMAIN ACCORDING TO METADATA AND SAVE PERMANENT DATASET;
%make_sort_order(metadatafile=SDTM_METADATA.xls,dataset=TV)

proc sort
    data=tv(keep = &TVKEEPSTRING)
    out=target.tv;
    by &TVSORTSTRING;
run;

When the TDM.sas program has been run, the TA, TD, TE, TI, TS, and TV SDTM domain datasets are saved to the target libref.

❶ For each of the TDM domains, the %make_empty_dataset macro is run to create the empty dataset for populating.

❷ PROC IMPORT is used to import the trial design dataset metadata spreadsheet from Excel.

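❸ Here the empty domain dataset is combined with the imported data in a SET statement. Because the empty dataset is listed first, the variable attributes such as length come from the metadata-based shell.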

❹ The proper sort order is obtained from the %make_sort_order macro, and then the permanent domain dataset is stored.
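Each TDM block follows the same pattern: start from the metadata-driven empty shell, add the imported rows, keep only the variables in &xxKEEPSTRING, and sort by &xxSORTSTRING. The Python sketch below mirrors that pattern on plain records so that the effect is easy to see. The variable lists passed in stand for what %make_empty_dataset and %make_sort_order derive from the metadata; the function name conform_to_shell and the sample values in the usage are illustrative only. Unlike the SAS SET statement, the sketch handles only variable presence and order, not types, lengths, or labels.

```python
def conform_to_shell(records, shell_vars, sort_vars):
    """Restrict records to `shell_vars` (in that order) and sort by `sort_vars`.

    Variables missing from a record are filled with "" (a blank, standing in
    for a SAS missing value); variables not in the shell are dropped, as with
    KEEP=. Assumes each sort variable holds values of one comparable type.
    """
    conformed = [{var: rec.get(var, "") for var in shell_vars} for rec in records]
    return sorted(conformed, key=lambda rec: tuple(rec[var] for var in sort_vars))
```

For example, two imported TA rows supplied in the wrong TAETORD order come back sorted, with a blank DOMAIN column added and any spreadsheet-only columns removed.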

Chapter Summary

The best approach to implementing the SDTM in Base SAS is through the heavy use of metadata and SAS macros for repetitive task automation.

Build a set of SAS macros and programs that handle the repetitive tasks of SDTM domain construction. Examples in this chapter include an empty dataset creator, a dataset sorter, a --DTC date creator, a study day calculator, and an automated format library generator. You can add more SAS macro-based tools as you see fit.

Keep in mind that the examples in this chapter were kept simple so that the book stays a manageable size and is easy to follow. Converting data to the SDTM can quickly become complicated, so the examples in this chapter might treat some complicated SDTM data conversion issues only superficially. This chapter assumes that standardized lab results are the same as original results, that there is one record per nominal visit, that defining baseline flags (--BLFL) is simple, and generally that the incoming data is well behaved, which is rarely the case with clinical study data.
