Chapter 9: CDISC Validation Using Pinnacle 21 Community

Getting Started with Pinnacle 21 Community

Running Pinnacle 21 Community (Graphical User Interface)

Evaluating the Report

Modifying the Configuration Files

A Note about Controlled Terminology

Running Pinnacle 21 Community (Command-Line Mode)

ADaM Validation with Pinnacle 21 Community

ADaM Traceability Checks with SAS

Define.xml Validation with Pinnacle 21 Community

Chapter Summary

Pinnacle 21 Community, formerly OpenCDISC Community, is a free, Java-based tool. It evolved from the original OpenCDISC Validator, which was first released in 2008 via the website https://www.pinnacle21.net/. Originally, the functionality of OpenCDISC was focused on performing, summarizing, and reporting on the draft compliance checks that the FDA made publicly available in 2008 for the purpose of loading CDISC data into the Janus Clinical Trial Repository. (See http://www.fda.gov/downloads/ForIndustry/DataStandards/StudyDataStandards/UCM190628.pdf.) Although today's version of the tool has many new features, it is still primarily used for validating CDISC data to help ensure compliance with the respective CDISC models and regulatory expectations.

From the first release to the current version, the tool has been remarkably successful within the pharmaceutical industry for three primary reasons: it is easy to obtain and install, it is very easy to use once installed, and it is free. Some of these features are intertwined. The fact that it is free makes it very easy to acquire and install, because there is no licensing to procure beforehand. But the installation procedure itself is also quite simple. Once the tool is installed, the intuitive user interface makes generating a validation report very easy. Once generated, the report itself is organized in an intuitive, easy-to-comprehend spreadsheet. As of Version 2.1.0 of the tool, validation is available for SDTM data, ADaM data, SEND data, and define.xml files.

In this chapter, we will demonstrate how to use the Pinnacle 21 Community tool with the SDTM data, ADaM data, and define.xml files that we created in previous chapters.

There are still SAS elements within this chapter.  We will first cover how SAS can be used to run Pinnacle 21 Community.  We will also introduce you to a SAS macro, %adamtrace, that is used to ensure that records that trace back to one another are, in fact, consistent with respect to their values.

Getting Started with Pinnacle 21 Community

Pinnacle 21 Community can be downloaded via the Pinnacle 21 website (https://www.pinnacle21.net/downloads). Instructions on how to install the software are available at https://www.pinnacle21.net/projects/installing-opencdisc-community, and some additional installation details are below.    

Running Pinnacle 21 Community (Graphical User Interface)

Pinnacle 21 Community is available for both PCs and Macs.  On a PC, you can install the software pretty much anywhere you wish (provided you have Write access to the intended location).  In the root installation directory, simply double-click the pinnacle21-community.exe file, and the application will start.  The “home” screen displays the four basic functions of the tool: the Validator, the define.xml Generator, the Data Converter, and the ClinicalTrials.gov Miner.

image

We will focus primarily on the Validator, shown in the next screenshot.

image

The straightforward interface hardly deserves any explanation, but we’ll provide one anyway.  First, select the standard that corresponds to what you want to have validated (SDTM, ADaM, SEND, or Define.xml).  Next, select the configuration.  The available drop-down list depends on the standard chosen.  For SDTM, many different versions of the standard are available.  Some are designated as FDA checks, and some are designated as PMDA checks (Pharmaceuticals and Medical Devices Agency—essentially the Japanese FDA).  Typically, SAS transport files will be chosen from the Source Format options, but other options are also available.  The Source Data can either be dragged to the white window or selected manually via the Browse button.

For added insurance that your data are consistent with your metadata, there is also the option to select a define.xml file for cross-validation.  Note that the findings from a validation that includes a corresponding define.xml file may differ from those of one that does not, so it is recommended that you run separate validations for both scenarios.

Lastly, the CDISC CT option pertains to the version of the CDISC controlled terminology that was used for your data.  These versions are stored electronically by the National Cancer Institute (NCI). (See http://www.cancer.gov/research/resources/terminology/cdisc.)

When you are running the Pinnacle 21 Community GUI, the options for the format of the validation report are a Microsoft Excel file or a CSV file. Note that you do not necessarily need Excel to open the Excel file; many common spreadsheet programs can read the Excel results.

In the following example, the SDTM data that you first created in Chapter 3 are selected for evaluation, and the define.xml file that you created in Chapter 2 is also selected. When all of your inputs have been set, you are ready to start the validation process by clicking the Validate button.  When the validation is done, you should see a window similar to the following screenshot.

image

Evaluating the Report

The following screenshot shows a sample of the Excel report early in the validation process. There are four tabs, or worksheets, in the Excel report: Dataset Summary, Issue Summary, Details, and Rules. The data set summary provides information about each data set in a quick, snapshot view. Each processed data set is listed in a separate row with columns for the data set label, the SDTM class, the filename, and the number of records, errors, warnings, and notices (informational messages) relating to the domain.

image

The following screenshot shows a sample of the issue summary. This worksheet lists all errors found and provides a count of each error type. Errors are the most serious issues (unless you are using the PMDA rules, which include a Reject status).  A short description of each error is provided, as well as a link to the RuleID. Each RuleID is listed in the Rules worksheet, so clicking the link brings you to that worksheet, where full details of the rule are provided. These rules are defined within the configuration file that was specified on the start screen, which, in this case, is the SDTM 3.2.xml file.  We will cover these configuration files in greater detail in a later section.

image

After errors, a list of all warnings is provided (within each data set).  Warnings are considered less severe, but they should be evaluated to determine whether they require some sort of fix to the data, the metadata, or perhaps both.

After the warnings are listed, notices are provided. These often include the checks that could not be made due to a lack of data.

The next worksheet is titled Details and is shown in the following screenshot. Here you can find the details pertaining to each rejection, error, warning, or notice (as applicable). These results are listed by domain and provide information about the record number; the affected variables; the pertinent values; the rule ID (with a link to the rule); the message; the category of the rule (Format, Limit, Metadata, Presence, System, or Terminology); and the type of result (Rejection, Error, Warning, or Notice). If you choose to have your report produced in CSV format, only the information seen in the Details worksheet of the Excel report is provided in the resulting CSV report file.

image

The final worksheet, entitled Rules, contains a list of each RuleID. The links to each rule ID in the Issue Summary and Details worksheets take you to this worksheet:

image

Sometimes certain results, such as warnings (but, in some cases, errors as well), are not pertinent to your particular set of data or to your (or your organization’s) interpretation of the standard. If this is the case, you can choose to do one of the following: 1) ignore the result and address it in your reviewer’s guide with an explanation as to why the message does not need to be addressed, or 2) modify or deactivate the rules. In many cases, the former option is preferred. But in other situations, such as those that might involve a change to the standard that has not yet been reflected in the configuration file, an erroneous check, or a change to the rules themselves, the latter option might be preferred.  The next section explains how to make changes to the configuration file and related data.

Modifying the Configuration Files

As mentioned earlier, the configuration files that Pinnacle 21 Community uses for performing validation checks are XML files located in the components\config subdirectory of the software’s installation directory. With the style sheets already a part of the installation, these files should be viewable in a web browser, as shown in the following screenshot. You can also view the compliance check repository on the Pinnacle 21 website (https://www.pinnacle21.net/validation-rules).

image

As you review the results of the compliance checks, you might come across checks that, for whatever reason, do not apply to your data, your study, or your development program, or that you simply consider erroneous. As an alternative to performing these checks and presenting them in your validation report, you can identify them within the XML configuration file and either comment them out or set their Active status to No. Note that editing the checks cannot be done via a web browser, because web browsers only enable you to view the files. You first need to open the files in either a text editor or an XML editor, search for the checks by their IDs, and make the appropriate modifications. Consider saving the modified configuration file under a different name so as not to overwrite the original. If the new file is saved in the same config directory, it should appear in the drop-down list of configuration files in the GUI.

Deactivating rules by editing the XML configuration file is very straightforward.  Let’s use one from our validation report as an example.  Many of the findings pertain to check SD1082, “Variable length is too long for actual data.” In Chapter 11, we have a macro to address this. But, nonetheless, to call this an actual error might be considered a “stretch” of some valid FDA advice, which in spirit is a preference against excessive variable length settings that can inflate data set sizes unnecessarily (like setting the length for an SDTM DOMAIN variable to $200).  To deactivate this rule, first simply find it in the configuration file, as shown here:

<val:ValidationRuleRef RuleID="SD1082" Active="Yes"/>

Unfortunately, this line appears in 39 different places within the SDTM 3.2.xml configuration file.  But with a text editor that has a global search-and-replace function, you can easily change each occurrence to this:

<val:ValidationRuleRef RuleID="SD1082" Active="No"/>
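If you would rather script this change than edit the configuration file by hand, a short SAS DATA step can copy the file and deactivate the rule along the way. The following is a minimal sketch; the installation path and the output filename (here following the SDTM 3.2-custom naming used later in this chapter) are assumptions:

** copy the configuration file, deactivating all references to rule SD1082;
data _null_;
   infile "c:\software\pinnacle21-community\components\config\SDTM 3.2.xml"
          lrecl=32767 truncover;
   file "c:\software\pinnacle21-community\components\config\SDTM 3.2-custom.xml"
        lrecl=32767;
   length line $32767;
   input line $char32767.;
   ** flip the Active flag only on lines that reference SD1082;
   if index(line, 'RuleID="SD1082"') then
      line = tranwrd(line, 'Active="Yes"', 'Active="No"');
   ** write the (possibly modified) line, preserving indentation;
   len = lengthn(line);
   put line $varying32767. len;
run;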

Be sure to keep track of the changes made so that they can be communicated to other organizations, including regulatory agencies, who may be using default configuration files for their own internal validation.

Adding new compliance checks is an option, too, although the syntax and logic involved in writing new checks is not a straightforward exercise. An overview of the syntax for the validation rules can be found online at https://www.pinnacle21.net/projects/validator/opencdisc-validation-framework. For example, suppose we want to add one of the SDTM 3.1.2 Amendment 1 variables to the DM domain, RFXSTDTC.  This variable represents the date/time of first study treatment exposure. (Note that this and many other Amendment 1 variables were added to SDTM version 3.1.3 and the corresponding Pinnacle 21 configuration files.) We would first need to add this variable to the list of DM variables. So as not to have to renumber all of the existing variables, the variable will be added after the last DM variable in the SDTM 3.1.2 configuration file, DM.RFENDY, as shown in the following code:

<ItemRef ItemOID="IT.DM.RFENDY" OrderNumber="26" Mandatory="No" Role="Timing" val:Core="Model Permissible"/>

<ItemRef ItemOID="DM.RFXSTDTC" OrderNumber="27" Mandatory="No" Role="Record Qualifier" val:Core="Expected"/>

Next, we want to define the label for this variable. Because RFXSTDTC is similar to RFSTDTC, we will insert its definition after that of RFSTDTC:

<ItemDef OID="DM.RFSTDTC" Name="RFSTDTC" DataType="datetime" def:Label="Subject Reference Start Date/Time"/>

<ItemDef OID="DM.RFXSTDTC" Name="RFXSTDTC" DataType="datetime" def:Label="Date/Time of First Study Treatment Exposure"/>

Finally, we will add a check similar to the existing check for RFSTDTC. The ID for this new check, SD2265, comes sequentially after the last check in the list:

<val:Required ID="SD0087" Variable="RFSTDTC" When="ARMCD @neqic 'SCRNFAIL' @and ARMCD @neqic 'NOTASSGN'" Message="RFSTDTC cannot be null for randomized subject" Description="Subject Reference Start Date/Time (RFSTDTC) is required for all randomized subjects, those where Planned Arm Code (ARMCD) is not equal to 'SCRNFAIL' or 'NOTASSGN'." Category="Consistency" Type="Warning"/>

<val:Required ID="SD2265" Variable="RFXSTDTC" When="EXSTDTC != ''" Message="RFXSTDTC cannot be null for subjects who received study drug" Description="Subject First Study Drug Exposure Date/Time (RFXSTDTC) is required for all subjects who received protocol-specified treatment or therapy." Category="Consistency" Type="Warning"/>

If this check is done correctly, you should be able to find it in your validation report.

If the task of writing your own compliance checks looks too daunting, consider communicating with Pinnacle 21 developers via their online web forums. This way you can get additional details about how best to either create your own checks or have the developers consider new checks for general distribution.  Another option is to contact CDISC’s SDS Compliance Subteam.

One last note about creating your own customized configuration files:  You might want to share them with other organizations, including regulatory agencies, who might be doing their own validation.  This can be easier than having to repeatedly explain checks from one submission to the next that are not, for whatever reason, pertinent.  Fortunately, the Pinnacle 21 validator software is very intelligent in that it will identify in its Configuration drop-down menu any file that exists in the proper directory.  So providing the customized configuration file along with instructions as to where to put it makes it very easy for other organizations to use your configuration for their own validation.  See the following screenshot for our SDTM 3.2-custom file as an example.

image

A Note about Controlled Terminology

One common set of warnings that users might experience relates to terminology. Staying up to date with the most recent controlled terminology is important, but not always practical. When a define.xml file is included when validating SDTM data, the terminology specified in the define file is used to check against values in the data.  Like the configuration files, the controlled terminology files that Pinnacle 21 Community uses are XML files; they are located in the components\config\data\CDISC subdirectory of the Pinnacle 21 Community installation directory.  From there, additional subdirectories exist for SEND, SDTM, and ADaM data. Within each of those is a final branch of subdirectories named after the dates that correspond to the version of the controlled terminology that your submission or study data follow.  See the following screenshot for an example.

image

In earlier versions of the software, you could manually add updated XML files to these directories when new controlled terminology versions were added to the National Cancer Institute (NCI) archive at http://evs.nci.nih.gov/ftp1/CDISC/SDTM/ (for SDTM data, for example).  Starting with version 2.0 of the software, however, these files are pushed by Pinnacle 21, because some pre-processing must be done to the XML files in order to make them compatible with the software.

Running Pinnacle 21 Community (Command-Line Mode)

A lesser-known feature of Pinnacle 21 Community is the ability to run it in command-line mode. In command-line mode, a third output option is available for the report file: XML. To the SAS programmer, a possibly better reason to use the tool through the command line is that it provides the ability to run it as a part of a batch process. Before demonstrating this, let’s first look at how the tool can be run from the command line in a Windows environment.

To start, open a Command Prompt window. From the window, navigate to the folder where your Pinnacle 21 Community files were installed (or unzipped). Then navigate two folders down to the components\lib directory.  A good place to start is to use the -help option. To view this, enter the following command:

>java -jar validator-cli-2.1.0.jar -help

Note that the filenames are specific to the version of the software being used. So if you are using a newer version of the software, then the filename above will need to be updated accordingly.

The resulting output looks like this:

image

The -source, -config, and -report parameters are required if you are not using the validator to construct a define.xml file.

Now let’s look at each option needed to validate a set of version 3.2 SDTM transport files that are contained in one directory. Table 9.1 provides a summary of these options.

Table 9.1: Selected Pinnacle 21 Community Command-Line Parameters

-type=SDTM|ADaM|SEND|Define|Custom
   Indicates what type of validation is being requested: that of SDTM, ADaM, or SEND data sets, or of a define.xml file.

-source="[dataset-path]\*.xpt"
   Provides the location of the data sets to be validated. An asterisk (*) can be used as a wildcard to validate all XPT files in the directory.

-config="[config-file-path]\config-sdtm-3.2.xml"
   Provides the path of the XML configuration file. Unless a customized file is stored elsewhere, this will typically be a location in the components\config subdirectory of the directory where the software was installed or unzipped.

-config:define="[define-path]\define.xml"
   It is always best to have the data sets validated against the metadata. This option specifies the location of the define.xml file (usually in the same directory as the SDTM data sets).

-report="[report-file-path]\sdtm-validation-report-[Date:Time].xls"
   Specifies the location and name of the output Excel file. As a convention, the creation date and time can be appended to the filename.

-report:overwrite="yes"
   With the date and time appended to the end of the filename, this option might not be needed. Otherwise, with a value of "yes," an existing file will be overwritten.
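For example, an assembled command might look like the following, with the bracketed placeholders standing in for real paths in your environment:

>java -jar validator-cli-2.1.0.jar -type=SDTM -source="[dataset-path]\*.xpt"
   -config="[config-file-path]\config-sdtm-3.2.xml"
   -config:define="[define-path]\define.xml"
   -report="[report-file-path]\sdtm-validation-report.xls"
   -report:overwrite="yes"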

By putting these options together, you can run the Pinnacle 21 Community Validator just as you would through the GUI. However, manually entering all of these commands and file paths is always prone to error. The advantage of using the command-line interface is that, rather than entering the commands yourself each time you want to submit a job, you can enter them in a SAS program and have the SAS program submit the job via the X command. The X command allows operating system commands to be executed during execution of your SAS program. In a Windows environment, this is done via the Command Prompt or DOS window. These operating-system commands can either be run concurrently, while SAS continues to run (when the NOXSYNC option is active), or sequentially, where SAS waits for the Command Prompt window to exit first (when the XSYNC option is active).

The following code is a demonstration of this implementation, specific to an SDTM validation:

** Run Pinnacle 21 Community in batch mode;
** set the appropriate system options;
options symbolgen xsync noxwait;

** specify the path and the file for running the validator in command-line mode;
%let p21path=c:\software\pinnacle21-community;
%let validatorjar=\components\lib\validator-cli-2.1.0.jar;

** specify the location of the data sets to be validated;
%let sourcepath=g:\2nd-edition\define2\sdtm;

** specify the file to be validated;
** to validate all files in a directory use the * wildcard;
%let files=*.xpt;

** specify the config file path;
%let config=&p21path\components\config\config-sdtm-3.1.2.xml;

** specify the name and location of the codelist file;
%let codelists=&p21path\components\config\data\CDISC\SDTM\2014-09-26\SDTM Terminology.odm.xml;

** specify the output report path;
%let reportpath=&sourcepath;

** specify the name of the validation report;
** append the file name with the date and time;
%let reportfile=sdtm-validation-report-&sysdate.T%sysfunc(tranwrd(&systime,:,-)).xls;

** run the report;
x java -jar "&p21path&validatorjar" -type=SDTM -source="&sourcepath\&files"
   -config="&config" -config:define="&sourcepath\define.xml"
   -config:codelists="&codelists" -report="&reportpath\&reportfile"
   -report:overwrite="yes";

   When the XSYNC option is active, SAS waits for the command to finish executing before returning to the SAS commands. With NOXWAIT active, an EXIT command does not have to be entered in order to return control to SAS.

   The Pinnacle 21 command line allows the use of a wildcard to evaluate all XPT files in a directory (for example, *.XPT).  Alternatively, an individual file can be entered.

   An advantage of the command-line submission is the ability to specify the name of the report file. We will, however, maintain the GUI convention of appending the date and time to the base filename in order to ensure uniqueness of the filenames.

   Finally, all of the options are put together into one X command. Note the use of the config:define option, where it is assumed that the define.xml file exists in the same directory as the data sets themselves. This could easily be changed by adding an additional macro variable for the location of the define file.

For anyone with experience going through the SDTM validation process, the advantages of this approach are rather clear. Rarely is the validation process a simple one- or two-step approach of running a report, addressing the issues, and then running the report again to show that the issues have disappeared. Often this is much more an iterative process where you must whittle away at issues. With this in mind, the advantages of this example are much more apparent. Rather than having to manually validate at each iteration, the validation can be done automatically, as part of the iterative process. Hence, the same rationale for writing and saving SAS programs can be applied to the validation process.

After a version of the code used in this example is customized for your specific project, it could be further generalized for any project. Whatever internal process you might have for setting up new studies and (presumably) specifying data libraries and file paths in a central location can easily be applied to the code.

To help you get started, a SAS macro for automated batch submissions (called %run_p21v) can be found in Appendix D and with the book’s online materials under the authors’ pages (http://support.sas.com/publishing/authors/index.html).  To get the macro to work in your environment, you will likely need to make some updates such as to the software’s installation path.  Such changes should be minimal, and the macro can be put to use with few modifications.  The details of the macro are a bit too involved to include within this chapter, but the example above gives you an idea of the key parameters needed for most common situations.  An example macro call is shown below.

** validate the data sets;

%run_p21v(type=SDTM, sources=&path\SDTM, files=*.xpt, define=Y,
   ctdatadate=2014-09-26);

ADaM Validation with Pinnacle 21 Community

Shortly after the release of version 1.0 of the ADaM IG, a CDISC ADaM validation sub-team was charged with the task of creating a list of ADaM compliance checks. Seeking to avoid the issue with the SDTM compliance checks—having many different sets of checks published by many different organizations—the ADaM team thought it best that checks first be published by the group of individuals who know the standard best—the ADaM team members. As of this writing, Version 1.3 of these checks (released on March 16, 2015) is the most recent and includes over 250 checks. They are available for download from http://www.cdisc.org/standards/foundational/adam.

While the ADaM team was working on its own checks, the developers of Pinnacle 21 were also working on a list of additional checks that could be run in their software, similar to the SDTM checks. Efforts have been made to harmonize these two sets of checks into one.

The process by which ADaM validation is done in Pinnacle 21 Community is identical to that by which SDTM checks can be run. You open the tool; you select your standard, source data, and configuration file; and simply click Validate.

The next section will address a validation step that is outside the scope of the Pinnacle 21 Community software: ADaM traceability checks.

ADaM Traceability Checks with SAS

As discussed in the previous section, compliance checks of ADaM data follow a process similar to SDTM compliance checks. A component of validation that is unique to ADaM data is traceability.  With the traceability features built into ADaM, it is possible to do automated checks between the ADaM AVAL in a BDS-structured data set (or AEDECOD in an ADAE file) and a corresponding value in a predecessor data set (for example, an SDTM data set). Doing such checks helps ensure the accuracy of your derivations and of the information that you provide for making those derivations traceable, which can be particularly important for legacy data conversions, as noted in the FDA’s Study Data Technical Conformance Guide (http://www.fda.gov/downloads/ForIndustry/DataStandards/StudyDataStandards/UCM384744.pdf).

There are two primary ways that you can reference specific source records in a predecessor data set: 1) by using the SRCDOM, SRCVAR, and SRCSEQ variables or 2) by using an SDTM --SEQ variable. The %adamtrace macro below identifies when these methods have been used and, if they have, checks to see that AVAL in the given ADaM data set matches the source value in the predecessor data set.

If Method 1 has been used (use of SRC--- variables), then the value of the variable that is identified by SRCVAR (in the domain or data set identified by SRCDOM and for the row or record identified by SRCSEQ) is compared to the AVAL for the given data set.

If Method 2 has been used, then the source domain is identified by the prefix of the --SEQ variable present in the data set. The variable used for comparisons is the --STRESN variable. Therefore, if ADLB contains LBSEQ, then the value of AVAL is compared to LBSTRESN for the given subject and the specified LBSEQ value in the LB SDTM domain.

Note, however, that it is assumed that a data set named with a matching prefix exists. If, for example, the lab data were broken down into separate data sets, one for chemistry data named LBC and one for hematology data named LBH, then the macro would not work because of the non-existence of a data set named LB.
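To make the Method 2 logic concrete, the following is a simplified sketch of the comparison for lab data. The actual %adamtrace macro is more general; the library and data set names here (ADAM.ADLB and SDTM.LB) are assumptions:

** simplified Method 2 check: compare ADLB AVAL to LB LBSTRESN;
proc sort data=adam.adlb(keep=usubjid lbseq aval) out=_adlb;
   by usubjid lbseq;
run;

proc sort data=sdtm.lb(keep=usubjid lbseq lbstresn) out=_lb;
   by usubjid lbseq;
run;

data _trace_problems;
   merge _adlb(in=in_adam) _lb(in=in_source);
   by usubjid lbseq;
   if in_adam;  ** check only records present in the ADaM data;
   length problem $50;
   if not in_source then
      problem = 'ADaM record not found in source data';
   else if aval ne lbstresn then
      problem = 'ADaM analysis variable does not match source data';
   ** keep only mismatches and unmatched records;
   if not missing(problem);
run;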

If the ADaM data set has ADAE in the name, then the source variable and the analysis variable in the current data set are both assumed to be AEDECOD, and the values of AEDECOD are compared to one another.  Note that the macro has not yet been extended to the new ADaM class OCCDS (Occurrence Data Structure).

The full code to a macro to perform such checks can be found under the authors’ pages on the SAS website (http://support.sas.com/publishing/authors/index.html). The macro has only three parameters: ADAMLIB, SRCLIB, and DATASETS. The ADAMLIB parameter should be a libref to a set of ADaM data sets (or, at least, one ADaM data set). It works only on native SAS data sets as opposed to SAS transport files (suggestions on how to easily convert between the two formats can be found in Chapter 11). Similarly, the SRCLIB parameter should refer to a libref for a set of source data sets (typically SDTM data sets). These two parameters are required. The final parameter, DATASETS, is optional. If it is left blank, then every data set in the ADAMLIB is checked to determine whether it contains the traceability variables. If so, the traceability check is performed. Optionally, you can specify one or more data sets (separated by a pipe or | character), and only those specified data sets will be evaluated.

An example macro call to test all applicable data sets and analysis variables (AVAL or AEDECOD) in a given ADaM library with a libref of ADAM and an SDTM source library called SDTM would be as follows:

%adamtrace(adamlib=adam, srclib=sdtm);

It should be pointed out that the “immediate predecessor” data set, to use the parlance of the ADaM IG, will not always be an SDTM data set. As shown with our example data, where ADTTE was derived from ADAE, the immediate predecessor could be another ADaM data set. If this is the case, then those specific data sets that have SRCDOM pointing to another ADaM data set should be explicitly included in the DATASETS parameter, and the SRCLIB parameter should point to the same library as the ADAMLIB parameter. This is shown in the following example macro call:

%adamtrace(adamlib=adam, srclib=adam, datasets=ADTTE);

Similarly, you would then have to be careful not to do an additional run of the macro where the DATASETS parameter was left blank and the SRCLIB parameter pointed to the SDTM library. This would cause all ADaM data sets to be checked, including those that reference another ADaM data set via SRCDOM, which would, in turn, cause an error if the referenced ADaM data did not exist in the SDTM library. So for our example, a second macro call would look like this:

%adamtrace(adamlib=adam, srclib=sdtm, datasets=ADAE|ADEF);

Because ADSL does not have the required traceability variables, it is not listed.

When you use slightly modified versions of the ADaM data created in Chapter 7, results of the %adamtrace macro (from the SAS data set it creates) appear in the following screenshot. An intentional error was added so that there would be a result to view. (However, it should be pointed out that the macro turned out to be quite helpful in identifying unintentional errors when the data sets for this book were first being developed.)

image

As you can see, results are fairly straightforward. There are columns for the following information:

   The unique subject ID (USUBJID)

   The applicable sequence variable (note that one column is added for each unique sequence variable)

   The analysis variables (AEDECOD and AVAL)

   The source variables

   The source domain (SRCDOM)

   Indicators as to whether a record exists in either the source or the ADaM data

   A code for the type of problem found (PROBTYPE)

   A description of the problem (PROBLEM)

Note that in the case of AE data, the analysis variable is also the source variable. In order to differentiate the two, AEDECOD in the source data (AE) is renamed to AAEDECOD.

There are only two types of problems that are pointed out. The codelist for PROBTYPE and the corresponding values of PROBLEM are as follows:

1 = ADaM record not found in source data

2 = ADaM analysis variable does not match source data

Determining where the true problem lies tends to involve some detective work. Experience with the macro has shown that adding traceability to the data is not just an added burden. Used in conjunction with an automated tool such as the %adamtrace macro, checking source values can help you uncover problems in your code.
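As a quick way to triage the findings, you can tabulate the output by problem type. The following is a minimal sketch, assuming the macro's output data set is named TRACE_RESULTS (the actual name in your environment may differ):

** tabulate traceability findings by type of problem;
proc freq data=trace_results;
   tables probtype*problem / list missing;
run;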

Define.xml Validation with Pinnacle 21 Community

Although some may be content with simply developing a define.xml file that renders in a web browser, the unfortunate news is that getting your file to render is sometimes only half the battle.  Ensuring that your file is compliant both with the ODM schema and the Define 2.0 specification is another matter, as is ensuring that your metadata are consistent.  How important this type of validation is to a regulatory reviewer is not totally clear, because many of the findings from a Pinnacle 21 Community validation report pertain more to the nuts and bolts of the define file than to the actual data.  There is, however, always the potential to uncover metadata inconsistencies or missing elements that could contribute to a less efficient regulatory review.

The following screenshot shows the entries for an ADaM define file validation.  You will notice many similarities to an SDTM or ADaM data set validation, as well as some differences.  The choices for the configuration files are narrowed down to only two options: “Define” and “Define (PMDA).” The number of checks performed between these two options is the same.  The primary difference is that some PMDA validation rules result in a “Reject” status rather than just an “Error” or “Warning” in the base Define configuration.  There is no need to explicitly indicate whether you are validating a define.xml file for SDTM or ADaM data. This is determined somewhat implicitly by the CT Standard option, which has two choices, SDTM or ADaM, that in turn determine the CDISC CT versions that are available.

image

If all goes well, clicking the Validate button will generate a validation report that is very similar to those you have seen with SDTM and ADaM data set validation.  If by chance your define file somehow causes the entire software to crash before a report can be generated, consider trying an older version of the software, such as the OpenCDISC Community Validator.

An example Issue Summary from an early version of the validation report is shown in the following screenshot.  As you may be able to see from the issues, a few have to do with XML metadata.  Many of these are actually related to the fact that the configuration file used for validation does not yet support Analysis Results Metadata.  These are good examples of checks that one could alter, supplement with new ones, or deactivate in the configuration file.

image

Other checks may prove more worthwhile and may even help you discover inconsistencies between your data sets and metadata that earlier validation exercises did not reveal.  Ensuring that your define.xml file complies with the ODM schema and the define.xml specification can also give you peace of mind prior to a regulatory submission.

Chapter Summary

Pinnacle 21 Community is a free, easy-to-use, open-source software application that provides an alternative to the validation tools provided by SAS.  It produces thorough and user-friendly validation reports that can be used to check your SDTM, ADaM, and SEND data as well as your define.xml file.  Because it is open source, it is, to some extent, customizable.  With its command-line functionality, you can theoretically generate and validate both your data and metadata in one batch process.

A part of the validation process that is unique to ADaM is checking the accuracy of the traceability provided in ADaM data. For this task, the %adamtrace macro was introduced and its use demonstrated.  Together with the other validation tools, this piece can be added to your ADaM validation process.
