Author: Bhanoji Duppada | 2025-07-26 13:33:49

Creating Analysis-Ready Datasets (ADaM-style logic)

Introduction

Transitioning from Clinical SAS to R programming involves adapting to new tools and methodologies while retaining the core principles of clinical data analysis. One of the most critical tasks in clinical programming is creating analysis-ready datasets that follow the CDISC Analysis Data Model (ADaM), a standard with strict requirements for structure, variable naming, and traceability.

This chapter will guide you through replicating ADaM-style logic in R, covering data transformations, derivations, and integrity checks. By the end, you’ll be able to:
- Understand the parallels between SAS and R for ADaM dataset creation.
- Apply common ADaM derivations (e.g., ADSL, ADAE, ADLBC) in R.
- Implement metadata and traceability features.
- Validate datasets for regulatory compliance.

Let’s dive in!


Understanding ADaM-Style Datasets

ADaM (Analysis Data Model) datasets are designed to support statistical analysis with clear metadata, traceability, and standard variables. The table under the next heading maps the SAS constructs typically used to build them to their R counterparts.

Key ADaM Principles in R

Concept          | SAS Approach      | R Equivalent
Data Steps       | DATA step         | dplyr / data.table
Merge/Join       | PROC SQL / MERGE  | merge() / dplyr::left_join()
Metadata         | PROC CONTENTS     | str() / attributes()
Variable Labels  | LABEL statement   | Hmisc::label()
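
As a rough illustration of the Merge/Join and Metadata rows, the sketch below uses two small made-up data frames (`dm` and `ex_first` are hypothetical names and values, not real study data). It shows one way a SAS `MERGE ... BY USUBJID` with an `IN=` filter might map onto `dplyr::left_join()`, and how `str()` gives a quick `PROC CONTENTS`-style overview.

```r
library(dplyr)

# Toy inputs (hypothetical, for illustration only)
dm <- data.frame(
  USUBJID = c("S-001", "S-002", "S-003"),
  SEX     = c("F", "M", "F"),
  stringsAsFactors = FALSE
)
ex_first <- data.frame(
  USUBJID = c("S-001", "S-003"),
  TRTSDTC = c("2024-01-10", "2024-02-05"),
  stringsAsFactors = FALSE
)

# SAS: DATA both; MERGE dm(IN=a) ex_first; BY USUBJID; IF a; RUN;
# R:   keep every row of dm and attach matching columns from ex_first
both <- left_join(dm, ex_first, by = "USUBJID")

# Rough counterpart of PROC CONTENTS: variable names, types, first values
str(both)
```

The two approaches only line up this neatly when `USUBJID` is unique in the right-hand table. With repeated keys on both sides, a SAS `MERGE` and a join behave differently, so it is worth checking key uniqueness before translating.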

Data Transformations in R

1. Creating ADSL (Subject-Level Dataset)

SAS Example:

DATA ADSL;  
  SET SDTM.DM;  
  /* Derive AGE */  
  AGE = floor((input(RFSTDTC, yymmdd10.) - input(BRTHDTC, yymmdd10.)) / 365.25);  
RUN;  

R Equivalent:

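library(dplyr)
library(lubridate)

ADSL <- SDTM_DM %>%
  mutate(
    # Parse the date part (first 10 characters) of the ISO 8601 strings,
    # mirroring the yymmdd10. informat in the SAS example
    RFSTDT = ymd(substr(RFSTDTC, 1, 10)),
    BRTHDT = ymd(substr(BRTHDTC, 1, 10)),
    # Derive AGE in whole years from the number of days between the two dates
    AGE = floor(as.numeric(RFSTDT - BRTHDT) / 365.25)
  )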
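
The division by 365.25 is an approximation and can be off by one year when the reference date falls on or just after a birthday. The sketch below compares it with a calendar-based alternative using `lubridate::interval()`. The two dates are made up for illustration, and which convention applies is a study-level decision rather than something this chapter prescribes.

```r
library(lubridate)

# Hypothetical dates: the reference date is exactly the 18th birthday
brthdt <- ymd("2001-01-01")
rfstdt <- ymd("2019-01-01")

# Day-count approximation: 6574 days / 365.25 is just under 18
age_approx <- floor(as.numeric(rfstdt - brthdt) / 365.25)

# Calendar-based alternative: whole years completed between the two dates
age_exact <- interval(brthdt, rfstdt) %/% years(1)

print(c(approx = age_approx, exact = age_exact))
```

If the SAS program being replaced uses the 365.25 rule, keeping the same rule in R makes double-programming comparisons easier, even though the calendar-based result matches the everyday definition of age.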

2. Deriving Adverse Event Datasets (ADAE)

Common Tasks:
- Flagging treatment-emergent adverse events (TEAEs).
- Calculating duration.
- Adding severity grades.

R Code Example:

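# Assumes ADSL already carries TRTSDT as a Date (it is not derived in the ADSL step above)
ADAE <- SDTM_AE %>%
  left_join(ADSL %>% select(USUBJID, TRTSDT), by = "USUBJID") %>%
  mutate(
    # Convert the date part of the ISO 8601 strings to Date
    AESTDT = ymd(substr(AESTDTC, 1, 10)),
    AEENDT = ymd(substr(AEENDTC, 1, 10)),
    # Treatment-emergent flag (TRTEMFL in ADaM): AE started on or after first treatment
    TRTEMFL = ifelse(AESTDT >= TRTSDT, "Y", "N"),
    # Duration in days (add 1 if both the start and end day should count)
    AEDUR = as.numeric(AEENDT - AESTDT)
  )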
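
The task list above mentions severity grades, which the ADAE example does not derive. A minimal sketch of one way to do it follows. It assumes the SDTM AE data has an `AESEV` variable with the values MILD, MODERATE, and SEVERE; the 1/2/3 coding for the numeric `AESEVN` is an assumption for illustration, and the toy records are made up.

```r
library(dplyr)

# Toy AE records (hypothetical values)
ae <- data.frame(
  USUBJID = c("S-001", "S-001", "S-002"),
  AESEV   = c("MILD", "SEVERE", NA),
  stringsAsFactors = FALSE
)

ae_graded <- ae %>%
  mutate(
    # Assumed mapping: MILD = 1, MODERATE = 2, SEVERE = 3; anything else stays NA
    AESEVN = case_when(
      AESEV == "MILD"     ~ 1,
      AESEV == "MODERATE" ~ 2,
      AESEV == "SEVERE"   ~ 3,
      TRUE                ~ NA_real_
    )
  )

print(ae_graded)
```

Leaving unexpected or missing values as `NA` rather than defaulting them to a grade keeps data issues visible in later missing-value checks.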

3. Lab Data Transformations (ADLBC)

Key Steps:
- Calculate baseline values.
- Flag abnormal values (ANRIND).
- Derive shift tables.

Example:

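ADLBC <- SDTM_LB %>%
  group_by(USUBJID, LBTESTCD) %>%
  mutate(
    # Flag the baseline record (here the screening visit is taken as baseline)
    ABLFL = ifelse(VISIT %in% "SCREENING", "Y", NA_character_),
    # Carry the baseline value to every record of the same subject and test;
    # if there are several screening records, the first one is used
    BASE = first(LBSTRESN[VISIT %in% "SCREENING"]),
    # Change from baseline
    CHG = LBSTRESN - BASE
  ) %>%
  ungroup()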
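
The key steps also list abnormal-value flags and shift tables, which the example above does not cover. The sketch below shows one possible approach on made-up data. It assumes the SDTM normal range limits `LBSTNRLO` and `LBSTNRHI` are populated, treats the screening visit as baseline as in the example above, and uses `BNRIND` and `SHIFT1` as variable names by ADaM convention. The LOW/NORMAL/HIGH categories are a simplification.

```r
library(dplyr)

# Toy lab records for one subject and one test (hypothetical values)
lb <- data.frame(
  USUBJID  = rep("S-001", 3),
  LBTESTCD = rep("ALT", 3),
  VISIT    = c("SCREENING", "WEEK 2", "WEEK 4"),
  LBSTRESN = c(30, 62, 41),
  LBSTNRLO = c(10, 10, 10),
  LBSTNRHI = c(40, 40, 40),
  stringsAsFactors = FALSE
)

lb_flagged <- lb %>%
  mutate(
    # Reference range indicator; assumes both range limits are non-missing
    ANRIND = case_when(
      is.na(LBSTRESN)     ~ NA_character_,
      LBSTRESN < LBSTNRLO ~ "LOW",
      LBSTRESN > LBSTNRHI ~ "HIGH",
      TRUE                ~ "NORMAL"
    )
  ) %>%
  group_by(USUBJID, LBTESTCD) %>%
  mutate(
    # Baseline indicator: the indicator on the screening record
    BNRIND = first(ANRIND[VISIT %in% "SCREENING"]),
    # Shift text such as "NORMAL to HIGH"
    SHIFT1 = paste(BNRIND, "to", ANRIND)
  ) %>%
  ungroup()

# Cross-tabulate baseline against post-baseline categories as a simple shift table
post <- filter(lb_flagged, VISIT != "SCREENING")
shift_table <- table(Baseline = post$BNRIND, PostBaseline = post$ANRIND)
print(shift_table)
```

In real data, records with a missing range limit or a missing baseline would need explicit handling before building `SHIFT1`, because `paste()` turns `NA` into the text "NA".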

Implementing Metadata and Traceability

Variable Labels and Formats

In SAS:

LABEL AGE = "Age (Years)";  

In R:

library(Hmisc)  
label(ADSL$AGE) <- "Age (Years)"  
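
Labelling variables one at a time gets tedious for a full dataset. A minimal base R sketch for applying labels from a specification follows. The `var_labels` vector stands in for whatever metadata source a study uses (it is a hypothetical name, as is the toy `adsl`), and the code writes the `label` attribute directly, which is the same attribute `Hmisc::label()` reads.

```r
# Toy dataset and a hypothetical label specification
adsl <- data.frame(
  USUBJID = c("S-001", "S-002"),
  AGE     = c(34, 58),
  stringsAsFactors = FALSE
)
var_labels <- c(USUBJID = "Unique Subject Identifier", AGE = "Age (Years)")

# Attach each label to its column, skipping spec entries not present in the data
for (v in intersect(names(var_labels), names(adsl))) {
  attr(adsl[[v]], "label") <- var_labels[[v]]
}

# Collect the labels back into a named vector for review
print(sapply(adsl, function(x) attr(x, "label")))
```

Keeping the labels in one named vector (or a specification file read into one) gives a single place to review them against the dataset specification.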

Dataset-Level Metadata

Store metadata in attributes:

attr(ADSL, "creation_date") <- Sys.Date()  
attr(ADSL, "programmer") <- "Your Name"  

Validating ADaM Datasets

Common Checks in R

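  1. Subject Consistency (lists ADSL subjects with no ADAE records, which is not necessarily an error):
setdiff(ADSL$USUBJID, ADAE$USUBJID)
  2. Missing Values (number of NA values per variable):
sapply(ADSL, function(x) sum(is.na(x)))
  3. Traceability Verification (every ADAE subject must exist in ADSL, so this should return TRUE):
all(ADAE$USUBJID %in% ADSL$USUBJID)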
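
These one-liners can be bundled into a small helper so the same checks run on every dataset build. The sketch below is one possible shape. `check_adam()` is a hypothetical function name, the toy data are made up, and the duplicate-subject check is an addition based on the ADSL rule of one record per subject.

```r
# Hypothetical helper: returns a named list of findings instead of printing them
check_adam <- function(adsl, adae) {
  list(
    # ADSL should have exactly one record per subject
    duplicate_subjects = unique(adsl$USUBJID[duplicated(adsl$USUBJID)]),
    # ADAE subjects missing from ADSL break traceability
    orphan_subjects    = setdiff(adae$USUBJID, adsl$USUBJID),
    # Number of missing values per ADSL variable
    missing_counts     = vapply(adsl, function(x) sum(is.na(x)), integer(1))
  )
}

# Toy data with one deliberate problem of each kind
adsl <- data.frame(
  USUBJID = c("S-001", "S-002", "S-002"),
  AGE     = c(34, NA, 58),
  stringsAsFactors = FALSE
)
adae <- data.frame(USUBJID = c("S-001", "S-009"), stringsAsFactors = FALSE)

findings <- check_adam(adsl, adae)
print(findings)
```

Returning a list rather than printing makes it easy to fail a build when any finding is non-empty, for example with `stopifnot(length(findings$orphan_subjects) == 0)`.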

Key Takeaways

  1. ADaM Logic Translation:
     - Use dplyr for data manipulation (equivalent to SAS DATA steps).
     - Replace PROC SQL joins and MERGE with merge() or dplyr joins such as left_join().

  2. Metadata Matters:
     - Track variable labels, derivations, and dataset history.

  3. Validation is Critical:
     - Replicate SAS checks in R with functions like setdiff() and is.na().

  4. Regulatory Readiness:
     - Ensure datasets follow CDISC standards (e.g., variable naming, structure).

Next Steps

Practice converting a full ADaM dataset from SAS to R. Start with ADSL, then move to ADAE or ADLBC. Use real clinical data (anonymized) to build confidence!

In the next chapter, we’ll explore Statistical Outputs (Tables, Listings, and Graphs) in R.


This chapter equips you with foundational skills to create ADaM-style datasets in R. Keep experimenting, and soon the transition will feel seamless!