Dates and times are collected in SDTM as character values using the
extended ISO 8601
format, for example "2019-10-09T13:42:00". The format allows
some parts of the date or time to be missing, e.g., "2019-10"
if the day and the time are unknown.
The ADaM timing variables like ADTM (Analysis Datetime)
or ADY (Analysis Relative Day) are numeric variables. They
can be derived only if the date or datetime is complete. Therefore
{admiral} provides imputation functions which fill in
missing date or time parts according to certain imputation rules.
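To illustrate why completeness matters, here is a minimal base R sketch with made-up values. It is not {admiral} code; the names `dtc`, `trtsdt`, `adt`, and `ady` are illustrative only, and the relative-day formula assumes the usual convention that there is no day 0.

```r
# Hypothetical values for illustration only.
dtc <- c("2019-10-09", "2019-10")
trtsdt <- as.Date("2019-10-01")

# A partial date cannot be converted: the second element becomes NA.
adt <- as.Date(dtc, format = "%Y-%m-%d")

# Relative day: there is no day 0, so add 1 on or after the reference date.
diff_days <- as.numeric(adt - trtsdt)
ady <- ifelse(diff_days >= 0, diff_days + 1, diff_days)
ady
```

The complete date yields a relative day of 9, while the partial date yields NA for both the date and the relative day.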
In {admiral} we use only two functions, derive_vars_dt()
and derive_vars_dtm(), for date and datetime imputation
respectively. All other functions that accept dates as an
argument expect complete dates or datetimes (unless otherwise
specified). If partial dates are possible, these two functions
should be used as a first step to perform the required imputation.
The examples of this vignette require the following packages.
library(admiral)
library(lubridate)
library(tibble)
library(dplyr, warn.conflicts = FALSE)

In {admiral} users cannot pick an arbitrary single part of the date/time to impute. Imputation is only possible up to a highest level, i.e., you cannot choose to impute months but not days.
The simplest imputation rule is to set the missing parts to a fixed value. For example:
impute_dtc_dtm(
  "2019-10",
  highest_imputation = "M",
  date_imputation = "01-01",
  time_imputation = "00:00:00"
)
#> [1] "2019-10-01T00:00:00"

Sometimes this does not work as it would result in invalid dates, e.g.,
impute_dtc_dtm(
  "2019-02",
  highest_imputation = "M",
  date_imputation = "02-31",
  time_imputation = "00:00:00"
)
#> [1] "2019-02-31T00:00:00"

Therefore the keywords "first" or "last"
can be specified to request that missing parts are replaced by the first
or last possible value:
impute_dtc_dtm(
  "2019-02",
  highest_imputation = "M",
  date_imputation = "last",
  time_imputation = "00:00:00"
)
#> [1] "2019-02-28T00:00:00"

For dates, there is the additional option to use the keyword
"mid" to impute a missing day as 15 or a missing
day and month as 06-30. Note the different behaviour
below, depending on the preserve argument, when only the month
is missing:
dates <- c(
  "2019-02",
  "2019",
  "2019---01"
)
impute_dtc_dtm(
  dates,
  highest_imputation = "M",
  date_imputation = "mid",
  time_imputation = "00:00:00",
  preserve = FALSE
)
#> [1] "2019-02-15T00:00:00" "2019-06-30T00:00:00" "2019-06-30T00:00:00"
impute_dtc_dtm(
  dates,
  highest_imputation = "M",
  date_imputation = "mid",
  time_imputation = "00:00:00",
  preserve = TRUE
)
#> [1] "2019-02-15T00:00:00" "2019-06-30T00:00:00" "2019-06-01T00:00:00"

A similar result can be achieved by replacing any missing
part of the date with the fixed value 06-15, but note
the difference in days for cases when the month is
missing:
dates <- c(
  "2019-02",
  "2019",
  "2019---01"
)
impute_dtc_dtm(
  dates,
  highest_imputation = "M",
  date_imputation = "06-15",
  time_imputation = "00:00:00"
)
#> [1] "2019-02-15T00:00:00" "2019-06-15T00:00:00" "2019-06-15T00:00:00"

The imputation level, i.e., which components are imputed if they are
missing, is controlled by the highest_imputation argument.
All components up to the specified level are imputed.
dates <- c(
  "2019-02-03T12:30:15",
  "2019-02-03T12:30",
  "2019-02-03",
  "2019-02",
  "2019"
)
# Do not impute
impute_dtc_dtm(
  dates,
  highest_imputation = "n"
)
#> [1] "2019-02-03T12:30:15" NA                    NA                   
#> [4] NA                    NA
# Impute seconds only
impute_dtc_dtm(
  dates,
  highest_imputation = "s"
)
#> [1] "2019-02-03T12:30:15" "2019-02-03T12:30:00" NA                   
#> [4] NA                    NA
# Impute time (hours, minutes, seconds) only
impute_dtc_dtm(
  dates,
  highest_imputation = "h"
)
#> [1] "2019-02-03T12:30:15" "2019-02-03T12:30:00" "2019-02-03T00:00:00"
#> [4] NA                    NA
# Impute days and time
impute_dtc_dtm(
  dates,
  highest_imputation = "D"
)
#> [1] "2019-02-03T12:30:15" "2019-02-03T12:30:00" "2019-02-03T00:00:00"
#> [4] "2019-02-01T00:00:00" NA
# Impute date (months and days) and time
impute_dtc_dtm(
  dates,
  highest_imputation = "M"
)
#> [1] "2019-02-03T12:30:15" "2019-02-03T12:30:00" "2019-02-03T00:00:00"
#> [4] "2019-02-01T00:00:00" "2019-01-01T00:00:00"

For imputation of years (highest_imputation = "Y") see the
next section.
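The ordering of the imputation levels can be sketched in base R as follows. This is an illustration of the rule, not {admiral}'s implementation: `imputation_levels`, `highest_missing()`, and `can_impute()` are hypothetical names, and the sketch assumes truncated values only (no gaps such as "2019---01").

```r
# Ordered imputation levels, from "no imputation" up to "year".
imputation_levels <- c("n", "s", "m", "h", "D", "M", "Y")

# Hypothetical helper: highest missing component of a truncated dtc value,
# looked up by the length of the string.
highest_missing <- function(dtc) {
  lookup <- c(
    "19" = "n", "16" = "s", "13" = "m",
    "10" = "h", "7" = "D", "4" = "M"
  )
  unname(lookup[as.character(nchar(dtc))])
}

# A value can be completed only if its highest missing component
# does not exceed the requested level.
can_impute <- function(dtc, highest_imputation) {
  match(highest_missing(dtc), imputation_levels) <=
    match(highest_imputation, imputation_levels)
}

dates <- c(
  "2019-02-03T12:30:15",
  "2019-02-03T12:30",
  "2019-02-03",
  "2019-02",
  "2019"
)
can_impute(dates, "h")
#> [1]  TRUE  TRUE  TRUE FALSE FALSE
```

The FALSE entries correspond to the values for which the imputation function returns NA at that level.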
In some scenarios the imputed date should not be before or after
certain dates. For example, an imputed date after the data cut off date or
the death date is not desirable. The {admiral} imputation
functions provide the min_dates and max_dates
arguments to specify those dates. For example:
impute_dtc_dtm(
  "2019-02",
  highest_imputation = "M",
  date_imputation = "last",
  time_imputation = "last",
  max_dates = list(ymd("2019-01-14"), ymd("2019-02-25"))
)
#> [1] "2019-02-25T23:59:59"

It is ensured that the imputed date is not after any of the specified dates. Only dates which are in the range of possible dates of the dtc value are considered. The possible dates are defined by the missing parts of the dtc date, i.e., for “2019-02” the possible dates range from “2019-02-01” to “2019-02-28”. Thus “2019-01-14” is ignored. This ensures that the non-missing parts of the dtc date are not changed.
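The rule for the upper limit can be sketched in base R for the date part of this example. This is only an illustration of the rule as described here, not {admiral}'s implementation. The variable names are hypothetical and the range of possible dates is written out by hand.

```r
# Range of dates compatible with the partial value "2019-02".
possible_first <- as.Date("2019-02-01")
possible_last <- as.Date("2019-02-28")

max_dates <- as.Date(c("2019-01-14", "2019-02-25"))

# Keep only limits that lie inside the range of possible dates;
# "2019-01-14" is dropped here.
relevant <- max_dates[max_dates >= possible_first & max_dates <= possible_last]

# "last" imputation, capped by the relevant limits.
imputed <- min(c(possible_last, relevant))
imputed
#> [1] "2019-02-25"
```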
If the min_dates or max_dates argument is
specified, it is also possible to impute completely missing dates. For
date_imputation = "first" the min_dates
argument must be specified and for date_imputation = "last"
the max_dates argument. For other imputation rules imputing
the year is not possible.
# Impute year to first
impute_dtc_dtm(
  c("2019-02", NA),
  highest_imputation = "Y",
  min_dates = list(ymd("2019-01-14"), ymd("2019-02-25"))
)
#> [1] "2019-02-25T00:00:00" "2019-02-25T00:00:00"
# Impute year to last
impute_dtc_dtm(
  c("2019-02", NA),
  highest_imputation = "Y",
  date_imputation = "last",
  time_imputation = "last",
  max_dates = list(ymd("2019-01-14"), ymd("2019-02-25"))
)
#> [1] "2019-02-25T23:59:59" "2019-01-14T23:59:59"

ADaM requires that date or datetime variables for which imputation
was used are accompanied by date and/or time imputation flag variables
(*DTF and *TMF, e.g., ADTF and
ATMF for ADTM). These variables indicate the
highest level that was imputed, e.g., if minutes and seconds were
imputed, the imputation flag is set to "M". The
{admiral} functions which derive imputed variables also
add the corresponding imputation flag variables.
Note: The {admiral} datetime imputation function
provides the ignore_seconds_flag argument which can be set
to TRUE in cases where seconds were never collected. This
is due to the following from ADaM IG: For a given SDTM DTC variable, if
only hours and minutes are ever collected, and seconds are imputed in
*DTM as 00, then it is not necessary to set
*TMF to "S".
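The flag logic described above can be sketched in base R. This is an illustration, not {admiral}'s implementation: `date_flag()` and `time_flag()` are hypothetical helpers that assume truncated values only (no gaps such as "2019---01") and that every missing part was imputed.

```r
# Hypothetical helper: date imputation flag for a truncated dtc value.
date_flag <- function(dtc) {
  n <- nchar(dtc)
  ifelse(n >= 10, NA_character_, ifelse(n >= 7, "D", "M"))
}

# Hypothetical helper: time imputation flag for a truncated dtc value.
# With ignore_seconds_flag = TRUE an "S" flag is suppressed.
time_flag <- function(dtc, ignore_seconds_flag = FALSE) {
  n <- nchar(dtc)
  flag <- ifelse(
    n >= 19, NA_character_,
    ifelse(n >= 16, "S", ifelse(n >= 13, "M", "H"))
  )
  if (ignore_seconds_flag) {
    flag[flag %in% "S"] <- NA_character_
  }
  flag
}

dtc <- c("2019-08-09T12:34:56", "2019-04-12T12:34", "2019-04-12", "2010-09")
date_flag(dtc)
#> [1] NA  NA  NA  "D"
time_flag(dtc)
#> [1] NA  "S" "H" "H"
time_flag(dtc, ignore_seconds_flag = TRUE)
#> [1] NA  NA  "H" "H"
```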
{admiral} provides the following functions for
imputation:

- derive_vars_dt(): Adds a date variable and a date imputation flag variable (optional) based on a --DTC variable and imputation rules.
- derive_vars_dtm(): Adds a datetime variable, a date imputation flag variable, and a time imputation flag variable (both optional) based on a --DTC variable and imputation rules.
- impute_dtc_dtm(): Returns a complete ISO 8601 datetime or NA based on a partial ISO 8601 datetime and imputation rules.
- impute_dtc_dt(): Returns a complete ISO 8601 date (without time) or NA based on a partial ISO 8601 date(time) and imputation rules.
- convert_dtc_to_dt(): Returns a date if the input ISO 8601 date is complete. Otherwise, NA is returned.
- convert_dtc_to_dtm(): Returns a datetime if the input ISO 8601 date is complete (with missing time replaced by "00:00:00" as default). Otherwise, NA is returned.
- compute_dtf(): Returns the date imputation flag.
- compute_tmf(): Returns the time imputation flag.

The derive_vars_dtm() function derives an imputed
datetime variable and the corresponding date and time imputation flags.
The imputed date variable can then be derived by using the
derive_vars_dtm_to_dt() function. It is neither necessary nor
advisable to perform the imputation for the date variable if it was
already done for the datetime variable. CDISC considers the datetime and
the date variable as two representations of the same date. Thus the
imputation must be the same and the imputation flags are valid for both
the datetime and the date variable.
ae <- tribble(
  ~AESTDTC,
  "2019-08-09T12:34:56",
  "2019-04-12",
  "2010-09",
  NA_character_
) %>%
  derive_vars_dtm(
    dtc = AESTDTC,
    new_vars_prefix = "AST",
    highest_imputation = "M",
    date_imputation = "first",
    time_imputation = "first"
  ) %>%
  derive_vars_dtm_to_dt(vars(ASTDTM))

| AESTDTC | ASTDTM | ASTDTF | ASTTMF | ASTDT |
|---|---|---|---|---|
| 2019-08-09T12:34:56 | 2019-08-09 12:34:56 | NA | NA | 2019-08-09 | 
| 2019-04-12 | 2019-04-12 00:00:00 | NA | H | 2019-04-12 | 
| 2010-09 | 2010-09-01 00:00:00 | D | H | 2010-09-01 | 
| NA | NA | NA | NA | NA | 
If an imputed date variable without a corresponding datetime variable
is required, it can be derived by the derive_vars_dt()
function.
ae <- tribble(
  ~AESTDTC,
  "2019-08-09T12:34:56",
  "2019-04-12",
  "2010-09",
  NA_character_
) %>%
  derive_vars_dt(
    dtc = AESTDTC,
    new_vars_prefix = "AST",
    highest_imputation = "M",
    date_imputation = "first"
  )

| AESTDTC | ASTDT | ASTDTF |
|---|---|---|
| 2019-08-09T12:34:56 | 2019-08-09 | NA | 
| 2019-04-12 | 2019-04-12 | NA | 
| 2010-09 | 2010-09-01 | D | 
| NA | NA | NA | 
If the time should be imputed but not the date, the
highest_imputation argument should be set to
"h". This results in NA if the date is
partial. As no date is imputed the date imputation flag is not
created.
ae <- tribble(
  ~AESTDTC,
  "2019-08-09T12:34:56",
  "2019-04-12",
  "2010-09",
  NA_character_
) %>%
  derive_vars_dtm(
    dtc = AESTDTC,
    new_vars_prefix = "AST",
    highest_imputation = "h",
    time_imputation = "first"
  )

| AESTDTC | ASTDTM | ASTTMF |
|---|---|---|
| 2019-08-09T12:34:56 | 2019-08-09 12:34:56 | NA | 
| 2019-04-12 | 2019-04-12 00:00:00 | H | 
| 2010-09 | NA | NA | 
| NA | NA | NA | 
Usually the adverse event start date is imputed as the earliest date
of all possible dates when filling the missing parts. The result may be
a date before the treatment start date. This is not desirable because the
adverse event would not be considered treatment emergent and would be excluded
from the adverse event summaries. This can be avoided by specifying the
treatment start date variable (TRTSDTM) for the
min_dates argument.
Please note that TRTSDTM is used as the imputed date only if
the non-missing date and time parts of AESTDTC coincide
with those of TRTSDTM. Therefore 2019-10 is
not imputed as 2019-11-11 12:34:56. This ensures that
collected information is not changed by the imputation.
ae <- tribble(
  ~AESTDTC,              ~TRTSDTM,
  "2019-08-09T12:34:56", ymd_hms("2019-11-11T12:34:56"),
  "2019-10",             ymd_hms("2019-11-11T12:34:56"),
  "2019-11",             ymd_hms("2019-11-11T12:34:56"),
  "2019-12-04",          ymd_hms("2019-11-11T12:34:56")
) %>%
  derive_vars_dtm(
    dtc = AESTDTC,
    new_vars_prefix = "AST",
    highest_imputation = "M",
    date_imputation = "first",
    time_imputation = "first",
    min_dates = vars(TRTSDTM)
  )

| AESTDTC | TRTSDTM | ASTDTM | ASTDTF | ASTTMF |
|---|---|---|---|---|
| 2019-08-09T12:34:56 | 2019-11-11 12:34:56 | 2019-08-09 12:34:56 | NA | NA | 
| 2019-10 | 2019-11-11 12:34:56 | 2019-10-01 00:00:00 | D | H | 
| 2019-11 | 2019-11-11 12:34:56 | 2019-11-11 12:34:56 | D | H | 
| 2019-12-04 | 2019-11-11 12:34:56 | 2019-12-04 00:00:00 | NA | H | 
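The condition that TRTSDTM is only used when the collected parts coincide can be sketched in base R as a simple prefix check. This illustrates the idea only and is not how {admiral} implements it. The variable names are hypothetical, and the check assumes truncated AESTDTC values.

```r
# Treatment start datetime, formatted like an ISO 8601 --DTC value.
trtsdtm <- as.POSIXct("2019-11-11 12:34:56", tz = "UTC")
trtsdtm_chr <- format(trtsdtm, "%Y-%m-%dT%H:%M:%S")

aestdtc <- c("2019-10", "2019-11", "2019-12-04")

# The collected parts coincide with TRTSDTM if the partial value
# is a prefix of the formatted treatment start datetime.
coincides <- startsWith(trtsdtm_chr, aestdtc)
coincides
#> [1] FALSE  TRUE FALSE
```

Only "2019-11" is compatible with the treatment start datetime, which matches the table above, where only that row is imputed as 2019-11-11 12:34:56.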
If a date is imputed as the latest date of all possible dates when
filling the missing parts, it should not result in dates after data cut
off or death. This can be achieved by specifying the dates for the
max_dates argument.
Please note that non-missing date parts are not changed. Thus
2019-12-04 is imputed as 2019-12-04 23:59:59
although it is after the data cut off date. It may make sense to replace
it by the data cut off date but this is not part of the imputation. It
should be done in a separate data cleaning or data cut off step.
ae <- tribble(
  ~AEENDTC,              ~DTHDT,            ~DCUTDT,
  "2019-08-09T12:34:56", ymd("2019-11-11"), ymd("2019-12-02"),
  "2019-11",             ymd("2019-11-11"), ymd("2019-12-02"),
  "2019-12",             NA,                ymd("2019-12-02"),
  "2019-12-04",          NA,                ymd("2019-12-02")
) %>%
  derive_vars_dtm(
    dtc = AEENDTC,
    new_vars_prefix = "AEN",
    highest_imputation = "M",
    date_imputation = "last",
    time_imputation = "last",
    max_dates = vars(DTHDT, DCUTDT)
  )

| AEENDTC | DTHDT | DCUTDT | AENDTM | AENDTF | AENTMF |
|---|---|---|---|---|---|
| 2019-08-09T12:34:56 | 2019-11-11 | 2019-12-02 | 2019-08-09 12:34:56 | NA | NA | 
| 2019-11 | 2019-11-11 | 2019-12-02 | 2019-11-11 23:59:59 | D | H | 
| 2019-12 | NA | 2019-12-02 | 2019-12-02 23:59:59 | D | H | 
| 2019-12-04 | NA | 2019-12-02 | 2019-12-04 23:59:59 | NA | H | 
If imputation is required without creating a new variable, the
convert_dtc_to_dt() function can be called to obtain a
vector of imputed dates. It can be used, for example, in conditions:
mh <- tribble(
  ~MHSTDTC,     ~TRTSDT,
  "2019-04",    ymd("2019-04-15"),
  "2019-04-01", ymd("2019-04-15"),
  "2019-05",    ymd("2019-04-15"),
  "2019-06-21", ymd("2019-04-15")
) %>%
  filter(
    convert_dtc_to_dt(
      MHSTDTC,
      highest_imputation = "M",
      date_imputation = "first"
    ) < TRTSDT
  )

| MHSTDTC | TRTSDT |
|---|---|
| 2019-04 | 2019-04-15 | 
| 2019-04-01 | 2019-04-15 | 
Using different imputation rules depending on the observation can be
done by using slice_derivation().
vs <- tribble(
  ~VSDTC,                ~VSTPT,
  "2019-08-09T12:34:56", NA,
  "2019-10-12",          "PRE-DOSE",
  "2019-11-10",          NA,
  "2019-12-04",          NA
) %>%
  slice_derivation(
    derivation = derive_vars_dtm,
    args = params(
      dtc = VSDTC,
      new_vars_prefix = "A"
    ),
    derivation_slice(
      filter = VSTPT == "PRE-DOSE",
      args = params(time_imputation = "first")
    ),
    derivation_slice(
      filter = TRUE,
      args = params(time_imputation = "last")
    )
  )

| VSDTC | VSTPT | ADTM | ATMF |
|---|---|---|---|
| 2019-08-09T12:34:56 | NA | 2019-08-09 12:34:56 | NA | 
| 2019-11-10 | NA | 2019-11-10 23:59:59 | H | 
| 2019-12-04 | NA | 2019-12-04 23:59:59 | H | 
| 2019-10-12 | PRE-DOSE | 2019-10-12 00:00:00 | H |