Type: Package
Title: Open Source Diabetes Classifier for Danish Registers
Version: 0.9.17
Description: The algorithm first identifies a population of individuals from Danish register data with any type of diabetes as individuals with two or more inclusion events. Then, it splits this population into individuals with either type 1 diabetes or type 2 diabetes by identifying individuals with type 1 diabetes and classifying the remainder of the diabetes population as having type 2 diabetes.
License: MIT + file LICENSE
URL: https://github.com/steno-aarhus/osdc, https://steno-aarhus.github.io/osdc/
BugReports: https://github.com/steno-aarhus/osdc/issues
Depends: R (≥ 4.2.0)
Imports: checkmate, cli, codeCollection, dbplyr, dplyr, duckplyr, fabricatr, lifecycle, lubridate, purrr, rlang, rvest, stats, tidyselect, utils
Suggests: glue, knitr, quarto, rmarkdown, spelling, stringr, testthat (≥ 3.0.0), tidyr, tibble
VignetteBuilder: quarto
Config/testthat/edition: 3
Encoding: UTF-8
Language: en-US
RoxygenNote: 7.3.3
NeedsCompilation: no
Packaged: 2025-12-04 21:14:55 UTC; luke
Author: Signe Kirk Brødbæk [aut], Anders Aasted Isaksen [aut], Luke William Johnston [aut, cre], Steno Diabetes Center Aarhus [cph], Aarhus University [cph]
Maintainer: Luke William Johnston <lwjohnst@gmail.com>
Repository: CRAN
Date/Publication: 2025-12-10 21:20:02 UTC

osdc: Open Source Diabetes Classifier for Danish Registers

Description

The algorithm first identifies a population of individuals from Danish register data with any type of diabetes as individuals with two or more inclusion events. Then, it splits this population into individuals with either type 1 diabetes or type 2 diabetes by identifying individuals with type 1 diabetes and classifying the remainder of the diabetes population as having type 2 diabetes.

Author(s)

Maintainer: Luke William Johnston lwjohnst@gmail.com (ORCID)

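Authors:

Signe Kirk Brødbæk
Anders Aasted Isaksen

Other contributors:

Steno Diabetes Center Aarhus [copyright holder]
Aarhus University [copyright holder]

See Also

Useful links:

https://github.com/steno-aarhus/osdc
https://steno-aarhus.github.io/osdc/
Report bugs at https://github.com/steno-aarhus/osdc/issues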


Add columns for information about insulin drug purchases

Description

Add columns for information about insulin drug purchases

Usage

add_insulin_purchases_cols(gld_hba1c_after_drop_steps)

Arguments

gld_hba1c_after_drop_steps

The GLD and HbA1c data after drop steps

Value

The same type as the input data, as a duckplyr::duckdb_tibble(). Three new columns are added:

See Also

See the vignette("algorithm") for the logic used to filter these patients.


Add columns related to type 1 diabetes diagnoses

Description

This function evaluates whether an individual has a majority of type 1 diabetes-specific hospital diagnoses (DE10) among all type-specific diabetes primary diagnoses (DE10 & DE11) from endocrinology departments. If an individual doesn't have any type-specific diabetes diagnoses from endocrinology departments, the majority is determined by diagnoses from medical departments.

It also adds a column indicating whether an individual has at least one primary diagnosis related to type 1 diabetes.

This output is passed to the join_inclusions() function, where the dates variable is used for the final step of the inclusion process. The variable indicating whether the majority of diagnoses are for type 1 diabetes is used later to classify type 1 diabetes.
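The majority rule described above can be illustrated with a small base R sketch. It is not the package's implementation: the input columns (pnr, code, department), the department labels, and the reading of "majority" as strictly more DE10 than DE11 diagnoses are all assumptions made for the example.

```r
# Hypothetical input: one row per type-specific primary diabetes diagnosis.
diagnoses <- data.frame(
  pnr = c("001", "001", "001", "002", "002", "003"),
  code = c("DE10", "DE10", "DE11", "DE11", "DE11", "DE10"),
  department = c(
    "endocrinology", "endocrinology", "endocrinology",
    "medical", "medical", "medical"
  ),
  stringsAsFactors = FALSE
)

# Assumption: "majority" means strictly more DE10 than DE11 diagnoses.
has_t1d_majority <- function(codes) {
  sum(codes == "DE10") > sum(codes == "DE11")
}

majority_by_person <- sapply(split(diagnoses, diagnoses$pnr), function(person) {
  endo <- person$code[person$department == "endocrinology"]
  # Fall back to medical departments only when the individual has no
  # type-specific diagnoses from endocrinology departments.
  if (length(endo) > 0) {
    has_t1d_majority(endo)
  } else {
    has_t1d_majority(person$code[person$department == "medical"])
  }
})

majority_by_person
```

Individual 002 has no endocrinology diagnoses, so only that person is evaluated on diagnoses from medical departments.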

Usage

add_t1d_diagnoses_cols(data)

Arguments

data

Data from keep_diabetes_diagnoses() function.

Value

The same type as the input data, as a duckplyr::duckdb_tibble(), with the following added columns and up to two rows per individual:

See Also

See the vignette("algorithm") for the logic used to filter these patients.


A list of the algorithmic logic underlying osdc.

Description

This nested list contains the logic details of the algorithm.

Usage

algorithm()

Format

A list with nested lists that have these named elements:

register

Optional. The register used for this logic

title

The title to use when displaying the logic in tables.

logic

The logic itself.

comments

Some additional comments on the logic.

Value

A nested list with the algorithmic logic. Contains fields register, title, logic, and comments.

See Also

See the vignette("algorithm") for the logic used to filter these patients.

Examples

algorithm()$is_hba1c_over_threshold
algorithm()$is_gld_code$logic

Translate to SQL for datetime conversion, and eventually to dates

Description

DuckDB doesn't support using lubridate::as_date(), so this uses dbplyr::sql() to directly use DuckDB's strptime to convert strings to datetimes. Afterwards, it can be converted to dates.

Usage

as_sql_datetime(x)

Arguments

x

A character (or date) column, in quotes.

Value

A Datetime column.


Check data types of the register variables

Description

Check data types of the register variables

Usage

check_data_types(data, register, call = rlang::caller_env())

Value

Outputs the register with only the required variables, and with column names in lower case.


Classify diabetes status using Danish registers.

Description

This function requires that each source of register data is represented as a single DuckDB object in R (e.g. a connection to Parquet files). Each DuckDB object must contain a single table covering all years of that data source, or at least the years you have and are interested in.

Usage

classify_diabetes(
  kontakter,
  diagnoser,
  lpr_diag,
  lpr_adm,
  sysi,
  sssy,
  lab_forsker,
  bef,
  lmdb,
  stable_inclusion_start_date = "1998-01-01"
)

Arguments

kontakter

The contacts information table from the LPR3 patient register

diagnoser

The diagnoses information table from the LPR3 patient register

lpr_diag

The diagnoses information table from the LPR2 patient register

lpr_adm

The administrative information table from the LPR2 patient register

sysi

The SYSI table from the health service register

sssy

The SSSY table from the health service register

lab_forsker

The register for laboratory results for research

bef

The BEF table from the civil register

lmdb

The LMDB table from the prescription register

stable_inclusion_start_date

Cutoff date after which inclusion events are considered true incident diabetes cases. Defaults to "1998-01-01", which is one year after the data on pregnancy events from the Patient Register are considered valid for dropping gestational diabetes-related purchases of glucose-lowering drugs. This default assumes that the user is using LPR and LMDB data from at least Jan 1 1997 onward. If the user only has access to LPR and LMDB data from a later date, this parameter should be set to one year after the beginning of the user's data coverage.
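For users whose data starts later than 1997, the cutoff can be derived from the start of their data coverage. The following is a minimal base R sketch; the coverage start date is a made-up example value.

```r
# Hypothetical first date covered by the user's LPR and LMDB extracts.
data_coverage_start <- as.Date("2005-01-01")

# Add one calendar year; seq() with by = "1 year" respects leap years.
stable_start <- seq(data_coverage_start, by = "1 year", length.out = 2)[2]

# Character form, matching the format of the default "1998-01-01".
stable_start_chr <- format(stable_start, "%Y-%m-%d")
stable_start_chr
```

The resulting string could then be supplied as stable_inclusion_start_date when calling classify_diabetes().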

Value

The same object type as the input data, which would be a duckplyr::duckdb_tibble() type object.

See Also

See the osdc vignette for a detailed description of the internal implementation of this classification function.

Examples

# Can't run this multiple times, will cause an error as the table
# has already been created in the DuckDB connection.
register_data <- registers() |>
  names() |>
  simulate_registers() |>
  purrr::map(duckplyr::as_duckdb_tibble) |>
  purrr::map(duckplyr::as_tbl)

classify_diabetes(
  kontakter = register_data$kontakter,
  diagnoser = register_data$diagnoser,
  lpr_diag = register_data$lpr_diag,
  lpr_adm = register_data$lpr_adm,
  sysi = register_data$sysi,
  sssy = register_data$sssy,
  lab_forsker = register_data$lab_forsker,
  bef = register_data$bef,
  lmdb = register_data$lmdb
)

After filtering, classify those with type 1 diabetes.

Description

After filtering, classify those with type 1 diabetes.

Usage

classify_t1d(data)

Arguments

data

Joined data output from the filtering steps.

Value

The same object type as the input data, which would be a duckplyr::duckdb_tibble() type object.


Convert column names to lower case

Description

Convert column names to lower case

Usage

column_names_to_lower(data)

Arguments

data

A data frame type object.

Value

The same object type given.


Create a vector with random ATC codes

Description

Anatomical Therapeutic Chemical (ATC) codes are unique medicine codes based on which organ or system a medicine works on and how it works.

Usage

create_fake_atc(n)

Arguments

n

The number of random ATC codes to generate.

Value

A character vector of ATC codes.


Create fake dates

Description

Create fake dates

Usage

create_fake_date(n, from = "1977-01-01", to = lubridate::today())

Arguments

n

The number of dates to generate.

from

A date determining the first date in the interval to sample from.

to

A date determining the last date in the interval to sample from.

Value

A vector of dates.
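One way to sample dates from an interval, shown here as a base R sketch rather than the package's own code (the function name create_fake_date_sketch is hypothetical):

```r
create_fake_date_sketch <- function(n, from = "1977-01-01", to = Sys.Date()) {
  # Every day in the closed interval [from, to].
  days <- seq(as.Date(from), as.Date(to), by = "day")
  # Sample with replacement so that n may exceed the number of days.
  sample(days, size = n, replace = TRUE)
}

set.seed(1)
fake_dates <- create_fake_date_sketch(5, from = "2000-01-01", to = "2000-12-31")
fake_dates
```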


Create a vector of random department specialties

Description

Create a vector of random department specialties

Usage

create_fake_hovedspeciale_ans(n)

Arguments

n

The number of department specialties to create.

Value

A character vector.


Create a vector with random ICD-8 or -10 diagnoses

Description

Create a vector with random ICD-8 or -10 diagnoses

Usage

create_fake_icd(n, date = NULL)

Arguments

n

The number of ICD-8 or -10 diagnoses to generate.

date

A date determining whether the diagnoses should be ICD-8 or ICD-10. If NULL, a random date will be sampled to determine which ICD revision the diagnosis should be from. In the Danish registers, ICD-10 is used after 1994.

Value

A character vector of ICD-8 or ICD-10 diagnoses.


Create a vector of random ICD-10 diagnoses

Description

ICD-10 is the 10th revision of the International Classification of Diseases.

Usage

create_fake_icd10(n)

Arguments

n

An integer determining how many diagnoses to create.

Value

A character vector of ICD-10 diagnoses.

Source

The stored CSV is downloaded from the Danish Health Data Authority's website at medinfo.dk.


Create a vector of random ICD-8 diagnoses

Description

ICD-8 is the 8th revision of the International Classification of Diseases.

Usage

create_fake_icd8(n)

Arguments

n

The number of ICD-8 diagnoses to generate.

Value

A character vector of ICD-8 diagnoses.


Create a vector of random NPU codes

Description

Nomenclature for Properties and Units (NPU) codes identify laboratory results.

Usage

create_fake_npu(n)

Arguments

n

The number of NPUs to create.

Value

A character vector.


Create inclusion dates from all the inclusion events

Description

This function takes the output from join_inclusions() and defines the final inclusion dates (raw and stable) based on all inclusion event types. Since inclusion requires at least two events (multiple events of the same type or any combination of different types), this function keeps only individuals with two or more events. For example, an individual with two elevated HbA1c tests followed by a glucose-lowering drug purchase is included at the date of the second elevated HbA1c test. Had the second HbA1c test not been performed (or had it returned a result below the diagnostic threshold), this person would instead have been included at the date of the first purchase of glucose-lowering drugs.
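The "second event" rule can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: the column names raw_inclusion_date and stable_inclusion_date are hypothetical, and setting the stable date to NA when the raw date falls before the cutoff is one possible reading of the documentation.

```r
inclusion_events <- data.frame(
  pnr = c("001", "001", "001", "002", "003", "003"),
  date = as.Date(c(
    "2015-02-01", "2015-05-01", "2015-09-01", # two HbA1c tests, then a purchase
    "2016-01-01",                             # a single event: not included
    "1996-03-01", "1997-06-01"                # second event before the cutoff
  )),
  stringsAsFactors = FALSE
)

# Number the events within each individual in date order.
ev <- inclusion_events[order(inclusion_events$pnr, inclusion_events$date), ]
ev$event_number <- ave(seq_len(nrow(ev)), ev$pnr, FUN = seq_along)

# The second event defines the inclusion date; individuals with fewer
# than two events have no such row and are dropped.
result <- ev[ev$event_number == 2, c("pnr", "date")]
names(result)[2] <- "raw_inclusion_date"

# Assumption: dates before the cutoff are not considered stable.
cutoff <- as.Date("1998-01-01")
result$stable_inclusion_date <- result$raw_inclusion_date
result$stable_inclusion_date[result$raw_inclusion_date < cutoff] <- NA

result
```

Individual 001 is included at the second elevated HbA1c test, as in the example in the description, while individual 002 is dropped for having a single event.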

Usage

create_inclusion_dates(inclusions, stable_inclusion_start_date = "1998-01-01")

Arguments

inclusions

Output from join_inclusions().

stable_inclusion_start_date

Cutoff date after which inclusion events are considered reliable (e.g., after changes in drug labeling or data entry practices). Defaults to "1998-01-01" which is one year after obstetric codes are reliable in the GLD data (since we use LPR data to drop rows related to gestational diabetes). This limits the included cohort to individuals with inclusion dates after this cutoff date.

Value

The same type as the input data, as a duckplyr::duckdb_tibble(), with the pnr and date columns along with the columns from the input that are needed to classify T1D. It also creates two new columns:


Create a vector of reproducible, random zero-padded integers.

Description

For a given number of generated integers of the same length, the output will always be identical. This makes it easier to join by values that represent people, e.g. in pnr, cpr, recnum and dw_ek_kontakt.

Usage

create_padded_integer(n, length)

Arguments

n

The number of integer strings to generate.

length

The length of the padded integer strings.

Value

A character vector of integers.
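Reproducibility of this kind is typically achieved by fixing the random seed. A minimal base R sketch, assuming a fixed seed is used (the seed argument and the function name are hypothetical, and the package may do this differently):

```r
create_padded_integer_sketch <- function(n, length, seed = 123) {
  # Fixing the seed makes repeated calls return identical values.
  # Note that this resets the global random number generator state.
  set.seed(seed)
  # Draw each digit separately so that leading zeros can occur.
  digits <- matrix(sample(0:9, n * length, replace = TRUE), nrow = n)
  apply(digits, 1, paste0, collapse = "")
}

first_call <- create_padded_integer_sketch(3, length = 6)
second_call <- create_padded_integer_sketch(3, length = 6)
identical(first_call, second_call)
```

Because the same identifiers are generated each time, simulated tables created separately can still be joined on them.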


Simulate data based on simulation definitions

Description

Simulate data based on simulation definitions

Usage

create_simulated_data(data, n)

Arguments

data

A tibble with simulation definitions.

n

Number of observations to simulate.

Value

A tibble with simulated data.


Drop rows with metformin purchases for the treatment of PCOS

Description

Takes the output from keep_gld_purchases() and bef (information on sex and date of birth) to drop rows with metformin purchases that are potentially for the treatment of polycystic ovary syndrome. This function only performs a filtering operation, so it outputs the same structure and variables as the input from keep_gld_purchases(), except for the addition of a logical helper variable no_pcos that is used in later functions. After these rows have been dropped, the output is used by drop_pregnancies().

Usage

drop_pcos(gld_purchases, bef)

Arguments

gld_purchases

The output from keep_gld_purchases().

bef

The bef register.

Value

The same type as the input data, as a duckplyr::duckdb_tibble(). It also has the same columns as keep_gld_purchases(), except for a logical helper variable no_pcos that is used in later functions.

See Also

See the vignette("algorithm") for the logic used to filter these patients.


Drop pregnancy events that could be gestational diabetes

Description

This function takes the combined outputs from keep_pregnancy_dates(), keep_hba1c(), and drop_pcos() and uses diagnoses from LPR2 or LPR3 to drop both elevated HbA1c tests and GLD purchases during pregnancy, as these may be due to gestational diabetes, rather than type 1 or type 2 diabetes. The aim is to identify pregnancies based on diagnosis codes specific to pregnancy-ending events (e.g. live births or miscarriages), and then use the dates of these events to remove inclusion events in the preceding months that may be related to gestational diabetes (e.g. elevated HbA1c tests or purchases of glucose-lowering drugs during pregnancy).

After these drop functions have been applied, the output serves as input to the add_insulin_purchases_cols() function.
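The idea of removing events in the months before a pregnancy-ending event can be sketched in base R. The window length (window_days), the column name pregnancy_event_date, and the example data are hypothetical and chosen only for illustration; see the algorithm vignette for the actual rules.

```r
events <- data.frame(
  pnr = c("001", "001", "002"),
  date = as.Date(c("2010-03-01", "2012-01-15", "2010-03-01")),
  stringsAsFactors = FALSE
)
pregnancy_dates <- data.frame(
  pnr = "001",
  pregnancy_event_date = as.Date("2010-06-01"),
  stringsAsFactors = FALSE
)

# Hypothetical look-back window before a pregnancy-ending event.
window_days <- 280

# Flag events that fall within the window before any pregnancy-ending
# event for the same individual.
is_during_pregnancy <- vapply(seq_len(nrow(events)), function(i) {
  ends <- pregnancy_dates$pregnancy_event_date[
    pregnancy_dates$pnr == events$pnr[i]
  ]
  days_before <- as.numeric(ends - events$date[i])
  any(days_before >= 0 & days_before <= window_days)
}, logical(1))

events$no_pregnancy <- !is_during_pregnancy
kept <- events[events$no_pregnancy, ]
kept
```

Individuals without any pregnancy-ending events, such as 002 here, keep all of their events.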

Usage

drop_pregnancies(dropped_pcos, pregnancy_dates, included_hba1c)

Arguments

dropped_pcos

Output from drop_pcos().

pregnancy_dates

Output from keep_pregnancy_dates().

included_hba1c

Output from keep_hba1c().

Value

The same type as the input data, as a duckplyr::duckdb_tibble(). It has the same data as the input from drop_pcos(), plus a logical helper variable no_pregnancy that is used in later functions.

See Also

See the vignette("algorithm") for the logic used to filter these patients.


Create a synthetic dataset of edge case inputs

Description

This function generates a list of tibbles representing the Danish health registers and the data necessary to run the algorithm. The dataset contains 23 individual cases (pnrs), each designed to test a specific logical branch of the diabetes classification algorithm, including inclusion, exclusion, censoring, and type classification rules.

The generated data is used in testthat tests to ensure the algorithm behaves as expected under a wide range of conditions, but it is also intended to be explored by users to better understand how the algorithm logic works.

Usage

edge_cases()

Value

A named list of 9 tibble::tibble() objects, each representing a different health register: bef, lmdb, lpr_adm, lpr_diag, kontakter, diagnoser, sysi, sssy, and lab_forsker.

Examples

edge_cases()

Convert all factor variables to character variables.

Description

Convert all factor variables to character variables.

Usage

fct_to_chr(data)

Arguments

data

A tibble or data frame.

Value

A duckplyr::duckdb_tibble().
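The conversion itself can be sketched in base R on an ordinary data frame. Unlike the documented function, this hypothetical fct_to_chr_sketch() returns a plain data frame rather than a duckplyr::duckdb_tibble().

```r
fct_to_chr_sketch <- function(data) {
  # Find the factor columns and convert only those.
  is_fct <- vapply(data, is.factor, logical(1))
  data[is_fct] <- lapply(data[is_fct], as.character)
  data
}

example <- data.frame(
  sex = factor(c("F", "M")),
  age = c(34L, 51L)
)
converted <- fct_to_chr_sketch(example)
str(converted)
```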


Get the algorithmic logic and convert to an R logic condition.

Description

Get the algorithmic logic and convert to an R logic condition.

Usage

get_algorithm_logic(logic_name, algorithm = NULL)

Arguments

logic_name

The name of the logic to use.

algorithm

The list of algorithmic logic, with one nested list for each logic.

Value

A character string.


Get a list of the registers' abbreviations.

Description

Get a list of the registers' abbreviations.

Usage

get_register_abbrev()

Value

A character string.


Get a list of required variables from a specific register.

Description

Get a list of required variables from a specific register.

Usage

get_required_variables(register)

Arguments

register

The abbreviation of the register name. See list of abbreviations in get_register_abbrev().

Value

A character vector of variable names.


Insert additional analysis codes for HbA1c

Description

Insert additional analysis codes for HbA1c

Usage

insert_analysiscode(data, proportion = 0.3)

Arguments

data

A tibble.

proportion

Proportion to re-sample. Defaults to 0.3.

Value

A tibble. If a column is named analysiscode, a proportion of the values are replaced by codes for HbA1c.


Insert cases where metformin is used for other purposes than diabetes

Description

This function uses the variable indo which is the code for the underlying condition treated by the prescribed medication.

Usage

insert_false_metformin(data, proportion = 0.05)

Arguments

data

A tibble.

proportion

Proportion to re-sample. Defaults to 0.05.

Value

A tibble. If the tibble has a column named atc, a proportion of observations is re-sampled as metformin.


Insert specific ATC codes based on a proportion

Description

Insert specific ATC codes based on a proportion

Usage

insert_specific_atc(data, proportion = 0.3)

Arguments

data

A tibble.

proportion

Proportion to be resampled. Defaults to 0.3.

Value

A tibble with a proportion of resampled ATC codes for columns named atc.


Generate logic based on a probability

Description

Generate logic based on a probability

Usage

insertion_rate(proportion)

Arguments

proportion

A double between 0 and 1.

Value

A logical vector. TRUE if the random number is less than the proportion, otherwise FALSE.
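The comparison described above amounts to a single uniform draw. A minimal base R sketch (the function name is hypothetical and the input check is an addition for the example):

```r
insertion_rate_sketch <- function(proportion) {
  stopifnot(is.numeric(proportion), proportion >= 0, proportion <= 1)
  # One uniform draw between 0 and 1; TRUE with probability `proportion`.
  stats::runif(1) < proportion
}

set.seed(42)
draws <- replicate(10000, insertion_rate_sketch(0.3))
# The share of TRUE values should be close to the requested proportion.
mean(draws)
```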


Join kept inclusion events.

Description

This function joins the outputs from all the filtering functions, by pnr and dates. Input datasets:

Usage

join_inclusions(
  diabetes_diagnoses,
  podiatrist_services,
  gld_hba1c_after_drop_steps
)

Arguments

diabetes_diagnoses

Output from keep_diabetes_diagnoses().

podiatrist_services

Output from keep_podiatrist_services().

gld_hba1c_after_drop_steps

Output from drop_pregnancies() and drop_pcos().

Value

The same type as the input data, as a duckplyr::duckdb_tibble(), with the joined columns from the output of keep_diabetes_diagnoses(), keep_podiatrist_services(), drop_pcos(), and drop_pregnancies(). There will be 1-8 rows per pnr.

See Also

See the vignette("algorithm") for the logic used to filter these patients.


Keep rows with diabetes diagnoses.

Description

This function uses the hospital contacts from LPR2 and LPR3 to include all dates of diabetes diagnoses to use for inclusion, as well as additional information needed to classify diabetes type. Diabetes diagnoses from both ICD-8 and ICD-10 are included.

Usage

keep_diabetes_diagnoses(lpr2, lpr3)

Arguments

lpr2

The output from prepare_lpr2().

lpr3

The output from prepare_lpr3().

Value

The same type as the input data, as a duckplyr::duckdb_tibble(), with fewer rows after filtering.

See Also

See the vignette("algorithm") for the logic used to filter these patients.


Keep rows with purchases of glucose-lowering drugs (GLD)

Description

This function drops glucose-lowering drugs that may be used for conditions other than diabetes, such as GLP-1 RAs or dapagliflozin/empagliflozin. Since the diagnosis code data on pregnancies (see below) is insufficient to perform censoring prior to 1997, keep_gld_purchases() only extracts dates from 1997 onward by default (if Medical Birth Register data is available to use for censoring, the extraction window can be extended).

Usage

keep_gld_purchases(lmdb)

Arguments

lmdb

The lmdb register.

Value

The same type as the input data, as a duckplyr::duckdb_tibble(). Only rows with glucose-lowering drug purchases are kept, and some columns are renamed.

See Also

See the vignette("algorithm") for the logic used to filter these patients.


Keep rows with HbA1c above the required threshold.

Description

In the lab_forsker register, NPU27300 is HbA1c in the modern units (IFCC) while NPU03835 is HbA1c in old units (DCCT). Multiple elevated results on the same day within each individual are deduplicated, to account for the same test result often being reported twice (once in IFCC units and once in DCCT units).

Usage

keep_hba1c(lab_forsker)

Arguments

lab_forsker

The lab_forsker register.

Details

The output is passed to the drop_pregnancies() function for filtering of elevated results due to potential gestational diabetes (see below).
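The thresholding and same-day deduplication can be sketched in base R. The thresholds used here (48 mmol/mol for IFCC and 6.5% for DCCT) are assumptions for illustration and are not stated in this entry; algorithm()$is_hba1c_over_threshold holds the logic the package actually uses. The value column name is also hypothetical.

```r
lab <- data.frame(
  pnr = c("001", "001", "001", "002"),
  date = as.Date(c("2020-03-01", "2020-03-01", "2020-06-01", "2020-03-01")),
  analysiscode = c("NPU27300", "NPU03835", "NPU27300", "NPU27300"),
  value = c(52, 6.9, 45, 47),
  stringsAsFactors = FALSE
)

# Assumed thresholds: 48 mmol/mol (IFCC units) and 6.5% (DCCT units).
is_elevated <- (lab$analysiscode == "NPU27300" & lab$value >= 48) |
  (lab$analysiscode == "NPU03835" & lab$value >= 6.5)

elevated <- lab[is_elevated, c("pnr", "date")]

# The same test reported in both units gives two rows on the same day;
# keep a single row per individual per day.
elevated <- elevated[!duplicated(elevated), ]
elevated
```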

Value

An object of the same input type, as a duckplyr::duckdb_tibble(), with three columns:


Keep rows with diabetes-specific podiatrist services.

Description

This function uses the sysi or sssy registers as input to extract the dates of all diabetes-specific podiatrist services. Removes duplicate services on the same date.

The output is passed to the join_inclusions() function for the final step of the inclusion process.

Usage

keep_podiatrist_services(sysi, sssy)

Arguments

sysi

The SYSI register.

sssy

The SSSY register.

Value

The same type as the input data, as a duckplyr::duckdb_tibble().

See Also

See the vignette("algorithm") for the logic used to filter these patients.


Simple function to get only the pregnancy event dates.

Description

Simple function to get only the pregnancy event dates.

Usage

keep_pregnancy_dates(lpr2, lpr3)

Arguments

lpr2

Output from prepare_lpr2().

lpr3

Output from prepare_lpr3().

Value

The same type as the input data, as a duckplyr::duckdb_tibble().

See Also

See the vignette("algorithm") for the logic used to filter these patients.


Keep two earliest events per PNR

Description

Since the classification date is based on the second instance of an inclusion criterion, we need to keep the earliest two events per PNR per inclusion "stream".

This function is applied to each "stream", diabetes_diagnoses, podiatrist_services, and gld_hba1c_after_drop_steps, in the classify_diabetes() function after the keep and drop steps, right before they are joined.

Usage

keep_two_earliest_events(data)

Arguments

data

Data including at least a date and pnr column.

Value

The same type as the input data.
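Keeping the two earliest rows per individual can be sketched in base R as below. The function name is hypothetical, and the package itself works on DuckDB-backed tables rather than plain data frames.

```r
keep_two_earliest_events_sketch <- function(data) {
  # Sort so that the earliest events come first within each pnr.
  data <- data[order(data$pnr, data$date), ]
  # Number the events within each pnr and keep the first two.
  event_number <- ave(seq_len(nrow(data)), data$pnr, FUN = seq_along)
  data[event_number <= 2, ]
}

stream <- data.frame(
  pnr = c("001", "001", "001", "002"),
  date = as.Date(c("2019-05-01", "2018-01-01", "2020-02-01", "2017-07-01")),
  stringsAsFactors = FALSE
)
earliest <- keep_two_earliest_events_sketch(stream)
earliest
```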


Parse the logic strings into R expressions

Description

Parse the logic strings into R expressions

Usage

logic_as_expression(logic)

Arguments

logic

The name of the logic to use.

Value

An R expression.
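Turning a logic string into an expression and evaluating it against data can be sketched with base R's parse(). The logic string and column names below are made up for the example, and the package may use a different parser (for instance rlang::parse_expr()).

```r
# Hypothetical logic string, similar in spirit to the entries in algorithm().
logic <- "value >= 48 & analysiscode == 'NPU27300'"

# parse() returns an expression vector; take its first element (a call).
logic_expr <- parse(text = logic)[[1]]

lab <- data.frame(
  value = c(50, 40),
  analysiscode = c("NPU27300", "NPU27300"),
  stringsAsFactors = FALSE
)

# Evaluate the expression using the data frame columns as variables.
is_match <- eval(logic_expr, envir = lab)
is_match
```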


List of non-cases to test the diabetes classification algorithm

Description

This function generates a list of tibbles representing the Danish health registers and the data necessary to run the algorithm. The dataset contains individuals who should not be included in the final classified cohort.

Usage

non_cases()

Details

The generated data is used in testthat tests to ensure the algorithm behaves as expected under a wide range of conditions, but it is also intended to be explored by users to better understand how the algorithm logic works and to be shown in the documentation.

Value

A named list of 9 tibble::tibble() objects, each representing a different health register: bef, lmdb, lpr_adm, lpr_diag, kontakter, diagnoser, sysi, sssy, and lab_forsker.

Examples

non_cases()

Description of the different non-cases included in non_cases()

Description

All cases, aside from what would exclude them from being classified as described in the metadata here, would otherwise be classified as having diabetes.

Usage

non_cases_metadata()

Value

A named list of character strings, where each name corresponds to a non-case PNR in the dataset generated by non_cases().

Examples

non_cases_metadata()

Zero pad an integer to a specific length

Description

Zero pad an integer to a specific length

Usage

pad_integers(x, width)

Arguments

x

An integer or vector of integers.

width

An integer describing the final width of the zero-padded integer.

Value

A character vector of integers.
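Zero padding of this kind can be done with base R's formatC(), shown here as a sketch with a hypothetical function name:

```r
pad_integers_sketch <- function(x, width) {
  # flag = "0" pads with leading zeros up to `width` characters.
  formatC(as.integer(x), width = width, flag = "0")
}

padded <- pad_integers_sketch(c(7, 42, 123), width = 5)
padded
```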


Prepare and join the two LPR2 registers to extract diabetes and pregnancy diagnoses.

Description

The output is used as inputs to keep_diabetes_diagnoses() and to keep_pregnancy_dates().

Usage

prepare_lpr2(lpr_adm, lpr_diag)

Arguments

lpr_adm

The LPR2 register containing hospital admissions.

lpr_diag

The LPR2 register containing diabetes diagnoses.

Value

The same type as the input data, as a duckplyr::duckdb_tibble(), with the following columns:

See Also

See the vignette("algorithm") for the logic used to filter these patients.


Prepare and join the two LPR3 registers to extract diabetes and pregnancy diagnoses.

Description

The output is used as inputs to keep_diabetes_diagnoses() and to keep_pregnancy_dates().

Usage

prepare_lpr3(kontakter, diagnoser)

Arguments

kontakter

The LPR3 register containing hospital contacts/admissions.

diagnoser

The LPR3 register containing diabetes diagnoses.

Value

The same type as the input data, as a duckplyr::duckdb_tibble(), with the following columns:

See Also

See the vignette("algorithm") for the logic used to filter these patients.


Register variables (with descriptions) required for the osdc algorithm.

Description

Register variables (with descriptions) required for the osdc algorithm.

Usage

registers()

Value

Outputs a list of registers and variables required by osdc. Each list item contains the official Danish name of the register, the start year, the end year, and the variables with their descriptions. The variables item is a data frame with 4 columns:

name

The official name of the variable found in the register.

danish_description

The official Danish description of the variable.

english_description

The translated English description of the variable.

data_type

The data type of the variable, e.g. "character". Could have multiple options (e.g. "Date" or "character").

Source

Many of the details within the registers() metadata come from the full official list of registers from Statistics Denmark (DST): https://www.dst.dk/extranet/forskningvariabellister/Oversigt%20over%20registre.html

Examples

registers()

Select the required variables from the register

Description

This function selects only the required variables, converts their names to lower case, and then checks that the data types are as expected.

Usage

select_required_variables(data, register, call = rlang::caller_env())

Arguments

data

The register to select columns from.

register

The abbreviation of the register name. See list of abbreviations in get_register_abbrev().

call

The environment where the function is called, so that the error traceback gives a more meaningful location.

Value

Outputs the register with only the required variables, and with column names in lower case.


Simulate a fake data frame of one or more Danish registers

Description

Simulate a fake data frame of one or more Danish registers

Usage

simulate_registers(registers, n = 1000)

Arguments

registers

The name(s) of the register(s) you want to simulate.

n

The number of rows to simulate for the resulting register.

Value

A list of simulated registers, each as a tibble::tibble().

Examples

simulate_registers(c("bef", "sysi"))
simulate_registers("bef")
simulate_registers("diagnoser")

Transform date(s) to the format yyww

Description

Transform date(s) to the format yyww

Usage

to_yyww(x)

Arguments

x

A date or a vector of dates.

Value

A vector of dates in the format yyww.
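A base R sketch of this conversion, assuming ISO 8601 week numbering (the package may define weeks differently, and the function name is hypothetical):

```r
to_yyww_sketch <- function(x) {
  # %g is the two-digit ISO week-based year, %V the ISO week number (01-53).
  format(as.Date(x), "%g%V")
}

yyww <- to_yyww_sketch("2001-02-12")
yyww
```

Using the week-based year (%g) rather than the calendar year (%y) keeps the year and week consistent for dates near the turn of the year.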


Transform date(s) to the format yyyymmdd

Description

Transform date(s) to the format yyyymmdd

Usage

to_yyyymmdd(x)

Arguments

x

A date or a vector of dates.

Value

A vector of dates in the format yyyymmdd.


Convert date format YYWW to YYYY-MM-DD

Description

Since the exact date isn't given in the input, this function will set the date to Monday of the week. As a precaution, a leading zero is added if it has been removed. This can happen, for example, if the input was "0107" and has been converted to a numeric 107.

Usage

yyww_to_yyyymmdd(yyww)

Arguments

yyww

Character(s) of the format YYWW.

Value

Date(s) in the format YYYY-MM-DD.
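The steps described above (restoring the leading zero, then finding the Monday of the week) can be sketched in base R. This assumes ISO 8601 week numbering and a fixed century for the two-digit year; both the century argument and the function name are hypothetical, and the package may resolve the century differently.

```r
yyww_to_date_sketch <- function(yyww, century = 2000L) {
  yyww <- as.character(yyww)
  # Restore leading zeros lost in a numeric conversion, e.g. 107 -> "0107".
  yyww <- paste0(strrep("0", pmax(0, 4 - nchar(yyww))), yyww)
  year <- as.integer(century) + as.integer(substr(yyww, 1, 2))
  week <- as.integer(substr(yyww, 3, 4))
  # ISO 8601: week 1 is the week that contains 4 January.
  jan4 <- as.Date(sprintf("%d-01-04", year))
  weekday <- as.integer(format(jan4, "%u")) # 1 = Monday, ..., 7 = Sunday
  monday_week1 <- jan4 - (weekday - 1)
  # Monday of the requested week.
  monday_week1 + 7 * (week - 1)
}

from_character <- yyww_to_date_sketch("0107")
from_numeric <- yyww_to_date_sketch(107)
format(c(from_character, from_numeric), "%Y-%m-%d")
```

Both inputs resolve to the same Monday, which shows why the leading zero is restored before the year and week are split.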