| Type: | Package |
| Title: | Open Source Diabetes Classifier for Danish Registers |
| Version: | 0.9.17 |
| Description: | The algorithm first identifies a population of individuals from Danish register data with any type of diabetes as individuals with two or more inclusion events. Then, it splits this population into individuals with either type 1 diabetes or type 2 diabetes by identifying individuals with type 1 diabetes and classifying the remainder of the diabetes population as having type 2 diabetes. |
| License: | MIT + file LICENSE |
| URL: | https://github.com/steno-aarhus/osdc, https://steno-aarhus.github.io/osdc/ |
| BugReports: | https://github.com/steno-aarhus/osdc/issues |
| Depends: | R (≥ 4.2.0) |
| Imports: | checkmate, cli, codeCollection, dbplyr, dplyr, duckplyr, fabricatr, lifecycle, lubridate, purrr, rlang, rvest, stats, tidyselect, utils |
| Suggests: | glue, knitr, quarto, rmarkdown, spelling, stringr, testthat (≥ 3.0.0), tidyr, tibble |
| VignetteBuilder: | quarto |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| Language: | en-US |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2025-12-04 21:14:55 UTC; luke |
| Author: | Signe Kirk Brødbæk
|
| Maintainer: | Luke William Johnston <lwjohnst@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-12-10 21:20:02 UTC |
osdc: Open Source Diabetes Classifier for Danish Registers
Description
The algorithm first identifies a population of individuals from Danish register data with any type of diabetes as individuals with two or more inclusion events. Then, it splits this population into individuals with either type 1 diabetes or type 2 diabetes by identifying individuals with type 1 diabetes and classifying the remainder of the diabetes population as having type 2 diabetes.
Author(s)
Maintainer: Luke William Johnston lwjohnst@gmail.com (ORCID)
Authors:
Signe Kirk Brødbæk signekb@clin.au.dk (ORCID)
Anders Aasted Isaksen andaas@rm.dk (ORCID)
Other contributors:
Steno Diabetes Center Aarhus [copyright holder]
Aarhus University [copyright holder]
See Also
Useful links:
Report bugs at https://github.com/steno-aarhus/osdc/issues
Add columns for information about insulin drug purchases
Description
Add columns for information about insulin drug purchases
Usage
add_insulin_purchases_cols(gld_hba1c_after_drop_steps)
Arguments
gld_hba1c_after_drop_steps |
The GLD and HbA1c data after drop steps |
Value
The same type as the input data, as a duckplyr::duckdb_tibble().
Three new columns are added:
- has_two_thirds_insulin: A logical variable used in classifying type 1 diabetes. See algorithm() for more details.
- has_only_insulin_purchases: A logical variable used in classifying type 1 diabetes. See algorithm() for more details.
- has_insulin_purchases_within_180_days: A logical variable used in classifying type 1 diabetes. See algorithm() for more details.
See Also
See the vignette("algorithm") for the logic used to filter these
patients.
Add columns related to type 1 diabetes diagnoses
Description
This function evaluates whether an individual has a majority of type 1 diabetes-specific hospital diagnoses (DE10) among all type-specific diabetes primary diagnoses (DE10 & DE11) from endocrinology departments. If an individual doesn't have any type-specific diabetes diagnoses from endocrinology departments, the majority is determined by diagnoses from medical departments.
It also adds a column indicating whether an individual has at least one primary diagnosis related to type 1 diabetes.
This output is passed to the join_inclusions() function, where the
dates variable is used for the final step of the inclusion process.
The variable indicating whether the majority of diagnoses are for type 1
diabetes is used for later classification of type 1 diabetes.
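The majority rule described above can be sketched in base R for the diagnoses of a single individual. This is only an illustration, not the package's implementation: the helper name `has_majority_t1d()` is hypothetical, the column names are borrowed from the output of prepare_lpr2() and prepare_lpr3(), and reading "majority" as strictly more type 1 than type 2 diagnoses is an assumption.

```r
# Hypothetical helper, not part of osdc: applies the majority rule to the
# diagnoses of one individual.
has_majority_t1d <- function(diagnoses) {
  # Only primary, type-specific diagnoses count towards the majority.
  type_specific <- diagnoses$is_t1d_code | diagnoses$is_t2d_code
  primary <- diagnoses[diagnoses$is_primary_diagnosis & type_specific, ]
  # Prefer endocrinology departments; fall back to medical departments.
  from_endo <- primary[primary$is_endocrinology_dept, ]
  used <- if (nrow(from_endo) > 0) from_endo else primary[primary$is_medical_dept, ]
  # Assumption: "majority" means strictly more T1D than T2D diagnoses.
  sum(used$is_t1d_code) > sum(used$is_t2d_code)
}

diagnoses <- data.frame(
  is_primary_diagnosis = c(TRUE, TRUE, TRUE, TRUE),
  is_t1d_code = c(TRUE, FALSE, FALSE, TRUE),
  is_t2d_code = c(FALSE, TRUE, TRUE, FALSE),
  is_endocrinology_dept = c(TRUE, FALSE, FALSE, FALSE),
  is_medical_dept = c(FALSE, TRUE, TRUE, TRUE)
)

# The single endocrinology diagnosis decides the result.
with_endo <- has_majority_t1d(diagnoses)
# Without it, the medical department diagnoses are used instead.
without_endo <- has_majority_t1d(diagnoses[-1, ])
```

In this example the one endocrinology diagnosis takes precedence over the three diagnoses from medical departments, which is the fallback order the description implies.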
Usage
add_t1d_diagnoses_cols(data)
Arguments
data |
Data from |
Value
The same type as the input data, as a duckplyr::duckdb_tibble(),
with the following added columns and up to two rows per individual:
- has_majority_t1d_diagnoses: A logical vector indicating whether the majority of primary diagnoses are related to type 1 diabetes.
- has_any_t1d_primary_diagnosis: A logical vector indicating whether there is at least one primary diagnosis related to type 1 diabetes.
See Also
See the vignette("algorithm") for the logic used to filter these
patients.
A list of the algorithmic logic underlying osdc.
Description
This nested list contains the logic details of the algorithm.
Usage
algorithm()
Format
Is a list with nested lists that have these named elements:
- register
Optional. The register used for this logic
- title
The title to use when displaying the logic in tables.
- logic
The logic itself.
- comments
Some additional comments on the logic.
Value
A nested list with the algorithmic logic. Contains
fields register, title, logic, and comments.
See Also
See the vignette("algorithm") for the logic used to filter these
patients.
Examples
algorithm()$is_hba1c_over_threshold
algorithm()$is_gld_code$logic
Translate to SQL to convert to datetime (and eventually to date)
Description
DuckDB doesn't support using lubridate::as_date(), so this
uses dbplyr::sql() to directly use DuckDB's strptime to
convert strings to datetimes. Afterwards, it can be converted
to dates.
Usage
as_sql_datetime(x)
Arguments
x |
A character (or date) column, in quotes. |
Value
A Datetime column.
Check data types of the register variables
Description
Check data types of the register variables
Usage
check_data_types(data, register, call = rlang::caller_env())
Value
Outputs the register with only the required variables, and with column names in lower case.
Classify diabetes status using Danish registers.
Description
This function requires that each source of register data is represented as a single DuckDB object in R (e.g. a connection to Parquet files). Each DuckDB object must contain a single table covering all years of that data source, or at least the years you have and are interested in.
Usage
classify_diabetes(
  kontakter,
  diagnoser,
  lpr_diag,
  lpr_adm,
  sysi,
  sssy,
  lab_forsker,
  bef,
  lmdb,
  stable_inclusion_start_date = "1998-01-01"
)
Arguments
kontakter |
The contacts information table from the LPR3 patient register |
diagnoser |
The diagnoses information table from the LPR3 patient register |
lpr_diag |
The diagnoses information table from the LPR2 patient register |
lpr_adm |
The administrative information table from the LPR2 patient register |
sysi |
The SYSI table from the health service register |
sssy |
The SSSY table from the health service register |
lab_forsker |
The register for laboratory results for research |
bef |
The BEF table from the civil register |
lmdb |
The LMDB table from the prescription register |
stable_inclusion_start_date |
Cutoff date after which inclusion events are considered true incident diabetes cases. Defaults to "1998-01-01", which is one year after the data on pregnancy events from the Patient Register are considered valid for dropping gestational diabetes-related purchases of glucose-lowering drugs. This default assumes that the user is using LPR and LMDB data from at least Jan 1 1997 onward. If the user only has access to LPR and LMDB data from a later date, this parameter should be set to one year after the beginning of the user's data coverage. |
Value
The same object type as the input data, which would be a
duckplyr::duckdb_tibble() type object.
See Also
See the osdc vignette for a detailed description of the internal implementation of this classification function.
Examples
# Can't run this multiple times, will cause an error as the table
# has already been created in the DuckDB connection.
register_data <- registers() |>
  names() |>
  simulate_registers() |>
  purrr::map(duckplyr::as_duckdb_tibble) |>
  purrr::map(duckplyr::as_tbl)
classify_diabetes(
  kontakter = register_data$kontakter,
  diagnoser = register_data$diagnoser,
  lpr_diag = register_data$lpr_diag,
  lpr_adm = register_data$lpr_adm,
  sysi = register_data$sysi,
  sssy = register_data$sssy,
  lab_forsker = register_data$lab_forsker,
  bef = register_data$bef,
  lmdb = register_data$lmdb
)
After filtering, classify those with type 1 diabetes.
Description
After filtering, classify those with type 1 diabetes.
Usage
classify_t1d(data)
Arguments
data |
Joined data output from the filtering steps. |
Value
The same object type as the input data, which would be a
duckplyr::duckdb_tibble() type object.
Convert column names to lower case
Description
Convert column names to lower case
Usage
column_names_to_lower(data)
Arguments
data |
A data frame type object. |
Value
The same object type given.
Create a vector with random ATC codes
Description
Anatomical Therapeutic Chemical (ATC) codes are unique medicine codes based on which organ or system a medicine works on and how it works.
Usage
create_fake_atc(n)
Arguments
n |
The number of random ATC codes to generate. |
Value
A character vector of ATC codes.
Create fake dates
Description
Create fake dates
Usage
create_fake_date(n, from = "1977-01-01", to = lubridate::today())
Arguments
n |
The number of dates to generate. |
from |
A date determining the first date in the interval to sample from. |
to |
A date determining the last date in the interval to sample from. |
Value
A vector of dates.
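A minimal base R sketch of how random dates can be drawn from an interval. The function name `fake_dates()` is hypothetical and the package's own implementation of create_fake_date() may differ.

```r
# Hypothetical stand-in for create_fake_date(), for illustration only.
fake_dates <- function(n, from = "1977-01-01", to = Sys.Date()) {
  # Every day in the closed interval [from, to].
  days <- seq(as.Date(from), as.Date(to), by = "day")
  # Sample with replacement so n may exceed the number of days available.
  sample(days, size = n, replace = TRUE)
}

dates <- fake_dates(5, from = "2000-01-01", to = "2000-12-31")
```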
Create a vector of random department specialties
Description
Create a vector of random department specialties
Usage
create_fake_hovedspeciale_ans(n)
Arguments
n |
The number of department specialties to create. |
Value
A character vector.
Create a vector with random ICD-8 or -10 diagnoses
Description
Create a vector with random ICD-8 or -10 diagnoses
Usage
create_fake_icd(n, date = NULL)
Arguments
n |
The number of ICD-8 or -10 diagnoses to generate. |
date |
A date determining whether the diagnoses should be ICD-8 or ICD-10. If null, a random date will be sampled to determine which ICD revision the diagnosis should be from. In the Danish registers, ICD-10 is used after 1994. |
Value
A character vector of ICD-8 or ICD-10 diagnoses.
Create a vector of random ICD-10 diagnoses
Description
ICD-10 is the 10th revision of the International Classification of Diseases.
Usage
create_fake_icd10(n)
Arguments
n |
An integer determining how many diagnoses to create. |
Value
A character vector of ICD-10 diagnoses.
Source
The stored CSV is downloaded from the Danish Health Data Authority's
website at medinfo.dk
Create a vector of random ICD-8 diagnoses
Description
ICD-8 is the 8th revision of the International Classification of Diseases.
Usage
create_fake_icd8(n)
Arguments
n |
The number of ICD-8 diagnoses to generate. |
Value
A character vector of ICD-8 diagnoses.
Create a vector of random NPU codes
Description
Nomenclature for Properties and Units (NPU) codes identify laboratory results.
Usage
create_fake_npu(n)
Arguments
n |
The number of NPUs to create. |
Value
A character vector.
Create inclusion dates from all the inclusion events
Description
This function takes the output from
join_inclusions() and defines the final inclusion dates, raw and stable,
based on all inclusion event types. Since inclusion requires at least two
events (can be multiple events of the same type or any combination of
different types), this function keeps only those with 2 or more events. E.g.,
an individual with two elevated HbA1c tests followed by a glucose-lowering
drug purchase is included at the latest elevated HbA1c test. Had the second
HbA1c test not been performed (or had it returned a result below the
diagnostic threshold), this person would instead have been included at the
date of the first purchase of glucose-lowering drugs.
Usage
create_inclusion_dates(inclusions, stable_inclusion_start_date = "1998-01-01")
Arguments
inclusions |
Output from |
stable_inclusion_start_date |
Cutoff date after which inclusion events are considered reliable (e.g., after changes in drug labeling or data entry practices). Defaults to "1998-01-01" which is one year after obstetric codes are reliable in the GLD data (since we use LPR data to drop rows related to gestational diabetes). This limits the included cohort to individuals with inclusion dates after this cutoff date. |
Value
The same type as the input data, as a duckplyr::duckdb_tibble(),
with the pnr and date columns along with the columns from the input
that are needed to classify T1D.
It also creates two new columns:
- raw_inclusion_date: Date of raw inclusion, the second earliest recorded event for each individual.
- stable_inclusion_date: Same as raw inclusion date, but set to NA if the raw inclusion date is before the stable inclusion start date.
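The two columns can be illustrated with a small base R sketch. It is not the package's implementation: the helper name `inclusion_dates()` is hypothetical, and the input is assumed to be a plain data frame with one row per event and the columns pnr and date.

```r
# Hypothetical helper, not part of osdc.
inclusion_dates <- function(events, stable_inclusion_start_date = "1998-01-01") {
  cutoff <- as.Date(stable_inclusion_start_date)
  per_person <- split(events$date, events$pnr)
  # Inclusion requires at least two events.
  per_person <- per_person[lengths(per_person) >= 2]
  # The raw inclusion date is the second earliest event.
  raw_num <- vapply(per_person, function(d) as.numeric(sort(d)[2]), numeric(1))
  raw <- as.Date(unname(raw_num), origin = "1970-01-01")
  # The stable date is missing when inclusion happens before the cutoff.
  stable <- raw
  stable[raw < cutoff] <- NA
  data.frame(
    pnr = names(per_person),
    raw_inclusion_date = raw,
    stable_inclusion_date = stable
  )
}

events <- data.frame(
  pnr = c("001", "001", "001", "002", "003", "003"),
  date = as.Date(c(
    "1996-05-01", "1997-03-01", "1999-01-01",
    "2001-01-01",
    "2000-02-01", "2003-06-01"
  ))
)
res <- inclusion_dates(events)
```

Here individual 001 has a raw inclusion date before the cutoff, so the stable date is NA; individual 002 has only one event and is dropped; individual 003 is included at the second event.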
Create a vector of reproducible, random zero-padded integers.
Description
For a given n and length, the generated integers will
always be identical. This makes it easier to do joining by
values that represent people, e.g. in pnr, cpr, recnum and
dw_ek_kontakt.
Usage
create_padded_integer(n, length)
Arguments
n |
The number of integer strings to generate. |
length |
The length of the padded integer strings. |
Value
A character vector of integers.
Simulate data based on simulation definitions
Description
Simulate data based on simulation definitions
Usage
create_simulated_data(data, n)
Arguments
data |
A tibble with simulation definitions. |
n |
Number of observations to simulate. |
Value
A tibble with simulated data.
Drop rows with metformin purchases for the treatment of PCOS
Description
Takes the output from keep_gld_purchases() and bef (information on
sex and date of birth) to drop rows with metformin purchases that are
potentially for the treatment of polycystic ovary syndrome. This function
only performs a filtering operation so it outputs the same structure and
variables as the input from keep_gld_purchases(), except the
addition of a logical helper variable no_pcos that is used in later
functions. After these rows have been dropped, the output is used by
drop_pregnancies().
Usage
drop_pcos(gld_purchases, bef)
Arguments
gld_purchases |
The output from |
bef |
The |
Value
The same type as the input data, as a duckplyr::duckdb_tibble().
It also has the same columns as keep_gld_purchases(), except for a
logical helper variable no_pcos that is used in later functions.
See Also
See the vignette("algorithm") for the logic used to filter these
patients.
Drop pregnancy events that could be gestational diabetes
Description
This function takes the combined outputs from
keep_pregnancy_dates(), keep_hba1c(), and
drop_pcos() and uses diagnoses from LPR2 or LPR3 to drop both
elevated HbA1c tests and GLD purchases during pregnancy, as these may be due
to gestational diabetes, rather than type 1 or type 2 diabetes. The aim is to
identify pregnancies based on diagnosis codes specific to pregnancy-ending
events (e.g. live births or miscarriages), and then use the dates of these
events to remove inclusion events in the preceding months that may be related
to gestational diabetes (e.g. elevated HbA1c tests or purchases of
glucose-lowering drugs during pregnancy).
After these drop functions have been applied, the output serves as
input to the add_insulin_purchases_cols() function.
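The idea of dropping events that precede a pregnancy-ending event can be sketched in base R as follows. Everything here is an assumption made for illustration: the helper name, the column name `pregnancy_event_date`, and in particular the `weeks_before` window, since this page only says "the preceding months". See vignette("algorithm") for the rule osdc actually applies.

```r
# Hypothetical helper, not part of osdc. `weeks_before` is an illustrative
# window, not a value taken from the package.
drop_events_before_pregnancy_end <- function(events, pregnancy_dates,
                                             weeks_before = 40) {
  keep <- vapply(seq_len(nrow(events)), function(i) {
    same_person <- pregnancy_dates$pnr == events$pnr[i]
    ends <- pregnancy_dates$pregnancy_event_date[same_person]
    # Days from the event to each pregnancy-ending event of this person.
    days_before <- as.numeric(ends - events$date[i])
    # Keep the event unless it falls inside the window before such an event.
    !any(days_before >= 0 & days_before <= weeks_before * 7)
  }, logical(1))
  events[keep, , drop = FALSE]
}

events <- data.frame(
  pnr = c("001", "001", "002"),
  date = as.Date(c("2010-03-01", "2012-01-01", "2010-03-01"))
)
pregnancy_dates <- data.frame(
  pnr = "001",
  pregnancy_event_date = as.Date("2010-06-01")
)
kept <- drop_events_before_pregnancy_end(events, pregnancy_dates)
```

Only the first event of individual 001 is dropped, because it falls within the window before the pregnancy-ending event. Note that the real function flags rows with the helper variable no_pregnancy, whereas this sketch only filters.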
Usage
drop_pregnancies(dropped_pcos, pregnancy_dates, included_hba1c)
Arguments
dropped_pcos |
Output from |
pregnancy_dates |
Output from |
included_hba1c |
Output from |
Value
The same type as the input data, as a duckplyr::duckdb_tibble().
Has the same output data as the input drop_pcos(), except
for a helper logical variable no_pregnancy that is used in later
functions.
See Also
See the vignette("algorithm") for the logic used to filter these
patients.
Create a synthetic dataset of edge case inputs
Description
This function generates a list of tibbles representing the Danish health
registers and the data necessary to run the algorithm. The dataset contains
23 individual cases (pnrs), each designed to test a specific logical branch
of the diabetes classification algorithm, including inclusion, exclusion,
censoring, and type classification rules.
The generated data is used in testthat tests to ensure the algorithm
behaves as expected under a wide range of conditions, but it is also intended
to be explored by users to better understand how the algorithm logic works.
Usage
edge_cases()
Value
A named list of 9 tibble::tibble() objects, each representing a
different health register: bef, lmdb, lpr_adm, lpr_diag,
kontakter, diagnoser, sysi, sssy, and lab_forsker.
Examples
edge_cases()
Convert all factor variables to character variables.
Description
Convert all factor variables to character variables.
Usage
fct_to_chr(data)
Arguments
data |
A tibble or data frame. |
Value
Get the algorithmic logic and convert to an R logic condition.
Description
Get the algorithmic logic and convert to an R logic condition.
Usage
get_algorithm_logic(logic_name, algorithm = NULL)
Arguments
logic_name |
The name of the logic to use. |
algorithm |
The list of algorithmic logic, one list for each. |
Value
A character string.
Get a list of the registers' abbreviations.
Description
Get a list of the registers' abbreviations.
Usage
get_register_abbrev()
Value
A character string.
Get a list of required variables from a specific register.
Description
Get a list of required variables from a specific register.
Usage
get_required_variables(register)
Arguments
register |
The abbreviation of the register name. See list of
abbreviations in |
Value
A character vector of variable names.
Insert additional analysis codes for HbA1c
Description
Insert additional analysis codes for HbA1c
Usage
insert_analysiscode(data, proportion = 0.3)
Arguments
data |
A tibble. |
proportion |
Proportion to re-sample. Defaults to 0.3. |
Value
A tibble. If a column is named analysiscode, a proportion of the
values are replaced by codes for HbA1c.
Insert cases where metformin is used for other purposes than diabetes
Description
This function uses the variable indo which is the code for the underlying
condition treated by the prescribed medication.
Usage
insert_false_metformin(data, proportion = 0.05)
Arguments
data |
A tibble. |
proportion |
Proportion to re-sample. Defaults to 0.05. |
Value
A tibble. If all column names in the tibble is either atc, a
proportion of observations is re-sampled as metformin.
Insert specific ATC codes based on a proportion
Description
Insert specific ATC codes based on a proportion
Usage
insert_specific_atc(data, proportion = 0.3)
Arguments
data |
A tibble. |
proportion |
Proportion to be resampled. Defaults to 0.3. |
Value
A tibble with a proportion of resampled ATC codes for columns named 'atc'.
Generate logic based on a probability
Description
Generate logic based on a probability
Usage
insertion_rate(proportion)
Arguments
proportion |
A double between 0 and 1. |
Value
A logical vector. TRUE if the random number is less than the proportion, otherwise FALSE.
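The return value described above can be sketched in one line of base R. The function name `random_insert()` is hypothetical, and drawing a single uniform random number is an assumption about how the comparison is made.

```r
# Hypothetical stand-in for insertion_rate(), for illustration only.
random_insert <- function(proportion) {
  stopifnot(is.numeric(proportion), proportion >= 0, proportion <= 1)
  # TRUE with probability `proportion`.
  stats::runif(1) < proportion
}

never <- random_insert(0)  # runif() does not return values below 0
always <- random_insert(1) # runif() does not return exactly 1
```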
Join kept inclusion events.
Description
This function joins the outputs from all the filtering
functions, by pnr and dates. Input datasets:
- diabetes_diagnoses: Dates are the first and second hospital diabetes diagnosis.
- podiatrist_services: Dates are the first and second diabetes-specific podiatrist record.
- gld_hba1c_after_drop_steps: Dates are the first and second elevated HbA1c test results (after excluding results potentially influenced by gestational diabetes), and the first and second purchase of a glucose-lowering drug (after excluding purchases potentially related to polycystic ovary syndrome or gestational diabetes).
Usage
join_inclusions(
  diabetes_diagnoses,
  podiatrist_services,
  gld_hba1c_after_drop_steps
)
Arguments
diabetes_diagnoses |
Output from |
podiatrist_services |
Output from |
gld_hba1c_after_drop_steps |
Output from |
Value
The same type as the input data, as a duckplyr::duckdb_tibble(),
with the joined columns from the output of keep_diabetes_diagnoses(),
keep_podiatrist_services(), drop_pcos(), and
drop_pregnancies(). There will be 1-8 rows per pnr.
See Also
See the vignette("algorithm") for the logic used to filter these
patients.
Keep rows with diabetes diagnoses.
Description
This function uses the hospital contacts from LPR2 and LPR3 to include all dates of diabetes diagnoses to use for inclusion, as well as additional information needed to classify diabetes type. Diabetes diagnoses from both ICD-8 and ICD-10 are included.
Usage
keep_diabetes_diagnoses(lpr2, lpr3)
Arguments
lpr2 |
The output from |
lpr3 |
The output from |
Value
The same type as the input data, as a duckplyr::duckdb_tibble(),
with fewer rows after filtering.
See Also
See the vignette("algorithm") for the logic used to filter these
patients.
Keep rows with purchases of glucose lowering drugs (GLD)
Description
This function doesn't keep glucose-lowering drugs that may be used for
conditions other than diabetes, such as GLP-RAs or dapagliflozin/empagliflozin drugs.
Since the diagnosis code data on pregnancies (see below) is insufficient to
perform censoring prior to 1997, keep_gld_purchases() only extracts
dates from 1997 onward by default (if Medical Birth Register data is
available to use for censoring, the extraction window can be extended).
Usage
keep_gld_purchases(lmdb)
Arguments
lmdb |
The |
Value
The same type as the input data, as a duckplyr::duckdb_tibble().
Only rows with glucose lowering drug purchases are kept, plus some columns are renamed.
See Also
See the vignette("algorithm") for the logic used to filter these
patients.
Keep rows with HbA1c above the required threshold.
Description
In the lab_forsker register, NPU27300 is HbA1c in the modern units (IFCC)
while NPU03835 is HbA1c in old units (DCCT). Multiple elevated results on the
same day within each individual are deduplicated, to account for the same
test result often being reported twice (one for IFCC, one for DCCT units).
Usage
keep_hba1c(lab_forsker)
Arguments
lab_forsker |
The |
Details
The output is passed to the drop_pregnancies() function for
filtering of elevated results due to potential gestational diabetes (see
below).
Value
An object of the same input type, as a duckplyr::duckdb_tibble(),
with three columns:
- pnr: Personal identification variable.
- dates: The dates of all elevated HbA1c test results.
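The thresholding and same-day deduplication can be sketched in base R. This is an illustration under stated assumptions, not the package's code: the helper name, the column names `pnr`, `date` and `value`, and the default thresholds (48 for the IFCC code and 6.5 for the DCCT code) are all assumptions. The thresholds osdc actually uses are listed in algorithm().

```r
# Hypothetical helper, not part of osdc. Column names other than
# `analysiscode`, and both thresholds, are assumptions for illustration.
elevated_hba1c_dates <- function(lab, ifcc_threshold = 48, dcct_threshold = 6.5) {
  elevated <- (lab$analysiscode == "NPU27300" & lab$value >= ifcc_threshold) |
    (lab$analysiscode == "NPU03835" & lab$value >= dcct_threshold)
  # which() drops rows where the comparison is NA.
  rows <- which(elevated)
  out <- data.frame(pnr = lab$pnr[rows], date = lab$date[rows])
  # The same test is often reported in both units, so keep one row per
  # individual per day.
  out <- unique(out)
  out <- out[order(out$pnr, out$date), , drop = FALSE]
  rownames(out) <- NULL
  out
}

lab <- data.frame(
  pnr = c("001", "001", "001", "002"),
  date = as.Date(c("2020-01-10", "2020-01-10", "2020-06-01", "2021-03-03")),
  analysiscode = c("NPU27300", "NPU03835", "NPU27300", "NPU03835"),
  value = c(52, 6.9, 40, 7.1)
)
hba1c <- elevated_hba1c_dates(lab)
```

The two results for individual 001 on 2020-01-10 collapse into a single row, and the result of 40 is not elevated, so two rows remain.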
Keep rows with diabetes-specific podiatrist services.
Description
This function uses the sysi or sssy registers as input to extract the
dates of all diabetes-specific podiatrist services. Removes duplicate
services on the same date.
The output is passed to the join_inclusions() function for the final
step of the inclusion process.
Usage
keep_podiatrist_services(sysi, sssy)
Arguments
sysi |
The SYSI register. |
sssy |
The SSSY register. |
Value
The same type as the input data, as a duckplyr::duckdb_tibble().
- pnr: Identifier variable
- date: The dates of the first and second diabetes-specific podiatrist record
See Also
See the vignette("algorithm") for the logic used to filter these
patients.
Simple function to get only the pregnancy event dates.
Description
Simple function to get only the pregnancy event dates.
Usage
keep_pregnancy_dates(lpr2, lpr3)
Arguments
lpr2 |
Output from |
lpr3 |
Output from |
Value
The same type as the input data, as a duckplyr::duckdb_tibble().
See Also
See the vignette("algorithm") for the logic used to filter these
patients.
Keep two earliest events per PNR
Description
Since the classification date is based on the second instance of an inclusion criterion, we need to keep the earliest two events per PNR per inclusion "stream".
This function is applied to each "stream", diabetes_diagnoses,
podiatrist_services, and gld_hba1c_after_drop_steps, in the
classify_diabetes() function after the keep and drop steps, right before
they are joined.
Usage
keep_two_earliest_events(data)
Arguments
data |
Data including at least a date and pnr column. |
Value
The same type as the input data.
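Keeping the two earliest events per individual can be sketched in base R as below. The function name `two_earliest()` is hypothetical and the sketch works on a plain data frame rather than a DuckDB table.

```r
# Hypothetical stand-in for keep_two_earliest_events(), for illustration only.
two_earliest <- function(data) {
  data <- data[order(data$pnr, data$date), , drop = FALSE]
  # Rank events within each individual; rows are already sorted by date.
  rank_within <- ave(seq_len(nrow(data)), data$pnr, FUN = seq_along)
  out <- data[rank_within <= 2, , drop = FALSE]
  rownames(out) <- NULL
  out
}

stream <- data.frame(
  pnr = c("001", "002", "001", "001"),
  date = as.Date(c("2015-06-01", "2010-01-01", "2012-03-01", "2013-09-01"))
)
earliest <- two_earliest(stream)
```

Individual 001 keeps the events from 2012 and 2013, while individual 002 keeps the single event available.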
Parse the logic strings into R expressions
Description
Parse the logic strings into R expressions
Usage
logic_as_expression(logic)
Arguments
logic |
The name of the logic to use. |
Value
An R expression.
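To show what parsing a logic string into an R expression involves, here is a small base R sketch. The list below only mimics the fields of an entry in algorithm(); its contents, including the logic string and the column name `value`, are made up for illustration, and base `str2lang()` is used here without any claim that the package parses its logic the same way.

```r
# A hypothetical logic entry with the same fields as those in algorithm().
entry <- list(
  register = "lab_forsker",
  title = "Illustrative HbA1c condition",
  logic = "analysiscode == 'NPU27300' & value >= 48",
  comments = "Made up for this example."
)

# Parse the string into an unevaluated R call.
condition <- str2lang(entry$logic)

# Evaluate the call with the data frame columns as variables.
lab <- data.frame(
  analysiscode = c("NPU27300", "NPU03835", "NPU27300"),
  value = c(50, 6.0, 45)
)
is_match <- eval(condition, lab)
```

Storing the logic as text keeps it readable in tables and documentation, while parsing it on demand lets the same text drive the filtering.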
List of non-cases to test the diabetes classification algorithm
Description
This function generates a list of tibbles representing the Danish health registers and the data necessary to run the algorithm. The dataset contains individuals who should not be included in the final classified cohort.
Usage
non_cases()
Details
The generated data is used in testthat tests to ensure the algorithm
behaves as expected under a wide range of conditions, but it is also intended
to be explored by users to better understand how the algorithm logic works
and to be shown in the documentation.
Value
A named list of 9 tibble::tibble() objects, each representing a
different health register: bef, lmdb, lpr_adm, lpr_diag,
kontakter, diagnoser, sysi, sssy, and lab_forsker.
Examples
non_cases()
Description of the different non-cases included in non_cases()
Description
All cases, aside from what would exclude them from being classified as described in the metadata here, would otherwise be classified as having diabetes.
Usage
non_cases_metadata()
Value
A named list of character strings, where each name corresponds to a
non-case PNR in the dataset generated by non_cases().
Examples
non_cases_metadata()
Zero pad an integer to a specific length
Description
Zero pad an integer to a specific length
Usage
pad_integers(x, width)
Arguments
x |
An integer or vector of integers. |
width |
An integer describing the final width of the zero-padded integer. |
Value
A character vector of integers.
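Zero padding can be done with base R's `formatC()`. The function name `zero_pad()` is hypothetical; the package's own pad_integers() may be implemented differently.

```r
# Hypothetical stand-in for pad_integers(), for illustration only.
zero_pad <- function(x, width) {
  formatC(as.integer(x), width = width, flag = "0", format = "d")
}

padded <- zero_pad(c(7, 42, 123), width = 5)
# padded is c("00007", "00042", "00123")
```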
Prepare and join the two LPR2 registers to extract diabetes and pregnancy diagnoses.
Description
The output is used as inputs to keep_diabetes_diagnoses() and to
keep_pregnancy_dates().
Usage
prepare_lpr2(lpr_adm, lpr_diag)
Arguments
lpr_adm |
The LPR2 register containing hospital admissions. |
lpr_diag |
The LPR2 register containing diabetes diagnoses. |
Value
The same type as the input data, as a duckplyr::duckdb_tibble(),
with the following columns:
- pnr: The personal identification variable.
- date: The dates of all recorded diagnoses (renamed from d_inddto or dato_start).
- is_primary_diagnosis: Whether the diagnosis was a primary diagnosis.
- is_diabetes_code: Whether the diagnosis was any type of diabetes.
- is_t1d_code: Whether the diagnosis was T1D-specific.
- is_t2d_code: Whether the diagnosis was T2D-specific.
- is_pregnancy_code: Whether the person has an event related to pregnancy like giving birth or having a miscarriage at the given date.
- is_endocrinology_dept: Whether the diagnosis was made by an endocrinology medical department.
- is_medical_dept: Whether the diagnosis was made by a non-endocrinology medical department.
See Also
See the vignette("algorithm") for the logic used to filter these
patients.
Prepare and join the two LPR3 registers to extract diabetes and pregnancy diagnoses.
Description
The output is used as inputs to keep_diabetes_diagnoses() and to
keep_pregnancy_dates().
Usage
prepare_lpr3(kontakter, diagnoser)
Arguments
kontakter |
The LPR3 register containing hospital contacts/admissions. |
diagnoser |
The LPR3 register containing diabetes diagnoses. |
Value
The same type as the input data, as a duckplyr::duckdb_tibble(),
with the following columns:
- pnr: The personal identification variable.
- date: The dates of all recorded diagnoses (renamed from d_inddto or dato_start).
- is_primary_diagnosis: Whether the diagnosis was a primary diagnosis.
- is_diabetes_code: Whether the diagnosis was any type of diabetes.
- is_t1d_code: Whether the diagnosis was T1D-specific.
- is_t2d_code: Whether the diagnosis was T2D-specific.
- is_pregnancy_code: Whether the person has an event related to pregnancy like giving birth or having a miscarriage at the given date.
- is_endocrinology_dept: Whether the diagnosis was made by an endocrinology medical department.
- is_medical_dept: Whether the diagnosis was made by a non-endocrinology medical department.
See Also
See the vignette("algorithm") for the logic used to filter these
patients.
Register variables (with descriptions) required for the osdc algorithm.
Description
Register variables (with descriptions) required for the osdc algorithm.
Usage
registers()
Value
Outputs a list of registers and variables required by osdc. Each list item contains the official Danish name of the register, the start year, the end year, and the variables with their descriptions. The variables item is a data frame with 4 columns:
- name
The official name of the variable found in the register.
- danish_description
The official Danish description of the variable.
- english_description
The translated English description of the variable.
- data_type
The data type, e.g. "character" of the variable. Could have multiple options (e.g. "Date" or "character").
Source
Many of the details within the registers() metadata come
from the full official list of registers from Statistics Denmark (DST):
https://www.dst.dk/extranet/forskningvariabellister/Oversigt%20over%20registre.html
Examples
registers()
Select the required variables from the register
Description
This function selects only the required variables, converts their names to lower case, and then checks that the data types are as expected.
Usage
select_required_variables(data, register, call = rlang::caller_env())
Arguments
data |
The register to select columns from. |
register |
The abbreviation of the register name. See list of
abbreviations in |
call |
The environment where the function is called, so that the error traceback gives a more meaningful location. |
Value
Outputs the register with only the required variables, and with column names in lower case.
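The three steps (lower-case the names, select the required variables, check their types) can be sketched in base R. The helper `select_and_check()`, the `required` list and the example variable names are all hypothetical; in the package the required variables and their data types come from registers().

```r
# Hypothetical helper, not part of osdc. `required` maps lower-case
# variable names to the classes they are allowed to have.
select_and_check <- function(data, required) {
  names(data) <- tolower(names(data))
  missing_vars <- setdiff(names(required), names(data))
  if (length(missing_vars) > 0) {
    stop("Missing required variables: ", paste(missing_vars, collapse = ", "))
  }
  data <- data[, names(required), drop = FALSE]
  for (var in names(required)) {
    # inherits() is TRUE if the column has any of the allowed classes.
    if (!inherits(data[[var]], required[[var]])) {
      stop(
        "Variable `", var, "` must be of type: ",
        paste(required[[var]], collapse = " or ")
      )
    }
  }
  data
}

# Made-up register extract and requirements.
raw <- data.frame(
  PNR = c("001", "002"),
  KOEN = c(1L, 2L),
  FOED_DAG = as.Date(c("1980-01-01", "1990-05-05")),
  EXTRA = c("a", "b")
)
required <- list(
  pnr = "character",
  koen = "integer",
  foed_dag = c("Date", "character")
)
checked <- select_and_check(raw, required)
```

A column with an unexpected type, or a missing required column, stops with an error instead of failing later in the pipeline.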
Simulate a fake data frame of one or more Danish registers
Description
Simulate a fake data frame of one or more Danish registers
Usage
simulate_registers(registers, n = 1000)
Arguments
registers |
The name of the register you want to simulate. |
n |
The number of rows to simulate for the resulting register. |
Value
A list with simulated register data, as a tibble::tibble().
Examples
simulate_registers(c("bef", "sysi"))
simulate_registers("bef")
simulate_registers("diagnoser")
Transform date(s) to the format yyww
Description
Transform date(s) to the format yyww
Usage
to_yyww(x)
Arguments
x |
A date or a vector of dates. |
Value
A vector of dates in the format yyww.
Transform date(s) to the format yyyymmdd
Description
Transform date(s) to the format yyyymmdd
Usage
to_yyyymmdd(x)
Arguments
x |
A date or a vector of dates. |
Value
A vector of dates in the format yyyymmdd.
Convert date format YYWW to YYYY-MM-DD
Description
Since the exact date isn't given in the input, this function sets the date to the Monday of the week. As a precaution, a leading zero is added if it has been removed. This can happen, e.g., if the input was "0107" and has been converted to the numeric 107.
Usage
yyww_to_yyyymmdd(yyww)
Arguments
yyww |
Character(s) of the format YYWW. |
Value
Date(s) in the format YYYY-MM-DD.
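A sketch of the conversion in base R, covering both the leading-zero repair and the Monday of the week. The function name `yyww_to_monday()` is hypothetical, and two things are assumptions rather than documented behaviour: weeks are taken to follow ISO 8601 numbering, and the century must be supplied through the made-up `century` argument because a two-digit year is ambiguous.

```r
# Hypothetical stand-in for yyww_to_yyyymmdd(). ISO 8601 week numbering and
# the `century` argument are assumptions.
yyww_to_monday <- function(yyww, century = 2000L) {
  # Restore a leading zero lost in a numeric conversion, e.g. 107 -> "0107".
  yyww <- formatC(as.integer(yyww), width = 4, flag = "0", format = "d")
  year <- century + as.integer(substr(yyww, 1, 2))
  week <- as.integer(substr(yyww, 3, 4))
  # 4 January always falls in ISO week 1; step back to that week's Monday.
  jan4 <- as.Date(sprintf("%d-01-04", year))
  week1_monday <- jan4 - (as.integer(format(jan4, "%u")) - 1)
  week1_monday + (week - 1) * 7
}

from_string <- yyww_to_monday("0107")
from_number <- yyww_to_monday(107)
# Both are the Monday of week 7 in 2001, i.e. 2001-02-12.
```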