This document explains the rationale behind the development of this algorithm. Much of the text was taken from Anders Aasted Isaksen’s PhD thesis and the validation paper (1); this document is a shorter, more concise version of those sources. It covers the following:
Many types of individual-level data (e.g., civil registration, public healthcare contacts, and drug prescriptions) are automatically collected on all residents in Denmark and stored in nationwide Danish registers by Statistics Denmark (www.dst.dk; the URL often hits redirect limits, so we cannot link to it directly) and the Danish Health Data Authority. These agencies are legally allowed to give access to the register data for research purposes, which provides authorized researchers with a common set of extensive data sources to use in their studies. Any researcher affiliated with an approved Danish research institution (mainly Danish universities) can apply for access, but fees and conditions apply.
Register data is generally accessed and processed by approved researchers on remote servers operated by Statistics Denmark and the Danish Health Data Authority. Because all researchers work with the same raw data in a common virtual working environment, there is potential for reproducible research: any data processing workflow could be transferred and reused between research projects, provided the underlying code is designed with reproducibility in mind and is shared (“open-sourced”) (2). Reproducibility in research usually refers to transparent reporting of methods so that others can reproduce analyses and experiments, but the same principle applies to a diabetes classification program. If reproducible, such a program could be reused by any researcher with access to the necessary register data to dynamically identify a study population of individuals with diabetes for their research needs (3).
In Denmark, the National Diabetes Register, established in 2006, was the first resource readily available to researchers for identifying diabetes cases in register data (4). However, it was discontinued in 2012.
The next such resource was the Register of Selected Chronic Diseases (RSCD), launched in 2014. It is currently the only publicly available resource for identifying diabetes cases in Danish register data (access is by application to the Danish Health Data Authority).
General-purpose registers and other administrative databases often form the basis of diabetes epidemiology, but they rarely contain validated diabetes-specific data, which may introduce bias into studies that use them. An accurate tool for identifying individuals with diabetes in the registers is therefore important, as findings may differ depending on the diabetes definition used (5,6). Considerable efforts have been made towards establishing such a tool for diabetes research in several countries, including Denmark (7–9).
In a general population, classification algorithms (classifiers) need not only to identify both type 1 and type 2 diabetes, but also to account for events that might lead to the inclusion of non-cases, such as the use of glucose-lowering drugs to treat other conditions. Currently, no type-specific diabetes classifier has been validated in a general population, which leaves register-based studies in this area vulnerable to bias.
In Denmark, a limitation of the RSCD is that it has not been publicly validated, and the source code behind its algorithm has not been made publicly available. Notably, the algorithm does not include individuals based on elevated HbA1c levels (10). Likewise, a validation study of the National Diabetes Register (discontinued in 2012) questioned its validity and called for future registers to include individuals based on elevated HbA1c levels (11).
Since the launch of the RSCD, nationwide laboratory data on HbA1c testing has become available in the Danish register ecosystem (12), but this data has yet to be incorporated into the available diabetes classifiers.
To take advantage of this routine HbA1c testing data, we developed the Open Source Diabetes Classifier (OSDC). A detailed discussion of the advantages and disadvantages of its design can be found in Anders Aasted Isaksen’s thesis, in the chapter discussing the methods.
In developing this algorithm, we aimed to: