title: A Software Tool for Exploring the Relation between Diagnostic Accuracy and Measurement Uncertainty
authors: Chatzimichail, Theodora; Hatjimihail, Aristides T.
date: 2020-08-19
journal: Diagnostics (Basel)
DOI: 10.3390/diagnostics10090610

Screening and diagnostic tests are used to classify people with and without a disease. Diagnostic accuracy measures are used to evaluate the correctness of such a classification in clinical research and practice. Although diagnostic accuracy depends on measurement uncertainty, there has been limited research on their relation. The objective of this work was to develop an exploratory tool for the relation between diagnostic accuracy measures and measurement uncertainty, as diagnostic accuracy is fundamental to clinical decision-making, while measurement uncertainty is critical to quality and risk management in laboratory medicine. For this reason, a freely available interactive program was developed for calculating, optimizing, plotting and comparing various diagnostic accuracy measures and the corresponding risk of diagnostic or screening tests measuring a normally distributed measurand, applied at a single point in time in non-diseased and diseased populations. This is done for differing prevalence of the disease, mean and standard deviation of the measurand, diagnostic threshold, standard measurement uncertainty of the tests and expected loss. The application of the program is illustrated with a case study of glucose measurements in diabetic and non-diabetic populations. The program is user-friendly and can be used as an educational and research tool in medical decision-making.

An increasing number of in vitro screening and diagnostic tests, categorized as quantitative or qualitative, are used as binary classifiers in medicine, to classify people into the non-overlapping classes of populations with and without a disease. The quantitative and many of the qualitative screening or diagnostic tests are based on measurements. There is a probability distribution of the measurements in each of the diseased and non-diseased populations. To classify patients with and without a disease using a test based on a measurement, a diagnostic threshold or cutoff point is defined. If the measurement is above the threshold, the patient is classified as test-positive; otherwise, the patient is classified as test-negative (Figure 1), or inversely. The possible test results are summarized in Table 1.

From the large number of diagnostic accuracy measures (DAM) appearing in the literature, only a few are used for evaluating diagnostic accuracy in clinical research and practice [1]. These include the following:
1. Sensitivity (Se), specificity (Sp), diagnostic odds ratio (DOR) and the likelihood ratios for a positive or negative result (LR+ and LR−, respectively), which are defined conditionally on the true disease status [2] and are prevalence-invariant.
2. Overall diagnostic accuracy (ODA), which is defined conditionally on the true disease status and is prevalence-dependent.
3. Positive predictive value and negative predictive value (PPV and NPV), which are defined conditionally on the test outcome and are prevalence-dependent.
The natural frequency and the equivalent probability definitions of the diagnostic accuracy measures derived from Table 1 and analyzed by the program are presented in Table 2. The symbols are explained in Appendix A.
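As an illustration of the natural frequency definitions in Table 2, the following Python sketch computes the basic diagnostic accuracy measures from the counts of a 2×2 contingency table such as Table 1. It is an illustrative example under the stated definitions, not part of the authors' Wolfram Language program, and the function and variable names are ours.

```python
# Illustrative sketch (not the authors' program): basic diagnostic accuracy
# measures from the counts of a 2x2 contingency table (Table 1).

def diagnostic_accuracy_measures(tp, fp, fn, tn):
    """Return a dict of DAM computed from true/false positive/negative counts."""
    n = tp + fp + fn + tn
    se = tp / (tp + fn)            # sensitivity: P(T+ | diseased)
    sp = tn / (tn + fp)            # specificity: P(T- | non-diseased)
    return {
        "prevalence": (tp + fn) / n,
        "Se": se,
        "Sp": sp,
        "ODA": (tp + tn) / n,          # overall diagnostic accuracy
        "PPV": tp / (tp + fp),         # positive predictive value
        "NPV": tn / (tn + fn),         # negative predictive value
        "LR+": se / (1 - sp),          # likelihood ratio, positive result
        "LR-": (1 - se) / sp,          # likelihood ratio, negative result
        "DOR": (tp * tn) / (fp * fn),  # diagnostic odds ratio
    }

# toy counts, chosen only to exercise the function
print(diagnostic_accuracy_measures(tp=90, fp=50, fn=10, tn=850))
```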
Receiver operating characteristic (ROC) curves are also used for the evaluation of the diagnostic performance of a screening or diagnostic test [3]. ROC curves are plots of Se against 1 − Sp of the test. A related summary measure of diagnostic accuracy is the area under a ROC curve (AUC) [4,5]. The area over a ROC curve (AOC) has been proposed as a complementary summary measure of diagnostic inaccuracy [6]. Recently, predictive receiver operating characteristic (PROC) curves have also been proposed; PROC curves are plots of PPV against 1 − NPV of the test [2].

For the optimization of binary classifiers, objective or loss functions have been proposed. They are based on diagnostic accuracy measures that can be maximized or minimized by finding the optimal diagnostic threshold. These measures include Youden's index (J) [7], the Euclidean distance of a ROC curve point from the point (0, 1) (ED) [8] and the concordance probability measure (CZ) [9]. The abovementioned measures are defined conditionally on the true disease status and are prevalence-invariant. Their respective probability and natural frequency definitions are presented in Table 2.
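To make the ROC and PROC constructions concrete, here is a minimal Python sketch, again ours rather than the authors' implementation, that turns sensitivity and specificity values obtained over a grid of thresholds, together with an assumed prevalence, into ROC points (1 − Sp, Se) and PROC points (1 − NPV, PPV), and estimates the AUC with the trapezoidal rule; the numerical values are toy inputs.

```python
# Minimal sketch (assumed example, not the authors' implementation):
# ROC points (1 - Sp, Se), PROC points (1 - NPV, PPV) and a trapezoidal AUC
# from sensitivity/specificity evaluated over a grid of thresholds.
import numpy as np

def roc_proc_points(se, sp, prevalence):
    se, sp = np.asarray(se, float), np.asarray(sp, float)
    ppv = prevalence * se / (prevalence * se + (1 - prevalence) * (1 - sp))
    npv = (1 - prevalence) * sp / ((1 - prevalence) * sp + prevalence * (1 - se))
    roc = np.column_stack([1 - sp, se])        # (x, y) = (1 - Sp, Se)
    proc = np.column_stack([1 - npv, ppv])     # (x, y) = (1 - NPV, PPV)
    return roc, proc

def trapezoidal_auc(roc):
    # sort by the x coordinate (1 - Sp) before integrating
    order = np.argsort(roc[:, 0])
    x, y = roc[order, 0], roc[order, 1]
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2))

# toy values of Se and Sp over five thresholds, assumed prevalence 10%
se = [0.99, 0.95, 0.85, 0.60, 0.30]
sp = [0.20, 0.55, 0.80, 0.95, 0.99]
roc, proc = roc_proc_points(se, sp, prevalence=0.10)
print(trapezoidal_auc(roc))
```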
The risk of a diagnostic or screening test is related to its diagnostic accuracy and is defined as its expected loss. Therefore, it depends upon the following (Table 2):
1. The expected loss for the testing procedure, for a true negative result, for a false negative result, for a true positive result and for a false positive result, defined on the same scale.
2. The probabilities of a true negative result, a false negative result, a true positive result and a false positive result.
Risk is defined conditionally on the true disease status and is prevalence-dependent.

As there is inherent variability in any measurement process, there is measurement uncertainty, which is defined as a "parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand" [10]. The parameter may be the standard measurement uncertainty (u), expressed as a standard deviation and estimated as described in "Expression of Measurement Uncertainty in Laboratory Medicine" [11]. Bias may be considered a component of the standard measurement uncertainty [12]. The measurement uncertainty concept is gradually replacing the total analytical error concept [13].

Although the estimation of measurement uncertainty is essential for quality assurance in laboratory medicine [11], its effect on clinical decision-making, and consequently on clinical outcomes, is rarely quantified [14]. As direct-outcome studies are very complex, a feasible first step is exploring the effect of measurement uncertainty on misclassification [15] and subsequently on diagnostic accuracy measures and the corresponding risk. Exploring this relation could assist the estimation of the optimal diagnostic threshold or of the permissible measurement uncertainty.

For the calculation of the diagnostic accuracy measures, the following is assumed:
1. There is a reference ("gold standard") diagnostic method that correctly classifies a subject as diseased or non-diseased [16].
2. The parameters of the distributions of the measurand are known.
3. Either the values of the measurand or their transforms [17,18] are normally distributed in each of the diseased and non-diseased populations.
4. The measurement uncertainty is normally distributed and homoscedastic in the diagnostic threshold's range.
5. If the measurement is above the threshold, the patient is classified as test-positive; otherwise, as test-negative.

Hereafter, we use the term measurand to describe either the normally distributed value of a measurand or its normally distributed applicable transform. Consequently, if σ is the standard deviation of the measurements of a screening or diagnostic test applied in a population (P), u the standard measurement uncertainty and σ_P the standard deviation of the measurand in the population, then we get the following equation:

σ = √(σ_P² + u²)    (1)

The definitions of the diagnostic accuracy measures can be expressed in terms of sensitivity (Se) and specificity (Sp). These definitions are derived from Table 2 and presented in Table 3, where v denotes the prevalence of the disease.

Table 3. Definitions of diagnostic accuracy measures against sensitivity and specificity.
Positive predictive value (PPV): v·Se / (v·Se + (1 − v)·(1 − Sp))
Diagnostic odds ratio (DOR): (Se·Sp) / ((1 − Se)·(1 − Sp))
Youden's index (J): Se + Sp − 1
The symbols are explained in Appendix A.
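As a worked illustration of the risk definition above, the following Python sketch computes the expected loss of a test from the prevalence, sensitivity, specificity and the losses assigned to the testing procedure and to each of the four outcomes; the particular loss values are assumptions for the example, not settings from the paper.

```python
# Illustrative sketch of risk as expected loss (assumed parameterization,
# not the authors' exact formulation): the loss of the testing procedure plus
# the outcome losses weighted by the outcome probabilities of Table 2.

def risk(prevalence, se, sp, l_test=0.0, l_tp=0.0, l_fn=1.0, l_tn=0.0, l_fp=0.1):
    p_tp = prevalence * se                # P(diseased and test-positive)
    p_fn = prevalence * (1 - se)          # P(diseased and test-negative)
    p_tn = (1 - prevalence) * sp          # P(non-diseased and test-negative)
    p_fp = (1 - prevalence) * (1 - sp)    # P(non-diseased and test-positive)
    return l_test + l_tp * p_tp + l_fn * p_fn + l_tn * p_tn + l_fp * p_fp

# example with a false negative assumed ten times as costly as a false positive
print(risk(prevalence=0.067, se=0.90, sp=0.95))
```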
The functions of sensitivity (Se) and specificity (Sp), and hence the functions of all the above diagnostic accuracy measures, can be expressed in terms of the cumulative distribution function of the normal distribution and therefore of the error function and the complementary error function. The error function, erf(x), is defined as follows:

erf(x) = (2/√π) ∫₀ˣ e^(−t²) dt    (2)

while the complementary error function, erfc(x), is defined as follows:

erfc(x) = 1 − erf(x)    (3)

Following the definition of the sensitivity and specificity of a test (Table 2), the respective functions against the diagnostic threshold (d) are calculated as follows:

Se(d) = 1 − Ψ(d; μ_D, √(σ_D² + u²))    (4)

Sp(d) = Ψ(d; μ_D̄, √(σ_D̄² + u²))    (5)

where Ψ denotes the cumulative distribution function of a normal distribution; μ_D the mean and σ_D the standard deviation of the measurand of the test in the diseased population; μ_D̄ the mean and σ_D̄ the standard deviation of the measurand of the test in the non-diseased population; and u the standard measurement uncertainty of the test. Then, the sensitivity function of a test against its specificity (z) is calculated as follows:

Se(z) = 1 − Ψ(Ψ⁻¹(z; μ_D̄, √(σ_D̄² + u²)); μ_D, √(σ_D² + u²))    (6)

The specificity function of a single test against its sensitivity (y) is calculated as follows:

Sp(y) = Ψ(Ψ⁻¹(1 − y; μ_D, √(σ_D² + u²)); μ_D̄, √(σ_D̄² + u²))    (7)

Following Table 3 and Equations (4)-(7), the diagnostic accuracy measures of a test are defined as functions of either its diagnostic threshold, sensitivity, or specificity. Consequently, the derived parametric equations defining each measure can be used to explore the relations between any two measures.

Following the definition of the ROC curves and assuming a normal probability density function of the measurand in each of the diseased and non-diseased populations, the ROC function is calculated as follows:

roc(t; μ_D, μ_D̄, σ_D, σ_D̄, u) = S(S⁻¹(t; μ_D̄, √(σ_D̄² + u²)); μ_D, √(σ_D² + u²))    (8)

where S denotes the survival function of a normal distribution and t = 1 − Sp. Consequently, we get the following:

roc(t; μ_D, μ_D̄, σ_D, σ_D̄, u) = Φ((μ_D − μ_D̄ + √(σ_D̄² + u²)·Φ⁻¹(t)) / √(σ_D² + u²))    (9)

where Φ denotes the cumulative distribution function of the standard normal distribution. The function of the area under the ROC curve is defined as follows:

AUC = ∫₀¹ roc(t; μ_D, μ_D̄, σ_D, σ_D̄, u) dt    (10)

Moreover, it is calculated as follows:

AUC = Φ((μ_D − μ_D̄) / √(σ_D² + σ_D̄² + 2u²))    (11)

The function of the area over the ROC curve is defined as follows:

AOC = 1 − AUC    (12)

Another ROC curve related quantity is the Euclidean distance (ED) of a ROC curve point (t, roc(t; μ_D, μ_D̄, σ_D, σ_D̄, u)) from the point (0, 1), or equivalently the Euclidean distance of the point (Se, Sp) from the point (1, 1) of perfect diagnostic accuracy. The respective function is defined as follows:

ED = √((1 − Se)² + (1 − Sp)²)    (13)

The predictive ROC (PROC) curve relation, defined in [2], plots PPV against 1 − NPV, with both expressed through Se, Sp and the prevalence (Table 3). This relation cannot be expressed in terms of elementary or survival functions.

To explore the relation between diagnostic accuracy measures or the corresponding risk and measurement uncertainty, an interactive program written in the Wolfram Language [19] was developed in Wolfram Mathematica®, ver. 12.1 [20]. This program was designed to provide five modules and six submodules for calculating, optimizing, plotting and comparing various diagnostic accuracy measures and the corresponding risk of two screening or diagnostic tests, applied at a single point in time in non-diseased and diseased populations (Figure 2). The two tests measure the same measurand, for varying values of the prevalence of the disease, the mean and standard deviation of the measurand in the populations and the standard measurement uncertainty of the tests. The two tests differ in measurement uncertainty. It is assumed that the measurands and the measurement uncertainty are normally distributed.
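The following Python sketch, written under the binormal assumptions described above and not taken from the authors' Wolfram Language program, evaluates Equations (4), (5) and (11) with the standard measurement uncertainty added in quadrature, and finds the diagnostic threshold that maximizes Youden's index numerically; the population parameters are hypothetical normalized values.

```python
# Sketch under the stated binormal assumptions (not the authors' program):
# Se, Sp and AUC with standard measurement uncertainty u added in quadrature,
# and the diagnostic threshold maximizing Youden's index J = Se + Sp - 1.
from math import sqrt
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def se(d, mu_d, sd_d, u):
    return norm.sf(d, loc=mu_d, scale=sqrt(sd_d**2 + u**2))

def sp(d, mu_nd, sd_nd, u):
    return norm.cdf(d, loc=mu_nd, scale=sqrt(sd_nd**2 + u**2))

def auc(mu_d, sd_d, mu_nd, sd_nd, u):
    return norm.cdf((mu_d - mu_nd) / sqrt(sd_d**2 + sd_nd**2 + 2 * u**2))

def youden_optimal_threshold(mu_d, sd_d, mu_nd, sd_nd, u):
    # search between the two means, where the Youden optimum is expected
    objective = lambda d: -(se(d, mu_d, sd_d, u) + sp(d, mu_nd, sd_nd, u) - 1)
    return minimize_scalar(objective, bounds=(mu_nd, mu_d), method="bounded").x

# hypothetical normalized parameters (units of the non-diseased SD)
mu_nd, sd_nd, mu_d, sd_d, u = 0.0, 1.0, 3.0, 1.5, 0.23
d_opt = youden_optimal_threshold(mu_d, sd_d, mu_nd, sd_nd, u)
print(d_opt, se(d_opt, mu_d, sd_d, u), sp(d_opt, mu_nd, sd_nd, u),
      auc(mu_d, sd_d, mu_nd, sd_nd, u))
```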
Parts of this program have been presented in a series of demonstrations at the Wolfram Demonstrations Project of Wolfram Research [6,21-27]. The program is freely available as a Wolfram Mathematica® notebook (.nb) at https://www.hcsl.com/Tools/Relation.nb. It can be run with Wolfram Player® or Wolfram Mathematica® (see Appendix B). A detailed description of the interface of the program is available as Supplementary Material.

The modules and the submodules of the program include panels with controls that allow the interactive manipulation of various parameters, as described in detail in the Supplementary Material. These are the following:

The receiver operating characteristic (ROC) curves or the predictive receiver operating characteristic (PROC) curves of the two tests are plotted. A table with the respective AUC and AOC and their relative difference is also presented with the ROC curves plot (Figure 3).

Another module includes the following submodules:
• Diagnostic accuracy measures against diagnostic threshold: the values of the diagnostic accuracy measures or the corresponding risk of the two tests, their partial derivatives with respect to standard measurement uncertainty, their difference, relative difference and ratio are plotted against the diagnostic threshold of each test (Figure 4).
• Diagnostic accuracy measures against prevalence: the values of the diagnostic accuracy measures or the corresponding risk of the two tests, their partial derivatives with respect to standard measurement uncertainty, their difference, relative difference and ratio are plotted against the prevalence of the disease (Figure 5).
• Diagnostic accuracy measures against standard measurement uncertainty: the values of the diagnostic accuracy measures or the corresponding risk of a test are plotted against the standard measurement uncertainty of the test (Figure 6); a brief illustrative sketch of this computation follows the list.
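A brief Python sketch of what the last submodule displays, under the binormal model of the previous section and with hypothetical population parameters: it tabulates how sensitivity, specificity, overall diagnostic accuracy and Youden's index change as the standard measurement uncertainty grows at a fixed diagnostic threshold.

```python
# Illustrative sketch (hypothetical parameters, not the authors' program):
# selected diagnostic accuracy measures versus standard measurement
# uncertainty u, at a fixed diagnostic threshold d, under the binormal model.
from math import sqrt
from scipy.stats import norm

mu_nd, sd_nd = 0.0, 1.0     # non-diseased population (normalized units)
mu_d, sd_d = 3.0, 1.5       # diseased population (hypothetical)
v, d = 0.067, 2.26          # prevalence and diagnostic threshold

for u in [0.0, 0.023, 0.1, 0.23, 0.5]:
    se = norm.sf(d, loc=mu_d, scale=sqrt(sd_d**2 + u**2))
    sp = norm.cdf(d, loc=mu_nd, scale=sqrt(sd_nd**2 + u**2))
    oda = v * se + (1 - v) * sp      # overall diagnostic accuracy
    j = se + sp - 1                  # Youden's index
    print(f"u={u:5.3f}  Se={se:.4f}  Sp={sp:.4f}  ODA={oda:.4f}  J={j:.4f}")
```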
Another module includes the following submodules:
• Diagnostic accuracy measures against sensitivity or specificity: the values of the diagnostic accuracy measures or the corresponding risk of the two tests, their partial derivatives with respect to standard measurement uncertainty, their difference, relative difference and ratio are plotted against either the sensitivity or the specificity of each test (Figure 7).
• Diagnostic accuracy measures against sensitivity and specificity: the values of the diagnostic accuracy measures or the corresponding risk of the two tests, or their partial derivatives with respect to standard measurement uncertainty, are plotted against the sensitivity and the specificity of each test in three-dimensional line plots (Figure 8).

The values of various diagnostic accuracy measures and the corresponding risk of each of the two tests, and their respective relative differences, at a selected diagnostic threshold, are calculated and presented in a table (Figure 10).

An optimal diagnostic threshold for each test is calculated according to a selected objective or loss function. Then the values of various diagnostic accuracy measures and the corresponding risk of each of the two tests, at the respective optimal threshold, are presented in a table (Figure 11).

The program was applied to a bimodal joint distribution, based on log-transformed blood glucose measurements in non-diabetic and diabetic Malay populations, during an oral glucose tolerance test (OGTT) [28]. Briefly, after the ingestion of 75 g of glucose monohydrate, the two-hour postprandial blood glucose of 2667 Malay adults, aged 40-49 years, was measured with reflectance photometry.
To apply the program, it was assumed that the prevalence of diabetes was 0.067, the measurement coefficient of variation and bias were equal to 4% and 2%, respectively, and the log-transformed measurands of each population were normally distributed, as shown in Figure 1. The normalized log-transformed measurand means and standard deviations in the diseased and non-diseased populations, the standard measurement uncertainty and the diagnostic threshold were expressed in units equal to the standard deviation of the log-transformed measurand in the non-diseased population. The normalized log-transformed diagnostic threshold of 2.26 corresponds to the American Diabetes Association (ADA) diagnostic threshold for diabetes of the two-hour postprandial glucose during an OGTT, which is equal to 11.1 mmol/L [29]. The normalized log-transformed standard measurement uncertainties of 0.023 and 0.23 correspond to standard measurement uncertainties equal to 1% and 10% of the mean of the measurand of the non-diabetic population, or equivalently to coefficients of variation equal to 1% and 10%, respectively. The parameter settings of the illustrative case study are presented in Table 4.

The results of the application of the program are presented:
1. In the plots of Figures 3-9 and 12-17.
2. In the tables of Figures 10 and 11.
3. In Table 5.

Table 5. The optimal diagnostic thresholds, with the respective parameters in Table 4. The symbols of the settings column are explained in Appendix A.

Figure 14. DAM relative differences against prevalence plots. Plots of the relative difference of the (a) positive predictive value (PPV), (b) negative predictive value (NPV), (c) overall diagnostic accuracy (ODA) and (d) risk (R) of two diagnostic or screening tests measuring the same measurand with different uncertainties, against prevalence (v) curves, with the respective parameters in Table 4.
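Returning to the case-study settings, the normalization described above can be reproduced with a short Python sketch; the mean and standard deviation of the log-transformed glucose in the non-diabetic population used below are illustrative values chosen to be consistent with the normalized settings quoted in the text, not the actual Table 4 parameters.

```python
# Sketch of the case-study normalization (illustrative population parameters
# chosen for consistency with the normalized settings quoted in the text,
# not the actual Table 4 values): the measurand is the natural log of the
# two-hour glucose, expressed in units of the non-diabetic log-scale SD.
from math import log

sd_log_nd = 0.43    # assumed SD of ln(glucose, mmol/L), non-diabetic
mu_log_nd = 1.44    # assumed mean of ln(glucose, mmol/L), non-diabetic

# ADA threshold of 11.1 mmol/L expressed in normalized log-transformed units
d_norm = (log(11.1) - mu_log_nd) / sd_log_nd

# a measurement CV of 1% or 10% corresponds approximately to a log-scale
# standard uncertainty of the same relative size, normalized by sd_log_nd
u_norm_1, u_norm_10 = 0.01 / sd_log_nd, 0.10 / sd_log_nd

print(f"d ~ {d_norm:.2f}, u(1%) ~ {u_norm_1:.3f}, u(10%) ~ {u_norm_10:.2f}")
```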
Figure 16. DAM partial derivatives against diagnostic threshold plots. Plots of the partial derivatives of (a) overall diagnostic accuracy (ODA), (b) Youden's index (J), (c) positive predictive value (PPV) and (d) risk (R), with respect to measurement uncertainty, of two diagnostic or screening tests measuring the same measurand with different uncertainties, against diagnostic threshold (d) curves, with the parameters in Table 4.

Figure 17. DAM relation plots, including (c) the likelihood ratio for a negative result (LR−) and risk (R), and (d) the Euclidean distance (ED) and the diagnostic odds ratio (DOR), of two diagnostic or screening tests measuring the same measurand with different uncertainties, with the respective parameters in Table 4.

In this case, the measurement uncertainty has relatively little effect on the ROC and PROC curves and on the AUC, sensitivity, specificity, overall diagnostic accuracy, positive predictive value, negative predictive value, Euclidean distance and concordance probability of the test, in accordance with previous findings [30,31]. Measurement uncertainty has a relatively greater effect on the diagnostic odds ratio, the likelihood ratios for a positive or negative result, Youden's index and risk. As a result, the measurement uncertainty has relatively little effect on the optimal diagnostic thresholds maximizing Youden's index or the concordance probability or minimizing the Euclidean distance. Conversely, it has a relatively greater effect on the optimal diagnostic thresholds minimizing risk (Table 5).

The purpose of this program is to explore the relation between diagnostic accuracy measures and measurement uncertainty, as diagnostic accuracy is fundamental to clinical decision-making, while defining the permissible measurement uncertainty is critical to quality and risk management in laboratory medicine. The current pandemic of the novel coronavirus disease 2019 (COVID-19) has demonstrated this convincingly [32-37]. There has been extensive research on either diagnostic accuracy or measurement uncertainty; however, research addressing both subjects is very limited [14,38,39].

This program demonstrates the relation between the diagnostic accuracy measures and the measurement uncertainty for screening or diagnostic tests measuring a single measurand (Figures 3-17). This relation depends on the population parameters, including the prevalence of the disease (Figures 5 and 14), and on the diagnostic threshold (Figures 4, 15 and 16). In addition, measurement uncertainty affects the relation between any two of the diagnostic accuracy measures (Figures 7-9 and 17). As the program provides plots of the partial derivatives of the diagnostic accuracy measures with respect to the standard measurement uncertainty, it offers more detailed insight (Figure 16).
Despite the complexity of the relation, the program simplifies its exploration with a user-friendly interface. Furthermore, it provides calculators for the effects of measurement uncertainty on the diagnostic accuracy measures and the corresponding risk (Figure 10) and for the diagnostic threshold optimizing the objective and loss functions of Section 1 (Figure 11).

The counterintuitive finding that measurement uncertainty has relatively little effect on the ROC and PROC curves, the AUC, sensitivity, specificity, overall diagnostic accuracy, positive predictive value, negative predictive value, Euclidean distance and concordance probability suggests that we should reconsider their interpretation in medical decision-making. However, further research is needed to explore the effect of measurement uncertainty on diagnostic accuracy measures with different clinically and laboratory relevant parameter settings. Furthermore, clinical laboratories should consider including measurement uncertainty in each test result report.

Compared to the risk measure, a shortcoming of Youden's index, the Euclidean distance of a ROC curve point from the point (0, 1) and the concordance probability as objective functions is that they do not differentiate the relative significance of a true negative and a true positive test result, or equivalently of a false negative and a false positive test result. Accordingly, in the case study, the optimal diagnostic thresholds maximizing Youden's index or the concordance probability or minimizing the Euclidean distance are considerably lower than the ADA diagnostic threshold for diabetes of the two-hour postprandial glucose during an OGTT (Table 5). Nevertheless, the optimal diagnostic threshold minimizing the risk can be close to the ADA threshold, with specific expected loss settings (Figure 11). Although risk assessment is evolving as the preferred method for the optimization of medical decision-making [40] and for quality assurance in laboratory medicine [41], the estimation of the expected loss for each test result (Tables 2 and 3) is still a complex task. In the future, as the potential of data analysis increases, expected loss could be estimated using evidence-based methods.

Shortcomings of this program are the following assumptions used for the calculations:
1. The existence of a "gold standard" diagnostic method. If a "gold standard" does not exist, there are alternative approaches for the estimation of diagnostic accuracy measures [42].
2. The parameters of the distributions of the measurand are assumed to be known. In practice, they are estimated [43].
3. The normality of either the measurements or their applicable transforms [17,18,44,45]; this assumption is usually valid. There is related literature on the distribution of measurements of diagnostic tests, in the context of reference intervals and diagnostic thresholds or clinical decision limits [46-50].
4. The bimodality of the distribution of the measurands, which is generally accepted, although unimodal distributions could be considered [51,52].
5. The homoscedasticity of the measurement uncertainty in the diagnostic threshold's range. If measurement uncertainty is heteroscedastic, thus skewing the measurement distribution, appropriate transformations may restore homoscedasticity [53] (a brief transformation sketch follows this list).
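A minimal Python sketch of such a normalizing transformation, using the Box-Cox family discussed in the references above and applied to simulated right-skewed data; it is a generic illustration, not part of the authors' program.

```python
# Minimal sketch (generic illustration, not the authors' program): a Box-Cox
# transformation applied to simulated right-skewed measurements, so that the
# binormal model of Section 2 can be applied to the transformed values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
measurements = rng.lognormal(mean=1.5, sigma=0.4, size=500)   # skewed data

transformed, lam = stats.boxcox(measurements)   # lambda fitted by maximum likelihood

print(f"estimated Box-Cox lambda: {lam:.3f}")
print(f"skewness before: {stats.skew(measurements):.3f}, "
      f"after: {stats.skew(transformed):.3f}")
```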
As the program neither estimates the parameters of the distributions of the measurand nor calculates any confidence intervals, it is not intended to analyze samples of measurements, but to be used as an educational and research tool to explore and analyze the relation between diagnostic accuracy measures and measurement uncertainty. All major general or medical statistical software packages (Matlab®, NCSS®, R, SAS®, SPSS®, Stata® and MedCalc®) include routines for the calculation and plotting of various diagnostic accuracy measures and their confidence intervals. The program presented in this work provides 269 different types of plots of diagnostic accuracy measures (Figure 2), many of which are novel. To the best of our knowledge, none of the abovementioned programs, or any other software, provides this range of plots without advanced statistical programming.

The program developed for this work clearly demonstrates various aspects of the relation between diagnostic accuracy measures and measurement uncertainty and can be used as a flexible, user-friendly, interactive educational or research tool in medical decision-making, to explore and analyze this relation.

References:
- Measures of diagnostic accuracy: Basic definitions.
- The predictive receiver operating characteristic curve for the joint assessment of the positive and negative predictive values.
- Statistical approaches to the analysis of receiver operating characteristic (ROC) curves. Med. Decis. Mak.
- The meaning and use of the area under a receiver operating characteristic (ROC) curve.
- The area under the ROC curve and its competitors.
- The Area Over a Receiver Operating Characteristic (ROC) Curve as an Index of Diagnostic Inaccuracy: Wolfram Demonstrations Project.
- Index for rating diagnostic tests.
- The choice of methods in determining the optimal cut-off value for quantitative diagnostic test evaluation.
- Classification accuracy and cut point selection.
- Evaluation of Measurement Data-Guide to the Expression of Uncertainty in Measurement.
- Expression of Measurement Uncertainty in Laboratory Medicine; Approved Guideline.
- Basics of estimating measurement uncertainty.
- Total error vs. measurement uncertainty: Revolution or evolution?
- Toward a framework for outcome-based analytical performance specifications: A methodology review of indirect methods for evaluating the impact of measurement uncertainty on clinical outcomes.
- Criteria for assigning laboratory measurands to models for analytical performance specifications defined in the 1st EFLM Strategic Conference.
- Comparing two diagnostic tests against the same "Gold Standard" in the same sample.
- The Box-Cox transformation technique: A review.
- A generalised Box-Cox transformation for the parametric estimation of clinical reference intervals.
- An Elementary Introduction to the Wolfram Language.
- Receiver Operating Characteristic Curves and Uncertainty of Measurement: Wolfram Demonstrations Project.
- Uncertainty of Measurement and Areas Over and Under the ROC Curves: Wolfram Demonstrations Project.
- Uncertainty of Measurement and Diagnostic Accuracy Measures: Wolfram Demonstrations Project.
- Analysis of Diagnostic Accuracy Measures: Wolfram Demonstrations Project.
- Calculator for Diagnostic Accuracy Measures: Wolfram Demonstrations Project.
- Correlation of Positive and Negative Predictive Values of Diagnostic Tests: Wolfram Demonstrations Project.
- Calculation of Diagnostic Accuracy Measures: Wolfram Demonstrations Project.
- Bimodality in blood glucose distribution: Is it universal? Diabetes Care.
- American Diabetes Association. 2. Classification and diagnosis of diabetes: Standards of medical care in diabetes-2019.
- Influence of imprecision on ROC curve analysis for cardiac markers.
- Assessment of the Diagnostic Accuracy of Laboratory Tests Using Receiver Operating Characteristic Curves; Approved Guideline.
- Potential preanalytical and analytical vulnerabilities in the laboratory diagnosis of coronavirus disease 2019 (COVID-19).
- The laboratory diagnosis of COVID-19 infection: Current issues and challenges.
- Diagnosis of SARS-CoV-2 infection and COVID-19: Accuracy of signs and symptoms; molecular, antigen and antibody tests; and routine laboratory markers.
- Diagnostic accuracy of an automated chemiluminescent immunoassay for anti-SARS-CoV-2 IgM and IgG antibodies: An Italian experience.
- "Unacceptable" that antibody test claims cannot be scrutinised, say experts.
- Antibody tests in detecting SARS-CoV-2 infection: A meta-analysis.
- Uncertainty in measurement and total error: Tools for coping with diagnostic uncertainty.
- Measurement uncertainty in laboratory reports: A tool for improving the interpretation of test results.
- Risk, complexity, decision making and patient care.
- Estimation of the optimal statistical quality control sampling time intervals using a residual risk measure.
- Estimating diagnostic accuracy without a gold standard: A continued controversy.
- The Box-Cox transformation: Review and extensions.
- An analysis of transformations.
- Approved recommendation (1987) on the theory of reference values. Part 5. Statistical treatment of collected reference values. Determination of reference limits.
- Reference interval computation: Which method (not) to choose?
- Application of the Stockholm hierarchy to defining the quality of reference intervals and clinical decision limits.
- A systematic review of statistical methods used in constructing pediatric reference intervals.
- Distinguishing reference intervals and clinical decision limits: A review by the IFCC Committee on Reference Intervals and Decision Limits.
- Principles and Practice of Screening for Disease.
- 2.3 Clinical test evaluation. Unimodal and bimodal approaches. Scand.
- Why do we need the uncertainty factor?

This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.