The need for a prediction model assessment framework
Sheikh Mohammed Shariful Islam, Abbas Khosravi
Lancet Glob Health. Feb 10, 2021. DOI: 10.1016/s2214-109x(21)00022-x

We thank Mohammad Jalali and colleagues (December, 2020)1 for highlighting the need for transparency assessments of COVID-19 models. The authors evaluated the transparency of COVID-19 models against a 27-item binary criterion adapted from three different checklists, and reported that more than half of the studies did not share their longitudinal data and that only 14% of the studies met 90% of the transparency items on their checklist. However, the authors did not consider the MINimum Information for Medical artificial intelligence (AI) Reporting (MINIMAR) guideline,2 which provides reporting standards for AI model projections in health care. Furthermore, the checklists they used are not specific to assessing the transparency of prediction models, and it is not clear how these criteria were applied. Jalali and colleagues also argue that model developers, rather than journals, are largely responsible for providing transparency, a position we do not fully agree with, because journals also have an obligation to hold authors responsible for providing details of key materials and information.3 The recently released Consolidated Standards of Reporting Trials (CONSORT)-AI and Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT)-AI guidelines provide reporting standards for clinical trials that use AI.4 This plethora of guidelines highlights the need for consensus on a framework for prediction model assessment, with tangible indicators for model evaluation and reporting.

In addition to a transparency assessment, a prediction model without external validation and a prospective assessment of its net use is not reproducible, which undermines its scientific value.5 Before a prediction model is used for public health decision making, it should be evaluated for its real-world performance through short-term and continuous model training and optimisation to avoid distributional shift, such as model degradation caused by a change in the testing data. Moreover, it is essential to understand the model choices, degree of complexity, and assumptions, including how the model accounted for various sources of uncertainty, namely how physical distancing, mask usage, and other covariates are defined and measured.6 Furthermore, it is necessary to quantify socioeconomic factors, population behaviours, and government actions (both taken and planned) and incorporate them into the model projections, with details of the analysis. Along with proper documentation, projection models should share code, software dependencies, and datasets via open-source frameworks such as GitHub, Code Ocean, and ModelHub. It is also crucial to consider the different types of bias in data and modelling, namely measurement bias, evaluation bias, and deployment bias.

COVID-19 projection models have been used widely for public health planning and resource allocation. A scientifically validated COVID-19 projection model might help health-care policy makers to better prepare for mitigating the effects of the pandemic, make informed decisions, and take appropriate actions to save human lives. However, caution is needed when interpreting these models.
Otherwise, their misuse could lead to over-allocation or under-allocation of health-care resources, unnecessary suffering, and mistrust in models. COVID-19 projection models have not yet reported the details of the data used for the development, training, and evaluation of the models, making it difficult to assess each model's bias, fairness, and applicability. Given these limitations, there is a crucial need to develop a framework that includes transparency, reproducibility, and prospective validation to evaluate COVID-19 projection models. A multidisciplinary task force including experts in infectious disease modelling, health informatics, data science, computer science, epidemiology, statistics, health-care administration, and policy making is needed to create a set of benchmark metrics for health-care model evaluation.

References
1. Transparency assessment of COVID-19 models.
2. MINIMAR (MINimum Information for Medical AI Reporting): developing reporting standards for artificial intelligence in health care.
3. Transparency and reproducibility in artificial intelligence.
4. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension.
5. Transparency, reproducibility, and validity of COVID-19 projection models.
6. Mathematical models in the evaluation of health programmes.