Discussion
The 25 prognostic ML studies assessed in this scoping review were overwhelmingly focused on asthma, and the majority used supervised models. The studies were mainly limited by a lack of validation or prospective study, and by the absence of equations or code for replication, which are major steps required for clinical implementation. Some recent studies used data from 1–2 decades ago, which may have limited relevance to current populations for whom treatments and care have changed. Some of the models were opaque and uninterpretable, using large numbers of predictors without explaining the resulting predictions. This is especially important in healthcare, since a clinician needs to know not only who is at risk, but also what they can do to change the outcome.
A large proportion of studies did not report on the handling of missing data, which limits the transparency needed to evaluate whether sample populations are skewed, for example, towards those who are sicker and therefore have more data. Smaller datasets were typically derived from research studies, where there is greater control over the variables collected and the inclusion criteria. However, ML methods were typically developed for large datasets, and studies using national/regional databases, EHRs, or data from daily home monitoring benefit from large samples that are likely more representative of wider populations.
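For illustration, a minimal sketch of how missingness could be reported and handled, assuming a small tabular cohort with entirely hypothetical predictor names and values (not taken from any reviewed study), is:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Tiny illustrative cohort; column names and values are hypothetical.
cohort = pd.DataFrame({
    "age": [6, 9, 12, 8, np.nan, 11],
    "fev1_pct": [85, np.nan, 72, 90, 65, np.nan],
    "prior_admissions": [0, 2, 1, np.nan, 3, 1],
})

# Report the proportion of missing values per predictor, so readers can judge
# whether sicker, more intensively monitored children dominate the complete cases.
print(cohort.isna().mean().sort_values(ascending=False))

# One possible handling strategy: multivariate (iterative) imputation rather
# than discarding incomplete records.
imputed = pd.DataFrame(
    IterativeImputer(random_state=0).fit_transform(cohort),
    columns=cohort.columns,
)
```

Reporting the missingness table alongside the chosen handling strategy is the minimum needed for readers to judge representativeness.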
External validations are necessary to understand the generalisability of the predictions; however, only one was conducted. In that study, clusters of children with CF developed from Canadian data were also identified in UK data, providing evidence for the generalisability of the model.22 Internal validations were frequent, but their performance relies heavily on the definition of the outcome. If the outcome is captured somewhat subjectively, for example, prescription of medication, the resulting predictions inherit that subjectivity. This is highlighted by the two prospective studies that identified no patient benefit despite good model performance during development.24 29 If models are trained on data where the outcome is influenced by clinician decisions, it is unsurprising that they do not outperform a clinician. While such models may still benefit areas of healthcare such as easing or streamlining clinician workflow, objectively captured outcomes such as chest imaging, lung function or physiological data may result in models with greater patient benefit.
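To illustrate the distinction between the two forms of validation, a minimal sketch using scikit-learn with synthetic data standing in for a development cohort and an independently collected external cohort (all data here are simulated, not from the reviewed studies) is:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score

# Two synthetic cohorts stand in for a development dataset and an
# independently collected external dataset (eg, another country's registry).
X_dev, y_dev = make_classification(n_samples=500, n_features=8, random_state=0)
X_ext, y_ext = make_classification(n_samples=300, n_features=8, random_state=1)

model = LogisticRegression(max_iter=1000)

# Internal validation: cross-validated discrimination within the development cohort.
internal_auc = cross_val_score(model, X_dev, y_dev, cv=5, scoring="roc_auc")
print(f"Internal AUC: {internal_auc.mean():.2f} +/- {internal_auc.std():.2f}")

# External validation: fit once on all development data, then evaluate a single
# time on the external cohort to estimate generalisability.
model.fit(X_dev, y_dev)
external_auc = roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1])
print(f"External AUC: {external_auc:.2f}")
```

A gap between the internal and external estimates is exactly the kind of signal that only external validation can provide.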
This scoping review was limited in that the studies were not assessed against the full TRIPOD guidelines, and bias and clinical applicability were not assessed with the full Prediction model Risk Of Bias ASsessment Tool46 guidelines. A summarised reporting checklist was used instead, which examined the articles at an overarching rather than granular level to identify key themes. Even without detailed assessment using the full checklists, the summarised checklist revealed that studies largely failed to report key metrics or carry out key steps, so more granular investigation was not required at this point to identify shortcomings in model reporting. Development of ML prediction models remains a largely unexplored area of research in paediatric CRDs other than asthma, highlighted here by the lack of studies identified in other respiratory conditions. As research into these areas continues, and as ML prediction studies in paediatric CRDs become more frequent (72% published since 2018), it is important that the models are rigorously developed. A quality assessment tool for artificial intelligence-centred diagnostic studies is currently being developed, which, combined with the TRIPOD guidelines for prediction studies, will be useful for designing future ML prediction models with clinical implications.47
Further considerations
The lack of model implementation is a point of discussion in healthcare generally; in addition to model development and reporting, implementation requires regulatory, clinical and ethical frameworks.18 48–51 A hypothetical pathway for ML model development using these frameworks is summarised in figure 4.
Figure 4 Hypothetical framework for developing machine learning (ML) prediction models in healthcare. EHR, electronic health record; FDA, Food and Drug Administration; MHRA, Medicines and Healthcare products Regulatory Agency; PROBAST, Prediction model Risk Of Bias ASsessment Tool; TRIPOD, Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis.
Regulatory
The level of regulation and approval required for a prediction model can depend on its complexity: more complicated, uninterpretable models are classed as a medical device and must be approved by the relevant governing bodies, such as the Food and Drug Administration (FDA) in the USA or the Medicines and Healthcare products Regulatory Agency (MHRA) in the UK. A recent online database suggests that 64 AI/ML-based models, predominantly within the fields of radiology and cardiology, have been FDA approved since the first in 2016.52 Alternatively, simple models that prevent overfitting while improving interpretability may require less regulation if classed as a simple calculator. Depending on local regulations, these simple models may be deployed as an app or online calculator, or hosted on platforms such as the Programmable Interface for Statistical & Simulation Models (https://resp.core.ubc.ca/research/Specific_Projects/PRISM).
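As an illustration of why publishing the equation matters, a simple logistic risk calculator with entirely hypothetical coefficients and predictors (not taken from any reviewed study) could be shared and replicated as follows:

```python
import numpy as np

# Hypothetical published coefficients for a three-predictor logistic model
# (intercept plus age, prior admissions and baseline FEV1 % predicted).
# Publishing the equation itself is what enables replication and allows the
# model to be deployed as a simple app or online calculator.
INTERCEPT = -3.2
COEFS = {"age_years": 0.05, "prior_admissions": 0.80, "fev1_pct": -0.03}

def predicted_risk(age_years: float, prior_admissions: int, fev1_pct: float) -> float:
    """Return the predicted probability of the outcome from the linear predictor."""
    lp = (INTERCEPT
          + COEFS["age_years"] * age_years
          + COEFS["prior_admissions"] * prior_admissions
          + COEFS["fev1_pct"] * fev1_pct)
    return 1.0 / (1.0 + np.exp(-lp))

# Example use for a hypothetical child.
print(f"Risk: {predicted_risk(age_years=9, prior_admissions=2, fev1_pct=75):.1%}")
```

A calculator of this form is transparent to clinicians and patients, in contrast to opaque models whose predictions cannot be reproduced outside the original software.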
Regulatory pathways for AI-based and ML-based medical devices in the USA and Europe are outlined in Muehlematter et al.53 Generally, prospective studies and RCTs are the standards used by regulators to provide evidence for clinical decision-making.19 However, RCTs are time consuming and costly, which may explain why few were identified in this review. It has been suggested that observational real-world data from EHRs may be adequate to evaluate the performance of an ML model, and that, while useful, prognostic studies and RCTs should not be solely relied on to bring ML to clinical care.54 55 Conversely, there is discussion that observational studies are less rigorous, yield results discrepant with RCTs and should never be used to infer patient benefit.56 However, as EHRs and big data in healthcare accumulate and become increasingly representative of wider populations, it seems appropriate that methods to evaluate clinical effectiveness from observational data are given due consideration and acknowledged as a valuable resource complementary to RCTs. Appropriate design and methodology for evaluating ML models in any RCT of their clinical utility will be an important discussion moving forward, and guidelines for model evaluation in EHR studies, mutually agreed by regulators and clinicians, are necessary.
Patient safety, accountability and liability are further major considerations for implementation. A recent review suggested that the allocation of responsibility in ML models is not clear, and that stronger guidelines are necessary to understand which stakeholders are responsible should an ML model contribute to patient harm.57 Decision support tools, which aid clinicians in their assessment of disease severity through associated risks, may require less accountability than decision-making tools, where the model becomes automated and suggests or delivers treatments depending on thresholds of biomarkers or symptoms.48 Decision support tools are more likely than decision-making tools to be fully realised in the short term, since a clinician still acts as the final decision maker and is ultimately responsible. Without clearer regulations surrounding accountability and liability, and clearer frameworks for determining the patient safety and benefit of ML models, the potential of decision-making tools is yet to be fully realised given the high risk of an erroneous prediction.58
Clinical
Implementation also requires the confidence of clinicians, and clinician involvement during model development is essential. Especially in respiratory disease, prior research has generated ample knowledge on contributors to poor outcomes, which should not be ignored in model development or assessment. Combining clinical knowledge with ML may improve both performance and clinical trust in models, better facilitating their adoption in clinical care.
There is currently a lack of knowledge translation and implementation science between data scientists and clinicians, both of which need to be integrated into model development. Qualitative research may be necessary to gauge the acceptance and potential utility of predictive models before they are developed.
Ethical
ML algorithms have been known to amplify or create health disparities among marginalised groups. Ethical concerns can arise at every step of ML model development, including the selection and funding of the problem, collection of data, definition of the outcome, algorithm development and algorithm monitoring post deployment.59 These issues can arise from inconsistencies in access to healthcare or under-representation of certain groups in particular centres, which is reflected in the data used to train models. Including variables such as gender or ethnicity directly in the model to account for marginalised groups is not always best practice and may perpetuate these biases. A review detailing a roadmap for responsible and ethical ML in healthcare is useful for addressing some of these concerns.60 Diversity, equity and inclusion should be considered at every step of ML model development.
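One practical alternative to including a sensitive attribute as a predictor is to retain it only for evaluation and audit model performance within each subgroup; a minimal sketch with entirely hypothetical data is:

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical held-out predictions; the sensitive attribute is kept only
# for evaluation, not used as a model input.
results = pd.DataFrame({
    "y_true": [0, 1, 0, 1, 1, 0, 1, 0],
    "y_prob": [0.1, 0.8, 0.3, 0.6, 0.4, 0.2, 0.9, 0.5],
    "ethnicity": ["A", "A", "A", "A", "B", "B", "B", "B"],
})

# Audit discrimination separately in each subgroup; large gaps flag groups
# for whom the model may under-perform and should prompt further review
# before, and monitoring after, deployment.
for group, subset in results.groupby("ethnicity"):
    auc = roc_auc_score(subset["y_true"], subset["y_prob"])
    print(f"Group {group}: AUC = {auc:.2f}, n = {len(subset)}")
```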
Opportunities with EHRs
The opportunity for ML to support clinical decisions has become more pronounced with the adoption of EHRs in healthcare systems.61 EHRs are often unstructured and inconsistently captured; however, they are a rich, real-world source of vast amounts of clinical data useful for uncovering meaningful patterns. Data infrastructure plays a key role in harnessing EHRs, enabling the extraction, processing and analysis of large volumes of data. Feasibility and interoperability between data systems are important for this process, and standards such as Fast Healthcare Interoperability Resources (FHIR) should be considered (https://www.hl7.org/fhir/).
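As a sketch of what FHIR-based interoperability looks like in practice, the standard FHIR REST search can retrieve a patient's Observation resources as a JSON Bundle; the server URL and patient identifier below are illustrative placeholders (a public test endpoint), not a recommended deployment, and a real system would point at the local EHR's FHIR endpoint with appropriate authentication:

```python
import requests

# Example public test server and a hypothetical patient identifier.
BASE_URL = "https://hapi.fhir.org/baseR4"
PATIENT_ID = "example"

# Standard FHIR REST search: Observation resources for one patient,
# returned as a JSON Bundle.
resp = requests.get(
    f"{BASE_URL}/Observation",
    params={"patient": PATIENT_ID, "_count": 10},
    headers={"Accept": "application/fhir+json"},
    timeout=30,
)
resp.raise_for_status()
bundle = resp.json()

# Print a simple summary of each returned observation.
for entry in bundle.get("entry", []):
    obs = entry["resource"]
    code = obs.get("code", {}).get("text", "unknown")
    value = obs.get("valueQuantity", {}).get("value")
    print(code, value)
```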
With appropriate infrastructure, a streamlined process between data capture, analytics and implementation can exist to predict outcomes for patient data at a new clinical encounter, or to visualise patient trajectories over time to support or inform clinical practice (figure 5). As EHR data grow over time, the algorithms can and should be updated to reflect newer cohorts or to include new information. This process is easily severed if steps for implementation are not considered or followed through, which risks an abundance of models that fail to be implemented into clinical practice. It is therefore necessary that models are developed to be generalisable, unbiased and interpretable, with good clinical performance, and that regulatory, clinical and ethical frameworks for implementation are considered.
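A minimal sketch of such periodic updating, assuming a hypothetical EHR extract with an encounter-year column, a binary outcome column and a pre-agreed performance threshold (all names illustrative), might be:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def refresh_model(ehr_extract: pd.DataFrame, predictors, outcome="poor_outcome",
                  window_years=5, min_auc=0.75):
    """Refit on the most recent encounters and release the update only if it
    still meets a pre-agreed performance threshold on held-out recent data."""
    recent = ehr_extract[ehr_extract["encounter_year"] >=
                         ehr_extract["encounter_year"].max() - window_years]
    # Hold out the most recent year to check performance before deployment.
    train = recent[recent["encounter_year"] < recent["encounter_year"].max()]
    holdout = recent[recent["encounter_year"] == recent["encounter_year"].max()]

    model = LogisticRegression(max_iter=1000)
    model.fit(train[predictors], train[outcome])

    auc = roc_auc_score(holdout[outcome],
                        model.predict_proba(holdout[predictors])[:, 1])
    # Keep the currently deployed model if the refreshed one falls short.
    return model if auc >= min_auc else None
```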
Figure 5 Opportunity for machine learning (ML) in clinical care with the availability of electronic health record (EHR) systems. Patient data at each clinical encounter are stored in secure databases or analytical platforms, which have the capacity or potential to be accessed by researchers. Open-source software in healthcare means collaboration is more feasible in developing ML models. Following the appropriate steps for regulation and implementation, the resulting algorithms can be fed back into EHR systems to calculate the risk of poor outcomes for incoming data from a new clinical encounter. This can be accessed by clinicians to support or inform clinical care and decision-making. The analytical approach also has the potential to merge data from external sources, such as research or wearable devices, to improve model performance. Furthermore, patients can often now access their own EHR data through apps and online patient portals, which could display the results of individually calculated risks should this be considered appropriate and the relevant regulatory or governance processes applied.