Discussion
We analysed the accuracy with which admission CURB-65, NEWS2 and qSOFA scores predict ICU admission, early in-hospital mortality and all-cause mortality within 30 days of hospital admission in the context of COVID-19. In general, calibration was poor, as all three scores underestimated the risk of adverse outcomes. CURB-65 and qSOFA both performed poorly in comparison to their respective standard applications, suggesting their utility is limited in COVID-19. In contrast, NEWS2 and CURB-65 were better at predicting early death, defined here as death within 72 hours of admission, where a NEWS2 threshold of 5 (the recommended threshold for urgent intervention) showed excellent negative predictive value (NPV). We went on to derive two sets of parameters which, when combined on admission with COVID-19, may provide a more accurate prediction of mortality and may provide a useful basis for predictive scores to be validated in larger datasets.
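To make the threshold-based measures concrete, the following minimal sketch shows how sensitivity, specificity, PPV and NPV could be computed for a score dichotomised at a cut-off such as NEWS2 ≥5. The function name `threshold_metrics` and the toy values are hypothetical illustrations only; they are not study data and this is not necessarily how the analysis was implemented.

```python
from typing import Dict, Sequence


def threshold_metrics(scores: Sequence[int], died: Sequence[bool], threshold: int) -> Dict[str, float]:
    """Treat score >= threshold as test-positive and summarise the 2x2 table."""
    tp = fp = tn = fn = 0
    for score, outcome in zip(scores, died):
        positive = score >= threshold
        if positive and outcome:
            tp += 1
        elif positive and not outcome:
            fp += 1
        elif not positive and outcome:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # Return NaN rather than raising when a margin of the table is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical toy data (NOT from the study): admission score and early death.
toy_scores = [2, 3, 4, 6, 7, 8, 1, 5]
toy_died = [False, False, False, True, False, True, False, False]
metrics = threshold_metrics(toy_scores, toy_died, threshold=5)
print(metrics)
```

In this toy example nobody scoring below the threshold died, so the NPV is 1.0 even though the PPV is only 0.5, which mirrors the pattern of a score that rules out early death better than it rules it in.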
CURB-65 was derived and validated to predict low, moderate and high risk of death at 30 days in patients with CAP and therefore assisted healthcare workers in deciding who required admission to the hospital. Pre-COVID validation demonstrated a score of <2 was consistently associated with low mortality rates of 0.4%–2.8% at 30 days.6 15 However, here in the COVID-19 setting, a CURB-65 score of <2 was associated with a mortality of 17%. In clinical practice, its utility has been expanded to support decision making around antimicrobial prescribing and escalation of care. We therefore performed an exploration of CURB-65 performance with respect to ICU admission and early mortality prediction. Our data suggest low CURB-65 scores may not support early COVID-19 discharge, but higher scores may still have value in predicting particularly poor outcomes; CURB-65 scores of ≥3 were associated with death in 60% of cases, compared with just 22% in the pre-COVID era.5 On that basis, high scores could prompt early escalation planning and inform discussions with patients and their families.
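For readers less familiar with the score, a minimal sketch of the standard CURB-65 calculation is given below, using the commonly published criteria (new confusion, urea >7 mmol/L, respiratory rate ≥30/min, systolic blood pressure <90 mm Hg or diastolic ≤60 mm Hg, age ≥65 years). The function and parameter names are hypothetical, and the example patients are illustrative rather than drawn from the study.

```python
def curb65(confusion: bool, urea_mmol_l: float, resp_rate: int,
           systolic_bp: int, diastolic_bp: int, age: int) -> int:
    """Return the CURB-65 score (0-5), one point per criterion met."""
    criteria = [
        confusion,                                # C: new-onset confusion
        urea_mmol_l > 7,                          # U: urea > 7 mmol/L
        resp_rate >= 30,                          # R: respiratory rate >= 30/min
        systolic_bp < 90 or diastolic_bp <= 60,   # B: low blood pressure
        age >= 65,                                # 65: age >= 65 years
    ]
    return sum(criteria)


# Illustrative (hypothetical) patients.
low = curb65(confusion=False, urea_mmol_l=5.0, resp_rate=18,
             systolic_bp=120, diastolic_bp=80, age=50)
high = curb65(confusion=True, urea_mmol_l=9.0, resp_rate=32,
              systolic_bp=85, diastolic_bp=55, age=80)
print(low, high)
```

A score of <2 corresponds to the conventional low-risk group discussed above, and a score of ≥3 to the high-risk group.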
In the pre-COVID era, at the time of presentation to hospital with CAP, it was extremely unusual for the medical team to know the causal pathogen. Although it was well recognised that there was a range of virulence among the possible viral and bacterial causes of CAP, the data presented here confirm that SARS-CoV-2 is a highly virulent outlier. This finding has particular relevance to the evolving pandemic since, as transmission reduces, SARS-CoV-2 will become one of numerous endemic causes of CAP in many countries. It will therefore be important to recognise that this will reduce the performance of CURB-65 on undifferentiated CAP cases, which makes a strong case for the implementation of rapid diagnostics to determine the aetiology of CAP.
NEWS2 has been widely implemented in English hospitals as a simple score consisting of routine physiological measurements. While it is most widely recognised as a simple tool to identify inpatients in need of urgent or emergent medical attention based on changing physiological measurements, it is also often used in the emergency department, where it has been validated in a number of syndromes, including sepsis and acute dyspnoea.16 17 We found that a NEWS2 score of <5 accurately identified a group of patients at low risk of early mortality; however, it was less successful when 30-day mortality was considered. Our findings are based on a single measure of NEWS2 on admission, and future longitudinal work would be needed to confirm whether established NEWS2 trigger thresholds remain valid for inpatients with COVID-19.
qSOFA was validated for use among hospital inpatients and emergency department admissions as a simple and accurate way to identify people with infections at higher risk of poor outcomes.4 11 18 Early data from China suggested those who survived COVID-19 had lower qSOFA scores, a finding replicated here.19 However, in our study, the median qSOFA in those who died within 30 days was <2, and mortality in this 'low-risk' qSOFA group was 32.5%. Taken together, the poor overall discriminatory ability of qSOFA and the poor diagnostic performance seen here suggest a qSOFA score on admission is not a useful prognostic tool in COVID-19. These findings are supported by a recent study which found qSOFA was low in people with COVID-19 admitted to critical care and could not reliably identify those at risk of death.20
The poor performance of qSOFA is interesting, given it was derived from cohorts of patients with sepsis, a syndrome defined as a ‘life-threatening organ dysfunction due to a dysregulated host response to infection’. It would be expected that many patients who die of COVID-19 would meet that definition on the basis of respiratory failure. However, the striking difference between the physiology of bacterial sepsis and severe COVID-19 is that cardiovascular instability is rare in COVID-19.18 In our modelling of the individual variables of CURB-65, we found that, unlike the respiratory components, blood pressure was not independently associated with adverse outcome. Similarly, confusion, often a sign of haemodynamic compromise, was less relevant to outcomes than the other CURB-65 components.
Blood pressure and mental status are integral components of the qSOFA score and in other contexts contribute to its ability to prognosticate; thus, the poor performance of qSOFA is explained by the limited effect of COVID-19 on these physiological parameters. These findings suggest COVID-19-associated mortality may be mediated by different mechanisms than conventional bacterial sepsis. An example of this may be the profound endothelial injury and abundant microthrombi identified in a recent postmortem study.21
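The dependence of qSOFA on haemodynamic and neurological criteria can be illustrated with a minimal sketch of the standard score (respiratory rate ≥22/min, systolic blood pressure ≤100 mm Hg, Glasgow Coma Scale <15). The function name and the example patients are hypothetical and are not taken from the study.

```python
def qsofa(resp_rate: int, systolic_bp: int, gcs: int) -> int:
    """Return the qSOFA score (0-3), one point per criterion met."""
    return (
        int(resp_rate >= 22)      # tachypnoea
        + int(systolic_bp <= 100)  # hypotension
        + int(gcs < 15)            # altered mentation
    )


# Hypothetical patient with isolated respiratory compromise: tachypnoeic,
# but normotensive and alert, so only one of three criteria is met.
respiratory_only = qsofa(resp_rate=24, systolic_bp=125, gcs=15)

# Hypothetical patient with a classical septic picture meeting all criteria.
septic_picture = qsofa(resp_rate=28, systolic_bp=90, gcs=13)
print(respiratory_only, septic_picture)
```

Because two of the three criteria reflect blood pressure and mental status, a patient with respiratory failure alone cannot score above 1 and so remains below the conventional threshold of ≥2.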
Given the limited performance of the previously validated and widely used scores seen here, we explored whether performance could be improved by deriving new models. Using multiple logistic regression, we derived two models, CUCAF-SR and CUCA-SF, in an attempt to predict 30-day and 72-hour mortality, respectively. In keeping with the findings described earlier, markers of cardiovascular compromise were not independently associated with poorer outcomes, with markers of respiratory function, age and frailty appearing more relevant. This is supported by the findings of the ISARIC study, where respiratory function (respiratory rate and oxygen saturations), age and comorbidities, but not cardiovascular parameters, were important constituents of the 4C mortality score.22
Some limitations must be addressed. First, we included only a 2-week period, and it is possible that demographics and outcomes may change across the course of the COVID-19 pandemic.23 Reassuringly, the characteristics and outcomes of the study population are in keeping with those reported by the ISARIC study, one of the largest studies in this setting to date. For example, the median age here was 70 years compared with 72 years in the ISARIC study; 61% were male here compared with 59.9% in ISARIC; 17.1% here were admitted to critical care compared with 17% in ISARIC; and we observed 34% 30-day mortality, which is comparable to the hospital case fatality rate of 33.6% reported by ISARIC.24 25
A further limitation is the inclusion only of those people admitted to hospital, thus excluding those well enough to be discharged from the emergency department. As a consequence, the observed risk in the low-risk categories may, in theory, be inflated; for example, it is possible only the sickest patients with CURB-65 scores of 0–1 were admitted. However, the main derivation dataset for the CURB-65 score included only patients admitted to the hospital, replicating the methods here.5 Similarly, a large validation study that included both admissions and those discharged directly from the emergency department found that 30-day mortality in those admitted to hospital with a low-risk CURB-65 score remained low at 0.0%–1.6%.15 This supports the conclusion that the observed high mortality among low CURB-65 scores in this study was due to SARS-CoV-2 virulence rather than study design. Conversely, some patients presented to the emergency department in a moribund state and did not survive long enough for a viral swab to be taken. Such patients were not included in our study, and generalisability to that small subset of patients may be limited.
Data collection here did not include assessment of detailed patient demographics or comorbidities, but instead focused on clinical measurements normally taken at presentation to the hospital. Characteristics such as obesity, ethnicity and comorbidities are reported to be relevant to COVID-19 outcomes but are not included here.26 27 It may be that a combination of clinical parameters and patient characteristics is more informative than either in isolation, and validation of such an approach is required.28
We defined ‘early mortality’ as death occurring within 72 hours of admission in order to capture patients who deteriorated quickly, but within a time frame that would allow them to have a known SARS-CoV-2 status and be identified by our investigators. This 72-hour timepoint differs from the 24-hour timepoint used in the NEWS2 score’s validation studies but, given the above constraints, was considered a pragmatic approach for our analysis. Finally, ICU admission is not appropriate for all patients, and our analysis of ICU admissions may be prone to unmeasured confounding in that regard.
The strengths of this study lie in the prospective collection of data on consecutive admissions from multiple regional hospitals with rigorous assessment of the performance of each score. These readily available data were compiled from real-world clinical assessment, and outcomes followed usual clinical care. We also demonstrate the hitherto underappreciated potential of highly trained and motivated specialty trainees and their ability to coordinate and collaborate for research.