
Predicting death from COVID-19 using pre-existing conditions: implications for vaccination triage
  1. Shujie Xiao1,2,
  2. Neha Sahasrabudhe1,2,
  3. Samantha Hochstadt1,2,
  4. Whitney Cabral1,2,
  5. Samantha Simons1,2,
  6. Mao Yang1,2,
  7. David E Lanfear1,2 and
  8. L Keoki Williams1,2
  1. 1 Center for Individualized and Genomic Medicine Research (CIGMA), Henry Ford Health System, Detroit, Michigan, USA
  2. 2 Department of Internal Medicine, Henry Ford Health System, Detroit, Michigan, USA
  1. Correspondence to Dr L Keoki Williams; kwillia5{at}


Introduction Global shortages in the supply of SARS-CoV-2 vaccines have resulted in campaigns to first inoculate individuals at highest risk for death from COVID-19. Here, we develop a predictive model of COVID-19-related death using longitudinal clinical data from patients in metropolitan Detroit.

Methods All individuals included in the analysis had a laboratory-confirmed SARS-CoV-2 infection. Thirty-six pre-existing conditions with a false discovery rate p<0.05 were combined with other demographic variables to develop a parsimonious prediction model using least absolute shrinkage and selection operator regression. The model was then prospectively validated in a separate set of individuals with confirmed COVID-19.

Results The study population consisted of 15 502 individuals with laboratory-confirmed SARS-CoV-2. The main prediction model was developed using data from 11 635 individuals with 709 reported deaths (case fatality ratio 6.1%). The final prediction model consisted of 14 variables with 11 comorbidities. This model was then prospectively assessed among the remaining 3867 individuals (185 deaths; case fatality ratio 4.8%). When compared with using an age threshold of 65 years, the 14-variable model detected 6% more of the individuals who would die from COVID-19. However, below age 45 years and its risk equivalent, there was no benefit to using the prediction model over age alone.

Discussion Using a prediction model, such as the one described here, may help identify individuals who would most benefit from COVID-19 inoculation, and thereby may produce more dramatic initial drops in deaths through targeted vaccination.

  • COVID-19
  • viral infection

Data availability statement

All data relevant to the study are included in the article or uploaded as online supplemental information.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


Key messages

  • Prioritisation for COVID-19 vaccination has largely been based on age; it is not known whether using additional comorbidity information can improve the targeting of high-risk individuals without substantially increasing the numbers needing to be vaccinated.

  • We show that using comorbidities can substantially improve identifying individuals likely to die from COVID-19 if infected over using age alone as a predictor; however, the relative benefit of this added information disappears below the risk equivalent of age 45 years.

  • Our prediction algorithm was developed in a large and diverse patient population from metropolitan Detroit with electronic data on longitudinal care; hence, we were able to build and validate a prediction model of COVID-19-related death that is broadly generalisable and easily applied.


The COVID-19 pandemic, caused by SARS-CoV-2, has exceeded 32 million cases and a half million deaths in the USA.1 Real-world vaccination effectiveness studies suggest that the mRNA-based vaccines are highly effective in preventing symptomatic disease (up to 82% after one dose and 94% after two doses).2 Even with the recent overwhelming spread of the SARS-CoV-2 Delta variant, vaccination is associated with a >11 times lower age-standardised incidence rate ratio for death.3 To date over 7 billion vaccine doses have been administered, yet the distribution of these doses has been skewed: approximately 65% of individuals in high-income countries have been vaccinated in contrast to 6.5% in low-income countries.4

Given limited supply of vaccines, immunisation roll-out strategies have prioritised high-risk individuals; this prioritisation has been largely based on patient age.5 6 Vaccination prioritisation could be improved through better algorithms to identify individuals at highest risk of death once infected. Here, we leverage detailed longitudinal clinical information of pre-existing comorbidities to develop a prediction model of COVID-19-related death in a racially diverse patient population from southeast Michigan. It is hoped that the information gleaned from this well-characterised patient population can inform COVID-19 severity prediction and vaccination roll-out strategies elsewhere.


Patient and public involvement

This study was developed to identify individuals at highest risk for COVID-19 death to help target individuals for early immunisation. For the purposes of this study, we use the terms African American and European American to refer to individuals who self-identified as non-Hispanic black and non-Hispanic white, respectively.

Model development

The prediction models were developed in accordance with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis guidelines.7 We used medical records from patients receiving care at the health system to identify adults with the following characteristics: age ≥20 years, a PCR-confirmed SARS-CoV-2 infection, and ≥1 outpatient visit between 2 years and 1 month before the first positive SARS-CoV-2 test was collected (ie, the index date). Online supplemental figure 1 illustrates how individuals were identified and how their data were used for developing and testing the prediction model. A COVID-19-related death was defined as one occurring during a hospital admission or within 14 days of hospital discharge for a COVID-19 infection (n=805). We also included 89 patients who died outside of the hospital and whose last diagnosis of record was a COVID-19 infection.
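The date rules in this definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, "2 years" and "1 month" are approximated as 730 and 30 days (the article does not give day counts), and the 89 out-of-hospital deaths, which were identified from the last diagnosis of record, are not covered.

```python
from datetime import date, timedelta
from typing import Optional

# Stated in the article: death within 14 days of hospital discharge.
POST_DISCHARGE_WINDOW = timedelta(days=14)
# Assumed day counts for the look-back window ("2 years to 1 month").
LOOKBACK_START = timedelta(days=730)
LOOKBACK_END = timedelta(days=30)


def is_covid_related_death(
    admission: date,
    discharge: Optional[date],
    death: Optional[date],
) -> bool:
    """Return True if a death occurred in hospital or within 14 days of discharge.

    `discharge` is None when the patient died before being discharged.
    """
    if death is None or death < admission:
        return False
    if discharge is None:
        # Death during the hospital admission.
        return True
    return death <= discharge + POST_DISCHARGE_WINDOW


def in_lookback_window(event: date, index: date) -> bool:
    """Return True if `event` falls between 2 years and 1 month before `index`."""
    return index - LOOKBACK_START <= event <= index - LOOKBACK_END
```

The same `in_lookback_window` helper would apply both to the qualifying outpatient visit and to any other event that has to fall in the period before the index date.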


There were 15 502 laboratory confirmed COVID-19 cases with 894 related deaths (13 686 cases from 2020 and 1816 cases from 2021); index dates ranged from 12 March 2020 to 21 February 2021. We used 85% of the cases from 2020 (n=11 635 with 709 deaths) to develop the prediction model (training set), and we randomly set aside a group of 2051 cases (with 123 deaths) from 2020 and 1816 cases (with 62 deaths) from 2021 to measure model performance (testing set). The distinction by year of disease onset was done to ensure that the prediction model was robust to the changing characteristics of the epidemic and the patient populations affected.
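A split of this kind might be sketched as follows. The records are synthetic placeholders sized like the study cohort, and the `year` key, the function name and the random seed are all assumptions made for illustration; the article does not describe how the random draw was implemented.

```python
import random


def split_cases(cases, train_fraction=0.85, seed=0):
    """Randomly split the 2020 cases; send every 2021 case to the testing set.

    `cases` is a list of dicts with a hypothetical "year" key.
    """
    rng = random.Random(seed)
    cases_2020 = [c for c in cases if c["year"] == 2020]
    cases_2021 = [c for c in cases if c["year"] == 2021]
    rng.shuffle(cases_2020)
    n_train = round(train_fraction * len(cases_2020))
    training = cases_2020[:n_train]
    testing = cases_2020[n_train:] + cases_2021
    return training, testing


# Synthetic placeholders only (no patient data): 13 686 cases from 2020
# and 1816 cases from 2021, matching the counts reported in the article.
cases = [{"id": i, "year": 2020} for i in range(13686)]
cases += [{"id": 13686 + i, "year": 2021} for i in range(1816)]
training, testing = split_cases(cases)
print(len(training), len(testing))
```

An exact 85% of 13 686 rounds to 11 633, so this sketch lands within two cases of the 11 635 reported for the training set; the reported figure corresponds to 85.0% after rounding.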

All primary encounter diagnoses within the health system between March 2018 and January 2021 were categorised into 133 separate clinical categories, and each category consisted of multiple International Classification of Diseases, 10th Revision (ICD-10) codes. A pre-existing condition was defined as receiving at least two diagnoses (ICD-10 codes) within a category between 2 years and 1 month prior to the index date. Conditions in which ≤1 individual with COVID-19 was affected or conditions confined to only one sex (eg, pregnancy, erectile dysfunction and menopause) were not analysed. This resulted in 78 separate pre-existing conditions available for assessing association with COVID-19-related death. Patient age, sex, race-ethnicity, smoking status, body mass index (BMI), serum creatinine values and pre-existing conditions (the latter restricted to those with a false discovery rate adjusted p<0.05) were used as initial input in constructing the prediction model. Least absolute shrinkage and selection operator (LASSO) regression was used to select a parsimonious set of predictor variables for COVID-19-related death. LASSO regression was used since it can handle overfitting and multicollinearity8 9; less influential coefficients are shrunk to zero.8 The penalty parameter (lambda=0.01) for LASSO regression was selected to optimise performance in 10-fold cross validation.9 10
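The false discovery rate screen can be illustrated with a short sketch. The article does not name the adjustment procedure, so the Benjamini-Hochberg method used here is an assumption, and the p-values are invented for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values for five hypothetical conditions.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted = benjamini_hochberg(raw)
retained = [p for p, q in zip(raw, adjusted) if q < 0.05]
```

In this toy example, the conditions with raw p-values of 0.039 and 0.041 would pass an unadjusted 0.05 threshold but are dropped after adjustment, which is the intended effect of screening on the adjusted value.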

A risk score (RS) was constructed from the variables selected via LASSO; this score was assessed for its predictive performance in the set aside group of 3867 individuals. Variable weights were transformed by multiplying each coefficient by 1000; this ensured that all model weights were >1. Each individual’s composite RS was calculated by summing the weighted variable results. Analyses were conducted using the statistical software R11; the R packages glmnet and caret were used for LASSO regression and for calculating the confusion matrix, respectively.12 13
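The scoring step amounts to a weighted sum, which might be sketched as below. The variable names and coefficient values are hypothetical placeholders and are not the fitted values from the article's table 2; the sketch also omits any intercept, since the article does not say whether one enters the score.

```python
# Hypothetical coefficients for illustration only; these are NOT the
# fitted values reported in the article.
COEFFICIENTS = {
    "age_years": 0.05,
    "male_sex": 0.2,
    "heart_failure": 0.4,
}
SCALE = 1000  # scaling factor stated in the article


def risk_score(patient, coefficients=COEFFICIENTS, scale=SCALE):
    """Sum each variable's value multiplied by its scaled coefficient."""
    return sum(
        scale * beta * patient.get(name, 0)
        for name, beta in coefficients.items()
    )


# A hypothetical 70-year-old man without heart failure.
patient = {"age_years": 70, "male_sex": 1, "heart_failure": 0}
score = risk_score(patient)
```

Binary comorbidities contribute their full weight when present and nothing when absent, while continuous variables such as age contribute in proportion to their value.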


As shown in table 1, the average age of the 15 502 study individuals was 56.0 years (SD=18.0 years), and 9118 (58.8%) were female. The race-ethnic breakdown included 9176 (59.2%) European Americans, 4117 (26.6%) African Americans, 609 (3.9%) Latinos and 284 (1.8%) Asians. Overall, 894 of the 15 502 individuals with laboratory-confirmed SARS-CoV-2 died from COVID-19 (case fatality ratio of 5.8%). The demographic characteristics of the individuals who died of COVID-19 differed from those who survived. When compared with those who survived, individuals who died tended to be older (76.95 years vs 54.73 years), were more likely to be male (54.7% vs 40.4%), had higher serum creatinine levels (1.58 mg/dL vs 0.99 mg/dL) and were more likely to have a history of smoking (60.7% vs 40.8%). Patients in the training and testing sets had similar characteristics. A total of 709 of the 11 635 individuals in the training set died of COVID-19 (case fatality ratio 6.1%), and 185 of the 3867 individuals in the testing set died of COVID-19 (case fatality ratio 4.8%).
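The case fatality ratios quoted above follow directly from the reported counts, as this short check shows (the function and group names are illustrative):

```python
def case_fatality_ratio(deaths, cases):
    """Return deaths per 100 laboratory-confirmed cases."""
    return 100 * deaths / cases


# (deaths, cases) as reported in the article.
groups = {
    "all": (894, 15502),
    "training": (709, 11635),
    "testing": (185, 3867),
}
for name, (deaths, cases) in groups.items():
    print(f"{name}: {case_fatality_ratio(deaths, cases):.1f}%")
# prints:
# all: 5.8%
# training: 6.1%
# testing: 4.8%
```

The training and testing counts also sum to the overall totals (709 + 185 = 894 deaths; 11 635 + 3867 = 15 502 cases).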

Table 1

Characteristics of patients with COVID-19 stratified by analysis group and survival status*

The prediction model was developed and trained in a randomly selected set of 11 635 SARS-CoV-2-infected individuals. The demographic and clinical variable association results from model development are shown in both table 2 and online supplemental table 1. Age was the most significant predictor for COVID-19-related death (p=1.76×10−144), but male sex (p=2.68×10−5), African American race (p=5.94×10−5), a history of smoking (p=3.28×10−6) and higher serum creatinine values (p=7.28×10−16) were also significantly associated. BMI was not associated with COVID-19-related death after adjusting for the above variables (p=0.675). Thirty-six pre-existing conditions were also associated with COVID-19-related death with a false discovery rate adjusted p<0.05. The most significant pre-existing conditions were a history of respiratory failure (p=1.22×10−18) and congestive heart failure (p=1.27×10−17). A final set of 14 variables was selected for the prediction model (table 2), resulting in the following RS calculation:


[Equation: RS calculation (embedded image, not reproduced)]

Table 2

Variable selection for a prediction model of COVID-19-related death among individuals with laboratory confirmed SARS-CoV-2 infection
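The way LASSO drops variables can be illustrated with the soft-thresholding operator, which is the update applied to each coefficient in coordinate-descent LASSO solvers such as the one in glmnet. This is a one-step illustration only, not a fit of the penalised logistic model; the coefficient values are hypothetical, and reusing lambda=0.01 here assumes standardised predictors.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalised coefficient estimates.
unpenalised = {"strong": 0.50, "moderate": 0.04, "weak": 0.005}
lam = 0.01  # penalty value reported in the article

shrunk = {name: soft_threshold(z, lam) for name, z in unpenalised.items()}
selected = [name for name, beta in shrunk.items() if beta != 0.0]
```

Here the "weak" predictor falls below the penalty and is set exactly to zero, so it leaves the model, while the other two are retained with slightly smaller coefficients.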

Test performance was assessed in a separate group of 3867 individuals (figure 1 and table 3). The optimum cut-point, defined by the Youden index,14 was a RS of ≥6685.4 in the 14-variable model and an age of ≥68 years in the age-only model. An age threshold of ≥65 years, the age used by many states to define early eligibility for vaccination,15 had a sensitivity of 83.2%, a specificity of 70.2%, a positive predictive value (PPV) of 12.3% and a negative predictive value (NPV) of 98.8%. In comparison, the 14-variable RS with the same specificity of 70.2% (RS ≥6646.5), had a sensitivity of 89.2%, a PPV of 13.1% and an NPV of 99.2%. At an age threshold of ≤45 years, there was no difference in sensitivity using the 14-variable RS with the same specificity (RS ≤5304.8).
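To make these figures concrete, the sketch below recomputes the four measures from 2×2 counts. The counts are not reported in the article; they are back-calculated here from the published percentages and the testing set totals (185 deaths, 3682 survivors) and are one consistent reconstruction, so they should be read as an assumption.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Back-calculated counts (assumption), testing set n=3867.
age_65 = diagnostic_metrics(tp=154, fp=1097, fn=31, tn=2585)
risk_score = diagnostic_metrics(tp=165, fp=1097, fn=20, tn=2585)

for label, result in (("age >=65", age_65), ("14-variable RS", risk_score)):
    summary = ", ".join(f"{k} {100 * v:.1f}%" for k, v in result.items())
    print(f"{label}: {summary}")
```

Under this reconstruction the same number of individuals is flagged as high risk by both rules (1251 vs 1262), but the RS captures 11 more of the 185 deaths, which corresponds to the 6 percentage point gain in sensitivity.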

Figure 1

Receiver operating characteristic (ROC) curves demonstrating the performance of two models to predict COVID-19-related deaths among individuals with laboratory confirmed SARS-CoV-2 infection (n=3867) from southeast Michigan and the Detroit metropolitan area. The black line denotes the 14-variable prediction model with black circles representing risk score thresholds. The grey line denotes the age-only prediction model with grey circles representing age thresholds. Red circles represent the Youden index (ie, the point that maximises sensitivity + specificity − 1). The area under the curve (AUC) for the 14-variable ROC curve was 0.868 (0.846–0.891), and the AUC for the age-only ROC curve was 0.846 (0.821–0.871).
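Both quantities named in this caption can be computed with a few lines of plain Python. The sketch below uses invented toy scores and outcomes, not study data, and the function names are illustrative.

```python
def auc(scores, labels):
    """AUC as the probability that a death outranks a survivor (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cut_point(scores, labels):
    """Return (threshold, J) maximising sensitivity + specificity - 1.

    A case is called positive when its score is >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_threshold, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Invented toy data: 1 = died, 0 = survived.
scores = [30, 45, 50, 62, 66, 70, 75, 81, 88, 90]
labels = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]

toy_auc = auc(scores, labels)
threshold, j_stat = youden_cut_point(scores, labels)
```

For the toy data the AUC is 0.875 and the Youden cut-point is a score of 66. The pairwise loop is quadratic in sample size, which is acceptable for a sketch; a rank-based calculation would be preferable for a cohort of several thousand.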

Table 3

Differences in model performance at fixed specificity between the age-only and the 14-variable prediction models for COVID-19-related death among individuals with laboratory-confirmed SARS-CoV-2 infection*


Vaccines have been effective at reducing COVID-19 severity and death,16 yet global supplies are still limited, particularly in developing countries.4 Even in countries with ready access to vaccination, uptake has been insufficient to halt SARS-CoV-2 spread and a resurgence of infections and deaths. For example, in the USA more than 40% of individuals are not fully vaccinated. This underscores the continued importance of targeting high-risk individuals in the initial stages of vaccine roll-out, as well as for uptake once supply needs are met.

Wynants et al performed a systematic review of existing prediction models of COVID-19-related outcomes, but found a number of deficiencies in the existing literature.17 Of the 107 articles on COVID-19 prognostic models reviewed, 39 were for predicting mortality. Problematic issues in the existing literature included small study sizes and high potential bias (eg, by not adhering to prediction model reporting standards, using proxy measures for outcomes, and including study individuals not reflective of the larger target population). However, the review did identify three studies with uncertain bias but large sample sizes.18–20 Nevertheless, only two of these studies predicted COVID-19 mortality,19 20 and these were among individuals already severe enough to be admitted to the hospital.

In contrast, our prognostic score may be useful in identifying high-risk individuals based on pre-existing conditions (ie, characteristics that predispose to dying from COVID-19 prior to becoming infected). Design features which bolster the importance of our findings include using separate large and racially diverse groups for model development and validation, restricting cases to those with laboratory-confirmed SARS-CoV-2 diagnoses, drawing on an extensive longitudinal record of pre-existing clinical conditions, and accounting for COVID-19-related deaths both within and outside of the hospital. In this regard, our RS represents a valuable tool to identify individuals at greatest risk of dying of COVID-19 and thus could inform vaccination roll-out schemas. For example, as compared with using an age cut-off of ≥65 years alone, our study found that incorporating pre-existing comorbidities could identify 6% more of the individuals who would die from COVID-19 if infected without increasing the total number of individuals deemed ‘high risk’ (ie, improved sensitivity with the same specificity). Conversely, our data suggest that once high-risk individuals (RS ≥5304.8) and individuals aged ≥45 years have been vaccinated, additional triage based on age or risk score among adults is not needed.

Our study should be considered in light of potential limitations. First, all the participants were recruited from a single health system in southeast Michigan. While this may limit the generalisability of our prediction model, it is important to note that our study population included all documented cases of SARS-CoV-2 infection within the health system. As a result, we broadly captured the diversity of the Detroit metropolitan area. Second, as this is an observational study, it is possible that our model missed other important predisposing clinical conditions. Nevertheless, the large number of conditions that we considered (ie, diagnoses made over nearly 3 years for the entire covered patient population) makes it unlikely that we missed common diseases with large effects. However, the large number of variables that we evaluated simultaneously could also result in erroneous parameter estimation via multicollinearity, as has been observed elsewhere.21 To mitigate the effect of multicollinearity, we used LASSO regression. This penalised regression method constrains the degree of parameter inflation, selecting some variables for model inclusion while shrinking the parameter estimates of others to zero. In so doing, LASSO regression can improve model prediction accuracy while limiting the number of variables to those with the strongest effects.8 Third, the factors that predispose to COVID-19-related death may not have the same relationship to vaccine response. For example, older age, smoking and obesity have been associated with lower response to SARS-CoV-2 vaccination.22 23 Unfortunately, in this study, we did not have measures of vaccine response; hence, we could not incorporate this into our model of COVID-19 mortality prediction. On the other hand, vaccines against SARS-CoV-2 were not widely available for the vast majority of our observation period. Therefore, it is highly unlikely that vaccination status confounded our risk model of COVID-19-related death.
Lastly, the COVID-19 pandemic has been ever changing with new viral variants emerging rapidly.24 25 This rapid evolution has implications for vaccine response, breakthrough infections and viral virulence.26 27 To partially address this issue, we evaluated the performance of our model in patients first infected in 2020 and in 2021; our model produced similar results (data not shown). Thus, although the predictive ability of our model may change over time, we did not observe a noticeable difference within our time window.

In conclusion, we have developed a prediction model of COVID-19-related death using pre-existing patient characteristics and comorbidities. Since our model was based on a large, diverse and well-characterised patient population, we believe that the resulting prediction equation may be broadly suited to identifying individuals at high risk of COVID-19 death. Our model suggested that while age is an important and dominant risk factor for COVID-19-related death, if used alone to determine vaccine prioritisation, it would miss a substantial portion of high-risk individuals (ie, persons who would receive the largest risk benefit from vaccination).


Ethics statements

Patient consent for publication

Ethics approval

This study involves human participants and was approved by the Institutional Review Board (IRB) of Henry Ford Health System. No approval ID provided. The IRB permitted a waiver of individual consent to use and analyze longitudinal clinical records of health system patients in order to build a prediction model of COVID-19-related death. This waiver was predicated on the use of the data involving no more than minimal risk to study subjects (ie, data were collected as part of clinical care), that the research could not be practicably performed without a waiver, and that the waiver did not negatively affect the rights or welfare of study subjects. Given the rapidly evolving pandemic, it was also not possible to include patients or the public in the design or conduct of the study or in the reporting or dissemination of this work.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Contributors SX and LKW conceived the work; SX, NS, SH, WC, SS, MY, DEL and LKW were involved in either acquiring, analysing or interpreting the data for the work; SX, NS, DEL and LKW were involved in drafting the work; SX, NS, SH, WC, SS, MY, DEL and LKW revised the work for important intellectual content; all authors gave final approval of the version to be published; and all authors agree to be accountable for all aspects of the work ensuring that questions related to its accuracy or integrity are appropriately investigated and resolved.

  • Funding This work was supported by the Fund for Henry Ford Hospital (DEL and LKW) and by the following institutes of the National Institutes of Health: National Institute of Allergy and Infectious Diseases (R01AI079139 to LKW), the National Heart Lung and Blood Institute (R01HL103871 and R01HL132154 to DEL and R01HL118267, R01HL141845, and X01HL134589 to LKW) and the National Institute of Diabetes and Digestive and Kidney Diseases (R01DK113003 to LKW).

  • Competing interests DEL reports serving as a consultant for Amgen, Janssen, Ortho Diagnostics, DCRI (Novartis), Cytokinetics and Martin Pharmaceuticals and having participated in running clinical trials for Amgen, Bayer, and Janssen; these activities are unrelated to the subject matter of the current manuscript. LKW reports owning stock in companies which produce SARS-CoV-2 vaccines; there was no transactional relationship related to this manuscript. None of the other authors report any competing interests.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.