
Predictive signature of murine and human host response to typical and atypical pneumonia
Matthew McCravy¹, Nicholas O’Grady¹, Kirin Khan¹, Marisol Betancourt-Quiroz¹, Aimee K Zaas¹, Amy E Treece¹, Zhonghui Yang¹, Loretta Que¹, Ricardo Henao², Sunil Suchindran¹, Geoffrey S Ginsburg¹, Christopher W Woods¹, Micah T McClain¹ and Ephraim L Tsalik¹

¹Department of Medicine, Duke University School of Medicine, Durham, North Carolina, USA
²Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, North Carolina, USA

Correspondence to Dr Matthew McCravy; matthew.mccravy{at}duke.edu

Abstract

Background Pneumonia due to typical bacterial, atypical bacterial and viral pathogens can be difficult to clinically differentiate. Host response-based diagnostics are emerging as a complementary diagnostic strategy to pathogen detection.

Methods We used murine models of typical bacterial, atypical bacterial and viral pneumonia to develop diagnostic signatures and understand the host’s response to these types of infections. Mice were intranasally inoculated with Streptococcus pneumoniae, Mycoplasma pneumoniae, influenza or saline as a control. Peripheral blood gene expression analysis was performed at multiple time points. Differentially expressed genes were used to perform gene set enrichment analysis and generate diagnostic signatures. These murine-derived signatures were externally validated in silico using human gene expression data.

Results The response to S. pneumoniae was the most rapid and robust. Mice infected with M. pneumoniae had a delayed response more similar to that of influenza-infected animals. Diagnostic signatures for the three types of infection had an area under the receiver operating characteristic curve (auROC) of 0.94–1.00. Validation in five human gene expression datasets yielded auROCs of 0.82–0.96.

Discussion This study identified discrete host responses to typical bacterial, atypical bacterial and viral aetiologies of pneumonia in mice. These signatures validated well in humans, highlighting the conserved nature of the host response to these pathogen classes.

  • Pneumonia
  • Sensitivity and Specificity
  • Respiratory Infection
  • Bacterial Infection
  • Viral infection

Data availability statement

Data are available in a public, open access repository. Raw data are available through the Gene Expression Omnibus (GEO) under accession number GSE214051. Further data requests and questions should be directed to the first author.


This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.


WHAT IS ALREADY KNOWN ON THIS TOPIC

  • Gene expression signatures show promise for differentiating between categories of pneumonia. It is not known whether mouse models can be used to develop these signatures.

WHAT THIS STUDY ADDS

  • This study shows that a predictive pneumonia signature developed in mice retains excellent performance in human gene expression datasets.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • Future studies will optimise this signature in other models of infection and apply it to humans in a clinical trial.

Introduction

Despite advances in critical care and antimicrobial development, community-acquired pneumonia (CAP) continues to be a major cause of infectious illness worldwide.1 In the USA, pneumonia is the eighth-leading cause of death overall and the leading cause of death from infectious disease.2–4 The signs and symptoms of CAP are highly variable and are often influenced by the causative organism and by patient factors. Appropriate antibiotic treatment requires confirming an infectious aetiology, then identifying the pathogen class (eg, bacterial, viral, atypical bacterial) and, ideally, the pathogen itself.5–7 Traditionally, pathogen identification has required culturing the whole organism from a patient sample. Non-culture diagnostics, such as the pneumococcal urinary antigen test and syndromic respiratory panels, improve the ability to identify possible CAP aetiologies. However, depending on the test, these pathogen-detection approaches are limited in clinical sensitivity or specificity. Even after exhaustive microbiological testing, as many as 62% of patients with CAP have no identified pathogen.8

Fortunately, appropriate empiric antimicrobial therapy can be selected based on the class of aetiological organism (typical bacteria, atypical bacteria or virus) rather than the specific pathogen.9 Given the limitations of pathogen detection tests, the host response offers an alternative diagnostic approach that allows pneumonia aetiology to be categorised into broad, clinically relevant groups. Procalcitonin is the best-described host response biomarker and has been studied extensively for discriminating bacterial from viral respiratory tract infections. However, it has poor sensitivity and specificity, particularly when atypical bacterial aetiologies are considered.10 11

Gene expression profiling is another approach to characterising the host response. This holistic view of gene expression offers insights into the cellular processes associated with specific biological states and can be used to generate signatures of those states. This approach has been used extensively to understand the host response to infectious diseases such as sepsis, tuberculosis, acute respiratory infections (ARI), dengue and malaria.7 12–15 In this study, we leveraged large-scale gene expression analyses to prospectively classify typical, atypical and viral pneumonia. We began with murine models of pneumonia to develop gene expression signatures of the host response to typical bacterial pneumonia (Streptococcus pneumoniae), atypical bacterial pneumonia (Mycoplasma pneumoniae) and viral pneumonia (influenza H1N1). We then validated our findings in publicly available human gene expression data from patients with known bacterial and viral pneumonias.

Methods

Inoculum preparation

Three models were used to represent the three classes of CAP: typical bacteria (S. pneumoniae), atypical bacteria (M. pneumoniae) and viral infection (the murine-adapted H1N1 strain PR8). S. pneumoniae ATCC 6303 (American Type Culture Collection) was grown to mid-logarithmic phase in Todd-Hewitt broth for 6 hours at 37°C and harvested by centrifugation. The dose was adjusted to 8.7×10⁵ colony forming units (CFU) per 50 µL.16 M. pneumoniae ATCC 15531 (American Type Culture Collection) was grown in SP4 broth (Remel) at 35°C until adherent, harvested by centrifugation and adjusted to a concentration of 1×10⁸ CFU per 50 µL.17 Influenza A virus H1N1 PR/8/34 stock was obtained from Charles River Laboratories. The stock was titred by plaque assay on Madin-Darby canine kidney (MDCK) cells and diluted to a final concentration of 80 plaque forming units (PFU) per 40 µL. This dose was chosen to be sublethal, allowing animals to survive to the 120-hour and 168-hour time points.18
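As a quick check on the dosing arithmetic above, the sketch below converts each stated dose into a per-millilitre concentration (eg, 8.7×10⁵ CFU per 50 µL corresponds to 1.74×10⁷ CFU/mL). The helper names are illustrative, and the influenza stock titre used to compute a dilution factor is a hypothetical value, not one reported in this study.

```python
# Inoculum arithmetic (illustrative sketch). Doses and volumes come from the
# Methods; the influenza stock titre further below is a HYPOTHETICAL value.


def per_ml(dose: float, volume_ul: float) -> float:
    """Concentration (units/mL) that delivers `dose` units in `volume_ul` µL."""
    return dose * 1000.0 / volume_ul


def dilution_factor(stock_per_ml: float, target_per_ml: float) -> float:
    """Fold-dilution of a stock needed to reach the target concentration."""
    if target_per_ml <= 0 or stock_per_ml < target_per_ml:
        raise ValueError("stock must be at least as concentrated as the target")
    return stock_per_ml / target_per_ml


spn = per_ml(8.7e5, 50)  # S. pneumoniae ATCC 6303, CFU/mL
mpn = per_ml(1e8, 50)    # M. pneumoniae ATCC 15531, CFU/mL
flu = per_ml(80, 40)     # influenza A PR/8/34, PFU/mL

print(f"S. pneumoniae: {spn:.3g} CFU/mL")  # 1.74e+07 CFU/mL
print(f"M. pneumoniae: {mpn:.3g} CFU/mL")  # 2e+09 CFU/mL
print(f"Influenza:     {flu:.3g} PFU/mL")  # 2e+03 PFU/mL

# Hypothetical plaque-assay titre for the influenza stock (assumed, not from the study).
stock_titre = 1e6  # PFU/mL
print(f"Dilute stock {dilution_factor(stock_titre, flu):.0f}-fold")  # 500-fold
```

Working in per-millilitre units makes the three inocula directly comparable even though the bacterial and viral doses were delivered in different volumes.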

Murine infection

This work was initiated separately by two laboratories independently investigating similar questions in parallel. We took advantage of this parallel work and combined the data to limit the number of mice used. Because one laboratory used male mice and the other used female mice, each intervention included its own sex-matched control population. Additional steps to control for the effect of sex are described in the differential gene expression and signature development sections below. These methods were selected to minimise sex-specific host response differences when comparing one infection type with the others.

Animals were housed in the same building in identical cages and received identical diets. Formal randomisation and blinding were not used. The experimental design is summarised in figure 1. To model typical and atypical bacterial infection, female C57BL/6 mice aged 6–8 weeks were intranasally inoculated with 50 µL of S. pneumoniae, 50 µL of M. pneumoniae or 50 µL of PBS as a control. To model viral pneumonia, male C57BL/6 mice aged 6–9 weeks were intranasally inoculated with 40 µL of PBS containing 80 PFU of live influenza virus or with 40 µL of PBS alone. After inoculation, mice were weighed daily and monitored for behavioural changes. At predetermined time points, mice were sedated and euthanised in a CO₂ chamber. Blood (500 µL) was collected from all mice by cardiac puncture and placed in tubes containing RNAlater (Mouse RiboPure RNA Isolation Kit, Ambion). Right lungs were harvested, homogenised and plated to confirm infection status. Only animals with concordant exposure and culture results were included. Ultimately, for the bacterial experiments, 38 S. pneumoniae-infected mice, 27 M. pneumoniae-infected mice and 37 control mice were used for further analysis. These animals were sacrificed at the following time points: 24 hours (n=11 S. pneumoniae, 10 M. pneumoniae, 14 controls), 48 hours (n=14 S. pneumoniae, 11 M. pneumoniae, 11 controls) and 72 hours (n=13 S. pneumoniae, 6 M. pneumoniae, 12 controls). For the viral experiments, 19 infected and 5 control mice were used for further analysis. Nine infected mice were sacrificed at 120 hours, and 10 infected and 5 control mice were sacrificed at 168 hours. Three animals in the M. pneumoniae exposure groups and four animals in the viral group died before endpoint analysis; these deaths were related to anaesthesia complications. Animal numbers were determined according to the typical laboratory practice of >5 animals per group when preliminary data for power analysis are unavailable.
Animal studies were conducted and reported in accordance with the Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines.19
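The group sizes above can be tallied with a minimal sketch that simply transcribes the reported counts and confirms that they add up to the per-group totals and to the 126 samples analysed later. The data structure and function name are illustrative only.

```python
from typing import Dict

# Animals retained for analysis, transcribed from the text: {hours: {group: n}}.
bacterial_design = {
    24: {"S. pneumoniae": 11, "M. pneumoniae": 10, "control": 14},
    48: {"S. pneumoniae": 14, "M. pneumoniae": 11, "control": 11},
    72: {"S. pneumoniae": 13, "M. pneumoniae": 6, "control": 12},
}
viral_design = {
    120: {"influenza": 9},
    168: {"influenza": 10, "control": 5},
}


def totals_by_group(design: Dict[int, Dict[str, int]]) -> Dict[str, int]:
    """Sum animal counts per group across all time points."""
    out: Dict[str, int] = {}
    for counts in design.values():
        for group, n in counts.items():
            out[group] = out.get(group, 0) + n
    return out


bact = totals_by_group(bacterial_design)
vir = totals_by_group(viral_design)
total = sum(bact.values()) + sum(vir.values())
print(bact)   # {'S. pneumoniae': 38, 'M. pneumoniae': 27, 'control': 37}
print(vir)    # {'influenza': 19, 'control': 5}
print(total)  # 126
```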

Figure 1

Experimental design. Male (viral) and female (bacterial) C57BL/6J mice aged 6–8 weeks were infected with Streptococcus pneumoniae, Mycoplasma pneumoniae or H1N1, or received a PBS control. Animals were sacrificed at 24, 48 and 72 hours for bacterial exposures and at 120 and 168 hours for viral exposures. Blood was collected for RNA extraction and differential gene expression analysis. CFU, colony forming units; PFU, plaque forming units.

RNA preparation

Total RNA was isolated using the Mouse RiboPure RNA Isolation Kit, followed by globin mRNA depletion using the GLOBINclear Kit (Ambion). RNA quantity and purity were assessed with a NanoDrop spectrophotometer (Thermo Fisher Scientific), and RNA integrity was assessed with an Agilent Bioanalyzer. Samples were screened against the following predetermined quality thresholds: 260/280 ratio >1.8, 260/230 ratio >1.8 and RNA integrity number (RIN) >7. Samples that passed screening were amplified and biotin-labelled using the MessageAmp Premier RNA Amplification Kit (Ambion) according to the manufacturer’s protocol, then hybridised onto Affymetrix Mouse Genome 430A 2.0 microarrays containing 22 690 probe sets. Probe intensities were detected using an Axon GenePix 4000B Scanner (Molecular Devices), and image files were generated using Affymetrix GeneChip Command Console software.
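The predetermined RNA quality screen amounts to three strict inequalities. A minimal Python sketch, assuming each sample's NanoDrop ratios and Bioanalyzer RIN are already available as numbers (the sample IDs and values below are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class RnaQc:
    sample_id: str
    a260_280: float  # NanoDrop 260/280 absorbance ratio
    a260_230: float  # NanoDrop 260/230 absorbance ratio
    rin: float       # Bioanalyzer RNA integrity number


def passes_qc(s: RnaQc) -> bool:
    """Thresholds from the text: 260/280 > 1.8, 260/230 > 1.8 and RIN > 7 (all strict)."""
    return s.a260_280 > 1.8 and s.a260_230 > 1.8 and s.rin > 7


samples = [  # hypothetical measurements
    RnaQc("m01", 2.05, 2.10, 8.4),
    RnaQc("m02", 1.95, 1.60, 8.9),  # fails 260/230
    RnaQc("m03", 2.01, 2.02, 6.5),  # fails RIN
]
kept = [s.sample_id for s in samples if passes_qc(s)]
print(kept)  # ['m01']
```

Because the thresholds are strict, a sample sitting exactly at 1.8 or at RIN 7 would be excluded.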

Differential gene expression analysis

Affymetrix probe data were background-corrected and normalised, and robust multichip average (RMA) expression values were computed with the R package affy.20 21 Probes of interest were linked to an Entrez gene ID, gene symbol and gene name using the Bioconductor packages AnnotationDbi, mouse430a2.db and org.Mm.eg.db.22–24 Differential expression analysis used the linear modelling and empirical Bayes tools of the R package limma.25 Significantly differentially expressed genes (DEGs) were defined as those with a false discovery rate (FDR)-adjusted p value ≤0.05 using the Benjamini-Hochberg procedure.26 Genes with an FDR-adjusted p≤0.05 and an absolute log2 fold change (LFC) ≥0.5 were passed on to pathway enrichment analysis. This permissive LFC threshold was selected to ensure that an adequate number of genes was available for signature development. Pathway enrichment analysis was carried out with the R package ReactomePA,27 and the top 25 enriched pathways were visualised using the R package enrichplot.28 This process was repeated at 24, 48 and 72 hours postinoculation for S. pneumoniae-infected and M. pneumoniae-infected animals. For influenza-infected animals, we used gene expression at 120 and 168 hours postinoculation; the procedures were otherwise the same. For the M. pneumoniae and H1N1 cohorts, we additionally performed gene set enrichment analysis (GSEA), in which genes were ranked by LFC and passed to ReactomePA. Both classical pathway analysis and GSEA used the Reactome database.29 In all cases, DEGs were based on comparison with sex-matched, uninfected control animals (ie, baseline).
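The DEG definition above combines a Benjamini-Hochberg adjustment with a fold-change filter. The sketch below illustrates that logic in plain Python, assuming raw p values and log2 fold changes have already been produced (in the study, by limma); it is not a reimplementation of limma's moderated statistics, and the gene names and values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity of p * m / rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, pvals, log2fc, fdr=0.05, min_abs_lfc=0.5):
    """Genes with BH-adjusted p <= fdr and |log2 fold change| >= min_abs_lfc."""
    padj = benjamini_hochberg(pvals)
    return [g for g, q, lfc in zip(genes, padj, log2fc)
            if q <= fdr and abs(lfc) >= min_abs_lfc]


genes = ["A", "B", "C", "D", "E"]             # hypothetical genes
pvals = [0.001, 0.01, 0.02, 0.03, 0.5]        # hypothetical raw p values
lfc = [2.0, 0.3, -0.8, 1.2, 1.5]              # hypothetical log2 fold changes
print(select_degs(genes, pvals, lfc))  # ['A', 'C', 'D']
```

Gene B passes the FDR cut-off but is dropped by the fold-change filter, and gene E has a large fold change but is not significant; both criteria must hold.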

To compare the host response across pathogen classes, we selected for each pathogen the time point with the most DEGs relative to uninfected controls: 72 hours for S. pneumoniae, 48 hours for M. pneumoniae and 168 hours for influenza. These groups were analysed with both classical pathway analysis and GSEA, as described above.
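Classical (over-representation) pathway analysis of the kind referred to above is commonly framed as a one-sided hypergeometric test: given a gene universe, a pathway and a DEG list, how surprising is the observed overlap? The sketch below shows that calculation under this assumption; it does not reproduce ReactomePA's internals (gene universe definition, multiple-testing correction), and the numbers in the usage line are hypothetical.

```python
from math import comb


def ora_pvalue(universe: int, pathway_size: int, n_degs: int, overlap: int) -> float:
    """One-sided hypergeometric P(X >= overlap).

    X counts pathway genes among `n_degs` genes drawn without replacement
    from a universe of `universe` genes, `pathway_size` of which are in the pathway.
    """
    upper = min(pathway_size, n_degs)
    hits = sum(comb(pathway_size, k) * comb(universe - pathway_size, n_degs - k)
               for k in range(overlap, upper + 1))
    return hits / comb(universe, n_degs)


# Hypothetical example: 30 of 500 DEGs fall in a 200-gene pathway
# out of a 10 000-gene universe (about 10 expected by chance).
print(f"{ora_pvalue(10_000, 200, 500, 30):.2e}")
```

Exact integer arithmetic via `math.comb` avoids underflow in the tail sum; the final division converts the ratio to a float.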

Murine disease signature development

Analysis was performed on the 5000 probes with the greatest variance in expression across the 126 samples analysed in this study. To account for confounding by sex when comparing S. pneumoniae-infected or M. pneumoniae-infected mice with H1N1-infected mice, comparisons used only genes that were differentially expressed in the initial analysis between each infection model and its sex-matched controls. As a second, confirmatory step, we performed differential expression analysis between the 37 female bacterial controls and the 5 male viral controls. Using a liberal p value threshold of 0.25, we removed all probe sets with p<0.25, on the assumption that these differences were most likely due to sex. Diagnostic signatures were generated using least absolute shrinkage and selection operator (LASSO)-penalised logistic regression (R package glmnet).30 Nested leave-one-out cross-validation was used both to select the regularisation parameter (lambda) and to estimate accuracy, with the elastic net mixing parameter alpha set to 1 (a pure LASSO penalty). The resulting models, each discriminating one infection class from the others, are described in the Results. Each model estimates the probability that a sample belongs to the designated class.
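The signature-building step can be illustrated with a small, self-contained sketch of L1-penalised (LASSO) logistic regression inside nested leave-one-out cross-validation. The study used glmnet in R; here, as a swapped-in solver, the model is fitted by proximal gradient descent (ISTA) in pure Python. The lambda grid, step size, iteration count, the use of held-out log loss for choosing lambda and the toy data are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (feature weights, intercept)


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _soft_threshold(v: float, t: float) -> float:
    # Proximal operator of t*|v|: shrinks toward 0 and zeroes small values.
    return v - t if v > t else v + t if v < -t else 0.0


def fit_lasso_logistic(X: Sequence[Sequence[float]], y: Sequence[int], lam: float,
                       step: float = 0.5, n_iter: int = 200) -> Model:
    """ISTA on mean logistic loss + lam * ||w||_1 (the intercept is not penalised)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r / n
            for j in range(p):
                grad_w[j] += r * xi[j] / n
        b -= step * grad_b
        w = [_soft_threshold(w[j] - step * grad_w[j], step * lam) for j in range(p)]
    return w, b


def predict_proba(model: Model, x: Sequence[float]) -> float:
    w, b = model
    return _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def _loo_log_loss(X: list, y: list, lam: float) -> float:
    # Inner loop: mean held-out log loss for one candidate lambda.
    total = 0.0
    for i in range(len(X)):
        model = fit_lasso_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        p = min(max(predict_proba(model, X[i]), 1e-12), 1 - 1e-12)
        total -= y[i] * math.log(p) + (1 - y[i]) * math.log(1 - p)
    return total / len(X)


def nested_loocv(X: list, y: list, lambdas: Sequence[float]):
    """Outer LOO estimates out-of-sample probabilities; inner LOO picks lambda."""
    probs, chosen = [], []
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        best = min(lambdas, key=lambda lam: _loo_log_loss(X_tr, y_tr, lam))
        chosen.append(best)
        probs.append(predict_proba(fit_lasso_logistic(X_tr, y_tr, best), X[i]))
    return probs, chosen


# Hypothetical toy data: feature 0 separates the classes; features 1-2 are noise.
X = [[2.0, 0.3, -0.1], [1.5, -0.2, 0.2], [2.5, 0.1, 0.0], [1.8, -0.4, 0.1], [2.2, 0.2, -0.2],
     [-2.0, 0.3, -0.1], [-1.5, -0.2, 0.2], [-2.5, 0.1, 0.0], [-1.8, -0.4, 0.1], [-2.2, 0.2, -0.2]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
probs, chosen = nested_loocv(X, y, lambdas=[0.02, 0.1, 0.3])
acc = sum((p >= 0.5) == (t == 1) for p, t in zip(probs, y)) / len(y)
print(f"nested LOOCV accuracy: {acc:.2f}; lambdas chosen: {sorted(set(chosen))}")
```

The key design point is that each held-out sample never influences the choice of lambda used to predict it, so the outer-loop probabilities are an honest estimate of out-of-sample performance.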

External validation

We validated the murine disease signature in five human data sets obtained from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/): GSE63990, GSE60244, GSE42026, GSE40012 and GSE20346. For the purpose of validation, we tested the ability of the murine signatures to differentiate viral and bacterial infection in humans as there was no single robust data set with an adequate number of patients infected with typical, atypical and viral pneumonia. The types of samples included in each data set and the specific samples included for external validation are detailed below.

GSE63990 included 90 samples from subjects with non-infectious illness, 117 samples from subjects with acute viral respiratory infection and 73 samples from subjects with acute bacterial respiratory infection.7 From this cohort, we included the 117 viral-infected and 73 bacterial-infected samples. GSE60244 contained whole blood transcriptional profiles from 158 adult subjects.31 Of these, 118 had lower respiratory tract infections of bacterial (n=22), viral (n=71) or bacterial-viral coinfection (n=25) aetiology, and 40 were age-matched healthy controls. From this cohort, we included the 22 bacterial-infected and 71 viral-infected subjects. GSE42026 included gene expression data from children with respiratory infection.32 In this cohort, 19 children had influenza H1N1/09, 22 had RSV, 18 had bacterial infection and 33 were healthy controls. From this cohort, we included the 19 influenza H1N1/09 cases, 22 RSV cases and 18 bacterial-infected cases. GSE40012 included 61 subjects with bacterial pneumonia, 39 with influenza pneumonia, 14 with mixed bacterial/influenza pneumonia, 40 with non-infectious SIRS and 36 healthy controls.33 From this cohort, we included the 61 bacterial pneumonia cases and 39 influenza pneumonia cases. Finally, GSE20346 contained repeated sampling of 10 patients, from which we extracted gene expression data from 26 bacterial pneumonia time points, 19 severe influenza A H1N1 time points and 36 influenza vaccination time points.34 From this cohort, we included the 26 bacterial pneumonia and 19 influenza time points. All datasets were imported using the GEOquery package.35
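The binary relabelling of the validation cohorts can be checked with a short tally. This sketch transcribes the included counts above (RSV and influenza are both treated as viral in GSE42026; GSE20346 counts sampling time points rather than patients) and reproduces the 487 samples reported in the Results. The dictionary layout is illustrative only.

```python
# Included samples per GEO series after collapsing labels to bacterial vs viral.
validation = {
    "GSE63990": {"bacterial": 73, "viral": 117},
    "GSE60244": {"bacterial": 22, "viral": 71},
    "GSE42026": {"bacterial": 18, "viral": 19 + 22},  # influenza H1N1/09 + RSV
    "GSE40012": {"bacterial": 61, "viral": 39},
    "GSE20346": {"bacterial": 26, "viral": 19},       # sampling time points
}

per_dataset = {gse: sum(c.values()) for gse, c in validation.items()}
n_bacterial = sum(c["bacterial"] for c in validation.values())
n_viral = sum(c["viral"] for c in validation.values())
print(per_dataset)  # {'GSE63990': 190, 'GSE60244': 93, 'GSE42026': 59, 'GSE40012': 100, 'GSE20346': 45}
print(n_bacterial, n_viral, n_bacterial + n_viral)  # 200 287 487
```

Excluded groups (healthy controls, coinfections, non-infectious SIRS and vaccination time points) are simply omitted rather than assigned to either class.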

When validating the murine disease signatures in humans, we pooled the S. pneumoniae, M. pneumoniae and influenza classifiers into a single gene list. We converted the mouse gene list to the corresponding human orthologs and their probes using biomaRt,36 37 with Ensembl archive release 104 (May 2021; https://may2021.archive.ensembl.org/index.html). We then followed the same workflow described above, applying LASSO and nested leave-one-out cross-validation to the human probes corresponding to the murine genes. The model threshold was chosen with the Youden method, which maximises the sum of sensitivity and specificity.38 Bootstrapping (n=2000) with the adjusted bootstrap percentile (BCa) method was used to estimate CIs for prediction accuracies. Using the pROC package, the area under the receiver operating characteristic curve (auROC) and its CIs were computed with the DeLong method, using the algorithm of Sun and Xu.39 40 Confusion matrices and associated statistics were generated to summarise prediction accuracy. All data analyses were completed using R statistical software V.4.2.0.41
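For a single ROC curve, the DeLong estimate of the auROC and its variance can be written directly from Mann-Whitney placement values. The sketch below is the straightforward O(m·n) version rather than the faster Sun and Xu algorithm used by pROC, and clipping the interval to [0, 1] is a choice made here; the scores are hypothetical.

```python
from statistics import NormalDist
from typing import Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the positive outranks the negative, 0.5 for ties."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def delong_auc_ci(pos: Sequence[float], neg: Sequence[float],
                  level: float = 0.95) -> Tuple[float, float, float, float]:
    """Return (auc, variance, lower, upper) for one ROC curve (direct O(m*n) version)."""
    m, n = len(pos), len(neg)
    if m < 2 or n < 2:
        raise ValueError("need at least two positives and two negatives")
    v10 = [sum(_psi(x, y) for y in neg) / n for x in pos]  # per-positive placements
    v01 = [sum(_psi(x, y) for x in pos) / m for y in neg]  # per-negative placements
    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    var = s10 / m + s01 / n
    half = NormalDist().inv_cdf(0.5 + level / 2) * var ** 0.5
    return auc, var, max(0.0, auc - half), min(1.0, auc + half)


pos = [0.9, 0.8, 0.7, 0.4]  # hypothetical predicted P(bacterial) for bacterial cases
neg = [0.6, 0.3, 0.2, 0.1]  # hypothetical predicted P(bacterial) for viral cases
auc, var, lo, hi = delong_auc_ci(pos, neg)
print(f"auROC {auc:.4f} (95% CI {lo:.3f}-{hi:.3f})")  # auROC 0.9375 (95% CI 0.764-1.000)
```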

Patient and public involvement

The public was not involved in the development of this project.

Results

Clinical characteristics

There was 100% concordance between exposure and group assignment for all groups. Weight did not differ significantly between preinoculation and sacrifice in control mice (weight change=0.40±1.1711 g; p=0.14). Mice infected with S. pneumoniae lost an average of 2.05±0.86 g (p<2.2×10⁻¹⁶ for preinfection vs postinfection weight). Weight in M. pneumoniae-infected mice did not differ significantly between preinoculation and sacrifice (weight change=−0.048±0.44 g; p=0.18). Influenza-infected mice lost an average of 0.71±1.27 g (p=0.029).

Differential gene expression

Principal component analysis demonstrated considerable divergence between S. pneumoniae-infected and control specimens (online supplemental figure S1). Most DEGs were observed in S. pneumoniae-infected animals compared with both controls and other infections (figure 2). Total numbers of DEGs are given in online supplemental table S1; the complete DEG lists are extensive and are available on request. Ngp, which encodes neutrophil granule protein, had the greatest LFC: it was the most upregulated gene at 72 hours in our S. pneumoniae model and was also highly upregulated at the other time points. This is consistent with a role for NGP in acute bacterial infection, given its localisation to immature granulocytes.42 43

Figure 2

Comparison between infection types. Gene set enrichment analysis comparing all Streptococcus pneumoniae-infected animals sacrificed at 72 hours with Mycoplasma pneumoniae-infected animals sacrificed at 72 hours (left) and H1N1-infected animals sacrificed at 120 hours (right). A false discovery rate (FDR)-adjusted p value ≤0.05 and a log2 fold change ≥0.5 were set as thresholds for qualifying genes as differentially expressed between comparison groups. Enrichment results are additionally restricted to an FDR-adjusted p value ≤0.05.

There were fewer DEGs in both the M. pneumoniae and H1N1 infections, and the magnitude of differential gene expression was also smaller (figure 2). The only gene with an LFC>2 in either group was Ifit3, in the H1N1 120-hour group. This gene is involved in downstream interferon signalling induced by viral infection.44 Ifit3 activity has also been used to discriminate between latent and active tuberculosis.45

Pathway analysis

Because of the low number of DEGs in the atypical and viral models, pathway analysis was performed only in S. pneumoniae-infected mice (figure 3). GSEA was performed in both the M. pneumoniae and H1N1 models to shed light on the pathophysiology despite the limited number of DEGs.
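GSEA is useful here because it scores a gene set against the full ranked gene list instead of a thresholded DEG list. A minimal sketch of the weighted running-sum enrichment score (weight 1, ranking by LFC as described in the Methods) follows; the permutation-based significance testing that GSEA implementations add on top is omitted, and the gene names and values are hypothetical.

```python
from typing import List, Set, Tuple


def enrichment_score(ranked: List[Tuple[str, float]], gene_set: Set[str],
                     weight: float = 1.0) -> float:
    """Running-sum enrichment score.

    `ranked` holds (gene, LFC) pairs sorted from most up- to most down-regulated.
    Hits step up by |LFC|**weight (normalised over hits); misses step down by 1/n_miss.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    if n_miss == 0 or n_miss == len(ranked):
        raise ValueError("gene set must partially overlap the ranked list")
    norm = sum(abs(score) ** weight for (_, score), hit in zip(ranked, hits) if hit)
    running, extreme = 0.0, 0.0
    for (_, score), hit in zip(ranked, hits):
        running += abs(score) ** weight / norm if hit else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]  # hypothetical
print(enrichment_score(ranked, {"A", "B"}))  # positive: set concentrated at the top
print(enrichment_score(ranked, {"D", "E"}))  # negative: set concentrated at the bottom
```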

Figure 3

Temporal evolution of murine Streptococcus pneumoniae infection. We performed Reactome pathway over-representation analysis on S. pneumoniae-infected animals sacrificed at 24, 48 and 72 hours versus all bacterial controls. A false discovery rate (FDR)-adjusted p value ≤0.05 and a log2 fold change ≥0.5 were set as thresholds for qualifying genes as differentially expressed between comparison groups. Enrichment results are additionally restricted to an FDR-adjusted p value ≤0.05.

At 24 hours in the S. pneumoniae group, the strongest enrichment was in pathways related to cytokine signalling. There was also enrichment of the B and T cell signalling cascades and of signalling through the interleukin 1 (IL-1) superfamily. By 48 hours, cytokine signalling remained enriched, and numerous pathways related to mRNA translation and processing were also enriched. At 72 hours, many of the same pathways were enriched as at 48 hours postinfection. Notable pathways seen only at 72 hours, however, included the interleukin 2 (IL-2) and tumour necrosis factor (TNF) superfamily pathways (figure 3).

To identify pathways that differentiated typical and atypical bacterial infections, we compared the host response to S. pneumoniae at 72 hours and M. pneumoniae infection at 48 hours, the time points with the largest number of DEGs (adj. p≤0.05 and absolute LFC≥0.5). Using S. pneumoniae-infected mice as comparators, there were 2244 upregulated genes and 1525 downregulated genes in M. pneumoniae-infected mice. Pathway analysis of these genes showed the most pronounced enrichment in cytokine signalling, stress response and signalling by interleukins. Programmed cell death, chemical stress response, Toll-like receptor (TLR), IL-1 family and C-type lectin receptor pathways, among others, were also upregulated (online supplemental figure S2).

To identify pathways that differentiated viral infections from typical bacterial infections, we compared the host response to S. pneumoniae at 72 hours and H1N1 infection at 168 hours, the time points with the largest numbers of DEGs (online supplemental figure S3). Using S. pneumoniae-infected mice as comparators, there were 1669 upregulated genes and 1314 downregulated genes in H1N1-infected mice. Pathway analysis revealed pronounced enrichment in cytokine and interleukin signalling as well as several components of RNA processing (online supplemental figure S2).

GSEA was primarily notable for an over-representation of genes controlling cell cycle and division (online supplemental figures S4 and S5). This was true in both Mycoplasma-infected and H1N1-infected animals. These findings should be considered exploratory as fewer genes were used in the analysis.

Predictive analysis

To build predictive signatures from the identified gene expression profiles, the top 5000 genes by variance were used to generate sparse logistic regression models, each identifying one pathogen class in a binary (one-versus-rest) manner. To account for sex differences between the bacterial (female) and viral (male) experiments, DEGs between control male and female mice with an adjusted p<0.25 were removed. The remaining genes (n=1782) were then used for model generation. The complete list of genes passed into model generation will be made available on request.
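The two filtering steps above (variance ranking, then removal of probes that differ between male and female controls) can be sketched as follows. The sex-comparison p values are assumed to come from a separate differential expression analysis, probes without a p value are kept by assumption, and the probe IDs and values are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List


def top_variance_probes(expr: Dict[str, List[float]], k: int) -> List[str]:
    """Probe IDs ranked by variance across all samples, keeping the top k."""
    return sorted(expr, key=lambda probe: pvariance(expr[probe]), reverse=True)[:k]


def drop_sex_confounded(probes: List[str], sex_pvals: Dict[str, float],
                        threshold: float = 0.25) -> List[str]:
    """Remove probes whose control male-vs-female comparison has p < threshold.

    Probes absent from `sex_pvals` are kept (an assumption of this sketch).
    """
    return [p for p in probes if sex_pvals.get(p, 1.0) >= threshold]


expr = {  # hypothetical log2 expression values across four samples
    "probe_1": [7.0, 7.0, 7.0, 7.0],
    "probe_2": [5.0, 9.0, 5.0, 9.0],
    "probe_3": [6.0, 7.0, 6.0, 7.0],
    "probe_4": [4.0, 6.0, 4.0, 6.0],
}
top = top_variance_probes(expr, k=3)  # the study used k=5000
kept = drop_sex_confounded(top, {"probe_2": 0.01, "probe_4": 0.6})
print(top)   # ['probe_2', 'probe_4', 'probe_3']
print(kept)  # ['probe_4', 'probe_3']
```

The liberal 0.25 cut-off deliberately errs towards discarding probes, trading some signal for protection against sex-driven differences masquerading as pathogen effects.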

We observed perfect accuracy in distinguishing S. pneumoniae infection from other types of infection, with an auROC of 1.000 (figure 4). Forty-seven genes contributed to the S. pneumoniae model. The gene with the highest beta value (β=1.8045) was Atp11a, which encodes a protein involved in neutrophil degranulation; this was the only gene in the S. pneumoniae model with a beta value greater than 1. The FDR-adjusted p value for all 47 genes was 0.

Figure 4

Murine signature development. Performance of sparse logistic regression models with binary classification and nested LOOCV for Streptococcus pneumoniae infection (left), Mycoplasma pneumoniae infection (middle) and influenza infection (right). auROC, area under the receiver operating characteristic curve; LOOCV, leave-one-out cross-validation; ROC, receiver operating characteristic.

The murine M. pneumoniae signature had an auROC of 0.943 (figure 4). Eighty-four genes contributed to this model, and the highest beta value was observed for the calcium-activated potassium channel gene Kcnn4 (β=3.3016), with an FDR-adjusted p value of 0. The T cell-associated gene Trat1 also had a high beta value (β=1.2158) and an FDR-adjusted p value of 0.

Finally, our murine H1N1 influenza signature had an auROC of 1.000 (figure 4). Fifty-one genes contributed to this model, and the highest beta value was observed for interferon alpha-inducible protein 27 like 2A (Ifi27l2a; β=3.3887), with an FDR-adjusted p value of 0.

Detailed plots of each signature are shown in online supplemental figures S6–S8.

External validation

We validated these findings in five publicly available gene expression datasets that included 487 subjects with bacterial or viral respiratory infection (GSE63990, GSE60244, GSE42026, GSE40012, GSE20346). No dataset could be identified that contained enough typical bacterial, atypical bacterial and viral infections to validate all three classes simultaneously, so the five datasets were combined. This required categorising patients as having either viral or bacterial pneumonia, rather than the three categories initially proposed; consequently, the predictive signature was validated on its ability to discriminate between viral and bacterial pneumonia. To apply the murine signature to these human data, we combined all three murine signatures and removed duplicates and probes without gene symbol annotation, resulting in 168 genes (online supplemental table S2). Mapping these genes to their human orthologs retained 85% of the transcripts, yielding a 143-gene signature (online supplemental table S3). We then performed leave-one-out cross-validation in each of the human datasets. The murine-derived signature achieved auROC values ranging from 0.821 to 0.957 (table 1), corresponding to prediction accuracies between 80.6% and 93.3%.
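Converting a model's predicted probabilities into an accuracy requires a decision threshold, which the Methods state was chosen with the Youden method. A minimal sketch, assuming candidate thresholds are the observed scores and a sample is called positive when its score is at or above the threshold (pROC instead evaluates midpoints between scores); the scores and labels are hypothetical:

```python
from typing import Sequence, Tuple


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Threshold maximising J = sensitivity + specificity - 1 (lowest wins ties)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both classes")
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def accuracy_at(scores: Sequence[float], labels: Sequence[int], t: float) -> float:
    """Fraction of samples whose thresholded call matches the label."""
    return sum((s >= t) == (y == 1) for s, y in zip(scores, labels)) / len(labels)


scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]  # hypothetical P(bacterial)
labels = [1, 1, 1, 1, 0, 0, 0, 0]                  # 1 = bacterial, 0 = viral
t, j = youden_threshold(scores, labels)
print(t, j, accuracy_at(scores, labels, t))  # 0.4 0.75 0.875
```

Note that the same auROC can correspond to different accuracies depending on the chosen threshold, which is why the text reports both.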

Table 1

Murine signature validation in human pneumonia

Discussion

Distinguishing between aetiologies is a major challenge in the management of CAP. Using gene expression data from murine models of each type of pneumonia, we have created a predictive model with excellent performance in distinguishing between important categories of infection: bacterial and viral pneumonia. These results were validated in multiple publicly available human datasets. The most important genes in our prediction model were also differentially expressed between infectious agent groups, suggesting that these findings are driven by differences in the inflammatory profiles elicited by each infectious agent.

This work extends prior studies by our group. In 2016, we used a similar predictive strategy to develop an expression signature from samples obtained prospectively from patients presenting with acute respiratory illness.7 This signature had excellent diagnostic performance in its population, though it was limited by its observational design and its reliance on clinical adjudication as the reference standard. Further studies restricted our signature to patients with a single microbiologically confirmed organism, compared our signature with procalcitonin and attempted to identify the most parsimonious signature differentiating bacterial and viral infection.46–49 These studies improved the clinical applicability of our signature but were still limited by the lack of a gold standard for determining the microbiological aetiology of ARI. The current study overcomes this limitation by deriving the signature in mice under an experimental design and then applying it to independent human datasets. In addition to controlling for confounding through experimental design, the present study offers a greater opportunity to understand the underlying pathophysiology attributable to the different pathogens.

The most robust change in gene expression occurred in animals infected with S. pneumoniae, consistent with our physiological data: S. pneumoniae-infected mice lost more weight, indicating more severe disease. Gene expression evolved from 24 to 72 hours. The initial response was dominated by genes involved in transcription. By 72 hours, the most upregulated pathways were related to cytokine and interleukin signalling and stress responses. In contrast, M. pneumoniae elicited a more muted host response. This is not surprising, as atypical bacterial pneumonia, such as that caused by M. pneumoniae, is classically less clinically aggressive.50 Consistent with its intracellular pathogenesis, the host response to M. pneumoniae was more similar to the response to influenza than to the response to S. pneumoniae.10 50

Our comparisons are, therefore, largely driven by the differences between S. pneumoniae and all other groups. This was initially apparent in our principal component analysis (PCA) and was further borne out by the low numbers of DEGs in the M. pneumoniae and H1N1 groups. However, some trends are discernible. When S. pneumoniae was compared with H1N1 rather than with controls, ‘cellular responses to stress’ were less upregulated, suggesting that S. pneumoniae and H1N1 infection are more similar along these particular pathways. However, given the low number of genes in this analysis and the separation of these two models in principal component space, this finding must be interpreted with caution.

Focusing specifically on the M. pneumoniae model, the host response was less robust than that to the other pathogens, although the animals did demonstrate clinical signs of infection. Recent work has shown that murine models of Mycoplasma infection are sensitive to experimental conditions, such as mouse strain; in one study, C57 mice developed milder inflammation following Mycoplasma infection.51 For the purposes of our study, it was necessary to use C57 mice for all models; however, this may have led to a muted inflammatory response. Consequently, the number of DEGs and the magnitude of differential gene expression should be interpreted cautiously.

The validity of our model and the clinical applicability of our findings in human subjects were confirmed by applying our signature to patient data, with excellent results (auROC 0.821–0.957). By comparison, procalcitonin has an auROC of only 0.75 for distinguishing infectious from non-infectious causes of pneumonia.52 The use of procalcitonin to prospectively differentiate bacterial from viral aetiologies of pneumonia is similarly limited, with an auROC of only 0.73 and, at the optimal cut-off, a sensitivity of 80.9% and a specificity of 51.6%.10 The auROC values of our signature exceed both of these figures. These results support both the clinical value of our signature and the biological validity of the murine pathophysiological responses to these pathogens.

Several other studies have investigated transcriptional profiling for diagnosing pneumonia in murine models, but these focused on specific types of pneumonia, such as viral or pyogenic pneumonia.53 54 This study is distinct in including typical bacterial, atypical bacterial and viral pneumonia in the same analysis. In doing so, we reinforced the potential for host gene expression to serve as a new diagnostic strategy for infectious disease. Although questions have been raised about the applicability of murine whole-blood transcriptional findings to humans, we have shown through external validation that these murine-derived signatures may be clinically relevant.53 Similar cross-species relevance was observed in several other studies that validated murine-derived signatures in human subjects, including for Staphylococcus aureus and Escherichia coli infection55 and radiation exposure,56 among others.

Our study has several limitations. The most significant is that the bacterial and viral experiments were performed in mice of different sexes. However, we accounted for this in two orthogonal analytical steps. Furthermore, pathway analysis did not identify any differentially expressed pathways related to sex, which bolstered our confidence that this design limitation was adequately addressed by our analytical methodology. Additionally, we were unable to find a single GEO validation dataset with adequate numbers of samples from patients with typical bacterial, atypical bacterial and viral pneumonia. We therefore combined several GEO datasets, which introduced heterogeneity into the validation cohort in terms of when samples were collected and how the data were generated. Finally, we were unable to obtain peripheral blood leucocyte differentials. Because differences in gene expression can reflect changes in leucocyte subpopulations, mechanistic inferences from these data should be made with caution. However, prior work in other disease models has shown that absolute changes in leucocyte count are not the primary driver of differential gene expression.12 57 58

Through transcriptional profiling, we have developed a predictive signature to differentiate typical bacterial, atypical bacterial and viral pneumonia, represented by S. pneumoniae, M. pneumoniae and influenza, respectively. We have validated this signature in human gene expression data and achieved excellent predictive accuracy. These results support the use of murine models of pneumonia. Future studies will focus on validating this signature prospectively in patients and on understanding the biological significance of the genes and pathways reflected in this signature.

Data availability statement

Data are available in a public, open access repository. Raw data are available through the Gene Expression Omnibus (GEO) under accession number GSE214051. Further data requests and questions should be directed to the first author.

Ethics statements

Patient consent for publication

Ethics approval

All murine work, including all methods, was approved by the Duke University Institutional Animal Care and Use Committee (IACUC) under Animal Study Protocols (ASP) A051-21-03 and A278-09-09.

References

Supplementary materials

Footnotes

  • Contributors MM analysed all data, drafted and edited the main manuscript, and is the guarantor; NO’G analysed all data, drafted and edited the main manuscript; KK analysed all data, drafted and edited the main manuscript; MB-Q: Bacterial exposures: performance of experiments, data analysis, interpretation, manuscript editing; AKZ: Bacterial exposures: performance of experiments, data analysis, interpretation, manuscript editing; AET: Viral exposure: performance of experiments, data analysis, interpretation, manuscript editing; ZY: Viral exposure: performance of experiments, data analysis, interpretation, manuscript editing; LQ: Viral exposure: performance of experiments, data analysis, interpretation, manuscript editing; RH: Computational approach, data analysis and manuscript editing; SS: Computational approach, data analysis and manuscript editing; GSG: Computational approach, data analysis and manuscript editing; CWW: Bacterial exposures: performance of experiments, data analysis, interpretation, manuscript editing; MTM: Bacterial exposures: performance of experiments, data analysis, interpretation, manuscript editing; ELT analysed all data, drafted and edited the main manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.