Biomarkers Of Disease

Metabolomics of World Trade Center-Lung Injury: a machine learning approach

Abstract

Introduction Biomarkers of metabolic syndrome expressed soon after World Trade Center (WTC) exposure predict development of WTC Lung Injury (WTC-LI). The metabolome remains an untapped resource with potential to comprehensively characterise many aspects of WTC-LI. This case–control study identified a clinically relevant, robust subset of metabolic contributors to WTC-LI through comprehensive high-dimensional metabolic profiling and integration of machine learning techniques.

Methods Never-smoking, male, WTC-exposed firefighters with normal pre-9/11 lung function were segregated by post-9/11 lung function. Cases of WTC-LI (forced expiratory volume in 1 s < lower limit of normal, n=15) and controls (n=15) were identified from previous cohorts. The metabolome of serum drawn within 6 months of 9/11 was quantified. Machine learning was used for dimension reduction to identify metabolites associated with WTC-LI.

Results 580 metabolites qualified for random forests (RF) analysis to identify a refined metabolite profile that yielded maximal class separation. RF of the refined profile correctly classified subjects with a 93.3% estimated success rate. Five clusters of metabolites emerged within the refined profile. Prominent subpathways include known mediators of lung disease such as sphingolipids (elevated in cases of WTC-LI) and branched-chain amino acids (reduced in cases of WTC-LI). Principal component analysis of the refined profile explained 68.3% of variance in five components, demonstrating class separation.

Conclusion Analysis of the metabolome of WTC-exposed 9/11 rescue workers has identified biologically plausible pathways associated with loss of lung function. Since metabolites are proximal markers of disease processes, metabolites could capture the complexity of past exposures and better inform treatment. These pathways warrant further mechanistic research.

Key messages

  • Can machine learning algorithms determine metabolomic signatures of World Trade Center Lung Injury (WTC-LI)?

  • Machine learning techniques are an innovative, promising approach to capturing the systemic nature of the pathogenesis of particulate matter associated lung disease.

  • We identified key pathways, such as sphingolipids and branched chain amino acids, that contribute to lung function loss after WTC particulate and toxin exposure.

  • This novel metabolomics investigation of WTC first responders shows how machine learning can be integral to identifying key mediators of lung function loss.

  • The metabolome may allow us to further identify biologically relevant therapeutic targets of particulate/toxin-associated lung disease.

Introduction

Fire Department of New York (FDNY) rescue workers exposed to World Trade Center particulate matter (WTC-PM) developed lung disease similar to other exposed cohorts.1–6 Characteristics of airflow obstruction predominated in symptomatic individuals who sought treatment.1 7 8 WTC-affected subjects continue to have their quality of life adversely impacted even 15 years after exposure.9 10

Biomarkers of metabolic syndrome (MetSyn), vascular injury and inflammation predict the development of World Trade Center Lung Injury (WTC-LI), defined as forced expiratory volume in 1 s, per cent predicted (FEV1, %Pred), less than the lower limit of normal (LLN).1 7 8 11 MetSyn is defined by inter-related risk factors of metabolic origin associated with end-organ disease and affects approximately one-third of the adult US population.12 13 A diagnosis of MetSyn is determined by the presence of at least three of the following comorbidities: abdominal obesity, insulin resistance, hypertriglyceridaemia, low high-density lipoprotein (HDL) and hypertension.12 We adapted the National Cholesterol Education Program Adult Treatment Panel III (NCEP ATP III) definition of MetSyn for our cohort and diagnosed MetSyn when at least three of the following five criteria were met: systolic blood pressure (SBP) ≥130 mm Hg or diastolic BP (DBP) ≥85 mm Hg; HDL <40 mg/dL; triglycerides ≥150 mg/dL; insulin resistance as glucose ≥100 mg/dL; and body mass index (BMI) >30 kg/m².
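
The five cohort-adjusted criteria can be expressed as a simple counting rule. The following is a minimal sketch, not the study's own code: the `Measurements` container, its field names and the example values are hypothetical, and only the thresholds come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Measurements:
    """Hypothetical container for one subject's clinical values."""
    sbp_mmhg: float
    dbp_mmhg: float
    hdl_mg_dl: float
    triglycerides_mg_dl: float
    glucose_mg_dl: float
    bmi_kg_m2: float


def metsyn_criteria_met(m: Measurements) -> list[str]:
    """Return the names of the cohort-adjusted criteria that are met."""
    checks = {
        "blood_pressure": m.sbp_mmhg >= 130 or m.dbp_mmhg >= 85,
        "low_hdl": m.hdl_mg_dl < 40,
        "triglycerides": m.triglycerides_mg_dl >= 150,
        "insulin_resistance": m.glucose_mg_dl >= 100,
        "bmi": m.bmi_kg_m2 > 30,
    }
    return [name for name, met in checks.items() if met]


def has_metsyn(m: Measurements) -> bool:
    """MetSyn is assigned when at least three of the five criteria are met."""
    return len(metsyn_criteria_met(m)) >= 3


# Illustrative values only, not study data.
example = Measurements(sbp_mmhg=134, dbp_mmhg=80, hdl_mg_dl=38,
                       triglycerides_mg_dl=120, glucose_mg_dl=104,
                       bmi_kg_m2=29.5)
print(metsyn_criteria_met(example))  # ['blood_pressure', 'low_hdl', 'insulin_resistance']
print(has_metsyn(example))           # True
```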

Our findings in the WTC-exposed FDNY cohort fit into a larger set of studies demonstrating the association of MetSyn, lipids and obstructive airways disease (OAD) after pollutant exposure.14 Prior studies have identified an association between metabolic mediators and disease severity following toxin exposure.1 15 16 Specifically, metal fume exposure secondary to welding has been linked to an increase in plasma unsaturated fatty acid (FA) levels. Consisting of particulates and metals (such as nickel, copper and iron), the welders’ exposure is similar to the WTC exposure; however, there exist significant differences. The exposure at the WTC site was diverse. The destruction of the WTC complex pulverised 1.2 million tons of construction material.17 18 Extensive analysis of the particulates for size and composition showed that metals (vanadium, chromium, nickel, iron, silicon and mercury) supplemented common construction materials, such as powdered concrete, calcium carbonate and silicates.17 Additional toxins included fibrous glass, asbestos, components of jet fuel, fire retardants and dioxins.18 With exposure including high-intensity particulates spanning the ultrafine to coarse range and associated volatile compounds, our WTC cohort allows for novel metabolic observations.

Metabolic profiling (metabolomics), a comprehensive analytical technique that measures all small organic molecules detectable in biological fluids, provides a contemporaneous snapshot of the organism’s physiology.19 The metabolome’s assessment of low-molecular-weight compounds may prove the closest link to disease phenotype.20 Furthermore, metabolomics provides a non-invasive functional genetics approach to understanding molecular complexity.20

However, metabolomics uses a high-throughput platform and therefore optimising the analysis of this large data set is a primary challenge in ‘omics research.21 A common approach, reducing the dimensionality of the data with significance value cut-offs, carries the risk of producing false discoveries while discounting truly important compounds. For this reason, we turn to machine learning, which has the ability to analyse differential expression of individual metabolites, as well as consider how metabolites interact with each other to produce class separation. This may allow us to unmask metabolites that appear insignificant on their own, but reveal key phenotypic changes when analysed in tandem with other metabolites.

Machine learning methods include random forests (RF), neural networks, partial least squares and support vector machines. Our machine learning method of choice is RF because it is unbiased, performs well on datasets with sizes similar to ours, is non-parametric and rarely overfits a data set.22

RF uses a large ensemble of decision trees to classify subjects, produces an unbiased estimate of model classification accuracy and measures each metabolite’s importance to model classification accuracy. This measure, called mean decrease accuracy, allows researchers to narrow their focus to only those metabolites most important to class differentiation. We hypothesised that implementation of machine learning techniques on the metabolome of WTC-exposed FDNY firefighters may elucidate pathways of interest in progression to WTC-LI, reveal clustering patterns reflective of mechanistic action and enable discovery of bioactive lipid metabolites related to loss of lung function.

Methods

Study design

Cases and controls were drawn from symptomatic subjects referred for subspecialty pulmonary examination (SPE) between 10/1/2001 and 3/10/2008, and underwent specialised pulmonary function testing as previously described.1 7 8 The parent cohort consisted of cases of WTC-LI (n=96) defined as FEV1, %Pred <LLN at SPE and controls (n=127) selected as previously described.1 7 8 Subjects were chosen for untargeted metabolome assessment by application of a further set of inclusion criteria, including stable case/control assignment as ensured by most recent spirometry measures from annual health physical, and no chronic sinusitis diagnosis. Based on these criteria, 15 cases of WTC-LI were available from the previously studied group, and a 1:1 ratio of cases of WTC-LI to randomly selected controls was determined (figure 1).7 8 Specifically, cases of WTC-LI were defined as having FEV1, %Pred <LLN as defined by National Health and Nutrition Examination Survey III and control subjects as FEV1, %Pred ≥LLN, as previously described.1 23

Figure 1

Study design. Metabolome assessment cases of World Trade Center Lung Injury (WTC-LI) and controls were selected. FEV, forced expiratory volume; LLN, lower limit of normal; MMTP, monitoring and treatment programme; NHANES PFT, National Health and Nutrition Examination Survey Pulmonary Function Test.

Demographics

Demographic and clinical data were obtained from the WTC medical monitoring and treatment programme (MMTP). Exposure intensity is categorised as per the FDNY-WTC Exposure Intensity Index and is based on first arrival time at the WTC site, as previously described.5 24 25 Specifically, subjects are considered to have high exposure if they arrived during the morning of 11 September 2001, intermediate exposure if they arrived in the afternoon of 11 September 2001 and low exposure if they arrived on 12 September 2001. Duration is defined as the time in months spent at the WTC site performing rescue and recovery efforts. BMI was obtained at two time points: MMTP enrolment and SPE.
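
The arrival-time categories described above can be sketched as a small lookup. This is an illustration only: the function name is hypothetical, the noon cut-off between "morning" and "afternoon" is an assumption, and arrivals outside the two dates named in the text are returned as "unclassified" because the text does not describe them.

```python
from datetime import date, datetime


def exposure_intensity(first_arrival: datetime) -> str:
    """Map first arrival time at the WTC site to an exposure category."""
    day = first_arrival.date()
    if day == date(2001, 9, 11):
        # Assumption: "morning" means before 12:00 local time.
        return "high" if first_arrival.hour < 12 else "intermediate"
    if day == date(2001, 9, 12):
        return "low"
    # Not covered by the categories described in the text.
    return "unclassified"


print(exposure_intensity(datetime(2001, 9, 11, 9, 30)))   # high
print(exposure_intensity(datetime(2001, 9, 11, 15, 0)))   # intermediate
print(exposure_intensity(datetime(2001, 9, 12, 8, 0)))    # low
```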

Metabolomics

Serum collected within 200 days after 9/11/2001 was processed and stored as previously described.1 7 8 11 26 Aliquots of serum were maintained at −80°C until processing and metabolomic quantification (see online supplemental methods). The bioinformatics system consisted of four major components: the Laboratory Information Management System, data extraction software, peak identification software and data processing tools for quality control, and compound identification. Quality control and curation were designed to ensure accurate and consistent identification of true chemical entities, and to remove those representing system artefacts, misassignments and background noise. Compounds were matched to library entries of retention index, mass and spectral data.27–30 Qualified metabolites were those detected in 80% or more of subjects per group with a relative SD of 15% or greater.31 In qualified metabolites, missing data were imputed using the minimum observed value of each compound.
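
The qualification rule and minimum-value imputation can be sketched as follows. This is a hedged illustration rather than the pipeline actually used: missing values are represented as `None`, the relative SD is assumed to be the sample SD divided by the mean of the detected values across both groups, and all function names and example values are hypothetical.

```python
from statistics import mean, stdev


def detection_rate(values):
    """Fraction of subjects in which the metabolite was detected."""
    return sum(v is not None for v in values) / len(values)


def relative_sd_percent(values):
    """Relative SD (%) of the detected values (assumed definition)."""
    observed = [v for v in values if v is not None]
    return 100.0 * stdev(observed) / mean(observed)


def qualifies(case_values, control_values, min_detect=0.80, min_rsd=15.0):
    """Detected in >=80% of each group and relative SD >=15%."""
    detected = (detection_rate(case_values) >= min_detect
                and detection_rate(control_values) >= min_detect)
    return detected and relative_sd_percent(case_values + control_values) >= min_rsd


def impute_minimum(values):
    """Replace missing values with the minimum observed value of the compound."""
    floor = min(v for v in values if v is not None)
    return [floor if v is None else v for v in values]


# Illustrative values only, not study data.
cases = [1.0, 2.0, None, 1.5, 3.0]
controls = [0.5, 0.7, 0.6, 0.9, 0.8]
print(qualifies(cases, controls))        # True
print(impute_minimum(cases + controls))  # the None becomes 0.5
```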

Database management and statistics

SPSS V.23 (IBM) was used for data storage and handling. Continuous variables were expressed as median and IQR. Non-parametric Mann-Whitney U test was used to compare continuous data. For categorical data, count and proportions were used to summarise and Pearson χ2 was used for comparison.
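
As an illustration of the summary statistics named above, the sketch below computes a median with IQR and the Mann-Whitney U statistics for two groups using only the Python standard library. It is not the SPSS procedure: the quartile interpolation rule is that of `statistics.quantiles` and may differ from SPSS, no p value is computed, and the example values are invented.

```python
from statistics import median, quantiles


def median_iqr(values):
    """Return (median, first quartile, third quartile)."""
    q1, _, q3 = quantiles(values, n=4)
    return median(values), q1, q3


def mann_whitney_u(x, y):
    """Return (U_x, U_y); ties between groups count as one half."""
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0
              for a in x for b in y)
    return u_x, len(x) * len(y) - u_x


# Illustrative values only, not study data.
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
print(median_iqr(group_a))
print(mann_whitney_u(group_a, group_b))  # (4.5, 20.5)
```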

Machine learning, dimension reduction and pattern recognition

Curated data of the qualified profile were subjected to RF (randomForest package, R V.3.4.3, R-Project). A refined metabolite profile was developed by including the top 5% of metabolites important to class separation as measured by mean decrease accuracy, a measure of the decrease in classification accuracy of the model should the given metabolite be removed. RF was rerun with the refined profile to evaluate the classification accuracy produced by the refined profile.22 Ten replicates of each RF model were run to assure stability of the model. The best RF model was identified by the lowest estimated out-of-bag error rate of classification, a permutation-based assessment of internal model validity. RF models consisted of 10^6 and 10^3 trees for the qualified and refined profiles, respectively, with [inline formula] sampled with replacement at each node.
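
The refinement step, keeping the top 5% of metabolites ranked by mean decrease accuracy, can be sketched as below. The importance scores here are synthetic placeholders, the function name is hypothetical, and rounding the cut-off down to a whole number of metabolites is an assumption; the RF fit itself is not reproduced.

```python
def refine_profile(importance: dict[str, float], top_percent: int = 5) -> list[str]:
    """Return the names of the top `top_percent`% of metabolites by importance."""
    # Integer arithmetic avoids floating-point surprises; rounding down is assumed.
    n_keep = max(1, len(importance) * top_percent // 100)
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:n_keep]


# Synthetic scores for 580 placeholder metabolites (not study data).
scores = {f"metabolite_{i:03d}": 1.0 / (i + 1) for i in range(580)}
refined = refine_profile(scores)
print(len(refined))  # 29
print(refined[:3])   # ['metabolite_000', 'metabolite_001', 'metabolite_002']
```

Under this rounding assumption, a qualified profile of 580 metabolites yields 29 retained metabolites.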

Principal component analysis (PCA) (SPSS V.23, IBM) was implemented to view the case/control separation in qualified metabolites compared with the machine learning-generated refined profile. PCA is a tool used in exploratory data analysis, allows for dimension reduction while preserving the variance of the data and can be used in the construction of predictive models. To perform PCA, each attribute was mean-centred, normalised and projected onto the eigenbasis of the correlation matrix of all attributes. The number of components retained was determined based on analysis of the scree plot. A determination of the variance explained by PCA was quantified as the summation of the per cent total variance explained by the components identified by scree plot. Loading weights of metabolites on the principal components were plotted to identify potential relationships between correlated metabolites. We first used the qualified profile of 580 metabolites in the 30 subjects labelled by case/control status, then applied this method to the refined metabolite profile identified by RF.
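
Two of the steps described above lend themselves to a short sketch: standardising each attribute, and summing the per cent variance explained by the retained components. The eigenvalues below are invented for illustration, the function names are hypothetical, and the use of the sample SD for normalisation is an assumption; the eigendecomposition itself is omitted.

```python
from statistics import mean, stdev


def standardise(values):
    """Mean-centre and scale one attribute to unit sample SD (assumed scaling)."""
    centre, spread = mean(values), stdev(values)
    return [(v - centre) / spread for v in values]


def percent_variance_explained(eigenvalues, n_components):
    """Per-component and cumulative % variance for the leading components."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    per_component = [100.0 * ev / total for ev in ordered]
    retained = per_component[:n_components]
    return retained, sum(retained)


print(standardise([1, 2, 3]))  # [-1.0, 0.0, 1.0]

# Invented eigenvalues of a 5-variable correlation matrix (not study data).
parts, cumulative = percent_variance_explained([4.0, 2.0, 1.0, 0.5, 0.5], 2)
print(parts, cumulative)  # [50.0, 25.0] 75.0
```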

Additionally, unsupervised two-way hierarchical clustering was performed on the refined profile’s correlation matrix using Pearson correlation as a distance metric and average linkage (Cluster V.3.0, Java Treeview, Eisen Lab).
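
A minimal pure-Python sketch of average-linkage agglomerative clustering on a correlation-based distance is given below. It is not the Cluster V.3.0 implementation: the distance is assumed to be 1 minus the Pearson correlation, clustering is one-way and stops at a requested number of clusters rather than building a full dendrogram, and the profiles are invented.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def average_linkage(profiles, n_clusters):
    """Merge the closest clusters until `n_clusters` remain."""
    names = list(profiles)
    # Assumed distance: 1 - Pearson r between two metabolite profiles.
    dist = {frozenset(pair): 1.0 - pearson(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(names, 2)}
    clusters = [frozenset([name]) for name in names]

    def linkage(a, b):
        # Average linkage: mean pairwise distance between members.
        return mean(dist[frozenset((i, j))] for i in a for j in b)

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return [set(c) for c in clusters]


# Invented profiles (not study data): a/b rise together, c/d fall together.
profiles = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8.5],
    "c": [4, 3, 2, 1],
    "d": [8, 6.5, 4, 2],
}
result = average_linkage(profiles, n_clusters=2)
print(sorted(sorted(c) for c in result))  # [['a', 'b'], ['c', 'd']]
```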

Results

Demographics

Parent cohort

Cases of WTC-LI (n=96) were similar to controls (n=127) in the parent cohort in age on 9/11, clinically available lipids (triglycerides, HDL, low-density lipoprotein (LDL)), SBP and DBP. FEV1, %Pred, forced vital capacity per cent predicted of normal (FVC%Pred), FEV1/FVC and duration of time (months) spent at WTC site were significantly decreased in WTC-LI cases compared with controls, while BMI (at MMTP and SPE), exposure intensity, and heart rate (HR) were significantly increased in WTC-LI cases compared with controls in the parent cohort (table 1).

Table 1
Clinical measures, biomarker prevalence and model definition

Metabolomics subcohort

Cases of WTC-LI (n=15) were similar to controls (n=15) in age on 9/11, exposure intensity, duration, lipids (triglycerides, HDL, LDL), leucocyte subtype percentages (neutrophil, lymphocyte, monocyte, basophil and eosinophil), serum levels of glucose, sodium, chloride, potassium, uric acid, total protein and DBP. Cases of WTC-LI (n=15) had significantly decreased FEV1, %Pred, FVC%Pred and FEV1/FVC at SPE compared with controls (n=15) and also significantly increased BMI at both MMTP and SPE. Lastly, cases of WTC-LI had significantly increased SBP and HR (table 1).

Overall, cases of WTC-LI with metabolome (n=15) assessed were similar to the controls in the parent cohort (n=127) in spirometry, BMI, age on 9/11, exposure intensity, lipids (triglycerides, HDL, LDL), blood pressure, leucocyte subtype percentages, serum levels of glucose, sodium, chloride, potassium, uric acid and total protein (table 1).

Controls with metabolome assessed (n=15) and controls in the parent cohort (n=127) also were similar in clinically available lipids, leucocyte subtype percentages, serum levels of glucose, sodium, chloride, potassium, uric acid and total protein. Controls (n=15) differed from their parent cohort (n=127) in BMI at MMTP and SPE, HR, SBP and DBP; however, controls consistently have significantly lower BMI at MMTP and SPE compared with cases of WTC-LI, and the same trend remains true for HR, SBP and DBP. The cases of WTC-LI with metabolome assessed and cases in the parent cohort were similar in all of the described clinical characteristics and biochemical measurements (table 1).

Metabolomics

Of 765 metabolites detected, 580 qualified and were included in further analysis; they are summarised in online supplementary table S1. Initial PCA of the qualified profile revealed heterogeneity and overlap in metabolite expression levels between cases of WTC-LI and controls (figure 2A) and ill-defined metabolite clustering (online supplementary figure S1). Qualified metabolites were subjected to RF to reduce the dimensionality of the data by discovering those metabolites most relevant to our outcome. Mean decrease accuracy was measured for every metabolite to assess variable importance, and the top 5% were included in the refined metabolite profile (figure 3 and online supplementary table S2). RF of the refined metabolite profile achieved a 6.7% out-of-bag estimated error rate (93.3% estimated accuracy) (figure 3).
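
The relationship between the out-of-bag error rate and the estimated accuracy is simple arithmetic, sketched below. The count of two misclassified subjects is an inference from the reported 6.7% error among 30 subjects, not a figure stated in the text; the confusion matrix itself is in figure 3 and is not reproduced here.

```python
def error_and_accuracy_percent(n_misclassified: int, n_total: int):
    """Return (error %, accuracy %) rounded to one decimal place."""
    error = 100.0 * n_misclassified / n_total
    accuracy = 100.0 * (n_total - n_misclassified) / n_total
    return round(error, 1), round(accuracy, 1)


# Assumption: 2 of 30 subjects misclassified, consistent with the reported rates.
print(error_and_accuracy_percent(2, 30))  # (6.7, 93.3)
```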

Figure 2

Demonstration of model optimisation: principal component analysis (PCA) scores plot. (A) PCA of the qualified profile reveals heterogeneity in the data. (B) PCA of the refined profile demonstrates improved class separation produced by the refined profile compared with initial PCA (A). WTC-LI, World Trade Center Lung Injury.

Figure 3

Random forests (RF) variable importance in projection. RF variable importance in projection is measured by mean decrease accuracy; the top 5% of metabolites important to class separation are shown. The confusion matrix shows classification accuracy of the refined profile. PUFA, polyunsaturated fatty acids.

PCA of the refined metabolite profile captured 68.3% of variance in the five components retained based on examination of the scree plot (online supplementary figure S2) and demonstrated improved class separation compared with the initial PCA (figure 2B).

The PCA shows that features of the metabolome measured years before diagnosis can separate cases of WTC-LI from controls. From there, we can begin to examine how these features may interact mechanistically.

Characteristics of the refined profile

The PCA loading weights plot of the refined profile and correlation heatmap suggest metabolite associations through clustering patterns (figure 4A, B). Overall, we identified five clusters (C1–5) of metabolites based on the correlation heatmap (figure 4B). These clusters included metabolites from the following pathways: amino acids, carbohydrates, cofactors/vitamins, lipids, nucleotides, peptides and xenobiotics.

Figure 4

(A) Refined profile principal component analysis loading weights plot was used to derive insight into possible association of biomarkers. (B) Correlation heatmap. Correlation matrix of refined profile subjected to hierarchical clustering using Pearson correlation as a distance metric.

The lipid-predominant cluster (C1) consisted of 1-palmitoyl-2-arachidonoyl-GPC (16:0/20:4n6), 1-stearoyl-2-arachidonoyl-GPC (18:0/20:4), arachidonate (20:4n6), uridine, 2-hydroxypalmitate, 2-hydroxystearate and lignoceroyl sphingomyelin (d18:1/24:0) (figure 4B). In PCA, these metabolites clustered closely (figure 4A). A member of C1, arachidonate (20:4n6), had the third highest mean decrease accuracy score and was therefore the third most important metabolite to class separation (figure 3).

Amino acid metabolites and two ascorbate and aldarate metabolites were identified in C2: arabonate/xylonate, gulonate, n-acetylglutamine, methylsuccinate, n2-acetyllysine, lanthionine and threonate. This group of metabolites included n2-acetyllysine, the metabolite with the highest mean decrease accuracy score (figure 3), and clustered tightly in the PCA loading weights plot (figure 4A).

Amino acids and lipids were key co-contributors in C3, which consisted of proline, dimethylglycine, propionylcarnitine, isobutyrylcarnitine, azelate (nonanedioate), vanillylmandelate (VMA), prolylglycine and 4-methylcatechol sulfate (figure 4B). Interestingly, several members of C2 bore strong correlations with members of C3 (figure 4B).

In contrast to the prior clusters, sphingolipids were identified in C4, which consisted of sphingosine, sphinganine and sphingosine 1-phosphate (figure 4B). This group of metabolites was tightly grouped in the PCA loading weights plot (figure 4A). Furthermore, this cluster contained the metabolite with the second highest mean decrease accuracy score, sphingosine (figure 3).

Finally, a phospholipid, docosahexaenoylcholine, and an alanine and aspartate metabolite, n-acetylasparagine, were found in C5 (figure 4B). Members of C5 were strongly correlated with metabolites of other clusters, namely C1 (figure 4B).

Discussion

The serum metabolome of the WTC-exposed firefighters soon after 11 September 2001 is metabolically rich and empowered by a well-described cohort that strengthens associations in the investigation of WTC-LI and systemic pathology; however, these associations are limited in interpretation as our study’s samples are entirely collected within 200 days of WTC exposure. Although we can speculate on the metabolic environment prior to 11 September, inferences can only be made assuming that any individual component of MetSyn may not clinically change within such a short period of time. The state of health prior to exposure can only speculatively, but not definitively, be considered by review of the early blood draws post exposure.

Prior literature closely links concurrent metabolic disease and lung function. In contrast, we have previously shown that these associations of MetSyn and future lung function loss occur in a cohort with normal lung function prior to exposure. Thus, we suggest that the metabolic environment soon after exposure can trigger an inflammatory cascade that contributes to persistent and irreversible lung damage.

Although the metabolome reflects the complex interaction between many different parent cells, our data suggest that the most biologically active WTC-LI-associated metabolites reflect lipids and amino acids. Several of these have been identified as associated with broader categories of OAD, and, in line with our hypothesis, these metabolites clustered in patterns representative of their established mechanisms; however, other clusters contain novel metabolites. Overall, these metabolite clusters represent biologically plausible signalling cascades.

Our exclusively sphingolipid cluster, C4, highlighted the importance of these established inflammatory and metabolic mediators. Sphingosine 1-phosphate is a pleiotropic inflammatory mediator involved in immune cell trafficking and asthmatic hyper-reactivity induction.32 Elevation of sphingosine, a product of sphingosine 1-phosphate degradation, in cases of WTC-LI may indicate decreased bioavailability of sphingosine 1-phosphate, leading to compromised vascular integrity.32 We saw a decrease in sphingosine 1-phosphate levels in cases of WTC-LI. Of interest is our finding that cases of WTC-LI also had significantly higher HR and SBP.

In contrast to the homogeneity of C4, the metabolites in C1 were far more diverse, including lipids from a variety of subpathways. The importance of lignoceroyl sphingomyelin (d18:1/24:0) suggests that ceramide, a building block of sphingosine and a metabolite of LDL and very-low-density lipoprotein, may be involved in WTC-LI pathogenesis. Sphingomyelins are involved in ceramide synthesis and metabolism. Ceramide is involved in the synthesis and degradation of ceramide 1-phosphate. Ceramide 1-phosphate triggers release of arachidonate (20:4n6), the third most important metabolite to class separation in this study and a key mediator in the inflammatory cyclooxygenase (COX) pathway.32 Arachidonate (20:4n6), being the primary substrate in eicosanoid production, is involved in immune cell recruitment.33 34 Additionally, eicosanoids are pivotal in inflammatory pathogenesis. Moreover, changes in ceramide 1-phosphate correlate directly with changes in sphingosine and sphingosine 1-phosphate, suggesting that ceramide 1-phosphate is a precursor of sphingosine 1-phosphate.32 Together, these data elucidate interconnected pathways involved in the early response to WTC exposure that clearly differentiate cases of WTC-LI from controls based on the metabolome years before diagnosis.

Chronic inflammation has been identified as a probable mediator of loss of lung function after exposure to WTC-PM. We know that metabolic dysfunction is not only a predictor of WTC-LI but also causes chronic inflammation. Azelate, a saturated FA in C3, was important to class separation. Saturated FAs activate TLR-4 in a mechanistic link between systemic inflammation and obesity, a key component of metabolic dysfunction.32 This activation of TLR-4 induces ceramide biosynthetic genes required for TLR-4-dependent insulin resistance.32 Glucose, a marker of insulin resistance, is a predictor of WTC-LI in the entire WTC-exposed firefighter cohort.

Notably, several classes of FAs were revealed by this analysis. In C1, lignoceroyl sphingomyelin (d18:1/24:0) clustered with polyunsaturated fatty acids (PUFAs), including arachidonate (20:4n6), and with monohydroxy FAs and phospholipids. Figure 5 shows that phospholipid and sphingolipid metabolism may be linked by FA metabolism. While little is known about the specific FAs and phospholipids identified in this analysis, we do know the inflammatory roles of the PUFA arachidonate (20:4n6). The metabolites correlated with arachidonate (20:4n6) may also be related to immune cell recruitment. Evidence for this relationship includes the central cell membrane role of phospholipids and sphingolipids, the role of sphingosine 1-phosphate as a potent pro-inflammatory mediator and neutrophil chemoattractant, and the possibility that elevated sphingolipids indicate an increased, disproportionate response to perturbation. The information presented here warrants mechanistic research into this potential pathway as a therapeutic target to ameliorate derailed inflammatory responses.

Figure 5

Pathway visualisation. Metabolic pathways of sphingolipids and phospholipids reveal pathways involving key metabolites, and that sphingolipid metabolism is linked with phospholipid metabolism by long-chain fatty acid metabolism. Node size correlates to fold change (red—up, blue—down, cases of World Trade Center Lung Injury/control).

As with previously discussed clusters, C2 contains some well-understood metabolites and some with unexplored roles. The most important metabolite to class separation is n2-acetyllysine, a lysine metabolite. Lysine is an essential amino acid involved in protein formation, and acetylation of lysine residues plays a key role in epigenetic modulation of inflammatory responses by creating bromodomain docking sites.35 In murine macrophages, pharmacological inhibition of a bromodomain attenuated nuclear factor-κB-directed production of nitric oxide and interleukin-6.35

Other essential amino acids were represented in C2, namely several leucine, isoleucine and valine metabolites. These branched-chain amino acids (BCAAs) are hypothesised to be biologically active in respiratory illness due to their high anabolic capacity.36 Attenuated blood BCAA concentrations have been observed in patients with chronic obstructive pulmonary disease (COPD).37–39 Dietary supplementation with BCAAs ameliorates COPD-related respiratory muscle weakness and weight loss, and low serum BCAA concentration has been identified in COPD cohorts.37 40 41 These are the first data suggesting that low serum BCAA concentration may predispose to PM-related lung disease in our WTC FDNY cohort.

One well-known metabolite found in C3, VMA, is indicative of epinephrine and norepinephrine release (both known stress hormones). Epinephrine is responsible for increased release of FAs and is a potential explanation of raised levels of FAs, such as propionylcarnitine and azelate (nonanedioate) (C3), in cases of WTC-LI. The pleiotropic inflammatory transcription factor c-Jun N-terminal kinase (JNK) is activated by exposure to free FAs and is involved in the signalling cascade of the receptor for advanced glycation end products (RAGE).42 We are currently investigating the roles of RAGE and JNK in WTC-LI.23 43 44

As hypothesised, this analysis can identify biologically plausible signalling pathways relevant to the development of WTC-LI. While established mediators such as sphingolipids were reviewed, several potentially novel mediators, including FAs, lysine, BCAAs (in their predispositional nature), glutamine and stress hormone metabolites, have been elucidated. Some of our findings, especially the COX mediators, sphingolipids and BCAAs, mimic similar findings in COPD and pollutant-exposed cohorts. The similarities observed between the metabolome of this select group of WTC-exposed firefighters and that of other cohorts with inflammatory disease support generalisability to OAD cohorts, and support validity of findings in this select group of WTC-exposed firefighters (additional discussion of metabolites identified in the clustering can be found in the supplement).

Our study has several limitations. The metabolome provides only a contemporaneous snapshot of small organic compounds present in serum. As such, our ability to single out only WTC-LI-pathogenesis-related metabolites is limited; however, we attempted to mitigate the prevalence of spurious discoveries through implementation of machine learning algorithms. Given the limited sample size of the current study, confounders have been relatively well controlled by selection from a homogeneous subject pool adhering to numerous criteria. Reassuringly, the groups differ significantly in only a few metabolic biomarkers (SBP, HR and BMI); other metabolic biomarkers and biomarkers of MetSyn are not significantly different between groups. Effects due to baseline BMI differences were mitigated by case definition based on per cent predicted values. Additional unknown effects due to differences in exposure and duration of exposure are also minimised by the absence of significant differences in these variables between cases of WTC-LI and controls in subjects with metabolome assessed. Because distant history of smoking can predispose to non-resolving inflammation, our primary distinction is between ‘ever’ and ‘never’ smokers. Only never-smokers were included in this analysis; the metabolomic profile of ever-smoking WTC-exposed subjects remains unknown. Additionally, further profiling, including genomic, of this cohort must be performed to understand WTC-LI susceptibility.

It is important to note that any machine learning model is specific to its training data. However, the bootstrap-aggregating procedure followed by RF helps avoid overfitting that may typically be produced by a high variable-to-observation ratio. Simply put, the potential for false discovery is decreased by the high numbers of decision trees grown. Another aspect of this analysis is its strength in identifying intrinsic data patterns. While these patterns do not establish causal relationships, we assess model validity by out-of-bag classification procedures and support findings with pertinent literature. Due to the lack of an independent validation cohort, our findings have not been validated yet, but we acknowledge that this type of external validation would significantly strengthen the replicability and generalisability of our findings and methods.
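
The bootstrap-aggregating idea behind out-of-bag validation can be illustrated with a short simulation. This sketch only draws bootstrap samples of 30 subject indices and records which subjects are left out of each; it does not grow trees, and the function name, seed and number of repetitions are arbitrary choices for illustration.

```python
import random


def bootstrap_split(subject_ids, rng):
    """Draw a bootstrap sample (with replacement) and return (in_bag, out_of_bag)."""
    in_bag = [rng.choice(subject_ids) for _ in subject_ids]
    out_of_bag = sorted(set(subject_ids) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)      # fixed seed so the sketch is repeatable
subjects = list(range(30))  # 30 placeholder subject indices

oob_sizes = []
for _ in range(1000):
    in_bag, out_of_bag = bootstrap_split(subjects, rng)
    oob_sizes.append(len(out_of_bag))

# On average roughly (1 - 1/30)**30, about 36%, of subjects are out of bag
# for any one sample, so each tree can be tested on subjects it never saw.
print(sum(oob_sizes) / len(oob_sizes))
```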

Findings from this study have identified several metabolic pathways that may contribute to the pathogenesis of WTC-LI and similar lung injury-related conditions. The clusters differentiating WTC-exposed subjects with and without lung injury were primarily composed of lipid and protein metabolites, and therefore diet may be an important modifiable determinant of WTC-LI risk.

Dietary FAs may be able to alter the lipidomic signature that characterised the WTC-LI subjects. Specifically, a diet that is low in saturated fat intake and has a low omega-6-to-omega-3 FA ratio may help to correct high ceramide and the imbalance in phospholipid-derived long-chain PUFA metabolites (eg, arachidonate, docosahexaenoylcholine), which could have downstream beneficial effects on inflammatory and insulin signalling pathways.45–47

Diet may also help address the altered amino acid profile seen in WTC-LI subjects. In patients with advanced lung disease (COPD), BCAA supplements have been found to increase BCAA concentrations and improve health outcomes;38 48 however, in COPD, inadequate intake of energy, protein and BCAAs likely contributes importantly to the low BCAA concentrations. Although no data are yet available on the dietary intakes of WTC-LI subjects, based on their sociodemographic background and prevalence of adiposity-based chronic disease and cardiometabolic risk factors, we assume that they tend to consume adequate energy and follow a typical Western diet, which, especially for males, tends to be high in protein.49

Importantly, it remains unclear to what extent, if any, diet contributes to the metabolomic signature of WTC-LI subjects, and what is the optimal diet for preventing, managing and treating WTC-LI and similar lung injuries. Additional research, including randomised clinical trials, is needed to examine the impact of diet and other modifiable lifestyle behaviours on WTC-LI progression.