Respiratory Physiology

Changes in interpretation of spirometry by implementing the GLI 2012 reference equations: impact on patients tested in a hospital-based PFT lab in a large metropolitan city

Abstract

Background The Global Lung Function Initiative (GLI-2012) focused on race/ethnicity as an important factor in determining reference values. This study evaluated the effects of changing from Canadian reference equations developed from an all-Caucasian cohort with European ancestry to the GLI-2012 on the interpretation of spirometry in a multiethnic population and aimed to identify the ethnic groups affected by discrepant interpretations.

Methods Clinically indicated spirometry in a multiethnic population (aged 20–80 years) collected from 2018 to 2021 was analysed. The predicted and lower limit of normal (LLN) values were calculated using three sets of reference equations: Canadian, GLI-race/ethnic-based (GLI-Race) and GLI-race/ethnic-neutral (GLI-Other). We compared the prevalence of concordance in the abnormal diagnoses (defined as <LLN) for forced vital capacity (FVC), forced expiratory volume in 1 s (FEV1), and FEV1/FVC among the three reference values, and evaluated whether race/ethnicity was associated with discordance.

Results Data from 406 participants were evaluated (non-Caucasian 43.6%). There was 85%–87% concordance for normal/abnormal FVC and FEV1 interpretations among the Canadian, GLI-Race and GLI-Other reference equations. In all ethnic groups, application of the Canadian references for interpretation led to a higher prevalence of abnormal (<LLN) FVC and FEV1 compared with GLI-Race and GLI-Other. This trend was more prominent in Black, South-East Asian and Mixed/other ethnic groups when comparing the Canadian to the GLI-Race equations. In contrast, the discordance rates were similar among ethnic groups when compared with the GLI-Other reference equations. Interpretation of FEV1/FVC had a high rate of agreement among all equations.

Conclusion Interpretation using Canadian reference equations was associated with a higher prevalence of restrictive physiology compared with the GLI-2012 equations, particularly if the GLI-Race equations were used. These observations were mostly found in non-Caucasian groups, highlighting the need to choose reference equations that closely reflect the ethnic mix of the population being evaluated in order to optimise patient management.

What is already known on this topic

  • Race/ethnicity is one of the important features in determination of normal reference values for spirometry. The Global Lung Function Initiative (GLI-2012) developed reference equations that account for differences in lung function between race/ethnic groups.

What this study adds

  • Use of the Canadian reference equations derived from an all-white Caucasian cohort with European ancestry results in overinterpretation of abnormal forced vital capacity and forced expiratory volume in 1 s (< lower limit of normal) in all ethnic groups compared with the GLI-2012. The magnitude of the discordance was more prominent in Black, South-East Asian and Mixed/other ethnic groups when compared with GLI-Race/ethnic-based reference equations.

How this study might affect research, practice or policy

  • Our data revealed the extent to which discordance in interpretations occurs with each reference equation set (the Canadian, the GLI-race/ethnic-based and the GLI-race/ethnic-neutral reference equations) and showed that race/ethnicity was significantly associated with these discrepancies. This study suggests that lung function laboratories should carefully evaluate the choice of reference equations for the interpretation of lung function tests to better reflect the ethnic mix of the patient population and to provide optimal clinical care.

Introduction

Spirometry is the most commonly used pulmonary function test (PFT) and plays a central role in the diagnosis and management of lung diseases. It is interpreted in the context of reference values derived from a healthy population that is reflective of the patient being evaluated.1 2 The use of inappropriate prediction equations can lead to misinterpretation, resulting in missed diagnosis or misdiagnosis of restrictive and/or obstructive lung disease.3 4 For these reasons, the Global Lung Function Initiative (GLI) developed new reference equations, the GLI-2012, to better reflect the patient populations with the intent of improving diagnostic acumen. GLI-2012 has created reference equations for all ages and multiethnic groups.5 In addition to the four main race/ethnicity groups (Caucasian, Black, North-East (NE) Asian and South-East (SE) Asian), GLI-2012 also provides an ‘Other’ equation that corresponds to other groups and individuals of mixed ethnic origin, by averaging the four main groups.

Studies comparing GLI-2012 to other reference equations in various races and respiratory diseases6–12 have shown that the GLI race/ethnic-based reference equations (GLI-Race) could fit the population in several validation samples. However, a recent publication found no evidence that interpretation using the GLI-Race reference equations improved the prediction of clinical events compared with the race/ethnic-neutral equations for Other/Mixed ethnicity (GLI-Other).13 Moreover, Baugh et al reported that the % predicted values for forced expiratory volume in 1 s (FEV1) and forced vital capacity (FVC) derived from GLI-Other equations more accurately reflected clinically relevant outcomes than those derived from race-specific equations14 in an American population of smokers with, or at risk for, chronic obstructive pulmonary disease, and that race-specific equations underestimated disease severity.15

The PFT laboratories in the University of Toronto affiliated academic hospitals, a major academic medical centre in Canada, use a set of reference equations that was developed from data collected in 627 all-white Caucasian Canadians of European descent.16 Toronto is a large metropolitan city that has seen significant growth over the past several decades and has one of the most ethnically diverse populations in North America. Data from the 2016 Canadian Census show that 45% of Toronto respondents reported Asian (40%) or African (5%) ethnic origin.17 It is questionable whether the Canadian reference equations16 adequately represent the current population in the region. We postulate that the application of different reference equations will change the interpretation of spirometry and alter the prevalence of abnormal findings. To the best of our knowledge, few studies have compared the Canadian16 and the GLI-2012 reference equations.18

The primary aim of this study is to determine the effects of changing from the Canadian to the 2012 GLI-Race and GLI-Other reference equations on the interpretation of spirometry in a large group of patients recently evaluated with spirometry. The secondary aim is to identify the ethnic groups where discrepancies in the interpretation occur.

Methods

Participants and classification of race/ethnicity

This is a retrospective analysis of pulmonary function data that were collected from 6 January 2018 to 8 December 2021 in ongoing research studies (REB number 19–5582 and 17–5652) where data regarding ethnicity of the study subjects were available. All participants signed informed consent. The current study included adults who were recruited as healthy control subjects and the patients followed by the General Internal Medicine or the Respirology Services and referred for PFTs for clinical assessment of respiratory symptoms. Ethnicity of participants was self-reported using categories as shown in online supplemental table S1. Only data from participants aged 20–80 years were included as the data for the Canadian reference equations only considered this age range.16

Patient and public involvement

Patients and/or the public were not involved in the design, or conduct, or reporting or dissemination plans of this study.

Spirometry

Spirometry and full pulmonary function studies were performed using the MIR Minispir (MIR, Rome, Italy) or Bodybox (Medisoft, Sorinnes, Belgium) by qualified technologists in clinic or in the Toronto General Pulmonary Function Laboratory. All testing procedures followed American Thoracic Society and European Respiratory Society Guidelines.19 Quality control audits of the data were conducted monthly. Only data that passed quality control were used for this study. For participants who had repeated testing done during the study period, only the first test was included in the analysis. Only prebronchodilator values were collected and analysed.

Predicted and the lower limit of normal values of lung function

Three sets of reference equations were used for derivation of normal reference values: (1) Canadian set16 which does not take into account race/ethnicity, (2) GLI-race/ethnic-based spirometry reference equations (GLI-Race) and (3) GLI 2012-Other equation which is race/ethnic-neutral (GLI-Other).5 For each reference set, % predicted and the lower limit of normal (LLN) values, corresponding to the lowest fifth percentile of predicted values, were calculated. For the GLI-Race calculations, we applied the GLI-2012 classifications according to the self-reported ethnicities5 20 (online supplemental table S2). For respondents who self-identified as Chinese but where the geographic region of origin in China is unknown (n=21), we considered them as either NE Asian or SE Asian categories in GLI-Race equations (online supplemental table S2). The results analysed by applying NE Asian (model 1) are shown in the manuscript, and those analysed by applying SE Asian (model 2) are found in the online supplemental materials.
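The quantities defined above can be illustrated with a minimal Python sketch. It assumes the LMS form in which GLI-2012 expresses its equations (median M, coefficient of variation S and skewness L), so that the fifth-percentile LLN corresponds to a z-score of about −1.645. The function names and all numerical values of L, M, S and the second LLN are hypothetical placeholders chosen for illustration; they are not coefficients or outputs of the Canadian or GLI-2012 reference sets.

```python
import math

# z-score at the 5th percentile of a standard normal distribution
Z_FIFTH_PERCENTILE = -1.645


def lln_from_lms(l, m, s, z=Z_FIFTH_PERCENTILE):
    """Value at z-score z for LMS parameters: M * (1 + L*S*z) ** (1/L).

    When L == 0 the limiting form M * exp(S*z) is used.
    """
    if l == 0:
        return m * math.exp(s * z)
    return m * (1 + l * s * z) ** (1 / l)


def percent_predicted(measured, predicted):
    """Measured value expressed as a percentage of the predicted value."""
    return 100.0 * measured / predicted


def is_abnormal(measured, lln):
    """Abnormal is defined as a measured value below the LLN."""
    return measured < lln


# Hypothetical example: one measured FVC judged against two reference sets.
measured_fvc = 3.30                                    # litres (made up)
lln_ref_a = lln_from_lms(l=1.0, m=4.0, s=0.12)         # made-up LMS values
lln_ref_b = 3.45                                       # made-up LLN from another set

print(round(lln_ref_a, 4), is_abnormal(measured_fvc, lln_ref_a))
print(lln_ref_b, is_abnormal(measured_fvc, lln_ref_b))
print(percent_predicted(measured_fvc, 4.0))
```

In this made-up case the same measured FVC is normal against the first LLN and abnormal against the second, which is the kind of discordance the study quantifies.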

Statistical analysis

Comparisons between race/ethnicity groups were conducted using one-way analysis of variance (ANOVA) or Kruskal-Wallis test for continuous variables and Pearson’s chi-square test or Fisher’s exact test for categorical variables. Kruskal-Wallis test was performed to compare LLN differences between the Canadian and GLI-2012 reference equations among ethnic groups, and Bonferroni correction was used when multiple comparisons were calculated. Abnormal FVC, FEV1, and FEV1/FVC were defined when the measured values were less than LLN. First, we compared interpretation based on the Canadian reference equations with the GLI-2012 (GLI-Race or GLI-Other) reference sets for all participants or each ethnicity group using rates of concordance and discordance. Concordance was defined as the same outcome, while discordance was defined as different outcomes when comparing the two equations. Second, univariable and multivariable logistic regression modelling were performed to identify the factors related to one of the discordant pairs in interpretations between the Canadian and GLI-2012 reference equation (abnormal (< LLN) in the Canadian reference equations and normal (≥ LLN) in GLI-2012). All statistical analyses were performed using R (V.4.1.1)/Rstudio (V.1.4.1717).
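The concordance and discordance definitions above can be sketched as a simple cross-classification of paired interpretations. This is an illustrative sketch only: the study's analysis was run in R, and the function name, dictionary keys and example flags below are hypothetical.

```python
from collections import Counter


def compare_interpretations(abnormal_a, abnormal_b):
    """Cross-classify paired abnormal flags (True means measured < LLN).

    abnormal_a: flags under reference set A (for example, the Canadian set)
    abnormal_b: flags under reference set B (for example, a GLI-2012 set)
    """
    if len(abnormal_a) != len(abnormal_b) or not abnormal_a:
        raise ValueError("inputs must be non-empty and of equal length")
    counts = Counter(zip(abnormal_a, abnormal_b))
    n = len(abnormal_a)
    concordant = counts[(True, True)] + counts[(False, False)]
    return {
        "n": n,
        "concordant": concordant,
        # discordant pair modelled in the regression: abnormal in A, normal in B
        "abnormal_a_normal_b": counts[(True, False)],
        "normal_a_abnormal_b": counts[(False, True)],
        "concordance_pct": 100.0 * concordant / n,
    }


# Hypothetical flags for ten participants (not study data).
canadian = [True, True, False, False, True, False, False, True, False, False]
gli = [True, False, False, False, False, False, False, True, False, False]
print(compare_interpretations(canadian, gli))
```

Keeping the two discordant directions separate matters here, because the regression models only the pair that is abnormal under the Canadian equations and normal under GLI-2012.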

Results

During the study period, 419 patients underwent spirometry and had race/ethnicity data collected. We excluded 13 patients from analysis as their age fell outside the range covered by the Canadian reference equations (n=6, <20 years old; n=7, >80 years old). The demographic data revealed that 43.6% of participants were non-Caucasian with the majority self-identifying as SE Asian or Other/Mixed ethnicity (table 1). Interpretation using the Canadian reference equations led to lower percent predicted values for both FVC (FVC % predicted) and FEV1 (FEV1 % predicted) compared with the GLI-Race or GLI-Other reference equations in all participants and in all ethnic subgroups (p<0.001 respectively, paired t-test).

Table 1
Participant demographics and spirometry*

The concordance rates in the interpretation of abnormal FVC (defined as <LLN) between the Canadian, GLI-Race or GLI-Other reference equations are shown in figure 1A,B and table 2. Concordance was observed in 86% of participants. However, 56 participants (13.8%) whose FVC was considered abnormal according to the Canadian reference equations were interpreted to be normal by the GLI-Race or GLI-Other. This discordance rate was particularly pronounced for Black (50%), Other/Mixed (22.7%) and SE Asian (21.7%) ethnicity groups when the Canadian reference equations were compared with the GLI-Race (figure 1A and table 2). In contrast, when the interpretations based on the Canadian reference equations were compared with GLI-Other, the difference between the ethnic groups was reduced, with the exception of the Other/Mixed group which had a discordance rate of 22.7%. Similar observations were made in the diagnosis of abnormal FEV1 (FEV1<LLN), with the highest discordance rates in the Black, Other/Mixed and SE Asian groups (figure 1C,D and table 3). For the diagnosis of abnormal FEV1/FVC (FEV1/FVC<LLN), the concordance rate was more than 95% in all ethnicities among the interpretations according to the Canadian, GLI-Race and GLI-Other reference equations (online supplemental tables S3 and S4).

Figure 1

Stacked bar chart comparing the concordant and discordant pairs for abnormal diagnosis (<LLN) when using the Canadian reference equations compared with the GLI-race/ethnic-based (GLI-Race) or GLI-race/ethnic-neutral (GLI-Other) equations. (A and B) FVC; (C and D) FEV1. (A) and (C) compare the Canadian reference equations and GLI-Race, and (B) and (D) compare the Canadian reference equations and GLI-Other. Data are presented as n (%). Percentages may not total 100 due to rounding. We analysed the data by classifying Asian participants as NE Asian (model 1) if they could not be classified clearly as either NE Asian or SE Asian. FVC, forced vital capacity; FEV1, forced expiratory volume in 1 s; GLI, Global Lung Function Initiative; NE, North East; SE, South East; LLN, lower limit of normal.

Table 2
Number of concordant and discordant interpretations for abnormal FVC (FVC<LLN) when using Canadian reference equation compared with GLI-Race or GLI-Other equation*

Table 3
Number of concordant and discordant interpretations for abnormal FEV1 (FEV1<LLN) when using Canadian reference equation compared with GLI-Race or GLI-Other equation*

The discrepancies in interpretation arise solely from differences in the LLN values among the three reference sets. The Canadian reference equations consistently predicted higher LLN values for FVC and FEV1 than the GLI-Race and GLI-Other, except for the NE Asian group (table 4, online supplemental figure S1). In addition, the differences in the LLN for both FVC and FEV1 varied significantly between ethnic groups when either GLI-Race or GLI-Other was applied (p<0.001, respectively). These differences were more pronounced when GLI-Race was applied to the non-Caucasian ethnic groups (Black, SE Asian, Other/Mixed) in comparison with the Caucasian group. Although the difference in the LLN value of FEV1/FVC also varied between ethnic groups (p<0.001), the Canadian reference equation did not necessarily predict a higher LLN value for FEV1/FVC.

Table 4
Differences in the calculated lower limit of normal (LLN) values for FVC, FEV1 and FEV1/FVC by race/ethnicities between the different reference equations*

Next, we conducted univariable and multivariable logistic regression analyses to identify the factors that contribute to the discordance in the interpretations of FVC and FEV1 using the Canadian vs the two GLI-2012 reference sets. When GLI-Race was applied, sex and the Black, SE Asian and Mixed/other ethnic groups were significantly associated with discrepancies in both FVC and FEV1 interpretation (table 5). However, when GLI-Other was applied, only male sex was found to be a significant factor in the discordance of the FVC interpretation, while age and weight were found to be factors significantly associated with discordance in the FEV1 interpretation (table 6). We also repeated the same analyses by classifying Asian participants who could not be classified clearly as either NE Asian or SE Asian as SE Asian (model 2); the results were similar (online supplemental tables S5–S11). When smoking history or type of spirometer was adjusted for in the statistical models, the findings were similar (data not shown).

Table 5
Univariable and multivariable analysis to identify the factors causing discordance in FVC and FEV1 interpretation defined as abnormal (<LLN) in Canadian reference equations and normal (≥LLN) in GLI-Race*

Table 6
Univariable and multivariable analysis to identify the factors causing discordance in FVC and FEV1 interpretation defined as abnormal (<LLN) in Canadian reference equations and normal (≥LLN) in GLI-Other*

Discussion

The present study evaluated the impact of different reference equations on the interpretation of spirometry in a multi-ethnic cohort in a large Canadian city. It revealed that application of the all-Caucasian Canadian reference equations led to the over-interpretation of abnormal (<LLN) FVC and FEV1 compared with GLI-Race and GLI-Other equations in all ethnic groups. The magnitude of the discordance was especially large in the Black, Mixed/other and SE Asian populations. The discordance was statistically significant even after adjusting for the key factors used for derivation of the reference values, that is, sex, age and height, when the Canadian reference equations were compared with GLI-Race reference equations. Although we observed differences in discordance between ethnic groups when comparing the Canadian and the GLI-Other reference equations, these were not statistically significant; only male sex was found to be a significant factor in the discordance of FVC while age and weight were significant factors in the discordance of FEV1. Unsurprisingly, FEV1/FVC was highly consistent across the reference equations.

Although some studies have shown disagreement,21 GLI-Race reference equations generally fit multiple races/ethnicities.6–12 The current study revealed that the Canadian reference equations, compared with GLI-Race, led to higher rates of abnormal FVC and FEV1, especially in non-Caucasian groups. Moreover, the LLN values of FVC and FEV1 did not show perfect agreement even in the Caucasian group when comparing the Canadian and GLI-Race reference equations. For the Caucasian group, predicted LLN values of FVC and FEV1 according to the Canadian reference equations were higher than those derived from reference equations developed from data of Canadian Caucasian adults22 and from the third National Health and Nutrition Examination Survey (NHANES III) reference equations for the Caucasian (non-Hispanic White) ethnic group14 (online supplemental table S12). Thus, there is also the possibility of over-interpretation of abnormal lung function (<LLN) in Caucasian patients when the Canadian reference equations are applied. An Italian group compared reference equations developed from several existing reference equations and concluded that it is necessary to apply reference equations derived from normal subjects who are as similar as possible to the study population being evaluated, using similar conditions of measurement.23 These results emphasise the importance of using appropriate reference predictions that are most representative of the population in question.

The significance of considering race/ethnicity in lung function prediction equations is not limited to genetic/racial differences. Race and ethnicity are constructed by a complex combination of social, cultural and genetic factors.24 It has been suggested that other factors, such as socioeconomic status and education, are associated with lung function.25 26 For example, Asian Indians born in the USA have higher pulmonary function compared with immigrant Asian Indians, suggesting the effect of differing environmental conditions.27 In other words, in multi-ethnic cities such as the one where this study was conducted or other large cosmopolitan cities around the world, the race/ethnicity categories may be ambiguous in many participants. A recent report of 567 Asian subjects living in the USA found that the GLI-Other reference equations adequately fitted spirometry data compared with the NHANES III and GLI-Race equations.28 Comparison of the percent predicted lung function based on GLI-Other versus the GLI-Race equations in 3972 Black participants who participated in the NHANES III study showed that FEV1 and FVC z-scores based on GLI-Other showed more agreement between White and Black populations than those based on the GLI-Race equations.29 Although the GLI-Race references led to lower FEV1 and FVC in Black compared with the White populations, modelling of mortality risk was similar when GLI-Other was applied for lung function interpretation.29 These studies have led some scholars to suggest that consideration of race/ethnicity may be counterproductive in the interpretation of PFT.30 There is considerable debate as to whether race-specific equations or universal reference equations are superior.31 Our data revealed the extent to which discrepancies occur in each reference set according to race/ethnic groups by comparing the Canadian reference equations to both the GLI-Race and GLI-Other equations. Our findings suggest that the choice of reference equations should be carefully evaluated in different ethnic groups and considered when interpreting PFT.

In the current study, there was a difference in the interpretation of FEV1 and FVC, but no discrepancy in interpretation of airway obstruction (FEV1/FVC<LLN) between the reference equations. While some have argued that incorporation of the FEV1/FVC ratio in interpretations of PFTs may minimise the impact of race/ethnicity,30 recent reports of clinical outcomes in preserved ratio impaired spirometry (PRISm), defined as FEV1<80% predicted and FEV1/FVC≥70%, suggest that the use of inappropriate reference equations could have clinical consequences. In two large population studies in the USA where the NHANES III reference equation was applied31 and in a Belgian study where the GLI-Race equations were used, participants with PRISm had increased respiratory symptoms, mortality and faster FEV1 decline.32 33 Although some patients with PRISm transitioned to normal spirometry over time,32 34 early identification of this group is important. The prevalence of PRISm decreases when post-bronchodilator data are assessed.35 While our study evaluated pre-bronchodilator spirometry data, discrepancies in the interpretations of spirometry between the GLI and Canadian equations of restrictive patterns and PRISm would still be found.
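Because the PRISm definition quoted above depends on the predicted FEV1, the label can change with the reference set even though the measured FEV1/FVC ratio does not. A minimal sketch of that dependence follows; the function name and all numerical values are hypothetical and do not come from the study data or from any published equation.

```python
def is_prism(fev1_measured, fvc_measured, fev1_predicted):
    """PRISm as defined in the text: FEV1 < 80% predicted and FEV1/FVC >= 0.70."""
    fev1_pct_predicted = 100.0 * fev1_measured / fev1_predicted
    ratio = fev1_measured / fvc_measured
    return fev1_pct_predicted < 80.0 and ratio >= 0.70


# Hypothetical patient: the measured values are fixed, only the predicted FEV1 changes.
fev1, fvc = 2.4, 3.2                  # litres; ratio = 0.75
predicted_ref_a = 3.2                 # made-up predicted FEV1 from reference set A
predicted_ref_b = 2.9                 # made-up predicted FEV1 from reference set B

print(is_prism(fev1, fvc, predicted_ref_a))  # 75% predicted
print(is_prism(fev1, fvc, predicted_ref_b))  # about 82.8% predicted
```

The ratio criterion is identical in both calls, so any change in the PRISm label comes entirely from the choice of predicted value.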

There are several limitations to this study. First, GLI-2012 equations do not incorporate all race/ethnicities. For example, the Black reference equations in GLI-2012 were generated only from data of African Americans, and classification of non-African American Black individuals is inconsistent.36 37 By classifying all Black participants in the GLI-2012 Black ethnic group, we are undoubtedly not accounting for the diversity of people from the African continent and other locations, such as the Caribbean, the geographic origin of many of the Black population in Toronto. For other races/ethnicities not covered by GLI-2012, reference equations for geographically and ethnically proximate groups were applied according to GLI-2012.20 Second, race/ethnicity was self-reported, which may not be accurate enough for clinical purposes as described in the original GLI-2012 paper,5 as inter-racial families are common in the greater Toronto area. Third, the number of participants included in some ethnicity groups was not large although the ethnic mix is reflective of the population in the region. While our PFT Laboratory assesses 250 patients weekly, race/ethnicity data collection is not routine clinical practice. Only data from REB-approved studies where this information was collected were included in this paper. While we assessed mathematical considerations such as the difference in LLN values and the clinical concordance rate, larger-scale studies are required to validate our findings. Fourth, this study compared the Canadian and two GLI-2012 equations for interpretation of spirometry, rather than clinical outcomes or mortality risk. As the Canadian reference equations only considered the age range of 20–80 years,16 we excluded 13 patients outside of this age range. As the GLI allowed for the modelling of complex nonlinear relationships over a wide age range, this omission will not have a significant impact.

It should also be noted that the purpose of the current study is not to evaluate clinical outcomes associated with the labelling of the pulmonary function pattern but rather to evaluate the discrepancies in the prevalence of pulmonary function abnormalities when different reference equations are applied to the same test data. The observed discrepancies in the interpretation of spirometry based on the choice of reference equations provide the rationale for ongoing discussion, as the physiological patterns of PFT are key early factors that determine the clinical pathway of patients with respect to subsequent investigations, treatment and other therapeutic management. Thus, the use of lung function prediction equations should be carefully considered in each medical centre to ensure best practices in providing medical care to multiethnic populations.

Conclusion

By changing from the all-Caucasian Canadian reference equations to the GLI-2012, the prevalence of restrictive lung physiological patterns as defined by FVC and FEV1<LLN is expected to decrease. The impact is particularly high in Black, SE Asian and Mixed/Other ethnic groups, where the potential misclassification of restrictive defects would lead to unnecessary tests such as CT imaging of the chest, lung biopsies and serological testing to arrive at a final diagnosis. Conversely, under-calling of abnormal spirometry as normal will result in missed diagnosis of lung disease. Our findings suggest that careful evaluation of the choice of reference equation and caution when interpreting lung function are required in different ethnic groups.