
Ten-year prediction model for post-bronchodilator airflow obstruction and early detection of COPD: development and validation in two middle-aged population-based cohorts
  1. Jennifer L Perret1,2,3,
  2. Don Vicendese1,4,
  3. Koen Simons1,
  4. Debbie L Jarvis5,
  5. Adrian J Lowe1,
  6. Caroline J Lodge1,
  7. Dinh S Bui1,
  8. Daniel Tan1,
  9. John A Burgess1,
  10. Bircan Erbas6,
  11. Adrian Bickerstaffe1,
  12. Kerry Hancock7,
  13. Bruce R Thompson8,
  14. Garun S Hamilton9,10,
  15. Robert Adams11,
  16. Geza P Benke12,
  17. Paul S Thomas13,
  18. Peter Frith14,
  19. Christine F McDonald2,3,
  20. Tony Blakely1,
  21. Michael J Abramson12,
  22. E Haydn Walters1,15,
  23. Cosetta Minelli5 and
  24. Shyamali C Dharmage1
  25. on behalf of the TAHS and ECRHS Investigator Groups
  1. 1Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC, Australia
  2. 2Department of Respiratory and Sleep Medicine, The Austin Hospital, Melbourne, VIC, Australia
  3. 3Institute for Breathing and Sleep (IBAS), Melbourne, VIC, Australia
  4. 4The Department of Mathematics and Statistics, La Trobe University, Bundoora, VIC, Australia
  5. 5National Heart and Lung Institute (NHLI), Imperial College London, London, UK
  6. 6School of Psychology and Public Health, La Trobe University, Melbourne, VIC, Australia
  7. 7Chandlers Hill Surgery, Adelaide, SA, Australia
  8. 8Faculty of Health, Arts and Design, Swinburne University of Technology, Hawthorn, VIC, Australia
  9. 9Department of Lung, Sleep, Allergy and Immunology, Monash Health, Melbourne, VIC, Australia
  10. 10School of Clinical Sciences, Monash University, Melbourne, VIC, Australia
  11. 11Adelaide Institute for Sleep Health (AISH), Flinders University, Adelaide, SA, Australia
  12. 12School of Public Health & Preventive Medicine, Monash University, Melbourne, VIC, Australia
  13. 13Faculty of Medicine, University of New South Wales, Sydney, NSW, Australia
  14. 14College of Medicine and Public Health, Flinders University, Adelaide, SA, Australia
  15. 15School of Medicine, University of Tasmania, Hobart, TAS, Australia
  1. Correspondence to Dr Jennifer L Perret; jennifer.perret{at}

Abstract

Background Classifying individuals at high risk of chronic obstructive pulmonary disease (COPD) creates opportunities for early COPD detection and active intervention.

Objective To develop and validate a statistical model to predict 10-year probabilities of COPD defined by post-bronchodilator airflow obstruction (post-BD-AO; forced expiratory volume in 1 s/forced vital capacity < 5th percentile).

Setting General Caucasian populations from Australia and Europe, 10 and 27 centres, respectively.

Participants For the development cohort, questionnaire data on respiratory symptoms, smoking, asthma, occupation and participant sex were obtained from Tasmanian Longitudinal Health Study (TAHS) participants aged 41–45 years (n=5729); the analysis was restricted to those without self-reported COPD/emphysema at baseline who had post-BD spirometry and smoking status at age 51–55 years (n=2407). The validation cohort comprised participants from the European Community Respiratory Health Survey (ECRHS) II and III (n=5970), restricted to those aged 40–49 and 50–59 years with complete questionnaire and spirometry/smoking data, respectively (n=1407).

Statistical method Risk-prediction models were developed using random forests (R package randomForest) and then externally validated.

Results Area under the receiver operating characteristic curve (AUCROC) of the final model was 80.8% (95% CI 80.0% to 81.6%), sensitivity 80.3% (77.7% to 82.9%), specificity 69.1% (68.7% to 69.5%), positive predictive value (PPV) 11.1% (10.3% to 11.9%) and negative predictive value (NPV) 98.7% (98.5% to 98.9%). The external validation was fair (AUCROC 75.6%), with the PPV increasing to 17.9% and NPV still 97.5% for adults aged 40–49 years with ≥1 respiratory symptom. To illustrate the model output using hypothetical case scenarios, a 43-year-old female unskilled worker who smoked 20 cigarettes/day for 30 years had a 27% predicted probability for post-BD-AO at age 53 if she continued to smoke. The predicted risk was 42% if she had coexistent active asthma, but only 4.5% if she had quit after age 43.

Conclusion This novel and validated risk-prediction model could identify adults aged in their 40s at high 10-year COPD-risk in the general population with potential to facilitate active monitoring/intervention in predicted ‘COPD cases’ at a much earlier age.

  • COPD epidemiology
  • clinical epidemiology

Data availability statement

Data are available upon reasonable request. TAHS is a cohort study with data that has been prospectively collected since 1968 and will be an ongoing resource for future epidemiological analyses. Data collection protocols have been detailed in the TAHS cohort profile paper published in 2016 (Matheson et al 2016 doi: 10.1093/ije/dyw028). The raw data have not been made widely available, but expressions of interest can be discussed with the corresponding author, Dr J Perret, and/or principal investigator, Professor S Dharmage, on an individual basis. ECRHS is a cohort study with data that has been prospectively collected since 1990 and will be an ongoing resource for future epidemiological analyses. Data collection protocols are detailed at The raw data have not been made widely available, but expressions of interest can be discussed with the principal investigator, Professor D Jarvis, on an individual basis.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


Key messages

  • How can we classify individuals at high chronic obstructive pulmonary disease (COPD)-risk to create opportunities for early COPD detection before too much lung damage has occurred?

  • Using information readily obtained from patients and a machine learning methodology, we developed and validated a COPD risk-prediction model with good discriminatory ability in Australian and European general populations aged in their 40s, to predict post-bronchodilator airflow obstruction approximately 10 years later.

  • This approach can identify individuals in their 40s at high or very high COPD-risk who could benefit from serial spirometry, strengthens the rationale for smoking cessation strategies in middle age and advances precision medicine.

Introduction

Chronic obstructive pulmonary disease (COPD) ranks among the leading causes of potentially preventable hospitalisations,1 2 yet little has been done to generate high-quality evidence supporting the pre-emptive identification and/or management of the individuals most at risk. A risk-prediction approach similar to that used to manage modifiable risk factors for cardiovascular disease and type 2 diabetes3 4 could also be useful for COPD, which is multifactorial and typically features a gradual progression of airflow obstruction that can be established by middle age. Adults aged in their 40s represent an important time window for evaluating COPD-risk, as selective spirometry screening of high-risk individuals could confirm disease well before they usually seek medical attention.5 Although only one study has examined the cost-effectiveness of active COPD case-finding, finding that systematic case-finding could be useful if targeted at older smokers,6 appropriate and early individualised interventions theoretically have the potential to favourably influence poorer lung function trajectories7 8 and thereby slow or even prevent COPD onset. In the usual clinical scenario, where healthcare professionals see patients before any testing,9 a risk-prediction model can have both diagnostic and ‘prognostic’ features, as it covers current and future risk and can assist in determining both the need for further tests and the prognosis.

Previous COPD risk-prediction models have been limited by their reliance on administrative databases, which had inaccurate smoking and COPD information; case–control designs, which are prone to selection bias; and/or stepwise regression models, which are prone to overfitting.10 11 To date, only one externally validated risk-prediction tool has used longitudinal data, but it was based on several clinical test results that would generally be unavailable to treating clinicians and their patients at the time of initial assessment.12 Furthermore, no previous risk-prediction model has incorporated changes in smoking status before lung function measurement to contrast continuing smokers with quitters, which would indicate the potential prospective impact of subsequent smoking behaviour.

Using data from two of the largest respiratory cohorts worldwide, the Tasmanian Longitudinal Health Study (TAHS) and European Community Respiratory Health Survey (ECRHS), we aimed to develop and validate such a COPD risk-prediction model for middle-aged adults using a ‘real world’ scenario in a general population setting.

Methods

The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) checklist for prediction model development and validation13 and the 2020 editors’ framework on prediction modelling11 were followed.

Study design: development cohort

Our sample included participants from the whole-of-population TAHS cohort, who were born in 1961, first studied in 1968 (n=8583) and followed into middle age (figure 1).14 15 At mean age 43 years, baseline questionnaire data were collected from 5729 (67%) respondents (online supplemental Methods E1). Approximately 10 years later, the original cohort was retraced and invited to participate in the 2012–2016 study (n=6128). Of the 3609 respondents (58.9%), 2719 (75.3%) underwent pre-bronchodilator/post-bronchodilator (BD) spirometry. After excluding participants who reported doctor-diagnosed COPD and/or emphysema at baseline (n=15), the analysis sample comprised those with baseline postal survey data plus smoking status and spirometry data at the 10-year follow-up (n=2407).

Figure 1

Study flow diagram of participation and non-participation in the development cohort, Tasmanian Longitudinal Health Study 1968–2016. Percentages for non-participation at subsequent follow-ups relate to the proportion of the original 1968 survey. *Numbers may overlap. BD, bronchodilator; COPD, chronic obstructive pulmonary disease.

Study design: validation cohort

ECRHS, a collaborative study of 29 centres in 14 mostly European countries, first recruited 17 250 adults aged 20–44 years from the general community between 1992 and 1994 (ECRHS I);16 details are available at https://www.ecrhs.org/. Participants in ECRHS II (1998–2004) completed a detailed questionnaire and work history calendar (n=9645) and underwent pre-BD spirometry (n=8033, age range 26–56 years). ECRHS III (2008–2012) was conducted in 27 centres, where participants completed a detailed administered questionnaire and underwent pre-BD/post-BD spirometry (n=5970, age range 38–67 years). The validation sample consisted of participants aged in their 40s at ECRHS II who subsequently underwent post-BD spirometry at ECRHS III in their 50s and had complete predictor data (n=1407, online supplemental figure E1).

Outcome data collection and definition

Details of lung function data collection using international standards17 and reference values18 are outlined in online supplemental Methods E3. Post-bronchodilator airflow obstruction (post-BD-AO), referred to as spirometry consistent with COPD, was defined as a forced expiratory volume in 1 s (FEV1)/forced vital capacity (FVC) ratio below the 5th percentile of normal predicted values (ie, z-score < −1.645) after inhaled BD administered via spacer.18 Among participants meeting this FEV1/FVC criterion, mild-to-moderately severe post-BD-AO was defined by post-BD FEV1 ≥50% predicted, and severe-to-very severe post-BD-AO by post-BD FEV1 <50% predicted.19
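The outcome definition can be sketched in Python, assuming the reference values express FEV1/FVC through the LMS (lambda-mu-sigma) method, as the Global Lung Function Initiative 2012 equations do. The L, M and S values in the usage line are hypothetical placeholders rather than reference coefficients (in practice they are looked up by age, height and sex), and the function names are illustrative.

```python
import math
from statistics import NormalDist

# 5th percentile of the standard normal distribution (about -1.645):
# the lower limit of normal expressed as a z-score.
Z_LLN = NormalDist().inv_cdf(0.05)


def lms_z_score(measured: float, l_skew: float, m_median: float, s_cv: float) -> float:
    """Convert a measurement to a z-score with the LMS method.

    l_skew, m_median and s_cv are the skewness, median and coefficient-of-variation
    terms that a reference equation supplies for a given age, height and sex.
    """
    if l_skew == 0:
        # Limiting form of the LMS transform when L is zero.
        return math.log(measured / m_median) / s_cv
    return ((measured / m_median) ** l_skew - 1) / (l_skew * s_cv)


def classify_post_bd(ratio_z: float, fev1_percent_predicted: float) -> str:
    """Apply the post-BD-AO definition, then grade severity by FEV1 % predicted."""
    if ratio_z >= Z_LLN:
        return "no post-BD-AO"
    if fev1_percent_predicted >= 50:
        return "mild-to-moderately severe post-BD-AO"
    return "severe-to-very severe post-BD-AO"


# Hypothetical example: post-BD FEV1/FVC of 0.62 against a predicted median of 0.78.
z = lms_z_score(0.62, l_skew=1.0, m_median=0.78, s_cv=0.08)
print(round(z, 2), classify_post_bd(z, fev1_percent_predicted=72.0))
```

Severity is graded only among participants who already meet the FEV1/FVC criterion, which is why the ratio check comes first.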

Prediction model development and validation


A pragmatic approach was adopted for selecting predictor variables, using information that could reasonably be recalled in middle age, was practical to collect in primary care and could feasibly be harmonised with ECRHS data (online supplemental table E1, Methods E2). The final input variables included sex; current respiratory symptoms (wheezing, cough, sputum, breathlessness on exertion, chest tightness); smoking (current status, duration, intensity, age-of-onset); asthma (asthma-ever, current adult asthma by age-of-onset); and socioeconomic status (occupational class, online supplemental Methods E1). Smoking status at baseline and at the 10-year follow-up was expressed as a four-level variable: never-smoker; ex-smoker who quit before baseline; current smoker at baseline who quit before follow-up; or current smoker at follow-up. Baseline spirometry was not included as a predictor in the final model because post-BD spirometry was collected only for a subset of TAHS participants (n=897), which was enriched for asthma and symptoms.
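The four-level smoking variable can be derived from smoking status at the two time points. The Python sketch below is a hypothetical encoding; in particular, it assumes that anyone smoking at follow-up (including ex-smokers who relapsed) falls into the ‘current smoker at follow-up’ level, which the text does not specify.

```python
from enum import Enum


class SmokingHistory(Enum):
    """Four-level smoking variable spanning baseline and 10-year follow-up."""
    NEVER = "never-smoker"
    QUIT_BEFORE_BASELINE = "ex-smoker who quit before baseline"
    QUIT_DURING_FOLLOW_UP = "current smoker at baseline who quit before follow-up"
    SMOKER_AT_FOLLOW_UP = "current smoker at follow-up"


def smoking_history(ever_smoked_by_baseline: bool,
                    smoking_at_baseline: bool,
                    smoking_at_follow_up: bool) -> SmokingHistory:
    """Map baseline and follow-up smoking status onto the four levels."""
    if smoking_at_follow_up:
        # Assumption: relapsed ex-smokers and new smokers are grouped here.
        return SmokingHistory.SMOKER_AT_FOLLOW_UP
    if smoking_at_baseline:
        return SmokingHistory.QUIT_DURING_FOLLOW_UP
    if ever_smoked_by_baseline:
        return SmokingHistory.QUIT_BEFORE_BASELINE
    return SmokingHistory.NEVER


print(smoking_history(True, True, False).value)
```

Checking follow-up status first keeps the four levels mutually exclusive without needing a separate rule for relapse.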

Model development

Using R statistical software, we adopted randomForest,20 a flexible, non-parametric and semi-automated machine learning method that considers all candidate predictors and their interactions (online supplemental Methods E4a, table E2). The model was built on four randomly selected fifths of the data (80% of 2407 observations) and tested on the distinct remaining fifth (20%). It was optimally tuned and internally validated using a fivefold cross-validation scheme, and this process was replicated 25 times. The final model was chosen on the basis of the maximum area under the receiver operating characteristic curve (AUCROC, that is, its ability to discriminate between participants with and without post-BD-AO), followed by maximal sensitivity. Two thresholds were used to define a positive outcome: a >50% probability of being a ‘COPD case’; and the ‘optimal’ threshold defined by the Youden index.21 Missing data were imputed using a single imputation method integral to randomForest. More detailed statistical methods are reported in online supplemental Methods E4 (online supplemental sections E4a–g, figures E2–E4).
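The two selection criteria, AUCROC and the Youden-index threshold, can be computed from any set of predicted probabilities. This Python sketch is not the authors’ R code: it computes the AUCROC as the probability that a randomly chosen case is ranked above a randomly chosen non-case (the Mann-Whitney formulation) and scans candidate thresholds for the maximum of J = sensitivity + specificity − 1. The scores and labels in the usage lines are made-up values.

```python
from typing import Sequence, Tuple


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case outranks a random non-case (ties count 1/2)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:  # quadratic loop: clear for a sketch, slow for large samples
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def youden_threshold(scores: Sequence[float],
                     labels: Sequence[int]) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) maximising the Youden index.

    A participant is classed as a predicted case when score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


probs = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
outcome = [1, 1, 0, 1, 0, 0, 0, 0]
print(auc_roc(probs, outcome), youden_threshold(probs, outcome))
```

In a cross-validated setting these quantities would be computed on each held-out fold and then averaged over replications.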

Hypothetical cases, individualised predictions and risk classification

Using the final model, personalised predictions were calculated for different case scenarios and recalibrated using the Platt scaling method.22 Model calibration, that is, the model’s ability to match its predictions to the observed (or actual) COPD outcomes, was assessed using the Hosmer-Lemeshow (HL) test.
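Platt scaling fits a logistic regression of the observed outcome on the model’s raw score, and the HL test compares observed and expected case counts across groups of predicted risk. The Python sketch below assumes 10 equal-sized risk groups (hence 8 degrees of freedom) and fits the logistic curve by Newton-Raphson; it omits the target smoothing of Platt’s original method, and the small dataset is invented. Because the Python standard library has no chi-square distribution, the p-value uses the closed-form upper tail that holds for even degrees of freedom.

```python
import math


def platt_fit(scores, labels, iterations=50):
    """Fit p = 1 / (1 + exp(-(a * score + b))) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            w = p * (1.0 - p)
            g_a += (y - p) * s  # gradient of the log-likelihood
            g_b += y - p
            h_aa += w * s * s   # information matrix (negative Hessian)
            h_ab += w * s
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 1e-12:
            break
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


def platt_apply(score, a, b):
    """Recalibrated probability for one raw score."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, exact for even degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs, labels, groups=10):
    """HL statistic and p-value over equal-sized groups of sorted predicted risk."""
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / len(chunk)
        # (O - E)^2 / (n_g * pbar * (1 - pbar)) for each risk group
        stat += (observed - expected) ** 2 / (len(chunk) * mean_p * (1 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)


raw = [0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5, 0.6, 0.65, 0.7, 0.8, 0.9]
obs = [0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1]
a, b = platt_fit(raw, obs)
calibrated = [platt_apply(s, a, b) for s in raw]
hl_stat, hl_p = hosmer_lemeshow(calibrated, obs)
print(f"a={a:.2f}, b={b:.2f}, HL={hl_stat:.2f}, p={hl_p:.2f}")
```

A useful property of the fitted intercept is that the recalibrated probabilities sum to the observed number of cases. Note also that a non-significant HL test indicates no detectable miscalibration rather than proof of good calibration.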

COPD-risk groups were defined based on the following approximations previously used in other clinical tools3 4: minimal risk if <1% predicted probability; low 1%–5% predicted probability; moderate 5%–10% predicted probability; high 10%–20% predicted probability or very high >20% predicted probability.
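The risk bands translate directly into a lookup. The text gives the bands as touching ranges, so this Python sketch assumes that each lower bound is inclusive (for example, exactly 5% counts as moderate), while exactly 20% stays high because very high is defined as >20%. The function name is hypothetical.

```python
def copd_risk_group(probability: float) -> str:
    """Map a recalibrated 10-year predicted probability onto the COPD-risk bands."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability < 0.01:
        return "minimal"
    if probability < 0.05:
        return "low"
    if probability < 0.10:
        return "moderate"
    if probability <= 0.20:
        return "high"
    return "very high"


for p in (0.005, 1 / 38, 0.06, 1 / 7, 0.27):
    print(f"{p:.1%} -> {copd_risk_group(p)}")
```

Applied to predictions quoted in the Results, this reproduces the stated bands: 1 in 38 (about 2.6%) is low, 6% is moderate, 1 in 7 (about 14%) is high and 27% is very high.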

External calibration and discrimination

After model development, ECRHS data were used for external validation in two participant subsets. The main validation used ECRHS participants from an extended age range, 40–49 years at baseline and 50–59 years at spirometry (n=1407), to broaden the model’s transportability; this was compared with a subset whose ages were similar to the development cohort, that is, 40–44 years and 50–54 years, respectively (n=548). The mean (SD) of each model performance metric was derived from 50 bootstrapped replications, and this procedure was repeated 50 times to summarise uncertainty (online supplemental Methods E4h, table E3).
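The bootstrap summary can be sketched as follows: the fitted model is held fixed, the validation participants are resampled with replacement, and the AUCROC is recomputed on each resample. This Python sketch shows one round of 50 replicates (an outer loop would repeat it 50 times with different seeds); the seed, the data and the helper names are hypothetical.

```python
import random
from statistics import mean, stdev


def auc_roc(scores, labels):
    """Probability that a random case outranks a random non-case (ties count 1/2)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_auc(scores, labels, replicates=50, seed=0):
    """Mean (SD) of the AUCROC over bootstrap resamples of fixed predictions."""
    rng = random.Random(seed)
    n = len(scores)
    values = []
    while len(values) < replicates:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = [labels[i] for i in idx]
        # A resample needs at least one case and one non-case for the AUCROC.
        if 0 < sum(resampled) < n:
            values.append(auc_roc([scores[i] for i in idx], resampled))
    return mean(values), stdev(values)


preds = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0]
auc_mean, auc_sd = bootstrap_auc(preds, outcomes)
print(f"AUCROC {auc_mean:.3f} (SD {auc_sd:.3f})")
```

Seeding a dedicated `random.Random` instance keeps each round reproducible without touching the global random state.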


Patient and public involvement

Patients, TAHS participants or the public were not involved in the design, or conduct, or reporting, or dissemination plans of our research.

Results

TAHS and ECRHS participants

Descriptive results

Of the 2407 TAHS participants, 4.5% (n=108) met the lung function criterion for COPD at mean age 53 years (table 1). Of these 108 participants, 106 (98%) had mild, moderate or moderately severe airflow obstruction (n=91, n=11 and n=4, respectively). Post-BD-AO of any severity was present in 11.8% (n=62) of current smokers and 12.9% (n=50) of those who reported wheezing at age 43. A total of 187 (0.52%) clinical datapoints were missing among 87 (3.8%) participants, including two cases with post-BD-AO (online supplemental table E4).

Table 1

Characteristics of participants with and without post-BD airflow obstruction in the development and validation samples

Among 1407 ECRHS participants with complete data, post-BD-AO was present in 6.7% (n=95), including 10.1% (n=39) of all current smokers and 18.8% (n=47) of those who reported wheezing at baseline. Compared with TAHS participants, ECRHS participants were somewhat more likely to have exertional breathlessness and to be current and heavier smokers, and less likely to have current asthma (online supplemental table E4).

TAHS and ECRHS participants who had post-BD-AO in their 50s had reported more current wheeze, chronic cough, sputum and chest tightness at baseline; that is, they were substantially more symptomatic than those without post-BD-AO (table 1). There were fewer current smokers in the group with complete data than in the group with some missing data, but otherwise there were no appreciable differences in baseline characteristics (online supplemental table E5) or spirometry (online supplemental table E6).

Internal cross-validation of the final developed model

Discrimination between the risk-predictions and the observed outcome was good, with an AUCROC of 80.8% (95% CI 80.0% to 81.5%) (table 2 and figure 2). Using the Youden index,21 sensitivity was 80.3% (77.7% to 82.9%) and specificity 69.1% (68.7% to 69.5%). The NPV was high (≥98.5%) and the PPV low (11.1%), although the PPV was 2.5-fold higher than the baseline prevalence of post-BD-AO (4.5%). The HL test did not indicate poor calibration (p>0.13, table 2 and online supplemental figure E4). Imputing missing data did not appreciably improve the performance of the predictive model (AUCROC 81.1%).
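The contrast between a high NPV and a low PPV follows from the low prevalence. As a rough check, the Python sketch below applies Bayes’ theorem to the reported sensitivity (80.3%), specificity (69.1%) and baseline prevalence (4.5%); it gives a PPV of about 10.9% and an NPV of about 98.7%. The small difference from the reported PPV of 11.1% is plausibly because the reported metrics are averages over cross-validation replicates rather than a single calculation.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) implied by sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


prevalence = 0.045
ppv, npv = predictive_values(0.803, 0.691, prevalence)
print(f"PPV {ppv:.1%}, NPV {npv:.1%}, PPV/prevalence {ppv / prevalence:.1f}")
```

Because most participants do not develop post-BD-AO, even a modest false-positive rate outnumbers the true positives, which is why the PPV stays low while the NPV stays high.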

Figure 2

(A–C) Area under the receiver operating characteristic curve (AUCROC). Internal validation of the main COPD risk-prediction model using complete cases in the Tasmanian Longitudinal Health Study (A). External validation using the corresponding (40–44 and 50–54 years) and extended (40–49 and 50–59 years) age groups in the European Community Respiratory Health Survey (B and C, respectively). The optimal cut-off defined by the Youden index, as specified in table 2, is indicated by the small black dot on each curve.

Table 2

Performance metrics for the internal cross-validation and external validation of the COPD risk-prediction model, with and without imputation in the development TAHS dataset*

External validation of the final developed model

Validation in the extended age group (n=1407 observations) performed similarly to, but with greater precision than, validation in the restricted age group (n=548 observations), and showed fairly good discriminatory ability (AUCROC 75.6% and 74.6%, respectively; table 2 and figure 2). The PPV was not appreciably different when restricted to current smokers aged 40–49 years but was slightly higher for adults with any current respiratory symptom(s) (17.9% compared with 13.7%, table 2). This PPV was 2.7-fold higher than the baseline prevalence of post-BD airflow obstruction (6.7%).

Interactions between predictors

Of 210 potential interactions, the most frequent was between occupational class and smoking duration. For smoking durations beyond 25 years, the 10-year predicted probabilities for post-BD-AO were around 25% (figure 3, highlighted in yellow), increasing to around 40% for labourers/cleaners, intermediate production/transport workers and house persons, but not for trade workers (highlighted in orange). The single classification tree shown in online supplemental Methods E4c, figure E2, illustrates that the 10-level occupational variable could be split multiple times within the same tree; averaging predicted probabilities across thousands of classification trees plausibly explains the gradient (or blurring) of colours. The frequency of interactions is illustrated in online supplemental figure E5: 8 of the 10 most frequent interactions were between the smoking variables and occupation, 2 were between asthma and occupation, and none were between smoking and asthma. The ‘multi-way importance plots’ showed that occupational class, smoking duration, age of asthma onset and age of smoking onset were the more important predictors in the TAHS dataset (online supplemental figures E6 and E7).
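Why averaging many trees turns sharp cut-points into a gradient can be shown with a toy Python example. Each hypothetical ‘tree’ below is a single split on smoking duration with made-up probabilities (5% below the split, 40% above), and the split points vary between 20 and 30 years, as they might across bootstrap samples; none of these numbers comes from the study model.

```python
import random

rng = random.Random(1)


def make_stump(split_years: float, p_below: float, p_above: float):
    """A one-split 'tree': returns p_above once smoking duration exceeds the split."""
    return lambda years: p_above if years > split_years else p_below


# 1000 hypothetical trees whose split points differ, as they would across bootstrap samples.
forest = [make_stump(rng.uniform(20, 30), 0.05, 0.40) for _ in range(1000)]


def forest_predict(years: float) -> float:
    """Average the trees' probabilities, as a random forest does."""
    return sum(tree(years) for tree in forest) / len(forest)


for years in (15, 20, 22, 25, 28, 30, 35):
    print(years, round(forest_predict(years), 3))
```

Every individual tree jumps abruptly from 5% to 40%, but the forest average rises smoothly between 20 and 30 years, which is the kind of colour blurring seen in an interaction plot.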

Figure 3

Interaction plot between the effects of increasing smoking duration (0–37 years) and occupation class on post-bronchodilator airflow obstruction at age 53 years. Recalibrated predicted probabilities range between <0.1 (blue) and 0.5 (red). Occupation class categories labelled from right to left: advanced clerical services (ACS), elementary clerical services (ECS), house persons (HP), intermediate production/transport (IPT), intermediate clerical services (ICS), labourer/cleaner/related workers (LC), legislator/manager (LM), professional (Pro), technicians/associate professional (Tech) and trade/related workers (TW).

Individualised predicted probabilities and predicted occurrence

Due to the large number of potential combinations of predictors, it was not possible to present the full prediction model and predictions for all hypothetical scenarios. Selected examples of 43-year-old adults have been entered into the primary model (ie, complete cases and threshold >0.50) to predict probabilities of having the COPD outcome in their 50s. These scenarios included: an asymptomatic current smoker with varying smoking intensities/pack-years, then a current smoker with symptoms (online supplemental table E7); an ex-smoker with varying quit dates and respiratory symptoms (online supplemental table E8); a non-smoker with asthma (table 3); and comparisons between groups of quitters and continued smokers with or without active asthma at baseline (table 3).

Table 3

Hypothetical examples of individualised predictions by baseline smoking and asthma status in a high-risk occupation: risk difference with and without quitting by age 50s

Predictions for a current smoker

Predicted probabilities of post-BD-AO in his 50s for a 43-year-old tradesman who currently smoked are presented in online supplemental table E7, varying daily cigarette intensity and age of smoking onset separately. Overall, the results suggest two smoking thresholds: (1) predicted risk-estimates plateau beyond a smoking intensity of 20 cigarettes/day despite an increasing pack-year smoking history; and (2) predicted risk-estimates accelerate beyond 20 years of smoking duration. Thus, the COPD-risk for a 43-year-old tradesman who had smoked ≥20 cigarettes/day since age 18 was high (ie, a predicted occurrence of one in every seven similar individuals), with or without respiratory symptoms typical of obstructive lung diseases. The predicted probability was very high if he had started smoking at age 13 (1 in 3.7 persons). A twofold variation in the predicted probabilities of post-BD-AO in the 50s was observed across the spectrum of occupations (online supplemental table E9).
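The scenario arithmetic can be made explicit. Assuming the conventional definition of pack-years (packs of 20 cigarettes per day multiplied by years smoked) and continuous smoking from onset to age 43, a 20 cigarettes/day smoker who started at 18 has 25 pack-years and one who started at 13 has 30; a ‘one in N’ predicted occurrence is simply the reciprocal of the predicted probability. The helper names are hypothetical.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Packs of 20 cigarettes per day multiplied by years smoked."""
    return cigarettes_per_day / 20 * years_smoked


def one_in_n(probability: float) -> float:
    """Express a predicted probability as 'one in N similar individuals'."""
    return 1 / probability


age = 43
for onset in (18, 13):
    print(f"onset {onset}: {pack_years(20, age - onset):.0f} pack-years")

# '1 in 7' and '1 in 3.7' correspond to predicted probabilities of about 14% and 27%.
print(f"{1 / 7:.1%}, {1 / 3.7:.1%}, one in {one_in_n(0.27):.1f}")
```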

Predictions for an ex-smoker

Predicted probabilities of post-BD-AO in his 50s for a 43-year-old tradesman who had quit smoking are presented in online supplemental table E8, with varying times since quitting (and therefore varying quit age and pack-years). These risk-estimates showed that even those who had quit as recently as 12 months before baseline had substantially lower COPD-risk than the current smokers in table 3. Thus, the predicted COPD-risk for a 43-year-old ex-smoker of 25 pack-years who had quit 5 years earlier was only low-to-moderate, even in the presence of isolated respiratory symptoms typical of obstructive lung diseases. A similar 2.2-fold variation across occupational classes was also seen; however, all risk-predictions were in the low range (1.12%–2.50%) (online supplemental table E10).

Predictions for a non-smoker who has active asthma

Predicted probabilities for a 43-year-old female unskilled worker (eg, cleaner) showed that active (current) asthma in the absence of smoking conferred a moderate COPD-risk in her 50s, with little variation by age of asthma onset (predicted probability 6%–9%, table 3). The risk-estimate was low for remitted asthma, although the predicted occurrence was not negligible at around 1 in 38 similar persons.

Difference in COPD-risk between groups of quitters and continuing smokers

We considered four hypothetical examples of asymptomatic 43-year-old unskilled current heavy smokers, with or without concurrent asthma at baseline, partitioned into subgroups of quitters and continuing smokers over the next 10 years. For current smokers without active asthma, the risk-difference in predicted probabilities between those who quit and those who continued smoking over the next 10 years was 22.5% (4.51% compared with 27.0%, respectively, table 3), equivalent to a difference of one predicted case for every 4.4 individuals. For similar smokers with active asthma, the risk-difference was 25.6% (16.4% compared with 42.0%, respectively), or one predicted case for every 3.9 individuals.
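The risk differences and their reciprocals can be reproduced from the predicted probabilities in table 3. The reciprocal of a risk difference is read like a number needed to treat, here the number of similar smokers per one-case difference in predicted occurrence; given the caution about causal inference noted under limitations, it describes a difference between subgroups rather than a proven effect of quitting. The function name is hypothetical.

```python
from typing import Tuple


def risk_difference(p_continued: float, p_quitter: float) -> Tuple[float, float]:
    """Return the absolute risk difference and its reciprocal."""
    difference = p_continued - p_quitter
    return difference, 1 / difference


for label, continued, quitter in (("no active asthma", 0.270, 0.0451),
                                  ("active asthma", 0.420, 0.164)):
    rd, per_case = risk_difference(continued, quitter)
    print(f"{label}: {rd:.1%} difference, one case per {per_case:.1f} smokers")
```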

Discussion

Using questionnaire information that is readily accessible to patients and clinicians in a typical clinical scenario, we developed and validated a COPD risk-prediction model in general Australian and European populations aged in their 40s, to calculate 10-year COPD-risks as determined by post-BD airflow obstruction in their 50s. The variables of the final model comprised nine stem questions on known risk factors, resembling a basic respiratory assessment covering participant sex, respiratory symptoms, smoking, asthma and occupation. As indicated by figure 3 and online supplemental figures E5–E7, our machine learning methods were able to account for likely interactions between these predictors, especially with regard to smoking and at-risk occupations. Our risk-predictions could inform further spirometric testing of high-risk adults aged in their 40s to uncover ‘COPD cases’, creating opportunities for earlier detection and active intervention. However, the predictions relate not to clinical COPD but to spirometrically defined COPD, with or without symptoms or risk factors.

The baseline prevalence of post-BD airflow obstruction in adults aged in their 40s is low, yet our model had good discriminatory ability. The PPV in the validation subset of symptomatic adults aged 40–49 years was 2.7 times the baseline prevalence of post-BD airflow obstruction, and was higher than that reported in a recent Lancet article presenting machine learning-based predictions of non-fatal adverse events following an acute coronary syndrome (see online supplemental file).24 However, we acknowledge that for symptomatic adults identified by our model as being at high or very high COPD-risk, approximately 5.6 spirometry tests would be needed to uncover one case of spirometrically defined COPD, with the remainder being false positives. Using individual case scenarios as examples, our prediction model confirmed high-risk smoking profiles but also challenged the assumption of a simple dose–response association with smoking by illustrating two threshold effects: negligible increases in predicted probability beyond 20 cigarettes/day and an escalating risk for smoking durations longer than 20 years. It also highlighted a moderate COPD-risk for active asthma in non-smokers and revealed the modest predictive ability of respiratory symptoms, which in retrospect is not unexpected given that active asthma and chronic bronchitis commonly occur in the absence of airflow obstruction. The modelling also found a 2.2-fold variation in risk across occupations, with risk-predictions surprisingly highest for unskilled workers of lower socioeconomic status rather than for trade workers, possibly owing to the healthy worker effect.
The 10-year risk-predictions for current smokers in their 40s were substantially lower for subsequent quitters than for continuing smokers, supporting more intensive tobacco cessation counselling for this age group.
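The figure of about 5.6 spirometry tests per detected case is the reciprocal of the PPV in symptomatic adults aged 40–49 years (17.9%), as this short Python calculation shows; the variable names are illustrative.

```python
ppv_symptomatic = 0.179
baseline_prevalence = 0.067

tests_per_case = 1 / ppv_symptomatic           # spirometry tests per true case found
false_positives_per_case = tests_per_case - 1  # tests that do not confirm post-BD-AO
fold_over_prevalence = ppv_symptomatic / baseline_prevalence

print(f"{tests_per_case:.1f} tests per case, "
      f"{false_positives_per_case:.1f} false positives, "
      f"{fold_over_prevalence:.1f}x prevalence")
```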

Active case-finding of individuals with moderate-to-severe COPD is advocated by expert bodies25 26 as a way to identify COPD earlier and to reduce morbidity, mortality and economic costs through early intervention, although conclusive evidence to support this initiative is lacking. Spirometry has generally been underused as a diagnostic test for COPD,27 28 despite recommendations that testing be considered in symptomatic adults with and without known risk factors.29 30 While active case-finding in smokers is feasible and likely to be cost-effective,6 31 primary care physicians have been slow to pre-emptively manage individuals who have relatively few symptoms.32 This is typified by the inclusion of early spirometry in the section ‘screening tests of unproven benefit’ of the Australian primary care guidelines.29 This recommendation was largely informed by a lack of direct evidence on the benefits and harms of screening asymptomatic adults, even those at high risk,33 34 whereas comparable screening programmes for coronary heart disease and diabetes3 4 were based on only limited data before being recommended as part of routine practice.35 36 Historically, COPD may not have been given equal priority by primary care physicians, as the disease has traditionally been regarded as self-inflicted, with stigmatisation of affected people,37 and its multifactorial nature beyond smoking has only recently been appreciated. The health system seems to place responsibility for COPD prevention primarily with public health initiatives; thus, establishing the cost-effectiveness of pre-emptive identification and providing integrated research support for practice change would be needed to improve the uptake of spirometry in primary care.

Our risk-prediction model includes known risk factors for lung function consistent with COPD as predictors, that is, smoking, active asthma38 and potentially hazardous exposures in unskilled jobs, for which preventive management strategies are the cornerstone of best clinical practice. External validation in a similar population-based cohort suggests that our predicted probabilities for individual case scenarios are robust. While we acknowledge that our model alone cannot identify all individuals on an accelerated course to severe airways obstruction, this approach could help identify adults aged in their 40s who are at higher risk and may benefit from serial spirometry to detect rapidly progressive COPD as a ‘targeted intervention’. Although a supervised learning model requires careful interpretation, our individualised risk-predictions might be useful in refining guideline recommendations that consider spirometry testing in adults aged at least 40 years29 who are heavy smokers,29 39 symptomatic29 40 and/or have recurrent chest infections.39 Similarly, our novel approach of partitioning current smokers in their 40s into quitters and persistent smokers over the next 10 years could motivate middle-aged smokers to change their behaviour.41 While we did not account for the reasons underlying quitting and acknowledge that these are distinct participant subgroups, a causal interpretation is biologically plausible given that smoking cessation can improve lung function trajectories7 and asthma control.8

Strengths and limitations

By design, our risk predictions were based on information that is easily collected and relevant to an age when early COPD begins to manifest clinically and when there is some potential for reversal, or at least stabilisation. Our use of randomForest methodology offered advantages over regression methods for prediction because it can inherently accommodate non-linearity, multicollinearity and multiple interactions (figure 3, online supplemental figure E4). External validation using general population-based data from Europe extends the generalisability of the model to different geographical regions and to adults aged 50–59 years, although validation in non-Caucasian populations is still needed.
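
The point that tree-based methods accommodate interactions without the analyst specifying them in advance can be illustrated with a single classification tree, the base learner that a random forest grows many times on bootstrap samples (considering a random subset of predictors at each split) before combining the trees' votes. The sketch below is a minimal greedy tree using Gini impurity on binary predictors; the predictor names and the toy rows, in which the outcome occurs only when `smoker` and `asthma` are both present, are invented for illustration and are not TAHS or ECRHS data. It does not reproduce the randomForest package or our analysis.

```python
from typing import Dict, List, Optional, Tuple, Union

Row = Tuple[Dict[str, int], int]   # (binary predictors, outcome 0/1)
Tree = Union[float, dict]          # a leaf holds the proportion with outcome 1


def gini(rows: List[Row]) -> float:
    """Gini impurity 2p(1 - p) for a binary outcome."""
    if not rows:
        return 0.0
    p = sum(y for _, y in rows) / len(rows)
    return 2 * p * (1 - p)


def best_split(rows: List[Row], features: List[str]
               ) -> Optional[Tuple[float, str, List[Row], List[Row]]]:
    """Pick the binary predictor whose split minimises weighted Gini impurity."""
    best = None
    for f in features:
        left = [r for r in rows if r[0][f] == 0]
        right = [r for r in rows if r[0][f] == 1]
        if not left or not right:
            continue  # this predictor does not separate the rows
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[0]:
            best = (score, f, left, right)
    return best


def grow(rows: List[Row], features: List[str], depth: int) -> Tree:
    """Recursively grow a tree with at most `depth` levels of splits."""
    leaf = sum(y for _, y in rows) / len(rows)
    if depth == 0 or gini(rows) == 0.0:
        return leaf
    split = best_split(rows, features)
    if split is None:
        return leaf
    _, f, left, right = split
    return {"feature": f,
            0: grow(left, features, depth - 1),
            1: grow(right, features, depth - 1)}


def predict(tree: Tree, x: Dict[str, int]) -> float:
    """Follow the splits for one person and return the leaf proportion."""
    while isinstance(tree, dict):
        tree = tree[x[tree["feature"]]]
    return tree


# Toy rows: the outcome depends only on a smoker-by-asthma interaction.
rows = [
    ({"smoker": s, "asthma": a, "other": o}, s & a)
    for s in (0, 1) for a in (0, 1) for o in (0, 1)
]
tree = grow(rows, ["smoker", "asthma", "other"], depth=2)
print(tree)  # {'feature': 'smoker', 0: 0.0, 1: {'feature': 'asthma', 0: 0.0, 1: 1.0}}
```

The joint effect appears as an `asthma` split nested under the `smoker` split, and the irrelevant `other` predictor is never chosen because it does not reduce impurity.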

Although much larger participant numbers, such as those available in administrative health databases, could have improved predictive accuracy, our study design had the advantage of using objective and individualised spirometry measurements (rather than ICD-9 codes) and a detailed smoking history. Post-BD spirometry is more relevant to clinically important COPD outcomes than pre-BD measurements,42 especially in countries with moderate-to-high asthma prevalence such as Australia. Although we did not have post-BD spirometry for most participants in their 40s, we argue that this reflects the usual clinical scenario when an individual is assessed for the first time.

Our selection of predictor variables could have limited model performance, as we did not have reliable data on family history of COPD/emphysema, respiratory infections or exposure to other air pollutants. Finally, this study was not designed to address causal inference or the rate of lung function decline, so caution is advised when interpreting, respectively, the effect size of quitting smoking on COPD risk and on progression to clinical COPD.

Conclusions

This pragmatic and validated COPD risk-prediction model could predict high or very high risk of post-BD airflow obstruction 10 years later in Caucasian adults aged 40–49 years. These risk predictions are especially relevant to COPD in the presence of respiratory symptoms, and to asthma-COPD overlap in the presence of current asthma. We have quantified substantial differences in COPD risk between middle-aged quitters and continuing smokers, which provide a rationale to intensify tobacco cessation strategies for smokers younger than 50 years, especially unskilled workers with a history of asthma. This work has the potential to facilitate pre-emptive detection of COPD at a much earlier age in primary care settings.

Data availability statement

Data are available upon reasonable request. TAHS is a cohort study with data that have been prospectively collected since 1968 and will be an ongoing resource for future epidemiological analyses. Data collection protocols have been detailed in the TAHS cohort profile paper published in 2016 (Matheson et al 2016 doi: 10.1093/ije/dyw028). The raw data have not been made widely available, but expressions of interest can be discussed with the corresponding author, Dr J Perret, and/or principal investigator, Professor S Dharmage, on an individual basis. ECRHS is a cohort study with data that have been prospectively collected since 1990 and will be an ongoing resource for future epidemiological analyses. Data collection protocols are detailed at The raw data have not been made widely available, but expressions of interest can be discussed with the principal investigator, Professor D Jarvis, on an individual basis.

Ethics statements

Patient consent for publication

Ethics approval

TAHS was approved by Human Ethics Review Committees at all participating institutions, principally The Universities of Melbourne (040375) and Tasmania (H0012710). ECRHS II and III were performed with the approval of the corresponding local/regional committees for all participating centres (see reference 23). Written informed consent was obtained from all participants.

Acknowledgments

We acknowledge the TAHS study participants and previous investigators, Dr Heather Gibson, Dr Bryan Gandevia, Dr Harold Silverstone and Dr Norelle Lickiss. We thank Professors Mark Jenkins and John Hopper (Centre for Epidemiology & Biostatistics, VIC), Dr James Markos (Launceston Hospital, TAS), Dr Richard Wood-Baker (Royal Hobart Hospital, TAS) and Dr Iain Feather (Gold Coast Hospital, Queensland), who are investigators of TAHS but not coauthors of this manuscript, for their assistance with obtaining funds and data collection. We also thank A/Professor David Johns, Dr Melanie Matheson, Professor Graham Giles, Professor Lyle Gurrin, Professor Alan James, Professor Nicholas Zwar, Professor Peter Sly and Professor Nicholas de Klerk for their input into the study design and methodology; they are not coauthors of this manuscript. Furthermore, we recognise all the study site coordinators and respiratory scientists who collected data in the lung function laboratories of Tasmania, Victoria, Queensland and New South Wales; the research interviewers and data entry operators; and the organisational roles of Ms Cathryn Wharton and Dr Desiree Mészáros. We thank the late Stephen Morrison (University of Queensland) for his assistance with obtaining funds and collecting data, and recognise the Archives Office of Tasmania for providing data from the 1968 and 1974 TAHS questionnaires and copies of the school medical records.

We also formally acknowledge the Investigators of the European Community Respiratory Health Survey (ECRHS) for their role in obtaining research funding and collecting data, as well as the support of the study coordination by the European Commission (018996), the Medical Research Council, and separate grant funding for the local studies as outlined below. (A) Scientific teams of ECRHS: ECRHS II Coordinating Centre (London): P Burney, S Chinn, C Luczynska†, DJ, J Knox. Project Management Group: P Burney (Project leader, UK), S Chinn (UK), C Luczynska† (UK), D Jarvis (UK), P Vermeire† (Antwerp), J Bousquet (Montpellier), JH (Erfurt), M Wjst (Munich), RdM† (Verona), JMA (Barcelona), J Sunyer (Barcelona), CJ (Uppsala), U Ackermann-Liebrich (Basel), N Kuenzli (University of Basel and University of Southern California, Los Angeles, USA), F Neukirch (Paris). ECRHS II Participating Centres: Australia: Melbourne (M Abramson, E H Walters, J Raven); Belgium: Antwerp South, Antwerp Central (P Vermeire, JW, M van Sprundel, V Nelen); Estonia: Tartu (R Jõgi, A Soon); France: Bordeaux (A Taytard, CR), Grenoble (IP, J Ferran-Quentin), Paris (F Neukirch, BL, R Liard, M Zureik), Montpellier (J Bousquet, P J Bousquet); Germany: Erfurt (JH, M Wjst, C Frye, I Meyer), Hamburg (H Magnussen, D Nowak); Iceland: Reykjavik (TG, E Bjornsson, D Gislason, K B Jörundsdóttir); Italy: Pavia (A Marinoni, S Villani, M Ponzio, F Frigerio, M Comelli, M Grassi, I Cerveri, AC), Turin (RB, M Bugiani, P Piccioni, E Caria, A Carosso, E Migliore, G Castiglioni), Verona (RdM†, G Verlato, E Zanolin, SA, A Poli, V Lo Cascio, M Ferrari, I Cazzoletti); Netherlands: Groningen (M Kerkhof); Norway: Bergen (A Gulsvik, E Omenaas, CS, B Laerum); Spain: Albacete (JM-MR, E Almar, M Arévalo, C Boix, G González, J M Ignacio García, J Solera, J Damián), Barcelona (JMA, J Sunyer, M Kogevinas, JPZ, X Basagaña, A Jaen, F Burgos, C Acosta), Galdakao (N Muñozguren, J Ramos, IU, U Aguirre), Huelva (J Maldonado, AP-V, J L Sanchez), Oviedo (F Payo, I Huerta, A de la Vega, L Palenciano, J Azofra, A Cañada); Sweden: Göteborg (K Toren, L Lillienberg, A C Olin, B Balder, A Pfeifer-Nilsson, R Sundberg), Uppsala (CJ, G Boman, D Norback, G Wieslander, M Gunnbjornsdottir), Umeå (E Norrman, M Soderberg, K A Franklin, B Lundback, BF, L Nystrom); Switzerland: Basel (N Küenzli, B Dibbert, M Hazenkamp, M Brutsche, U Ackermann-Liebrich); UK: Caerphilly (M Burr†, J Layzell), Ipswich (DJ, R Hall, D Seaton), Norwich (DJ, B Harrison). ECRHS III Coordinating Centre (London): D Jarvis, P Burney, M Tumilty, J Potts. Project Management Group: D Jarvis (UK), P Burney (UK), JH (Erfurt), RdM† (Verona), JMA (Barcelona), CJ (Uppsala), K Toren (Göteborg), T Gislason (Iceland), T Rochat (Basel), B Leynaert (Paris), C Svanes (Bergen), JW (Antwerp), JPZ (Barcelona). ECRHS III Participating Centres: Australia: Melbourne (M Abramson, G Benke, S Dharmage, B Thompson, S Kaushik, M Matheson); Belgium: South Antwerp & Antwerp City (JW, H Bentouhami, V Nelen); Estonia: Tartu (R Jõgi, H Orru); France: Bordeaux (CR, P O Girodet), Grenoble (IP, V Siroux, J Ferran, J L Cracowski), Montpellier (PD, A Bourdin, I Vachier), Paris (BL, D Soussan, D Courbon, C Neukirch, L Alavoine, X Duval, I Poirier); Germany: Erfurt (JH, E Becker, G Woelke, O Manuwald), Hamburg (H Magnussen, D Nowak, A-MK); Iceland: Reykjavik (TG, B Benediktsdottir, D Gislason, E S Arnardottir, M Clausen, G Gudmundsson, L Gudmundsdottir, H Palsdottir, K Olafsdottir, S Sigmundsdottir, K Bara-Jörundsdottir); Italy: Pavia (I Cerveri, AC, A Grosso, F Albicini, E Gini, E M Di Vincenzo, V Ronzoni, S Villani, F Campanella, M Gnesi, F Manzoni, L Rossi, O Ferraro), Turin (M Bugiani, RB, P Piccioni, R Tassinari, V Bellisario, G Trucco), Verona (RdM†, SA, L Calciano, L Cazzoletti, M Ferrari, A M Fratta Pasini, F Locatelli, P Marchetti, A Marcon, E Montoli, G Nguyen, M Olivieri, C Papadopoulou, C Posenato, G Pesce, P Vallerio, G Verlato, E Zanolin); Netherlands: Groningen (HMB); Norway: (CS, E Omenaas, A Johannessen, T Skorge, F Gomez Real); Spain: Albacete (JM-MR, E Almar, A Mateos, S García, A Núñez, P López, R Sánchez, E Mancebo), Barcelona (JMA, JPZ, J Garcia-Aymerich, M Kogevinas, X Basagaña, A E Carsin, F Burgos, C Sanjuas, S Guerra, B Jacquemin, P Davd), Galdakao (N Muñozguren, IU, U Aguirre, S Pascual), Huelva (J Antonio Maldonado, AP-V, J Luis Sánchez, L Palacios), Oviedo (F Payo, I Huerta, N Sánchez, M Fernández, B Robles); Sweden: Göteborg (K Torén, M Holm, J-L Kim, A-C Olin, A Dahlman-Höglund), Umeå (BF, L Braback, L Modig, B Järvholm, H Bertilsson, K A Franklin, C Wahlgreen), Uppsala (B Andersson, D Norback, U Spetz Nystrom, G Wieslander, G M Bodinaa Lund, K Nisser); Switzerland: Basel (NMP-H, N Künzli, D Stolz, C Schindler, T Rochat, J M Gaspoz, E Zemp Stutz, M Adam, C Autenrieth, I Curjuric, J Dratva, A Di Pasquale, R Ducret-Stich, E Fischer, L Grize, A Hensel, D Keidel, A Kumar, M Imboden, N Maire, A Mehta, H Phuleria, M Ragettli, M Ritter, E Schaffner, G A Thun, A Ineichen, T Schikowski, M Tarantino, M Tsai); UK: London (PB, DJ, S Kapur, RN, J Potts), Ipswich (DJ, M Tumilty, N Innes), Norwich (DJ, M Tumilty, A Wilson).

(B) Local funding grants for ECRHS testing centres: ECRHS II: Australia: NHMRC grant code 980894; Belgium: Antwerp: Fund for Scientific Research (grant code G.0402.00), University of Antwerp, Flemish Health Ministry; Estonia: Tartu: Estonian Science Foundation grant no. 4350; France: (all) Programme Hospitalier de Recherche Clinique—Direction de la Recherche Clinique (DRC) de Grenoble 2000 number 2610, Ministry of Health, Ministère de l’Emploi et de la Solidarité, Direction Générale de la Santé, Centre Hospitalier Universitaire (CHU) de Grenoble; Bordeaux: Institut Pneumologique d’Aquitaine; Grenoble: Comite des Maladies Respiratoires de l’Isere; Montpellier: Aventis (France), Direction Regionale des Affaires Sanitaires et Sociales Languedoc-Roussillon; Paris: Union Chimique Belge-Pharma (France), Aventis (France), Glaxo France; Germany: Erfurt: GSF—National Research Centre for Environment and Health, Deutsche Forschungsgemeinschaft (grant code FR1526/1-1); Hamburg: GSF—National Research Centre for Environment and Health, Deutsche Forschungsgemeinschaft (grant code MA 711/4-1); Iceland: Reykjavik: Icelandic Research Council, Icelandic University Hospital Fund; Italy: Pavia: GlaxoSmithKline Italy, Italian Ministry of University and Scientific and Technological Research (MURST), Local University Funding for Research 1998 and 1999; Turin: Azienda Sanitaria Locale 4 Regione Piemonte (Italy), Azienda Ospedaliera Centro Traumatologico Ospedaliero/Centro Traumatologico Ortopedico—Istituto Clinico Ortopedico Regina Maria Adelaide Regione Piemonte; Verona: Ministero dell’Universita e della Ricerca Scientifica (MURST), Glaxo Wellcome SPA; Norway: Bergen: Norwegian Research Council, Norwegian Asthma and Allergy Association, Glaxo Wellcome AS, Norway Research Fund; Spain: Fondo de Investigación Sanitaria (grant codes 97/0035-01, 99/0034-01 and 99/0034-02), Hospital Universitario de Albacete, Consejeria de Sanidad; Barcelona: Sociedad Española de Neumología y Cirugía Torácica, Public Health Service (grant code R01 HL62633-01), Fondo de Investigación Sanitaria (grant codes 97/0035-01, 99/0034-01 and 99/0034-02), Consell Interdepartamental de Recerca i Innovacio Tecnologica (grant code 1999SGR 00241), Instituto de Salud Carlos III; Red de Centros de Epidemiología y Salud Pública, C03/09, Red de Bases moleculares y fisiológicas de las Enfermedades Respiratorias, C03/011, and Red de Grupos Infancia y Medio Ambiente G03/176; Huelva: Fondo de Investigación Sanitaria (grant codes 97/0035-01, 99/0034-01 and 99/0034-02); Galdakao: Basque Health Department; Oviedo: Fondo de Investigación Sanitaria (97/0035-02, 97/0035, 99/0034-01, 99/0034-02, 99/0034-04, 99/0034-06, 99/350, 99/0034-07), European Commission (EU-PEAL PL01237), Generalitat de Catalunya (CIRIT 1999 SGR 00214), Hospital Universitario de Albacete, Sociedad Española de Neumología y Cirugía Torácica (SEPAR R01 HL62633-01), Red de Centros de Epidemiología y Salud Pública (C03/09), Red de Bases moleculares y fisiológicas de las Enfermedades Respiratorias (C03/011) and Red de Grupos Infancia y Medio Ambiente (G03/176; 97/0035-01, 99/0034-01 and 99/0034-02); Sweden: Göteborg, Umeå, Uppsala: Swedish Heart Lung Foundation, Swedish Foundation for Health Care Sciences and Allergy Research, Swedish Asthma and Allergy Foundation, Swedish Cancer and Allergy Foundation, Swedish Council for Working Life and Social Research (FAS); Switzerland: Basel: Swiss National Science Foundation, Swiss Federal Office for Education and Science, Swiss National Accident Insurance Fund; UK: Ipswich and Norwich: Asthma UK (formerly known as National Asthma Campaign).

ECRHS III: Australia: NHMRC (grant code 1007965); Belgium: Antwerp South, Antwerp City: Research Foundation Flanders (FWO), grant code G.0.410.08.N.10 (both sites); Estonia: Tartu: SF0180060s09 from the Estonian Ministry of Education; France: (all) Ministère de la Santé, Programme Hospitalier de Recherche Clinique (PHRC) National 2010; Bordeaux: INSERM U897 Université Bordeaux Segalen; Grenoble: Comite Scientifique AGIR adom 2011; Paris: Agence Nationale de la Santé, Région Ile de France, domaine d’intérêt majeur (DIM); Germany: Erfurt: German Research Foundation HE 3294/10-1; Hamburg: German Research Foundation MA 711/6-1, NO 262/7-1; Iceland: Reykjavik: The Landspitali University Hospital Research Fund, University of Iceland Research Fund, ResMed Foundation, California, USA, Orkuveita Reykjavikur (Geothermal plant), Vegagerðin (The Icelandic Road Administration, ICERA); Italy: all Italian centres were funded by the Italian Ministry of Health, Chiesi Farmaceutici SpA. In addition, Verona was funded by Cariverona Foundation, Education Ministry (MIUR). Norway: Norwegian Research Council grant no 214123, Western Norway Regional Health Authorities grant no 911631, Bergen Medical Research Foundation. Spain: Fondo de Investigación Sanitaria (PS09/02457, PS09/00716, PS09/01511, PS09/02185, PS09/03190), Servicio Andaluz de Salud, Sociedad Española de Neumología y Cirugía Torácica (SEPAR 1001/2010); Fondo de Investigación Sanitaria (PS09/02457); Barcelona: Fondo de Investigación Sanitaria (FIS PS09/00716); Galdakao: Fondo de Investigación Sanitaria (FIS 09/01511); Huelva: Fondo de Investigación Sanitaria (FIS PS09/02185) and Servicio Andaluz de Salud; Oviedo: Fondo de Investigación Sanitaria (FIS PS09/03190). Sweden: all centres were funded by The Swedish Heart and Lung Foundation, The Swedish Asthma and Allergy Association, The Swedish Association against Lung and Heart Disease; Swedish Research Council for Health, Working Life and Welfare (FORTE); Göteborg also received further funding from the Swedish Council for Working Life and Social Research; Umeå also received funding from the Västerbotten County Council ALF grant. Switzerland: The Swiss National Science Foundation (grant nos 33CSCO-134276/1, 33CSCO-108796, 3247BO-104283, 3247BO-104288, 3247BO-104284, 3247-065896, 3100-059302, 3200-052720, 3200-042532, 4026-028099), The Federal Office for Forest, Environment and Landscape, The Federal Office of Public Health, The Federal Office of Roads and Transport, The Canton’s Government of Aargau, Basel-Stadt, Basel-Land, Geneva, Luzern, Ticino, Valais and Zürich, the Swiss Lung League, the Canton’s Lung League of Basel Stadt/Basel Landschaft, Geneva, Ticino, Valais and Zurich, SUVA, Freiwillige Akademische Gesellschaft, UBS Wealth Foundation, Talecris Biotherapeutics GmbH, Abbott Diagnostics, European Commission 018996 (GABRIEL), Wellcome Trust WT 084703MA; UK: Medical Research Council (grant no 92091). Support was also provided by the National Institute for Health Research through the Primary Care Research Network.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • JLP, CM and SCD are joint senior authors.

  • Contributors Funding acquisition: SCD, EHW, MJA, DLJ. Data curation and resources: SCD, EHW, MJA, BRT, PST, DLJ. Conceptualisation: JLP, SCD, CM, DLJ, CFM, KH, PF, TB, AB, AJL, CJL, DSB, DT, JAB, BE, GSH, RA, GPB. Data access and verification: JLP, DV, SCD, DLJ. Formal analysis: JLP, DV, KS, CFM. Investigation methodology and validation: DV, KS, CFM. Manuscript writing - original draft: JLP, DV. Manuscript writing - review & editing: all authors, especially CFM, TB, MJA, BE. Project administration: SCD, DLJ. Guarantor: JLP. JLP and DV had full data access and can verify the analysis. CM and SCD contributed equally as senior authors.

  • Funding The TAHS was supported by the National Health and Medical Research Council (NHMRC) of Australia, research grants 299901 and 1021275; the University of Melbourne; Clifford Craig Foundation; the Victorian, Queensland and Tasmanian Asthma Foundations; Royal Hobart Hospital; Helen MacPherson Smith Trust; GlaxoSmithKline; and John L Hopper. JP, AL and SCD are funded through the NHMRC of Australia. The ECRHS was supported by grants from the European Commission (018996) and Medical Research Council (ECRHS III no. 92091), and by multiple local grants that supported the testing centres of ECRHS II and III, which are listed in the acknowledgements. These sponsors of the study had no role in study design, data collection, data analysis, data interpretation or writing of the report. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.

  • Competing interests JP received a travel grant supported by Boehringer-Ingelheim and, together with MJA, EHW, AL, CL and SCD, holds an investigator-initiated grant from GlaxoSmithKline for unrelated research. SCD additionally holds an investigator-initiated grant from AstraZeneca for unrelated research. MJA also holds investigator-initiated grants from Pfizer, Boehringer-Ingelheim and Sanofi for unrelated research; has undertaken an unrelated consultancy for and received assistance with conference attendance from Sanofi; and received a speaker’s fee from GlaxoSmithKline. BRT has received unrelated speaker and consultancy fees from Chiesi, Mundipharma and 4D medical. KH has received personal fees and non-financial support from AstraZeneca, GlaxoSmithKline, Novartis, Chiesi, Boehringer Ingelheim and Teva outside the submitted work. AL has additionally received non-financial support from Primus Pharmaceuticals for unrelated research. No other authors reported financial disclosures.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
