Background Chronic obstructive pulmonary disease (COPD) is a heterogeneous group of lung conditions challenging to diagnose and treat. Identification of phenotypes of patients with lung function loss may allow early intervention and improve disease management. We characterised patients with the ‘fast decliner’ phenotype, determined its reproducibility and predicted lung function decline after COPD diagnosis.
Methods A prospective 4-year observational study that applied machine learning tools to identify COPD phenotypes among 13 260 patients from the UK Royal College of General Practitioners Research and Surveillance Centre database. The phenotypes were identified prior to diagnosis (training data set), and their reproducibility was assessed after COPD diagnosis (validation data set).
Results Three COPD phenotypes were identified, the most common of which was the ‘fast decliner’—characterised by patients of younger age with the lowest number of COPD exacerbations and better lung function—yet a fast decline in lung function with increasing number of exacerbations. The other two phenotypes were characterised by (a) patients with the highest prevalence of COPD severity and (b) patients of older age, mostly men and the highest prevalence of diabetes, cardiovascular comorbidities and hypertension. These phenotypes were reproduced in the validation data set with 80% accuracy. Gender, COPD severity and exacerbations were the most important risk factors for lung function decline in the most common phenotype.
Conclusions In this study, three COPD phenotypes were identified prior to patients being diagnosed with COPD. The reproducibility of those phenotypes in a blind data set following COPD diagnosis suggests their generalisability among different populations.
- COPD epidemiology
- COPD exacerbations
Data availability statement
Data may be obtained from a third party and are not publicly available.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
What are the characteristics of patients with chronic obstructive pulmonary disease (COPD) and a fast decline in their lung function, and can they be reproduced in different populations?
In 13 260 patients with COPD, the ‘fast decliner’ was the most common phenotype, characterised by younger patients with lung function loss with an increased number of COPD exacerbations.
The ‘fast decliner’ phenotype was reproduced in an unseen data set after COPD diagnosis. The most important risk factors for lung function decline were gender, COPD severity and exacerbations.
Chronic obstructive pulmonary disease (COPD) is a widespread group of lung diseases such as asthma, emphysema and chronic bronchitis that causes breathing difficulties as a result of fast lung function decline.1 Several studies2–5 have shown that common risk factors associated with lung function decline in patients with COPD are smoking,4 emphysema4 and severity of emphysema,3 as well as COPD exacerbations2 5 along with elevated blood eosinophil counts.2 Kerkhof et al2 showed that patients with mild-to-moderate COPD with a high burden of exacerbations and elevated blood eosinophils have significant mitigation of their lung function decline when treated with inhaled corticosteroids (ICS). Although this finding suggests that early treatment may prevent further lung function loss, the full risk profile of those patients and their projected lung function loss remain largely unknown and underexplored in the present literature.
In this study, we aim to tackle these issues and provide a framework to improve the characterisation of patients with COPD and a fast decline in their lung function before diagnosis. In so doing, we develop several machine learning algorithms able to predict lung function decline after diagnosis. The implementation of this approach promises to provide medical practitioners with opportunities for early intervention and prevention of lung function loss.
This study is a retrospective analysis of an observational cohort spanning a 4-year period (2015–2018) among patients with COPD in the UK. Data were extracted from the Royal College of General Practitioners (RCGP) Research and Surveillance Centre (RSC) database,6 7 which includes more than 5 million patients, and in which over 2 million records and 500 million prescriptions (as of December 2017) are uploaded each week.8
Inclusion and exclusion criteria are shown in figure 1. The study included patients with a Read code9 for COPD diagnosis, older than 35 years, current or former smokers, without active asthma, with a forced expiratory volume in 1 s (FEV1) to forced vital capacity ratio (FEV1/FVC) of ≤0.7 (ie, the threshold for COPD diagnosis1) and who completed FEV1 records for four consecutive years. Specifically, we used FEV1 records in year 1 as a baseline, followed by at least 3 years of FEV1 recordings. We excluded patients younger than 35, non-smokers (as this group may be misdiagnosed with COPD and to align with National Institute for Health and Care Excellence (NICE) guidelines10), those with active asthma and FEV1/FVC ratio of >0.7, as well as patients with fewer than 3 years of lung function (FEV1) values. Our inclusion and exclusion criteria yielded a total of 13 260 patients.
To identify patients with underlying COPD phenotypes we split the cohort into two groups: (a) the training data set, consisting of patients with COPD registered to a general practitioner (GP) practice before the COPD diagnosis; and (b) the validation data set that includes patients with COPD registered after their COPD diagnosis (figure 2). Thus, patients in both data sets share similar COPD-related characteristics. We divided our sample according to the COPD diagnosis date, rather than randomly, to allow our algorithms to learn patterns in the data prior to COPD diagnosis (training data set) and classify patients in an unbiased, data-driven way into clusters (phenotypes). We then used those clusters learnt in the training data set to predict new clusters for patients after COPD diagnosis (validation data set) and assessed their agreement as described below in the ‘Cluster validation after diagnosis’ section. Similarly, we trained three different regression algorithms to predict lung function decline in the training data set. We assessed their performance in the validation data set as described in the ‘Predictive models’ section.
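The split by diagnosis date, rather than at random, can be illustrated with a minimal Python sketch. The authors worked in R and their actual extraction logic is not shown in the text, so the `Patient` record, its field names and the handling of equal dates are all hypothetical assumptions made only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative, not from the study.
    patient_id: int
    gp_registration: date
    copd_diagnosis: date


def split_by_diagnosis(patients):
    """Split on whether GP registration precedes the COPD diagnosis date.

    Assumption: a registration on the diagnosis date itself is treated as
    'after diagnosis'; the source does not say how ties were handled.
    """
    training = [p for p in patients if p.gp_registration < p.copd_diagnosis]
    validation = [p for p in patients if p.gp_registration >= p.copd_diagnosis]
    return training, validation


patients = [
    Patient(1, date(2010, 5, 1), date(2014, 3, 10)),
    Patient(2, date(2016, 1, 15), date(2012, 7, 2)),
    Patient(3, date(2008, 9, 30), date(2015, 11, 20)),
]
training, validation = split_by_diagnosis(patients)
print([p.patient_id for p in training], [p.patient_id for p in validation])
```

Because the split depends only on dates, no patient can appear in both data sets, which is what keeps the validation data set blind to the training step.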
The training data set was used to group patients of similar characteristics into distinct clusters (ie, COPD phenotypes) using k-means cluster analysis (ie, a method that splits the data into mutually exclusive groups). To apply k-means clustering, we standardised 19 clinically relevant variables (sex, body mass index, smoking, COPD severity, COPD exacerbations, emphysema, diabetes, hypertension, coronary artery disease, acute myocardial infarction, congestive cardiac failure, anxiety, depression and six types of treatment) into uncorrelated ones of the same scale. In other words, prior to cluster analysis, we reduced the dimensionality of the data from the 19 selected variables to three uncorrelated components that explained the most variability of the data by using multiple correspondence analysis (MCA)—the equivalent of principal components analysis for categorical data.11 Prior to this, we also imputed the missing values for the categorical variables of body mass index and COPD severity by using multivariate imputation by chained equations.12
Given that the number of clusters for k-means must be specified in advance, we began our clustering procedure by performing a hierarchical cluster analysis,13 which does not require a predetermined number of clusters. We used the derived dendrogram to visually assess the optimal number of clusters (figure 3). This selection process involves following the branch of the tree with the largest height (distance from top to bottom) and drawing a horizontal line (dashed line) across the other branches. The number of times the horizontal line intersects the branches determines the optimal number of clusters.
To confirm the number of clusters determined by the dendrogram’s visual inspection, we applied further statistical methods, namely the elbow14 and silhouette15 methods. The elbow method measures how close subjects are within the same cluster by minimising heterogeneity (or maximising homogeneity): a lower within-cluster variation indicates good compactness. The silhouette method measures how close a subject in one cluster is to subjects in neighbouring clusters by using the average silhouette width to measure the distance between clusters. Here, the bigger the average silhouette width, the larger the distance between the clusters.
Next, we applied the k-means algorithm (figure 4) using different numbers of clusters (eg, from 1 to 10 clusters). The point beyond which a further reduction in the within-cluster sum of squares (or increase in the average silhouette width) does not change the robustness (or separation) of clusters allowed us to determine the optimal number of clusters.
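The elbow procedure can be sketched as follows: run k-means for a range of k and record the within-cluster sum of squares for each. This is a plain Python illustration of Lloyd's algorithm on synthetic three-dimensional points (standing in for three MCA components), not the authors' R code; the toy data, the seeds and the helper names are assumptions.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    # Label each point with the index of its nearest centroid.
    return [min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points]


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Move the centroid to the mean of its members; keep it if the cluster is empty.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centroids[c])
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids, assign(points, centroids)


def within_cluster_ss(points, centroids, labels):
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Synthetic data: three well-separated groups of 20 points each (assumption).
rng = random.Random(42)
centres = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0), (10.0, 0.0, 5.0)]
points = [tuple(c + rng.gauss(0, 0.5) for c in centre)
          for centre in centres for _ in range(20)]

wcss = {}
for k in range(1, 7):
    centroids, labels = kmeans(points, k)
    wcss[k] = within_cluster_ss(points, centroids, labels)
    print(k, round(wcss[k], 2))
```

Plotting `wcss` against k and looking for the point where the curve flattens gives the "elbow". A single random start can settle in a local optimum, so in practice several starts per k are usually compared.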
Intriguingly, the silhouette plots can also be used to determine the robustness of the clusters derived by using either the hierarchical or the k-means clustering method.16 In our sample, these outputs indicate that k-means should be the preferable clustering method relative to hierarchical clustering (figure 5). This is for two main reasons. First, the average silhouette width under the k-means algorithm (figure 5; bottom plot) was bigger than the one under the hierarchical algorithm (figure 5; top plot). Second, there were more subjects with negative silhouette widths under the hierarchical algorithm than the k-means clustering—especially for clusters 1 and 3—suggesting that the latter method offers more stable clusters than the former.
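To make the role of negative silhouette widths concrete, the sketch below computes the silhouette width s(i) = (b - a) / max(a, b) for each point, where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to any other cluster. It is an illustrative Python version using Euclidean distance on made-up points, not the authors' R implementation.

```python
import math


def silhouette_widths(points, labels):
    """Return the silhouette width of every point (needs at least two clusters)."""
    clusters = sorted(set(labels))
    widths = []
    for i, (p, own) in enumerate(zip(points, labels)):
        mean_dist = {}
        for c in clusters:
            dists = [math.dist(p, q)
                     for j, (q, lab) in enumerate(zip(points, labels))
                     if lab == c and j != i]
            mean_dist[c] = sum(dists) / len(dists) if dists else None
        a = mean_dist[own]
        if a is None:
            # Singleton cluster: the width is conventionally set to 0.
            widths.append(0.0)
            continue
        b = min(d for c, d in mean_dist.items() if c != own)
        widths.append((b - a) / max(a, b))
    return widths


# Toy data (assumption): two tight groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = silhouette_widths(points, [0, 0, 0, 1, 1, 1])
bad = silhouette_widths(points, [0, 0, 1, 1, 1, 1])  # third point misassigned

print(round(sum(good) / len(good), 3), round(sum(bad) / len(bad), 3))
```

The misassigned point receives a negative width because it sits closer to the neighbouring cluster than to its own, which is the pattern the text reports more often under hierarchical clustering.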
We trained three regressors (decision tree, gradient boosting machine, linear regression) to predict lung function in the validation data set. We used FEV1 as the dependent variable, and the 19 variables used in the MCA step and age as predictors. Moreover, with the R library ‘caretEnsemble’,17 we constructed two ‘ensemble models’: a linear and a random forest (RF) ensemble of the above-mentioned regressors.
All algorithms were first trained and tested on 70% and 30% of the training data set (ie, RF train and RF test; figure 2), respectively, to fine-tune their parameters using automated tuning with the R library ‘caret’.17 They were then re-trained on the full training data set and tested on the blind validation data set to assess their final performance. This was done by calculating the root mean squared error (RMSE) and mean absolute error (MAE). The former is the square root of the mean squared difference between observed and predicted values (ie, the prediction errors or residuals); it shows how far from the regression line the prediction errors are and is calculated as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where $y_i$ and $\hat{y}_i$ are the observed and predicted values, respectively, and $n$ is the number of observations.
The MAE is instead the mean absolute difference between observed and predicted values; it shows the magnitude of the prediction errors and is calculated as:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$

where $\hat{y}_i$ and $y_i$ are the predicted and observed values, respectively.
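As a worked illustration of the two error measures, the following Python sketch computes RMSE and MAE using their standard definitions. The FEV1 values (in litres) are invented for the example and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(observed)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    n = len(observed)
    return sum(abs(y_hat - y) for y, y_hat in zip(observed, predicted)) / n


# Made-up FEV1 values in litres (assumption, for illustration only).
observed = [2.0, 1.5, 3.0, 2.5]
predicted = [2.5, 1.5, 2.0, 2.5]

rmse_value = rmse(observed, predicted)
mae_value = mae(observed, predicted)
print(rmse_value, mae_value)
```

Because the residuals are squared before averaging, the single 1.0 L error dominates the RMSE more than it does the MAE, which is why the two measures are usually reported together.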
Finally, we calculated the effect of the most important predictors on lung function decline by means of the best performing model in the validation data set (ie, the one with the lowest RMSE and MAE values).
All statistical analyses were implemented with the statistical software R.18
Patient and public involvement
Patients were not involved in the design, conduct, reporting or dissemination plans of this study.
Table 1 summarises the descriptive characteristics of patients registered with a GP before and after their COPD diagnosis at the baseline (first year of the study period).
When looking at the association between FEV1 and the number of COPD exacerbations before and after COPD diagnosis (figure 6), the decline in lung function with an increased number of exacerbations appears to be faster in the period prior to COPD diagnosis than after diagnosis. Thus, we examined whether a similar pattern existed among the phenotypes we derived, as well as the extent of such a decline.
Prior to COPD diagnosis
Table 2 presents the baseline characteristics of the three clusters of patients identified for the pre-COPD diagnosis period.
Phenotype A was characterised by a higher proportion (one-third) of severe/very severe COPD (with severity being defined by the physician) and a higher number of COPD exacerbations; almost half of them had hypertension, and one-third were depressed. Almost all patients with this phenotype were treated with ICS and a combination of ICS and LABA (long-acting beta agonist) treatment, while a considerable proportion was treated with LAMA (long-acting antimuscarinic) and mucolytics. Phenotype B was characterised by patients of an older age, a higher male majority, as well as a higher proportion of overweight patients, a high prevalence of diabetes and cardiovascular comorbidities (hypertension, coronary artery disease, acute myocardial infarction, congestive cardiac failure) and depression, but the majority of them had moderate COPD severity. Almost half of the patients in this phenotype were treated with ICS and LAMA and one-third of them with an ICS and LAMA combination. Phenotype C was characterised by patients of a younger age, more than one-third of whom were overweight, but almost half of them had moderate COPD severity. Patients in this phenotype had the lowest number of COPD exacerbations and better lung function (FEV1), yet almost half of them had hypertension and one-third of them had depression. The most frequent treatment of those patients was LAMA and mucolytics. The most notable patient characteristics for each of the three derived phenotypes are summarised in table 3.
When observing the association between lung function and number of COPD exacerbations (figure 7), the fastest decline in FEV1 was observed in patients of phenotype C. Those patients were also younger, suggesting that phenotype C resembles the clinical features of the fast decliner phenotype.2–4
Cluster validation after diagnosis
To validate the cluster assignments derived prior to COPD diagnosis, we developed an RF model that used the three clusters (derived by k-means clustering) as the dependent variable and the 19 categorical variables (sex, body mass index, smoking, COPD severity, COPD exacerbations, emphysema, diabetes, hypertension, coronary artery disease, acute myocardial infarction, congestive cardiac failure, anxiety, depression and six types of treatment) and age as independent variables.
The RF model was trained on a random sample comprising 70% of the training data set (RF train data set; n=8037; figure 2) and tested on the remaining 30% of the data (RF test data set; n=3445; figure 2) for internal validation.
To improve the RF model’s performance, we used 10-fold cross-validation. This method splits the data into 10 folds (samples): nine of them are used for training and the remaining one for testing. The procedure is then repeated with a different fold held out for testing, and so forth, until each of the 10 folds has been used for testing once. We further optimised the model’s performance by applying parameter tuning.19 This led to an (internal) accuracy of 99% for predicting the same clusters in the RF test data set.
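The fold rotation described above can be sketched in a few lines of Python. This is a generic illustration of k-fold index splitting, not the R ‘caret’ routine the authors used; the function name, the seed and the sample size of 25 are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


splits = list(k_fold_indices(25, k=10))
for train, test in splits:
    print(len(train), len(test))
```

Each sample is held out exactly once, so every observation contributes to the performance estimate without ever being scored by a model that saw it during training.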
The very same model was then trained in the full training data set (both RF train and RF test) and tested to predict cluster assignments in the blind validation data set (figure 2). The predicted clusters were compared with those of the validation data set—derived with the same approach described above for the training data set, that is, data reduction and k-means clustering—using the Adjusted Rand Index20 and Jaccard Index21 for external clustering validation (ie, measuring the extent of agreement between clusters derived by two different methods). Both indices showed an agreement of 80% between the predicted clusters using the RF model and the clusters derived using k-means clustering in the validation data set.
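Both agreement measures can be computed from pair counts: whether two patients who share a cluster under one labelling also share a cluster under the other. The Python sketch below uses the standard formulas for the Adjusted Rand Index and the pair-counting Jaccard Index; the two toy labellings are invented, and the authors' own R functions may differ in detail.

```python
from collections import Counter
from itertools import combinations
from math import comb


def pair_counts(labels_a, labels_b):
    """Count pairs grouped together in both labellings, only in A, or only in B."""
    both = only_a = only_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
    return both, only_a, only_b


def jaccard_index(labels_a, labels_b):
    both, only_a, only_b = pair_counts(labels_a, labels_b)
    return both / (both + only_a + only_b)


def adjusted_rand_index(labels_a, labels_b):
    n = len(labels_a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    maximum = (sum_a + sum_b) / 2
    return (sum_cells - expected) / (maximum - expected)


# Toy labellings (assumption): e.g. RF-predicted vs k-means-derived clusters.
predicted = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"]
derived = ["x", "x", "x", "y", "y", "z", "z", "z", "z", "z"]

ari = adjusted_rand_index(predicted, derived)
jaccard = jaccard_index(predicted, derived)
print(round(ari, 3), round(jaccard, 3))
```

Neither index depends on the names given to the clusters, which matters here because k-means run on the validation data set has no reason to number its clusters in the same order as the RF predictions.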
Predicted lung function loss after diagnosis
Given the prevalence of phenotype C in the sample and the limited literature on the fast decliner phenotype at present, we predicted the lung function of patients with this phenotype by training three regressors (decision tree, gradient boosting machine and linear regression) and a linear ensemble of those regressors on the data set prior to COPD diagnosis (figure 8).
As shown in figure 8, the gradient boosting machine (gbm) has the lowest RMSE value and therefore performs better than the linear regression (generalised linear model (glm)) and the decision tree (rpart). Moreover, the performance of those models combined through a linear model (linear ensemble; red dashed line) is as good as that of the gbm model.
We then combined those three models (gbm, glm, rpart) under an RF ensemble and assessed the performance of all models after COPD diagnosis (table 4). We observed that the linear regression model performed as well as the gbm and the linear ensemble. In contrast, the rpart and the RF ensemble performed worst in the validation data set.
Additionally, we used a more conventional linear regression to calculate the effect of the most important predictors for lung function decline (table 5) in the training data set.
As shown in table 5, all of the above predictors explain 95% of the model’s variance. The most important predictor was sex, which explained 36% of the variance and was associated with a decline in lung function of 0.066 L (or 66 mL) for male compared with female patients. The second most important predictor was COPD severity, which explained 18% of the variance; patients with moderate and severe COPD had a statistically significant lung function decline of 35 mL and 64 mL, respectively, compared with those with mild COPD. LAMA treatment was also associated with a 37 mL decline in lung function, as was an increased number of COPD exacerbations, with declines of 38 mL, 51 mL and 78 mL for one, two and more than two exacerbations, respectively.
The least important predictors were smoking, diabetes and LABA treatment, which explained from 2% to 4% of the variance and predicted a lung function decline of 16 mL, 24 mL and 28 mL, respectively. Age, however, was associated with a statistically significant increase of 1.7 mL per year, which is not a surprising finding per se given that the fast decliner phenotype was characterised by better lung function in patients.
This study aimed to better characterise patients with COPD—in particular patients with the ‘fast decliner’ phenotype—by means of statistical and machine learning tools. Statistical methods, such as MCA11 and cluster analysis,13–15 are traditionally used in COPD research and beyond to reduce the dimensionality of the data into few uncorrelated variables that explain most of the variability and group subjects of similar characteristics into homogeneous and distinct clusters. These methods use all patients’ information by integrating demographics along with clinical and treatment characteristics. Due to COPD heterogeneity, this integration allows for better identification and characterisation of COPD phenotypes that extends beyond the typical clinical approach (ie, following the Global Obstructive Lung Disease Initiative recommendations).22 Moreover, machine learning provides researchers and practitioners with rpart, RF and gbm models23 24 that can accommodate non-linear relationships.
Here, we applied these tools to go beyond the traditional analysis of demographic and clinical characteristics of patients with COPD to predict lung function decline after their COPD diagnosis. The strengths of our approach include (a) the use of a prospective longitudinal public data set, (b) a large sample size of 13 260 patients, (c) multiple imputation for handling missing values and (d) a choice of the variables to be included in cluster analysis, as well as of the number of clusters, made by combining data-driven methods with knowledge from the existing literature and clinical expertise.
The use of a large sample size allowed us to identify three distinct clusters (ie, phenotypes) of patients with different demographic, clinical and treatment characteristics prior to COPD diagnosis that were able to predict similar clusters of patients’ profiles post-diagnosis with an 80% agreement. This encouraging finding suggests that such phenotypes can be reproduced across different data sets and populations. Another advantage of using a large sample is the ability to split the training data set randomly (ie, prior to COPD diagnosis) into RF train and RF test subsets, train the RF model on the RF train data set and validate its predictions on the RF test data set (a process called internal validation). We further validated the phenotypes on the post-COPD diagnosis data set to predict cluster assignment and compare these with those derived by the k-means method.
Moreover, the 10-fold cross-validation used when training our models (ie, rpart, RF, gbm, linear regression and ensembles), along with the tuning of the models’ parameters, improves performance and helps avoid overfitting, a phenomenon observed when the same model is used for both training and prediction without being tested (prior to prediction) on an unseen data set (ie, one whose observations did not contribute to its training).
Finally, we ensembled the individual models (ie, rpart, RF, gbm, linear regression) by using either a linear or an RF regressor to boost their performance. We then used the model with the best performance (ie, the linear regression) to identify the most important risk factors for lung function decline in patients with the fast decliner phenotype. Two of those predictors (COPD severity and COPD exacerbations) projected a decline in lung function of more than 30 mL, which is consistent with findings of similar studies.2 25 Specifically, Kerkhof et al2 used multilevel mixed-effects linear regression models to determine the association between annual exacerbation rate following initiation of ICS therapy and FEV1 decline. The authors also carried out a longitudinal study of a similar sample size to ours (n=12 178 patients with mild-to-moderate COPD) and found a decline in lung function of 19 mL/year for each exacerbation for patients with blood eosinophil counts equal to or greater than 350 cells/µL not on ICS and a reduced lung function loss that ranged from 4 mL/year to 15 mL/year for those treated with ICS. In their effort to explore the heterogeneity of COPD progression, Papi et al25 reported the variability in lung function decline from the Evaluation of COPD Longitudinally to Identify Predictive Surrogate End-point (ECLIPSE) cohort.4 In this 3-year prospective study, 38% and 31% of patients had a lung function decline of more than 40 mL/year and from 21 mL/year to 40 mL/year, respectively; 23% had a 20 mL/year decrease to 20 mL/year increase in their lung function, while just 8% had more than 20 mL/year lung function increase.
In our sample (patients with the fast decliner phenotype), we observed a decrease of more than 40 mL in lung function in men (54%), those with severe COPD (9%) and those with two or more COPD exacerbations (3%); a decrease between 21 mL and 40 mL was also observed in patients with moderate or very severe COPD (49%) as well as in those with one COPD exacerbation (7%), diabetes (13%) and those on LAMA (31%) and LABA (13%) treatment. We also observed a decrease in lung function of 16 mL in active smokers (43%). Furthermore, in their 5-year prospective study, Nishimura et al3 classified patients with COPD into three phenotypes based on lung function loss: the fast decliners with a decline in lung function of 63±2 mL/year, the slow decliners with a decline of 31±1 mL/year and the sustainers with a decline of 2±1 mL/year. The severity of emphysema was found to be independently associated with a rapid decline in lung function.
There are several limitations in our study, which also represent important calls for future research. One limitation relates to the quality of the available data, given that the data were collected from different GP practices without standardised measurement processes. As such, the accuracy of the respiratory values reported (eg, FEV1) may vary across practices. Moreover, including only patients with at least 3 years of spirometry follow-up may improve the reliability of their lung function values but could bias the results, as patients with different follow-up times could differ. Another limitation is the lack of information on how the presence and/or the severity of emphysema was captured in our database. While the presence of emphysema in the RCGP RSC database is recorded based on the clinician’s assessment,9 this is not sufficient to capture its severity. Had a severity score of emphysema been available, similar to the one calculated by Nishimura et al3 using a visual and computerised emphysema severity assessment, our algorithm would have been more accurate in predicting the change in lung function attributed to this risk factor. A third limitation is the lack of biomarkers in the RCGP RSC database, such as the eosinophil count, which is a significant predictor of lung function decline.2 Had biomarkers been available as predictors, our regressors would have been more accurate in predicting lung function in a blind data set. Our sample also lacks detailed treatment information such as dosage and frequency of treatment intake. Had such information been added to our model, a GP could infer by what amount a treatment could be adjusted or how frequently it should be taken to mitigate lung function loss.
We believe, however, that these are all important calls for future research that could potentially be tackled by applying our models for prediction on other available COPD data sets, such as the Optimum Patient Care Research Database (OPCRD),26 which also contains a proper assessment of emphysema severity and biomarker information.
Despite the above limitations, this work represents, to the best of our knowledge, the first study—among those studies that have implemented machine learning to identify clinically meaningful COPD phenotypes27—that fully characterises patients with COPD with a fast decline in their lung function as well as predicts lung function loss. This was achieved using regressors ranging from the conventional linear regression to the most advanced rpart, RF and gbm.
First, we used k-means clustering to identify three COPD phenotypes prior to diagnosis. Next, using an RF model, we showed that these phenotypes can be reproduced in a different blind data set (after COPD diagnosis) by achieving a high level of agreement (80%) between the predicted cluster assignments and those derived by k-means clustering.
Additionally, we trained three models (rpart, gbm and glm) on the data set prior to COPD diagnosis and validated them after diagnosis to predict lung function loss after diagnosis. We further developed two ensemble models using either a linear or an RF model to improve the performance in the blind validation data set. We found that the most advanced machine learning models were as good as the linear regression model. This led us to identify several risk factors to predict lung function loss in patients with the fast decliner phenotype. Similar models can be developed for the other two phenotypes, which are included in our future research agenda.
Moving forward, we anticipate that validations of our framework in non-UK populations may help further understand individual patient lung function profiles, improve treatment decision-making in patients with COPD with major lung function decline and prevent lung function loss at an early stage.
Patient consent for publication
University of Surrey’s Institutional Review Board (353003-352994-40371074).
We acknowledge patients for allowing their data to be used for surveillance and research. Practices who have agreed to be part of the RCGP RSC and allow us to extract and use health data for surveillance and research. Ms Filipa Ferreira from RCGP and Mr Julian Sherlock from the University of Surrey. Apollo Medical Systems for data extraction. Collaboration with EMIS, TPP, In-Practice and Micro-test CMR supplier for facilitating data extraction. Colleagues at Public Health England.
Contributors VN is responsible for conceptualisation, data curation, formal analysis, investigation, methodology, validation, visualisation, writing of the original draft, reviewing and editing the final manuscript. SM is responsible for reviewing, writing and editing the final manuscript. WG, MF and LS are responsible for providing resources, software, supervision, validation and reviewing the manuscript. DBP is responsible for conceptualisation and reviewing the manuscript. All authors approved the final version of this manuscript and agree to be accountable for all aspects of the work. VN acts as a guarantor of the overall content of the study.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests VN is an employee of Parexel. SM is the director of the Organisational Neuroscience Laboratory. DBP declares advisory board membership with Aerocrine, Amgen, AstraZeneca, Boehringer Ingelheim, Chiesi, Mylan, Mundipharma, Napp Pharmaceuticals, Novartis and Teva; consultancy agreements with Almirall, Amgen, AstraZeneca, Boehringer Ingelheim, Chiesi, GlaxoSmithKline, Mylan, Mundipharma, Napp Pharmaceuticals, Novartis, Pfizer, Teva and Theravance; grants and unrestricted funding for investigator-initiated studies (conducted through Observational and Pragmatic Research Institute) from Aerocrine, AKL Research and Development, AstraZeneca, Boehringer Ingelheim, British Lung Foundation, Chiesi, Mylan, Mundipharma, Napp Pharmaceuticals, Novartis, Pfizer, Respiratory Effectiveness Group, Teva, Theravance, UK National Health Service and Zentiva; payment for lectures/speaking engagements from Almirall, AstraZeneca, Boehringer Ingelheim, Chiesi, Cipla, GlaxoSmithKline, Kyorin, Mylan, Merck, Mundipharma, Novartis, Pfizer, Skyepharma and Teva; payment for manuscript preparation from Mundipharma and Teva; payment for the development of educational materials from Mundipharma and Novartis; payment for travel/accommodation/meeting expenses from Aerocrine, AstraZeneca, Boehringer Ingelheim, Mundipharma, Napp Pharmaceuticals, Novartis and Teva; funding for patient enrolment or completion of research from Chiesi, Novartis, Teva and Zentiva; stock/stock options from AKL Research and Development, which produces phytopharmaceuticals; owns 74% of the social enterprise Optimum Patient Care (Australia and UK) and 74% of Observational and Pragmatic Research Institute (Singapore); 5% shareholding in Timestamp, which develops adherence monitoring technology; is peer reviewer for grant committees of the Efficacy and Mechanism Evaluation programme and Health Technology Assessment; and was an expert witness for GlaxoSmithKline.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Provenance and peer review Not commissioned; externally peer reviewed.