Introduction The aim of this study was to investigate which patient-reported outcome measure was the best during the recovery phase from severe exacerbation of chronic obstructive pulmonary disease (COPD).
Methods The Exacerbations of Chronic Pulmonary Disease Tool (EXACT), the COPD Assessment Test (CAT), the St George’s Respiratory Questionnaire (SGRQ), the Dyspnoea-12 (D-12) and the Hyland Scale (global scale) were recorded every week for the first month and at 2 and 3 months in 33 hospitalised subjects with acute exacerbation of COPD (AECOPD).
Results On the day of admission (day 1), the internal consistency of the EXACT total score was high (Cronbach’s alpha coefficient=0.89). The EXACT total, CAT, SGRQ total and Hyland Scale scores obtained on day 1 appeared to be normally distributed. Neither floor nor ceiling effects were observed for the EXACT total and SGRQ total scores. The EXACT total score improved from 50.5±12.4 to 32.5±14.3, and the CAT score also improved from 24.4±8.5 to 13.5±8.4 during the first 2 weeks, and the effect sizes (ES) of the EXACT total and CAT score were −1.40 and −1.36, respectively. The SGRQ, Hyland Scale and D-12 were less responsive, with ES of −0.59, 0.96 and −0.90, respectively.
Discussion The EXACT total and CAT scores are shown to be more responsive measures during the recovery phase from severe exacerbation. Considering the conceptual framework, it is recommended that the EXACT total score may be the best measure during the recovery phase from AECOPD. The reasons for the outstanding responsiveness of the CAT are still unknown.
- COPD exacerbations
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Which patient-reported outcome measure is the best during acute exacerbation of chronic obstructive pulmonary disease (COPD)?
Considering the conceptual framework, it is recommended that the Exacerbations of Chronic Pulmonary Disease Tool (EXACT) total score may be the best measure during the recovery phase from exacerbation.
This is the first study to draw a direct comparison among five tools, namely the EXACT, the COPD Assessment Test, the St George’s Respiratory Questionnaire, the Dyspnoea-12 and the Hyland Scale (global scale), during the recovery phase from severe exacerbation of COPD.
Acute exacerbation of chronic obstructive pulmonary disease (AECOPD) is a common cause of emergency hospitalisation and a major risk for morbidity and mortality. Although the definition has been extensively discussed in the literature around the world,1–4 AECOPD has been clearly defined by a change of the respiratory symptoms in subjects with COPD. The latest Global Initiative for Chronic Obstructive Lung Disease (GOLD) document also describes an exacerbation of COPD as an acute worsening of respiratory symptoms that results in additional therapy.5 Although an event-based definition of AECOPD has often been used in large-scale clinical trials,6 7 our concern is that this alternative method may underestimate AECOPD compared with a symptom-based definition.
How should we make a decision on symptom changes beyond the normal day-to-day variation? The criteria for AECOPD developed by Anthonisen et al 8 have historically been the most frequently used method, requiring the occurrence of at least one of the following symptoms: increased dyspnoea, increased sputum volume and increased sputum purulence. Wedzicha and her colleagues have preferred to use the London COPD cohort diary cards, which were developed by their own clinics, in many pioneering research activities.9–11 Furthermore, studies that have focused on evaluating patient-reported outcomes (PROs) during AECOPD have been reported.4 12–18 Respiratory-specific questionnaires such as the St George’s Respiratory Questionnaire (SGRQ) may be useful to assess the recovery of patients during AECOPD.19 20 Thus, a standardised method to quantify and evaluate the symptoms needs to be established.
In light of the growing importance of measuring PROs, some short and simple instruments have become available especially for clinical use since the technological advances in recent years such as the Rasch Analysis have made it possible to reduce the number of items in a questionnaire. Although the COPD Assessment Test (CAT) is a simple instrument originally published by Jones et al 21–23 to measure health status in stable COPD, Mackay et al 24 evaluated the usefulness of the CAT during AECOPD to assess exacerbation severity, and they concluded that the CAT provides a reliable score of exacerbation severity.
On the other hand, a tool developed by Leidy et al that was specifically designed to quantify AECOPD, the Exacerbations of Chronic Pulmonary Disease Tool (EXACT) Patient-Reported Outcome (known as EXACT-PRO), has been reported to be reliable, valid and sensitive to change during exacerbation recovery.25–27 The EXACT is a diary of recorded symptoms during the AECOPD and conceptually different from health status measurement tools such as the CAT or SGRQ in subjects with stable COPD. Thus, both the EXACT and CAT have been validated to quantify AECOPD.24 27–29 Therefore, in the present study, we hypothesised that the EXACT would have higher responsiveness than other patient-reported measures in evaluating symptomatic changes during AECOPD.
The aim of this study was to provide an answer to the simple question of which tool is the best for assessing AECOPD and the recovery period in clinical practice. To define the most sensitive measure in detecting the changes during the recovery phase from AECOPD, we investigated and compared the responsiveness of the following outcome measures: the EXACT25–27 and CAT,21 24 as well as other patient-reported measures, namely the SGRQ,19 20 the Dyspnoea-12 (D-12)30 31 and the Hyland Scale (global scale).32 The authors investigated and compared the responsiveness of scores obtained from five different PRO measures during the recovery phase from severe exacerbation.
A total of 33 hospitalised subjects with AECOPD were recruited and followed from the Department of Pulmonary Medicine of the National Center for Geriatrics and Gerontology between September 2013 and March 2016. In the present study, an AECOPD was defined as a worsening of respiratory symptoms that required treatment with oral corticosteroids or antibiotics, or both.6 7 The inclusion criteria were (1) a clinical diagnosis of COPD, (2) age over 40 years, (3) a history of smoking (10 pack-years or greater), (4) forced expiratory volume in 1 s (FEV1)/forced vital capacity (FVC) <0.7 on or before the first day of admission, (5) absence of previous inflammatory changes on chest radiographs that influenced pulmonary function (eg, a previous thoracoplasty or tubercular sequelae), and (6) need to be hospitalised because of the aggravated symptoms of COPD compatible with exacerbation or severe exacerbation. The exclusion criteria were (1) intubation on the day of admission, (2) comorbidities such as tuberculosis, lung cancer, bronchiectasis and non-tuberculous mycobacteria, (3) non-infectious exacerbations, including exacerbations due to pneumothorax or cardiac failure alone, (4) uncontrolled comorbidity, and (5) disturbed consciousness. Since it is considered that the therapeutic management of AECOPD with no clinical signs of pneumonia should be the same as for an exacerbation of COPD as a result of pneumonia,33 we included patients with COPD complicated by pneumonia. Although all subjects were evaluated and treated as a general rule according to the international or Japanese guidelines for AECOPD, the type of treatment given was recorded throughout the present study. All eligible subjects with AECOPD underwent pulmonary function tests and arterial blood gas analysis at the baseline and on days 14, 28, 56 and 84 wherever possible. 
According to the method described by the American Thoracic Society and the European Respiratory Society Task Force in 2005,34 three acceptable spirometric flow–volume curves were recorded with the patient sitting (Chestac-65V; Chest, Tokyo, Japan). The highest FEV1 and the highest FVC values among the three manoeuvres were then analysed. The predicted values for the FEV1 and FVC were calculated according to the proposal from the Japanese Respiratory Society.35
Participants were requested to complete a self-administered booklet including the Japanese versions of the following five types of the PRO measures at baseline and on days 7, 14, 21, 28, 56 and 84 in the following order: EXACT, CAT, SGRQ, D-12 and Hyland Scale.
The EXACT is a 14-item diary in which each attribute or item is assessed on a 5-point or 6-point ordinal scale and summed to yield a total score that is converted to a 0–100 scale, with higher scores indicating a more severe exacerbation.25–27 Three respiratory symptom domains are embedded in the instrument (breathlessness, cough and sputum, and chest symptoms), and the EXACT total scores are also calculated. The recall period was ‘today’ and patients selected the answer that best described their experience for that day. Although the EXACT should be set to use the last available baseline value to identify a new exacerbation after every event, this system was not applied in the present study since no other PRO measures have similar rules.
Disease-specific health status was assessed with previously validated Japanese versions of the CAT and SGRQ (Japanese version 2).36 37 The CAT is a questionnaire consisting of eight items scored from 0 to 5 in relation to cough, phlegm, chest tightness, breathlessness going up hills/stairs, activity limitations at home, confidence leaving home, sleep and energy. The CAT scores range from 0 to 40, with a score of 0 indicating no impairment.21 24 28 37 The SGRQ is a disease-specific instrument designed to measure impact on overall health, daily life and perceived well-being in patients with obstructive airways disease.20 36 It consists of 50 items and 76 weighted responses, with scores ranging from 0 to 100. Scores are calculated for three components: symptoms, activity and impact, as well as a total score.
To assess the severity of dyspnoea, we used the Japanese version of the D-12, which consists of 12 items (7 physical items and 5 affective items), each with a 4-point grading scale (0–3).30 31 The D-12 produces a total score (range 0–36, with higher scores representing more severe breathlessness) and two component scores: physical (items 1–7 with scores ranging from 0 to 21) and emotional (items 8–12 with scores ranging from 0 to 15). Global health was also assessed by the Japanese version of the Hyland Scale, with scores ranging from 0 to 100, where 0=‘might as well be dead’ and 100=‘perfect quality of life’.32 38
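The D-12 scoring described above can be sketched in a few lines. This is only an illustration of the item ranges and component sums stated in the text; the function name `score_d12` and the strict rejection of incomplete responses are assumptions made for the example, not rules taken from the instrument's manual.

```python
from typing import Dict, Sequence


def score_d12(items: Sequence[int]) -> Dict[str, int]:
    """Score one D-12 response (12 items, each graded 0-3).

    Hypothetical helper: missing-item handling is not described in the
    text, so incomplete or out-of-range responses are simply rejected.
    """
    if len(items) != 12:
        raise ValueError("D-12 requires exactly 12 item responses")
    if any(not 0 <= x <= 3 for x in items):
        raise ValueError("each D-12 item must be graded 0-3")
    physical = sum(items[:7])   # items 1-7, possible range 0-21
    emotional = sum(items[7:])  # items 8-12, possible range 0-15
    return {
        "physical": physical,
        "emotional": emotional,
        "total": physical + emotional,  # possible range 0-36
    }
```

Higher totals represent more severe breathlessness, so a subject who selects the highest grade on every item would reach the maximum of 36.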
The EXACT, CAT, SGRQ, D-12 and Hyland Scale were self-administered under supervision in a booklet form. One of the authors (KoN) reviewed the surveys to ensure that subjects did not unintentionally omit the answers or provide multiple responses to any questions. Although the original developers of the EXACT recommend using an electronic version such as that provided on an electronic personal digital assistant (PDA), devices with the Japanese version were not available. Therefore, all the surveys were conducted using a paper-based method without the patients knowing anything about their own previous responses, that is, without informed administration. The completed questionnaires were collected every night during the hospitalisation, and the participants were asked to bring them on their subsequent clinic visits after discharge from a hospital.
Furthermore, since the authors had speculated that both the EXACT and the CAT were worthy of our full attention with respect to the responsiveness in consideration of the literature,24 28 29 these two questionnaires were given every night during the first 4 weeks. We also intended to make a direct comparison between the two tools based on diary performance.
All results are expressed as mean±SD. Internal consistency was assessed with Cronbach’s alpha coefficient. The score distribution of the PRO measures was evaluated by histograms and the Shapiro-Wilk test. The effect size (ES) and the standardised response mean (SRM) were used as responsiveness indexes.39–43 The former represents the mean change in the score divided by the SD of the baseline scores. The latter represents the mean change in the score divided by the SD of the change in the score. Cohen39 suggested that an ES of 0.2–0.5 be regarded as small, 0.5–0.8 as moderate and ≥0.8 as large; the SRM is perhaps the index closest to the ES. A p value of less than 0.05 was considered to be statistically significant.
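The two responsiveness indexes defined above can be illustrated with a minimal Python sketch. It assumes paired scores (only subjects with both a baseline and a follow-up value) and the sample SD with an n−1 denominator; neither detail is stated in the text. The function names and the handling of values that fall exactly on a threshold are likewise choices made for this example.

```python
from statistics import mean, stdev
from typing import Sequence


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """ES: mean change in score divided by the SD of the baseline scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardised_response_mean(
    baseline: Sequence[float], follow_up: Sequence[float]
) -> float:
    """SRM: mean change in score divided by the SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def cohen_magnitude(value: float) -> str:
    """Label the absolute size using the thresholds attributed to Cohen.

    Boundary values (exactly 0.5 or 0.8) are assigned to the larger
    category here; that tie-breaking rule is an assumption.
    """
    size = abs(value)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"


# Made-up scores for five hypothetical subjects, not study data.
baseline_scores = [50, 60, 40, 55, 45]
day14_scores = [30, 45, 30, 35, 25]
es = effect_size(baseline_scores, day14_scores)
srm = standardised_response_mean(baseline_scores, day14_scores)
```

For instruments where a lower score means improvement, such as the EXACT and CAT, both indexes are negative when subjects recover; for the Hyland Scale, where higher is better, they are positive. Comparing absolute values therefore puts all measures on the same footing.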
A total of 33 subjects were enrolled in the present study. They were predominantly male (87.9%), and the average age at initial hospitalisation was 75.3 years (SD=8.8). The mean FEV1 was 1.02 L at baseline in 31 subjects for whom spirometric measures were available and 1.22 L 3 months later in 30 subjects. Using the classification of severity of airflow limitation of the GOLD criteria,5 out of 30 patients studied on day 84, 5 subjects (16.7%) were in GOLD 1, 11 (36.7%) in GOLD 2, 8 (26.7%) in GOLD 3 and 6 (20.0%) in GOLD 4. As nine patients (27.3%) were diagnosed with COPD for the first time, they had not received treatment prior to admission. Twenty-four patients (72.7%) were treated with single or dual inhaled bronchodilators. Nineteen subjects (57.6%) were also given inhaled corticosteroids. Five patients (15.2%) were treated with long-term oxygen therapy before admission. Two were receiving continuous positive airway pressure treatment for the comorbidity of obstructive sleep apnoea syndrome.
The average length of stay (LOS) in hospital was 16.5 days, and the median value was 15 days. The LOS was longer than 30 days in two cases (6.1%), both due to complications. All the participants except for one were given varying doses of both corticosteroids (oral or intravenous) and antibiotics. A relapse of acute exacerbation was detected during the 84-day study period in 7 out of 33 subjects (21.2%), and 2 were readmitted to our hospital due to relapsing exacerbation. Three patients could not be followed to the end of the study period since they had moved to other clinics for treatment of unrelated conditions.
Internal consistency and the distribution of test scores on the day of admission
The internal consistency assessed with Cronbach’s alpha coefficient was high (α=0.89) for the EXACT total score and that of the three domain scores ranged from α=0.64 to α=0.93 (table 1). The internal consistency of the D-12 total score was higher (α=0.97) than that of the CAT score and SGRQ total and three components (ranging from α=0.78 to α=0.93).
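For readers unfamiliar with the coefficient reported above, the following is a minimal sketch of the standard Cronbach's alpha formula, k/(k−1) × (1 − sum of item variances / variance of the total score). It is not the authors' analysis code; the function name, the use of sample variances and the requirement of complete responses are assumptions made for the example.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of scores.

    Each element of `rows` is one respondent's item scores. Sample
    variances (n-1 denominator) are used throughout.
    """
    if len(rows) < 2:
        raise ValueError("at least two respondents are required")
    k = len(rows[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent must answer every item")
    # Variance of each item across respondents (columns of the table).
    item_variances = [variance(col) for col in zip(*rows)]
    # Variance of each respondent's summed score.
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

When every item ranks respondents identically the coefficient reaches 1; values fall as items become less correlated with one another.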
Frequency distribution histograms of the scores obtained are shown in figure 1. Although the EXACT total, CAT, SGRQ total and Hyland Scale scores obtained on day 1 appear to be normally distributed (Shapiro-Wilk tests, p=0.235, p=0.149, p=0.541 and p=0.052, respectively), the normality of the score distribution of the D-12 score was rejected using the Shapiro-Wilk tests (p=0.001). Neither floor nor ceiling effects were observed for the EXACT total and SGRQ total scores. While a floor effect appeared for the Hyland Scale and D-12, a slight ceiling effect was found for the CAT score (table 1).
The EXACT total score (possible range: 0–100) improved from 50.5±12.4 to 32.5±14.3 and the CAT score (possible range: 0–40) also improved from 24.4±8.5 to 13.5±8.4 during the first 2 weeks (table 2). Recovery was fastest during the initial period (figure 2). By day 28, the EXACT total score was 33.4±14.9 and the CAT score was 13.3±8.4; over the same 28 days, the SGRQ total score changed from 55.1±19.8 to 42.1±23.3, the D-12 score from 10.3±9.7 to 4.3±5.7 and the Hyland Scale score from 42.6±20.7 to 61.0±21.6 (table 2 and figure 3).
The responsiveness of the PRO measures from baseline to 7, 14, 21, 28, 56 and 84 days later was evaluated by ES and SRM (tables 3 and 4). During the first 2 weeks, the ES of the EXACT total and CAT scores were −1.40 and −1.36, respectively. The SGRQ, Hyland Scale and D-12 were less responsive, with respective ES of −0.59, 0.96 and −0.90 (table 3). For the same period, the SRMs of the EXACT total and CAT scores were −1.13 and −1.32, respectively. The SGRQ, Hyland Scale and D-12 were less responsive, with respective SRMs of −0.96, 0.98 and −0.85 (table 4). Therefore, the EXACT total and CAT scores showed the best responsiveness.
This is the first study to compare the responsiveness among several PRO measures during the recovery phase from severe exacerbation of COPD, and to draw a direct comparison between the details of the EXACT and CAT. First, the present study has achieved outstanding results for the responsiveness of both the EXACT and CAT compared with other PRO measures, including the SGRQ, Hyland Scale as well as the D-12. Second, we failed to identify any significant differences between the EXACT and CAT, although we examined internal consistency and the distribution of the scores on the day exacerbation occurred, magnitude of score change, and ES as well as SRM. Surprisingly, the CAT had a better response than expected despite the core concept that the CAT was developed to measure health status in subjects with stable COPD as opposed to those with AECOPD. Our hypothesis that the EXACT is more responsive than the CAT could not be supported in the present study.
The magnitude of the score change of the PRO measure during AECOPD may depend on many factors, including the severity of the AECOPD, the severity of the baseline COPD, the exacerbation patterns and speeds, and the kind of treatment and management. Andersson et al 44 first reported in 2002 that the mean SGRQ total score was 54 units at a severe exacerbation of COPD, and 49 units after 3 and 6 months, a difference of 5 units. Differences in the scores between an exacerbation and the stable period were very large in the Gemifloxacin Long-term Outcomes in Bronchitis Exacerbations (GLOBE) study.19 In exacerbators who did not relapse, the improvement of the SGRQ total score during the first 4 weeks and the subsequent 22 weeks was 11.8 and 5.2 units, respectively. The SGRQ total scores and the magnitude of the score improvement after AECOPD in the present study appeared similar to those recorded in previous studies.
On the other hand, the EXACT was developed as a symptom diary designed specifically to quantify exacerbations in COPD. The work conducted by Leidy and a collaborative group including the Food and Drug Administration demonstrated that its 14 items met the criteria for a unidimensional measure of exacerbation severity, providing a daily diary for detecting and quantifying exacerbation severity. Therefore, the EXACT has been used as one of the outcome measures in some large-scale clinical trials.45 46 Consequently, the EXACT is currently believed to provide valuable information on the course of AECOPD, although some may still hold strong views against it.29 Although the EXACT was developed to be administered by an electronic device such as a PDA, paper-based questionnaires were used in the present study. This might have led to underestimation of the responsiveness of the EXACT compared with the CAT.
The speed of recovery after the onset of exacerbation was fastest between day 1 and day 14, as ES and SRM were the largest for this period in the present study. Leidy et al 27 46 presented figures showing the mean EXACT total scores on days 1–27 of an exacerbation in their validation study, as well as pooled data from two 12-week phase II international, randomised controlled trials, and the pattern of the EXACT total scores appears similar to that in the present study. The EXACT scores appeared to remain roughly constant after day 14 in both studies.
The CAT is a validated, simple and short questionnaire designed to assess and quantify health status for subjects with stable COPD. Health status measurements have not been developed to detect changes during short periods, as recall periods are usually over 2 weeks or a month. For example, the most adequate recall period is thought to be 3 months for the SGRQ. Therefore, we were surprised that Mackay et al were successful in evaluating the usefulness of the CAT to assess exacerbation severity.24 In that study, the CAT scores rose from an average baseline value of 19.4 to 24.1 during AECOPD, and the median recovery time to baseline was 12 days. The magnitude and the speed of recovery as reported by the CAT were also similar to the present study but much faster than that reported with the SGRQ in the GLOBE study.19 The reasons for the outstanding responsiveness of the CAT are still unknown, but one possible explanation is the mode of administration as a daily diary. In the present study, however, since paper-based questionnaires were returned after completion every evening during the hospitalisation, participants completed them without referring to their previous answers, that is, without informed administration. The administration mode may therefore not be related to the high responsiveness of the CAT.
In the present study, dyspnoea, one of the COPD-specific symptoms, was measured by the D-12 and global perception of health-related quality of life assessed by the Hyland Scale. Although the main reason behind their inclusion was to compare between tools with different concepts, the responsiveness of both instruments was also shown to be high. This may be suggestive of the multicomponent condition of COPD and AECOPD.
We should mention that one of the main limitations of the current evaluation is that only a small proportion of hospitalised patients with AECOPD were included in the present study since there were a number of patients who were unable to provide answers due to their severe physical incapacity. Another problem was that the present study was limited by the small number of participants included. However, this represents all of the patients with AECOPD who were able to respond adequately to questionnaires in this hospital during the study period. Although it has been reported that the prevalence of COPD in Japan is similar to that in Western countries by a general population sample study, Japanese healthcare providers still feel that COPD is less frequent.47 48 The LOS in the hospital, which is less than 10 days in most Western countries, is clearly longer in Japan, although this value is about the same as the average LOS for general acute hospitalisations in Japan.48 49
Three main conclusions may be drawn from our findings. First, the EXACT total and CAT scores are shown to be the most responsive measures during the recovery phase from AECOPD among various PRO measures. Second, our hypothesis that the EXACT is more responsive than the CAT could not be supported in the present study since we failed to identify any significant differences between the EXACT and CAT, although we examined the internal consistency and the distribution of the scores on the day exacerbation occurred, magnitude of score change, and ES as well as SRM. Third, surprisingly the CAT had a better response than expected despite the core concept that the CAT was developed to measure health status in subjects with stable COPD as opposed to those with AECOPD. The reasons for the outstanding responsiveness of the CAT are still unknown.
The authors would like to thank Nancy Kline Leidy for permission to use the Japanese version of the EXACT.
Contributors KN contributed, as the principal investigator, to the study concept and design, analysis of the results, and writing of the manuscript. SN, MK and RS contributed to performance of the study and acquisition of data. KN contributed to statistical analysis. YH contributed to the interpretation and editing of the manuscript. TO contributed to statistical analysis, and the interpretation and editing of the manuscript. All authors read and approved the final manuscript.
Funding This study was partly supported by the Research Funding for Longevity Sciences (27-10) from the National Center for Geriatrics and Gerontology (NCGG), Japan.
Competing interests None declared.
Patient consent Obtained.
Ethics approval Approved by the Institutional Ethics Committee of the National Center for Geriatrics and Gerontology (no 638-4).
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.