Patient-reported outcomes
The D-12 has 12 items, each with a four-point scale of none, mild, moderate and severe, relating to breathlessness ‘these days’.11 The first seven items sum to the physical domain, with scores from 0 to 21. The remaining five items sum to the affective domain, with scores from 0 to 15. All 12 items sum to the D-12 total, with scores from 0 to 36. Higher scores represent greater severity.11 Translation of the original English version of the D-12 into Norwegian followed international recommendations for forward-backwards translation of patient-reported outcome measures (PROMs),17 18 including two independent forward and backwards translations. The original time frame of ‘these days’ was retained following translation. Minor discrepancies between translations were resolved by discussion within the research group, which included the translators. The Norwegian translation (online supplemental figure S1), including instructions, questions and scaling, was found to be acceptable and easy to understand in interviews with 20 patients with COPD who did not participate in the main study.
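The scoring rules described above can be illustrated with a short sketch. It assumes that each item is coded 0 (none) to 3 (severe), which is inferred from the stated score ranges rather than given explicitly, and that the responses are supplied in questionnaire order. The function name `score_d12` is hypothetical, and handling of missing items is not addressed.

```python
from typing import Dict, Sequence

# Assumed layout: items 1-7 are physical, items 8-12 are affective
# (zero-based indices 0-6 and 7-11).
PHYSICAL_ITEMS = range(0, 7)
AFFECTIVE_ITEMS = range(7, 12)


def score_d12(responses: Sequence[int]) -> Dict[str, int]:
    """Sum 12 item responses (assumed coding 0-3) into domain and total scores."""
    if len(responses) != 12:
        raise ValueError("expected 12 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be coded 0, 1, 2 or 3")
    physical = sum(responses[i] for i in PHYSICAL_ITEMS)    # range 0-21
    affective = sum(responses[i] for i in AFFECTIVE_ITEMS)  # range 0-15
    return {
        "physical": physical,
        "affective": affective,
        "total": physical + affective,                      # range 0-36
    }
```

For example, a respondent answering ‘severe’ to every item would obtain the maximum scores of 21, 15 and 36.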
In addition to the D-12, the questionnaire included the COPD Assessment Test (CAT) and the Hospital Anxiety and Depression Scale (HADS). The CAT has eight items with six-point scales with endpoint-only descriptors.19 Four items relate to symptoms and four relate to broader aspects of health and quality of life. The items sum to give a score from 0 to 40, where higher scores represent a greater impact of COPD on health. The CAT has been widely translated and has evidence for reliability, validity and responsiveness.19 20 The HADS was developed to assess mood disorder in non-psychiatric hospital outpatients and has 14 items with four-point descriptive scales that refer to the last week.21 Seven items sum to give the anxiety scale and the remainder sum to give the depression scale. Scores on each scale range from 0 to 21, with higher scores representing more symptoms. Alongside the D-12, the HADS has been widely used,9 and it has evidence for internal consistency and validity in Norwegian patients with COPD.22
Clinical measures
Dyspnoea was assessed by the mMRC Dyspnoea Scale, as registered by physicians in the electronic medical record. The single-item mMRC assesses disability attributable to shortness of breath and has been widely used in Norwegian healthy and patient populations.10 Spirometry was conducted according to American Thoracic Society/European Respiratory Society recommendations using appropriate reference values.23 A 6 min walk test followed standard recommendations, with walked distance reported in metres.24 Dyspnoea before and after the walk test was assessed using the Borg CR10 scale with a range of 0–10.25 Further information retrieved from medical records included age, sex, smoking status, employment status, body mass index, number of COPD-related hospitalisations in the last year, whether there was a current acute exacerbation, use of long-term oxygen treatment and comorbidities.
Statistical analysis
Descriptive statistics are presented as the mean (SD), median (range) or number (%), as appropriate. Missing data and floor and ceiling effects were assessed at the item and domain level. Confirmatory factor analysis (CFA) with robust weighted least squares (WLSMV)26 was used to assess the structural validity of the D-12, that is, the extent to which the item scores adequately contribute to physical, affective and overall domains.16 Model fit was assessed by the root mean square error of approximation (RMSEA, acceptable fit if <0.06), the Comparative Fit Index (CFI, acceptable fit if >0.95, poor fit if <0.90, otherwise marginal) and the Tucker-Lewis Index (TLI, acceptable fit if >0.95, poor fit if <0.90, otherwise marginal).16 27
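The fit criteria above can be expressed as a simple decision rule. The following is a minimal sketch that only applies the stated cut-offs to fit indices already estimated elsewhere. The function names are hypothetical, and values exactly on a boundary are treated as stated in the text (for example, a CFI of exactly 0.95 is marginal).

```python
from typing import Dict


def classify_incremental_fit(value: float) -> str:
    """Classify a CFI or TLI value: >0.95 acceptable, <0.90 poor, otherwise marginal."""
    if value > 0.95:
        return "acceptable"
    if value < 0.90:
        return "poor"
    return "marginal"


def rmsea_acceptable(rmsea: float) -> bool:
    """RMSEA is considered acceptable if it is below 0.06."""
    return rmsea < 0.06


def summarise_fit(rmsea: float, cfi: float, tli: float) -> Dict[str, str]:
    """Apply the three criteria to one fitted model."""
    return {
        "RMSEA": "acceptable" if rmsea_acceptable(rmsea) else "not acceptable",
        "CFI": classify_incremental_fit(cfi),
        "TLI": classify_incremental_fit(tli),
    }
```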
Unidimensionality of D-12 domains was tested using the Rasch partial credit model. This extension of the Rasch model for polytomous items has separable item and person parameters, sufficient statistics and conjoint additivity, allowing item and person comparisons.28 Overall and item fit statistics were used to assess whether items within the domains fitted the one-dimensional model. Item fit was assessed with the χ2 statistic, standardised residuals (which should lie within ±2.5) and item characteristic curves. Local independence was assessed through examination of the residual correlation matrix, with coefficients ≥0.2 indicating item redundancy.29 30
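To make the partial credit model more concrete, the sketch below computes the category probabilities for a single polytomous item using the standard form of the model, in which the probability of scoring in category x is proportional to the exponential of the sum of (theta - delta_k) over the first x thresholds. It illustrates the model only and does not reproduce the estimation or fit statistics of the Rasch software used in the study. The names `theta` (person location) and `thresholds` (item threshold locations, in logits) are assumptions, as are any numerical values used with it.

```python
import math
from typing import List, Sequence


def pcm_category_probabilities(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Return P(X = 0), ..., P(X = m) for an item with m thresholds.

    theta: person location in logits.
    thresholds: threshold locations delta_1..delta_m in logits.
    """
    # Cumulative sums of (theta - delta_k); category 0 has an empty sum of 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]
```

A four-category item such as those in the D-12 has three thresholds. With a single threshold, the expression reduces to the dichotomous Rasch model, so a person located exactly at the threshold has a probability of 0.5 for each category.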
Differential item functioning (DIF) occurs when different groups with the same level of the construct being measured respond differently to an item. Consistent differences across the construct represent uniform DIF. Inconsistent differences that vary across the construct represent non-uniform DIF. Both forms of DIF were assessed for gender, pensionable age (67 years), receipt of disability payment and four comorbidities: anxiety/depression, asthma, hypertension and obesity. Differences of ≥0.5 logits in item difficulties were considered meaningful in the interpretation of DIF.31 32
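The magnitude criterion for DIF can be sketched as follows. The example assumes that item difficulties have already been estimated separately for two groups and only applies the 0.5 logit criterion; it does not reproduce the statistical tests for DIF carried out in Rasch software. The function name, the item labels and the dictionary layout are hypothetical.

```python
from typing import Dict

DIF_THRESHOLD_LOGITS = 0.5  # criterion stated in the text


def flag_meaningful_dif(
    difficulty_group_a: Dict[str, float],
    difficulty_group_b: Dict[str, float],
    threshold: float = DIF_THRESHOLD_LOGITS,
) -> Dict[str, float]:
    """Return items whose difficulty differs between groups by at least `threshold` logits.

    The returned value for each flagged item is the signed difference (group A minus group B).
    """
    flagged = {}
    for item, difficulty_a in difficulty_group_a.items():
        if item not in difficulty_group_b:
            continue  # skip items not estimated in both groups
        difference = difficulty_a - difficulty_group_b[item]
        if abs(difference) >= threshold:
            flagged[item] = difference
    return flagged
```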
Internal consistency was assessed by the Person Separation Index (PSI)33 and Cronbach’s alpha.16 Estimates of at least 0.70 and 0.90 are necessary for group and individual comparisons, respectively.16
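As an illustration of the second of these statistics, Cronbach’s alpha can be computed from complete item-level data using its standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of the total score), where k is the number of items. The sketch below assumes one row per respondent with no missing responses. The function names are hypothetical, and the PSI, which depends on Rasch person estimates, is not covered.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete data: one row per respondent, one column per item."""
    n_items = len(item_scores[0])
    if n_items < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")
    columns = list(zip(*item_scores))                    # one tuple per item
    item_variances = [variance(col) for col in columns]  # sample variances
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


def supported_comparisons(estimate: float) -> List[str]:
    """Apply the stated thresholds: 0.70 for group and 0.90 for individual comparisons."""
    supported = []
    if estimate >= 0.70:
        supported.append("group")
    if estimate >= 0.90:
        supported.append("individual")
    return supported
```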
Hypothesis testing was used to further assess the validity of D-12 scores through comparisons with those for the HADS and clinical measures.9–15 Criteria for expected levels of correlation followed those from a systematic review of PROMs.34 First, correlations ≥0.60 were expected for scores assessing the same construct: D-12 affective and HADS anxiety and depression. Second, correlations <0.60 and ≥0.30 were expected for scores assessing largely related but dissimilar constructs: D-12 domains/total and CAT. Third, correlations <0.50 and ≥0.20 were expected for moderately related but dissimilar constructs: D-12 physical/total and HADS, mMRC, Borg scores, FEV1, number of COPD hospitalisations in the last year and 6 min walk distance. Fourth, correlations <0.30 were expected for weakly related or unrelated constructs: D-12 affective and mMRC, Borg scores, FEV1, number of COPD hospitalisations in the last year and 6 min walk distance. At least 75% of the results should correspond with the hypotheses.16 34 Pearson’s correlation was used to aid comparisons with published studies, and findings are shown alongside those from a systematic review.9
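The hypothesis-testing criteria above can be summarised in a short sketch. It assumes that the expected ranges apply to the absolute value of the correlation, since some measures (for example, FEV1 and walk distance) would be expected to correlate negatively with breathlessness; this is an interpretation and is not stated explicitly in the text. The category labels and function names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

# (lower bound inclusive, upper bound exclusive); None means unbounded.
EXPECTED_RANGES: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "same_construct": (0.60, None),       # correlations >=0.60
    "largely_related": (0.30, 0.60),      # <0.60 and >=0.30
    "moderately_related": (0.20, 0.50),   # <0.50 and >=0.20
    "weak_or_unrelated": (None, 0.30),    # <0.30
}


def hypothesis_met(r: float, category: str) -> bool:
    """Check whether |r| falls in the expected range (absolute value is an assumption)."""
    lower, upper = EXPECTED_RANGES[category]
    magnitude = abs(r)
    if lower is not None and magnitude < lower:
        return False
    if upper is not None and magnitude >= upper:
        return False
    return True


def proportion_met(results: Sequence[Tuple[float, str]]) -> float:
    """Proportion of (correlation, category) pairs that correspond with the hypotheses."""
    return sum(hypothesis_met(r, category) for r, category in results) / len(results)


def validity_supported(results: Sequence[Tuple[float, str]], required: float = 0.75) -> bool:
    """At least 75% of the results should correspond with the hypotheses."""
    return proportion_met(results) >= required
```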
Statistical analyses were undertaken using RUMM2020 V.4.1 (RUMM Laboratory, Perth, Western Australia), Mplus version 7 (Muthén & Muthén, Los Angeles, CA) and Stata V.15.0 (StataCorp).