Interstitial lung disease

Responsiveness and meaningful change thresholds of the Living with Pulmonary Fibrosis (L-PF) questionnaire Dyspnoea and Cough scores in patients with progressive fibrosing interstitial lung diseases

Abstract

Background The Living with Pulmonary Fibrosis (L-PF) questionnaire assesses symptoms and quality of life in patients with fibrosing interstitial lung diseases (ILDs). Its Dyspnoea and Cough domains, whose items’ responses are based on a 24-hour recall, have scores ranging from 0 to 100, with higher scores indicating greater symptom severity. We evaluated the ability of these domain scores to detect change and estimated their meaningful change thresholds in patients with progressive fibrosing ILDs.

Methods The INBUILD trial enrolled subjects with progressive fibrosing ILDs other than idiopathic pulmonary fibrosis. The L-PF questionnaire was completed at baseline and week 52. The responsiveness of the Dyspnoea and Cough scores was evaluated by comparing changes in these scores with 52-week changes in three anchors: forced vital capacity % predicted and two self-reported items, one for global physical health and one for global quality of life. We used a triangulation approach including anchor-based and distribution-based methods to estimate meaningful change thresholds.

Results The analyses included 542 subjects with an L-PF Dyspnoea score at baseline and week 52, and 538 subjects with an L-PF Cough score at baseline and week 52. The L-PF Dyspnoea and Cough scores were responsive to change over 52 weeks. Triangulation of anchor-based and distribution-based estimates resulted in meaningful change thresholds of 6 to 7 points for the L-PF Dyspnoea score and 4 to 5 points for the L-PF Cough score to differentiate subjects who were stable or improved from those who deteriorated.

Conclusion These analyses support the responsiveness, one aspect of validity, of the L-PF Dyspnoea and Cough domain scores as measures of symptom severity in patients with progressive fibrosing ILDs. Estimates for meaningful change thresholds in these domain scores may be of value in interpreting the effects of interventions in these patients.

Trial registration number NCT02999178.

Key messages

What is already known on this topic

  • The Living with Pulmonary Fibrosis (L-PF) questionnaire was developed to assess quality of life in patients with fibrosing interstitial lung diseases (ILDs), but little was known about its responsiveness to change.

What this study adds

  • In patients with progressive fibrosing ILDs, the L-PF Dyspnoea and Cough scores are responsive to changes in patients’ health status and quality of life.

How this study might affect research, practice or policy

  • These analyses provide estimates for meaningful change thresholds for the L-PF Dyspnoea and Cough scores that may be of value in interpreting the effects of interventions.

Introduction

Idiopathic pulmonary fibrosis (IPF) is an inexorably progressive fibrosing interstitial lung disease (ILD).1 A number of other ILDs may also be associated with a progressive fibrosing phenotype, characterised by an increasing extent of fibrosis, decline in lung function, worsening symptoms and quality of life and early mortality.2–4 In patients with fibrosing ILDs, dyspnoea, cough and fatigue can affect patients’ physical and emotional well-being and health-related quality of life (HRQL),5 which tends to decline as patients’ lung function worsens.6 7

Patient-centred outcomes are important tools for assessing the effects of disease and interventions on aspects of patients’ lives, including symptoms and HRQL.8 The Living with Idiopathic Pulmonary Fibrosis (L-IPF) questionnaire, which includes two modules that assess symptoms or their impacts, was developed to assess health status and quality of life in patients with IPF. This questionnaire demonstrated sound psychometric properties in these patients, including discrimination between those with different disease severities.9

The Living with Pulmonary Fibrosis (L-PF) questionnaire is a slightly modified version of the L-IPF questionnaire. Like L-IPF, it has two modules, Symptoms and Impacts; the Symptoms module comprises three domains: Dyspnoea, Cough and Fatigue. L-PF is intended to be used in patients with all forms of progressive fibrosing ILD, including IPF; thus, the goal is for this questionnaire to replace the L-IPF questionnaire. Debriefing interviews of patients with progressive fibrosing ILDs other than IPF indicate that the L-PF has excellent face validity, that its concepts are relevant and that its items are understood as intended.10

The responsiveness of a patient-reported outcome, that is, its capacity to detect change in the target construct, as determined by clinically relevant outcomes or patients’ perceptions (anchors), is an important aspect of its validation.11–13 In this study, we evaluated the responsiveness of the L-PF Dyspnoea and Cough domain scores and estimated meaningful change thresholds (minimal clinically important differences) in these scores in patients with progressive fibrosing ILDs other than IPF.

Methods

Trial design

The INBUILD trial enrolled subjects with progressive fibrosing ILDs other than IPF. The trial design has been described and the protocol is publicly available.14 Briefly, eligible subjects had a physician-diagnosed ILD other than IPF; reticular abnormality with traction bronchiectasis, with or without honeycombing, of >10% extent on high-resolution CT (HRCT); forced vital capacity (FVC) ≥45% predicted and diffusing capacity of the lungs for carbon monoxide ≥30% to <80% predicted. Subjects met one of the following criteria for ILD progression within the 24 months before screening despite management deemed appropriate in clinical practice: relative decline in FVC ≥10% predicted; relative decline in FVC ≥5% to <10% predicted and worsened respiratory symptoms and/or increased extent of fibrosis on HRCT; or worsened respiratory symptoms and increased extent of fibrosis on HRCT. Subjects were randomised to receive nintedanib or placebo. The primary end point (annual rate of decline in FVC) was assessed over 52 weeks.

The L-PF questionnaire was completed at baseline and week 52. The L-PF questionnaire comprises 44 items: 23 in the Symptoms module and 21 in the Impacts module. Recall for items in the Symptoms module is the past 24 hours. Recall for items in the Impacts module is the past week. Domain and total scores range from 0 to 100, with higher scores indicating greater impairment. The L-PF questionnaire is accessible via: https://eprovide.mapi-trust.org/instruments/living-with-pulmonary-fibrosis-l-pf-impacts-questionnaire and https://eprovide.mapi-trust.org/instruments/living-with-pulmonary-fibrosis-l-pf-symptoms-questionnaire.

Analyses

Analyses of each domain score were conducted in subjects who received ≥1 dose of trial medication and had that score (L-PF Dyspnoea or L-PF Cough) at baseline and at week 52. Data from the nintedanib and placebo groups were pooled. The responsiveness of the L-PF Dyspnoea and Cough scores was evaluated by comparing mean changes at week 52 across changes in three anchors at week 52: (1) absolute change from baseline in FVC % predicted; (2) absolute change from baseline in the global physical health self-assessment (L-PF Impacts module, item 20: on average, over the last 7 days, how have you felt in terms of physical health? Scale: 0 (extremely poor) to 4 (excellent)); and (3) absolute change from baseline in the global quality of life self-assessment (L-PF Impacts module, item 21: on average, over the last 7 days, how has your quality of life been? Scale: 0 (extremely poor) to 4 (excellent)).

Changes in FVC % predicted at week 52 were categorised as follows: large deterioration (decline >10% predicted); moderate deterioration (decline >5% and ≤10% predicted); minimal deterioration (decline >2% and ≤5% predicted); stable (decline ≤2% predicted or increase ≤2% predicted); minimal improvement (increase >2% and ≤5% predicted); moderate improvement (increase >5% and ≤10% predicted); large improvement (increase >10% predicted). Changes in each global rating item at week 52 ranged from −4 to +4 and were categorised as follows: large deterioration (−3 or −4); moderate deterioration (−2); minimal deterioration (−1); stable (0); minimal improvement (+1); moderate improvement (+2); large improvement (+3 or +4). We used one-way analysis of variance models to examine whether changes in the Dyspnoea and Cough domain scores differed significantly across anchor strata. Scheffé’s method was used for pairwise comparisons.
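The anchor strata defined above can be expressed as simple cut-point rules. The following Python sketch is illustrative only and is not the trial's analysis code; the function names are hypothetical, and it assumes that changes are supplied as week 52 minus baseline, so that a decline in FVC % predicted is a negative number.

```python
def categorise_fvc_change(delta):
    """Map absolute 52-week change in FVC % predicted to an anchor stratum.

    delta is week 52 minus baseline (assumption), so declines are negative.
    """
    if delta < -10:
        return "large deterioration"      # decline >10% predicted
    if delta < -5:
        return "moderate deterioration"   # decline >5% and <=10% predicted
    if delta < -2:
        return "minimal deterioration"    # decline >2% and <=5% predicted
    if delta <= 2:
        return "stable"                   # decline or increase <=2% predicted
    if delta <= 5:
        return "minimal improvement"      # increase >2% and <=5% predicted
    if delta <= 10:
        return "moderate improvement"     # increase >5% and <=10% predicted
    return "large improvement"            # increase >10% predicted


def categorise_global_rating_change(delta):
    """Map the change in a 0-4 global rating item (range -4 to +4) to a stratum."""
    if delta not in range(-4, 5):
        raise ValueError("change in a 0-4 item must be an integer from -4 to +4")
    labels = {
        -2: "moderate deterioration",
        -1: "minimal deterioration",
        0: "stable",
        1: "minimal improvement",
        2: "moderate improvement",
    }
    if delta <= -3:
        return "large deterioration"
    if delta >= 3:
        return "large improvement"
    return labels[delta]
```

Note that a decline of exactly 10% predicted falls in the moderate stratum and a decline of exactly 2% predicted is counted as stable, in line with the inequalities stated in the text.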

For the threshold analysis, point estimates for meaningful change were considered to be half-way between the mean changes in scores in subjects who were stable and in subjects who had minimal decline in FVC % predicted or minimal/moderate deterioration in the global rating anchors. To refine the thresholds of meaningful change for the global rating anchors, we also considered the half-way point between the mean changes in scores in the stable and minimal deterioration groups.
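As a minimal sketch of the half-way point calculation, the Python function below averages the mean change in score in the stable group and the mean change in the deteriorated group. The function name and the example values are hypothetical and are not data from the trial.

```python
from statistics import mean


def halfway_threshold(stable_changes, deteriorated_changes):
    """Return the point half-way between the two group mean changes in score."""
    return (mean(stable_changes) + mean(deteriorated_changes)) / 2


# Hypothetical change scores (week 52 minus baseline; higher = worse symptoms).
stable = [0, 2, 4]           # mean change 2
deteriorated = [8, 10, 12]   # mean change 10
threshold = halfway_threshold(stable, deteriorated)  # (2 + 10) / 2 = 6.0
```

Taking the mid-point, rather than the difference between group means, places the threshold between the two groups, which allows for some misclassification of subjects near the boundary between anchor categories.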

We used receiver operating characteristic (ROC) curves and Youden’s index15 to identify thresholds that maximised sensitivity and specificity of the Dyspnoea or Cough scores to differentiate subjects who deteriorated (decline in FVC >2% predicted or change in global rating anchors of −4 to −1) from those who were stable/improved (improvement or decline in FVC ≤2% predicted or change in global rating anchors of 0 to +4).
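Youden's index is defined as J = sensitivity + specificity − 1, and the selected threshold is the cut-point at which J is largest. The sketch below illustrates this in plain Python under stated assumptions: the function name is hypothetical, a subject is classified as deteriorated when the change in score is at or above the cut-point (higher scores indicate worse symptoms), and ties in J are resolved in favour of the lowest cut-point. The example data are invented for illustration.

```python
def youden_threshold(changes, deteriorated):
    """Return (cut_point, J) maximising sensitivity + specificity - 1.

    changes:      change in domain score for each subject
    deteriorated: True if the anchor indicates deterioration, else False
    """
    positives = sum(deteriorated)
    negatives = len(deteriorated) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both deteriorated and stable/improved subjects are required")

    best_cut, best_j = None, -1.0
    for cut in sorted(set(changes)):
        # True positives: deteriorated subjects with a change at or above the cut-point.
        tp = sum(1 for c, d in zip(changes, deteriorated) if d and c >= cut)
        # True negatives: stable/improved subjects with a change below the cut-point.
        tn = sum(1 for c, d in zip(changes, deteriorated) if not d and c < cut)
        j = tp / positives + tn / negatives - 1
        if j > best_j:  # strict inequality keeps the lowest cut-point on ties
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical data: three deteriorated and five stable/improved subjects.
changes = [-5, 0, 2, 3, 6, 7, 10, 12]
deteriorated = [False, False, False, False, True, False, True, True]
cut, j = youden_threshold(changes, deteriorated)  # cut-point 6, J = 0.8
```

In this example, a cut-point of 6 classifies all three deteriorated subjects correctly (sensitivity 1.0) and four of the five stable/improved subjects correctly (specificity 0.8).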

Distribution-based analyses were performed to provide supplementary results. We evaluated the SEM, estimated as the baseline SD of the measure multiplied by the square root of 1 minus its reliability coefficient, and 0.2×SD and 0.5×SD of the scores at baseline. One SEM may be considered a meaningful change threshold16 17 and changes of 0.5×SD and 0.2×SD may be considered upper and lower boundaries for a meaningful change.18
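The three distribution-based quantities follow directly from the baseline SD and the reliability coefficient, with SEM = SD × √(1 − reliability). The Python sketch below is illustrative only: the function name, the use of the sample SD and the example inputs (including the reliability value of 0.91) are assumptions and are not taken from the trial.

```python
import math
from statistics import stdev


def distribution_based_thresholds(baseline_scores, reliability):
    """Return the SEM, 0.2 x SD and 0.5 x SD for a set of baseline scores."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    sd = stdev(baseline_scores)  # sample SD (assumption)
    return {
        "SEM": sd * math.sqrt(1 - reliability),
        "0.2xSD": 0.2 * sd,
        "0.5xSD": 0.5 * sd,
    }


# Hypothetical baseline scores and reliability coefficient.
estimates = distribution_based_thresholds([10, 20, 30, 40, 50], reliability=0.91)
```

Because the SEM shrinks as reliability approaches 1, a highly reliable score can have an SEM close to, or even below, 0.2×SD.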

Patient and public involvement

Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.

Results

A total of 663 subjects were enrolled in the INBUILD trial at 153 sites in 15 countries. Their baseline characteristics have been described.14 Briefly, mean (SD) age was 65.8 (9.8) years and FVC was 69.0 (15.6) % predicted; 53.7% of subjects were male and 62.1% had a usual interstitial pneumonia-like fibrotic pattern on HRCT. The most common diagnoses were hypersensitivity pneumonitis (26.1%), autoimmune disease-related ILDs (25.6%), idiopathic non-specific interstitial pneumonia (18.9%) and unclassifiable ILD (17.2%). A total of 542 subjects with an L-PF Dyspnoea score at baseline and week 52, and 538 subjects with an L-PF Cough score at baseline and week 52, were included in these analyses.

Responsiveness of L-PF Dyspnoea and Cough scores

There were large and statistically significant differences in changes in Dyspnoea and Cough scores between subjects with stable versus large deterioration in FVC % predicted and between subjects with minimal versus large deterioration in FVC % predicted (table 1). There was a statistically significant difference in change in Dyspnoea score between subjects with moderate versus large deterioration in FVC % predicted. There were no statistically significant differences in changes in Dyspnoea or Cough scores between subjects with stable versus minimal deterioration in FVC % predicted. Changes in Dyspnoea and Cough scores were significantly different between subjects with minimal/moderate deterioration versus minimal/moderate improvement in either global rating anchor (tables 2 and 3).

Table 1 Changes in L-PF Dyspnoea and Cough scores across strata of 52-week change in FVC % predicted

Table 2 Changes in L-PF Dyspnoea and Cough scores across strata of 52-week change in global physical health score

Table 3 Changes in L-PF Dyspnoea and Cough scores across strata of 52-week change in global quality of life score

Meaningful change thresholds in L-PF Dyspnoea score

For the Dyspnoea domain, the half-way points between changes in scores for subjects who were stable and those with minimal deterioration in FVC % predicted, global physical health score and global quality of life score were 1.1, 7.1 and 7.6, respectively. Similar half-way points were observed for the global rating anchors when minimal/moderate deterioration was considered instead of minimal deterioration (table 4).

Table 4 Meaningful change thresholds for L-PF Dyspnoea and Cough domain scores

For the Dyspnoea score, ROC analyses revealed meaningful change thresholds between deterioration and stability/improvement of 5.6 for FVC % predicted, 6.3 for global physical health and 1.7 for global quality of life (table 5).

Table 5 Results from receiver operating characteristic (ROC) curve analysis: sensitivity and specificity of L-PF questionnaire Dyspnoea and Cough scores to distinguish deterioration (vs stability/improvement) based on FVC % predicted, L-PF global physical health and quality of life scores

In distribution-based estimates of thresholds of meaningful change in the Dyspnoea score, the SEM was 4.4, 0.2×SD was 4.3 and 0.5×SD was 10.8.

Triangulation of the anchor-based and distribution-based estimates for the Dyspnoea domain score resulted in a meaningful change threshold of 6 to 7 points to differentiate subjects who were stable from those who deteriorated.

Meaningful change thresholds in L-PF Cough score

For the Cough domain, the half-way points between changes in scores for subjects who were stable and those with minimal deterioration in FVC % predicted, global physical health score and global quality of life score were −2.8, 2.8 and 3.5, respectively (table 4). Similar half-way points were observed for the global rating anchors when minimal/moderate deterioration was considered instead of minimal deterioration (table 4).

For the Cough score, ROC analyses revealed meaningful change thresholds between deterioration and stability/improvement of 4.2 for FVC % predicted, 16.7 for global physical health and 4.2 for global quality of life (table 5).

In distribution-based estimates of thresholds of meaningful change in the Cough score, SEM was 8.6, 0.2×SD was 5.3 and 0.5×SD was 13.3.

Triangulation of the anchor-based and distribution-based estimates for the Cough domain score resulted in a meaningful change threshold of 4 to 5 points to differentiate subjects who were stable from those who deteriorated.

Discussion

Our analyses suggest that in patients with progressive fibrosing ILDs other than IPF, the Dyspnoea and Cough domain scores from the L-PF questionnaire Symptoms module are responsive to changes in disease severity and in patients’ perceptions of their physical health and quality of life. We observed significant differences in changes in L-PF Dyspnoea and Cough scores between subjects who had a large deterioration in FVC % predicted versus those with stable FVC % predicted, and between subjects who experienced deterioration versus improvement in global assessment anchors.

There is no consensus on the best approach to estimating meaningful change thresholds for patient-reported outcomes.13 18 Food and Drug Administration guidance recommends that anchor-based approaches incorporate ‘patient ratings’ of change12; however, such transition items, which require patients to assess their current state, recall their prior state and mentally subtract the difference (eg, “Is your shortness of breath a lot better/the same/a lot worse?”), are fraught with problems. Ideally, the correlation between the transition item and baseline score is equal and opposite to the correlation between the transition item and the score at follow-up, but with recall periods of longer than 4 weeks, transition ratings tend to be (inappropriately) highly correlated with the patient’s current state.19 The two patient response anchors we used alleviated this potential for bias by asking patients to rate their state at baseline and at week 52; we then performed the subtraction to yield the transition item.

For many transition items, stability and degree of change are arbitrarily defined by the investigator. Some investigators may consider ‘somewhat worse/better’ to represent a minimal change, while others may consider ‘a bit worse/better’ or ‘minimally worse/better’ to be a minimal change. How patients interpret such descriptors, and how investigators categorise anchors, can affect estimates of meaningful change thresholds. For example, when using a 15-point quality of life transition item with ratings ranging from −7 to +7, ratings of −1 to +1 have been considered to represent no change and ratings of −3, −2, +2 and +3 to represent minimally important changes,20 21 but meaningful change estimates may have been different if stability had been defined as a rating of 0 and minimally important changes as ratings of −2, −1, +1 and +2. For our global rating anchors, we considered a change of 0 to represent stability and changes of −2, −1, +1 and +2 to represent minimal/moderate change. Some patients with transition scores of 0 may have changed minimally and some with transition scores of 1 or 2 may have been stable. We attempted to account for this inherent uncertainty by using a half-way point approach rather than simply subtracting mean scores between groups of interest.

As patients with progressive ILDs are unlikely to experience improvement in disease status, in the ROC analyses, we identified a change threshold between worsening and stability/improvement. This approach aligns with the clinical behaviour of progressive ILDs and with current therapeutic approaches, which slow rather than reverse disease progression.

Change in FVC is used as a primary end point in clinical trials to assess the efficacy of treatments for ILDs.14 22–25 A decline in FVC is associated with mortality.2 26–28 While there is no established definition of ILD progression, absolute declines of >5% or >10% in FVC % predicted are widely regarded as indicating progression,26 28–30 although smaller declines may also be relevant. Scores from patient-reported outcomes that assess symptoms or HRQL typically correlate weakly with FVC in patients with ILDs,31–33 suggesting that these measures yield information distinct from that provided by physiological measures of ILD severity. Thus, although FVC is commonly used as an anchor in validation studies, it may not be a suitable anchor in all circumstances.

Strengths of our analyses include the use of a large and heterogeneous population of subjects with progressive fibrosing ILDs. The use of triangulation that incorporated both anchor-based and distribution-based approaches aligns with accepted methodology, including from regulatory bodies, but we acknowledge that distribution-based methods may overestimate meaningful change thresholds.34 Limitations include that the trial was not designed to evaluate the measurement properties of patient-reported outcomes, so additional metrics that could have been used as anchors were not included. For example, another cough-specific patient-reported outcome would have been a more appropriate anchor for the Cough domain. The content validity of the L-PF questionnaire has not been demonstrated for all the languages and cultures that participated in the trial. Whether our findings are applicable to patients with fibrosing ILDs beyond those who met the inclusion criteria for the INBUILD trial is unknown.

In conclusion, our analyses support the responsiveness of the Dyspnoea and Cough domains of the L-PF questionnaire Symptoms module as measures of symptom severity in patients with progressive fibrosing ILDs. Estimates of meaningful change thresholds in these scores may be of value in interpreting the effects of interventions in these patients. Additional analyses are encouraged to confirm or refine these findings.