Abstract
Background Long COVID (LC) is a novel multisystem clinical syndrome affecting millions of individuals worldwide. The modified COVID-19 Yorkshire Rehabilitation Scale (C19-YRSm) is a condition-specific patient-reported outcome measure designed for assessment and monitoring of people with LC.
Objectives To evaluate the psychometric properties of the C19-YRSm in a prospective sample of people with LC.
Methods 1314 patients attending 10 UK specialist LC clinics completed C19-YRSm and EuroQol 5D-5L (EQ-5D-5L) longitudinally. Scale characteristics were derived for the C19-YRSm subscales (Symptom Severity (SS), Functional Disability (FD) and Overall Health (OH)), and internal consistency was assessed using Cronbach’s alpha. Convergent validity was assessed using the Functional Assessment of Chronic Illness Therapy Fatigue Scale (FACIT-Fatigue). Known groups validity was assessed for the Other Symptoms subscale as tertiles, as well as by hospitalisation and intensive care admission. Responsiveness and test–retest reliability were evaluated for C19-YRSm subscales and EQ-5D-5L. The minimal important difference (MID) and minimal clinically important difference (MCID) were estimated. Confirmatory factor analysis was applied to determine the instrument’s two-factor structure.
Results C19-YRSm demonstrated good scale characteristics. Item-total correlations were between 0.37 and 0.65 (for SS and FD), with good internal reliability (Cronbach’s alphas>0.8). Item correlations between subscales ranged between 0.46 and 0.72. Convergent validity with FACIT was good (−0.46 to −0.62). The three subscales discriminated between different levels of symptom burden (p<0.001) and between patients admitted to hospital and intensive care. Responsiveness was moderate for the three subscales, ranging from 0.22 (OH) to 0.50 (SS), and was greater than for the EQ-5D-5L. Test–retest reliability was good for both SS (0.86) and FD (0.78). MID was 2 for SS, 2 for FD and 1 for OH; MCID was 4 for both the SS and FD. The factor analysis supported the two-factor SS and FD structure.
Conclusions The C19-YRSm is a condition-specific, reliable, valid and responsive patient-reported outcome measure for LC.
- COVID-19
- Patient Outcome Assessment
Data availability statement
All data relevant to the study are included in the article or uploaded as online supplemental information.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
WHAT IS ALREADY KNOWN ON THIS TOPIC
Long COVID or Post-COVID-19 syndrome is a multisystem, fluctuating condition. The modified COVID-19 Yorkshire Rehabilitation Scale (C19-YRSm) is the literature’s first condition-specific patient-reported outcome measure, which needed validation in a large population sample.
WHAT THIS STUDY ADDS
C19-YRSm is a valid, reliable, responsive and easy to administer measure and is able to show clinically meaningful change in the status of the condition.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
C19-YRSm can be used in clinical and research settings to reliably capture the condition trajectory and the effect of interventions and help inform clinical policy.
Introduction
Long COVID (LC) or postacute sequelae of COVID-19 is a fluctuating, multisystem syndrome1 with an estimated prevalence of 1.9 million cases in the UK alone2 and an estimated 100 million or more individuals affected worldwide.3 More than 200 symptoms affecting 10 organ systems have been recorded in LC. The most commonly reported symptoms include fatigue, cognitive problems, pain, sleep problems and breathlessness.4 These symptoms may persist for extensive periods following the initial COVID-19 infection.5 This protracted course of LC has a significant negative impact on the individual, in terms of the persistent nature of symptoms and the associated functional disability and adverse health-related quality of life.2 6
The COVID-19 Yorkshire Rehabilitation Scale (C19-YRS) is a condition-specific patient-reported outcome measure designed to capture the symptoms of LC, as well as assess severity and monitor the persistence of symptoms to inform and guide the rehabilitation of affected patients.7–10 Since the initial validation of the instrument,7 11 it has been widely used in a variety of LC contexts including symptom evaluation in primary care and community settings,12–14 determining the need for LC rehabilitation interventions,8 9 15 as well as epidemiological assessments of post-COVID symptoms.16 17
The original 22-item C19-YRS underwent psychometric evaluation including both classical and modern psychometric evaluation methods, resulting in a 17-item modified tool, the C19-YRSm.18 Both the original 22-item and the modified version have undergone a limited degree of subsequent validation.10 19 The C19-YRS has been shown to have good construct validity but moderate responsiveness.10 The C19-YRSm has, by contrast, been demonstrated in a Croatian patient population to have good internal reliability and convergent validity.19 Therefore, the aims of this study were to further validate the C19-YRSm with a longitudinal sample of LC patients, as well as to identify minimally (clinically) important differences to inform its use in future randomised controlled trials and clinical practice.
Methods
Data
The data were collated from the LOng COvid Multidisciplinary consortium Optimising Treatments and servIces acrOss the NHS (LOCOMOTION) study, whose protocol has been published elsewhere.20 This was a prospective mixed methods study involving 10 LC services across the UK. Ethics approval for the LOCOMOTION study was obtained from the Bradford and Leeds Research Ethics Committee on behalf of Health Research Authority and Health and Care Research Wales (reference: 21/YH/0276; Trial registration number NCT05057260, ISRCTN15022307).20
Participants
Participants were included in the study if they had a clinical diagnosis of LC made by a qualified healthcare professional in 1 of the 10 participating LC clinics. Participants had to meet the National Institute for Health and Care Excellence case definition of having one or more persistent symptoms that develop during or after an infection consistent with COVID-19 and are not explained by alternative diagnosis.21
Participation in the study required participants to be registered on ELAROS, a digital patient-reported outcome measures platform.22 Informed consent and study data were collected on this platform. Participants were requested to complete the following patient-reported outcome measures (see below) every 3 months after registration. The first patient was registered on 23 November 2021 and the last was registered on 12 November 2023.
Patient and public involvement
Patients have been involved from the outset in the design and implementation of the LOCOMOTION study20 as part of a nine-member patient advisory group (PAG). The PAG has provided the LOCOMOTION study team with first-hand experience of people living with LC. Two coauthors (RM and DW) are members of the LOCOMOTION PAG.
Instruments
COVID-19 Yorkshire Rehabilitation Scale—Modified (C19-YRSm)
The C19-YRSm is a 17-item instrument18 designed to capture the key symptoms of LC and its impact on activities of daily living and overall health (online supplemental material). The items comprise four subscales: Symptom Severity (SS, 10 items), Functional Disability (FD, 5 items), Overall Health (OH, a single item) and Other Symptoms (OS).
The items in the SS subscale comprise the following domains: breathlessness (four items), cough/throat sensitivity/voice change (two items), fatigue (one item), smell/taste (two items), pain/discomfort (five items), cognition (three items), palpitations/dizziness (two items), postexertional malaise (one item), anxiety/mood (five items) and sleep (one item). FD consists of five single items: communication, walking/moving around, personal care, other activities of daily living and social role.
Responses on the SS and FD subscales are rated on a 0 (no symptom or dysfunction) to 3 (severe life-disturbing symptom or dysfunction) Likert scale. For the SS subscale, the highest value within each of the domains (eg, breathlessness, pain/discomfort) is added to determine the score for that subscale. Higher scores on both these subscales indicate worse symptomatology and poorer functioning. Responses on the OH subscale are scored on a 0–10 Likert scale (0 being ‘worst health’ and 10 being ‘best health’) with higher scores indicating better health. OS over the last 7 days are also captured from a list of 25 additional symptoms.18
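The SS scoring rule described above (take the highest rating within each domain, then sum across domains) can be sketched as follows. This is a minimal illustration only: the function name, the dictionary layout and the example ratings are assumptions made for this sketch and are not taken from the instrument or the study's analysis code.

```python
# Hypothetical layout: domain name -> list of item ratings, each from 0 to 3.
def symptom_severity_score(responses):
    """Sum the highest item rating within each SS domain."""
    total = 0
    for domain, items in responses.items():
        if any(not 0 <= rating <= 3 for rating in items):
            raise ValueError(f"ratings in {domain!r} must lie between 0 and 3")
        # Only the worst-rated item in a domain contributes to the subscale.
        total += max(items)
    return total


# Illustrative ratings for three of the ten domains (made-up values).
example = {
    "breathlessness": [1, 2, 0, 1],
    "fatigue": [3],
    "pain/discomfort": [0, 1, 1, 2, 0],
}
print(symptom_severity_score(example))  # 2 + 3 + 2 = 7
```

With 10 domains each contributing at most 3, this rule gives an SS score between 0 and 30.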
Functional Assessment of Chronic Illness Therapy—Fatigue Scale (FACIT-Fatigue)
The FACIT-Fatigue Scale is a 13-item instrument developed to evaluate fatigue and its impact on health-related quality of life and daily activities.23 Responses are scored on a 5-point Likert scale (from 0 to 4) with a maximum total score of 52 (range 0–52). Higher scores indicate better health-related quality of life. Items in the FACIT-Fatigue cover tiredness, fatigue, listlessness, lack of energy and the impact on daily and social activities. Although originally designed for patients with cancer, the FACIT-Fatigue Scale has been used to evaluate post-COVID fatigue.24
The EuroQol 5D-5L (EQ-5D-5L)
The EQ-5D-5L is a preference-based instrument with five domains: mobility, usual activities, self-care, pain/discomfort and anxiety/depression.25 It has five response categories ranging from 1 (no problems) to 5 (severe problems). Responses to each domain are collated into a profile score which is converted into a health utility or index score using a country-specific algorithm (tariff or value set). Utilities reflect societal preferences for health states and are measured on a metric from 0 (dead) to 1 (perfect health). Utility values less than 0, indicating states worse than dead, are also captured. The EQ-5D-5L scores were mapped onto the EQ-5D-3L using the crosswalk algorithm to derive UK utility values.26 The EQ-5D also comprises a visual analogue scale (VAS) measuring self-reported current health on a scale from 0 (‘worst health’) to 100 (‘best health’).
Statistical analysis
All analyses were performed using RStudio (R V.4.1.1). Descriptive summary statistics (mean, standard deviation (SD), count and percentage) were generated for the following patient demographic and clinical data: age, sex, ethnicity, smoking status, hospital admission and intensive care (ICU) admission. The scale characteristics for the three C19-YRSm subscales (SS, FD and OH) were derived, including mean (SD), median (inter-quartile range, IQR), score range and skewness (values between −0.5 and +0.5 were taken to indicate an approximately symmetrical distribution). Item characteristics, such as mean item score (SD), missing values, floor and ceiling effects, and item-total correlations, were estimated for the SS and FD domains.
The internal reliability of the C19-YRSm SS and FD domains was evaluated using Cronbach’s alpha. A Cronbach’s alpha>0.7 was considered to be an indicator of adequate internal consistency and >0.8 was considered to be an indicator of good internal consistency.
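For readers unfamiliar with the statistic, Cronbach's alpha and the two thresholds above can be sketched as below. This uses the standard textbook formula with Python's standard library, not the authors' R code; the function names and the toy data are assumptions for illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete cases.

    rows: one list of item scores per respondent, all of equal length.
    """
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def interpret_alpha(alpha):
    """Apply the thresholds stated in the text (>0.7 adequate, >0.8 good)."""
    if alpha > 0.8:
        return "good"
    if alpha > 0.7:
        return "adequate"
    return "below the adequate threshold"


# Toy data: four respondents, three items rated 0-3 (made-up values).
toy = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy)
print(round(alpha, 2), interpret_alpha(alpha))
```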
Convergent validity (the degree to which items or domains on different instruments measure the same constructs) was assessed for the C19-YRSm subscales SS, FD and OH using the FACIT-Fatigue. As both SS and FD are negatively scored (a higher score indicates worse symptoms or functioning), negative associations were anticipated between these and the FACIT-Fatigue Score. Conversely, positive associations were hypothesised between OH and FACIT-Fatigue. Associations were evaluated using Pearson’s product-moment correlation coefficient.
Known-groups validity was assessed for the C19-YRSm domains using the OS subscale split into tertiles: low number of symptoms (0–3), medium number of symptoms (4–7) and high number of symptoms (7+), as well as hospitalisation and admission to ICU (yes/no). Kruskal-Wallis and Mann-Whitney-Wilcoxon rank sum tests were used to evaluate differences in scores across these predefined groups.
Responsiveness of the three C19-YRSm subscales (SS, FD and OH) was evaluated using a subset of patients who had completed the instrument at two timepoints, namely the first assessment and at follow-up 30 days later (±10 days). The responsiveness of the EQ-5D-5L and EQ-5D VAS was also evaluated as a comparator for those patients who had completed both the C19-YRSm and EQ-5D-5L on the same day (at first assessment and 30 days (±10 days)).
Mean change from the first assessment was derived for these domains and an effect size (standardised response mean) was calculated by dividing this by the SD of the change scores. Intraclass correlations and test–retest reliability were also derived to evaluate stability in the instrument subscales over time. Test–retest reliability was evaluated against OH: the reliability coefficient was derived for patients with no change score on the OH between first assessment and day 30 (±10 days).
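The effect size described above (mean change divided by the SD of the change scores) can be sketched in a few lines. The function name, the sign convention (follow-up minus first assessment) and the example scores are assumptions for illustration, not the study's code or data.

```python
from statistics import mean, stdev


def standardised_response_mean(first, follow_up):
    """Mean of paired change scores divided by their sample SD.

    Assumed sign convention: change = follow-up score - first score.
    """
    if len(first) != len(follow_up):
        raise ValueError("both assessments must cover the same patients")
    changes = [later - earlier for earlier, later in zip(first, follow_up)]
    return mean(changes) / stdev(changes)


# Made-up SS scores for four patients at first assessment and follow-up.
first_scores = [20, 18, 22, 15]
follow_up_scores = [18, 17, 19, 15]
print(standardised_response_mean(first_scores, follow_up_scores))
```

Because the ratio is unit-free, it allows responsiveness to be compared across instruments scored on different ranges, such as the C19-YRSm subscales and the EQ-5D-5L.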
A half SD of the first assessment domain scores was applied as a putative minimally important difference (MID).27 In addition to this, the standard error of measurement (SEM) and Reliable Change Index (RCI) were calculated as follows for the SS and FD domains as indicators of minimal clinically important differences (MCIDs):
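$\mathrm{SEM} = \mathrm{SD}_{\mathrm{base}} \times \sqrt{1 - r}$, where $r$ is the test–retest reliability coefficient and $\mathrm{SD}_{\mathrm{base}}$ is the SD at first assessment.
RCI: [equation not recoverable from the source text].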
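As a rough illustration of the distribution-based quantities in this section, the sketch below computes a half-SD MID and an SEM. It assumes the conventional form SEM = SD × sqrt(1 − r); the function names and all input values are hypothetical, and the RCI is deliberately left out because its exact form is not stated here.

```python
from math import sqrt
from statistics import stdev


def half_sd_mid(baseline_scores):
    """Putative MID: half the sample SD of the first-assessment scores."""
    return 0.5 * stdev(baseline_scores)


def standard_error_of_measurement(sd_base, r):
    """Conventional SEM, assuming SEM = SD_base * sqrt(1 - r)."""
    if not 0 <= r <= 1:
        raise ValueError("reliability coefficient r must lie between 0 and 1")
    return sd_base * sqrt(1 - r)


# Made-up baseline scores and reliability, for illustration only.
baseline = [10, 12, 14, 16, 18]
print(half_sd_mid(baseline))
print(standard_error_of_measurement(sd_base=5.0, r=0.84))
```

Both quantities depend only on the score distribution and the reliability estimate, which is why they are usually treated as provisional until anchor-based estimates are available.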
The putative factor structure (a two-dimensional structure encompassing the SS and FD domains) was explored using a confirmatory factor analysis (CFA). A number of indices were employed to evaluate the goodness of fit of the model: root mean square error of approximation (RMSEA),28 comparative fit index (CFI),29 the Tucker-Lewis index (TLI)30 31 and the standardised root mean squared residual (SRMR).32 Various thresholds have been proposed to evaluate model fit. In this study, RMSEA<0.08 was considered to be a reasonable fit,33 TLI and CFI>0.90 as acceptable fit30 and SRMR<0.08 as acceptable fit.32 As no single index alone provides sufficient evidence of model fit, the four indices were evaluated in aggregate. The lavaan package in R was used for the CFA.
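The thresholds above can be applied mechanically once the fit indices have been obtained. The sketch below only encodes the cut-offs stated in the text; it does not fit a CFA model, and the function name and example values are hypothetical.

```python
# Cut-offs as stated in the text: (comparison, threshold).
THRESHOLDS = {
    "RMSEA": ("<", 0.08),
    "CFI": (">", 0.90),
    "TLI": (">", 0.90),
    "SRMR": ("<", 0.08),
}


def check_fit(indices):
    """Return, per index, whether the value meets its predefined threshold."""
    results = {}
    for name, value in indices.items():
        comparison, cutoff = THRESHOLDS[name]
        results[name] = value < cutoff if comparison == "<" else value > cutoff
    return results


# Hypothetical fit indices, for illustration only.
example_fit = {"RMSEA": 0.06, "CFI": 0.93, "TLI": 0.89, "SRMR": 0.05}
print(check_fit(example_fit))
# {'RMSEA': True, 'CFI': True, 'TLI': False, 'SRMR': True}
```

Reporting the per-index result alongside an overall judgement keeps it visible which thresholds a model does and does not meet.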
Results
Demographics
A total of 1314 patients (table 1) had completed the C19-YRSm on at least one occasion; 263 patients (20%) had completed the instrument at first assessment and at 30 days (±10 days), and 193 patients (15%) had completed the FACIT-Fatigue instrument at least once. The C19-YRSm and EQ-5D-5L had been completed on the same day at both timepoints (first assessment and day 30 (±10 days)) by 98 patients. The majority of the total sample were Caucasian (76%) and female (67%), with a mean age of 48 years (SD: 13 years); 10% had been admitted to hospital as a result of COVID-19, and just over 2% had been admitted to ICU.
The mean subscale scores are shown in table 2. Both means for the SS (18.4, SD: 5.62) and FD (7.1, SD: 3.78) subscales suggested a moderate-to-high level of symptom burden and functional disability. Similarly, OH indicated that patients were at best in moderate health. These three subscales showed little skewness reflecting symmetrical score distributions. The mean number of OS was 5 with positive skew (fewer patients with large numbers of other symptoms).
The item means (table 3) for the SS and FD subscales ranged approximately between 1 and 2 indicating that patients were on average experiencing at least mild (to moderate) symptom burden and functional disability, although this varied across the items as reflected in the results of the floor and ceiling effects. Missing data were negligible (<3%).
Convergent validity
Table 4 shows the correlation matrix between the domains (see also online supplemental figure 1). There was a strong positive association between SS and FD. A moderate positive association was determined between OS and SS, and OS and FD. OH was negatively associated with the SS, FD and OS.
There was a strong negative association between the total FACIT Score and the C19-YRSm Fatigue item (r=−0.58, p<0.001, 95% CI: −0.67 to −0.48), and similarly for SS (r=−0.61, p<0.001, 95% CI: −0.69 to −0.51), FD (r=−0.64, p<0.001, 95%CI: −0.72 to −0.55) and OS (r=−0.46, p<0.001, 95% CI: −0.56 to −0.34). The FACIT-Fatigue total was positively associated with OH (r=0.47, p<0.001, 95% CI: 0.36 to 0.58).
Known-groups validity
There was a linear increase in SS score as symptom burden (tertiles of the OS domain) increased in severity from low to high (table 5). A similar pattern was observed for FD, whereas OH showed a decrease as symptom burden increased. All these results were statistically significant (p<0.001).
Patients who had been hospitalised for COVID-19 showed higher SS and FD scores (online supplemental table 1A,B) compared with those who had not been hospitalised (p=0.04 and p=0.008, respectively). No differences between these groups were observed for OH. Statistically significant differences for both SS and FD (but not OH) were also observed between those who had and had not been admitted to ICU (online supplemental table 1C).
Responsiveness
The mean change over 30 days (±10 days) was 1.9 (SD: 4.38) for the SS domain. Smaller changes were observed in both the FD and OH domains, 0.7 (SD: 2.53) and 0.3 (SD: 1.67) (table 5).
The intraclass correlation coefficients for the three domains ranged from 0.58 (OH) to 0.76 (FD) (table 5), suggesting moderate-to-strong stability of the subscale scores over time. A total of 70 patients had stable (unchanged) OH scores over the 30-day evaluation period (±10 days). The test–retest reliability coefficient was 0.86 for the SS domain and 0.78 for the FD domain, indicating good reliability.
All three subscales demonstrated a degree of responsiveness to change (effect sizes range: 0.22–0.50) (table 5). By comparison, the responsiveness was 0.14 for the EQ-5D-5L Index and 0.18 for the VAS.
The 0.5 SD was applied as a metric for the MID. This resulted in the following MIDs: SS=2, FD=2 and OH=1. From table 5, it may be seen, for instance, that an MID was recorded for SS over the 30-day period following first assessment, but not for either FD or OH. The MCID estimate (based on the SEM) was 4 for both the SS and FD (table 5).
Factor structure
The results of the CFA showed an RMSEA of 0.10 (90% CI: 0.096 to 0.107) (figure 1). The SRMR was 0.066, CFI was 0.83 and TLI was 0.8. Taking all four indices together, these indicated reasonable model fit for the two-factor model. These factors were consistent with the interpretation of one factor measuring SS and the other measuring FD.
Discussion
The aim of this study was to undertake a further psychometric validation of the C19-YRSm. The results demonstrated good item and scale characteristics. There was good convergent validity with the FACIT-Fatigue Scale. Furthermore, the three subscales (SS, FD and OH) discriminated well between levels of symptom severity and between patients who had been hospitalised and admitted to ICU. There was also good internal reliability, test–retest reliability and stability of the subscale scores over time. Furthermore, the convergent correlations were as hypothesised.
The results of the responsiveness analysis showed that the instrument was able to detect changes as patients’ symptoms fluctuated and was more sensitive to change than the generic health-related quality of life measure, the EQ-5D-5L (both Index and VAS). This is a potentially important finding for future randomised controlled trials in LC. Although the effect sizes were modest, these must be evaluated in the context of a fluctuating condition,4 34 35 and it may therefore be that potentially larger effect sizes were being masked by frequent changes in symptoms. The results also suggest some initial metrics for the MID for SS (2), FD (2) and OH (1), as well as the MCID (4 SS and FD). Although these were based on distribution methods, and therefore remain to be confirmed using anchor-based approaches such as patient and clinician global impression of change, these provide useful initial metrics for interpreting meaningful changes in the C19-YRSm scores to aid both clinical interpretation as well as inform sample size considerations for prospective randomised controlled trials.
Some methodological limitations should be highlighted. First, although there was some support for a two-factor structure, the statistics in isolation did not meet the predefined thresholds. Nevertheless, there are no definitive guidelines on what constitutes ideal fit and it is possible that model fit could be improved to a degree that may consequently also positively impact on responsiveness. Previous research has similarly determined moderate responsiveness at the item level for the C19-YRSm.10 Even though comparisons with EQ-5D were possible, we were unable to compare responsiveness with that of the FACIT, a fatigue-specific measure, due to a lack of directly comparable data (assessments completed at the same time). Therefore, further research involving modern psychometric analysis such as Rasch or item response theory could explore this issue further and potentially identify individual items that may be removed and/or recalibrated to improve instrument responsiveness.
The results of this study are in line with a previous psychometric validation study of the C19-YRSm, which found both good internal reliability and convergent validity of the instrument,19 providing further evidence for the psychometric properties of the C19-YRSm with meaningful factors or domains, such as SS, FD and OH. The latter is further bolstered by the large sample size in this study and builds on the earlier development of the instrument,18 supporting its use as one of the few LC-specific patient-reported outcome measures. In addition, it is shorter than other condition-specific instruments such as the Symptom Burden Questionnaire for LC with 131 items,36 thereby minimising patient burden, a particularly important factor in people living with LC, who may present with fatigue and cognitive dysfunction. The instrument’s brevity and design lend themselves to self-completion by patients, enabling the fluctuating nature of the condition to be monitored on a frequent basis for patients’ own awareness, for instance, in determining symptom triggers, as well as by clinicians to evaluate patients’ condition over time between clinic appointments.
Given the prevalence of LC, its associated persistence of debilitating symptoms, and the impact of the condition on patients’ health-related quality of life, valid and reliable condition-specific patient-reported instruments such as the C19-YRSm are of critical importance in the assessment of LC symptoms, as well as in helping to facilitate appropriate management and rehabilitation of patients suffering with the condition. The evidence presented in this study alongside other studies19 suggests that C19-YRSm is a condition-specific, reliable, valid and responsive patient-reported outcome measure for LC.
Ethics statements
Patient consent for publication
Ethics approval
Not applicable.
Footnotes
X @DarrenWinch23
Collaborators LOCOMOTION consortium: Nawar Diar Bakerly, Mauricio Barahona, Alexander Casson, Jonathan Clarke, Vasa Curcin, Helen Davies, Helen Dawes, Brendan Delaney, Carlos Echevarria, Sarah Elkin, Rachael Evans, Zaccheus Falope, Darren Greenwood, Ben Glampson, Stephen Halpin, Mike Horton, Joseph Kwon, Simon de Lusignan, Gayathri Delanerolle, Erik Mayer, Harsha Master, Ruairidh Milne, Jacqui Morris, Amy Parkin, Anton Pick, Nick Preston, Amy Rebane, Emma Tucker, Ana Belen Espinosa Gonzalez, Sareeta Baley, Annette Rolls, Emily Bullock, Megan Ball, Shehnaz Bashir, Mae Mansoubi, Joanne Elwin, Denys Prociuk, Iram Qureshi, Samantha Jones.
Contributors AS was responsible for the design of the study, the analysis and interpretation of the study, as well as the drafting of the manuscript, and is the guarantor for the manuscript and its overall content. DG and MH were responsible for reviewing the data analysis and drafting the manuscript. TO, MG, RRL, DW, PW, RM and MS contributed to the drafting of the manuscript. All authors reviewed and approved the final version of the manuscript.
Funding This work was supported by National Institute for Health Research (NIHR) grant number Ref COV-LT-0016.
Competing interests None declared.
Patient and public involvement Patients and/or the public were involved in the design, or conduct, or reporting, or dissemination plans of this research. Refer to the Methods section for further details.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.