Original Article
Quality criteria were proposed for measurement properties of health status questionnaires
Introduction
The number of available health status questionnaires has increased dramatically over the past decades. Consequently, the choice of which questionnaire to use is becoming a major difficulty. Recently, a large number of systematic reviews of available questionnaires measuring a specific concept in a specific population have been published, for example [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11]. In these systematic reviews, typically, the content and measurement properties of the available questionnaires are compared. In analogy to systematic reviews of clinical trials, criteria are needed to determine the methodological quality of studies on the development and evaluation of health status questionnaires. In addition, criteria for good measurement properties are needed to justify which questionnaire is best.
Several articles offer criteria for the evaluation of questionnaires. Probably the best-known and most comprehensive criteria are those from the Scientific Advisory Committee (SAC) of the Medical Outcomes Trust [12]. The SAC defined eight attributes of instrument properties that warrant consideration in evaluation. These include (1) conceptual and measurement model, (2) validity, (3) reliability, (4) responsiveness, (5) interpretability, (6) respondent and administrative burden, (7) alternative forms, and (8) cultural and language adaptations (translations). Within each of these attributes, specific criteria were defined by which instruments should be reviewed. Similar criteria have been defined, e.g., by Bombardier and Tugwell [13], Andresen [14], and McDowell and Jenkinson [15]. What is often lacking in these criteria, however, are explicit criteria for what constitutes good measurement properties. For example, for the assessment of validity it is often recommended that hypotheses about expected results should be tested, but no criteria have been defined for how many hypotheses should be confirmed to justify that a questionnaire has good validity. No criteria have been defined for what constitutes good agreement (acceptable measurement error), good responsiveness, or good interpretability, and no criteria have been defined for the required sample size of studies assessing measurement properties.
As suggested by the SAC [12], we took on the challenge to further discuss and refine the available quality criteria for studies on the development and evaluation of health status questionnaires, including explicit criteria for the following measurement properties: (1) content validity, (2) internal consistency, (3) criterion validity, (4) construct validity, (5) reproducibility, (6) responsiveness, (7) floor and ceiling effects, and (8) interpretability. We used our criteria in two systematic reviews comparing the measurement properties of questionnaires for shoulder disability [1] and for visual functioning [4], and revised them based on our experiences in these reviews. Our criteria can also be used to detect shortcomings and gaps in knowledge of measurement properties, and to design validation studies.
In this article we define our quality criteria for measurement properties, discuss the difficult and sometimes arbitrary choices we made, and indicate future challenges. We emphasize that, just like the criteria offered by the SAC and others, our criteria are open to further discussion and refinement. Our aim is to contribute to the development of explicit quality criteria for the design, methods, and outcomes of studies on the development and evaluation of health status questionnaires.
Section snippets
Content validity
Content validity examines the extent to which the concepts of interest are comprehensively represented by the items in the questionnaire [16]. To be able to rate the quality of a questionnaire, authors should provide a clear description of the following aspects regarding the development of a questionnaire:
- Measurement aim of the questionnaire, i.e., discriminative, evaluative, or predictive [17]. The measurement aim is important, because different items may be valid for different aims. For
Internal consistency
Internal consistency is a measure of the extent to which items in a questionnaire (sub)scale are correlated (homogeneous), thus measuring the same concept. Internal consistency is an important measurement property for questionnaires that intend to measure a single underlying concept (construct) by using multiple items. In contrast, for questionnaires in which the items are merely different aspects of a complex clinical phenomenon that do not have to be correlated, such as in the Apgar Scale [20]
Criterion validity
Criterion validity refers to the extent to which scores on a particular instrument relate to a gold standard. We give a positive rating for criterion validity if convincing arguments are presented that the used standard really is “gold” and if the correlation with the gold standard is at least 0.70.
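The quantitative half of this criterion can be sketched in a few lines. The example below is only an illustration under stated assumptions: the text does not say which correlation coefficient is meant, so Pearson's r is assumed here, and the function names are hypothetical. The qualitative half (convincing arguments that the standard really is "gold") cannot be checked in code and remains a matter of judgment.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


def meets_correlation_criterion(questionnaire_scores, gold_standard_scores,
                                threshold=0.70):
    """True if the correlation with the gold standard is at least 0.70.

    Assumption: Pearson's r is used. This covers only the numeric part
    of the rating, not the argument that the standard is truly "gold".
    """
    return pearson_r(questionnaire_scores, gold_standard_scores) >= threshold
```

A rank-based coefficient could be substituted for ordinal scores; the 0.70 threshold from the text would be applied in the same way.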
Construct validity
Construct validity refers to the extent to which scores on a particular instrument relate to other measures in a manner that is consistent with theoretically derived hypotheses concerning the concepts that are being measured [17], [19]. Construct validity should be assessed by testing predefined hypotheses (e.g., about expected correlations between measures or expected differences in scores between “known” groups). These hypotheses need to be as specific as possible. Without specific
Reproducibility
Reproducibility concerns the degree to which repeated measurements in stable persons (test–retest) provide similar answers. We believe that it is important to make a distinction between reliability and agreement [29], [30]. Agreement concerns the absolute measurement error, i.e., how close the scores on repeated measures are, expressed in the unit of the measurement scale at issue. Small measurement error is required for evaluative purposes in which one wants to distinguish clinically important
Responsiveness
Responsiveness has been defined as the ability of a questionnaire to detect clinically important changes over time, even if these changes are small [37]. A large number of definitions and methods were proposed for assessing responsiveness [38]. We consider responsiveness to be a measure of longitudinal validity. In analogy to construct validity, longitudinal validity should be assessed by testing predefined hypotheses, e.g., about expected correlations between changes in measures, or expected
Floor or ceiling effects
Floor or ceiling effects are considered to be present if more than 15% of respondents achieved the lowest or highest possible score, respectively [41]. If floor or ceiling effects are present, it is likely that extreme items are missing in the lower or upper end of the scale, indicating limited content validity. As a consequence, patients with the lowest or highest possible score cannot be distinguished from each other, thus reliability is reduced. Furthermore, the responsiveness is limited
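The 15% rule stated at the start of this section can be illustrated with a short sketch. The function name, the returned keys, and the example score range are hypothetical; only the threshold (more than 15% of respondents at the lowest or highest possible score) comes from the text.

```python
def floor_ceiling_effects(scores, min_score, max_score, threshold=0.15):
    """Share of respondents at the scale extremes, flagged against 15%.

    min_score and max_score are the lowest and highest *possible* scores
    of the scale, not the lowest and highest observed scores.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_share = sum(1 for s in scores if s == min_score) / n
    ceiling_share = sum(1 for s in scores if s == max_score) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        # An effect is present only if MORE than 15% sit at the extreme.
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }
```

For instance, on a hypothetical 0 to 100 scale, a sample in which 2 of 10 respondents score 0 would be flagged for a floor effect, whereas 1 of 10 would not.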
Interpretability
Interpretability is defined as the degree to which one can assign qualitative meaning to quantitative scores [42]. Investigators should provide information about what (change in) score would be clinically meaningful. Various types of information can aid in interpreting scores on a questionnaire: (1) means and SD of scores of (subgroups of) a reference population (norm values); (2) means and SD of scores of relevant subgroups of patients who are expected to differ in scores (e.g., groups with
Population-specific ratings of measurement properties
A summary of the criteria for measurement properties of health status questionnaires is presented in Table 1. Each property is rated as positive, negative, or indeterminate, depending on the design, methods, and outcomes of the study. Measurement properties differ between populations and settings. Therefore, the evaluation of all measurement properties needs to be conducted in a population and setting that is representative for the population and setting in which the questionnaire is going to
Overview table
In the final comparison of the measurement properties of different questionnaires, one has to consider all ratings together when choosing between different questionnaires. We recommend composing a table that provides an overview of all ratings, such as the example given in Table 2. In Table 2 the results are presented from our systematic review of all questionnaires measuring disability in patients with shoulder complaints (because there is no gold standard for disability, criterion validity
Discussion
We developed quality criteria for the design, methods, and outcomes of studies on the development and evaluation of health status questionnaires. Nine measurement properties were distinguished: content validity, internal consistency, criterion validity, construct validity, reproducibility, longitudinal validity, responsiveness, floor and ceiling effects, and interpretability.
Our criteria are mostly opinion based because there is no empirical evidence in this field to support explicit quality
Future challenges
One might argue that our criteria are not discriminative enough to distinguish between good and very high-quality questionnaires. This would be important when many high-quality questionnaires are available, but in our experience, within the field of health status and health-related quality of life measurement, this is not (yet) the case. Therefore, we believe that our criteria work well to separate the wheat from the chaff. The next step would be to further refine and complete the criteria,
References (54)
- et al.
A systematic search and critical review of measures of disability for use in a population survey of hand osteoarthritis (OA)
Osteoarthritis Cartilage
(2005)
Criteria for assessing the tools of disability outcomes research
Arch Phys Med Rehabil
(2000)
- et al.
Changes to the subscales of two vision-related quality of life questionnaires are proposed
J Clin Epidemiol
(2005)
- et al.
Responsiveness and validity in health status measurement: a clarification
J Clin Epidemiol
(1989)
- et al.
Measuring change over time: assessing the usefulness of evaluative instruments
J Chronic Dis
(1987)
- et al.
Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance
J Chronic Dis
(1986)
- et al.
Evaluating quality of life and health status instruments: development of scientific review criteria
Clin Ther
(1996)
- et al.
Measurement of health status. Ascertaining the minimal clinically important difference
Control Clin Trials
(1989)
- et al.
Defining clinically meaningful change in health-related quality of life
J Clin Epidemiol
(2003)
- et al.
Assessing the quality of reports of randomized clinical trials: is blinding necessary?
Control Clin Trials
(1996)
Evaluating changes in health status: reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders
J Clin Epidemiol
Relative responsiveness of condition-specific and generic health status measures in degenerative lumbar spinal stenosis
J Clin Epidemiol
A comparison of different indices of responsiveness
J Clin Epidemiol
Clinimetric evaluation of shoulder disability questionnaires: a systematic review of the literature
Ann Rheum Dis
Measuring the psychological outcomes of falling: a systematic review
J Am Geriatr Soc
Reliable and valid self-report outcome measures in sexual (dys)function: a systematic review
Arch Sex Behav
Psychometric properties of vision-related quality of life questionnaires: a systematic review
Ophthalmic Physiol Opt
Quality of life instruments for caregivers of patients with cancer
Cancer Nurs
Patient-assessed health instrument for the knee: a structured review
Rheumatology
Spinal cord injury and quality of life measures: a review of instrument psychometric quality
Spinal Cord
Quality of life in older people: a structured review of generic self-assessed health instruments
Qual Life Res
Patient-assessed health in ankylosing spondylitis: a structured review
Rheumatology (Oxford)
A review of quality of life instruments used in dementia
Qual Life Res
Assessing health status and quality-of-life instruments: attributes and review criteria
Qual Life Res
Methodological considerations in functional assessment
J Rheumatol
Development standards for health measures
J Health Serv Res Policy
Measuring health related quality of life
Ann Intern Med