Quality criteria were proposed for measurement properties of health status questionnaires

doi:10.1016/j.jclinepi.2006.03.012

Journal of Clinical Epidemiology

Volume 60, Issue 1, January 2007, Pages 34-42

https://doi.org/10.1016/j.jclinepi.2006.03.012

Abstract

Objectives

Recently, an increasing number of systematic reviews have been published in which the measurement properties of health status questionnaires are compared. For a meaningful comparison, quality criteria for measurement properties are needed. Our aim was to develop quality criteria for design, methods, and outcomes of studies on the development and evaluation of health status questionnaires.

Study Design and Setting

Quality criteria for content validity, internal consistency, criterion validity, construct validity, reproducibility, longitudinal validity, responsiveness, floor and ceiling effects, and interpretability were derived from existing guidelines and consensus within our research group.

Results

For each measurement property a criterion was defined for a positive, negative, or indeterminate rating, depending on the design, methods, and outcomes of the validation study.

Conclusion

Our criteria make a substantial contribution toward defining explicit quality criteria for measurement properties of health status questionnaires. Our criteria can be used in systematic reviews of health status questionnaires, to detect shortcomings and gaps in knowledge of measurement properties, and to design validation studies. The future challenge will be to refine and complete the criteria and to reach broad consensus, especially on quality criteria for good measurement properties.

Introduction

The number of available health status questionnaires has increased dramatically over the past decades. Consequently, choosing which questionnaire to use is becoming a major difficulty. Recently, a large number of systematic reviews have been published of available questionnaires measuring a specific concept in a specific population, for example [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11]. These systematic reviews typically compare the content and measurement properties of the available questionnaires. In analogy to systematic reviews of clinical trials, criteria are needed to determine the methodological quality of studies on the development and evaluation of health status questionnaires. In addition, criteria for good measurement properties are needed to justify the choice of the best questionnaire.

Several articles offer criteria for the evaluation of questionnaires. Probably the best-known and most comprehensive criteria are those from the Scientific Advisory Committee (SAC) of the Medical Outcomes Trust [12]. The SAC defined eight attributes of instrument properties that warrant consideration in evaluation. These include (1) conceptual and measurement model, (2) validity, (3) reliability, (4) responsiveness, (5) interpretability, (6) respondent and administrative burden, (7) alternative forms, and (8) cultural and language adaptations (translations). Within each of these attributes, specific criteria were defined by which instruments should be reviewed. Similar criteria have been defined, e.g., by Bombardier and Tugwell [13], Andresen [14], and McDowell and Jenkinson [15]. What is often lacking in these criteria, however, are explicit criteria for what constitutes good measurement properties. For example, for the assessment of validity it is often recommended that hypotheses about expected results should be tested, but no criteria have been defined for how many hypotheses should be confirmed to justify that a questionnaire has good validity. No criteria have been defined for what constitutes good agreement (acceptable measurement error), good responsiveness, or good interpretability, and no criteria have been defined for the required sample size of studies assessing measurement properties.

As suggested by the SAC [12], we took on the challenge to further discuss and refine the available quality criteria for studies on the development and evaluation of health status questionnaires, including explicit criteria for the following measurement properties: (1) content validity, (2) internal consistency, (3) criterion validity, (4) construct validity, (5) reproducibility, (6) responsiveness, (7) floor and ceiling effects, and (8) interpretability. We used our criteria in two systematic reviews comparing the measurement properties of questionnaires for shoulder disability [1] and for visual functioning [4], and revised them based on our experiences in these reviews. Our criteria can also be used to detect shortcomings and gaps in knowledge of measurement properties, and to design validation studies.

In this article we define our quality criteria for measurement properties, discuss the difficult and sometimes arbitrary choices we made, and indicate future challenges. We emphasize that, just like the criteria offered by the SAC and others, our criteria are open to further discussion and refinement. Our aim is to contribute to the development of explicit quality criteria for the design, methods, and outcomes of studies on the development and evaluation of health status questionnaires.

Section snippets

Content validity

Content validity examines the extent to which the concepts of interest are comprehensively represented by the items in the questionnaire [16]. To be able to rate the quality of a questionnaire, authors should provide a clear description of the following aspects regarding the development of a questionnaire:

– Measurement aim of the questionnaire, i.e., discriminative, evaluative, or predictive [17]. The measurement aim is important, because different items may be valid for different aims. For

Internal consistency

Internal consistency is a measure of the extent to which items in a questionnaire (sub)scale are correlated (homogeneous), thus measuring the same concept. Internal consistency is an important measurement property for questionnaires that intend to measure a single underlying concept (construct) by using multiple items. In contrast, for questionnaires in which the items are merely different aspects of a complex clinical phenomenon that do not have to be correlated, such as in the Apgar Scale [20]

Criterion validity

Criterion validity refers to the extent to which scores on a particular instrument relate to a gold standard. We give a positive rating for criterion validity if convincing arguments are presented that the used standard really is “gold” and if the correlation with the gold standard is at least 0.70.
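The rule stated above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the boolean flag `gold_is_justified` (standing in for the reviewer's judgment that the standard really is "gold"), and the choice of the Pearson coefficient as "the correlation" are ours, not taken from the article.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


def positive_criterion_rating(questionnaire_scores, gold_scores,
                              gold_is_justified, threshold=0.70):
    """True only if both conditions for a positive rating hold.

    gold_is_justified is a hypothetical flag: it records whether convincing
    arguments were presented that the comparison standard is "gold".
    """
    r = pearson_r(questionnaire_scores, gold_scores)
    return gold_is_justified and r >= threshold
```

The function deliberately returns only whether a positive rating applies; how to separate negative from indeterminate ratings is not specified in the passage above, so the sketch does not attempt it.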

Construct validity

Construct validity refers to the extent to which scores on a particular instrument relate to other measures in a manner that is consistent with theoretically derived hypotheses concerning the concepts that are being measured [17], [19]. Construct validity should be assessed by testing predefined hypotheses (e.g., about expected correlations between measures or expected differences in scores between “known” groups). These hypotheses need to be as specific as possible. Without specific

Reproducibility

Reproducibility concerns the degree to which repeated measurements in stable persons (test–retest) provide similar answers. We believe that it is important to make a distinction between reliability and agreement [29], [30]. Agreement concerns the absolute measurement error, i.e., how close the scores on repeated measures are, expressed in the unit of the measurement scale at issue. Small measurement error is required for evaluative purposes in which one wants to distinguish clinically important

Responsiveness

Responsiveness has been defined as the ability of a questionnaire to detect clinically important changes over time, even if these changes are small [37]. A large number of definitions and methods were proposed for assessing responsiveness [38]. We consider responsiveness to be a measure of longitudinal validity. In analogy to construct validity, longitudinal validity should be assessed by testing predefined hypotheses, e.g., about expected correlations between changes in measures, or expected

Floor or ceiling effects

Floor or ceiling effects are considered to be present if more than 15% of respondents achieved the lowest or highest possible score, respectively [41]. If floor or ceiling effects are present, it is likely that extreme items are missing in the lower or upper end of the scale, indicating limited content validity. As a consequence, patients with the lowest or highest possible score cannot be distinguished from each other; thus, reliability is reduced. Furthermore, the responsiveness is limited
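The 15% rule for floor and ceiling effects is simple to check on a set of total scores. The following Python sketch is illustrative only; the function name, its parameters (`min_score`, `max_score`, `threshold`), and the dictionary it returns are assumptions introduced here, not part of the article.

```python
def floor_ceiling_effects(scores, min_score, max_score, threshold=0.15):
    """Report the share of respondents at the scale extremes.

    An effect is flagged when MORE than `threshold` (15% by default) of
    respondents have the lowest (floor) or highest (ceiling) possible score.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_share = sum(1 for s in scores if s == min_score) / n
    ceiling_share = sum(1 for s in scores if s == max_score) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }
```

The comparison is strict, so a sample in which exactly 15% of respondents sit at an extreme is not flagged, matching the "more than 15%" wording.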

Interpretability

Interpretability is defined as the degree to which one can assign qualitative meaning to quantitative scores [42]. Investigators should provide information about what (change in) score would be clinically meaningful. Various types of information can aid in interpreting scores on a questionnaire: (1) means and SD of scores of (subgroups of) a reference population (norm values); (2) means and SD of scores of relevant subgroups of patients who are expected to differ in scores (e.g., groups with

Population-specific ratings of measurement properties

A summary of the criteria for measurement properties of health status questionnaires is presented in Table 1. Each property is rated as positive, negative, or indeterminate, depending on the design, methods, and outcomes of the study. Measurement properties differ between populations and settings. Therefore, the evaluation of all measurement properties needs to be conducted in a population and setting that is representative of the population and setting in which the questionnaire is going to

Overview table

In the final comparison of the measurement properties of different questionnaires, one has to consider all ratings together when choosing between different questionnaires. We recommend composing a table that provides an overview of all ratings, such as the example given in Table 2. In Table 2 the results are presented from our systematic review of all questionnaires measuring disability in patients with shoulder complaints (because there is no gold standard for disability, criterion validity

Discussion

We developed quality criteria for the design, methods, and outcomes of studies on the development and evaluation of health status questionnaires. Nine measurement properties were distinguished: content validity, internal consistency, criterion validity, construct validity, reproducibility, longitudinal validity, responsiveness, floor and ceiling effects, and interpretability.

Our criteria are mostly opinion based because there is no empirical evidence in this field to support explicit quality

Future challenges

One might argue that our criteria are not discriminative enough to distinguish between good and very high-quality questionnaires. This would be important when many high-quality questionnaires are available, but in our experience, within the field of health status and health-related quality of life measurement, this is not (yet) the case. Therefore, we believe that our criteria work well to separate the wheat from the chaff. The next step would be to further refine and complete the criteria,

References (54)

K.S. Dziedzic et al. A systematic search and critical review of measures of disability for use in a population survey of hand osteoarthritis (OA). Osteoarthritis Cartilage (2005)
E.M. Andresen. Criteria for assessing the tools of disability outcomes research. Arch Phys Med Rehabil (2000)
M.R. de Boer et al. Changes to the subscales of two vision-related quality of life questionnaires are proposed. J Clin Epidemiol (2005)
G.H. Guyatt et al. Responsiveness and validity in health status measurement: a clarification. J Clin Epidemiol (1989)
G. Guyatt et al. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis (1987)
R.A. Deyo et al. Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance. J Chronic Dis (1986)
K.N. Lohr et al. Evaluating quality of life and health status instruments: development of scientific review criteria. Clin Ther (1996)
R. Jaeschke et al. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials (1989)
R.D. Crosby et al. Defining clinically meaningful change in health-related quality of life. J Clin Epidemiol (2003)
A.R. Jadad et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials (1996)
D.E. Beaton et al. Evaluating changes in health status: reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol (1997)
G. Stucki et al. Relative responsiveness of condition-specific and generic health status measures in degenerative lumbar spinal stenosis. J Clin Epidemiol (1995)
J.G. Wright et al. A comparison of different indices of responsiveness. J Clin Epidemiol (1997)
S.D. Bot et al. Clinimetric evaluation of shoulder disability questionnaires: a systematic review of the literature. Ann Rheum Dis (2004)
E.C. Jorstad et al. Measuring the psychological outcomes of falling: a systematic review. J Am Geriatr Soc (2005)
G. Daker-White. Reliable and valid self-report outcome measures in sexual (dys)function: a systematic review. Arch Sex Behav (2002)
M.R. de Boer et al. Psychometric properties of vision-related quality of life questionnaires: a systematic review. Ophthalmic Physiol Opt (2004)
B. Edwards et al. Quality of life instruments for caregivers of patients with cancer. Cancer Nurs (2002)
A.M. Garratt et al. Patient-assessed health instrument for the knee: a structured review. Rheumatology (2004)
P. Hallin et al. Spinal cord injury and quality of life measures: a review of instrument psychometric quality. Spinal Cord (2000)
K.L. Haywood et al. Quality of life in older people: a structured review of generic self-assessed health instruments. Qual Life Res (2005)
K.L. Haywood et al. Patient-assessed health in ankylosing spondylitis: a structured review. Rheumatology (Oxford) (2005)
T.P. Ettema et al. A review of quality of life instruments used in dementia. Qual Life Res (2005)
Scientific Advisory Committee of the Medical Outcomes Trust. Assessing health status and quality-of-life instruments: attributes and review criteria. Qual Life Res (2002)
C. Bombardier et al. Methodological considerations in functional assessment. J Rheumatol (1987)
I. McDowell et al. Development standards for health measures. J Health Serv Res Policy (1996)
G.H. Guyatt et al. Measuring health related quality of life. Ann Intern Med (1993)

