Patient-reported outcome measures in community-acquired pneumonia: a systematic review of application and content validity
  1. Melanie Lloyd (1, 2)
  2. Emily Callander (3)
  3. Amalia Karahalios (4)
  4. Lucy Desmond (5)
  5. Harin Karunajeewa (1, 5)

  1. Melbourne Medical School, Western Precinct, University of Melbourne, Melbourne, Victoria, Australia
  2. Physiotherapy, Western Health, Melbourne, Victoria, Australia
  3. School of Medicine, Griffith University, Southport, Queensland, Australia
  4. Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Victoria, Australia
  5. General Internal Medicine, Western Health, Melbourne, Victoria, Australia

Correspondence to Dr Melanie Lloyd; melanie.lloyd{at}


Introduction Patient-reported outcome measures (PROMs) are a vital component of patient-centred care. Community-acquired pneumonia (CAP) is a significant contributor to morbidity, mortality and health service costs globally, but there is a lack of consensus regarding PROMs for this condition.

Methods We searched MEDLINE, EMBASE and Cochrane Collaboration for studies, both interventional and observational, of adult recovery from CAP that applied at least one validated PROM instrument and were published before 31 December 2017. The full text of included studies was examined and data collected on study design, PROM instruments applied, constructs examined and the demographic characteristics of the populations measured. For all CAP-specific PROM instruments identified, content validity was assessed using the COnsensus based Standards for selection of health Measurement INstruments guidelines (COSMIN).

Results Forty-two articles met the inclusion criteria and applied a total of 17 different PROM instruments including five (30%) classified as CAP specific, six (35%) as generic and six (35%) that measured functional performance or were specific to another disease. The 36-Item Short Form Survey (SF-36) was the most commonly used instrument (15 articles). Only one of 11 (9%) patient cohorts assessed using a CAP-specific instrument had a mean age ≥70 years. The CAP-Sym and CAP-BIQ questionnaires had sufficient content validity, though the quality of evidence for all CAP-specific instruments was rated as very low to low.

Discussion PROM instruments used to measure recovery from CAP are inconsistent in constructs measured and have frequently been developed and validated in highly selective patient samples that are not fully representative of the hospitalised CAP population. The overall content validity of all available CAP-specific instruments is unclear, particularly in the context of elderly hospitalised populations. Based on current evidence, generic health instruments are likely to be of greater value for measuring recovery from CAP in this group.

  • patient-reported outcome measure
  • content validity
  • systematic review
  • community-acquired pneumonia

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Key messages

  • What patient-reported outcome measures (PROMs) have been used in studies of community-acquired pneumonia (CAP), what constructs do they measure and do they have adequate content validity?

  • PROMs are a vital valuation tool in the current era of unsustainable growth in health costs; however, there is little consistency in the instruments used and constructs measured in studies of CAP, and no CAP-specific instrument has high-quality evidence to support its content validity.

  • The limited evidence supporting existing instruments developed to measure patient-reported recovery from CAP must be addressed if we are to reduce the considerable health burden associated with this disease.

Introduction


Patient-reported outcome measures (PROMs) are critical endpoints for assessing the effectiveness of patient care. Their purpose is to provide a more sensitive and meaningful evaluation of the patient illness experience,1 thereby identifying and quantifying changes in health.2 This defining characteristic is of crucial importance as it offers a departure from the system or process metrics that have historically been relied on for performance measurement.3 PROMs are therefore integral to determining value in the allocation of health resources as they measure the actual health outcomes produced.4 Their role is particularly relevant for conditions that are generating large and increasing health service costs, as PROMs help quantify the health outcomes achieved relative to monetary investment. The momentum surrounding PROM usage has grown in recent years with the establishment of international collaborations for standardisation of outcome measurement5 6 and increasing integration of PROMs into routine clinical care within national health systems.7

Community-acquired pneumonia (CAP) is a common and complex disease with high mortality, morbidity and health service expenditure.8 9 The incidence of CAP is strongly linked to ageing, and the costs attributable to this condition will increase into the future as elderly populations grow in most high-income countries.10 In Australia, over 40% of the half-million annual hospital bed days attributable to adult CAP occur in those aged over 80 years,11 a trend which is reflected across the USA and Europe.12 13 The potential of PROMs to drive innovation and improve patient-centred care should be realised for this high burden condition14; however, there is ongoing debate around which instrument should be used, the relevance of alternative constructs measured and methods of application.15

Measurement of outcomes in CAP is complicated by the complexity of the illness. By manifesting in elderly populations, CAP often occurs in the context of multiple overlapping comorbidities and multisystem complications are common.16 17 While recent progress has been made to define patient-reported end-points for registrational drug trials in CAP,18 19 consideration must be given to the demographics of the sample used in psychometric evaluation of PROM instruments to ensure results are generalisable to the target population. Additionally, disease-specific tools may be of less use in pragmatic studies and routine clinical practice due to complex interactions between conditions, and this may be reflected in the decision of many CAP researchers to apply alternative generic measures of patient-reported health status. A review of PROM instruments for CAP is timely to determine what constructs have been measured in studies of recovery to date and the generalisability and relative quality of different instruments. Without access to suitably relevant and robust measurement tools, identifying optimal patient-centred management strategies for this significant and common illness is difficult.20

The objectives of this systematic review are to identify, appraise and synthesise the available literature regarding the use of psychometrically validated PROMs in studies of recovery from adult CAP. Specifically, this review aims to determine which PROM instruments have been used in evaluation studies relating to this illness, the constructs measured and the settings and characteristics of CAP populations where each instrument has been applied. The secondary aim is to determine, for CAP-specific instruments identified, the settings and characteristics of populations in which these PROMs have been developed and psychometrically tested and their content validity using an established evaluation framework.21


Methods

Protocol and registration

This systematic review was prospectively registered on the International Prospective Register of Systematic Reviews (PROSPERO; CRD42018099739). It was completed under the guidance of the COnsensus based Standards for selection of health Measurement INstruments (COSMIN) guidelines for systematic reviews of PROMs,22 and the manuscript was prepared in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist.23

Patient and public involvement

This research was done without patient involvement. Patients were not invited to participate in study design and interpretation of results or asked to contribute to the writing or editing of this paper for readability and accuracy.

Eligibility criteria

Eligibility of articles

The eligibility criteria for inclusion of an article in the review were based on published recommendations.21 23 Articles were included if they reported on a study that: (1) was conducted in an adult (≥18 years) population with a diagnosis of CAP and (2) included an evaluative application of at least one validated PROM that satisfied the PROM eligibility criteria listed below. Evaluative application refers to the measurement of outcomes after the commencement of treatment of a particular disease.24 Articles reporting on the development or validation of PROMs in the target population were also included. Exclusion criteria were as follows: (1) case studies or papers published in publications that are not peer reviewed, (2) conference proceedings, (3) articles with no available English language version, (4) articles exclusively investigating hospital-associated, ventilator-associated or aspiration pneumonias, (5) articles without independent reporting of CAP cohort data if multiple disease processes are examined together and (6) articles with diagnostic, prognostic and prophylactic objectives, where outcomes are not measured after the commencement of treatment for CAP.
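The article-level criteria can be read as a simple screening rule. The following is a minimal sketch only, not the screening tool used in the review: the `Article` record and all of its field names are hypothetical, and exclusion criterion (6) is assumed to be captured by the `evaluative_prom` flag being false.

```python
from dataclasses import dataclass, replace


@dataclass
class Article:
    """Hypothetical screening record; field names are illustrative only."""
    min_age: int                   # youngest age eligible for the study population
    cap_diagnosis: bool            # participants have a diagnosis of CAP
    evaluative_prom: bool          # >= 1 validated PROM measured after treatment began
    prom_validation_study: bool    # reports PROM development or validation in CAP
    peer_reviewed: bool
    conference_proceeding: bool
    english_available: bool
    only_non_cap_pneumonia: bool   # exclusively hospital/ventilator-associated or aspiration
    cap_reported_separately: bool  # CAP cohort data reported independently
    case_study: bool = False


def is_eligible(a: Article) -> bool:
    """Return True if the record passes the inclusion and exclusion criteria."""
    included = (
        a.min_age >= 18
        and a.cap_diagnosis
        and (a.evaluative_prom or a.prom_validation_study)
    )
    excluded = (
        a.case_study
        or not a.peer_reviewed
        or a.conference_proceeding
        or not a.english_available
        or a.only_non_cap_pneumonia
        or not a.cap_reported_separately
    )
    return included and not excluded


# Hypothetical example records, not articles from the review.
example = Article(
    min_age=18, cap_diagnosis=True, evaluative_prom=True,
    prom_validation_study=False, peer_reviewed=True,
    conference_proceeding=False, english_available=True,
    only_non_cap_pneumonia=False, cap_reported_separately=True,
)
proceeding = replace(example, conference_proceeding=True)
```

Under these assumptions `is_eligible(example)` is true and `is_eligible(proceeding)` is false. In practice, screening was performed by two authors reading titles, abstracts and full texts, so a rule like this could at most record their decisions.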

Eligibility of PROM instruments

PROM instruments were eligible for inclusion in the review if they measured a construct related to: (1) symptoms; (2) general quality of life or (3) function, disability or mobility. PROMs are defined as ‘a direct subjective assessment by the patient of elements of their health’25 and both disease-specific and generic instruments were included provided one of the above constructs was measured. The PROM in question must have undergone at least partial formal assessment of its measurement properties,22 and its use must have been adequately described in the literature to enable it to be reproduced in subsequent studies. Articles that did not use PROMs satisfying these preliminary conditions were therefore excluded from the systematic review. The elements of recovery from acute illness from the perspective of the patient should be considered separately from patient-reported satisfaction measures, which focus on the humanity of care rather than the outcome of treatment.4 Instruments that measured patient-reported satisfaction with care alone were therefore excluded. For the secondary objective, a PROM was defined as CAP specific if the instrument was developed explicitly for use in individuals with a diagnosis of CAP.

Study identification and selection

Two databases (MEDLINE and EMBASE) were searched from their inception until 31 December 2017. Additionally, the Cochrane Library of Clinical Reviews was searched for any systematic reviews listed under the keyword ‘pneumonia’ that provided a relevant source of reference studies. The search strategy for identification of relevant publications was developed in conjunction with a biomedical librarian and is included in the online supplementary material 1.

A three-step search process was used to identify relevant articles. First, duplicates were removed and then two authors (ML and LD) independently reviewed the title and abstracts of the articles, excluding studies that did not meet the inclusion criteria. Next, full-text versions of the remaining articles were obtained and reviewed to confirm eligibility. Any discrepancy was resolved through discussion with a third author (HK). Care was taken to identify situations where multiple reports were compiled from a single study to avoid repeated reporting of the same data. Where multiple articles reported data from the same participant group, we refer to the ‘cohort’ and only include that participant data once in the analysis. A list of included articles and associated cohorts is included in the online supplementary material 2. Finally, reference lists of included articles and any literature reviews or summary papers identified in the full-text screen were hand searched to identify potentially relevant articles that might have been missed by the search strategy (online supplementary material 3). A list of excluded full-text studies, with reasons for exclusion, is provided in the online supplementary material 4.
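The distinction between articles and cohorts can be illustrated with a short sketch. The article and cohort identifiers below are hypothetical placeholders, not data from the review; the point is only that several articles may map to one cohort, and that each cohort is counted once.

```python
from collections import defaultdict

# Hypothetical (article_id, cohort_id) pairs; not data from the review.
articles = [
    ("article_A", "cohort_1"),
    ("article_B", "cohort_1"),  # second report on the same participant group
    ("article_C", "cohort_2"),
]


def group_by_cohort(pairs):
    """Map each cohort to the list of articles that report on it."""
    cohorts = defaultdict(list)
    for article_id, cohort_id in pairs:
        cohorts[cohort_id].append(article_id)
    return dict(cohorts)


cohorts = group_by_cohort(articles)
# Participant data are analysed per cohort, so counts differ by unit:
n_articles = len(articles)  # 3 articles
n_cohorts = len(cohorts)    # 2 cohorts
```

This is why results in the review are reported with both units, for example an instrument applied in a given number of articles drawn from a smaller number of cohorts.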

Data collection

Data were extracted through the use of customised electronic forms. The following study characteristics were recorded for all articles meeting the inclusion criteria: participant demographics (ie, number, age, sex and other demographics if described), disease severity, conditions other than CAP included in study population (eg, heart failure, influenza, chronic obstructive pulmonary disease; note that CAP data must be presented separately), setting (hospitalised/managed in outpatient setting/both/other), geographical location of study (country), study design, study date and duration, name of PROM(s) utilised, citation of development and validation studies and timing of outcome measurement. In addition to the above, the following data were extracted from studies validating CAP-specific PROM instruments: mode of administration and recall period.

The following information was obtained for all included PROM instruments via references given in the included articles: target population, concepts measured, number of domains and items, scales and scoring and language. This was to inform the understanding of the different patient-reported constructs that have been measured in studies of CAP.

Measurement framework for assessment of content validity of CAP-specific PROM instruments

CAP-specific PROMs identified through the inclusion criteria (above) were subject to a formal assessment of content validity in accordance with the COSMIN guidelines.22 This framework has been established to promote consistency in the assessment and reporting of measurement properties and has been published in conjunction with comprehensive guidance.26 Under the COSMIN framework, each development and validation study for a particular PROM instrument is assessed individually for methodological quality, and findings are pooled to determine the sufficiency of the measurement property. The COSMIN framework separately examines PROMs development studies (concept elicitation, cognitive interview and pilot studies involving participants from the target population) from validity studies (where both patients and professionals may be involved in testing).

As relevancy, comprehensiveness and comprehensibility are key elements for patient-reported instruments, content validity is considered the most important measurement property and is assessed first.24 The quality of the evidence supporting each measurement property is also assessed separately, using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) criteria.27 Two authors (ML and LD) independently completed the quality scoring framework (outlined below) for CAP-specific PROMs identified through the search process. Where disagreement was identified between scores, a third author (EC or HK) reviewed the assessments and provided consensus.
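The dual-rating step described above can be sketched as follows. This is an illustrative assumption about how such ratings might be recorded, not the procedure or software used by the authors: the function name `reconcile`, the rating labels and the example values are all hypothetical.

```python
def reconcile(rater_a, rater_b, adjudicate):
    """Merge two reviewers' ratings.

    Items on which the reviewers agree keep the shared rating; on
    disagreement, adjudicate(item, a, b) stands in for the third
    author's consensus decision.
    """
    if rater_a.keys() != rater_b.keys():
        raise ValueError("both reviewers must rate the same items")
    final = {}
    for item, a in rater_a.items():
        b = rater_b[item]
        final[item] = a if a == b else adjudicate(item, a, b)
    return final


# Hypothetical ratings for one instrument; not results from the review.
reviewer_1 = {"relevance": "sufficient", "comprehensiveness": "insufficient"}
reviewer_2 = {"relevance": "sufficient", "comprehensiveness": "inconsistent"}

disagreements = []


def third_author(item, a, b):
    """Record the disagreement and return a placeholder consensus rating."""
    disagreements.append(item)
    return "inconsistent"


final = reconcile(reviewer_1, reviewer_2, third_author)
```

Keeping the adjudication as a separate callback makes the disagreements explicit, which is useful if inter-rater agreement is to be reported.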


Results

PROM instruments used in evaluation studies of CAP populations

Forty-two articles were included in this review, and these articles reported results from 17 different PROM instruments meeting the inclusion criteria (figure 1). The most commonly used instrument was the 36-Item Short Form Survey (SF-36)28 (15 articles, 11 cohorts), although a variety of generic and function-specific instruments were applied including instruments that were developed for other respiratory conditions (table 1). Five validated CAP-specific instruments were identified.19 29–32

Figure 1

Preferred Reporting Items for Systematic Review and Meta-Analysis flow chart. *More than one reason for exclusion may apply to each article.

Table 1

PROM instruments used in studies of recovery in CAP populations

Characteristics of CAP populations examined using PROMs

The CAP-specific instruments have been applied in relatively young populations; only one of the 11 (9%) cohorts reported a mean age of ≥70 years (table 1). In contrast, each of the four functional status instruments has been applied in at least one CAP cohort with a mean age ≥70 years. PROMs had predominantly been applied in majority hospitalised cohorts, and less than half of the study cohorts were measured beyond 1 month of diagnosis.

Constructs measured

All of the CAP-specific instruments had a primary focus on patient-reported symptoms, although two instruments (CAP-BIQ and CAP Score) also included items relating to general well-being, or psychosocial impacts and physical functioning (table 2). The generic instruments examined constructs relating to performance of activities of daily living and overall health-related quality of life. The relative length of the different questionnaires varied greatly.

Table 2

Design characteristics of PROM instruments used in studies of recovery from CAP

Development and content validity of CAP-specific PROM instruments

Development and validation populations

The CAP-Sym was developed for the most inclusive population (any disease severity, any age >18 years) (table 3), although a relatively young cohort was used in the concept elicitation study (mean age 52 years).29 The CAP-BIQ target population was aged over 50 years, and the concept elicitation cohort was recruited primarily (60%) from the outpatient setting.19 The CAP Score underwent psychometric evaluation in a hospitalised cohort (mean age 56 years),30 while the Metlay symptom scores were developed for use in low-risk outpatients only.31 32

Table 3

Settings and populations used in content validity evaluation of CAP-specific PROM instruments

Development and validation methodologies and quality of evidence

In accordance with the COSMIN framework, both the CAP Score and Metlay Score 1 received an inadequate rating for concept elicitation as the target population were not involved in the selection of some or all items, respectively (table 4). While patients were involved in concept elicitation studies for the CAP-Sym and Metlay Score 2, these were of doubtful quality owing to lack of detail provided for the qualitative methodologies used and the questionable representativeness of the study participants. Only the CAP-BIQ concept elicitation study was judged as adequate. All CAP-specific instruments were given an inadequate rating for overall quality of the PROM development study, except for the CAP-BIQ which was rated as doubtful quality. A cognitive interview or other pilot test was conducted for the CAP Score and CAP-BIQ, though both were assessed to be of doubtful quality (online supplementary material 5).

Table 4

Instrument development and concept elicitation study quality ratings for CAP-specific PROM instruments

While formal validation studies were conducted with patients for all instruments, these were, in all cases, inadequate for determining the content validity of the instrument. Using the GRADE criteria,27 the quality of evidence for measurement of content validity was judged to be of very low quality for the CAP Score and Metlay Scores 1 and 2 (table 5). For these instruments, under the COSMIN framework, the ratings for relevance, comprehensiveness and comprehensibility are therefore based entirely on the opinion of the authors of this review. The marginally higher ratings for evidence quality for the CAP-Sym and CAP-BIQ are attributable to completion of a content validity study in professionals (CAP-Sym) and concept elicitation or pilot study of adequate quality (CAP-BIQ).

Table 5

Results of overall content validity evaluation of CAP-specific PROM instruments

Content validity

The items contained in all instruments were found to be relevant to the construct of interest for this review: symptoms, physical function and quality of life of adults recovering from CAP. The inconsistent rating for relevance given to both Metlay Scores is attributable to a lack of clear description of the construct to be measured by the instrument. Both the CAP-Sym and CAP-BIQ were found to be sufficiently comprehensive, while comprehensiveness of the other instruments was judged by the reviewers to be insufficient or inconsistent. The CAP-BIQ, CAP Score and CAP-Sym were found to have sufficient comprehensibility, though for the CAP-Sym this is based on the opinion of the reviewers, not patients themselves. The comprehensibility of the Metlay Scores could not be assessed due to the unavailability of sample data collection instruments.

Discussion


This study is the first, to the authors’ knowledge, to systematically review and document the availability, quality and use of PROM instruments in CAP. The significance of this illness, in terms of both morbidity and health service cost, lends a degree of urgency to the establishment of a robust outcome framework for understanding both the health gains produced and the value associated with care. Consensus on the outcomes of most importance in CAP is lacking, and ongoing debate has failed to resolve the question of the appropriateness of various clinical and patient-centred measures.33 34 PROMs are often overlooked in favour of objective clinical measures such as mortality or time to clinical stability, which do not necessarily reflect the outcomes of most value to the patient.4 Recent work relating to development of standardised outcome frameworks for antimicrobial drug trials in CAP, including design of the CAP-BIQ, is certainly a move in the right direction but is yet to be completed or extrapolated to settings outside pharmaceutical registrational trials.18

The results of this review have identified that the patient-reported constructs measured in studies of recovery from CAP to date are inconsistent, and the quality of evidence to support the CAP-specific instruments applied is suboptimal. The relatively recent CAP-BIQ concept elicitation study found that CAP survivors reported a wide range of symptoms and problems, including a significant need for caregiver assistance during recovery.19 The latter construct was not measured by any of the other CAP-specific instruments but may be captured by alternative measures such as the Barthel Index or generic quality of life instruments. In the absence of an adequately developed and validated CAP-specific instrument, clinicians have relied on these generic instruments, or those designed for other respiratory illnesses. The consequence of inconsistency in the constructs measured and instruments applied is that the value of new interventions, clinical tools and models of care to patients themselves cannot be determined or compared.

Despite the existence of five validated CAP-specific instruments, none were supported by high-quality evidence of their content validity. Four of the instruments were developed more than a decade ago, prior to establishment of advanced methodologies for concept elicitation and instrument evaluation, and none have been further validated since their development.29–32 Only the newer CAP-BIQ questionnaire underwent adequate instrument development methodologies, but has yet to undergo (to the authors’ knowledge) formal evaluation of content validity.19 Additionally, other important measurement properties, such as structural and cross-cultural validity, responsiveness, reliability and measurement error,22 have not been evaluated for this instrument.

The other key issue with the existing CAP-specific instruments relates to the fundamental disconnect between the population used in development and testing and the population generating the bulk of the CAP health burden—those of advanced age and with complex multimorbidity.8 9 None of the CAP-specific instruments identified in the review were explicitly designed for use in an elderly inpatient cohort. Each instrument has been applied in only one validation study, all of which were conducted in populations with a mean age under 65 years. The only instrument that has been designed specifically for hospitalised populations (CAP Score) did not include patients in concept elicitation. Older adults report vastly different patterns of symptoms35 36 and functional disability37 than younger adults and therefore are likely to find measurement of alternative constructs of greater relevance. Population heterogeneity makes striking a balance between relevance and comprehensiveness difficult, due to the multisystem nature of the CAP illness, broad spectrum of symptoms reported, complexity of underlying comorbid disease and the range of age groups afflicted. For these reasons, the validity of existing CAP-specific instruments tested in younger patient groups cannot be generalised to the elderly population cohort generating the highest CAP-associated costs, where the role of PROMs is arguably most important.

The primary limitation of this study was that only CAP-specific instruments were assessed for their content validity, and the relative quality of the generic instruments, despite their frequency of application, was therefore not assessed. Disease-specific instruments that have been designed explicitly to measure recovery from CAP should be expected to have increased sensitivity to detect changes in health for sufferers of this illness when compared with generic instruments. This relies, of course, on the CAP-specific instruments having undergone appropriate design and validation methodologies in a sample that is representative of the general CAP population. Our review found that (1) none of the CAP-specific instruments have been adequately designed and validated and (2) the patient samples used in concept elicitation and validation do not represent the hospitalised CAP population. For these reasons, and on the basis of existing evidence, well-designed generic instruments such as the SF-36 and EuroQol-five dimension (EQ-5D) are likely to be more reliable and valid for use in elderly hospitalised CAP populations.38–40 Self-reported activities of daily living performance instruments, such as the Barthel and Katz Indexes, are another alternative, although prior studies are critical of these, with concerns raised regarding the impact of ceiling effects41 and changing gender roles.42

There were several other limitations to this study. First, a focus on recovery from CAP means that instruments used in prognostic, prophylactic or diagnostic applications were not considered. Second, only English language studies were examined, meaning that good quality instruments that have been developed and validated in other languages might exist. Finally, application of the COSMIN assessment framework relies on detailed documentation of methodologies used in PROM development studies. For each of the CAP-specific instruments, the level of methodological detail provided was suboptimal, which made assessment of overall PROM quality difficult.

In summary, there is a lack of consistency and consensus regarding which constructs are important to measure when evaluating recovery from CAP from the patient’s perspective. This may be due to the complexity of the underlying population and the degree to which different problems are important to various subgroups. Additionally, the overall content validity of all available CAP-specific instruments is unclear, particularly in the context of elderly hospitalised populations who constitute a significant proportion of the overall healthcare burden from this condition. Based on current evidence, the generic instruments are likely to be of greater value in situations where representative elderly CAP populations are undergoing measurement.

Acknowledgements


The authors would like to acknowledge the Western Health librarian, Evelyn Hutcheon, for her help with developing the review search strategy. The authors thank Professor Edward Janus for his ongoing support of the Western Health CAP research program.



  • Contributors ML conceived the study, coordinated data collection and analysis and drafted the manuscript. LD, EC, AK and HK contributed to the study design, data collection and analysis and reviewed the manuscript critically for intellectually important information.

  • Funding ML was supported by an Australian Government Research Training Scheme Scholarship at the University of Melbourne.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement All data relevant to the study are included in the article or uploaded as supplementary information.
