Validity and reliability of outcome measures to assess dysfunctional breathing: a systematic review
  1. Vikram Mohan1,
  2. Chandrasekar Rathinam2,3,
  3. Derick Yates3,
  4. Aatit Paungmali4 and
  5. Christopher Boos5,6
  1. 1Department of Rehabilitation and Sports Sciences, Faculty of Health and Social Sciences, Bournemouth University, Bournemouth, UK
  2. 2University of Birmingham, Birmingham, UK
  3. 3Birmingham Women's and Children's NHS Foundation Trust, Birmingham, UK
  4. 4Department of Physical Therapy, Faculty of Associated Medical Sciences, Chiang Mai University, Chiang Mai, Thailand
  5. 5Cardiology Department, University Hospitals Dorset NHS Foundation Trust, Poole, UK
  6. 6Faculty of Health and Social Sciences, Bournemouth University, Bournemouth, UK
  1. Correspondence to Chandrasekar Rathinam; c.rathinam{at}


Objective This study aimed to systematically review the psychometric properties of outcome measures that assess dysfunctional breathing (DB) in adults.

Methods Studies on developing and evaluating measurement properties to assess DB were included. The study investigated the empirical research published between 1990 and February 2022, with an updated search in May 2023 in the Cochrane Library database of systematic reviews and the Cochrane Central Register of Controlled Trials, the Ovid Medline (full), the Ovid Excerpta Medica Database, the Ovid allied and complementary medicines database, the Ebscohost Cumulative Index to Nursing and Allied Health Literature and the Physiotherapy Evidence Database. The included studies’ methodological quality was assessed using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) risk of bias checklist. Data analysis and synthesis followed the COSMIN methodology for reviews of outcome measurement instruments.

Results Sixteen studies met the inclusion criteria, and 10 outcome measures were identified. The psychometric properties of these outcome measures were evaluated using COSMIN. The Nijmegen Questionnaire (NQ) is the only outcome measure with ‘sufficient’ ratings for content validity, internal consistency, reliability and construct validity. All other outcome measures did not report characteristics of content validity in the patients’ group.

Discussion The NQ showed high-quality evidence for validity and reliability in assessing DB. Our review suggests that using NQ to evaluate DB in people with bronchial asthma and hyperventilation syndrome is helpful. Further evaluation of the psychometric properties is needed for the remaining outcome measures before considering them for clinical use.

PROSPERO registration number CRD42021274960.

  • Patient Outcome Assessment
  • Respiratory Measurement
  • Physical Examination
  • Asthma

Data availability statement

Data are available in a public, open access repository. All the relevant data were available at

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


  • Clinicians commonly use various outcome measures to examine dysfunctional breathing (DB). Currently, no review is available that examines these outcome measures’ psychometric properties.


  • The psychometric properties of the available DB outcome measures in adults are reviewed. Nijmegen Questionnaire (NQ) is the only available outcome measure graded as ‘very high’ quality and evaluated by the COnsensus-based Standards for the selection of health Measurement INstruments tool.


  • The existing outcome measures need to establish content validity and other psychometric properties prior to consideration for clinical use. NQ can be used to assess DB in the adult population.


The normal breathing pattern consists of thoracic and abdominal cavity expansion during inhalation and retraction during exhalation.1 Dysfunctional breathing (DB) deviates from the typical biomechanical pattern.2 3 Barker and Everard (2015) proposed a definition for DB as ‘an alteration in the normal biomechanical patterns of breathing that results in intermittent or chronic symptoms that may be respiratory and/or non-respiratory’.3 The DB subtypes include thoracic and extrathoracic patterns.2 3 Thoracic DB is often observed in hyperventilation and extrathoracic DB in patients with paradoxical vocal cord dysfunction.3 DB has historically been identified under a variety of names; a few examples include thoracoabdominal asynchrony, breathing pattern dysfunction, breathing pattern disorder, unexplained breathlessness, psychological breathlessness, panic breathing, apical breathing, periodic deep sighing, hyperventilation and paradoxical breathing.3 4 DB has an estimated prevalence of 29% and 8% in people with and without asthma, respectively.5 This signifies that the general adult population and those with lung disease may experience DB with symptoms that may improve with treatment, contributing to improved quality of life (QoL).6

Several respiratory disorders, such as bronchial asthma, sleep apnoea and chronic obstructive pulmonary disease, are reported to be linked with DB.7–9 Breathlessness, chest tightness, anxiety, light-headedness and fatigue can occur in people with these illnesses and DB.6–9 QoL, anxiety, sense of coherence and asthma control are significantly reduced in patients with DB, and breathing retraining has been shown to improve DB and health-related QoL.10 11 Even though DB is non-specific in some instances, it can lead directly to misdiagnosing respiratory disease in many situations.4 Despite the clinical importance of evaluating DB, a consensus on the assessment method still needs to be reached. The potential impacts of DB on constructs like bodily biochemistry, psychological functioning and social aspects must also be considered in a comprehensive evaluation.6 12–14

Clinical judgement and outcome measures enhance symptom-specific DB evaluation. An outcome measure that examines DB is required to guide suitable treatments. A range of objective evaluation instruments are available, including respiratory movement measuring instruments and respiratory inductive plethysmography.15 16 These laboratory-based measurement methods offer identification of DB, and they have excellent reliability and validity.16 17 However, these outcome measures cannot be used in routine clinical practice, especially in the community, due to time consumption, expensive equipment and the need for specific clinical environments. Clinicians often use various outcome measures to assess DB.18–20 These include Hi-Lo breathing,21 the Manual Assessment of Respiratory Motion (MARM),21 the Self-Evaluation of Breathing Questionnaire (SEBQ),22 the Breathing Pattern Assessment Tool (BPAT),23 the Total Faulty Breathing Scale (TFBS)24 and Nijmegen Questionnaire (NQ).25

The available outcome measures use various methods to detect DB. For example, in MARM, the examiners use the palpation method to detect DB21; Hi-Lo and TFBS assess breathing motion through observation26 and NQ through self-reported measures.15 21 25 Before any outcome measure is viable for routine clinical practice, validity and reliability must be established to ensure clinicians’ confidence in the measurement. To determine best practices for the assessment of DB, a systematic review of the existing literature to explore the reliability and validity of outcome measures is imperative. The systematic retrieval and appraisal of all literature about DB with a quantitative synthesis will lead to best practice guidelines for clinicians and researchers. This systematic review aims to provide a synthesis of outcome measures used to evaluate DB and appraise the psychometric properties of these outcome measures.


This study used the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) guidelines for systematic reviews of patient-reported outcome measures.27–29 The methods of this systematic review follow the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) recommendations for systematic reviews and outcome measurement instrument selection, which are currently being piloted.30 We registered this review protocol on PROSPERO (CRD42021274960) and updated the amendments regularly.

Search strategy

An experienced medical librarian (DY) carried out literature searches in the Cochrane Library database of systematic reviews and the Cochrane Central Register of Controlled Trials, the Ovid Medline (full), the Ovid Excerpta Medica Database (Embase), the Ovid allied and complementary medicines database, the Ebscohost Cumulative Index to Nursing and Allied Health Literature and the Physiotherapy Evidence Database (PEDro).

To perform the literature searches, a construct (DB), instrument (assessment instruments) and outcome (validity and reliability) framework was employed. Following a scoping search, relevant synonyms were found and validated as suitable and informative by the review team’s clinicians and academics. Searches were carried out to identify the relevant subject headings for those databases with a subject thesaurus (MeSH or Emtree) and text words in each database’s title and abstract fields. Proximity operators were used to combine search words in the title and abstract fields to increase search sensitivity. To increase the precision of the results returned by the searches, the review team decided to include a NOT operator in the search strategies to screen out papers related to sleep apnoea at the database search stage.

Searches were run in February 2022 and repeated in May 2023 before study completion to ensure the review considered the most recently published research. Due to the limited search functionality of PEDro, this database was searched using separate individual search phrases to identify relevant research on DB. On 22 February 2022, five of those phrases returned abstracts: ‘dysfunctional breathing’, ‘breathing disorder’, ‘thoracoabdominal synchrony’, ‘apical breathing’ and ‘respiratory dysfunction’. These phrases were searched again on 11 May 2023. Date limits were applied to screen out papers published before 1990. The rationale for this decision was that the term DB, or breathing pattern dysfunction, only came into existence and began to be used commonly in the medical literature in 1990. A copy of the full search strategy run in Ovid Medline and other databases is available (online supplemental file S1). The references identified by the database searches were uploaded into the EndNote reference management software package to allow for an initial screening.

Study selection

The following inclusion criteria were considered: (1) an outcome measure that investigated the validity and/or reliability of DB in the adult population (18+ years) using clinician-reported and patient-reported outcome measures and (2) full articles and service evaluation reports published in a peer-reviewed journal in English. Exclusion criteria were studies that used laboratory-based outcome measures, systematic reviews, conference abstracts, research letters, commentaries and letters to the editor.

Data extraction (selection and coding)

Two independent reviewers (VM and CR) screened the titles and abstracts for relevancy using the inclusion and exclusion criteria. Reference lists of all included studies were also searched for relevant titles. The authors (VM and CR) retrieved full-text articles that met the study criteria. An article by the first author (VM) on the TFBS was included in this review; to mitigate conflict of interest and reduce bias, only CR investigated the articles related to TFBS. The PRISMA flow diagram of this procedure is depicted in figure 1 using the PRISMA 2020 statement.31

Figure 1

Preferred Reporting Items for Systematic reviews and Meta-Analyses flow chart. AMED, Ovid allied and complementary medicines database; CINAHL, Ebscohost Cumulative Index to Nursing and Allied Health Literature; EMBASE, Ovid Excerpta Medica Database; MEDLINE, Medical Literature Analysis and Retrieval System Online; PEDro, Physiotherapy Evidence Database.

Risk of bias and quality of results

The team used the COSMIN methodology for systematic reviews of patient-reported outcome measures (PROM) and clinician-reported outcome measures to evaluate the psychometric characteristics of outcome measures used in persons with DB.27–29 The COSMIN PROM recommends using an outcome measure with ‘sufficient’ content validity and internal consistency.27–29 The reviewers (VM and CR) individually extracted and evaluated the data for the first nine attributes listed in the COSMIN tool.

The COSMIN checklist was used to assess the methodological rigour of each outcome measure across the measurement attributes. These include reliability, validity and other psychometric properties. The methodologies provided for evaluating the measurement properties of all the outcome measures are included in this systematic review. Study quality was assessed separately for each measurement property using a four-point rating system (very good, adequate, doubtful or inadequate), with a ‘not applicable’ option.29 The ‘worst score counts’ principle was used, where the overall rating for each measurement property is given by the lowest rating of any standard in the box.28 29 The results of individual studies on measurement characteristics were compared with COSMIN criteria for good measurement qualities. Each outcome was graded as sufficient (+), insufficient (−) or indeterminate (?). Relevance, comprehensiveness and comprehensibility criteria were used to grade the quality of the results in research reporting on content validity.
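The ‘worst score counts’ principle can be illustrated with a minimal Python sketch. The function name, the rating strings and the handling of ‘not applicable’ standards are assumptions made for illustration; they are not part of any COSMIN software.

```python
# Ratings ordered from best to worst, following the four-point scale in the text.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def worst_score_counts(standard_ratings):
    """Return the lowest rating among the standards of one COSMIN box.

    Standards rated 'not applicable' are ignored (an assumption of this sketch).
    If no standard applies, 'not applicable' is returned.
    """
    applicable = [r for r in standard_ratings if r != "not applicable"]
    if not applicable:
        return "not applicable"
    # The rating furthest along RATING_ORDER is the worst one.
    return max(applicable, key=RATING_ORDER.index)
```

For example, a box with standards rated ‘very good’, ‘adequate’ and ‘doubtful’ would receive an overall methodological quality of ‘doubtful’ under this sketch.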

The result of each study on a measurement property is rated using the most recent standards for good measurement properties. The total ratings of the study outcomes for each measurement property per outcome measure were summarised as sufficient (+), insufficient (−), indeterminate (?) or inconsistent (±). An overall rating was calculated by summing the scoring of each study; if at least 75% of the studies had the same scoring, that scoring became the overall rating (+ or −). However, if <75% of the studies had the same scoring, the overall rating would become inconsistent (±). If more than two articles were available, a summary of the overall evidence for measuring the properties of the outcome measures was determined. The lowest and highest results for each measurement property of an outcome measure are displayed to illustrate a set of findings that have been qualitatively aggregated.
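The 75% agreement rule for pooling study-level ratings can be sketched as follows. This is an illustrative reading of the rule as stated here, assuming that a share of exactly 75% counts as agreement; the function name and the rating symbols passed in are hypothetical.

```python
from collections import Counter


def overall_rating(study_ratings, threshold=0.75):
    """Pool per-study ratings ('+', '-', '?') into one overall rating.

    If at least `threshold` of the studies share the same rating, that rating
    is returned; otherwise the result is inconsistent ('±').
    """
    if not study_ratings:
        raise ValueError("at least one study rating is required")
    most_common_rating, count = Counter(study_ratings).most_common(1)[0]
    if count / len(study_ratings) >= threshold:
        return most_common_rating
    return "±"
```

With three ‘+’ ratings and one ‘-’ rating, the share of ‘+’ is exactly 75%, so this sketch returns ‘+’; with two ‘+’ and one ‘-’, the share is about 67% and the result is ‘±’.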

The quality of the evidence was rated using a modified Grading of Recommendations Assessment, Development and Evaluation (GRADE) system, with grades of ‘high’, ‘moderate’, ‘low’ or ‘very low’.27 The quality of the evidence was not rated for studies with an uncertain overall rating. For the quality assessment, two reviewers (VM and CR) independently worked on each stage while taking into account factors including the risk of bias, inconsistency, imprecision and indirectness. Starting with high-quality evidence, the quality of the evidence was reduced while considering all factors for the outcome measures. Disagreements were addressed by discussion and/or consultation with a third reviewer (AP).
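The downgrading logic of the modified GRADE approach can be pictured with a short Python sketch. It assumes that each factor contributes a whole number of downgrade steps and that the grade cannot fall below ‘very low’; the number of steps assigned to each factor is a reviewer judgement that this sketch does not model, and the function name is hypothetical.

```python
# Grades ordered from best to worst, as listed in the text.
GRADE_LEVELS = ["high", "moderate", "low", "very low"]


def grade_quality(downgrades):
    """Start at 'high' and move down one level per downgrade step.

    `downgrades` maps a factor name (e.g. 'risk of bias', 'inconsistency',
    'imprecision', 'indirectness') to the number of levels to downgrade.
    """
    if any(steps < 0 for steps in downgrades.values()):
        raise ValueError("downgrade steps must be non-negative")
    total_steps = sum(downgrades.values())
    # The grade bottoms out at 'very low'.
    return GRADE_LEVELS[min(total_steps, len(GRADE_LEVELS) - 1)]
```

Under these assumptions, evidence downgraded one level for risk of bias and one level for imprecision would be graded ‘low’.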

Patient and public involvement

Patients were not involved in this review due to the complexity of evaluating the psychometric properties of the DB tools.


Our first search (22 February 2022) yielded 1735 references. After removing duplicates, 1246 references were included for title and abstract screening. In our second search (11 May 2023), we identified 144 references. After removing duplicates, 96 references were included for title and abstract screening. Sixteen papers met inclusion criteria, seven through database searching and nine through searching reference lists of included studies (figure 1).

Overview of outcome measures

Our search identified the following 10 outcome measures that have examined reliability and/or validity components: Breathing Vigilance Questionnaire (Breathe-VQ),32 MARM,15 21 NQ,33–37 BPAT,38 39 Hi-Lo test,21 clinical assessments of increased work of breathing,40 Milstein Breathing Pattern Assessment Index (M-BPAI),41 SEBQ,14 22 TFBS24 26 and Dyspnoea-12 (D-12) questionnaire.38 The Hi-Lo and D-12 scales were not included in this review for evaluation because they are not the primary scales used to assess DB.21 38 Of the 16 studies, only nine included participants with DB, and the remaining seven included healthy participants. The COSMIN guidelines recommend testing the measurement properties on the target population.27 However, the identified studies have used these outcome measures in both patients and healthy people. Therefore, the measurement properties for these groups are reported separately (table 1; online supplemental file S2).

Table 1

Characteristics of outcome measures in studies involving patients

Developmental and content validity studies

Developmental studies

The evidence synthesis of the developmental and content validity of available outcome measures is summarised in table 2. Of the eight outcome measures, only two were reported to have developmental and content validity properties.32 33 35 36 A representative patient sample and a cognitive interview are required to develop an outcome measure. A cognitive interview study offers information on the items’ depth, especially their readability as an outcome measure. However, this was only followed in some of the included studies. All four studies involved experts32 34–36 and three involved patients.34–36 Concept elicitation was deemed ‘inadequate’ for Breathe-VQ and Korean-NQ because only healthy participants engaged in the studies.32 33 Other NQ trials were rated ‘very good’ since the patients involved were typical of the target population.

Table 2

Evidence synthesis of developmental, content validity of Nijmegen Questionnaire and Breathing Vigilance Questionnaire using COnsensus-based Standards for the selection of health Measurement INstruments checklist

Content validity studies

Three of the four articles on the content validity of NQ involved patients34–36 and all four involved experts.33–36 Of these three studies, patients’ relevance, comprehensiveness and comprehensibility were evaluated for one study only.34 A cognitive interview was conducted for the Breathe-VQ, but the quality was ‘inadequate’ as it was not conducted in a patient population.32 No studies on the development and content validity of TFBS, MARM or SEBQ were found. Only the NQ has been considered for rating, and it was judged as ‘sufficient’.

Risk of bias assessment rating of other measurement properties

The evidence synthesis for all outcome measures and additional measurement properties is summarised in table 3 (online supplemental file S3) and supplementary file S4.

Table 3

Methodological quality and rating of psychometric properties in studies involving patients

Internal structure

Among the included studies, only three reported structural validity.32 34 37 Two studies explored the structural validity of the NQ measure,34 37 and one study explored the Breathe-VQ measure.32 NQ structural validity was examined using the Rasch model and exploratory factor analysis with ‘very good’ and ‘inadequate’ quality.34 37 For structural validity, the Breathe-VQ study employed exploratory/confirmatory factor analysis of ‘very good’ quality.32 The internal consistency of the NQ, as measured by Cronbach’s alpha, ranged from >0.70 to 0.92 with a ‘very good’ quality and ‘sufficient’ rating.34 35 However, Rasch and factor analysis do not apply to other outcome measures, especially clinician-reported outcome measures.
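For readers less familiar with Cronbach’s alpha, the standard formula is alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The following minimal Python sketch implements that formula with the standard library; it is a generic illustration, not the computation used in the included studies, and the function name and data layout are assumptions.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Compute Cronbach's alpha for a questionnaire.

    `scores` is a list of respondents, each a list of k item scores.
    Sample variances (n - 1 denominator) are used throughout.
    """
    n_respondents = len(scores)
    k = len(scores[0])
    if k < 2 or n_respondents < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Values closer to 1 indicate that the items vary together; the function raises an error from the division if all respondents have the same total score.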


In total, 10 studies reported reliability measures. M-BPAI41 and TFBS24 26 were rated to have ‘very good’ methodological quality and ‘sufficient’ rating.41 The methodological quality and rating were the same for the MARM and SEBQ.15 21 22 Only one study that reported the reliability of clinical assessment of the work of breathing exhibited ‘adequate’ methodological quality and was ‘indeterminate’ for the rating.40 The test-retest reliability values for NQ ranged from 0.90 to 0.98, corresponding to ‘very good’ to ‘adequate’ quality.34–36 It was also judged that the NQ’s overall rating was ‘sufficient’. The correlation value ranges from 0.81 to 0.82 when analysing NQ’s hypothesis testing for construct validity, indicating ‘very good’ quality and ‘sufficient’ rating.34 35 A more comprehensive evidence synthesis for these and other outcome measures is available in supplementary file S4.

GRADE quality

The reviewers used GRADE to assess the quality of studies that involved participants with respiratory disease since the clinical application would be acceptable in the actual patient population. As a result, only the NQ that included individuals with asthma and hyperventilation syndrome was included in the GRADE quality assessment. The evidence quality is ‘high’ for the NQ in reliability and hypothesis testing for construct validity domains but ‘low’ for cross-cultural and structural validity domains. The GRADE quality assessment cannot be applied to the remaining outcome measures.


This systematic review presented an overview of outcome measures used to assess DB and evaluated the psychometric properties of outcome measures used in healthy and DB populations. NQ is the only outcome measure with sufficient psychometric properties to be considered by clinicians for the DB assessment.

Nijmegen Questionnaire

NQ’s measurement properties have gained much attention due to its long record and frequent use in DB assessment, notably in conditions including bronchial asthma and hyperventilation syndrome.39 42 The available evidence indicated that the NQ had been evaluated using rigorous methods, and its content validity, internal consistency and reliability were commonly reported. This outcome measure has been translated into other languages, but for one of the translated versions, the PROM development and content validity were not well documented.34 However, other measurement properties were well established.34

PROM development and content validity studies were not consistent across the included studies. This is due to variations in the methodological description, and it was the least reported psychometric property, followed by structural validity and hypothesis testing for construct validity. Despite this, the reviewers have used the COSMIN tool to infer the quality of PROM and content validity, and the NQ was found to have most of the measurement properties with ‘sufficient’ quality. This is an area that needs further exploration in future studies. In addition, the language and structure of the items used in the NQ need improvement. For instance, item NQ14 (cold hands or feet) does not fit the structural validity, and similarly, item NQ9 (bloated feeling in the stomach) also does not fit the Rasch model.37 Since NQ looks at many DB dimensions, these factors could be considered for prospective use.

Breathe-VQ and BPAT

Breathe-VQ is the next potential outcome measure that can be used in the DB population because the measurement properties, such as structural validity, internal consistency, reliability, measurement error, criterion validity and hypothesis testing for construct validity, are well established.32 The Breathe-VQ is best suited to assess changes related to excessive conscious breathing rather than as an outcome measure for diagnosing the DB disorder. In contrast to the NQ, the Breathe-VQ has only been examined in one study; therefore, more research is required to determine its use in the DB population before considering it for clinical use.32 It may be helpful to use NQ with Breathe-VQ to identify excessive conscious breathing caused by anxiety. The same comments apply to the BPAT, which has proven criterion validity for patients with asthma, breathing pattern disorder and post-COVID breathless individuals.38 39 BPAT is more suitable for evaluating breathing irregularities in the DB population. However, BPAT is still in the trial phase, and its clinical utility has yet to be determined.

Other outcome measures

The remaining outcome measures, such as MARM, clinical evaluation of increased effort of breathing, TFBS, SEBQ and M-BPAI, had examined only a few psychometric properties.15 21 22 24 26 40 41 The reviewers can comment on their clinical utility only once the remaining properties have been thoroughly investigated.


This review excluded grey literature, conference abstracts, poster abstracts and dissertations; therefore, potential studies could have been missed. A second-order reference check was not done, which may have led to other relevant studies being missed. Only English-language studies were considered for this review, which may have reduced the number of potentially acceptable studies in other languages in the DB population. Another limitation is the lack of primary data, which prevented the review team from conducting a meta-analysis. The reviewers had no specific training in using COSMIN. Instead, they relied on their clinical and scholarly experience to reach an agreement. This might affect how studies are rated for quality. However, the review team mitigated this by sending the collected data to the corresponding authors of the included studies for verification, comments and triangulation.

Future consideration

Only five papers in our review briefly described the process of developing outcome measures and content validity. Determining whether the outcome measure development process had been rigorously carried out or was selectively reported is challenging. This might imply that the available outcome measures do not satisfy practitioners’ expectations or recognise the researchers’ requirements. An outcome measure with ‘inadequate’ content validity or a lack of evidence of content validity has questionable use in clinical practice. Therefore, particular attention should be given to determining the content validity of those outcome measures that do not possess this property. Detailed information on the outcome measure development process and content validity should be provided in future research. The reviewers recommended addressing these aspects in future studies.

It should be noted that the COSMIN checklist is both comprehensive and rigorous in its quality. Any other outcome measures considered here are unlikely to fulfil the standards. As a result, some of the outcome measures are rated as ‘inadequate’ quality. However, the authors recommend considering these measurement properties when constructing an outcome measure that fulfils the stringent criteria.


This review found 10 outcome measures used to assess DB. The NQ is the only outcome measure that showed evidence quality to be ‘high’ for internal consistency and hypothesis testing for construct validity and reliability. The evidence quality is ‘low’ for NQ structural validity and cross-cultural validity. The measurement properties of NQ are sufficient to recommend its use as part of a clinical application of DB. Most outcome measures have examined only a few psychometric properties; a more comprehensive investigation of all psychometric properties is needed before considering their clinical use. Future research on the existing outcome measure or developing a new outcome measure may follow the COSMIN guidelines.

Ethics statements

Patient consent for publication

Ethics approval

Not applicable.


We thank Rachel Senior, Specialist Physiotherapist, Dorset Adult Integrated Respiratory Service, Dorset, for her expertise and comments about the outcome measures; Dr Kathryn Collins, Bournemouth University, for her contribution in editing and all the authors of the included studies for extracted data verification and corrections.



  • X @VikramMohan10, @DJY-LIB-EBP

  • Contributors VM, CR, AP and CB were involved in study conceptualisation. VM and CR were responsible for screening, selecting articles and data entry, data interpreting, reporting and for preparing the final manuscript. DY was responsible for constructing search strategy and conducting searches. VM and CR are acting as guarantors for the work. All authors read, provided feedback and approved the final manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests VM and AP are authors of some of the included articles. They were not engaged in evaluating the methodological quality of these articles. They have no other competing interests. An article by the first author (VM) on the TFBS was included in this review; to mitigate conflict of interest and risk of bias, only CR investigated the articles related to TFBS.

  • Patient and public involvement statement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.