Background The European Respiratory Society (ERS) lung sounds repository contains 20 audiovisual recordings of children and adults. The present study aimed at determining the interobserver variation in the classification of sounds into detailed and broader categories of crackles and wheezes.
Methods Recordings from 10 children and 10 adults were classified into 10 predefined sounds by 12 observers, 6 paediatricians and 6 doctors for adult patients. Multirater kappa (Fleiss' κ) was calculated for each of the 10 adventitious sounds and for combined categories of sounds.
Results The majority of observers agreed on the presence of at least one adventitious sound in 17 cases. Poor to fair agreement (κ<0.40) was usually found for the detailed descriptions of the adventitious sounds, whereas moderate to good agreement was reached for the combined categories of crackles (κ=0.62) and wheezes (κ=0.59). The paediatricians did not reach better agreement on the child cases than the family physicians and specialists in adult medicine.
Conclusions Descriptions of auscultation findings in broader terms were more reliably shared between observers compared to more detailed descriptions.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
How can we improve the agreement on lung sound descriptions?
Although auscultation of the lungs is important in medical diagnosis and decision-making, disagreement on the use of terms describing the sounds weakens the diagnostic value of the adventitious lung sounds for chest diseases.
Poor agreement was found when 12 observers classified lung sounds from video recordings of 20 patients with lung diseases into detailed categories, whereas acceptable agreement was obtained when the terms were combined into broader categories.
The stethoscope is the quintessential symbol of the medical profession. However, the reputation of this 200-year-old instrument as a useful diagnostic tool in lung disease has been declining since chest radiography became available.1 Reports on the limited diagnostic value of chest auscultation in conditions like pneumonia2 and heart failure3 have contributed to the low standing of chest auscultation among medical experts today. Guidelines from the Global Initiative for Chronic Obstructive Lung Disease (GOLD) give little credit to lung sounds, and auscultation findings are not even listed among clues to early diagnosis of chronic obstructive pulmonary disease (COPD).4 In the Global Initiative for Asthma (GINA) guidelines, however, wheezing is mentioned as a main symptom of asthma.5
Clinical studies have shown that lung auscultation is far from useless. Crackles predict radiographically confirmed pneumonia more strongly than any single respiratory symptom,6 and wheezes are heard more frequently with increasing severity of bronchial airflow limitation.7 However, the results of such studies may be challenged when one takes into consideration the interobserver variation in the description of lung sounds between clinicians. Efforts to standardise the terminology led to a statement from the International Lung Sound Association (ILSA) in 1987.8 The European Respiratory Society (ERS) established a Task Force on Lung sounds in 1999, and a report aiming at the standardisation of computerised analysis of lung sounds, including a chapter on terminology, was published in 2000.9 An overview of our present knowledge on lung sounds and recommended terminology has recently been published.10
In a review on interobserver agreement on chest findings and their diagnostic value published in 2010, the authors found great variation between the published studies in terms of lung sounds, with mostly fair to moderate agreement in the use of terminology.11 In a study from primary care in 12 European countries in 2007, great variation between the countries was found in the use of lung sound terms in patients with acute cough.12 It seems, therefore, that earlier attempts to standardise lung sounds terminology have failed to improve agreement between physicians on how to describe the lung sounds they hear during chest auscultation. This is important because lung auscultation is still commonly used in clinical practice, and the findings have an impact on the treatment of patients. Hearing crackles, for instance, strongly predicts antibiotic prescribing.13,14 The current nomenclature system with its distinction between coarse and fine crackles and between rhonchi and wheezes10 may make it more difficult to reach agreement on the use of terms.15 The differences in terminology between different languages hamper meaningful exchange of chest auscultation findings between clinicians and researchers from different countries. These considerations prompted the institution of another ERS Task Force in 2012 to further standardise the use of lung sound terminology between clinicians and researchers from countries with different languages, based on a repository of audiovisual recordings of lung auscultation.
An initial reference collection of 20 audiovisual recordings has recently been made available online as part of the ERS site Learning resources. We wanted to determine the interobserver variation in the classification of these sounds, with particular attention to more or less detailed descriptions of adventitious sounds. We also wanted to assess a potential influence of the professional background, that is, paediatrician versus family physician or adult medical specialist, on the classification of recordings from children and adults.
Material and methods
This is a study of interobserver variation between medical doctors in classifying lung sounds from video recordings performed at respiratory units in Spain, Greece, Norway, the Netherlands and Canada during 2013. The recording equipment was standardised, using video cameras with external microphone input (Canon Legria HF series, Panasonic HC-X900) and quality electret microphones which were placed in the tubes of standard stethoscopes. More details have already been published.16 Each centre obtained ethics approval at their institution and prepared consent forms in their required formats.
The quality of 80 video recordings was rated by five of the six task force members, leaving out the member who had done the actual recordings, and they recommended whether or not to include the recording in the repository. Twenty recordings, 10 of children and 10 of adults, were judged to be of sufficient quality. The recordings, each of approximately 15 s duration, were classified independently by the six task force members on an online questionnaire (FluidSurveys.com, Ottawa, Canada). Five of the task force members were paediatricians and one a general practitioner (GP). Their age ranged from 52 to 64 years. Subsequently, the 20 recordings were classified by a convenience sample of six additional physicians who used the same online questionnaire. Three were internationally recognised lung sound researchers aged 59–71 years, among whom one was a paediatrician, and three were Norwegian GPs aged 38–49 years. The latter had postgraduate clinical experience of 6 years or more, but no exposure to lung sound research. The observers downloaded the video files to their computers and were asked to classify the recordings according to recommended English language nomenclature (fine and coarse crackles, high-pitched and low-pitched wheezes).8 We also included the category of rhonchi, although it is used interchangeably with low-pitched wheezes.10 Identification by respiratory phase thus offered 10 non-exclusive choices. The questionnaire also offered free-text options to describe adventitious sounds by other terms.
All observers reported normal hearing capacity, except for one 71-year-old expert who reported some trouble with speech perception at higher frequencies, but not with lung sounds that were typically below 1000 Hz. No instructions were given regarding the volume setting for audio playback.
Agreement among the majority of observers (seven or more) on each of the 10 terms in each of the 20 cases was identified. To further evaluate this interobserver agreement, Fleiss' multirater kappa (κ) with 95% CIs was calculated. The κ values were interpreted as follows: 0–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.0 almost perfect agreement.17 After calculating agreement on the 10 predefined categories, multirater κ was calculated after combining fine and coarse crackles in a common category, rhonchi with high-pitched and low-pitched wheezes, and finally the respective inspiratory and expiratory sounds into simply crackles and wheezes. Agreement in reporting sounds other than the 10 fixed options was evaluated for sounds reported by seven or more observers to be present in the same case. We also calculated Fleiss' κ among the paediatricians and the observers familiar with examining adults in subsamples of children and adult patients. SPSS V.22 and R statistical package were used in the analyses.
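As an illustration of the statistic named above, the following is a minimal Python sketch of Fleiss' κ for one sound term rated as present or absent by 12 observers across a set of cases. It is not the SPSS or R code used in the study. The function names (`fleiss_kappa`, `presence_to_counts`, `interpret_kappa`) and the example vote counts are hypothetical, the label for negative κ values is an assumption not given in the text, and CIs are not computed.

```python
def fleiss_kappa(counts):
    """Fleiss' multirater kappa.

    counts[i][j] is the number of raters who assigned case i to
    category j; every case must be rated by the same number of raters.
    """
    if not counts:
        raise ValueError("at least one case is required")
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every case needs the same number of ratings")

    # Observed agreement: proportion of agreeing rater pairs per case.
    pairs = n_raters * (n_raters - 1)
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / pairs for row in counts
    ) / n_cases

    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    total = n_cases * n_raters
    p_exp = sum(
        (sum(row[j] for row in counts) / total) ** 2
        for j in range(n_categories)
    )
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_obs - p_exp) / (1.0 - p_exp)


def presence_to_counts(present_votes, n_raters=12):
    """Turn 'number of observers reporting the sound' per case
    into [present, absent] count rows."""
    return [[v, n_raters - v] for v in present_votes]


def interpret_kappa(kappa):
    """Interpretation bands as listed in the text; the label for
    negative values is an assumption."""
    if kappa < 0:
        return "less than chance"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical votes: how many of 12 observers reported one term in each of 20 cases.
votes = [12, 11, 0, 1, 9, 0, 0, 3, 10, 0, 0, 2, 12, 0, 1, 0, 8, 0, 0, 4]
kappa = fleiss_kappa(presence_to_counts(votes))
print(round(kappa, 2), interpret_kappa(kappa))
```

Because each of the 10 terms was a non-exclusive choice, a sketch like this would be run once per term (or per combined category), treating that term as a binary present/absent rating.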
Results
Frequencies of agreement
High-pitched expiratory wheeze was the predefined sound category most frequently reported by the observers, and in 5 of the 20 cases the majority (seven or more) of the observers reported this sound. The majority of observers never reached this level of agreement on the terms expiratory fine crackles, inspiratory or expiratory rhonchi, and inspiratory low-pitched wheezes. The term low-pitched wheezes was more frequently used than rhonchi and when these interchangeable terms were combined, better agreement was reached (figure 1), and it was even better when combined with high-pitched wheezes. Likewise, when fine and coarse crackles were combined into one category, agreement among the majority of the task force members occurred more frequently (figure 1). Such agreement on the presence of one or more of the four sound categories (inspiratory and expiratory crackles and wheezes) was reached in 16 of the 20 cases. The majority agreed on more than one of the four categories in 8 of the 20 cases, in 2 adult cases and 6 child cases. In one case, the majority of observers reported pleural rub (table 1).
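The effect of combining terms on majority agreement can be sketched as follows. This is a hedged illustration, not the study's analysis code: the term labels, the helper names `votes_for` and `majority_agrees`, and the example case are hypothetical. The only elements taken from the text are the threshold of seven or more of 12 observers and the idea that an observer counts towards a combined category if they reported any of its component terms.

```python
MAJORITY = 7  # "seven or more" of the 12 observers, as in the text


def votes_for(case_ratings, terms):
    """Count observers who ticked at least one of the given terms.

    case_ratings holds one set of ticked terms per observer for a
    single case; an observer ticking several of the terms counts once.
    """
    wanted = set(terms)
    return sum(1 for ticked in case_ratings if wanted & set(ticked))


def majority_agrees(case_ratings, terms, threshold=MAJORITY):
    """True if the number of observers reporting any of the terms
    reaches the majority threshold."""
    return votes_for(case_ratings, terms) >= threshold


# Hypothetical case: observers hear inspiratory crackles but split on the subtype.
example_case = (
    [{"insp_fine_crackles"}] * 4
    + [{"insp_coarse_crackles"}] * 5
    + [set()] * 3
)

fine = ["insp_fine_crackles"]
coarse = ["insp_coarse_crackles"]

print(majority_agrees(example_case, fine))           # False: 4 of 12
print(majority_agrees(example_case, coarse))         # False: 5 of 12
print(majority_agrees(example_case, fine + coarse))  # True: 9 of 12
```

In this constructed case neither detailed term reaches the majority threshold, yet the combined category does, which is the pattern the paragraph above describes for the real recordings.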
Multirater κ agreements
Slight multirater kappa agreement was found for 5 of the 10 basic descriptions of lung sounds (κ≤0.20), fair agreement for four of the categories (κ 0.21–0.40) and moderate agreement for one category only, high-pitched inspiratory wheezes with κ=0.43 (figure 2). After combining fine and coarse crackles and high-pitched and low-pitched wheezes together with rhonchi, moderate agreement was reached for three of the four categories (figure 3). An even better agreement was reached after further lumping inspiratory and expiratory sounds, with kappas for crackles and wheezes of 0.62 and 0.59, respectively (figure 3). The agreement on pleural rub reached a κ of 0.52.
Impact of age of cases and kind of physician
The agreement tended to be stronger on the adult cases than on the child cases. The paediatricians did not reach better agreement on the child cases than the doctors familiar with examining adults (figure 4).
Discussion
Only slight to fair agreement was found for detailed descriptions of the adventitious sounds. In contrast, moderate to substantial agreement was reached for the combined categories of crackles and wheezes. For the wheezes, there was also moderate to substantial agreement when differentiating between inspiratory and expiratory sounds. The paediatricians did not agree better than the other doctors on the lung sounds from children. Overall, agreement on paediatric lung sounds was poorer than that on lung sounds from adults.
Similar or somewhat stronger agreements on the presence of crackles and wheezes were found in this study compared to most previous studies.18–29 In most previous studies, the observers listened to real patients and registered crackles and wheezes without specifying the respiratory phase. Moderate agreement with κ values between 0.41 and 0.60 has usually been found, also in a study where a teaching stethoscope was used, allowing four observers to listen simultaneously.25 In a few studies, the observers have listened to audiotapes and reached similar or somewhat better agreements.29,30 No previous interobserver study of auscultation findings has been based on high-quality audiovisual recordings with a microphone inserted into the tubing of a regular stethoscope. The most comparable previous study had observers watching video recordings and registering wheezing heard without a stethoscope. In that study, the multirater kappa agreement was 0.36.31 The moderate to substantial agreement found in this and previous interobserver studies on lung sounds is not inferior to agreement reached in other examinations frequently used in pulmonary medicine, like consolidation on chest radiography (κ 0.4–0.6)32 and the CT patterns of bronchiectasis, emphysema and honeycombing (κ 0.42–0.59).33
In many previous studies, detailed descriptions of the sounds have been based on computerised analysis. Fine and coarse crackles have been identified by the duration of each crackle as shown on a phonogram.34 It is more difficult to differentiate the types of crackles by listening. It is, for instance, possible that the amplitude of crackles plays a role in how the sound is perceived.35 Loud crackles may be perceived to be coarser than faint crackles of the same duration.
It can also be difficult to differentiate between low-pitched and high-pitched wheezes by listening. Separation based on a 200 Hz cut-off has been recommended.36 Few observers can perfectly determine the pitch, and observer disagreement on this subclassification is therefore not surprising. Agreement on the presence of rhonchi was particularly poor in our study. This is in accordance with previous studies, where the agreement on rhonchi was much weaker than the agreement on crackles and wheezes.24,37 Since the terms ‘rhonchi’ and ‘low-pitched wheezes’ are used interchangeably,10 low agreement could be expected. The task force decided anyway to include ‘rhonchi’ as a separate category, thinking the term could be preferred for low-pitched wheezes that do not sound musical but more like snoring. With our results, it seems difficult to agree on this division of low-pitched continuous sounds.
Other characteristics of crackles and wheezes may be more important and also easier to identify than those applied in our classification. The number of crackles may be of importance and also the timing during inspiration.34 In terms of wheezes, both sound intensity and the duration of the wheezes in each respiratory phase are of importance. Shim and Williams38 found that the three characteristics high pitch, intensity and spanning the entire phase were linked to decreased peak expiratory flow rate in patients with asthma.
The 20 cases were from real patients with various lung diseases, both children and adults, and in eight cases more than one category of adventitious sound was present. Six of the eight complex cases were recordings from children, and these might have been more difficult to classify than the recordings with only one kind of adventitious sound. This may have led to poorer agreement in the paediatric cases than in the adult cases.
To avoid misunderstanding on chest findings by auscultation, the use of combined terms of crackles and wheezes, or similar categories in other languages, should be encouraged when health workers communicate with each other. This does not mean that more precise terms should be discarded. Distinguishing between fine and coarse crackles and high-pitched wheezes and low-pitched wheezes/rhonchi may be important for some diagnoses,34 for example, during early stages of interstitial lung fibrosis when fine inspiratory crackles are heard.39 This may also be relevant in various obstructive airway diseases of young children where ‘wheeze’ is too broad a term to adequately characterise their lung sounds.40 However, large studies with relevant outcomes that take other easily available information into account need to be carried out to prove the clinical usefulness of such differentiation.
Strengths and limitations
The video cases were selected from a larger group of files to ensure only high-quality recordings with few artefacts. Although artefacts are also common in real life, we expect that agreement on all recordings without the application of quality criteria would have been poorer than the results presented here.
The mixture of observers increased the probability of transferable results. The task force members had some clinical information on the cases they had contributed to, but although this could have had some influence on their rating of these cases, this knowledge could not have had any significant effects on the agreements.
The statistical strength of the differences in κ values between detailed and lumped categories of sounds may be questioned. The CI of the κ for inspiratory crackles (figure 3) did not include the κ values for fine or coarse inspiratory crackles (figure 2), and that of expiratory crackles did not include the κ of fine expiratory crackles. Likewise, the CIs for inspiratory and expiratory wheezes did not include the respective κ values for rhonchi and low-pitched wheezes (figures 2 and 3). This indicates that improved agreements after lumping were statistically significant.
Conclusion
We found only slight to fair agreement for detailed descriptions of crackles and wheezes. Broader terms were more reliably shared between the observers.
Acknowledgements
The authors would like to thank the patients who have taken part in the video recording, the medical student Raimonda Einarsen and Stian Andersen for shooting videos at the University Hospital of North Norway, Drs Steve Kraman, Andrew Bush, Päivi Pirilä, Peder A. Halvorsen, Magnus Hjortdahl and Anne H. Davidsen for the classification of the sounds, Dr Juan Carlos Aviles Solis for statistical assistance and Dr Peter M. A. Calverley for promoting the establishment of the ERS Task Force for Lung Sounds.
Contributors HM takes final responsibility for the content of this manuscript, including the data and analysis. HP designed the questionnaire, and all authors classified the sounds. HM, LG-M, HP and PB contributed to data analysis, and HM, LG-M, HP, PB, ME and KP contributed to the interpretation of results and writing of the manuscript. All authors approved the final version.
Funding Financial support has been given by the European Respiratory Society.
Competing interests None declared.
Ethics approval Each centre obtained ethics approval at their institution and prepared consent forms in their required formats.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.