Influence of observer preferences and auscultatory skill on the choice of terms to describe lung sounds: a survey of staff physicians, residents and medical students
  1. Abraham Bohadana1,
  2. Hava Azulai2,
  3. Amir Jarjoui2,
  4. George Kalak2 and
  5. Gabriel Izbicki2
  1. 1Medicine, Pulmonary Institute, Shaare Zedek Medical Center, and the Hebrew University Hadassah Medical School, Jerusalem, Israel
  2. 2Pulmonary Institute, Shaare Zedek Medical Center, Jerusalem, Jerusalem, Israel
  1. Correspondence to Dr Abraham Bohadana; abraham.bohadana{at}


Background In contrast with the technical progress of the stethoscope, lung sound terminology has remained confused, weakening the usefulness of auscultation. We examined how observer preferences regarding terminology and auscultatory skill influenced the choice of terms used to describe lung sounds.

Methods Thirty-one staff physicians (SP), 65 residents (R) and 47 medical students (MS) spontaneously described the audio recordings of 5 lung sounds classified acoustically as: (1) normal breath sound; (2) wheezes; (3) crackles; (4) stridor and (5) pleural friction rub. A rating was considered correct if a correct term or synonym was used to describe it (term use ascribed to preference). The use of any incorrect terms was ascribed to deficient auscultatory skill.

Results Rates of correct sound identification were: (i) normal breath sound: SP=21.4%; R=11.6%; MS=17.1%; (ii) wheezes: SP=82.8%; R=85.2%; MS=86.4%; (iii) crackles: SP=63%; R=68.5%; MS=70.7%; (iv) stridor: SP=92.8%; R=90%; MS=72.1% and (v) pleural friction rub: SP=35.7%; R=6.2%; MS=3.2%. The 3 groups used 66 descriptive terms: 17 were ascribed to preferences regarding terminology, and 49 to deficient auscultatory skill. Three-group agreement on use of a term occurred on 107 occasions: 70 involved correct terms (65.4%) and 37 (34.6%) incorrect ones. Rate of use of recommended terms, rather than accepted synonyms, was 100% for the wheezes and the stridor, 55% for the normal breath sound, 22% for the crackles and 14% for the pleural friction rub.

Conclusions The observers’ ability to describe lung sounds was high for the wheezes and the stridor, fair for the crackles and poor for the normal breath sound and the pleural friction rub. Lack of auscultatory skill largely surpassed observer preference as a factor determining the choice of terminology. Wide dissemination of educational programs on lung auscultation (eg, self-learning via computer-assisted learning tools) is urgently needed to promote use of standardised lung sound terminology.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Key messages

  • How do observer preferences and lack of auscultatory skill contribute to the choice of terms used to describe lung sounds?

  • We found that lack of auscultatory skill largely surpassed observer preferences in the choice of terms used to describe lung sounds among staff physicians, residents and medical students from a university-based hospital.

  • Educational programmes on lung auscultation should be conceived and widely disseminated to promote use of standardised lung sound terminology.


Lung auscultation has been traditionally considered an essential part of chest examination. Over the past two centuries, the stethoscope has evolved from the wooden cylinder first invented by Laennec1 to more sophisticated tools, some of which allow sound recording and transmission of the data to computers for further analysis, making auscultation a more scientific procedure.2

In contrast with the technical progress of the stethoscope, lung sound terminology has remained confused, jeopardising the usefulness of auscultation. The ‘American College of Chest Physicians (ACCP) and American Thoracic Society (ATS) Joint Committee on Lung Sound Nomenclature’3 and an ad hoc committee established by the ‘International Lung Sound Association’ (ILSA)4 made recommendations for the use of a standardised terminology. Recently, the European Respiratory Society established a Task Force to build a reference collection of audiovisual recordings of lung sounds involving the investigation of the nomenclature in 29 languages, in 33 European countries.5

Periodic nomenclature surveys are important to evaluate the dissemination of the recommended terminology. For this purpose, an acceptable approach is to ask observers to describe audio or visual displays of lung sounds.6–10 Using this method, studies of lung sound nomenclature have shown that lung sound terminology varies widely among physicians, residents and physiotherapists working in the same hospital6–8 or in different countries.9 Interestingly, agreement on the detailed classification of crackles and wheezes was found to be poor, improving when the two terms were combined into broader categories.10 On closer inspection, data from these studies suggest that, while part of the interobserver variation in terminology results from observer preferences (ie, disagreement between correct raters; eg, use of ‘rales’ or ‘crackles’), part might result from lack of auscultatory skill (ie, disagreement due to incorrect sound classification; eg, use of ‘normal breath sound’ in the presence of ‘bronchial breathing’). Although this aspect has not been discussed previously, the distinction between these two components of variation has practical importance because different interventions are necessary to deal with them. With these considerations in mind, we carried out a study to examine how observer preferences and lack of skill influence the choice of terms to describe lung sounds. We invited staff physicians (SP), residents (R) and medical students (MS) to spontaneously classify the audio recordings of five common lung sounds. First, we determined their ability to identify the sounds; then, we compared the terms used by correct raters with those used by incorrect raters.

Material and methods


We surveyed 31 SP, 65 R and 47 MS working at Shaare Zedek Medical Center, affiliated with the Hebrew University of Jerusalem. Information about the study was provided through word-of-mouth. The study was submitted to the hospital’s Ethics Committee, but no informed consent was deemed necessary.


Participants who volunteered were invited to complete an anonymous questionnaire to identify background information, including demographics, medical status, years of practice and medical specialty. Next, they wrote ‘free-form’ answers in English and Hebrew while listening through loudspeakers to five lung sound files stored in a computer placed in a silent room. The sound files were taken from a set of audio files published previously.11 No sonograms or waveform analysis were provided to substantiate the nature of each sound. All participants received the same, standardised instructions to: (1) fill in the general information, anonymous form about demographics, medical status, specialty, years of practice with auscultation and so on, (2) listen to the audio samples numbered 1–5 and describe each sound in the appropriate column using their own words; (3) pay attention to the site of sound recording, indicated in a diagram and to the fact that all recordings started from an inspiration. The sound files were played once or, on request, twice.

Preference and auscultatory skill

This was determined by comparing the observers’ response with the true classification based on interpretation of the waveforms obtained by computer-based analysis taken as gold-standard (figure 1).11 A rating was considered correct if a recommended term or an accepted synonym was used to describe it (term use ascribed to preference). The use of any incorrect term was ascribed to lack of auscultatory skill.
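The rating rule described above can be sketched in a few lines of Python. This is only an illustration: the dictionary ACCEPTED_TERMS and the function classify_response are hypothetical names, and the term lists are a partial, assumed encoding of terms reported in this article rather than the authors' full scoring sheet.

```python
# Hypothetical, partial encoding of recommended terms and accepted synonyms.
# The real scoring list used in the study is not reproduced here.
ACCEPTED_TERMS = {
    "normal breath sound": {"normal breath sound", "normal sound", "vesicular sound"},
    "wheezes": {"wheeze", "wheezes"},
    "crackles": {"crackle", "crackles", "rale", "rales", "crepitation"},
    "stridor": {"stridor"},
    "pleural friction rub": {"pleural friction rub", "friction rub", "pleural rub"},
}


def classify_response(true_sound: str, response: str) -> str:
    """Ascribe a free-form answer to 'preference' or 'lack of skill'.

    A recommended term or accepted synonym counts as a correct rating
    (term choice ascribed to preference); any other term is ascribed
    to lack of auscultatory skill.
    """
    term = response.strip().lower()
    if term in ACCEPTED_TERMS[true_sound]:
        return "preference"
    return "lack of skill"


print(classify_response("crackles", "Rales"))
print(classify_response("normal breath sound", "bronchial breathing"))
```

Under these assumptions, ‘rales’ for the crackles sample is treated as a matter of preference, whereas ‘bronchial breathing’ for the normal breath sound is treated as a skill deficit.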

Figure 1

The left column shows typical values for the frequency (Hz) and duration (ms) of the five lung sounds. The middle column lists the site of sound recording. The two right-hand columns show, respectively, the amplitude-time plots in unexpanded and time-expanded modes (amplitude is measured in arbitrary units, and time in seconds). The unexpanded plots contain screenshots of 3 respiratory cycles, always starting with inspiration. The red, horizontal line below each sound shows the place where the time-expanded sections were obtained. The unexpanded waveform of the normal breath sound (sample #1) shows a strong inspiratory component relative to the expiratory component. The time-expanded waveform shows random fluctuations similar to those of white noise. The unexpanded waveform of the wheeze (sample #2) has a strong expiratory component, which appears as sinusoidal oscillations characteristic of musical sounds in the time-expanded waveform. In the unexpanded waveform, the fine crackles (sample #3) appear as spikes that correspond with rapidly damped wave deflections seen in the time-expanded waveform. The unexpanded waveform of the stridor (sample #4) shows a strong inspiratory component that appears as sinusoidal oscillations in the time-expanded waveform. The pleural friction rub (sample #5) has an unexpanded waveform characterised by a series of vertical spikes in a pattern that is indistinguishable from that produced by crackles. The greater amplitude and longer duration of the pleural friction rub can be seen in the time-expanded plot.

Data analysis

Baseline characteristics are presented as mean (SD) and proportions. The differences between the three groups of observers in the proportion of correct answers were tested using the χ² test; a p<0.05 was considered significant. For each group, the terms used by correct raters were contrasted with those used by incorrect raters.
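As a minimal sketch of the kind of comparison described above, the following Python code computes a Pearson χ² statistic for a 2 × 3 table of correct and incorrect answers. The counts are hypothetical and chosen only for illustration; the function names are assumptions, and the article does not state which software was used or whether any correction was applied. For a 2 × 3 table the test has 2 degrees of freedom, in which case the upper-tail p value has the closed form exp(−χ²/2).

```python
import math


def chi_square_2xk(correct, incorrect):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table."""
    totals = [c + i for c, i in zip(correct, incorrect)]
    n = sum(totals)
    row_correct = sum(correct)
    row_incorrect = sum(incorrect)
    stat = 0.0
    for c, i, t in zip(correct, incorrect, totals):
        expected_c = row_correct * t / n      # expected correct answers in this group
        expected_i = row_incorrect * t / n    # expected incorrect answers in this group
        stat += (c - expected_c) ** 2 / expected_c
        stat += (i - expected_i) ** 2 / expected_i
    return stat, len(correct) - 1


def p_value_df2(stat):
    """Upper-tail p value of a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


# Hypothetical counts for three groups (NOT the study data).
correct = [10, 5, 5]
incorrect = [10, 15, 15]
stat, df = chi_square_2xk(correct, incorrect)
p = p_value_df2(stat)
print(f"chi2={stat:.2f}, df={df}, p={p:.3f}, significant={p < 0.05}")
```

When expected counts are small, as they may be for rarely identified sounds, an exact test could be preferable; that choice is outside what the text specifies.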

Patient and public involvement

Patients or the public were not involved in the design of the study.


Characteristics of participants

The mean (SD) age of staff physicians was 48.4 years (10.4); that of residents was 32.5 years (3.5) and that of students was 28.4 years (4.5). Seventeen staff physicians (54.8%) declared more than 20 years of experience with auscultation, while 60 residents (92%) and 47 students (100%) declared <5 years of experience.

Auscultatory skill

Figure 2 shows the ability of the three groups to classify the sound samples. Overall, the rates of correct identification were high for the wheeze (SP=82.8%; R=85.2%; MS=86.4%) and the stridor (SP=92.8%; R=90.0%; MS=72.1%), fair for the crackles (SP=63.0%; R=68.5%; MS=70.7%) and low for the normal breath sound (SP=21.4%; R=11.6%; MS=17.1%) and the pleural friction rub (SP=35.7%; R=6.2%; MS=3.2%). No significant intergroup differences existed in the ability to identify the sounds, except for the better performance of SPs in identifying the pleural friction rub (p=0.003).

Figure 2

The relative proportion of correct and incorrect answers was similar across the groups. The exception concerned the pleural friction rub, for which the proportion of correct answers was significantly greater among staff physicians than among the other groups. The proportion of correct answers was high for the wheeze and the stridor, fair for the crackles and low for the normal breath sound and the pleural friction rub.

Preference versus lack of skill

Altogether, the raters from the three groups used 66 terms to describe the 5 sound samples. Of these, 17 (25.7%) were correct and were therefore ascribed to preference, while 49 (74.3%) were incorrect and were ascribed to lack of skill. Table 1 lists the correct and incorrect terms for each sound sample.

Table 1

Terms used simultaneously by the three groups of observers in cases of correct and incorrect rating

Correct and incorrect terms by group

Sound sample #1 (Normal breath sound): Of 129 participants describing this sound sample, 20 (15.5%) correctly classified it as ‘normal breath sound’. They used the term ‘normal sound’ on 11 occasions (SP=3; R=4; MS=4) and ‘vesicular sound’ on 7 occasions (SP=3; R=2; MS=2). Two non-conventional terms, namely ‘good air entry’ and ‘alveolar sound’, were considered as conveying the idea of normalcy; they were used by one resident and one medical student, respectively. The 109 observers (84.5%) who failed to identify the sound used 10 terms as follows: crepitation, crackles, bronchial breathing, friction, rhonchi, rubbing, wheeze, rale, gurgling and pleural effusion.

Sound sample #2 (Wheezes): Of 134 participants, 114 (85.1%) correctly identified the wheezes of this sound sample; all—including 24 staff physicians, 52 residents and 38 medical students—used the term wheeze. The 20 observers (14.9%) who failed the identification used 10 terms, namely: stridor, rhonchi, crepitation, bronchial breathing, pleural friction rub, prolonged expiration, sighing, murmur, systolic murmur and the initials LLL.

Sound sample #3 (Fine crackles): Of 122 participants describing this sound sample, 82 (67.2%)—including 16 staff physicians, 37 residents and 28 medical students—correctly detected the crackles it contained; however, none qualified them as ‘fine’ or specified their location in the respiratory cycle. Correct raters used the term crepitation on 39 occasions, crackles and rales on 18 occasions each and crepitus on 4. Additionally, three participants used the qualifying term Velcro, while one used the term alveoli. The 40 raters (32.8%) who failed to identify the sound used 11 terms, as follows: bronchial breathing, normal sound, snoring, wheeze, rhonchi, bronchial breathing, friction rub, coarse, harsh, air entry and decreased air entry.

Sound sample #4 (Stridor): Of 132 observers describing this sound sample, 112 (84.8%), including 26 staff physicians, 55 residents and 31 medical students, correctly identified the stridor it contained. Of these, 110 (98.2%) used the term stridor, while 1 resident and 1 medical student used the term ‘croup’. The 20 observers (15.2%) who failed the identification used 7 terms: wheeze, cat meow, hoarseness, whistling, speech, bronchospasm and musical.

Sound sample #5 (Pleural friction rub): Of 108 observers participating, 14 (13%)—including 10 physicians, 3 residents and 1 medical student—correctly identified the pleural friction rub of the sample. They used the term friction rub on six occasions, pleural rub on five, pleural friction rub on two and pleural crackle on one. The 94 subjects (87%) who failed the sound identification used 11 terms as follows: crackle, crepitation, rale, rhonchi, wheeze, bronchial breathing, pericardial friction, consolidation, bronchial, pericardial rub and pulmonary ‘longesk’.
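The overall identification rates quoted for the five samples follow directly from the reported counts. The short Python sketch below recomputes them as an arithmetic check; the variable and function names are illustrative, and only the totals and numbers of correct raters stated in the text are used.

```python
# (participants describing the sample, correct identifications), as reported in the text
counts = {
    "normal breath sound": (129, 20),
    "wheezes": (134, 114),
    "fine crackles": (122, 82),
    "stridor": (132, 112),
    "pleural friction rub": (108, 14),
}


def correct_rate(total: int, correct: int) -> float:
    """Percentage of correct identifications, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


rates = {sound: correct_rate(total, correct) for sound, (total, correct) in counts.items()}

for sound, rate in rates.items():
    total, correct = counts[sound]
    print(f"{sound}: {correct}/{total} correct = {rate}%; incorrect = {total - correct}")
```

The same helper can be reused to check the agreement figures, for example 107 of 140 occasions of three-group agreement.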

Three-group agreement on correct and incorrect terms

Table 1 shows that, on 107 of 140 possible occasions (76.4%)—corresponding to the total number of responses provided by the smallest group, that is, staff physicians—the three groups used identical terms to describe the sound samples. However, further analysis showed that on 70 such occasions (65.4%), the agreement concerned correct terms, while on 37 (34.6%), it involved incorrect terms. Among correct terms, the most used were stridor (on 26 occasions) and wheeze (on 24 occasions). Among incorrect terms, the most used were crepitation (on nine occasions), crackles (on six occasions) and rhonchi (on six occasions).


This study showed that the ability of staff physicians, residents and medical students to describe five common lung sounds was similar across the groups and varied with the nature of the sound: it was high for the wheezes and the stridor, fair for the crackles and poor for the normal breath sound and the pleural friction rub. Second, the study showed that, for a given sound, whatever the group, the number of terms used by incorrect raters was almost three times that used by correct raters. Third, the study found that the terms used in roughly one-third of instances of complete intergroup agreement were incorrect. Finally, it showed that correct raters used the recommended terminology rather parsimoniously.

The observers’ high ability to identify the wheezes and the stridor was not surprising. Owing to their long duration (typically >80–100 ms), tonal quality and relatively high amplitude,12–15 adventitious musical sounds are easier for the human ear to recognise than non-musical sounds of shorter duration (typically 5–15 ms) and lower amplitude such as, for instance, the crackles.16 17

Several factors could have hampered the observers’ auscultatory skill. First and foremost was the approach to auscultation. Use by some observers of curious terms such as ‘cat meow’, ‘whistling’, ‘gurgling’ and ‘good air entry’ is reminiscent of the onomatopoeic approach used by Laennec,1 although less ingeniously. More rewarding could have been the approach proposed by Forgacs,12 whereby the auscultated sounds are related to other clinical parameters (eg, body position, phases of the respiratory cycle, site of auscultation and so on). By adopting it, our observers might have noticed that the sound energy of the normal breath sound was essentially inspiratory, thus avoiding its mistaken classification (made by 17 observers) as bronchial breathing, which is usually perceived in the expiratory phase as well.18 19 Similarly, the description of the crackles could have been more precise had the observers considered that fine crackles are typically heard on mid-to-late inspiration, while coarse crackles tend to predominate on early inspiration,11 20 a feature unnoticed by the 134 observers classifying that sample. Finally, the performance in the identification of the pleural friction rub could have been improved had the raters noticed that this sound (i) was biphasic, with the inspiratory and expiratory sequence of sounds mirroring one another; (ii) was of much higher amplitude than the crackles and (iii) had been recorded over the axillary region, three characteristic features of the pleural friction rub.11 20

Another factor that might have hampered accuracy in identifying the sounds was the fact that audio listening deprives the observers of the clinical context. Indeed, the possibility of auscultating over different chest regions and during different manoeuvres (eg, deeper inspirations) could have helped in the identification of certain sounds, especially the normal breath sound.

A third and final factor was the possibility of artefacts (eg, hissing sounds or ambient sounds commonly heard in the clinical setting, such as ventilators) that could have caused confusion. In this respect, our sound files had been filtered previously for the purpose of another publication.11 Furthermore, had they been present, artefacts could not entirely explain the results: for instance, while they could have caused the mistaken classification of the normal breath sound as crackles, they could hardly explain the classification of the crackles as normal sound.

Use of incorrect terms by the three groups was consistent with data from the literature. In an earlier study, Wilkins and colleagues6 found that only 32% of pulmonary physicians (n=233) and 11% of physicians with other specialties (n=54) were able to identify the pleural friction rub of their sound sample #6. That study also found that 20% of the physicians surveyed failed to identify the normal breath sound of sample sound #7. However, the authors gave no information about the incorrect terms used. Most recently, Hafke-Dys and colleagues10 asked 185 observers, including physicians and medical students, to listen to audio recordings of 24 respiratory sounds and match their assessment with descriptions provided by a team of specialists. Consistent with our findings, they found that the percentage of correct rating was low, the best results being noticed for wheezes (40%) and the stridor (30%). Interestingly, their observers used practically the whole list of proposed terms to describe the sound samples. As a result, on many occasions, the sound recordings were matched with incorrect terms. To quote but a few examples, sound samples #3 and #6—labelled by the experts as ‘normal breath sound’—were matched by many observers with terms like ‘wheeze’, ‘rhonchi’ or ‘coarse crackles’; similarly, samples #2 and #17—labelled as ‘inspiratory wheezes’—were matched with several terms, including ‘normal breath sound’, ‘prolonged expiration’ and ‘crackles’.

The high rate (35%) of three-group agreement on incorrect terms has clinical relevance. In daily practice, especially in ward visits, a circumstance where sound recording and analysis are usually not readily available, auscultatory doubts are usually solved by consensus among two or more observers, especially senior physicians. In this setting, complete agreement on incorrect terms can carry detrimental consequences to the patients. For instance, had it occurred in real patients, our observers’ agreement on the occurrence of non-existent crackles and bronchial breathing—reported on 15 occasions (table 1)—could have motivated the ordering of unnecessary and costly examinations—including chest X-rays and CT scans—or treatments.

Use of recommended terms3 4 by correct raters was suboptimal. The best results involved the use of the terms ‘wheeze’ and ‘stridor’ by virtually 100% of raters describing sound samples #2 and #4. In contrast, the recommended term ‘normal breath sound’ was used by only 55% of correct raters, 35% of them preferring the old, tenacious ‘vesicular sound’. In the same vein, the traditional term ‘pleural friction rub’ was used by only 2 of the 14 correct raters, the others preferring combinations of the terms ‘pleural’, ‘friction’ and ‘rub’ and, on one occasion, ‘crackle’. Finally, more problematic was the use of no fewer than six correct terms to describe the crackles. Illustrating the need for more effort to encourage the use of standardised terminology, the inappropriate term ‘crepitation’ was preferred by about 50% of raters, while the recommended term ‘crackle’ and the traditional term ‘rale’ were each used by merely 20% of observers.

It may be argued that our definition of correct answer artificially decreased the number of terms used by correct raters. We do not believe this to be so. First, the only way to define a correct answer is by comparison with some sort of gold standard. Instead of expert opinion, preferred by others,8 we opted for a comparison with computer-based sound analysis.11 Second, in our view, the smaller number of terms used by correct raters resulted at least partly from the fact that terminology is not totally independent of auscultatory skill. Since, by definition, correct raters are familiar with the sounds, it is plausible that they picked their descriptive terms from the recommended list likely known to them, which is fairly small;3 4 this explanation fits the high rates of correct use of the terms ‘wheezes’ and ‘stridor’ quoted above. Similarly, as they were likely unaware of recommendations, less skilled observers would tend to choose terms from a virtual, imaginary list. Incidentally, our (intentional) choice of spontaneous sound identification may have maximised this tendency. Finally, it may be argued that, because it was recruited from the same hospital, our group of observers was relatively homogeneous, weakening the results. However, to compensate for this factor, we took care to choose a group quite heterogeneous in terms of background and clinical experience, not to mention education in medical schools from different countries.

The current data have clinical relevance. They show that in a group of observers typical of a university-based hospital, lack of auscultatory skill surpassed observers’ preferences as a factor determining the choice of sound terminology. If generalisable, this finding would suggest that recommendations for terminology alone are unlikely to ensure the universal adoption of a standardised nomenclature, even one as simple as that of ‘crackles’ and ‘wheezes’ for discontinuous and continuous adventitious sounds respectively, proposed by Robertson and Coope more than 60 years ago.21 Instead, our data strongly suggest that, to improve their efficiency, recommendations for terminology should be coupled with measures aiming to improve education on lung auscultation. In practice, this could be achieved, for instance, via the development and wide dissemination of computer-assisted learning tools (CALTs) for use by educators and in self-directed learning. Ideally, these CALTs should address the basic concepts of normal and adventitious sounds, and contain real clinical cases enriched by sound recordings and graphic display of waveforms for analysis. Among the existing tools, the recently developed Computerized Lung Auscultation—Sound Software (CLASS)22 meets the required features and, apparently, has the potential to achieve this goal.


The authors thank Dr Steve Kraman for his encouraging comments and revision of the manuscript; Mr Yossi Freier-Dror for statistical revision and comments; the group of raters for their participation and Mrs Yael Bitan for her technical support.



  • Contributors Original idea/study design: AB. Data collection: HA, AJ, GK, YB. Data interpretation: AB, HA, GI. Grant application: GI. Drafting: AB. Critical review: All authors. All contributors meet the criteria for authorship.

  • Funding The study was supported by an unrestricted grant from GSK, Israel.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data sharing not applicable as no datasets generated and/or analysed for this study. No data are available. All data relevant to the study are included in the article or uploaded as supplementary information. Data sharing is not applicable; no humans were included in the study.
