Material and methods
This is a study of interobserver variation between medical doctors in classifying lung sounds from video recordings made at respiratory units in Spain, Greece, Norway, the Netherlands and Canada during 2013. The recording equipment was standardised, using video cameras with external microphone input (Canon Legria HF series, Panasonic HC-X900) and quality electret microphones which were placed in the tubes of standard stethoscopes. More details have already been published.16 Each centre obtained ethics approval at its own institution and prepared consent forms in the format required there.
The quality of 80 video recordings was rated by five of the six task force members, leaving out the member who had made the recording in question, and they recommended whether or not to include the recording in the repository. Twenty recordings, 10 of children and 10 of adults, were judged to be of sufficient quality. The recordings, each of approximately 15 s duration, were classified independently by the six task force members on an online questionnaire (FluidSurveys.com, Ottawa, Canada). Five of the task force members were paediatricians and one was a general practitioner (GP). Their ages ranged from 52 to 64 years. Subsequently, the 20 recordings were classified by a convenience sample of six additional physicians who used the same online questionnaire. Three were internationally recognised lung sound researchers aged 59–71 years, among whom one was a paediatrician, and three were Norwegian GPs aged 38–49 years. The latter had postgraduate clinical experience of 6 years or more, but no exposure to lung sound research. The observers downloaded the video files to their computers and were asked to classify the recordings according to the recommended English-language nomenclature (fine and coarse crackles, high-pitched and low-pitched wheezes).8 We also included the category of rhonchi, although it is used interchangeably with low-pitched wheezes.10 Because each of these five sound categories could be reported separately for the inspiratory and the expiratory phase, the questionnaire offered 10 non-exclusive choices. The questionnaire also offered free-text options to describe adventitious sounds by other terms.
All observers reported normal hearing capacity, except for one 71-year-old expert who reported some trouble with speech perception at higher frequencies, but not with lung sounds, which are typically below 1000 Hz. No instructions were given regarding the volume setting for audio playback.
Statistical analysis
Agreement among the majority of observers (seven or more of the 12) on each of the 10 terms in each of the 20 cases was identified. To further evaluate this interobserver agreement, Fleiss' multirater kappa (κ) with 95% CIs was calculated. The κ values were interpreted as follows: 0–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect agreement.17 After calculating agreement on the 10 predefined categories, multirater κ was recalculated after combining fine and coarse crackles into a common category, then rhonchi with high-pitched and low-pitched wheezes, and finally the respective inspiratory and expiratory sounds into simply crackles and wheezes. Agreement in reporting sounds other than the 10 fixed options was evaluated for sounds reported by seven or more observers to be present in the same case. We also calculated Fleiss' κ among the paediatricians and among the observers familiar with examining adults, in subsamples of child and adult patients. SPSS V.22 and the R statistical package were used in the analyses.
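The steps described above can be illustrated with a minimal Python sketch. It is not the code used in the study, which relied on SPSS and R; it assumes that each term is stored as a present/absent (1/0) matrix with one row per case and one column per observer, and that merged categories count as present when either component was ticked. All function names and the toy data are hypothetical, and the 95% CIs are not covered.

```python
from math import nan
from typing import List

Matrix = List[List[int]]  # ratings[case][observer], 1 = present, 0 = absent


def majority_cases(ratings: Matrix, threshold: int = 7) -> List[int]:
    """Indices of cases where at least `threshold` observers ticked the term."""
    return [i for i, row in enumerate(ratings) if sum(row) >= threshold]


def combine(a: Matrix, b: Matrix) -> Matrix:
    """Merge two terms (assumption: present if either one was ticked)."""
    return [[int(x or y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def to_counts(ratings: Matrix) -> Matrix:
    """Per-case counts in the form [n_absent, n_present]."""
    return [[len(row) - sum(row), sum(row)] for row in ratings]


def fleiss_kappa(counts: Matrix) -> float:
    """Fleiss' kappa; counts[i][j] = raters assigning case i to category j."""
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every case needs the same number (>= 2) of raters")
    n_categories = len(counts[0])
    total = n_cases * n_raters
    # Overall proportion of all ratings that fall in each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Observed agreement within each case.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_cases          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    if p_e == 1.0:
        return nan  # undefined when every rating falls in one category
    return (p_bar - p_e) / (1.0 - p_e)


def interpret(kappa: float) -> str:
    """Label a kappa value using the scale quoted in the text."""
    if kappa < 0:
        return "poor"  # below the range quoted in the text
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Invented toy data: 4 cases, 12 observers, two terms.
    fine = [[1] * 9 + [0] * 3, [0] * 12, [1] * 4 + [0] * 8, [1] * 12]
    coarse = [[0] * 12, [0] * 12, [0] * 4 + [1] * 5 + [0] * 3, [1] * 12]
    crackles = combine(fine, coarse)
    kappa = fleiss_kappa(to_counts(crackles))
    print(majority_cases(crackles), round(kappa, 2), interpret(kappa))
```

Treating each term as a two-category (present/absent) rating is one plausible way to apply Fleiss' κ to non-exclusive choices; other encodings would give different values.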