Listening panel agreement and characteristics of lung sounds digitally recorded from children aged 1–59 months enrolled in the Pneumonia Etiology Research for Child Health (PERCH) case–control study

Introduction Paediatric lung sound recordings can be systematically assessed, but methodological feasibility and validity is unknown, especially from developing countries. We examined the performance of acoustically interpreting recorded paediatric lung sounds and compared sound characteristics between cases and controls. Methods Pneumonia Etiology Research for Child Health staff in six African and Asian sites recorded lung sounds with a digital stethoscope in cases and controls. Cases aged 1–59 months had WHO severe or very severe pneumonia; age-matched community controls did not. A listening panel assigned examination results of normal, crackle, wheeze, crackle and wheeze or uninterpretable, with adjudication of discordant interpretations. Classifications were recategorised into any crackle, any wheeze or abnormal (any crackle or wheeze) and primary listener agreement (first two listeners) was analysed among interpretable examinations using the prevalence-adjusted, bias-adjusted kappa (PABAK). We examined predictors of disagreement with logistic regression and compared case and control lung sounds with descriptive statistics. Results Primary listeners considered 89.5% of 792 case and 92.4% of 301 control recordings interpretable. Among interpretable recordings, listeners agreed on the presence or absence of any abnormality in 74.9% (PABAK 0.50) of cases and 69.8% (PABAK 0.40) of controls, presence/absence of crackles in 70.6% (PABAK 0.41) of cases and 82.4% (PABAK 0.65) of controls and presence/absence of wheeze in 72.6% (PABAK 0.45) of cases and 73.8% (PABAK 0.48) of controls. Controls, tachypnoea, >3 uninterpretable chest positions, crying, upper airway noises and study site predicted listener disagreement. Among all interpretable examinations, 38.0% of cases and 84.9% of controls were normal (p<0.0001); wheezing was the most common sound (49.9%) in cases. 
Conclusions Listening panel and case–control data suggest that our methodology is feasible and likely valid, and that small airway inflammation is common in WHO pneumonia. Digital auscultation may be an important future pneumonia diagnostic in developing countries.


Introduction
Paediatric pneumonia is a major cause of global mortality. 1 The WHO case management algorithm for childhood pneumonia was developed to be practical and diagnostically sensitive, so frontline health workers in low-resource settings could empirically treat possible bacterial pneumonia using basic skills. 2 While the algorithm's high sensitivity has increased antibiotic use in children previously untreated for bacterial pneumonia, 3 it comes at the expense of misdiagnosis and likely antibiotic overuse. 4-7

During pneumonia, a complex inflammatory cascade causes the lung's gas exchange units, alveoli, to collapse. 8 When a child with pneumonia inhales, alveoli can explosively reopen, causing popping sounds called crackles. 9 When crackles are present, the likelihood of pneumonia increases. 10 11 Although a traditional stethoscope is inexpensive and can be used to identify crackles, interpretations are hampered by subjectivity between listeners evaluating the same lung sound and by breath-to-breath inspiratory and expiratory variations in children that can change the character of lung sounds during and between examinations. 12 13 Presumably due to these factors, and the challenge of teaching auscultation, the WHO did not incorporate auscultation as a diagnostic into its pneumonia management algorithm for frontline practitioners. 2

Technological advances may help to overcome interpretation inconsistencies by producing high-quality, permanent lung recordings that can be systematically interpreted by humans or computers. 14 15 Modern digital stethoscope designs allow sounds to be transduced with high fidelity and recorded and saved as an audio file. 16 Amplification and filtering techniques can optimise sound quality for acoustic human interpretation and, if computerised acoustic analysis techniques are also applied, visual interpretation. 17 Moreover, mathematical methods can now deconstruct sound data into quantitative patterns for computer analysis, bypassing human interpretation altogether. 18 Automated interpretation of lung sounds with a handheld device could be especially powerful in low-resource settings that lack trained paediatric healthcare providers. Digital auscultation research to date has largely focused on adults in high-income settings, 18 and that research is unlikely to be relevant to the application of digital auscultation to children in low-income countries. If digital auscultation is to have a role in low-resource countries, relevant paediatric research is needed.
We recorded lung sounds with digital stethoscopes from a subset of children aged 1-59 months in six African and Asian sites in the Pneumonia Etiology Research for Child Health (PERCH) study, 19 aiming to determine the feasibility of recording quality lung sounds from children in noisy settings, to develop and assess a method for adjudicating lung sound examinations acoustically interpreted by humans, identify predictors of listener disagreement to inform future research methodology in developing countries and, lastly, describe and compare digitally recorded lung sound characteristics among cases and controls in PERCH.

Patients and Methods
PERCH enrolment
PERCH was a 2-year case-control study of severe childhood pneumonia aetiology in seven countries in Africa and Asia. 19 Eligible cases were hospitalised children aged 1-59 months with WHO-defined severe or very severe pneumonia (panel). 20 Wheezing cases whose chest indrawing resolved after bronchodilators were excluded. 20 Randomly selected, age-matched children were enrolled as community controls if they did not meet the case definition, even if they had respiratory symptoms (panel). 21 Staff were trained on clinical measurements and specimen collection (table 1). 20

Digital auscultation enrolment
This substudy was prospectively conducted during a 14-month period (December 2012-January 2014), at six sites in a subset of PERCH cases and community controls; sampling varied by site, as described below.
Bangladesh
Study physicians enrolled all PERCH cases in Dhaka and Matlab and about five controls per month from September to December 2013. 19

The Gambia
Study physicians enrolled a subset of PERCH cases and controls in Basse, time permitting, from December 2012 to October 2013. 19

Kenya
Nurses and clinical officers enrolled all cases, time permitting, and enrolled controls when digital auscultation-trained staff conducted field visits in Kilifi from December 2012 to November 2013. 19

South Africa
Nurses enrolled cases not requiring mechanical ventilation, except on weekends, and controls were enrolled non-systematically according to staff workload in Soweto between December 2012 and August 2013. 19

Thailand
Nurses enrolled all cases and controls in Sa Kaeo and Nakhon Phanom between March 2013 and January 2014. 19

Zambia
A physician or clinical officer enrolled all cases and the first five controls per month in Lusaka between November 2012 and October 2013. 19

Sound recording procedure
From May to June 2012, we trained staff at the African sites to record lung sounds with a digital stethoscope (ThinkLabs ds32a®), followed by pilot data collection through October 2012. Thai and Bangladeshi staff were trained in January and June of 2013, and piloted procedures for 1 month each. Each training lasted 1 day and covered the equipment, recording and uploading procedures, troubleshooting and supervised practice. Staff recorded one lung examination per child that included lung sounds from nine sequential locations across each child's back (four), axilla (two), chest (two) and cheek, corresponding to all lung lobes and the upper airway (figure 1). Sounds were recorded for >7 s to capture at least two respiratory cycles per location and to limit the entire procedure to about 1 min. An external microphone affixed to the stethoscope recorded ambient noise. Recordings were deidentified and then uploaded from the sound recorder to study servers.
Unwanted ambient noises were removed using a novel automated multiband denoising filter developed and validated by Johns Hopkins University sound engineers (ME, DE and JEW) and physicians. 22

Expert listening panel and lung sound definitions
A listening panel of six paediatricians (WCB, TGP, LG, DO, WPV and CV) and two paediatric-experienced physicians (AD and JM), all highly practised in caring for African and/or Asian children with pneumonia, convened in June 2014 to formulate and refine consensus lung sound definitions for acoustic interpretation (panel). The listening panel consolidated the sound definitions in the American Thoracic Society guidelines in order to make them more pragmatic and clinically applicable for children in low-resource settings with WHO pneumonia. 17 23 In July 2014, the panel was standardised to interpret lung sounds using a library of >100 reference digital lung sounds. Reference recordings were from children aged 1-59 months at Johns Hopkins Hospital in Baltimore, USA, collected immediately after a paediatric pulmonologist (EDM) confirmed the lung examination with a traditional stethoscope, and processed with the same denoising software used for cases and controls in PERCH.

Expert listening panel lung examination results and adjudication process
After denoising, patient digital lung sound examinations were randomly assigned to two panellists (ie, the primary listeners) for acoustic interpretation with Audacity software. Listeners were masked to one another and to patient information, including case and control status. The cheek position was used to assess whether chest sounds were contaminated with vocalisations or upper airway noises like nasal secretions or stridor. For each patient, all eight chest position interpretations constituted one lung examination result: normal, crackle, wheeze, crackle and wheeze or uninterpretable.
For example, if any chest position had a crackle, wheeze or both, then the overall lung examination result included that abnormal designation, even if other positions were normal or uninterpretable. A single chest position was uninterpretable if no full breath sounds could be distinguished. The overall lung examination result was uninterpretable if none of the eight chest positions were interpretable by the listener. If the two primary listeners disagreed on the lung examination result, then a third panellist blinded to previous assessments was randomly selected to independently interpret the lung examination. If the third listener's lung examination result agreed with either of the primary listeners, then the third listener's interpretation was considered final. If not, then one panellist (DO) and a paediatric pulmonologist (EDM) decided the final result by consensus. To estimate intralistener agreement, five per cent of case lung examinations were randomly reassigned to the same primary listener at least 3 months after the initial interpretation.
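The aggregation and adjudication rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's software: the function names `exam_result` and `adjudicate`, the string labels and the `consensus` argument (standing in for the final consensus decision) are all assumptions introduced here.

```python
from typing import Iterable, Optional

# Labels a listener may assign to a single chest position (assumed spelling).
VALID = {"normal", "crackle", "wheeze", "crackle and wheeze", "uninterpretable"}


def exam_result(positions: Iterable[str]) -> str:
    """Combine the eight chest-position interpretations into one examination result."""
    positions = list(positions)
    unknown = set(positions) - VALID
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    has_crackle = any(p in ("crackle", "crackle and wheeze") for p in positions)
    has_wheeze = any(p in ("wheeze", "crackle and wheeze") for p in positions)
    # Any abnormal position drives the overall result, even if the
    # remaining positions are normal or uninterpretable.
    if has_crackle and has_wheeze:
        return "crackle and wheeze"
    if has_crackle:
        return "crackle"
    if has_wheeze:
        return "wheeze"
    # No abnormal sound: normal if at least one position was interpretable.
    if "normal" in positions:
        return "normal"
    return "uninterpretable"


def adjudicate(first: str, second: str,
               third: Optional[str] = None,
               consensus: Optional[str] = None) -> str:
    """Return the final result from two primary listeners, a third listener
    if they disagree, and a consensus decision if the third agrees with neither."""
    if first == second:
        return first
    if third is None:
        raise ValueError("primary listeners disagree: a third listener is needed")
    if third in (first, second):
        return third
    if consensus is None:
        raise ValueError("no agreement among three listeners: consensus is needed")
    return consensus
```

For example, `exam_result(["normal"] * 7 + ["crackle"])` returns `"crackle"`, and `adjudicate("normal", "crackle", third="crackle")` returns the third listener's result, `"crackle"`.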
Institutional review boards at the Johns Hopkins School of Public Health and all local study sites approved this research. All participants provided written informed consent.

Statistical analysis
We evaluated agreement between and within primary listeners after grouping final panel results into dichotomous categories positive or negative for a specific lung sound (ie, any crackle, any wheeze or abnormal (any crackle or wheeze)) and including all lung examinations classified by both primary listeners as interpretable. Agreement was measured using Cohen's kappa statistic and a kappa statistic adjusted for prevalence and listener bias (prevalence-adjusted, bias-adjusted kappa (PABAK)). 24 Agreement strength was interpreted by the scale: ≤0, poor; 0.01-0.19, slight; 0.20-0.39, fair; 0.40-0.59, moderate; 0.60-0.79, substantial; and 0.80-1.0, perfect. 24 To characterise predictors of between-listener disagreement, we used logistic regression models to evaluate associations of demographic and sound file characteristics with primary listener disagreement. Full models retained characteristics associated with disagreement in unadjusted models at the significance level of 0.20. Case and control lung sound recordings were compared using descriptive statistics.
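As a rough illustration of the two agreement measures, the sketch below computes Cohen's kappa and PABAK for two listeners' dichotomous ratings. For a two-category rating, PABAK reduces to 2 × (observed agreement) − 1. The function names and the toy ratings are assumptions made for illustration, and the strength labels simply follow the scale quoted above.

```python
from typing import Sequence, Tuple


def agreement_stats(a: Sequence[int], b: Sequence[int]) -> Tuple[float, float, float]:
    """Return (observed agreement, Cohen's kappa, PABAK) for two binary raters."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    po = sum(1 for x, y in zip(a, b) if bool(x) == bool(y)) / n
    p_a = sum(1 for x in a if x) / n  # proportion rated positive by listener A
    p_b = sum(1 for y in b if y) / n  # proportion rated positive by listener B
    pe = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    kappa = (po - pe) / (1 - pe) if pe < 1 else float("nan")
    pabak = 2 * po - 1  # two-category case
    return po, kappa, pabak


def pabak_from_agreement(po: float) -> float:
    """PABAK for a dichotomous rating, given the observed proportion of agreement."""
    return 2 * po - 1


def strength(k: float) -> str:
    """Map a kappa-type statistic to the strength scale quoted in the text."""
    if k <= 0:
        return "poor"
    if k < 0.20:
        return "slight"
    if k < 0.40:
        return "fair"
    if k < 0.60:
        return "moderate"
    if k < 0.80:
        return "substantial"
    return "perfect"


# Toy ratings (1 = sound present, 0 = absent); not study data.
listener_a = [1, 1, 1, 0, 0, 0, 1, 0]
listener_b = [1, 1, 0, 0, 0, 1, 1, 0]
po, kappa, pabak = agreement_stats(listener_a, listener_b)
```

Applied to the agreement percentages reported in the abstract, `pabak_from_agreement(0.749)` gives about 0.50 and `pabak_from_agreement(0.824)` about 0.65, matching the published PABAK values for any abnormality in cases and crackles in controls.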

Results
A total of 1093 patients (792 cases and 301 controls) had their lung examinations recorded, denoised and evaluated by the listening panel. To evaluate the panel's overall performance when acoustically interpreting digitally recorded lung examinations, we assessed agreement at the primary listener level for cases. For this, we excluded 83 cases with lung examinations considered uninterpretable by either primary listener, even if that examination was later arbitrated to be interpretable by the panel, leaving 709 cases (table 2). We found primary listener agreement, beyond that expected by chance, to be moderate for examinations with or without either crackle and/or wheeze (PABAK 0.50), with or without any crackle (PABAK 0.41) and with or without any wheeze (PABAK 0.45). We also evaluated the interlistener agreement by individual primary listener (table 2). Estimated agreement between the eight primary expert listeners ranged from fair to substantial: PABAKs were 0.40-0.65 for examinations with or without either crackle and/or wheeze, 0.25-0.55 for examinations with or without crackle and 0.31-0.61 for examinations with or without wheeze. While overall intralistener agreement was substantial (PABAK 0.62-0.68), reflecting the reproducibility of examination interpretations, there was modest variability from panellist to panellist, irrespective of the examination result analysed. The panel's overall between-listener PABAKs for control interpretations were in the moderate to substantial range (0.40-0.65), depending on the lung sound model (see online supplementary table 1).
We also sought to identify predictors of primary listener disagreement in 987 cases and controls, after omitting 83 cases and 23 controls with uninterpretable examinations by at least one primary listener (

Discussion
We leveraged the largest paediatric pneumonia aetiology study in nearly 30 years to collect digitally recorded lung sounds from 1093 children with and without pneumonia in six high pneumonia burden African and Asian countries. Our data demonstrate that recording quality digital lung sounds from children in a range of noisy, crowded clinical settings in low-resource countries is feasible. This study also showed that with a panel of standardised paediatric experts, adjudication procedures modelled after the WHO chest radiograph process, 25 and lung sound definitions pragmatically adapted from American Thoracic Society guidelines, 17 23 paediatric digital lung sound examinations can be interpreted acoustically by humans with moderate reliability; case-control comparisons suggest this methodology is valid.

The reliability achieved by this study's expert listening panel compares favourably to paediatric literature from resource-rich settings. Two small studies from the USA and UK examined agreement levels between paediatricians acoustically identifying abnormal lung examinations in children with standard stethoscopes. These authors reported kappas of 0.18-0.70 for wheeze, 0.46 for crackle and 0.30 for all abnormal sounds. 8 26 Our panel achieved moderate agreement for these lung sound categories (kappas of 0.40-0.45, PABAKs of 0.41-0.50). In addition to using a traditional stethoscope, these studies differed from ours by using unstandardised listeners who were not blinded to patients, and they had smaller sample sizes with wider CIs. A recent study of 120 German infants included digital auscultation and examined agreement levels for expiratory wheezing between three blinded physicians. 27 The authors reported moderate agreement (Fleiss' kappa 0.54 (95% CI 0.52 to 0.57)) from recordings collected in a quiet clinical setting. Our listener agreement for wheezing was also moderate (kappa 0.45, PABAK 0.45), despite recording in noisy clinical environments.
After comparison with the published literature, our results suggest that this study's methodology, which includes the application of a novel software program that filters ambient noises from lung recordings, 22 and its interpretation procedures may improve between-listener reliability compared with traditional auscultation, and may achieve reliability comparable to that of lung sounds recorded by digital stethoscopes in quieter environments.

Our study had several unanticipated results. Our panellists found a surprisingly high proportion of controls without ARI (14.2%) to have abnormal digital lung sound examinations. Two factors may explain these findings. First, lung sound recordings from sensitive digital stethoscopes may capture more subclinical abnormalities in healthy subjects compared with acoustic stethoscopes. Subclinical wheezing and crackles, especially in asymptomatic children in developing countries, may reflect ongoing small airway inflammation triggered by asthma or poor air quality, 28 for example, or pre-existing lung damage or resolving inflammation from a prior illness like pneumonia. 29 Non-pathological crackles are also possible and have been reported to frequently occur during inspiration in healthy adults after deeply exhaling to the lung's residual volume. 30 Interestingly, the authors of a recent systematic review also found wheezing in 1%-5% and crackles in 7%-37% of healthy, asymptomatic adults from studies in the USA, but noted a gap in paediatric data, as there were no published studies that included healthy children. 31 Second, our expert listeners were blinded to the case-control status and also to the visual cues that exist during a live patient encounter, and this may have increased false positive examinations. By looking at and listening to the patient at the same time, a clinician can use visual cues to distinguish between wheezes and upper airway sounds such as cries, normal vocalisations or transmitted nasal congestion (ie, the clinician can see the child crying, vocalising or with rhinorrhoea), or between crackles and movement artefact (ie, the clinician can see the child moving), all of which can overlap in their amplitude and frequency profiles. 32

Notably, the majority of control lung sounds were recorded from Thailand (58.8% (167/284)), as this was the only site to enrol all controls into the digital auscultation substudy. While it is possible that the over-representation of controls from Thailand may have further reduced the proportion of abnormal sounds heard among controls (91.0% of lung recordings from Thailand controls were normal), Thailand controls, since they were enrolled consecutively, were less susceptible to selection bias than controls at other PERCH sites. For this reason, we feel the Thailand controls, and therefore the overall control population in this dataset, are more likely to be representative of the true control population.

While PERCH cases all met severe or very severe WHO clinical pneumonia criteria, 38.0% had normal digital lung sound examinations, suggesting an absence of lower respiratory disease. By design, the WHO algorithm for the management of severe and very severe pneumonia is non-specific for pneumonia and, as expected, we found that a significant fraction of the children meeting these definitions do not have auscultatory findings of lower respiratory tract disease. 4-7 This observation is further reflected by over a third of all PERCH cases having a normal chest radiograph. 33 However, selection bias may also have influenced lung sound findings; The Gambia and South Africa sampled cases by convenience, and Thailand, Bangladesh, The Gambia and South Africa enrolled cases into the digital auscultation substudy for less than 12 months, increasing the likelihood of seasonal variation in their data.

Our data did reveal, as expected, that cases had a markedly higher proportion of abnormal lung examinations compared with all controls. Wheezing was decidedly prevalent among cases, regardless of PERCH site, which may imply that small airway inflammation due to viruses and asthma is common in WHO pneumonia. While crackles were also common, they were more frequently heard with wheezes than alone, a pattern characteristic of viral bronchiolitis. 34 Importantly, while this study was not designed to assess the validity of digital lung examinations compared with a reference standard, our finding that the proportion of controls with normal lung sounds was more than twice that of cases suggests that the methodology we developed is valid. Forthcoming analyses will explore the associations between digital lung examinations and aetiological and radiographic disease endpoints to provide a better understanding of how digital auscultation may perform as a respiratory diagnostic in low-resource settings. Additionally, we plan to analyse how interpretations of digital lung recordings compare with interpretations of lung sounds using traditional acoustic stethoscopes. If we find interpretations between these two listening modalities to be comparable, this would support lung recordings as a possible educational tool for healthcare providers who have limited training in lung auscultation but use traditional acoustic stethoscopes to examine children.
Our multivariate models suggest areas of procedural or device limitations that are specific to children and that, if addressed, could improve paediatric lung sound recording quality and listener agreement. For instance, our data suggest that innovations to filter out upper airway noises like vocalisations or nasal secretions are needed. Devices or procedures that are more 'child friendly' may also help practitioners with less paediatric experience soothe the child and collect sounds not contaminated by crying or movements. While not identified as a predictor of disagreement, panellists requested cues to help listeners better identify the respiratory phase as either inspiration or expiration. Other studies have used respiratory belts for this purpose, and this could be explored for feasibility and effectiveness in low-resource settings. 27

In conclusion, this multicountry study provides evidence that quality lung sounds can be recorded from children in noisy clinical environments and interpreted by paediatric experts with moderate reliability. With further research and refinement of digital auscultation technology, recorded lung sounds may eventually play a greater role in the identification, categorisation and management of children with respiratory illness in developing countries.
auscultation study, to the PERCH staff for collecting recordings and to the PERCH study group. The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention. We acknowledge Stavros Papadopoulos for his assistance processing and segmenting files, Drs Niranjan Bhat, William Checkley, Carolyn Scrafford, Luke Mullany, Joanne Katz, Laura Ellington and Jim Tielsch for significant contributions to the design and development of the project and to Ms Azwi Mudau for conducting data collection activities in Soweto.
Contributors EDM conceptualised and designed the study, provided overall supervision of data collection, supervised the expert listening panel, collected lung sound reference sounds, drafted the initial manuscript, reviewed and revised the manuscript and approved the final manuscript as submitted. DEP conceptualised and designed the study, reviewed and revised the manuscript and approved the final manuscript as submitted. NLW conceptualised and designed the study, carried out the analyses, reviewed and revised the manuscript and approved the final manuscript as submitted. WCB interpreted lung sounds, reviewed and revised the manuscript and approved the final manuscript as submitted. CB supervised data collection at one site, reviewed and revised the manuscript and approved the final manuscript as submitted. AD interpreted lung sounds, reviewed and revised the manuscript and approved the final manuscript as submitted. BEE supervised data collection at one site, reviewed and revised the manuscript and approved the final manuscript as submitted. ME supervised denoising of the lung sounds, reviewed and revised the manuscript and approved the final manuscript as submitted. DE denoised the lung sounds, reviewed and revised the manuscript and approved the final manuscript as submitted. AJG-P interpreted lung sounds, reviewed and revised the manuscript and approved the final manuscript as submitted.
LG interpreted lung sounds, reviewed and revised the manuscript and approved the final manuscript as submitted. LH supervised data collection at one site, reviewed and revised the manuscript and approved the final manuscript as submitted. SAM supervised data collection at one site, reviewed and revised the manuscript and approved the final manuscript as submitted. DPM supervised data collection at one site, reviewed and revised the manuscript and approved the final manuscript as submitted. JM collected lung sounds at one site, interpreted lung sounds, reviewed and revised the manuscript and approved the final manuscript as submitted. DO interpreted lung sounds, reviewed and revised the manuscript and approved the final manuscript as submitted. JOA supervised data collection at one site, reviewed and revised the manuscript and approved the final manuscript as submitted. WPV interpreted lung sounds, reviewed and revised the manuscript and approved the final manuscript as submitted. CV interpreted lung sounds, reviewed and revised the manuscript and approved the final manuscript as submitted. JEW supervised denoising of the lung sounds, reviewed and revised the manuscript and approved the final manuscript as submitted. MDK supervised the overall PERCH study and digital auscultation study, reviewed and revised the manuscript and approved the final manuscript as submitted. KLO supervised the overall PERCH study and digital auscultation study, reviewed and revised the manuscript and approved the final manuscript as submitted. DRF supervised the overall PERCH study and digital auscultation study, conceptualised and designed the digital auscultation study, reviewed and revised the manuscript and approved the final manuscript as submitted. LLH supervised the overall PERCH study and digital auscultation study, conceptualised and designed the digital auscultation study, reviewed and revised the manuscript and approved the final manuscript as submitted.