Discussion
We report the performance of a reading panel applying the WHO chest radiograph interpretation methodology to 9723 images obtained from rural Bangladeshi children aged 3–35 months with possible community-acquired pneumonia. Overall, our results demonstrate high image quality and effective application of the WHO methodology by the reading panel, and they show that WHO-defined PEP is common among rural Bangladeshi children with signs and symptoms of acute respiratory illness.
In addition to raw percentage agreement, we analysed interobserver and intraobserver agreement using two additional approaches: the Cohen’s kappa statistic and the prevalence and bias-adjusted kappa. In most cases, the two estimates differed. The Cohen’s kappa statistic quantifies agreement beyond that expected by chance, expressed as a proportion of the maximum possible agreement beyond chance.14 Its magnitude can be influenced by two factors: the prevalence of the condition of interest (PEP) and bias between observers (here, the primary readers).13 When prevalence is low, the probability that two readers will agree simply by chance is higher, which lowers the magnitude of kappa.15 Although PEP was common in our study from an epidemiological standpoint, as we discuss below, its prevalence was low from the perspective of the kappa statistic. In statistics, bias is the systematic tendency to overestimate or underestimate the true prevalence.15 For kappa, where the true prevalence is unknown, bias may be estimated as the difference between the overall prevalence estimates of the two individuals whose ratings are being compared.13 15 If this difference between two readers is small, bias is low; if it is large, bias is high.15 For a given level of observed agreement, low bias yields a lower Cohen’s kappa than high bias does.15 In our study, bias is likely negligible because our eight primary readers were randomly assigned to the roles of first and second primary reader, so the prevalence estimates for the two raters in the calculation are expected to be nearly identical.
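To make the prevalence and bias effects concrete, the Python sketch below computes Cohen’s kappa from a 2×2 table of two readers’ PEP calls, together with the prevalence index (|a − d|/n) and bias index (|b − c|/n) described by Byrt, Bishop and Carlin. The class name `AgreementTable` and all cell counts are hypothetical illustrations, not data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgreementTable:
    """Hypothetical 2x2 table of two readers' binary PEP calls.

    a: both readers call PEP          b: reader 1 PEP, reader 2 no PEP
    c: reader 1 no PEP, reader 2 PEP  d: both readers call no PEP
    """

    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def observed_agreement(self) -> float:
        # Proportion of images on which the two readers agree.
        return (self.a + self.d) / self.n

    def chance_agreement(self) -> float:
        # Agreement expected if each reader called PEP independently
        # at their own marginal rate.
        both_pep = (self.a + self.b) * (self.a + self.c)
        both_no = (self.c + self.d) * (self.b + self.d)
        return (both_pep + both_no) / self.n**2

    def cohen_kappa(self) -> float:
        po, pe = self.observed_agreement(), self.chance_agreement()
        return (po - pe) / (1 - pe)

    def prevalence_index(self) -> float:
        # |a - d| / n: 0 when PEP and no-PEP agreements are balanced.
        return abs(self.a - self.d) / self.n

    def bias_index(self) -> float:
        # |b - c| / n: the difference between the two readers' PEP rates.
        return abs(self.b - self.c) / self.n


if __name__ == "__main__":
    # Illustrative counts only; each table has 80% observed agreement (n = 100).
    tables = {
        "balanced prevalence": AgreementTable(a=40, b=10, c=10, d=40),
        "low prevalence": AgreementTable(a=10, b=10, c=10, d=70),
        "low prevalence + bias": AgreementTable(a=10, b=20, c=0, d=70),
    }
    for label, t in tables.items():
        print(
            f"{label}: po={t.observed_agreement():.2f} "
            f"kappa={t.cohen_kappa():.3f} "
            f"PI={t.prevalence_index():.2f} BI={t.bias_index():.2f}"
        )
```

All three hypothetical tables share 80% observed agreement, yet kappa falls from 0.600 with balanced prevalence to 0.375 when PEP is uncommon, and rises to about 0.412 when the same low-prevalence table carries reader bias.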
As its name suggests, the prevalence and bias-adjusted kappa is an adaptation of the Cohen’s kappa statistic that accounts for the effects of both prevalence and bias.13 This adjusted kappa should not be interpreted in isolation, but rather in terms of whether, and by how much, it differs in magnitude from the Cohen’s kappa.15 In this study, the prevalence and bias-adjusted kappa was frequently higher than the Cohen’s kappa. This difference implies that the Cohen’s kappa was likely lowered by both low prevalence (from the perspective of kappa) and low reader bias. The prevalence and bias-adjusted kappa also facilitates comparisons between studies with different underlying disease prevalence and/or reader tendencies.
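For a binary rating, the prevalence and bias-adjusted kappa (PABAK) reduces to 2po − 1, which is kappa with chance agreement fixed at 0.5. Cohen’s kappa can then be written as κ = (PABAK − PI² + BI²)/(1 − PI² + BI²), where PI and BI are the prevalence and bias indices. The sketch below checks this identity on hypothetical counts; the function names are illustrative assumptions.

```python
def agreement_indices(a: int, b: int, c: int, d: int) -> dict:
    """Kappa, PABAK and prevalence/bias indices for a 2x2 table of two readers' calls.

    a: both PEP, b: reader 1 only, c: reader 2 only, d: both no PEP.
    """
    n = a + b + c + d
    po = (a + d) / n
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return {
        "kappa": (po - pe) / (1 - pe),
        "pabak": 2 * po - 1,  # kappa with chance agreement fixed at 0.5
        "pi": (a - d) / n,    # prevalence index (sign irrelevant once squared)
        "bi": (b - c) / n,    # bias index
    }


def kappa_from_pabak(pabak: float, pi: float, bi: float) -> float:
    """Cohen's kappa expressed through PABAK, the prevalence index and the bias index."""
    return (pabak - pi**2 + bi**2) / (1 - pi**2 + bi**2)


if __name__ == "__main__":
    # Hypothetical counts: low PEP prevalence without and with reader bias.
    for cells in [(10, 10, 10, 70), (10, 20, 0, 70)]:
        r = agreement_indices(*cells)
        print(
            f"{cells}: kappa={r['kappa']:.3f} PABAK={r['pabak']:.3f} "
            f"kappa via identity={kappa_from_pabak(r['pabak'], r['pi'], r['bi']):.3f}"
        )
```

The identity shows why the adjusted kappa exceeds Cohen’s kappa when prevalence is imbalanced and bias is small: a large prevalence index subtracts PI² from both numerator and denominator, while a near-zero bias index adds almost nothing back. When PI and BI are both zero, the two statistics coincide.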
Based on raw percentage agreement, Cohen’s kappa statistic, and the prevalence and bias-adjusted kappa, the performance of this study’s reading panel compares favourably with that reported by other studies applying the WHO methodology. In the Pneumonia Aetiology Research for Child Health (PERCH) study, a case–control child pneumonia aetiology study from seven low-income settings, primary readers agreed on the presence or absence of PEP in 77.8% of 3497 interpretable images (unadjusted Cohen’s kappa, 0.50; prevalence and bias-adjusted kappa, 0.56).11 In comparison, primary readers in our study achieved 79.0% interobserver agreement (unadjusted kappa, 0.35; adjusted kappa, 0.58). As a measure of interpretation reproducibility, PERCH readers achieved 91% intraobserver agreement (unadjusted kappa, 0.82) for the presence or absence of PEP.11 By comparison, our readers achieved 85.1% intraobserver agreement (unadjusted kappa, 0.68). In a seminal study that assessed the interpretation performance of a WHO working group against reference chest radiographs interpreted according to the WHO methodology, the working group achieved 84% sensitivity, 89% specificity, and an unadjusted Cohen’s kappa of 0.65 for clinicians and 0.73 for radiologists detecting PEP.5 The expert reader in our study is a member of the global WHO working group for Chest Radiography in Epidemiological Studies.7 Using the expert reader’s interpretation as the reference standard, our panel achieved a sensitivity of 77%, a specificity of 96% and an unadjusted Cohen’s kappa of 0.75. Taken together, these results support the conclusion that our panel effectively applied the WHO interpretation methodology.
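Because the binary prevalence and bias-adjusted kappa equals 2po − 1, the adjusted values quoted above can be reproduced from the raw agreement figures, and rearranging κ = (po − pe)/(1 − pe) gives the chance agreement pe implied by each reported kappa. The sketch below applies this to the interobserver figures for PERCH and for this study; since the published kappas are rounded, the implied chance agreement is approximate, and the helper names are illustrative.

```python
# Interobserver agreement on PEP presence/absence, as reported in the text.
REPORTED = {
    "PERCH": {"po": 0.778, "kappa": 0.50, "pabak": 0.56},
    "this study": {"po": 0.790, "kappa": 0.35, "pabak": 0.58},
}


def pabak_from_agreement(po: float) -> float:
    """Binary prevalence and bias-adjusted kappa: 2 * po - 1."""
    return 2 * po - 1


def implied_chance_agreement(po: float, kappa: float) -> float:
    """Solve kappa = (po - pe) / (1 - pe) for pe."""
    return (po - kappa) / (1 - kappa)


if __name__ == "__main__":
    for study, r in REPORTED.items():
        pabak = pabak_from_agreement(r["po"])
        pe = implied_chance_agreement(r["po"], r["kappa"])
        print(f"{study}: PABAK={pabak:.2f} (reported {r['pabak']:.2f}), implied pe={pe:.2f}")
```

The recomputed adjusted kappas match the reported 0.56 and 0.58. The higher implied chance agreement in our study (about 0.68 vs 0.56) is consistent with its lower unadjusted kappa despite similar raw agreement.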
Notably, the proportion of images classified as PEP increased by 28.1% in relative terms (from 16.6% to 21.3%) after we added a second arbitrator to our interpretation schema, which may have important implications for our PCV effectiveness analyses. Other studies have also shown that the number of arbitrators used to interpret chest radiographs, as well as the types of images obtained, can influence final imaging conclusions. In PERCH, for example, a schema with two arbitrators, compared with one arbitrator, changed the final reading panel conclusion in 27.5% of 4172 images.11 Given these findings, we plan to use radiographic conclusions from both methods, method 1 (one arbitrator) and method 2 (two arbitrators), in our case–control and incidence trend PCV effectiveness analyses.
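The 28.1% figure is a relative change rather than a difference in percentage points. The sketch below contrasts the two using the PEP prevalences reported for method 1 and method 2 (16.6% and 21.3%); from these rounded values the relative increase is about 28.3%, and the small gap from 28.1% presumably reflects rounding of the published percentages. The function names are illustrative.

```python
def absolute_change(before_pct: float, after_pct: float) -> float:
    """Difference in percentage points."""
    return after_pct - before_pct


def relative_change(before_pct: float, after_pct: float) -> float:
    """Change expressed as a percentage of the starting value."""
    return 100 * (after_pct - before_pct) / before_pct


if __name__ == "__main__":
    method1, method2 = 16.6, 21.3  # PEP prevalence (%) with one vs two arbitrators
    print(f"{absolute_change(method1, method2):.1f} percentage points")
    print(f"{relative_change(method1, method2):.1f}% relative increase")
```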
We found that PEP was common, from an epidemiological perspective, among our cohort of predominantly ambulatory Bangladeshi children aged 3–35 months with clinically suspected pneumonia, regardless of the methodology applied (16.6%, method 1; 21.3%, method 2). PERCH, a primarily hospital-based study, reported that 27% of 3587 interpretable chest radiographs had PEP.16 PERCH also included two sites in Bangladesh with a mix of ambulatory and hospitalised children; the authors reported that about 19% of children at the urban site in Dhaka and 17% at the rural site in Matlab had PEP, consistent with our findings from Sylhet. A recent systematic review of 15 studies from low-income to high-income settings outside the USA reported a prevalence of 37% (95% CI, 26% to 50%) for radiographic pneumonia among children with an acute respiratory illness.17 The four included studies of children <5 years old in Asia highlight the range of radiographic pneumonia prevalence within the region: the authors reported a prevalence of 63% among 541 Chinese children, 15% among 1782 Pakistani children, 29% among 199 Filipino children and 7% among 1396 Thai children. Our findings suggest that radiographic PEP is a common and important condition among children aged 3–35 months in rural Bangladesh.
Limitations of our study include the use of a single frontal chest radiograph rather than both frontal and lateral chest radiographs, and the absence of serial imaging. Our approach was carefully considered and based on three principles. First, the WHO methodology does not recommend lateral chest radiographs or serial imaging, and using either would have been inconsistent with the WHO methodology. Second, evidence supporting the use of lateral imaging in children is mixed. One randomised clinical trial of 570 children seeking care at an emergency department in the USA found that adding a lateral chest radiograph to a frontal chest radiograph did not improve the sensitivity or specificity of clinician-identified radiographic alveolar consolidation.18 However, a non-randomised retrospective study based on radiologist interpretations found conflicting results: its authors reported that adding a lateral chest radiograph improved the identification of non-lobar consolidations by 15% compared with frontal imaging alone.19 Further high-quality research assessing the potential added value of lateral chest radiographs, particularly in low-income settings, is needed. Third, although serial chest radiographs may improve diagnostic sensitivity, they increase ionising radiation exposure, and this study’s ethical review boards would not approve a protocol with serial imaging.
In sum, this analysis demonstrates that our reading panel rigorously and effectively applied the WHO methodology for interpreting chest radiographs in a large cohort of children aged 3–35 months in rural Bangladesh. These findings justify the use of these interpretations in the planned case–control and incidence trend PCV effectiveness evaluations, as well as in other planned epidemiological analyses.