
Expert artificial intelligence-based natural language processing characterises childhood asthma
  1. Hee Yun Seol1,2,
  2. Mary C Rolfes3,
  3. Wi Chung1,
  4. Sunghwan Sohn4,
  5. Euijung Ryu5,
  6. Miguel A Park6,
  7. Hirohito Kita6,
  8. Junya Ono7,
  9. Ivana Croghan8,
  10. Sebastian M Armasu5,
  11. Jose A Castro-Rodriguez9,
  12. Jill D Weston1,
  13. Hongfang Liu4 and
  14. Young Juhn1
  1. Community Pediatrics and Adolescent Medicine, Mayo Clinic, Rochester, Minnesota, USA
  2. Pusan National University Yangsan Hospital, Yangsan, Republic of Korea
  3. Mayo Clinic Alix School of Medicine, Rochester, Minnesota, USA
  4. Digital Health Sciences, Mayo Clinic, Rochester, MN, United States
  5. Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, United States
  6. Allergic Diseases, Mayo Clinic, Rochester, MN, United States
  7. Research and Development Unit, Shino-Test Corporation, Sagamihara, Japan
  8. Department of Medicine, Mayo Clinic, Rochester, MN, United States
  9. School of Medicine, Pontificia Universidad Católica de Chile, Santiago, Chile
  Correspondence to Dr Young Juhn; juhn.young{at}


Introduction The lack of effective, consistent, reproducible and efficient asthma ascertainment methods leads to inconsistent asthma cohorts and inconsistent findings in clinical trials and other studies. We aimed to assess whether applying expert artificial intelligence (AI)-based natural language processing (NLP) algorithms for two existing asthma criteria to the electronic health records of a paediatric population systematically identifies childhood asthma and its subgroups with distinctive characteristics.

Methods Using the 1997–2007 Olmsted County Birth Cohort, we applied validated NLP algorithms for the Predetermined Asthma Criteria (NLP-PAC) and the Asthma Predictive Index (NLP-API). We categorised subjects into four groups (both criteria positive (NLP-PAC+/NLP-API+); PAC positive only (NLP-PAC+ only); API positive only (NLP-API+ only); and both criteria negative (NLP-PAC−/NLP-API−)) and characterised each group. Results were replicated using unsupervised cluster analysis in asthmatic children and using laboratory and pulmonary function test (PFT) data from a random sample of 300 children.
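The four-way grouping is a cross-classification of two binary results. The minimal sketch below assumes each child already has a boolean NLP-PAC result and a boolean NLP-API result; the NLP algorithms themselves are not shown, and the names `AsthmaGroup`, `classify` and the example records are hypothetical illustrations rather than the study's code.

```python
from collections import Counter
from enum import Enum


class AsthmaGroup(Enum):
    """The four mutually exclusive groups defined by the two criteria."""
    BOTH_POSITIVE = "NLP-PAC+/NLP-API+"
    PAC_ONLY = "NLP-PAC+ only"
    API_ONLY = "NLP-API+ only"
    BOTH_NEGATIVE = "NLP-PAC−/NLP-API−"


def classify(pac_positive: bool, api_positive: bool) -> AsthmaGroup:
    """Assign one child to a group from two (assumed) boolean NLP outputs."""
    if pac_positive and api_positive:
        return AsthmaGroup.BOTH_POSITIVE
    if pac_positive:
        return AsthmaGroup.PAC_ONLY
    if api_positive:
        return AsthmaGroup.API_ONLY
    return AsthmaGroup.BOTH_NEGATIVE


# Hypothetical records: one (pac_positive, api_positive) pair per child
records = [(True, True), (True, False), (False, True), (False, False), (False, False)]
group_counts = Counter(classify(pac, api) for pac, api in records)

for group in AsthmaGroup:
    print(f"{group.value}: {group_counts[group]}")
```

Because every child falls into exactly one branch, the four group counts always sum to the cohort size.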

Results Of the 8196 subjects (51% male, 80% white), we identified 1614 (20%) as NLP-PAC+/NLP-API+; 954 (12%) as NLP-PAC+ only; 105 (1%) as NLP-API+ only; and 5523 (67%) as NLP-PAC−/NLP-API−. Asthmatic children classified as NLP-PAC+/NLP-API+ showed earlier-onset asthma, a more Th2-high profile, poorer lung function, more frequent asthma exacerbations and a higher risk of asthma-associated comorbidities than the other groups. These results were consistent with those from the unsupervised cluster analysis and from the laboratory and PFT data of a random sample of study subjects.
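The four group sizes partition the cohort exactly (1614 + 954 + 105 + 5523 = 8196). The short check below recomputes the reported percentages from those counts; it is an arithmetic illustration only, and the `percentages` helper is hypothetical, not part of the study's analysis.

```python
# Group sizes as reported in the Results
counts = {
    "NLP-PAC+/NLP-API+": 1614,
    "NLP-PAC+ only": 954,
    "NLP-API+ only": 105,
    "NLP-PAC−/NLP-API−": 5523,
}


def percentages(group_sizes: dict[str, int]) -> dict[str, int]:
    """Return each group's share of the total, rounded to a whole percent."""
    total = sum(group_sizes.values())
    return {group: round(100 * n / total) for group, n in group_sizes.items()}


print(sum(counts.values()))  # 8196
print(percentages(counts))   # 20, 12, 1 and 67 per cent, in the order above
```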

Conclusion Expert AI-based NLP algorithms for two asthma criteria systematically identify childhood asthma subgroups with distinctive characteristics. This approach may improve the precision, reproducibility, consistency and efficiency of large-scale clinical studies of asthma and enable population management.

  • asthma
  • asthma epidemiology
  • paediatric asthma

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:



  • Contributors Study concept and design: HL, WC, SS, ER, MAP, HK, IC and YJ; acquisition, analysis or interpretation of data: HYS, MCR, HL, WC, SS, ER, JO, SMA, JDW and YJ; drafting of the manuscript: HYS, MCR, WC, HL and YJ; critical revision of the manuscript for important intellectual content: HYS, MCR, HL, WC, SS, ER, MAP, HK, JO, IC, SMA, JAC-R, JDW and YJ; statistical analysis: WC, ER and SMA; study supervision: HL, WC, SS, ER, MAP, HK and YJ.

  • Funding National Institutes of Health (NIH)-funded R01 grant (R01 HL126667), R21 grants (R21AI116839-01 and R21AI142702) and the T. Denny Sanford Pediatric Collaborative Research Fund. The study also used the resources of the Rochester Epidemiology Project (R01-AG34676), funded by the National Institute on Aging, and CTSA Grant Number UL1 TR000135 from the National Center for Advancing Translational Sciences.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Ethics approval The study protocol was approved by the Institutional Review Board (IRB) at Mayo Clinic (14-009934).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement No data are available. The datasets generated and/or analysed during the current study are not publicly available because they include protected health information. Access to the data may be discussed in accordance with institutional policy, subject to approval by the IRB at Mayo Clinic.