Estimating prevalence of rare genetic disease diagnoses using electronic health records in a children's hospital.

  • Additional Information
    • Source:
      Publisher: Elsevier Inc Country of Publication: United States NLM ID: 101772885 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 2666-2477 (Electronic) Linking ISSN: 26662477 NLM ISO Abbreviation: HGG Adv Subsets: MEDLINE
    • Publication Information:
      Original Publication: New York : Elsevier Inc., [2020]-
    • Abstract:
      Rare genetic diseases (RGDs) affect a significant number of individuals, particularly in pediatric populations. This study investigates the efficacy of identifying RGD diagnoses through electronic health records (EHRs) and natural language processing (NLP) tools, and analyzes the prevalence of the identified RGDs for potential underdiagnosis at Cincinnati Children's Hospital Medical Center (CCHMC). EHR data from 659,139 pediatric patients at CCHMC were used. Diagnoses corresponding to RGDs in Orphanet were identified using rule-based and machine learning-based NLP methods. Manual evaluation assessed the precision of each NLP strategy, with 100 diagnosis descriptions reviewed per method. The rule-based method achieved a precision of 97.5% (95% CI: 91.5%, 99.4%), while the machine learning-based method had a precision of 73.5% (95% CI: 63.6%, 81.6%). A manual chart review of 70 randomly selected patients with RGD diagnoses confirmed the diagnoses in 90.3% (95% CI: 82.0%, 95.2%) of cases. Using the rule-based method, 37,326 pediatric patients were identified with diagnoses spanning 977 RGDs, a prevalence of 5.66% in this population. While most disorders showed a higher prevalence at CCHMC than reported in Orphanet, some diseases, such as 1p36 deletion syndrome, indicated potential underdiagnosis. Further analyses uncovered disparities in RGD prevalence and age at diagnosis across gender and racial groups. This study demonstrates the utility of combining EHR data with NLP tools to systematically investigate RGD diagnoses in large cohorts. The identified disparities underscore the need for improved approaches to ensure timely and accurate diagnosis and management of pediatric RGDs.
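      The 5.66% prevalence follows directly from the reported counts: patients with a rule-based RGD diagnosis divided by the full pediatric cohort. The minimal sketch below reproduces that arithmetic and, purely as an illustration, attaches a Wilson score interval to the estimate. The abstract does not state which confidence-interval method the authors used for their precision figures, and it reports no interval for prevalence, so the choice of the Wilson interval and the function names `prevalence` and `wilson_interval` are assumptions, not the study's method.

```python
import math


def prevalence(cases: int, population: int) -> float:
    """Proportion of the population with at least one qualifying diagnosis."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction).

    Illustrative assumption only: the source does not name its CI method.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point drift at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


# Counts taken from the abstract (rule-based method).
patients_with_rgd = 37_326
cohort_size = 659_139

prev = prevalence(patients_with_rgd, cohort_size)
low, high = wilson_interval(patients_with_rgd, cohort_size)
print(f"prevalence = {prev:.2%}")  # prevalence = 5.66%
print(f"95% Wilson interval (illustrative): {low:.2%} to {high:.2%}")
```

      With a denominator this large the interval is very narrow, so the uncertainty that matters in practice comes from NLP precision and chart-review confirmation rather than sampling error.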
      Competing Interests: The authors declare no competing interests.
      (Copyright © 2024 The Authors. Published by Elsevier Inc. All rights reserved.)
    • Contributed Indexing:
      Keywords: Orphanet; bioinformatics; electronic health record; genetic testing; natural language processing; rare genetic diseases
    • Publication Date:
      Date Created: 20240816 Date Completed: 20241011 Latest Revision: 20241011
    • Publication Date:
      20241012
    • Accession Number:
      PMC11401171
    • Accession Number:
      10.1016/j.xhgg.2024.100341
    • Accession Number:
      39148290