A systematic review of data mining and machine learning for air pollution epidemiology.

  • Additional Information
    • Source:
      Publisher: BioMed Central; Country of Publication: England; NLM ID: 100968562; Publication Model: Electronic; Cited Medium: Internet; ISSN: 1471-2458 (Electronic); Linking ISSN: 14712458; NLM ISO Abbreviation: BMC Public Health; Subsets: MEDLINE
    • Publication Information:
      Original Publication: London : BioMed Central, [2001-
    • Subject Terms:
    • Abstract:
      Background: Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predictions, find patterns and extract information. To this end, data mining and machine learning algorithms are increasingly being applied to air pollution epidemiology.
      Methods: We conducted a systematic literature review on the application of data mining and machine learning methods in air pollution epidemiology. We carried out our search process in PubMed, the MEDLINE database and Google Scholar. Research articles applying data mining and machine learning methods to air pollution epidemiology were queried and reviewed.
      Results: Our search queries resulted in 400 research articles. Our fine-grained analysis employed our inclusion/exclusion criteria to reduce the results to 47 articles, which we separate into three primary areas of interest: 1) source apportionment; 2) forecasting/prediction of air pollution/quality or exposure; and 3) generating hypotheses. Early applications had a preference for artificial neural networks. In more recent work, decision trees, support vector machines, k-means clustering and the Apriori algorithm have been widely applied. Our survey shows that the majority of the research has been conducted in Europe, China and the USA, and that data mining is becoming an increasingly common tool in environmental health. As potential new directions, we identify deep learning and geospatial pattern mining as two burgeoning areas of data mining with good potential for future applications in air pollution epidemiology.
      Conclusions: We carried out a systematic review identifying the current trends, challenges and new directions to explore in the application of data mining methods to air pollution epidemiology. This work shows that data mining is increasingly being applied in air pollution epidemiology. The potential to support air pollution epidemiology continues to grow with advancements in data mining related to temporal and geospatial mining, and deep learning. This is further supported by new sensors and storage media that enable larger, better-quality data. This suggests that many more fruitful applications can be expected in the future.
    • Grant Information:
      Canada CIHR
    • Contributed Indexing:
      Keywords: Air pollution; Association mining; Big data; Data mining; Epidemiology; Exposure; Machine learning
    • Publication Date:
      Date Created: 20171129 Date Completed: 20180321 Latest Revision: 20181202
    • Publication Date:
      20221213
    • Accession Number:
      PMCID: PMC5704396; DOI: 10.1186/s12889-017-4914-3; PMID: 29179711