OAK

PM2.5 Concentration Prediction and Analysis using Machine Learning

Author(s)
Juhyun Lee
Type
Thesis
Degree
Master
Department
Graduate School, School of Electrical Engineering and Computer Science
Advisor
Jeon, Moongu
Abstract
Air pollution is an important issue that directly affects human health. In particular, particulate matter (PM), one of the major components of air pollution, is produced by automobiles and factories. As global interest in PM concentration grows, improving the accuracy of PM concentration prediction has become important. In this thesis, we predict the concentration of PM2.5, the fine fraction of PM (particles 2.5 µm or less in diameter), which can cause fatal respiratory disease. We propose using tree-based models such as random forests, XGBoost, LightGBM, and CatBoost to forecast PM2.5 concentration in South Korea. Despite the promising results of tree-based models, two challenging issues remain: how to handle the many missing values in air pollutant data, and which features help improve prediction performance. To address these issues, we exclude all records with missing values so that model training does not require sophisticated missing-value processing; even after this removal, the dataset remains large enough to train the models. We then show that concatenating the observation data with the output of a traditional mathematical model, the Community Multi-scale Air Quality (CMAQ) model, yields better prediction performance than using the observation data alone. In addition, we conduct extensive experiments on the effects of increasing the number of trees, adding location information, and imputing missing values instead of removing them. Experimental results show that the XGBoost model outperforms the other tree-based models in all experimental settings.
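The data-preparation idea in the abstract (drop every record with a missing value, then concatenate observed features with CMAQ output for the same station and time) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's actual pipeline: the record layout keyed by `(station_id, timestamp)`, the target name `pm25_next`, and the helpers `build_dataset` and `impute_column_means` are all hypothetical. The abstract does not say which imputation method was used, so column-mean imputation is shown only as one possible alternative to removal.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (station_id, timestamp) -> {feature_name: value}.
Key = Tuple[str, str]
Row = Dict[str, Optional[float]]


def _present(v: Optional[float]) -> bool:
    """A value counts as present if it is neither None nor NaN."""
    return v is not None and not math.isnan(v)


def build_dataset(observations: Dict[Key, Row], cmaq: Dict[Key, Row],
                  target: str = "pm25_next", use_cmaq: bool = True):
    """Return (feature_names, X, y) using listwise deletion.

    With use_cmaq=True, observed features are concatenated with the CMAQ
    output for the same key; rows lacking a CMAQ match are dropped.
    Any row with a missing feature or target is dropped entirely.
    """
    feature_names: Optional[List[str]] = None
    X: List[List[float]] = []
    y: List[float] = []
    for key in sorted(observations):
        obs = observations[key]
        target_value = obs.get(target)
        features = {f"obs_{k}": v for k, v in obs.items() if k != target}
        if use_cmaq:
            if key not in cmaq:
                continue  # no matching model output: treat as missing
            features.update({f"cmaq_{k}": v for k, v in cmaq[key].items()})
        if not _present(target_value) or not all(map(_present, features.values())):
            continue  # listwise deletion: drop the whole record
        names = sorted(features)
        if feature_names is None:
            feature_names = names
        elif names != feature_names:
            raise ValueError(f"inconsistent feature schema at {key}")
        X.append([features[n] for n in names])
        y.append(target_value)
    return feature_names or [], X, y


def impute_column_means(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Alternative to deletion: fill gaps with each column's mean.

    Assumption: falls back to 0.0 if a column has no observed values.
    """
    n_cols = len(rows[0]) if rows else 0
    means = []
    for j in range(n_cols):
        present = [r[j] for r in rows if _present(r[j])]
        means.append(sum(present) / len(present) if present else 0.0)
    return [[r[j] if _present(r[j]) else means[j] for j in range(n_cols)]
            for r in rows]
```

Calling `build_dataset` once with `use_cmaq=True` and once with `use_cmaq=False` yields the two feature sets the abstract compares (observation plus CMAQ versus observation only). The resulting `X` and `y`, converted to NumPy arrays, could then be passed to a regressor such as `xgboost.XGBRegressor` via its `fit` method; model training is deliberately left out of this sketch.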
URI
https://scholar.gist.ac.kr/handle/local/32945
Fulltext
http://gist.dcollection.net/common/orgView/200000908271
Access and License
  • Access type: Public
File List
  • No associated files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.