PM2.5 Concentration Prediction and Analysis using Machine Learning
- Author(s)
- Juhyun Lee
- Type
- Thesis
- Degree
- Master
- Department
- Graduate School, School of Electrical Engineering and Computer Science
- Advisor
- Jeon, Moongu
- Abstract
- Air pollution is an important issue that directly affects human health. In particular, particulate matter (PM), one of the major components of air pollution, is produced by automobiles and factories. As global interest in PM concentration grows, improving the accuracy of PM concentration prediction becomes more important. In this paper, we predict the concentration of PM2.5, the fine fraction of PM, which can cause fatal respiratory disease. We propose using tree-based models (random forests, XGBoost, LightGBM, and CatBoost) to forecast PM2.5 concentration in South Korea. Despite the promising results of tree-based models, two challenging issues remain: how to handle the many missing values in air pollutant data, and which features help improve prediction performance. To address the first issue, we exclude all records with missing values, so that training the models does not require sophisticated missing-value processing. Even after these records are removed, the dataset remains large enough to train the models. To address the second, we show that concatenating the observation data with the output of a traditional mathematical model, the Community Multi-scale Air Quality (CMAQ) model, yields better prediction performance than using the observation data alone. In addition, we conduct extensive experiments in which the number of trees is increased, location information is added, and missing values are imputed rather than removed. Experimental results show that the XGBoost model outperforms the other tree-based models in all experimental cases.
- URI
- https://scholar.gist.ac.kr/handle/local/32945
- Fulltext
- http://gist.dcollection.net/common/orgView/200000908271
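The data preparation described in the abstract (excluding records with missing values, then concatenating the observation data with CMAQ output) might be sketched as follows. This is a minimal illustration only, not the thesis's implementation: the join key (station, hour), the field names `pm10`, `no2`, and `cmaq_pm25`, the helper names, and all numeric values are assumptions made up for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical join key: (station id, hour index). Not taken from the thesis.
Key = Tuple[str, int]
Features = Dict[str, Optional[float]]


def drop_missing(table: Dict[Key, Features]) -> Dict[Key, Features]:
    """Keep only records in which every field has a value."""
    return {
        key: row
        for key, row in table.items()
        if all(value is not None for value in row.values())
    }


def concat_features(
    obs: Dict[Key, Features], cmaq: Dict[Key, Features]
) -> Dict[Key, List[float]]:
    """Concatenate observation and CMAQ features for keys present in both.

    Records with any missing value are excluded first, so no imputation
    is needed downstream.
    """
    obs_clean = drop_missing(obs)
    cmaq_clean = drop_missing(cmaq)
    merged: Dict[Key, List[float]] = {}
    for key in sorted(obs_clean.keys() & cmaq_clean.keys()):
        o, c = obs_clean[key], cmaq_clean[key]
        # Sort field names so the feature order is deterministic.
        merged[key] = [o[n] for n in sorted(o)] + [c[n] for n in sorted(c)]
    return merged


# Made-up example values, for illustration only.
obs = {
    ("S1", 0): {"pm10": 41.0, "no2": 0.021},
    ("S1", 1): {"pm10": None, "no2": 0.019},  # missing value: excluded
    ("S2", 0): {"pm10": 35.0, "no2": 0.030},
}
cmaq = {
    ("S1", 0): {"cmaq_pm25": 18.5},
    ("S1", 1): {"cmaq_pm25": 20.1},
    ("S2", 0): {"cmaq_pm25": None},  # missing value: excluded
}

features = concat_features(obs, cmaq)
```

The resulting feature vectors could then be passed to any tree-based regressor. Excluding incomplete records keeps the pipeline simple at the cost of discarding data, which is the trade-off the abstract accepts given the size of the dataset.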
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.