Risk assessment of groundwater arsenic in Hetao Basin based on ensemble learning optimization

Yu FU, Wengeng CAO, Chunju ZHANG, Wenhua ZHAI, Yu REN, Tian NAN, Zeyan LI

Earth Science Frontiers, 2024, Vol. 31, Issue 3: 371-380. DOI: 10.13745/j.esf.sf.2023.2.40
Groundwater and Geothermal Resources




Abstract

Shallow groundwater in the Hetao Basin is severely contaminated with arsenic, often well above the drinking-water standard, and the associated high-arsenic risk seriously threatens the health of local residents. However, the regional-scale distribution of high-arsenic groundwater risk is still poorly understood. Using 605 shallow groundwater samples together with environmental factors covering the sedimentary environment, climate, human activities, soil physicochemical properties, and hydrogeological conditions, this study built a Stacking ensemble learning model for high-arsenic groundwater, with Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM) as base learners and Linear Discriminant Analysis (LDA) as the meta-learner. The model was used to predict the distribution of groundwater arsenic risk across the study area and to identify the key environmental factors controlling that distribution. The results show that 49.59% of the samples exceeded the arsenic standard (>10 μg/L), concentrated mainly in the paleochannel-influenced zones formed by river diversions and in the crevasse splays of the Yellow River. The Stacking ensemble model was more reliable than RF, the best-performing single model, increasing the area under the ROC curve (AUC) and accuracy by 1.1% and 3.2%, respectively. The high-risk area covers 5,257 km², or 38.44% of the study area. The sedimentary environment is the key environmental factor governing the risk distribution of high-arsenic groundwater, contributing up to 25.06% to model accuracy. These results provide a method and reference for mapping the spatial distribution of groundwater arsenic risk and have important implications for drinking-water safety and human health in the region.
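The stacking design in the abstract (RF, XGBoost, and SVM base learners feeding an LDA meta-learner, scored by AUC and accuracy) can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the authors' code, and it rests on several assumptions: synthetic data from `make_classification` stands in for the 605 field samples and their environmental factors; scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost to avoid an extra dependency; and permutation importance (the mean drop in accuracy when one feature is shuffled) is used as one possible reading of "contribution to model accuracy". The helper names `label_high_arsenic`, `build_stacking_model`, and `evaluate`, the 70/30 split, and all hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.inspection import permutation_importance
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

THRESHOLD_UG_L = 10.0  # arsenic standard used in the abstract (>10 ug/L = high arsenic)


def label_high_arsenic(arsenic_ug_l):
    """Hypothetical helper: 1 if a sample exceeds 10 ug/L, otherwise 0."""
    return (np.asarray(arsenic_ug_l, dtype=float) > THRESHOLD_UG_L).astype(int)


def build_stacking_model(seed=0):
    """Hypothetical helper: RF + gradient boosting + SVM, LDA on top."""
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=200, random_state=seed)),
        # Assumption: sklearn gradient boosting as a stand-in for XGBoost.
        ("gb", GradientBoostingClassifier(random_state=seed)),
        # SVMs are scale-sensitive, so standardize features first.
        ("svm", make_pipeline(StandardScaler(),
                              SVC(probability=True, random_state=seed))),
    ]
    # The LDA meta-learner is fit on 5-fold out-of-fold class probabilities
    # from the base learners; the base learners are then refit on all
    # training data.
    return StackingClassifier(estimators=base_learners,
                              final_estimator=LinearDiscriminantAnalysis(),
                              cv=5, stack_method="predict_proba")


def evaluate(seed=0):
    """Fit on synthetic data; return test AUC, accuracy, and importances."""
    # Assumption: 5 synthetic features stand in for the environmental factors.
    X, y = make_classification(n_samples=605, n_features=5, n_informative=3,
                               random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              stratify=y, random_state=seed)
    model = build_stacking_model(seed).fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]  # probability of the high-arsenic class
    auc = roc_auc_score(y_te, proba)
    acc = accuracy_score(y_te, model.predict(X_te))
    # Mean drop in test accuracy when each feature is randomly permuted.
    imp = permutation_importance(model, X_te, y_te, scoring="accuracy",
                                 n_repeats=5, random_state=seed)
    return auc, acc, imp.importances_mean
```

For example, `auc, acc, importances = evaluate()` returns the held-out AUC, the accuracy, and the mean accuracy drop per feature; fitting the `rf` estimator alone on the same split gives a single-model baseline to compare the ensemble against.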

Key words

Stacking ensemble learning / groundwater / high arsenic / risk distribution / Hetao Basin

CLC number

P641; X523; TP18

Cite this article

Yu FU, Wengeng CAO, Chunju ZHANG, et al. Risk assessment of groundwater arsenic in Hetao Basin based on ensemble learning optimization[J]. Earth Science Frontiers, 2024, 31(3): 371-380. https://doi.org/10.13745/j.esf.sf.2023.2.40


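Funding

National Natural Science Foundation of China (41972262)
Excellent Young Scientists Fund of the Natural Science Foundation of Hebei Province (D2020504032)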
