
Landslide Susceptibility Prediction Considering Spatio-Temporal Division Principle of Training/Testing Datasets in Machine Learning Models
Huang Faming, Ouyang Weiping, Jiang Shuihua, Fan Xuanmei, Lian Zhipeng, Zhou Chuangbing
Most landslide susceptibility prediction (LSP) models divide the landslide and non-landslide samples into training/testing datasets by spatially random selection; however, random division inevitably introduces uncertainty into LSP modelling. In theory, LSP uses past landslides to predict the spatial probability of future landslides, so it has a significant chronological character rather than a purely spatially random one. It is therefore necessary to explore dividing the training/testing datasets according to the time order of landslide occurrence. Taking Wencheng County, Zhejiang Province, China, as an example, 11 types of environmental factors and 128 landslides with accurate occurrence times are obtained. The landslide and non-landslide samples, linked to the environmental factors, are then divided into two types of training/testing datasets, one by landslide time order and one by spatial randomness. The training/testing ratios are set to 9∶1, 8∶2, 7∶3, 6∶4 and 5∶5 so that the choice of ratio does not bias the results, giving training/testing datasets under 10 combined conditions. Finally, Support Vector Machine (SVM), Multi-Layer Perceptron (MLP) and Random Forest (RF) models are trained and tested to predict landslide susceptibility and analyze its uncertainty. The results show that: (1) the uncertainty of the landslide susceptibility predicted by the SVM, MLP and RF models is slightly lower under time-order division than under spatially random division, which verifies the feasibility of dividing by time order; (2) time-order division of the training/testing datasets is in effect one “deterministic” case of spatially random division, and one that better matches how landslides actually occur. Spatially random division nevertheless remains feasible for datasets that lack landslide occurrence times.
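The two division principles and the five ratios can be illustrated with a minimal sketch. It is not the authors' procedure: the record fields (`id`, `date`), the synthetic occurrence dates, the rounding rule for the cut point and the random seed are all assumptions made for illustration, and the handling of undated non-landslide samples is left out.

```python
import random
from datetime import date, timedelta

# Training fractions for the 9:1, 8:2, 7:3, 6:4 and 5:5 ratios.
TRAIN_FRACTIONS = [0.9, 0.8, 0.7, 0.6, 0.5]


def chronological_split(samples, train_fraction):
    """Train on the earliest landslides, test on the most recent ones."""
    ordered = sorted(samples, key=lambda s: s["date"])
    cut = round(len(ordered) * train_fraction)  # assumed rounding rule
    return ordered[:cut], ordered[cut:]


def random_split(samples, train_fraction, seed=0):
    """Shuffle the samples, ignoring occurrence time, then cut."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def build_conditions(samples, seed=0):
    """Return the 10 combined conditions: 2 principles x 5 ratios."""
    conditions = {}
    for fraction in TRAIN_FRACTIONS:
        conditions[("time", fraction)] = chronological_split(samples, fraction)
        conditions[("random", fraction)] = random_split(samples, fraction, seed)
    return conditions


# Hypothetical stand-in for 128 dated landslides (dates are synthetic).
rng = random.Random(42)
landslides = [
    {"id": i, "date": date(2000, 1, 1) + timedelta(days=rng.randrange(7000))}
    for i in range(128)
]
conditions = build_conditions(landslides)
```

Each entry of `conditions` holds a `(train, test)` pair that could then be passed to any classifier. Under the time-order principle every training landslide occurs no later than every testing landslide, which is what makes that split a single fixed case rather than one random draw.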
landslides / landslide susceptibility / time series / training/testing dataset ratio / machine learning / engineering geology
P64