Geographical Research ›› 2022, Vol. 41 ›› Issue (6): 1731-1747. doi: 10.11821/dlyj020210528

• Research Article •

Comparison of the accuracy of spatial prediction for heavy metals in regional soils based on machine learning models

JIN Zhao, LV Jianshu

  1. College of Geography and Environment, Shandong Normal University, Jinan 250358, China
  • Received: 2021-06-21 Accepted: 2021-09-13 Published: 2022-06-10 Online: 2022-08-10
  • Corresponding author: LV Jianshu (1986-), male, born in Laiwu, Shandong, PhD, associate professor, specializing in the environmental geochemistry of heavy metals and in geostatistics. E-mail: lvjianshu@126.com
  • About the author: JIN Zhao (1999-), female, born in Jinan, Shandong, master's student, specializing in the spatial modeling of soil pollutants. E-mail: geostatistical@163.com
  • Funding:
    Natural Science Foundation of Shandong Province, Excellent Young Scholars Program (ZR2020YQ31); National Natural Science Foundation of China (41601549)

Abstract:

To identify the spatial variation of heavy metals in regional soils and clarify its influencing factors, this study built nine machine learning models: multiple linear regression (MLR), elastic net regression (ENR), random forest (RF), stochastic gradient boosting (SGB), a stacking ensemble model, back-propagation artificial neural network (BP-ANN), a neural network ensemble based on model averaging (avNNet), support vector machine with a linear kernel (SVM-L), and support vector machine with a radial basis function (Gaussian) kernel (SVM-R). The models were applied to a dataset of soil heavy metal concentrations (Cd, Cu, Hg, Pb, and Zn) and environmental auxiliary variables from the central part of Shandong Province, and their spatial prediction accuracy was compared. For the five heavy metals, RF gave coefficients of determination (R²) between 0.263 and 0.448, mean absolute error (MAE) and root mean square error (RMSE) below 8.408 and 10.636, respectively, and predicted-to-observed ratios (P/O) close to 1. RF predicted all five metals well and was the optimal model for spatial prediction of soil heavy metals in the study area. SVM-R ranked second overall, with relatively robust values on every accuracy metric, and can serve as an alternative model. The other seven models performed clearly worse than RF and SVM-R. According to the RF predictions, the five heavy metals showed similar spatial patterns, with concentrations decreasing from the northeast to the southwest of the study area and three high-value zones in the northeastern, northern, and southern parts. These high-value zones coincided with the local areas of dense industry and traffic, indicating that human activities are a major factor influencing the spatial distribution of heavy metals in the soils of the study area. This work can provide a scientific reference for the investigation, assessment, and management of regional soil pollution.

Key words: machine learning, heavy metals in soils, spatial prediction, influencing factors
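
The four accuracy metrics named in the abstract (R², MAE, RMSE, and P/O) can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so two choices here are assumptions: R² is computed as 1 - SS_res/SS_tot, and P/O as the ratio of the mean predicted value to the mean observed value. The function name `prediction_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math


def prediction_metrics(observed, predicted):
    """Return R2, MAE, RMSE and P/O for paired observed/predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0 or mean_obs == 0:
        raise ValueError("observed values must vary and have a non-zero mean")
    return {
        "R2": 1.0 - ss_res / ss_tot,                # assumption: 1 - SS_res/SS_tot
        "MAE": sum(abs(r) for r in residuals) / n,  # mean absolute error
        "RMSE": math.sqrt(ss_res / n),              # root mean square error
        "P/O": mean_pred / mean_obs,                # assumption: ratio of means
    }


# Made-up illustrative values, not measurements from the study area.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
metrics = prediction_metrics(observed, predicted)
print(metrics)
```

Under these definitions, a model is better when R² is higher, MAE and RMSE are lower, and P/O is closer to 1, which is the sense in which the abstract ranks RF first and SVM-R second.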