
Dynamic outlier detection algorithm for network large data set based on classification and regression trees decision tree
FU Li-fang, CHEN Zhuo, AO Chang-lin
Large data sets contain massive amounts of data, and once the data scale grows beyond a certain point, the processing efficiency of outlier detection becomes limited. A dynamic outlier detection algorithm based on the classification and regression trees (CART) decision tree is therefore proposed. First, the criteria for abnormal data in the large data set are defined, the degree of data dispersion is measured by variance, an association rule matrix of abnormal data samples is built with a support vector machine, the range of abnormal data in the large data set is determined, and a dynamic meshing strategy is used to reduce the computational cost of outlier detection. Then, the CART decision tree method applies Boolean tests at the branch nodes, treats all data to be detected as continuous data, sorts the training data set in ascending order, calculates the maximum information gain of the data, and prunes the decision tree until no non-leaf node can be replaced, yielding the dynamic outlier detection results. Simulation results show that the proposed algorithm achieves high outlier detection accuracy and short detection time, has significant computational advantages, and can support the reliable application of large data sets.
classification and regression trees (CART) decision tree / large data sets / outlier detection / data preprocessing / meshing / Gini coefficient
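The split-selection step described in the abstract (sort continuous values in ascending order, apply a Boolean test at a branch node, keep the split with the largest gain) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the gain is measured as the reduction in Gini impurity, as the keyword "Gini coefficient" suggests, and the names `gini`, `best_split`, and the sample data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the Boolean test `value <= threshold` with the largest Gini reduction.

    Assumption: gain is measured as parent impurity minus the weighted
    impurity of the two children. Returns (threshold, gain), or (None, 0.0)
    if no split improves on the parent node.
    """
    # Arrange the training data in ascending order of the continuous feature.
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(ys)
    parent = gini(ys)

    best_threshold, best_gain = None, 0.0
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate identical values
        threshold = (xs[i - 1] + xs[i]) / 2  # midpoint between neighbours
        left, right = ys[:i], ys[i:]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - weighted
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain


# Hypothetical sample: label 1 marks an outlier, label 0 a normal point.
values = [1.0, 1.2, 0.9, 1.1, 9.5, 10.2]
labels = [0, 0, 0, 0, 1, 1]
threshold, gain = best_split(values, labels)
print(threshold, gain)
```

On this toy sample the chosen threshold falls between 1.2 and 9.5, the only gap that separates the two classes cleanly. A full CART implementation would apply this search recursively and then prune the tree, which the sketch omits.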