Design of big data anomaly detection model based on random forest algorithm

SONG Shi-jun, FAN Min

J Jilin Univ Eng Tech Ed, 2023, Vol. 53, Issue (09): 2659-2665. DOI: 10.13229/j.cnki.jdxbgxb.20220598

Abstract

Big data anomaly detection is easily disturbed by edge data, which degrades detection accuracy. To address this problem, a big data anomaly detection model based on the random forest algorithm was proposed. First, an improved k-means algorithm was used to cluster the big data, and principal component analysis (PCA) was used to extract its features. Then, an anomaly detection model based on a random forest classifier was built: the extracted features were input into the model, decision trees were constructed, and the classification accuracy of the classifier was improved by dynamically updating the weights of the decision trees. Finally, the classification results were output to complete the anomaly detection. Experimental results show that the proposed model has a detection time of about 25 s, an average anomaly detection accuracy of 91%, and a false alarm rate of 4.5%.
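The first two stages described above (clustering the data, then extracting features with PCA) can be illustrated with a minimal NumPy sketch. The abstract does not specify what the "improved k-means" changes, so standard k-means++ seeding stands in for it here as an assumption. The helper names `kmeans_pp_init`, `kmeans` and `pca`, the synthetic data, and the choice to append each point's distance to its cluster centre as an extra feature are all hypothetical, not the authors' method.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding (assumed stand-in for the paper's 'improved k-means')."""
    centres = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest already-chosen centre
        C = np.array(centres)
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        # Pick the next centre with probability proportional to d2
        centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centres)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's iterations: assign points to the nearest centre, then recompute centres."""
    rng = np.random.default_rng(seed)
    centres = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        # Keep the old centre if a cluster becomes empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres


def pca(X, n_components):
    """Project centred data onto its leading principal axes (computed via SVD)."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s**2 / np.sum(s**2)  # explained-variance ratio per component
    return Xc @ Vt[:n_components].T, ratio[:n_components]


# Synthetic demo data: two well-separated groups in 5-D (illustrative only)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (200, 5)), rng.normal(6, 1, (200, 5))])

labels, centres = kmeans(X, k=2)
Z, ratio = pca(X, n_components=2)
# Hypothetical feature vector: PCA scores plus distance to the assigned centre
dist_to_centre = np.linalg.norm(X - centres[labels], axis=1)
features = np.column_stack([Z, dist_to_centre])
print(features.shape, ratio.round(3))
```

Spreading the initial centres apart is the usual motivation for k-means++ seeding: it lowers the chance that two starting centres land in the same dense region, which is one plausible way an "improved" k-means could resist interference from outlying points.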

Key words

big data clustering / feature extraction / principal component analysis / random forest classifier / decision tree / update weights
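The classifier stage in the abstract (a random forest whose decision-tree weights are updated dynamically) and the two reported metrics can be sketched in Python with scikit-learn. The abstract does not say how the weights are updated, so this sketch assumes a simple rule: each tree's weight is blended (factor `decay`) with its normalized accuracy on a fresh labelled batch, and predictions use weighted soft voting. It also assumes that the false alarm rate means false positives divided by all truly normal samples. The class `WeightedForest`, the function `detection_metrics`, and the synthetic data are hypothetical; only `RandomForestClassifier` and its `estimators_` and `classes_` attributes are real scikit-learn API.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


class WeightedForest:
    """Hypothetical random forest with per-tree weights that can be updated over time."""

    def __init__(self, n_trees=50, decay=0.5, seed=0):
        self.rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        self.decay = decay  # assumed blending factor between old and new weights

    def fit(self, X, y):
        self.rf.fit(X, y)
        n = len(self.rf.estimators_)
        self.weights = np.full(n, 1.0 / n)  # start from plain (equal-weight) voting
        return self

    def update_weights(self, X_val, y_val):
        # Sub-trees predict encoded class indices, so encode the true labels the same way
        y_idx = np.searchsorted(self.rf.classes_, y_val)
        acc = np.array([np.mean(t.predict(X_val) == y_idx) for t in self.rf.estimators_])
        new = (acc + 1e-12) / (acc + 1e-12).sum()
        self.weights = self.decay * self.weights + (1 - self.decay) * new
        self.weights /= self.weights.sum()

    def predict(self, X):
        # Weighted soft vote: each tree's class probabilities times its weight
        proba = sum(w * t.predict_proba(X) for w, t in zip(self.weights, self.rf.estimators_))
        return self.rf.classes_[np.argmax(proba, axis=1)]


def detection_metrics(y_true, y_pred, anomaly=1):
    """Accuracy and false alarm rate (assumed here to be FP / (FP + TN))."""
    fp = np.sum((y_pred == anomaly) & (y_true != anomaly))
    tn = np.sum((y_pred != anomaly) & (y_true != anomaly))
    accuracy = np.mean(y_pred == y_true)
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    return accuracy, false_alarm


# Synthetic demo: 900 normal points and 100 shifted "anomalies" in 8-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (900, 8)), rng.normal(3, 1, (100, 8))])
y = np.r_[np.zeros(900, dtype=int), np.ones(100, dtype=int)]
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]

model = WeightedForest().fit(X[:600], y[:600])
model.update_weights(X[600:800], y[600:800])  # one "dynamic" update on a new batch
acc, far = detection_metrics(y[800:], model.predict(X[800:]))
print(f"accuracy={acc:.3f}, false alarm rate={far:.3f}")
```

With equal initial weights the wrapper behaves like scikit-learn's own averaged-probability voting; the blending factor keeps a single noisy batch from swinging the tree weights too far.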

Cite this article

SONG Shi-jun, FAN Min. Design of big data anomaly detection model based on random forest algorithm. Journal of Jilin University (Engineering and Technology Edition), 2023, 53(09): 2659-2665. https://doi.org/10.13229/j.cnki.jdxbgxb.20220598
