
Design of big data anomaly detection model based on random forest algorithm
SONG Shi-jun, FAN Min
To address the problem that big data anomaly detection is easily disturbed by edge data, which degrades detection accuracy, a big data anomaly detection model based on the random forest algorithm is proposed. First, an improved k-means algorithm is used to cluster the big data, and principal component analysis is used to extract its features. Then, an anomaly detection model based on a random forest classifier is built: the extracted features are fed into the model, decision trees are constructed, and the classification accuracy of the classifier is improved by dynamically updating the weights of the decision trees. Finally, the classification results are output to complete the anomaly detection. Experimental results show that the detection time of the proposed model is about 25 s, the average anomaly detection accuracy is 91%, and the false alarm rate is 4.5%.
big data clustering / feature extraction / principal component analysis / random forest classifier / decision tree / update weights
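The abstract does not state how the decision tree weights are updated, so the following is only a minimal sketch of one possible scheme, not the authors' method. It assumes a multiplicative penalty (in the style of the weighted majority algorithm): every time a tree misclassifies a validation sample, its weight is multiplied by a hypothetical factor `beta`, and the weights are then renormalized. The trees themselves are stand-in threshold rules, and the names `weighted_vote`, `update_weights`, and `beta` are illustrative assumptions. The clustering and principal component analysis stages are not shown.

```python
from typing import Callable, List, Sequence

Sample = Sequence[float]
Tree = Callable[[Sample], int]  # returns 1 for anomaly, 0 for normal


def weighted_vote(trees: List[Tree], weights: List[float], x: Sample) -> int:
    """Return 1 (anomaly) if the weighted share of anomaly votes exceeds one half."""
    anomaly_weight = sum(w for t, w in zip(trees, weights) if t(x) == 1)
    return 1 if anomaly_weight > 0.5 * sum(weights) else 0


def update_weights(
    trees: List[Tree],
    weights: List[float],
    samples: List[Sample],
    labels: List[int],
    beta: float = 0.5,  # assumed penalty factor, 0 < beta < 1
) -> List[float]:
    """Multiply a tree's weight by beta for each sample it gets wrong, then renormalize."""
    new_weights = list(weights)
    for x, y in zip(samples, labels):
        for i, tree in enumerate(trees):
            if tree(x) != y:
                new_weights[i] *= beta
    total = sum(new_weights)
    return [w / total for w in new_weights]


# Hypothetical stand-ins for trained decision trees (simple threshold rules).
trees: List[Tree] = [
    lambda x: int(x[0] > 2.0),  # agrees with every validation label below
    lambda x: int(x[1] > 0.5),  # wrong on the first two validation samples
    lambda x: int(x[1] > 0.2),  # wrong on the first two validation samples
]

# Toy validation data: (feature_0, feature_1) with label 1 = anomaly.
val_samples = [(3.0, 0.0), (0.5, 1.0), (2.5, 0.9), (0.1, 0.1)]
val_labels = [1, 0, 1, 0]

initial = [1.0 / len(trees)] * len(trees)
updated = update_weights(trees, initial, val_samples, val_labels)

print("vote with equal weights:  ", weighted_vote(trees, initial, (3.0, 0.0)))
print("vote with updated weights:", weighted_vote(trees, updated, (3.0, 0.0)))
```

On this toy data the two weaker rules outvote the reliable one under equal weights, so the sample `(3.0, 0.0)` is classified as normal (0). After the update the reliable rule carries two thirds of the total weight and the same sample is flagged as an anomaly (1).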