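Abstract: With the development of science and technology, ever more data is being generated. How to make better use of this data has become one of the major problems that urgently needs solving. The goal of machine learning is to learn general rules from known data and to apply them to new data.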
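Classification is an important branch of machine learning: it analyzes a known training set to train a classifier that fits new data as closely as possible and outputs predictions. Traditional classification techniques usually assume balanced data. In practice, however, imbalanced data is everywhere. In anomaly detection, network security monitoring, credit card fraud detection and similar settings, minority-class samples are costly and difficult to obtain, so the class distribution is often severely imbalanced. How to adapt existing classification algorithms to imbalanced data is a hot topic in current machine learning research.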
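Extreme learning machine (ELM) is a learning method proposed in recent years. It fits the training samples by least squares without iteration, and has the advantages of fast running speed and strong generalization ability. When faced with imbalanced data, however, ELM runs into the same problem as traditional machine learning algorithms: recognition accuracy on minority-class samples drops sharply. To address this, earlier work proposed the weighted extreme learning machine (WELM), which adapts to imbalanced data by assigning different weights to samples of different classes and achieves good results. FSVM-CIL, in turn, introduces the concept of fuzzy sets and applies it to the support vector machine (SVM), adapting to imbalanced data by fuzzy weighting of the samples.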
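The idea of a non-iterative least-squares fit with class-dependent weights can be illustrated with a minimal sketch. This is not the thesis implementation: the sigmoid hidden layer, the regularization constant `C`, the one-vs-all targets, and the weighting scheme (each sample weighted by the inverse of its class size) are assumptions chosen for illustration, and the function names `train_welm` and `predict_welm` are hypothetical.

```python
import numpy as np

def _hidden(X, W_in, b):
    # Sigmoid activations of the fixed random hidden layer.
    return 1.0 / (1.0 + np.exp(-(X @ W_in + b)))

def train_welm(X, y, n_hidden=50, C=100.0, seed=0):
    """Weighted ELM sketch: random hidden layer plus a closed-form
    weighted ridge solution for the output weights (no iteration)."""
    rng = np.random.default_rng(seed)
    classes, inv, counts = np.unique(y, return_inverse=True, return_counts=True)

    # One-vs-all targets in {-1, +1}, one column per class.
    T = np.where(inv[:, None] == np.arange(len(classes)), 1.0, -1.0)

    # Hidden-layer parameters are drawn once and never trained.
    W_in = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = _hidden(X, W_in, b)

    # Assumed weighting scheme: weight = 1 / (size of the sample's class),
    # so every class contributes equally to the loss.
    w = 1.0 / counts[inv]

    # beta = (I / C + H^T W H)^{-1} H^T W T, with W = diag(w).
    HtW = H.T * w
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ T)
    return W_in, b, beta, classes

def predict_welm(model, X):
    W_in, b, beta, classes = model
    # Predict the class whose output node responds most strongly.
    return classes[np.argmax(_hidden(X, W_in, b) @ beta, axis=1)]
```

Setting all weights to 1 in this sketch recovers a plain regularized ELM, which makes the effect of the class weights on minority-class recall easy to compare on the same data.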
This thesis combines the ideas of these two methods. The main contributions are as follows:
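(1) A fuzzy weighted extreme learning machine that incorporates global prior distribution information of the samples. Building on WELM and following FSVM-CIL, the concept of fuzzy sets is introduced: a membership function is designed from each sample's distance to the centroid of its own class or to the initial classification surface, and weights are assigned accordingly, improving on the WELM algorithm.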
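(2) A fuzzy weighted extreme learning machine that incorporates local prior distribution information of the samples. To address several shortcomings of using only global prior distribution information, the local information of the samples is fully exploited, including neighborhood impurity, density, and sample deviation, and is treated as an important factor in weight assignment so that each sample is weighted individually.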
(3) Extensive experiments verify the effectiveness and feasibility of the two classes of algorithms above.
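The centroid-distance weighting described in contribution (1) can be sketched as follows. The exact membership function is not given here, so this example assumes a linear decay with distance to the class centroid, multiplied by a class-level factor that scales each class by the ratio of the smallest class size to its own size. The function name `centroid_fuzzy_weights` and the parameter `delta` are hypothetical.

```python
import numpy as np

def centroid_fuzzy_weights(X, y, delta=1e-6):
    """Per-sample fuzzy weights from the distance to the class centroid.

    Assumed form: membership = 1 - d / (d_max + delta), so samples near
    their class centroid get values close to 1 and outlying samples get
    values close to 0. The result is then scaled per class so that the
    smallest class keeps full weight and larger classes are damped.
    """
    classes, inv, counts = np.unique(y, return_inverse=True, return_counts=True)
    weights = np.empty(len(y), dtype=float)
    for k in range(len(classes)):
        mask = inv == k
        centroid = X[mask].mean(axis=0)
        d = np.linalg.norm(X[mask] - centroid, axis=1)
        # delta keeps the membership strictly positive for the farthest sample.
        membership = 1.0 - d / (d.max() + delta)
        # Class-level factor: 1 for the smallest class, below 1 for larger ones.
        weights[mask] = membership * counts.min() / counts[k]
    return weights
```

The returned vector can serve as the diagonal of the weight matrix in a weighted least-squares fit, replacing purely class-level weights with per-sample ones.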
Keywords: extreme learning machine, imbalanced data, fuzzy weighting, prior information
Abstract: With the development of science and technology, more and more data is being generated. How to make better use of this data has become an important problem that urgently needs a solution. The aim of machine learning is to learn general rules from known data and then apply them to new data.
Classification is an important branch of machine learning. It trains a classifier by analyzing known data so that the classifier fits new data and gives predictions. Traditional classification techniques generally assume balanced data, but in the real world imbalanced data can be found everywhere, e.g., in anomaly detection, network security monitoring, and credit card fraud detection. Because minority-class samples are costly and difficult to collect, the resulting data sets are often severely imbalanced. Adapting current classification algorithms to imbalanced data has therefore become a hot topic in machine learning.
Extreme learning machine (ELM) is a recently proposed learning algorithm that uses the least squares method to fit the training data without iteration. The algorithm has two main merits: fast running speed and strong generalization ability. However, when it is applied to imbalanced data, the recognition accuracy on minority-class samples declines sharply, as with traditional algorithms. To deal with this issue, the weighted extreme learning machine (WELM) has been proposed. WELM fits imbalanced data better by assigning different weights to different classes. FSVM-CIL is another weighted algorithm, built on the support vector machine (SVM). It adapts to imbalanced data sets by fuzzy weighting.

Research on Fuzzy Weighted Extreme Learning Machine Algorithms for Imbalanced Data: http://www.youerw.com/jisuanji/lunwen_94100.html