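Abstract: From the perspective of artificial intelligence, the rough set is a mathematical tool for handling imprecise information that imitates the thinking activity and cognitive processes of the human brain. However, as the datasets being processed keep growing in size and complexity, traditional rough set theory can no longer meet the demands of real-world engineering applications. The traditional rough set model must therefore be extended or generalized. This thesis approaches the problem from two angles, test-cost sensitivity and sample selection. It studies rough set data modeling and attribute reduction and obtains the following results: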
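1. Test-cost-based attribute reduction for fuzzy rough sets. The fuzzy rough set is a mathematical tool for handling continuous data. However, both classification and reduction on a dataset incur test costs. To address this, this thesis uses test cost as an evaluation criterion and proposes a fuzzy rough set attribute reduction algorithm based on a genetic strategy. The experimental results show that the new algorithm obtains a reduct with minimal test cost while the approximation quality remains unchanged or changes only slightly.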
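2. Heuristic attribute reduction based on sample selection. Traditional heuristic algorithms use all samples in the decision system. In practice, however, samples contribute to the reduct to different degrees, and processing every sample increases the running time of heuristic algorithms. To address this, a heuristic algorithm based on sample selection is proposed. It consists of three steps: first, the important samples are selected from the sample set; second, a new decision system is built from the selected samples; finally, a heuristic algorithm computes the reduct. The experimental results show that the new algorithm effectively reduces the time needed to compute a reduct.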
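The first result above can be illustrated with a minimal Python sketch, under stated assumptions. The abstract does not specify the fuzzy similarity relation, the implicator, the genetic operators, or the parameters. This sketch therefore assumes the following: attribute values normalized to [0, 1]; a per-attribute similarity of 1 - |a(x) - a(y)| combined over attributes with min; the Kleene-Dienes implicator for the fuzzy positive region; and a plain genetic algorithm with binary tournament selection, one-point crossover, bit-flip mutation, and elitism. The names `fuzzy_dependency`, `min_cost_reduct`, and the parameter `epsilon` (the allowed relative loss of approximation quality) are hypothetical. A chromosome is a bit string over the condition attributes. Feasible subsets are ranked by total test cost, and infeasible ones are penalized below every feasible one.

```python
import random


def similarity(u, v, attrs):
    """Fuzzy similarity R_B(u, v): min over attributes of 1 - |u_a - v_a| (assumes values in [0, 1])."""
    return min((1.0 - abs(u[a] - v[a]) for a in attrs), default=1.0)


def fuzzy_dependency(data, labels, attrs):
    """Approximation quality gamma_B with the Kleene-Dienes implicator (assumed choice).

    For a crisp decision, the positive-region membership of x is the infimum,
    over samples y with a different label, of 1 - R_B(x, y).
    """
    n = len(data)
    total = 0.0
    for i in range(n):
        total += min((1.0 - similarity(data[i], data[j], attrs)
                      for j in range(n) if labels[j] != labels[i]), default=1.0)
    return total / n


def min_cost_reduct(data, labels, costs, epsilon=0.0, pop_size=20,
                    generations=50, mutation_rate=0.1, seed=0):
    """Hypothetical GA search for a low-test-cost attribute subset that keeps
    at least (1 - epsilon) of the full approximation quality."""
    rng = random.Random(seed)
    m = len(costs)
    threshold = (1.0 - epsilon) * fuzzy_dependency(data, labels, range(m))
    cache = {}

    def subset(bits):
        return [a for a in range(m) if bits[a]]

    def fitness(bits):
        key = tuple(bits)
        if key not in cache:
            attrs = subset(key)
            q = fuzzy_dependency(data, labels, attrs)
            if q >= threshold - 1e-12:
                cache[key] = -sum(costs[a] for a in attrs)  # feasible: cheaper is better
            else:
                # Infeasible: always worse than the most expensive feasible subset
                cache[key] = -sum(costs) - 1.0 - (threshold - q)
        return cache[key]

    # Seed with the full attribute set so at least one feasible chromosome exists
    population = [[1] * m] + [[rng.randint(0, 1) for _ in range(m)]
                              for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        nxt = [population[0][:]]  # elitism: keep the best chromosome
        while len(nxt) < pop_size:
            p1 = max(rng.sample(population, 2), key=fitness)  # binary tournament
            p2 = max(rng.sample(population, 2), key=fitness)
            cut = rng.randrange(1, m) if m > 1 else 0         # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            nxt.append(child)
        population = nxt
    best = max(population, key=fitness)
    return subset(best), -fitness(best)
```

Seeding the population with the full attribute set and keeping an elite means the returned subset always satisfies the quality constraint. Whether it is the true minimum-cost reduct depends on the search budget (`pop_size`, `generations`).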
Keywords: attribute reduction; test-cost sensitivity; sample selection; rough set
Abstract: From the point of view of artificial intelligence, the rough set is a mathematical tool that simulates human thinking to handle imprecise information. However, as data sets grow in size and complexity, traditional rough set theory can no longer meet the needs of real engineering applications. The traditional rough set model therefore needs to be extended. This thesis considers two viewpoints: test-cost sensitivity and sample selection. By studying rough set data modeling and attribute reduction from these two viewpoints, it obtains the following results:
1. Fuzzy rough set attribute reduction based on test cost. The fuzzy rough set is a mathematical tool for dealing with continuous data. However, classification and reduction on a data set incur test costs. To address this, test cost is used as an evaluation criterion, and a fuzzy rough set attribute reduction algorithm based on a genetic strategy is proposed. The experimental results show that the proposed algorithm obtains a reduct with the least test cost while keeping the approximation quality unchanged or nearly unchanged.
2. A heuristic attribute reduction algorithm based on sample selection. Heuristic attribute reduction with a greedy strategy is one of the most widely used approaches to computing a reduct. Traditional heuristic algorithms use all the samples in the information system. However, the samples in a data set contribute differently to finding a reduct, so redundant samples degrade the time efficiency of heuristic attribute reduction. To solve this problem, a heuristic attribute reduction algorithm based on sample selection is proposed. The algorithm consists of three stages: first, the most informative samples are selected; second, a new information system is formed from the selected samples; finally, a reduct is computed from this new system with a heuristic algorithm. The experimental results show that the proposed algorithm efficiently reduces the computational time.
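The three-stage procedure of the second result can be sketched in Python as follows. The abstract does not say how the "most informative" samples are chosen or which heuristic measure guides the search, so both choices are assumptions. The hypothetical `select_boundary_samples` keeps the fraction `keep_ratio` of samples with the smallest Hamming distance to a sample of another class, that is, samples near the decision boundary. The hypothetical `greedy_reduct` is a standard forward greedy search on the classical positive-region dependency gamma_B(D) = |POS_B(D)| / |U| over discrete attribute values.

```python
from collections import defaultdict


def crisp_dependency(rows, labels, attrs):
    """Classical rough set dependency gamma_B(D) = |POS_B(D)| / |U|."""
    attrs = list(attrs)
    block_labels = defaultdict(set)  # equivalence class on attrs -> labels it contains
    block_size = defaultdict(int)
    for row, y in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        block_labels[key].add(y)
        block_size[key] += 1
    # A block lies in the positive region when all its samples share one label
    pos = sum(block_size[k] for k, ys in block_labels.items() if len(ys) == 1)
    return pos / len(rows)


def greedy_reduct(rows, labels):
    """Forward greedy search: add the attribute with the highest dependency
    until the dependency of the full attribute set is reached."""
    m = len(rows[0])
    target = crisp_dependency(rows, labels, range(m))
    reduct = []
    while crisp_dependency(rows, labels, reduct) < target:
        remaining = [a for a in range(m) if a not in reduct]
        best = max(remaining, key=lambda a: crisp_dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return reduct


def select_boundary_samples(rows, labels, keep_ratio=0.5):
    """Hypothetical selection rule: keep the samples closest (Hamming distance)
    to a sample of a different class, preserving the original sample order."""
    def hamming(u, v):
        return sum(a != b for a, b in zip(u, v))

    margins = []
    for i, (u, y) in enumerate(zip(rows, labels)):
        d = min((hamming(u, v) for v, z in zip(rows, labels) if z != y), default=len(u))
        margins.append((d, i))
    k = max(1, int(len(rows) * keep_ratio))
    chosen = sorted(i for _, i in sorted(margins)[:k])
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


def reduce_with_sample_selection(rows, labels, keep_ratio=0.5):
    # Stage 1: select samples; stage 2: form the new table; stage 3: greedy reduct
    sub_rows, sub_labels = select_boundary_samples(rows, labels, keep_ratio)
    return greedy_reduct(sub_rows, sub_labels)
```

Because the reduct is searched only on the selected samples, the stopping test compares against the dependency of the reduced table. Whether the result is also a reduct of the full table depends on how well the selection rule preserves the discernibility between classes.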
Keywords: Attribute reduction; Test-cost sensitivity; Sample selection; Rough set
Contents
Chapter 1 Introduction