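Abstract
Multivariate statistical analysis, also known as multivariable statistical analysis, is an important branch of statistics, and cluster analysis is among its most commonly used methods. It has been widely applied in the natural sciences, the social sciences, and other fields. Because cluster analysis usually relies on complicated mathematical theory, it generally cannot be carried out by hand and requires the support of computers and statistical software. R is free and open-source software that supports effective statistical analysis and graphics, and it has become a data analysis tool favored by many statisticians in China and abroad.
Cluster analysis is a method of classifying objects on the basis of their own characteristics. Its goal is that objects in the same class are highly similar to one another, while objects in different classes differ greatly. The problem cluster analysis addresses is how to form classes, or clusters, that satisfy this requirement without any prior knowledge.
Chapter 1 introduces the background of cluster analysis, the state of research in China and abroad, and the research methods and tools. Chapter 2 introduces the basic concepts of cluster analysis and the data types involved. Chapter 3 describes the steps of hierarchical clustering and its implementation in R. Chapter 4 is an empirical analysis that takes housing price changes along Nanjing Metro Line 3 as an example and applies hierarchical clustering for a preliminary statistical analysis of the data. Chapter 5 summarizes the work in this paper and looks ahead to future research.
Keywords: clustering algorithm; hierarchical clustering; R language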
ABSTRACT
Multivariate statistical analysis, also called multivariate statistics, is an important branch of statistics, and cluster analysis is among its most commonly used methods. Cluster analysis is now applied across the natural and social sciences. Because it involves complicated mathematical theory, it cannot practically be carried out by hand and requires the support of computers and statistical software. R is free, open-source software with powerful statistical analysis and graphics capabilities, and it has become a data analysis tool favored by many statisticians at home and abroad.
Cluster analysis is a method that partitions objects into classes according to their own characteristics. Clustering is the process of grouping data into classes or clusters so that objects within the same cluster are highly similar to one another but very dissimilar to objects in other clusters. Clustering is carried out without prior knowledge, so the main research task is how to obtain such a grouping under this premise.
Chapter 1 introduces the background of cluster analysis, the state of research in China and abroad, and the research methods and tools. Chapter 2 introduces the basic concepts and data types. Chapter 3 describes the procedure of hierarchical clustering and its implementation in R. Chapter 4 is an empirical study that takes housing price changes along Nanjing Metro Line 3 as an example. Chapter 5 summarizes the paper and looks ahead to future work.
KEYWORDS: clustering algorithm; hierarchical clustering; R programming language
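Contents
Chapter 1 Introduction
1.1 Background of cluster analysis
1.2 Research status of cluster analysis in China and abroad
1.3 Research tool: the R language
Chapter 2 Cluster analysis
2.1 Basic concepts and definitions of cluster analysis
2.2 Classification of clustering algorithms
2.2.1 Hierarchy-based methods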