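Abstract:    Massive web logs confront enterprises with the challenges of big-data storage, processing, and mining. Traditional log mining tools can no longer meet the analysis needs of today's web log volumes. This thesis therefore proposes a web log mining system based on a cloud platform. By using the distributed framework Hadoop, the system can both store massive web logs and process them as quickly as enterprises require. Above all, it can extract more valuable information from those logs, so that enterprises can adjust faster and become more competitive.
    This thesis proposes an automatic web log mining system based on a cloud platform. The system is built on the Hadoop distributed framework. A data cleaning algorithm developed on the MapReduce parallel computing framework cleans the logs, Hadoop's HDFS provides distributed storage, and Hive performs the log mining. The mining results are migrated with Sqoop, the log details are stored in HBase, and MySQL serves as the platform on which the mining results are displayed and queried.
    Finally, the system is tested. Compared with traditional log mining, the Hadoop-based web log mining system offers parallel computing, full automation, high reliability, high scalability, high robustness, and scheduled mining, and it extracts the information an enterprise needs more quickly and accurately. The tests show that the proposed system achieves a 3.6-fold speedup over traditional single-machine mining, and that the mining rate can be raised further by enlarging the cluster. The system as a whole meets enterprise requirements for data security, stable transmission, batch processing, parallel computing, and automatic analysis.
Thesis keywords:    Hadoop; log mining; Hive; Sqoop; MapReduce; HDFS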
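The cleaning stage described above can be illustrated with a minimal sketch of a map-side filter. This is not the thesis's algorithm: the log format (Apache/Nginx "combined" style), the filtering rules (dropping malformed lines, error responses, and static resources), and the names `clean_line`, `map_lines`, and `STATIC_SUFFIXES` are all assumptions made for illustration.

```python
import re
from typing import Iterable, Iterator, Optional

# Assumption: input lines follow the Apache/Nginx "combined"-style access log format.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Assumption: requests for static resources carry no mining value and are dropped.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def clean_line(line: str) -> Optional[str]:
    """Return a tab-separated record for a useful log line, or None to discard it."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # malformed line
    status = int(match.group("status"))
    if status >= 400:
        return None  # client or server error response
    url = match.group("url")
    path = url.split("?", 1)[0].lower()  # ignore the query string when checking the suffix
    if path.endswith(STATIC_SUFFIXES):
        return None  # static resource
    fields = [match.group("ip"), match.group("time"),
              match.group("method"), url, str(status)]
    return "\t".join(fields)


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Map step: emit one cleaned record per line that passes the filters."""
    for line in lines:
        record = clean_line(line.rstrip("\n"))
        if record is not None:
            yield record
```

Under Hadoop Streaming, a mapper script would typically feed `sys.stdin` to `map_lines` and print each record. Tab-separated output is chosen here because a Hive table with a tab field delimiter could read it directly.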
Web Log Mining And Research Based On The Cloud Platform
Abstract:     Massive volumes of web logs pose challenges for data storage, data processing, and data mining. Traditional log mining tools now struggle to meet the analysis needs of such logs. This paper therefore presents a web log mining system based on a cloud platform. Using the distributed framework Hadoop, the system satisfies both the storage requirements of massive web logs and the enterprise demand for fast log processing. It can extract more valuable information from massive web logs, allowing an enterprise to adjust more quickly and improve its competitiveness.
This paper puts forward an automatic web log mining system based on a cloud platform. The system is built on the Hadoop distributed framework and uses a data cleaning algorithm, developed on the MapReduce parallel computing framework, to clean the logs. It uses Hadoop's HDFS for distributed storage and Hive for log mining. The mining results are migrated with Sqoop, HBase stores the log details, and MySQL serves as the platform for displaying and querying the mining results.
Finally, the system is tested. Compared with a traditional log mining system, the Hadoop-based web log mining system has the advantages of parallel computing, full automation, high reliability, high scalability, high robustness, and scheduled mining. It can extract the information an enterprise requires more quickly and accurately. Tests show that the mining speed of the proposed system is 3.6 times that of the traditional system, and that the speed can be improved further by increasing the size of the cluster. The system as a whole meets enterprise requirements for data security, stable transmission, batch processing, parallel computing, and automatic analysis.
Keywords:    Hadoop; log mining; Hive; Sqoop; MapReduce; HDFS
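Contents
Abstract (Chinese)
Abstract (English)
Contents
1 Introduction
1.1 Research background
1.2 Research status at home and abroad
1.3 Research content and significance of this thesis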