Thesis keywords Weibo; web crawler; public opinion analysis
Title A Public Opinion Analysis Platform Based on a Weibo Crawler and Burst Event Detection
Abstract With the development of information technology in the 21st century, self-media has become a popular way for netizens to record their lives and express their opinions. Sina Weibo has become the best-known self-media platform in mainland China, and its user base is still growing. At present, Weibo serves as a barometer of public opinion. However, because of certain characteristics of Weibo, collecting and analyzing its data is difficult, so it is hard to obtain enough data for statistical work and analysis.
Based on a study of existing software frameworks and focusing on these characteristics of Weibo, I designed a public opinion analysis platform that can crawl posts from Weibo and track selected keywords continuously. The platform can collect a large-scale text corpus, monitor Weibo in near real time, and provide data for subsequent data mining.
Keywords Weibo; web crawler; public opinion
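Contents
1 Introduction
1.1 Background and significance of the topic
1.3 Research content and thesis structure
2 Weibo crawler design
2.1 Overview of the Weibo crawler
2.2 Basic components and architecture of the Weibo crawler
2.2.1 Crawler instances
2.2.2 Scheduler
2.2.3 Account login module
2.2.4 Data collector
2.2.5 Tracking and presentation platform
2.2.6 RabbitMQ
2.2.7 Architecture diagram
2.3 Crawling and parsing Weibo
2.3.1 Session persistence
2.3.2 CAPTCHA handling
2.3.3 Fetching and parsing Weibo pages
2.3.4 Internal data structure definition for Weibo posts
2.3.5 Crawling AJAX-loaded content
2.3.6 Error handling
2.4 Storage design
2.4.1 Raw HTML storage
2.4.2 Relational data storage
2.5 Scheduling and communication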