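Abstract (Chinese)
Sina Weibo now deeply influences people's daily lives. With the rapid spread of smartphones, people can post status updates anytime and anywhere, so the content is real-time and fragmented. As Sina Weibo's features have matured, it has begun to form its own ecosystem: users can follow one another; comment on, repost, and like posts that interest them; and publish long-form posts, which makes the platform highly interactive and flexible. As today's leading social media platform, Sina Weibo has a huge user base, and the massive data that this user base generates merits careful study. This thesis studies the extraction of Weibo data, topic detection, and sentiment analysis of Weibo content.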
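Traditional extraction of web text data generally gathers information with a web crawler that traverses the link graph; this thesis instead uses the API provided by Sina Weibo to retrieve the desired post content.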
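This thesis introduces the general workflow of Weibo topic detection and the corresponding algorithms. It mainly calls the keyword extraction algorithm built into the Chinese Academy of Sciences' ICTCLAS 2014 word segmentation system to obtain Weibo topics, and uses those topics to filter the relevant posts. On this basis, post content is given a model representation through sentiment classification, converted into a data format that Weka can process, and then analyzed for sentiment with machine learning.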
Thesis keywords: microblog; data extraction; topic detection; machine learning; sentiment analysis
Title: Microblogging hot topic extraction and analysis techniques
Abstract
Sina Weibo now deeply affects people's daily lives. With the rapidly growing popularity of smartphones, people can publish their status from anywhere with the touch of a finger, so Weibo content is real-time and fragmented. As Sina Weibo's features have steadily improved, it has begun to form its own ecosystem: users can follow one another and comment on, repost, and like the posts of the people they follow. This strong interactivity and flexibility give microblogging a very strong social character. Sina Weibo is now the leading social media platform, and its huge user base and the massive amounts of data that base generates merit careful study. This thesis studies microblog data extraction, topic detection, and sentiment analysis of microblog content.
Traditional extraction of web text data generally gathers information with a web crawler based on graph traversal. This thesis instead uses the API provided by Sina Weibo to retrieve the desired microblog content, which makes data extraction both convenient and efficient.
After introducing the microblog topic detection process and the corresponding algorithms, the thesis calls the keyword extraction algorithm built into the CAS ICTCLAS 2014 word segmentation system to obtain microblog topics and uses them to filter the corresponding microblog content. On this basis, the content is processed with sentiment dictionaries, converted into a data format that Weka can process, and then analyzed for sentiment with machine learning.
Keywords: microblogging; data acquisition; topic detection; machine learning; sentiment analysis
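Contents
Abstract (Chinese)
Abstract (English)
1  Introduction
  1.1  Research Background
  1.2  Current State of Research
  1.3  Research Content and Significance
      1.3.1  Research Content
      1.3.2  Research Significance
  1.4  Thesis Organization
2  Background Knowledge
  2.1  Microblogging
      2.1.1  History of Microblogging, Sina Weibo, and Its Characteristics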