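Abstract
With the rapid development of Internet technology, microblogging has grown explosively in recent years and attracts more and more Internet users. Sina Weibo (Sina Microblog), the largest microblogging site in China, has become not only an important virtual venue where users communicate online, discuss topics, and express their views, but also an important medium through which certain major emergencies ferment, erupt, spread, evolve, and influence public opinion. This thesis therefore focuses on how topics around a specific event (such as the H7N9 event) develop and evolve on Sina Weibo, which is of considerable significance for understanding the issues the public cares about and for public opinion monitoring and information security.
Taking H7N9 text data crawled from Sina Weibo as an example, this thesis uses the Latent Dirichlet Allocation (LDA) model to extract microblog subtopics and then analyzes topic evolution based on the LDA results. First, the parameters of the LDA model are inferred by Gibbs sampling. Next, the document-topic distribution matrix and the topic-word distribution matrix produced by the LDA model are used to infer the subtopics within the H7N9 topic, and the average of a topic's distribution probability is taken as its topic intensity. Finally, five subtopics (pathogenesis, vaccine development, prevalence, prevention and control measures, and the impact on the poultry industry) are taken as examples, and their evolution across the months is analyzed. The experimental results show that the method can discover microblog subtopics and track their evolution fairly effectively.
Keywords Microblog topic, LDA model, topic evolution, topic intensity, microblog public opinion
Graduation Thesis Foreign-Language Abstract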
Title Research on Subtopic Discovery and Evolution in Microblogs
Abstract
With the rapid development of the Internet, microblogging has grown explosively in recent years and attracts more and more users. As the largest microblogging site in China, Sina Weibo has become an important virtual place where people communicate with each other, discuss events, and express their own opinions. At the same time, it has become an important medium through which sudden major events ferment, erupt, spread, evolve, and influence public opinion. This thesis therefore studies topic discovery and topic evolution for specific events, such as the H7N9 event, on Sina Weibo. Such analysis matters for understanding the issues the public cares about, for public opinion monitoring, and for information security.
Taking an H7N9 text dataset collected from Sina Weibo as an example, we use the Latent Dirichlet Allocation (LDA) model to extract the subtopics in the dataset and then analyze the evolution of each subtopic based on the LDA results. First, we infer the parameters of the LDA model by Gibbs sampling. Then, we use the document-topic distribution matrix and the topic-word distribution matrix to infer the subtopics of the H7N9 topic, and take the average of a topic's distribution probability as its topic intensity. Finally, we analyze the month-by-month evolution of five subtopics: pathogenesis, vaccine development, prevalence, prevention and control measures, and the impact on the poultry industry. The experimental results show that our method can discover microblog subtopics and track their evolution effectively.
Keywords Microblog topic, LDA model, online public opinion monitoring, topic evolution, topic intensity
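As a rough illustration of the topic-intensity step described in the abstract, the sketch below averages each topic's probability over the documents that fall in the same month. It is not the thesis's implementation: the function name `topic_intensity_by_month`, the month labels, and the toy document-topic rows are all hypothetical, and grouping posts by month before averaging is an assumption drawn from the phrase "in each month".

```python
from collections import defaultdict


def topic_intensity_by_month(doc_topic, doc_months):
    """Average each topic's probability over the documents of each month.

    doc_topic  -- list of rows; row d holds the topic distribution of document d
    doc_months -- list of month labels, one per document (hypothetical labels)
    Returns a dict mapping month -> list of per-topic intensities.
    """
    if len(doc_topic) != len(doc_months):
        raise ValueError("doc_topic and doc_months must have the same length")

    sums = {}                   # month -> running per-topic sum
    counts = defaultdict(int)   # month -> number of documents seen
    for theta, month in zip(doc_topic, doc_months):
        if month not in sums:
            sums[month] = [0.0] * len(theta)
        for k, p in enumerate(theta):
            sums[month][k] += p
        counts[month] += 1

    # Intensity = mean topic probability within the month.
    return {m: [s / counts[m] for s in sums[m]] for m in sorted(sums)}


# Toy document-topic matrix (made-up numbers, three topics, three posts).
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.3, 0.6],
]
doc_months = ["M1", "M1", "M2"]  # hypothetical month labels

for month, values in topic_intensity_by_month(doc_topic, doc_months).items():
    print(month, [round(v, 2) for v in values])
```

Plotting the returned per-month values for one topic index would give that subtopic's evolution curve; in practice the rows of `doc_topic` would come from the document-topic matrix estimated by Gibbs sampling.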
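Contents
1 Introduction
1.1 Research Background
1.2 Research Significance
1.3 Research Approach
1.4 Thesis Organization
2 Overview of Basic Theory and Key Technologies
2.1 Overview of Online Public Opinion Analysis
2.1.1 Concept and Research Status of Online Public Opinion
2.1.2 Application of Information Science Methods in Online Public Opinion Research
2.2 Brief History of Topic Models
2.2.1 Traditional Topic Representation Models
2.2.2 The LDA Probabilistic Generative Model
3 Principles of Topic Extraction and Evolution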