Sensitive Web Page Identification Based on Text Analysis: Public Opinion Analysis and Data Mining
Date: 2018-09-04 21:07 Source: Graduation thesis Author: Graduation thesis
Abstract

With the rapid development of Internet technology, the ways in which information is exchanged and disseminated have become increasingly diverse. Online media, as a new form of information dissemination, have become part of people's daily lives. Online discussion is more active than ever before, and the pressure of public opinion generated by expressing views and spreading ideas over the network has reached a point that no institution or department can ignore. With the rapid growth in the number of websites and in the number of web pages and volume of information on the Internet, together with the widespread adoption of e-government and e-commerce, strengthening the timely detection of online public opinion and actively resolving online public opinion crises are of great significance to network information regulators for maintaining social stability and promoting national development. It is therefore particularly important to learn promptly of sensitive information on the network and of the growth trend in how fast it spreads.

"Public Opinion Analysis", a classification and early-warning analysis system for sensitive website information, is a network information review system developed to meet the needs of network information regulators. It acquires and analyzes text information on designated websites. The research behind the system aims to identify public information among the large volume of information on websites across the network, so that sensitive-information matching and classified early-warning analysis can be performed on it.
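Keywords: public opinion analysis; semantic orientation analysis; data mining

Contents
Chapter 1 Introduction
1.1 Preface
1.2 Research Background
1.3 Research Status at Home and Abroad
1.3.1 Research Status Abroad
1.3.2 Research Status in China
1.4 Research Content of the Thesis
1.5 Structure of the Thesis
Chapter 2 System Design Analysis
2.1 Basic Concepts of Sensitive Website Information
2.2 Characteristics of Sensitive Network Information
2.3 Functions of Public Opinion Analysis
2.4 Role of Classification and Early-Warning Analysis of Sensitive Website Information
Chapter 3 Key Technologies for System Development
3.1 Behavior Pattern Recognition Technology for the Early-Warning Analysis System
3.2 Automatic Classification and Clustering Technology
3.3 Content Analysis Technology
Chapter 4 Design and Implementation of Public Opinion Analysis
4.1 Database Design
4.2 Data Mining of Sensitive Web Pages Based on Web Text Analysis
4.3 Design of a Classification and Early-Warning Analysis Model for Sensitive Information on Typical Websites
4.4 Implementation of Public Opinion Analysis
Chapter 5 Summary and Outlook

(Editor: qin)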