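At present, the analysis and mining of social tagging systems is gradually becoming an important foundation for Web 2.0 applications and services. Blog systems are among the most popular of these applications, and both user-assigned tags and automatically generated tags are widely used in them. However, there is still little work on mining user tag data from large-scale Chinese blog systems.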
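Using more than 60,000 posts from the ScienceNet (科学网) blog, this thesis designs and develops a system that automatically generates tags and tag clouds for Chinese blogs. The main work covers the following three aspects.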
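First, tag generation and tag-cloud display for a single post. Using single-document automatic keyword extraction, the system recommends tags for an individual blog post, displays them as a tag cloud, and produces a tag-weight chart.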
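The abstract does not name the single-document keyword extraction method. The sketch below is only a minimal illustration under stated assumptions. It assumes the post has already been word-segmented. It scores each candidate word by its frequency in the body, plus a hypothetical bonus for words that also appear in the title. It then maps the resulting weights linearly onto font sizes for a tag cloud. The function names `extract_tags` and `cloud_font_sizes`, the stopword list, and the `title_boost` and font-size parameters are illustrative assumptions, not the thesis's actual method. The toy input is invented.

```python
from collections import Counter

# Hypothetical stopword list; a real system would load a full Chinese stopword file.
STOPWORDS = {"我们", "一个", "这个", "没有"}

def extract_tags(title_words, body_words, k=5, title_boost=2.0):
    """Rank candidate tags within ONE post, without corpus-wide statistics.

    Assumes the text is already word-segmented. Score = frequency in the body
    plus a hypothetical bonus for each distinct word that also occurs in the title.
    """
    def keep(word):
        # Drop single characters (mostly function words) and stopwords.
        return len(word) > 1 and word not in STOPWORDS

    scores = Counter(w for w in body_words if keep(w))
    for w in set(title_words):
        if keep(w):
            scores[w] += title_boost
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]

def cloud_font_sizes(tags, min_pt=12, max_pt=36):
    """Map (tag, weight) pairs linearly onto font sizes for a tag cloud."""
    weights = [w for _, w in tags]
    lo, hi = min(weights), max(weights)
    span = (hi - lo) or 1  # avoid division by zero when all weights are equal
    return {t: round(min_pt + (w - lo) / span * (max_pt - min_pt)) for t, w in tags}

# Toy, made-up input for illustration only.
title = ["标签", "云图", "生成"]
body = ["标签", "云图", "的", "标签", "系统", "博客", "标签", "系统"]
top = extract_tags(title, body, k=3)
print(top)                    # [('标签', 5.0), ('云图', 3.0), ('生成', 2.0)]
print(cloud_font_sizes(top))  # {'标签': 36, '云图': 20, '生成': 12}
```

A purely single-document score like this needs no background corpus, which fits the "single-document" framing. A corpus-based weighting such as TF-IDF would be a different technique.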
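Second, personalized tag generation for bloggers. The system aggregates tag statistics over each blogger's posts, automatically recommends 10 tags per blogger, and displays them as a tag cloud.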
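The per-blogger step can be sketched as a simple aggregation. The sketch assumes each post arrives as a `(blogger_id, tags)` pair. As a hypothetical design choice, it counts a tag at most once per post, and it breaks ties alphabetically so the output is deterministic. The function name `blogger_top_tags`, the input format and the sample data are illustrative assumptions.

```python
from collections import Counter, defaultdict

def blogger_top_tags(posts, n=10):
    """Aggregate tags over each blogger's posts and keep the n most frequent.

    posts: iterable of (blogger_id, tags_of_one_post) pairs.
    Hypothetical design choice: a tag counts at most once per post.
    """
    per_blogger = defaultdict(Counter)
    for blogger, tags in posts:
        per_blogger[blogger].update(set(tags))
    result = {}
    for blogger, counts in per_blogger.items():
        # Most frequent first; ties broken alphabetically for determinism.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result[blogger] = [tag for tag, _ in ranked[:n]]
    return result

# Toy, made-up input for illustration only.
posts = [
    ("blogger_a", ["植物", "生态", "植物"]),
    ("blogger_a", ["植物", "进化"]),
    ("blogger_b", ["物理"]),
]
print(blogger_top_tags(posts, n=2))
# {'blogger_a': ['植物', '生态'], 'blogger_b': ['物理']}
```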
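Finally, time-trend charts for popular tags. The system counts the 50 tags most frequently assigned across all users, plots how their usage changes over time, and analyzes the reasons for these changes.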
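The trend analysis can be sketched in two passes:

1. Rank tags by total frequency and keep the top 50.
2. Bucket each kept tag's occurrences by month.

The `(date, tags)` record format, the `YYYY-MM` monthly granularity, the function name `popular_tag_trends` and the toy records are assumptions made for illustration. The returned per-month counts could then be passed to any plotting library to draw the chart.

```python
from collections import Counter, defaultdict

def popular_tag_trends(records, top_k=50):
    """records: iterable of (date 'YYYY-MM-DD', tags) pairs, one per post.

    Returns the top_k most frequent tags and, for each, a {month: count}
    series ready to be plotted as a time-trend chart.
    """
    records = list(records)  # iterated twice below
    totals = Counter(tag for _, tags in records for tag in tags)
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    top = [tag for tag, _ in ranked[:top_k]]
    series = {tag: defaultdict(int) for tag in top}
    for date, tags in records:
        month = date[:7]  # bucket by 'YYYY-MM'
        for tag in tags:
            if tag in series:
                series[tag][month] += 1
    # Convert to plain dicts with months in chronological order.
    return top, {tag: dict(sorted(m.items())) for tag, m in series.items()}

# Toy, made-up records for illustration only.
records = [
    ("2009-01-05", ["科研", "教育"]),
    ("2009-01-20", ["科研"]),
    ("2009-02-03", ["科研", "经费"]),
    ("2009-02-10", ["教育"]),
]
top, series = popular_tag_trends(records, top_k=2)
print(top)             # ['科研', '教育']
print(series["科研"])  # {'2009-01': 2, '2009-02': 1}
```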
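Experimental results show that the system's automatic tag recommendation performs better than the tag recommendation feature built into the ScienceNet blog site. The tags the system assigns to each blogger reflect that blogger's research field and interests. Using the time-trend charts of ScienceNet user tags, we can discover and track hot events and analyze the reasons for their changes.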
Finally, the thesis summarizes this work and outlines research goals for further improving the system.
Keywords: social tagging system, Chinese blog, automatic tag generation, tag cloud
Graduation Thesis: English Abstract
Title: Mining Social Tagging Systems: Automatic Generation of Tags and Tag Clouds for Chinese Blogs
Abstract
Currently, the analysis and mining of social tagging systems has gradually become an important foundation for Web 2.0 applications and services. Blog systems are among the most popular of these applications, and both user-assigned and machine-generated tags are widely used in them. However, there is still little work on mining user tag data from large-scale Chinese blog systems.
Based on more than 60,000 posts from the ScienceNet blog, we design and develop a system that automatically generates tags and tag clouds for Chinese blogs. The main tasks cover the following three aspects.
The first is automatic tag and tag-cloud generation for a single blog post. Using single-document automatic keyword extraction, the system recommends a list of tags for an individual post, displays them as a tag cloud, and generates a tag-weight chart.
The second is the generation of bloggers' personalized tags. The system compiles tag statistics over each blogger's posts, automatically recommends 10 tags for each blogger, and displays them as a tag cloud.
Finally, the system generates time-trend charts of popular tags. It counts the 50 tags most frequently assigned by all users, plots their trends over time, and analyzes the reasons for the changes.
The experimental results show that our system's automatic tag recommendation performs better than the tag recommendation feature provided by the ScienceNet blog site. The tags the system assigns to bloggers reflect their research fields and interests. Using time-trend charts of ScienceNet blog user tags, we can discover and track hot events and analyze the reasons for their changes.
Finally, the thesis summarizes the work presented and sets out research goals for further improving the system.
Keywords: social tagging system, Chinese blog, automatic tag generation, tag cloud
Contents
1 Introduction
1.1 Background of the Topic
1.2 Significance of the Research
1.3 Research Approach and Content of This Thesis
1.3.1 Research Approach of This Thesis
2 Literature Review