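Abstract In recent years, as the Internet has become widespread, the volume of information online has grown explosively. Search engines let people retrieve the information they need from this mass of online content quickly, accurately, and conveniently. Most search engines, however, cannot easily return precise information. How to improve them so that they can has become one of the active topics in network research.
Although search engines have brought many conveniences to daily life, most of them cannot accurately pick out the information a user actually needs, so users waste time and the experience suffers. Vertical search engines emerged to retrieve the required information quickly and precisely. A vertical search engine performs structured analysis on the large number of websites and web pages within a particular industry and extracts their key fields to build an index. It therefore provides not only an ordinary web page index but also structured information extracted by processing business information.
This thesis first describes the current state of search engines and the significance of the research, and then focuses on web content extraction, storage, and display in vertical search engines.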
Thesis keywords: microblog hot topics, vertical search engine
Graduation Project Report (Thesis): Foreign-Language Abstract
Title Research on Microblog Hot Topic Extraction Technology
Abstract In recent years, with the spread of the Internet, the amount of information online has grown explosively. The emergence of search engines allows people to obtain the information they need from the mass of network information quickly, accurately, and easily. However, most search engines cannot conveniently return precise information. How to improve them to achieve this has become one of the hot topics in network research.
Although search engines have brought a great deal of convenience to people's lives, most of them cannot accurately select the information that users require, which wastes users' time and seriously degrades their experience. To search out the needed information accurately and quickly, vertical search engines have emerged. A vertical search engine performs structured analysis on the large number of websites and web pages in an industry and extracts their key fields to build an index. It can provide not only an ordinary web page index but also structured information extracted by processing business information.
This thesis first describes the current state of search engines and the significance of the research, and then focuses on web content extraction, storage, and display in vertical search engines.
Keywords: Microblog hot topics, vertical search engines
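1 Introduction
1.1 Overview
1.3 Research purpose and significance
2 Working principles and technologies of vertical search engines
2.1 Overview of vertical search engines
2.2 How web page crawling software works
2.3 Web page storage and the analysis indexer
2.4 Design of the searcher and the user interface
2.5 Lucene indexing technology
3 System requirements analysis
3.1 Overview
3.2 System requirements analysis
3.2.1 Design goals
3.2.2 Design principles
3.3 System feasibility analysis
3.4 Operating requirements
3.4.1 Hardware configuration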