Design and Implementation of a Multi-Search-Engine Information Collection and Analysis System

Abstract: As the largest data-sharing hubs on the Internet, search engines have become the main information source for many data applications: an application first collects data from a search engine, then analyzes and processes it before serving users. Search engines differ in their strengths, and the data they provide is not identical. For example, Baidu covers Chinese websites well, while Google has higher coverage of websites outside China. Collecting and analyzing data from several search engines at once therefore yields more comprehensive information and allows better service for users. To this end, this project designs and implements a multi-search-engine information collection and analysis system. The system consists of four functional modules: task management, information collection, search result analysis, and user management. The task management module handles creating, querying, and deleting search tasks. The information collection module runs each search task against the search engines and retrieves the results. The search result analysis module parses the results returned by each engine to extract the title, URL, image, content, and other information for every result. The user management module manages basic user information. The system does not serve end users directly; instead, it provides data services to other applications. Given a search task, it collects information from the search engines, analyzes the results, stores them in a database, and makes them available to other applications. Testing shows that the system runs well and meets its design goals.

Keywords: search engine; information collection; HTML analysis; regular expression
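The abstract names regular expressions as the technique for extracting each result's title, URL, image, and content from returned HTML, but gives no details. The following Python sketch illustrates the general idea only. The markup in `SAMPLE_HTML`, the `SearchResult` record, and the function `parse_results` are all hypothetical and are not taken from the thesis; real search engine result pages use different markup that changes over time.

```python
import re
from dataclasses import dataclass
from html import unescape
from typing import List, Optional

# Hypothetical result markup, invented for this illustration.
SAMPLE_HTML = """
<div class="result">
  <h3><a href="https://example.com/a">First &amp; <em>best</em> result</a></h3>
  <img src="https://example.com/a.png">
  <p class="summary">Summary of the first result.</p>
</div>
<div class="result">
  <h3><a href="https://example.org/b">Second result</a></h3>
  <p class="summary">Summary of the second result.</p>
</div>
"""


@dataclass
class SearchResult:
    """One parsed search result (field names are assumptions)."""
    engine: str
    title: str
    url: str
    image: Optional[str]
    content: str


# One pattern per field; re.S lets "." match across line breaks.
RESULT_RE = re.compile(r'<div class="result">(.*?)</div>', re.S)
LINK_RE = re.compile(r'<h3><a href="([^"]+)">(.*?)</a></h3>', re.S)
IMAGE_RE = re.compile(r'<img src="([^"]+)"')
SUMMARY_RE = re.compile(r'<p class="summary">(.*?)</p>', re.S)
TAG_RE = re.compile(r"<[^>]+>")


def clean(fragment: str) -> str:
    """Strip inline tags, decode HTML entities, and trim whitespace."""
    return unescape(TAG_RE.sub("", fragment)).strip()


def parse_results(engine: str, html: str) -> List[SearchResult]:
    """Extract title, URL, image, and summary from each result block."""
    results = []
    for block in RESULT_RE.findall(html):
        link = LINK_RE.search(block)
        if link is None:
            continue  # skip blocks without a recognizable title link
        image = IMAGE_RE.search(block)
        summary = SUMMARY_RE.search(block)
        results.append(SearchResult(
            engine=engine,
            url=link.group(1),
            title=clean(link.group(2)),
            image=image.group(1) if image else None,
            content=clean(summary.group(1)) if summary else "",
        ))
    return results


if __name__ == "__main__":
    for result in parse_results("demo-engine", SAMPLE_HTML):
        print(result)
```

In a multi-engine setting, each engine would presumably need its own set of patterns, since result markup differs between engines. Keeping the patterns separate from the parsing loop, as above, is one way to make that substitution straightforward.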

1 Introduction

1.1 Research Background

1.2 Research Status at Home and Abroad and Existing Problems

1.3 Structure of This Thesis

2 System Requirements Analysis

2.1
