Design and Implementation of a Web Crawler in Python

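Abstract: With the rapid development of the Internet, demand for online information keeps growing, while the volume of information on the Internet has become enormous and increasingly complex. Since search engines first appeared in 1990, obtaining information from the Internet has become far more convenient; well-known examples today include Google and Baidu. Web crawler technology is an indispensable part of a search engine, helping it become faster, more accurate, and more convenient.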

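This thesis studies the design and implementation of a web crawler. It examines the main strategies web crawlers adopt, their workflow, and how they are built. A single-threaded web crawler based on a depth-first strategy is implemented in Python: by importing the urlopen function from Python's urllib2 module, the program can fetch the source code of a given web page. Building this crawler serves as a way to learn the Python language, to understand how Internet protocols work, and to understand how web crawlers are built and how they operate.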
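The approach described above (a single-threaded, depth-first crawler built on urlopen) can be illustrated with a minimal sketch. It is not the thesis author's program. It assumes Python 3, where the urlopen function of Python 2's urllib2 lives in urllib.request, and the names fetch_source, extract_links, crawl_depth_first, and max_pages are hypothetical, chosen only for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen  # Python 3 home of urllib2.urlopen


def fetch_source(url, timeout=10):
    """Download one page and return its source as text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute link URLs found in html, without #fragments."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def crawl_depth_first(start_url, fetch=fetch_source, max_pages=20):
    """Single-threaded depth-first crawl; returns {url: source} in visit order.

    `fetch` is injectable (an assumption of this sketch) so the crawl
    logic can be exercised without network access.
    """
    pages = {}
    visited = set()
    stack = [start_url]  # a LIFO stack yields depth-first order
    while stack and len(pages) < max_pages:
        url = stack.pop()
        if url in visited or not url.startswith(("http://", "https://")):
            continue
        visited.add(url)
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # skip pages that cannot be downloaded
        pages[url] = html
        # Push links in reverse so the first link on the page is followed first.
        for link in reversed(extract_links(url, html)):
            if link not in visited:
                stack.append(link)
    return pages
```

A call such as `crawl_depth_first("https://example.com/", max_pages=5)` would return a dictionary mapping each visited URL to its page source. The `max_pages` limit is a safeguard added for this sketch, since an unbounded depth-first crawl may never return from a deep chain of links.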

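The thesis first outlines the background of web crawlers, then introduces their working principles and the technologies used, and finally implements a simple web crawler; experiments verify that the program can obtain the source code of a given web page.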

Thesis keywords: web crawler; Python; network protocol; source program; crawler

Abstract: With the rapid development of the Internet, people need more and more online information, and the information on the Internet has become vast and complex. Since the first search engines were established in 1990, looking for information online has become much more convenient. Well-known search engines today include Google and Baidu. Web crawler technology is an integral part of a search engine and helps make it faster, more accurate, and more convenient.

The main subject of this paper is the design and implementation of a web crawler. It studies the strategies, workflow, and construction methods of web crawlers. A single-threaded web crawler based on a depth-first strategy is implemented in Python. By importing the urlopen function from Python's urllib2 module, the program can crawl the source code of a given web page. Building this crawler is a way to learn the Python language, to understand how Internet protocols work, and to understand how web crawlers are built and operate.

This paper first outlines the background of web crawlers, then introduces their working principles and the technologies used, and finally implements a simple web crawler program. Experiments show that the program can obtain the source code of a given web page.

Keywords: web crawler; Python; network protocol; source program; crawler

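Chapter 1 Introduction

1.1 Background of web crawlers

1.2 Research methods, steps, and measures

Chapter 2 Related technologies

2.1 Introduction to Python

2.2 Installing Python

2.3 Common Python modules

2.3.1 The urllib2 module in detail

2.4 Common Python libraries

2.5 17 errors frequently encountered when running Python

2.6 Principles of web crawlers

2.7 Web crawler strategies

2.7.1 Crawler-based strategies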
