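Abstract: With the arrival of the information age and the spread of the paperless office, character recognition technology has come into wide use, and table recognition, a major branch of character recognition, has become an active research topic. This paper first binarizes the captured table image and corrects its skew. Table lines are then extracted by two methods, the Hough transform and morphological processing. Next, an eight-neighborhood inner-contour tracing algorithm is applied to improve cell extraction and to repair broken character strokes. The extracted character images are segmented by the projection method, and the segmented images are then normalized and thinned. The structural and statistical features of the characters are analyzed and extracted, and finally the characters are recognized by a BP neural network. Applied to a recognition system for coursework grade sheets, the method reaches a recognition rate of about 80%.
Keywords: table positioning; broken-stroke repair; BP neural network; character recognition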
Research and Realization of Automatic Document Recognition System Based on Image Processing
Abstract: With the advent of the information age, the paperless office has become popular and character recognition technology is widely used. Table recognition, a major aspect of character recognition, has become one of the research hotspots. In this paper, we first binarize the collected table images and correct their skew, and then extract the table lines with the Hough transform and with morphological processing, respectively. We then use an eight-neighborhood inner-contour tracking algorithm to improve the cell extraction method and to repair broken character strokes. The projection method is used to segment the extracted character images, and the segmented images are processed by normalization and thinning. The structural and statistical features of the characters are analyzed and extracted, and finally the characters are recognized by a BP neural network. When the method is applied to a recognition system for coursework grade sheets, the recognition rate reaches about 80%.
Key words: table positioning; broken-stroke repair; BP neural network; character recognition
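The projection method named in the abstract can be illustrated with a minimal Python sketch. It assumes a binary image stored as a list of rows, with 1 for ink and 0 for background, and it splits characters wherever a column contains no ink. The function names and the `min_pixels` threshold are illustrative assumptions and are not taken from the thesis.

```python
def vertical_projection(image):
    """Count foreground pixels (value 1) in each column of a binary image."""
    if not image:
        return []
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def segment_columns(image, min_pixels=1):
    """Return (start, end) column ranges, end exclusive, one per run of
    consecutive columns whose projection is at least min_pixels."""
    profile = vertical_projection(image)
    segments = []
    start = None
    for x, count in enumerate(profile):
        if count >= min_pixels and start is None:
            start = x                      # a character begins here
        elif count < min_pixels and start is not None:
            segments.append((start, x))    # a blank column ends the character
            start = None
    if start is not None:                  # character touches the right edge
        segments.append((start, len(profile)))
    return segments


def crop(image, start, end):
    """Cut the columns [start, end) out of every row."""
    return [row[start:end] for row in image]


# Hypothetical 3x7 cell image holding two separate strokes.
image = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 1],
]
segments = segment_columns(image)
characters = [crop(image, s, e) for s, e in segments]
```

Touching or overlapping characters have no blank column between them, so a plain projection split like this one cannot separate them; raising `min_pixels` is one simple heuristic for thin connecting strokes.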
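Contents
Abstract (Chinese)
Keywords (Chinese)
Abstract (English)
Keywords (English)
1 Background
1.1 Research significance
1.2 Research status in China and abroad
1.2.1 Research status abroad
1.2.2 Research status in China
1.3 Research objectives and content
1.3.1 Research objectives
1.3.2 Research content
2 System design and implementation
2.1 System overview
3 Image preprocessing
3.1 Image binarization
3.2 Image skew correction
4 Table character positioning and extraction
4.1 Overview
4.2 Table line extraction methods
4.2.1 Line detection with the Hough transform
4.2.2 Table line extraction based on mathematical morphology
4.3 Table character positioning and extraction
4.3.1 Locating cells by the coordinate method
4.3.2 Locating cells containing characters that cross cell borders
4.3.3 Broken-stroke repair
4.4 Character image preprocessing
4.4.1 Digit segmentation
4.4.2 Digit normalization
4.4.3 Digit thinning
5 Handwritten digit recognition based on a BP neural network
5.1 Feature extraction for handwritten digits
5.2 Handwritten digit recognition
5.2.1 BP neural network design
5.2.2 Analysis of BP neural network recognition results
6 System implementation, results, and analysis
6.1 System implementation
6.2 Analysis of results
Acknowledgments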