Abstract
Every day, postal sorting systems distribute several tons of mail. It has been observed that the main cause of mail rejection is the failure of the address-block localization task, and in particular of its physical layout segmentation stage. Bottom-up and top-down segmentation methods contribute different kinds of knowledge, and neither should be ignored when robustness must be increased. Hybrid methods combine the two strategies so that the advantages of one compensate for the disadvantages of the other. Starting from these observations, we propose a hybrid segmentation strategy better adapted to postal mail. Its high-level stages are based on hierarchical graph coloring, which, through a pyramidal data organization, manages the complex rules that govern the interpretation of the decomposition of regions of interest into connected components.
To date, no other work in this context has exploited the power of this tool. Our approach was evaluated on a corpus of 10,000 envelope images, and both the processing time and the rejection rate were considerably reduced.
1. Introduction
The most recent automatic mail sorting machines process about 17 mail pieces per second. This requires fast and accurate OCR-based recognition of the address block, which depends mainly on the correct organization of the address lines [1][2][3]. Once the envelope image has been acquired by a linear CCD camera, three main modules contribute to address-block localization: physical layout segmentation of the envelope image, feature extraction, and address-block identification (see figure 1).
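In a bottom-up setting, the first of these modules, physical layout segmentation, typically begins by decomposing the binarized envelope image into connected components, the units whose interpretation the higher-level stages manage. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the image is already binarized into a list of rows (1 for ink, 0 for background), uses 8-connectivity, and returns one inclusive bounding box `(top, left, bottom, right)` per component. The function name `connected_components` is hypothetical.

```python
from collections import deque
from typing import List, Tuple

# Inclusive bounding box: (top, left, bottom, right) in pixel coordinates.
Box = Tuple[int, int, int, int]


def connected_components(image: List[List[int]]) -> List[Box]:
    """Label 8-connected ink pixels (value 1) of a binarized image and
    return one bounding box per connected component, in scan order."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel,
            # growing the bounding box as pixels are dequeued.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


envelope = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1],
]
print(connected_components(envelope))
# [(0, 0, 1, 1), (0, 4, 1, 4), (3, 0, 3, 0), (3, 3, 3, 4)]
```

An iterative breadth-first fill is used instead of recursion so that large components, such as a stamp scanned at 300 dpi, do not hit Python's recursion limit.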
The physical layout segmentation phase has a great impact on the overall performance of the sorting system. Generally, this segmentation refers to the decomposition of the envelope image into disjoint constituent elements containing homogeneous components, so that each element can be identified separately. These elements are usually separated by white space and form elementary geometric blocks, which in the vast majority of cases are rectangular. In its literal sense, segmentation is very close to "analysis". Over-segmentation occurs when a constituent component is fragmented into several pieces, and under-segmentation when several constituent components cannot be isolated from one another. Regarding effectiveness, we observed that traditional segmentation techniques face several constraints (figure 2):
- degraded images (folded envelopes);
- a very large variety of mail (quality, color and paper texture);
- real-time constraints (limited processing time);
- skewed text lines on the envelopes;
- non-uniform spacing between characters, lines and blocks of text;
- the obligation to always produce a result;
- high spatial resolution of the images (300 dpi);
- parasitic elements near the address block (stamps, postmarks, printed logos…);
- superimposed information layers (stamps, handwritten notes…).
Figure 2. Very large mail variety
Taking these limits into account, we propose in this paper an original method for the physical layout segmentation of postal mail images. Built on graph theory, our technique relies on a pyramidal representation of the data. The fundamental objective is to improve the performance of each segmentation stage and its coherence with the other stages, in order to reduce mail rejection and processing time as much as possible. The developed method will be integrated into an automatic mail sorting system. The remainder of this paper is organized as follows. Section 2 reviews the various segmentation methods, together with previous work and its limits. Section 3 details the formal aspects of graph coloring. Section 4 describes the application of coloring to the segmentation problem. Finally, the results obtained are discussed.
2. Various segmentation strategies
Segmentation methods analyze the envelope image in order to extract the text block, divided into lines and characters. They rely mainly on hierarchically revealing the linear structure of the physical components. Text regions are one of the main sources of information needed for the automatic sorting of mail items, and the many constraints listed above make the segmentation of these vital regions very difficult. The literature generally distinguishes three segmentation strategies: bottom-up, top-down and hybrid.
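The hierarchical revelation of structure described above, combined with a pyramidal data representation, can be illustrated with a simple bottom-up pyramid: level 0 holds connected-component bounding boxes, level 1 groups them into text lines, and level 2 groups lines into blocks. At each level a proximity graph is built, and each of its connected subgraphs (found with union-find) is merged into one node of the next level. This is a hedged sketch, not the authors' hierarchical graph-coloring method, which the paper formalizes in its third section; the linking rules `same_line` and `same_block`, their pixel thresholds, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def _find(parent: List[int], i: int) -> int:
    # Union-find lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def group(boxes: List[Box], linked: Callable[[Box, Box], bool]) -> List[Box]:
    """One pyramid level: join boxes for which `linked` holds, then merge
    every connected subgraph into its enclosing bounding box."""
    parent = list(range(len(boxes)))
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if linked(boxes[i], boxes[j]):
                parent[_find(parent, i)] = _find(parent, j)
    merged: Dict[int, Box] = {}
    for i, (t, l, b, r) in enumerate(boxes):
        root = _find(parent, i)
        if root in merged:
            mt, ml, mb, mr = merged[root]
            merged[root] = (min(mt, t), min(ml, l), max(mb, b), max(mr, r))
        else:
            merged[root] = (t, l, b, r)
    return sorted(merged.values())


def same_line(a: Box, b: Box, max_gap: int = 10) -> bool:
    # Hypothetical rule: rows overlap and the horizontal gap is small.
    rows_overlap = min(a[2], b[2]) >= max(a[0], b[0])
    h_gap = max(a[1], b[1]) - min(a[3], b[3])
    return rows_overlap and h_gap <= max_gap


def same_block(a: Box, b: Box, max_gap: int = 8) -> bool:
    # Hypothetical rule: columns overlap and the vertical gap is small.
    cols_overlap = min(a[3], b[3]) >= max(a[1], b[1])
    v_gap = max(a[0], b[0]) - min(a[2], b[2])
    return cols_overlap and v_gap <= max_gap


def build_pyramid(components: List[Box]) -> List[List[Box]]:
    """Level 0: components; level 1: text lines; level 2: text blocks."""
    lines = group(components, same_line)
    blocks = group(lines, same_block)
    return [components, lines, blocks]


components = [
    (100, 200, 110, 206), (100, 209, 110, 215), (100, 218, 110, 224),  # line 1
    (115, 200, 125, 206), (115, 210, 125, 216),                        # line 2
    (10, 400, 60, 450),                                                # stamp
]
for level, boxes in enumerate(build_pyramid(components)):
    print(level, boxes)  # 6, 3 and 2 boxes for levels 0, 1 and 2
```

In this example the characters of each line merge into a line box, the two lines then merge into one block, and the stamp, which shares neither rows nor columns with the text, stays isolated at every level. Fixed thresholds of this kind are sensitive to the skew and non-uniform spacing listed among the constraints above.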
Bottom-up and top-down segmentation methods contribute different kinds of knowledge, and neither should be ignored when robustness must be increased. Hybrid methods (or mixed approaches) combine the two strategies (bottom-up and top-down) so that the advantages of one compensate for the disadvantages of the other. This combination can reduce several errors caused by traditional segmentation methods [4].