IMPROVED PARCEL SORTING BY COMBINING AUTOMATIC SPEECH AND CHARACTER RECOGNITION
Automatic postal sorting systems have traditionally relied on optical character recognition (OCR) technology. While OCR systems perform well for flat mail items such as envelopes, their performance deteriorates for parcels. In this study, we propose a new multimodal solution for parcel sorting which combines automatic speech recognition (ASR) technology with OCR in order to deliver better performance. Our multimodal approach is based on estimating OCR output confidence, and then optionally using the ASR system output when the OCR results show low confidence. In particular, we propose a Levenshtein edit distance (LED) based measure to compute OCR confidence. Based on this OCR confidence measure, a dynamic fusion strategy is developed that forms its final decision from (i) the OCR output alone, (ii) the ASR output alone, or (iii) a combination of the ASR and OCR outputs. The proposed system is evaluated on speech and image data collected in real-world conditions. Our experiments show that the proposed multimodal solution achieves an overall zip code recognition rate of 90.2%, which is a substantial improvement over the ASR-only (81%) and OCR-only (80.6%) systems. This advancement represents an important contribution that leverages OCR and ASR technologies to improve address recognition in parcels.
Index Terms: Automatic Speech Recognition, Optical Character Recognition, Parcel Sorting, Address Recognition
1. INTRODUCTION
Parcel sorters are large machines that automatically sort and direct mail towards its destination. Parcel sorting technology provides greater speed, efficiency, and reliability to mail delivery while cutting operational cost. The key step in automatic parcel sorting is reliable automatic identification of the address information on individual mail items. Traditionally, optical character recognition (OCR) has been used widely for reading addresses on mail items. While OCR performs extremely well on flat mail items (also known as flats) such as envelopes, its performance is lower on parcels. Address labels on flats are generally more consistently oriented and provide a strong black-font on white-background contrast, resulting in better OCR performance. However, address labels on parcels are inconsistently placed and often covered with plastic, and the parcel itself can be irregularly shaped, resulting in poorer OCR performance. Consequently, there is a need to improve automatic address recognition accuracy for parcels.
In this study, we explore the possibility of using automatic speech recognition (ASR) together with OCR to deliver improved address recognition accuracy for parcels. While multimodal approaches that combine OCR and ASR systems have been explored in other domains [1, 2], the application domain of postal automation is relatively underexplored. Here, it is important to note that parcel sorters are generally operated by a single person, whose primary job is to pick up the parcel and place it on the sorting machine conveyor belt. In the proposed multimodal solution, the operator is assigned a secondary job where he/she reads the address label aloud while placing the parcel. In this manner, the necessary speech input for the ASR system can be obtained. It is also important to note that the operation is minimally affected, as the operator is generally only required to speak the zip code (or the first three digits of the zip code).
In the proposed multimodal approach, the ASR and OCR systems work independently to decode their best results, which are later combined to deliver superior performance. In particular, the combination logic examines the OCR output and generates a confidence score. If the OCR output is generated with high confidence, then the ASR output is not used for final processing of the result. Conversely, if the OCR output is generated with low confidence, then the OCR output is not trusted and the ASR output alone is used for processing the final output. Finally, if the OCR output is generated with medium confidence, then a combination of the OCR and ASR outputs is used to process the final result. In order to place the OCR output in the high, medium, and low confidence categories, a confidence measure based on Levenshtein edit distance (LED) is proposed. The LED based measure is very efficient and effective at utilizing the ASR output when the OCR output is unlikely to be correct.
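The three-way decision described above can be illustrated with a minimal Python sketch. It is not the system evaluated in this study. The text does not specify how the LED based confidence is computed, so the sketch assumes it is one minus the normalized edit distance between the OCR string and the closest entry in a list of valid zip codes. The thresholds `high` and `low`, the function names, the `valid_zips` list, and the medium-confidence combination rule (pick the valid zip code closest to both outputs) are likewise hypothetical choices made only for illustration.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Levenshtein edit distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def ocr_confidence(ocr_zip: str, valid_zips: Sequence[str]) -> float:
    """Assumed measure: 1 - normalized LED to the closest valid zip code."""
    best = min(levenshtein(ocr_zip, z) / max(len(ocr_zip), len(z), 1)
               for z in valid_zips)
    return 1.0 - best


def fuse(ocr_zip: str, asr_zip: str, valid_zips: Sequence[str],
         high: float = 0.9, low: float = 0.5) -> str:
    """Dynamic fusion; `high` and `low` are hypothetical thresholds."""
    conf = ocr_confidence(ocr_zip, valid_zips)
    if conf >= high:   # high confidence: OCR output alone
        return ocr_zip
    if conf < low:     # low confidence: ASR output alone
        return asr_zip
    # Medium confidence: assumed combination rule, the valid zip code
    # with the smallest summed edit distance to both hypotheses.
    return min(valid_zips,
               key=lambda z: levenshtein(z, ocr_zip) + levenshtein(z, asr_zip))


valid = ["75080", "75081", "10001"]
high_case = fuse("75080", "10001", valid)    # OCR matches a valid code
low_case = fuse("7", "10001", valid)         # OCR output badly truncated
medium_case = fuse("75O8", "75081", valid)   # OCR partly garbled
```

In this sketch the ASR hypothesis is consulted only when the OCR string is far from every valid code, which reflects the idea of using speech input when OCR is unlikely to be correct. A deployed system would need thresholds tuned on held-out data.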