Unified detection and recognition for reading text in scene images

2008

Abstract (summary)

Although an automated reader for the blind first appeared nearly two hundred years ago, computers can currently "read" document text about as well as a seven-year-old. Scene text recognition brings many new challenges. A central limitation of current approaches is a feed-forward, bottom-up, pipelined architecture that isolates the many tasks and sources of information involved in reading. The result is a system that commits errors from which it cannot recover and whose components lack access to relevant information.

We propose a system for scene text reading that is more integrated in its design, training, and operation. First, we present a simple contextual model for text detection that is ignorant of any recognition. Through the use of special features and data context, this model performs well on the detection task, but limitations remain due to the lack of interpretation. We then introduce a recognition model that integrates several information sources, including font consistency and a lexicon, and compare it to approaches using pipelined architectures with similar information. Next, we examine a more unified detection and recognition framework where features are selected based on the joint task of detection and recognition, rather than on each task individually. This approach yields better results with fewer features. Finally, we demonstrate a model that incorporates segmentation and recognition at both the character and word levels. Text with difficult layouts and low resolution is more accurately recognized by this integrated approach. By more tightly coupling several aspects of detection and recognition, we hope to establish a new unified way of approaching the problem that will lead to improved performance. We would like computers to become accomplished grammar-school-level readers.

Indexing (details)

Subject
Artificial intelligence; Computer science
Classification
0800: Artificial intelligence; 0984: Computer science
Identifier / keyword
Applied sciences; Automated readers; Scene text recognition; Text reading
Title
Unified detection and recognition for reading text in scene images
Author
Weinman, Jerod J.
Source
DAI-B 69/09, Dissertation Abstracts International
Place of publication
Ann Arbor
Country of publication
United States
Advisor
Hanson, Allen R.; Learned-Miller, Erik G.
Committee member
McCallum, Andrew; Rayner, Keith
University/institution
University of Massachusetts Amherst
Department
Computer Science
University location
United States -- Massachusetts
Source type
Dissertations & Theses
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.