
Natural Scenes Text Reader





We aim to develop a system that allows blind or visually impaired people to read text present in their environment. Text is detected and recognized using computer vision, and the results are communicated to the user through a text-to-speech synthesizer. The system focuses on scene text (billboards, street names, shop names, etc.) in uncontrolled outdoor environments.


This research is being carried out by Carlos Merino Gracia and Prof. Majid Mirmehdi. It is a collaboration between the Virtual Acoustic Space group and the Visual Information Laboratory at the University of Bristol.

As a result of this line of research, state-of-the-art techniques have been developed for text detection, text perspective recovery and text tracking. A text reading prototype has also been built to demonstrate and test the developed system.




The main constraint on a text reading system is real-time operation: it must process images from a video camera at the rate they are produced and respond to the user within a reasonable interval. Our text detection and perspective recovery techniques are designed to be fast and efficient without sacrificing accuracy, and in this respect they compare favorably with other state-of-the-art scene text detection techniques.

We also try to exploit the advantages we have over traditional flatbed scanning systems: although our images have a lower resolution, we receive a continuous stream of them. We trade spatial resolution for temporal redundancy, which is why text tracking forms the basis of the system's context awareness.

Text detection

[Figure: original image; MSER regions; filtered MSER regions; detected text regions]

Several stages of our Hierarchical MSER-based text detection algorithm (Merino-Gracia et al., 2011).
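The filtering stage in the figure can be illustrated with a minimal sketch that discards candidate regions whose geometry is implausible for a single character. This is not the algorithm of Merino-Gracia et al. (2011): it assumes candidates (for instance, MSERs) have already been extracted and are described only by a bounding box and a pixel count, and the `Region` type, the `is_character_like` and `filter_regions` helpers and every threshold are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """Hypothetical candidate region: bounding box plus the number of pixels it contains."""
    x: int
    y: int
    w: int
    h: int
    area: int


def is_character_like(r: Region,
                      min_height: int = 8,
                      max_height: int = 300,
                      max_aspect: float = 3.0,
                      min_fill: float = 0.15,
                      max_fill: float = 0.95) -> bool:
    """Geometric plausibility test; all thresholds are illustrative guesses."""
    if r.w <= 0:
        return False  # degenerate box
    if not (min_height <= r.h <= max_height):
        return False  # too small to read, or too large to be a single character
    if r.w / r.h > max_aspect:
        return False  # much wider than tall: likely a bar or border, not a letter
    fill = r.area / (r.w * r.h)
    # Characters cover part, but not all, of their bounding box.
    return min_fill <= fill <= max_fill


def filter_regions(regions: List[Region]) -> List[Region]:
    """Keep only the candidates that pass the geometric test, preserving order."""
    return [r for r in regions if is_character_like(r)]


if __name__ == "__main__":
    candidates = [
        Region(10, 10, 12, 20, 120),  # letter-sized, half filled: kept
        Region(0, 0, 200, 10, 1500),  # long horizontal bar: rejected
        Region(50, 50, 3, 3, 9),      # tiny speck: rejected
    ]
    print(filter_regions(candidates))
```

Thin letters such as "l" are deliberately not rejected for being narrow; only very wide regions, tiny specks and completely filled blocks are. A real detector would tune such limits on data and would typically also group the surviving regions into text lines.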

Perspective recovery

Scene text is often encountered in arbitrary 3D orientations, which has long been a limitation of scene text readers. Off-the-shelf OCR engines are sensitive to perspective-distorted text, rapidly losing accuracy as deformations are introduced. Our scene text perspective recovery technique (Merino-Gracia et al., 2013) uses a geometrical approach to estimate the orientation of text lines based only on the characters themselves. It is fast and efficient, and it handles a wider range of angles than previous state-of-the-art methods.

[Figure panels: "Pitcher & Piano"; "The India Shop"; "St. Michael"; "@ Bristol"; "Please do not climb"; "Myddelton & Major"]

Results of our perspective detection technique (Merino-Gracia et al., 2013).
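As a much simpler illustration of estimating a text line's orientation from the characters alone, the sketch below fits the principal axis through the character centroids and returns its angle. It is not the technique of Merino-Gracia et al. (2013): it recovers only the in-plane rotation of a line and does not model perspective distortion, and both the `text_line_angle` function and the assumption that character centroids are already available are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y pointing down


def text_line_angle(centroids: Sequence[Point]) -> float:
    """In-plane angle, in degrees, of the principal axis through character centroids.

    Uses the orientation of the 2x2 scatter matrix (a total least-squares line fit),
    so near-vertical lines are handled without dividing by zero. Because image y
    grows downward, a positive angle means the line descends from left to right.
    """
    n = len(centroids)
    if n < 2:
        raise ValueError("need at least two character centroids")
    mx = sum(x for x, _ in centroids) / n
    my = sum(y for _, y in centroids) / n
    sxx = sum((x - mx) ** 2 for x, _ in centroids)
    syy = sum((y - my) ** 2 for _, y in centroids)
    sxy = sum((x - mx) * (y - my) for x, y in centroids)
    # Orientation of the major axis: tan(2 * theta) = 2 * sxy / (sxx - syy).
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))


if __name__ == "__main__":
    # Centroids along a line that drops 1 pixel (downward in the image)
    # for every 2 pixels to the right.
    pts = [(0, 10), (10, 15), (20, 20), (30, 25)]
    print(round(text_line_angle(pts), 2))  # prints 26.57
```

A total least-squares fit is used rather than regressing y on x because it treats both axes symmetrically, which matters for steeply rotated text.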

Text tracking


Images of text in natural scenes suffer from several problems that are absent from scanned documents: blur, low resolution, uneven lighting, etc. However, when a video sequence is considered, temporal redundancy can be exploited to compensate for some of these drawbacks: blurred frames can be skipped, or frames can be stacked together to obtain higher-resolution images. Our scene text tracking framework is designed to exploit these opportunities. Our latest results on text tracking, as well as the texttrack dataset of annotated scene text sequences, are available on the text tracking page. Additionally, videos of some early results are available on the page for our CBDAR 2007 article (Merino and Mirmehdi, 2007).
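The two uses of temporal redundancy mentioned above, skipping blurred frames and combining several frames, can be sketched as follows. It is not the tracking framework described on this page: the sharpness score (variance of a 4-neighbour Laplacian), the threshold and the helper names are illustrative assumptions, and frames are assumed to be grayscale and already aligned by a tracker. Plain averaging of aligned frames reduces noise but does not by itself increase resolution; that would require a super-resolution method.

```python
from typing import List, Sequence

Frame = List[List[float]]  # grayscale image as rows of pixel intensities


def laplacian_variance(frame: Frame) -> float:
    """Sharpness score: variance of the 4-neighbour Laplacian over interior pixels.

    Blurred frames have weak edges and therefore a low score.
    """
    h, w = len(frame), len(frame[0])
    values = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (frame[y - 1][x] + frame[y + 1][x]
                   + frame[y][x - 1] + frame[y][x + 1] - 4 * frame[y][x])
            values.append(lap)
    if not values:
        return 0.0  # frame too small to have interior pixels
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def sharp_frames(frames: Sequence[Frame], threshold: float) -> List[Frame]:
    """Keep only frames whose sharpness exceeds a (hypothetical) threshold."""
    return [f for f in frames if laplacian_variance(f) > threshold]


def average_frames(frames: Sequence[Frame]) -> Frame:
    """Pixel-wise mean of frames that are assumed to be already aligned."""
    if not frames:
        raise ValueError("no frames to average")
    h, w, n = len(frames[0]), len(frames[0][0]), len(frames)
    return [[sum(f[y][x] for f in frames) / n for x in range(w)] for y in range(h)]


if __name__ == "__main__":
    checker = [[255.0 if (x + y) % 2 else 0.0 for x in range(4)] for y in range(4)]
    flat = [[128.0] * 4 for _ in range(4)]  # stands in for a fully blurred frame
    kept = sharp_frames([flat, checker], threshold=100.0)
    print(len(kept), laplacian_variance(checker))
```

In a tracking setting, the frames passed to these helpers would be crops of the same tracked text region, so that the pixel-wise comparison in `average_frames` is meaningful.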


Prototype

[Figure: our prototype]

People participating in this line of research