
Reader for text in natural scenes


Our goal is to develop a system that lets blind or visually impaired users read text in their surroundings. Text detection and recognition are performed using computer vision, and the results are conveyed to the user through a text-to-speech synthesizer. The system focuses on scene text (signs, street names, shop names, etc.) in uncontrolled outdoor environments.

 

This research is being carried out by Carlos Merino Gracia and Prof. Majid Mirmehdi. The work is a collaboration between the Espacio Acústico Virtual group and the Visual Information Laboratory at the University of Bristol.

As a result of this line of research, cutting-edge techniques for text detection, perspective correction, and text tracking have been developed. A portable prototype has also been built, which allows the complete system to be demonstrated and tested.

 

 


The main constraint on a text reading system is real-time operation. We need to process images from a video camera at the rate they are produced and respond to the user within a reasonable interval. Our text detection and perspective recovery techniques are designed to be efficient and fast without losing accuracy, and in this respect they compare favorably with other state-of-the-art scene text detection techniques.

We also try to exploit the advantage we have over traditional flat-bed scanning systems: our images have lower resolution, but they arrive as a continuous stream. We trade spatial resolution for temporal redundancy, hence the focus on text tracking as the basis of the system's context awareness.
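To make the real-time constraint concrete, the following minimal sketch computes the per-frame time budget for a given camera frame rate, and how many incoming frames would go unprocessed when the pipeline is slower than that budget. It assumes the stages run one after another on a single thread. The function names are hypothetical, and the timings in the usage note are illustrative values, not measurements from the system described here.

```python
import math
from typing import Sequence


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, if no frame is dropped."""
    return 1000.0 / fps


def keeps_up(stage_times_ms: Sequence[float], fps: float) -> bool:
    """True if the summed stage times fit within one frame interval."""
    return sum(stage_times_ms) <= frame_budget_ms(fps)


def frames_dropped_per_processed(stage_times_ms: Sequence[float], fps: float) -> int:
    """Incoming frames that go unprocessed while one frame is being handled."""
    # Number of frame intervals spanned by processing a single frame.
    intervals = math.ceil(sum(stage_times_ms) / frame_budget_ms(fps))
    return max(0, intervals - 1)
```

For example, at 25 frames per second the budget is 40 ms per frame. Two hypothetical stages taking 30 ms each would exceed it, so one incoming frame would be dropped for every frame processed.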

Text detection

Figure panels: original image, MSER regions, filtered MSER regions, detected text regions.

Several stages of our hierarchical MSER-based text detection algorithm (Merino-Gracia et al., 2011).

Perspective recovery

Scene text is often encountered in arbitrary 3D orientations, which has always been a limitation of scene text readers. Off-the-shelf OCR engines are sensitive to perspective-distorted text and rapidly lose accuracy as deformations are introduced. Our scene text perspective recovery technique (Merino-Gracia et al., 2013) uses a geometrical approach to estimate the orientation of text lines based only on the characters themselves. It is efficient and fast, and it handles a wider range of angles than previous state-of-the-art methods.

Figure panels: Pitcher & Piano, The India Shop, St. Michael, @ Bristol, Please do not climb, Myddelton & Major.

Results of our perspective detection technique (Merino-Gracia et al., 2013).
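
As a generic illustration of estimating text-line orientation from the characters alone, the sketch below fits one straight line through the tops of the character bounding boxes and another through their bottoms, then intersects the two. Under perspective the lines converge towards a vanishing point; if they are parallel, the text is treated as fronto-parallel. This is a simplified, assumed approach and not the method of Merino-Gracia et al. (2013). The function names and the point-list input format are hypothetical, and the fit assumes the text line is not vertical in the image.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Line = Tuple[float, float]  # (slope m, intercept c) of y = m*x + c


def fit_line(points: List[Point]) -> Line:
    """Least-squares fit of y = m*x + c through the given points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("points share one x coordinate; line is vertical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def vanishing_point(top: Line, bottom: Line, eps: float = 1e-9) -> Optional[Point]:
    """Intersection of the two lines, or None if they are (nearly) parallel."""
    m1, c1 = top
    m2, c2 = bottom
    if abs(m1 - m2) < eps:
        return None  # parallel lines: no horizontal perspective foreshortening
    x = (c2 - c1) / (m1 - m2)
    return x, m1 * x + c1


# Hypothetical character-box corners (x, y) along one text line.
top_line = fit_line([(0, 0), (10, 1), (20, 2)])
bottom_line = fit_line([(0, 10), (10, 9), (20, 8)])
vp = vanishing_point(top_line, bottom_line)
```

With a vanishing point in hand, the quadrilateral bounded by the two fitted lines could be mapped to a rectangle before the text is passed to an OCR engine; that warping step is omitted here.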

Text tracking


Images of text in natural scenes suffer from several problems that are not present in scanned documents: blur, low resolution, uneven lighting, etc. However, when a video sequence is considered, temporal redundancy can be exploited to compensate for some of these drawbacks: blurred frames can be skipped, or frames can be stacked together to obtain higher-resolution images. Our scene text tracking framework is aimed at exploring these opportunities. Our latest results on text tracking, as well as the texttrack dataset of annotated scene text sequences, are available on the text tracking page. Additionally, some videos of early results are available on our CBDAR 2007 article page (Merino and Mirmehdi, 2007).
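
The two ideas mentioned above, skipping blurred frames and stacking frames, can be sketched as follows. The sharpness score used here is the variance of a 4-neighbour Laplacian response, a common generic focus measure that is assumed for illustration and is not taken from our tracking framework. Frames are plain lists of grayscale rows, and all function names and the threshold are hypothetical. The stacking step only averages frames that are already aligned, which reduces noise; obtaining a genuinely higher-resolution image would additionally require sub-pixel registration, which is omitted.

```python
from typing import List

Frame = List[List[float]]  # grayscale image as rows of pixel values


def sharpness(frame: Frame) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels."""
    h, w = len(frame), len(frame[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (frame[y - 1][x] + frame[y + 1][x]
                   + frame[y][x - 1] + frame[y][x + 1]
                   - 4 * frame[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def select_sharp(frames: List[Frame], threshold: float) -> List[Frame]:
    """Keep only frames whose sharpness score reaches the threshold."""
    return [f for f in frames if sharpness(f) >= threshold]


def stack(frames: List[Frame]) -> Frame:
    """Pixel-wise mean of frames assumed to be aligned and equally sized."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(w)]
            for y in range(h)]


# Tiny synthetic frames: a flat (featureless) one and a checkerboard.
flat = [[128] * 4 for _ in range(4)]
checker = [[255 if (x + y) % 2 == 0 else 0 for x in range(4)] for y in range(4)]
kept = select_sharp([flat, checker], threshold=1.0)
```

In this toy example only the checkerboard frame is kept, because the flat frame has no edges and scores zero.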

Prototype

Our prototype

Researchers involved in this line of research