
Carlos Merino and Majid Mirmehdi (2007)

A Framework Towards Realtime Detection and Tracking of Text

In: CBDAR 2007, pp. 10–17.


We present a near-realtime text tracking system capable of detecting and tracking text on outdoor shop signs or indoor notices at rates of up to 15 frames per second (on full 640 × 480 frames), depending on scene complexity. The method extracts text regions using a novel tree-based connected component filtering approach, combined with the Eigen-Transform texture descriptor, and efficiently handles both dark text on light backgrounds and light text on dark backgrounds. Particle filter tracking is then used to follow the text, with SIFT matching to maintain region identity in the presence of multiple regions of interest, fast displacements, and erratic motion.
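The tracking stage described above follows each text region with a particle filter. As an illustration only (this is a generic bootstrap particle filter sketch, not the paper's implementation; the function name, parameters, and Gaussian motion/measurement models are assumptions), tracking a region's centre from noisy per-frame detections might look like:

```python
import numpy as np

def particle_filter_track(measurements, n_particles=500,
                          motion_std=5.0, meas_std=3.0, seed=0):
    """Track a 2-D region centre from noisy measurements with a
    bootstrap particle filter (illustrative sketch, not the paper's code)."""
    rng = np.random.default_rng(seed)
    # Initialise particles around the first measurement.
    particles = measurements[0] + rng.normal(0, meas_std, size=(n_particles, 2))
    estimates = []
    for z in measurements:
        # Predict: diffuse particles under a random-walk motion model.
        particles += rng.normal(0, motion_std, size=particles.shape)
        # Update: weight each particle by the Gaussian likelihood of z.
        d2 = np.sum((particles - z) ** 2, axis=1)
        weights = np.exp(-0.5 * d2 / meas_std ** 2)
        weights /= weights.sum()
        # Estimate: weighted mean of the particle cloud.
        estimates.append(weights @ particles)
        # Resample (systematic) to concentrate particles on likely states.
        positions = (rng.random() + np.arange(n_particles)) / n_particles
        idx = np.searchsorted(np.cumsum(weights), positions)
        idx = np.minimum(idx, n_particles - 1)  # guard against float round-off
        particles = particles[idx]
    return np.array(estimates)
```

In the paper's setting, the per-frame "measurement" would come from the text detection stage, and SIFT matching would disambiguate which detection belongs to which tracked region; here a single noisy 2-D observation per frame stands in for that.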


This work is part of our project to develop a text reading system for blind people.

Sample scenes and videos

These are the full video sequences of the samples in the paper and the CBDAR 2007 presentation.

Sample 1 BORDERS (Figure 7)


Sample 2 UOB (Figure 8)


Sample 3 ST. MICHAEL'S HOSPITAL (Figure 9)


Sample 4 LORRY (Figure 10)


Sample 5 LA PAZ

Sample 6 DESKTOP

Other sample scenes:

These additional scenes were obtained from the web page accompanying the work of Myers et al., and processed with our algorithm.

Sample 7

Sample 8
