Literature Review of Scene Text Detection Methods

Shaohui Ruan et al. (2018) address one of the most challenging problems in scene text detection: arbitrarily oriented text. The paper focuses on predicting word-level bounding boxes with a fully convolutional network. The proposed method extracts features from the input image with a residual network and applies multi-level fusion over the extracted features. The pipeline consists of a fully convolutional network followed by standard non-maximum suppression (NMS) as post-processing. The method achieves F-measures of 83.46% on the ICDAR 2015 Incidental Scene Text benchmark and 56.39% on the COCO-Text dataset, outperforming previous methods by a large margin. It also runs at over 11 FPS on 704×1280 images, much faster than earlier approaches.
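
To make the post-processing step concrete, the following is a minimal sketch of standard non-maximum suppression over axis-aligned boxes in NumPy; the (x1, y1, x2, y2) box format and the 0.5 IoU threshold are assumptions for illustration, not details taken from the paper.

    import numpy as np

    def nms(boxes, scores, iou_threshold=0.5):
        """Standard non-maximum suppression over axis-aligned boxes.

        boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences.
        Returns the indices of the boxes that survive suppression.
        """
        x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
        areas = (x2 - x1) * (y2 - y1)
        order = scores.argsort()[::-1]          # highest score first
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            # Intersection of the top box with all remaining boxes.
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])
            inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
            iou = inter / (areas[i] + areas[order[1:]] - inter)
            # Drop remaining boxes that overlap the kept box too strongly.
            order = order[1:][iou <= iou_threshold]
        return keep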

Wafa Khlif et al. (2018) address the reading of text embedded in natural scenes, a capability essential for many applications. They propose a method for detecting text in scene images based on multi-level connected component (CC) analysis and on learning text component features with convolutional neural networks (CNNs), followed by graph-based grouping of overlapping text boxes. The system is evaluated on the standard public dataset of the ICDAR 2013 Robust Reading Competition Challenge 2: Focused Scene Text, where it achieved better detection results than state-of-the-art methods. In addition to its efficacy, the method can easily be adapted to detect multi-oriented or multi-lingual text, since it operates on low-level initial components and does not require those components to be characters. Reported failure cases include strong highlights and transparent or very small text.
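
As a loose illustration of the connected-component stage only (not the authors' exact pipeline), the sketch below binarizes a grayscale image at several levels and collects candidate components with OpenCV; the threshold levels and the minimum-area filter are assumed values.

    import cv2

    def candidate_components(gray, levels=(96, 128, 160), min_area=30):
        """Extract candidate text components at multiple binarization levels.

        gray: single-channel uint8 image. Returns a list of bounding boxes
        (x, y, w, h) for components large enough to be worth classifying.
        """
        candidates = []
        for t in levels:
            # Threshold both polarities, since text may be darker or
            # lighter than its background.
            for mode in (cv2.THRESH_BINARY, cv2.THRESH_BINARY_INV):
                _, binary = cv2.threshold(gray, t, 255, mode)
                n, _, stats, _ = cv2.connectedComponentsWithStats(binary)
                for i in range(1, n):  # label 0 is the background
                    x, y, w, h, area = stats[i]
                    if area >= min_area:
                        candidates.append((x, y, w, h))
        return candidates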

Zhen Zhu et al. (2018) identify the large variation in text size as a significant challenge in scene text detection. The paper presents an accurate oriented text detector based on Faster R-CNN. Feature fusion is applied in both the RPN and the Fast R-CNN stages to alleviate this problem and, furthermore, to enhance the model's ability to detect relatively small text. The detector achieves results comparable to state-of-the-art methods on ICDAR 2015 and MSRA-TD500, showing its advantage and applicability.
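
The following is a minimal PyTorch sketch of the general idea behind multi-scale feature fusion, upsampling a coarse feature map and concatenating it with a finer one; the channel counts and the upsample-then-concatenate scheme are assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureFusion(nn.Module):
        """Fuse a coarse, semantically strong map with a finer one."""

        def __init__(self, coarse_ch=512, fine_ch=256, out_ch=256):
            super().__init__()
            self.reduce = nn.Conv2d(coarse_ch + fine_ch, out_ch, kernel_size=1)

        def forward(self, fine, coarse):
            # Upsample the coarse map to the fine map's resolution, then
            # concatenate along channels and mix with a 1x1 convolution.
            coarse_up = F.interpolate(coarse, size=fine.shape[2:],
                                      mode="bilinear", align_corners=False)
            return self.reduce(torch.cat([fine, coarse_up], dim=1))

    fused = FeatureFusion()(torch.randn(1, 256, 64, 64),
                            torch.randn(1, 512, 32, 32))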

Jianqi Ma et al. (2018) introduce a novel rotation-based framework for arbitrary-oriented text detection in natural scene images. They present the Rotation Region Proposal Network (RRPN), designed to generate inclined proposals carrying text orientation angle information. A Rotation Region-of-Interest (RRoI) pooling layer is proposed to project arbitrarily oriented proposals onto a feature map for a text-region classifier. Experiments with the rotation-based framework on three real-world scene text detection datasets demonstrate its superiority in effectiveness and efficiency over previous approaches. The proposed model achieves results comparable to state-of-the-art methods on IC15, with an F-measure of 0.776.
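
As a rough sketch of how inclined proposals can be parameterized, the snippet below enumerates rotated anchors (cx, cy, w, h, angle) over a feature-map grid; the particular scales, aspect ratios, and angles are assumed values rather than the configuration used in the paper.

    import numpy as np

    def rotated_anchors(feat_h, feat_w, stride=16,
                        scales=(8, 16, 32), ratios=(0.2, 0.5, 1.0),
                        angles=(-60, -30, 0, 30, 60, 90)):
        """Enumerate rotated anchors (cx, cy, w, h, angle_deg) on a grid."""
        anchors = []
        for iy in range(feat_h):
            for ix in range(feat_w):
                cx, cy = (ix + 0.5) * stride, (iy + 0.5) * stride
                for s in scales:
                    for r in ratios:
                        # Keep the area roughly constant across ratios.
                        w = s * stride * np.sqrt(r)
                        h = s * stride / np.sqrt(r)
                        for a in angles:
                            anchors.append((cx, cy, w, h, a))
        return np.array(anchors)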

Kun Fan et al. (2018) consider the detection of text regions, defined as parts of text lines containing either a whole character or the transition between two adjacent characters. The paper presents simple features, consisting of the means and standard deviations of image gradients, used to train a random forest that detects text regions over multiple image scales and color channels. Although the method is trained only on English, the experiments demonstrate that it achieves high recall with a few thousand good-quality proposals on four standard benchmarks, including multi-language datasets. Under the one-to-one and many-to-one detection criteria, the method achieves 91.6%, 87.4%, 92.1%, and 97.9% recall on the ICDAR 2013 Robust Reading dataset, the Street View Text dataset, Pan's multilingual dataset, and the sampled KAIST scene text dataset, respectively, with an average of fewer than 1,250 proposals.
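
A minimal sketch of such features is given below, pairing per-patch gradient statistics with a scikit-learn random forest; the patch size, feature layout, and forest parameters are illustrative, and the training data here are random placeholders rather than real samples.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def gradient_features(patch):
        """Mean and standard deviation of vertical/horizontal gradients."""
        gy, gx = np.gradient(patch.astype(np.float32))
        return np.array([gx.mean(), gx.std(), gy.mean(), gy.std()])

    # Hypothetical training data: 'patches' holds grayscale patches and
    # 'labels' marks whether each patch covers a text region.
    patches = [np.random.rand(24, 24) for _ in range(200)]
    labels = np.random.randint(0, 2, size=200)

    X = np.stack([gradient_features(p) for p in patches])
    clf = RandomForestClassifier(n_estimators=100).fit(X, labels)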

Baoguang Shi et al. (2018) introduce Segment Linking (SegLink), an oriented text detection method. The main idea is to decompose text into two locally detectable elements, namely segments and links. Both elements are detected densely at multiple scales by an end-to-end trained, fully convolutional neural network, and final detections are produced by combining segments connected by links. SegLink achieves an F-measure of 75.0% on the standard ICDAR 2015 Incidental (Challenge 4) benchmark, outperforming the previous best by a large margin, and runs at over 20 FPS on 512×512 images. Moreover, without modification, SegLink can detect long lines of non-Latin text, such as Chinese.
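
The combining step can be pictured as a graph-grouping problem: segments are nodes, predicted links are edges, and each connected component becomes one word or line. The sketch below uses union-find for this grouping; it illustrates only the merging logic, not the detection network itself.

    def merge_linked_segments(num_segments, links):
        """Group segments connected by links using union-find.

        links: iterable of (i, j) pairs of segment indices predicted as
        linked. Returns a list of groups, each a list of segment indices
        forming one word or text line.
        """
        parent = list(range(num_segments))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        for i, j in links:
            parent[find(i)] = find(j)

        groups = {}
        for i in range(num_segments):
            groups.setdefault(find(i), []).append(i)
        return list(groups.values())

    # Example: segments 0-1-2 are linked into one line; 3 stands alone.
    print(merge_linked_segments(4, [(0, 1), (1, 2)]))  # [[0, 1, 2], [3]]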

Yuliang Liu et al. (2017) address incidental scene text detection, a challenging task because of multi-orientation, perspective distortion, and variation in text size, color, and scale. The paper observes that using a rectangular bounding box or horizontal sliding window to localize text can introduce redundant background noise, unnecessary overlap, or even information loss. The authors propose a new convolutional neural network (CNN) based method, named Deep Matching Prior Network (DMPNet), to detect text with tighter quadrangles. The system uses quadrilateral sliding windows, a shared Monte-Carlo method for computing polygonal areas, and a sequential protocol for ordering the quadrangle coordinates. The approach is evaluated on a public word-level, multi-oriented scene text database, ICDAR 2015 Robust Reading Competition Challenge 4 "Incidental Scene Text Localization", where it reaches an F-measure of 70.64%, outperforming the previous state-of-the-art method at 63.76%.
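
To make the Monte-Carlo step concrete, the following toy sketch estimates the overlap area of two quadrangles by uniform sampling; the sample count and the use of matplotlib's Path for point-in-polygon tests are assumptions for illustration, not the paper's shared implementation.

    import numpy as np
    from matplotlib.path import Path

    def mc_overlap_area(quad_a, quad_b, n_samples=10000, rng=np.random):
        """Monte-Carlo estimate of the overlap area of two quadrangles.

        quad_a, quad_b: (4, 2) arrays of vertices in order.
        """
        pts = np.vstack([quad_a, quad_b])
        lo, hi = pts.min(axis=0), pts.max(axis=0)
        # Uniform samples over the joint bounding box of both quadrangles.
        samples = lo + rng.random_sample((n_samples, 2)) * (hi - lo)
        inside = (Path(quad_a).contains_points(samples)
                  & Path(quad_b).contains_points(samples))
        return inside.mean() * np.prod(hi - lo)

    a = np.array([[0, 0], [4, 0], [4, 3], [0, 3]], dtype=float)
    b = np.array([[2, 1], [6, 1], [6, 4], [2, 4]], dtype=float)
    print(mc_overlap_area(a, b))  # true overlap area is 4.0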

Houssem Turki et al. (2017) address text detection in natural scenes, which holds great importance for research and still remains a challenge. The contribution of the proposed method is to filter out complex backgrounds by combining three strategies, built on MSER, a CNN, an SVM, and HOG features. The method applies word grouping, in which bounding-box localization selects the different words in the image, and false-positive text blocks are eliminated by their geometrical properties. Experimental results on three benchmarks, ICDAR 2013, ICDAR 2015, and MSRA-TD500, demonstrate the effectiveness of the method on complex foregrounds.
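
As an illustration of the MSER stage alone (the CNN/SVM/HOG filtering is omitted), the sketch below extracts maximally stable extremal regions with OpenCV; default detector parameters are assumed, and the exact Python bindings may vary across OpenCV versions.

    import cv2

    def mser_candidates(gray):
        """Extract MSER regions as candidate text boxes (x, y, w, h)."""
        mser = cv2.MSER_create()           # default detector parameters
        regions, boxes = mser.detectRegions(gray)
        # Later stages would score these candidates with CNN/SVM features
        # and discard false positives by geometric properties.
        return boxes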

Xiang Bai et al. describe a method for differentiating images that contain text from a large volume of natural images. To address this problem, they propose a novel convolutional neural network variant called the Multi-Scale Spatial Partition Network (MSP-Net). The network classifies an image as containing text or not by predicting text existence in all image blocks, which are spatial partitions of the input image at multiple scales; the whole image is classified as a text image (an image containing text) as long as at least one block is predicted to contain text. MSP-Net takes a whole image as input and outputs block-level classification results in an end-to-end manner. Results on several datasets demonstrate the robustness and effectiveness of the proposed method.
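
A minimal sketch of the image-level decision rule follows, assuming per-block text probabilities are already available from the network; the partition sizes and the 0.5 threshold are illustrative assumptions.

    import numpy as np

    def is_text_image(block_probs, threshold=0.5):
        """Image-level decision from multi-scale block predictions.

        block_probs: list of 2-D arrays, one per partition scale, holding
        the predicted probability that each block contains text. The image
        is a text image if any block at any scale crosses the threshold.
        """
        return any((p >= threshold).any() for p in block_probs)

    # Example: a 2x2 and a 4x4 partition of the same image.
    coarse = np.array([[0.1, 0.2], [0.05, 0.3]])
    fine = np.zeros((4, 4))
    fine[3, 1] = 0.9                      # one small block fires
    print(is_text_image([coarse, fine]))  # True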
