easy-to-implement strong baseline for irregular scene text recognition, using off-the-shelf neural network components and only word-level annotations

 

31-layer ResNet, an LSTM-based encoder-decoder framework and a 2-dimensional attention module

 

*Model

(1) ResNet CNN for feature extraction

(2) 2D-attention based encoder-decoder model

 

*Implementation Details

- Torch

- NVIDIA Titan X GPU with 12GB Memory

- cross-entropy loss

- without pre-training

- ADAM optimizer(learning rate = 0.001, decay rate = 0.9 every 10000 iterations until it reaches 0.00001

 

반응형

+ Recent posts