ICDAR
SROIE : Scanned Receipts OCR and Information Extraction
1. Training Datasets
- Malaysian receipts
- Each sample is a pair of files: 1.jpg + 1.txt
- Task2 : Information Extraction for 'company', 'date', 'address', 'total'
2. Sample Dataset (task2train, 626 sets)
{
"company" : "PERNIAGAAN ZHENG HUI",
"date" : "25/12/2018",
"address" : "NO 122.124. JALAN DEDAP 13 81100 JOHOR BAHRU",
"total" : "80.90"
}
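A label like the one above can be read with a standard JSON parser. The sketch below assumes each `.txt` label holds a single JSON object with these four keys; the function name `parse_label` is hypothetical and not part of any SROIE tooling.

```python
import json

# The four fields listed in the notes above.
EXPECTED_KEYS = ("company", "date", "address", "total")


def parse_label(text):
    """Parse one label file's text and check that all four fields are present."""
    record = json.loads(text)
    missing = [key for key in EXPECTED_KEYS if key not in record]
    if missing:
        raise ValueError(f"label is missing fields: {missing}")
    # Keep only the expected fields, stripped of surrounding whitespace.
    return {key: str(record[key]).strip() for key in EXPECTED_KEYS}


sample = """
{
"company" : "PERNIAGAAN ZHENG HUI",
"date" : "25/12/2018",
"address" : "NO 122.124. JALAN DEDAP 13 81100 JOHOR BAHRU",
"total" : "80.90"
}
"""
label = parse_label(sample)
print(label["company"], label["total"])
```

Values are kept as strings on purpose, so that a total such as "80.90" is not silently turned into 80.9.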
3. Task3 - Key Information Extraction from Scanned Receipts
- Task Description
- Evaluation Protocol : F1 score
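The notes only state that the metric is the F1 score. A minimal sketch of how such a score could be computed, assuming a predicted field counts as correct only when its text exactly matches the ground truth for the same key. The function name `field_f1` and the exact-match rule are assumptions, not the official evaluation script.

```python
def field_f1(predictions, ground_truths):
    """Micro-averaged precision, recall, and F1 over (key, value) fields.

    Both arguments are lists of dicts, one dict per receipt.
    Empty predicted values are treated as "no prediction".
    """
    correct = predicted = expected = 0
    for pred, truth in zip(predictions, ground_truths):
        expected += len(truth)
        for key, value in pred.items():
            if not value:
                continue
            predicted += 1
            if truth.get(key) == value:
                correct += 1
    precision = correct / predicted if predicted else 0.0
    recall = correct / expected if expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


truth = [{"company": "PERNIAGAAN ZHENG HUI", "date": "25/12/2018",
          "address": "NO 122.124. JALAN DEDAP 13 81100 JOHOR BAHRU",
          "total": "80.90"}]
# Hypothetical prediction: address missing, total wrong.
pred = [{"company": "PERNIAGAAN ZHENG HUI", "date": "25/12/2018",
         "address": "", "total": "80.00"}]
precision, recall, f1 = field_f1(pred, truth)
print(precision, recall, f1)
```

In this toy case two of three predicted fields are right and two of four expected fields are found, so a single wrong character in `total` costs both precision and recall.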
4. Ranking Table, 1st place
- HIK_OCR_Exclude_ocr_mismatch (Recall : 96.33%, Precision : 98.38%, Hmean : 97.34%)
- Presumably Hikvision
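Hmean in the ranking table is the harmonic mean of precision (P) and recall (R), which is the same quantity as the F1 score named in the evaluation protocol. Plugging in the reported values reproduces the reported figure:

```latex
\[
\text{Hmean} = \frac{2 \cdot P \cdot R}{P + R}
= \frac{2 \times 0.9838 \times 0.9633}{0.9838 + 0.9633}
\approx \frac{1.8954}{1.9471}
\approx 0.9734
\]
```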
5. PICK-PAPCIC & XZMU (3rd place in the Ranking Table)
- PICK : Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks
- PAPCIC : Ping An Property & Casualty Insurance Company
- XZMU : Xuzhou Medical University
6. PICK Paper Overview
- Handles complex document layouts for KIE by combining graph learning with the graph convolution operation, yielding a richer semantic representation that contains the textual features, the visual features, and the global layout without ambiguity.
- Treating KIE as a sequence tagging or NER problem ignores most of the valuable visual and non-sequential information (text, position, layout, image).
- The question: how to fully and efficiently exploit both the textual and visual features of documents to get a richer semantic representation, which in many cases is crucial for extracting key information without ambiguity, and how to keep the method extensible.
- Drawback of traditional approaches [hand-crafted features (regex and template matching)] : they do not scale.
- Drawback of treating KIE as a sequence tagging problem solved by NER : it operates only on plain text and does not incorporate the visual information and global layout of the document to get a richer representation.
- Drawback of LayoutLM (LayoutLM : Pre-training of Text and Layout for Document Image Understanding) : it does not consider the latent relationship between two text segments, and pre-training the model needs adequate data and is time-consuming.
- Predefine a graph to combine textual and visual information using the graph convolution operation
- Input(Text + Box) → Encoder(BiLSTM) → Graph Module(GCN) → Decoder(BiLSTM+CRF) → Output(Entity)
- Input(Text + Box + Image) → Encoder(CNN+Transformer) → Graph Module(GLCN) → Decoder(BiLSTM+CRF) → Output(Entity)
- Drawback of GraphIE (GraphIE : A graph-based framework for information extraction) : it needs prior knowledge and extensive human effort to predefine task-specific edge types and the adjacency matrix of the graph (challenging, subjective, time-consuming)
- Graph convolution for multimodal information extraction from visually rich documents
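To make the limitation of hand-crafted features concrete, here is a minimal sketch of a regex-based extractor for `date` and `total`. The patterns, the function name `extract_with_rules`, and both OCR strings are hypothetical and tuned only to the sample label shown earlier.

```python
import re

# Hypothetical patterns: dd/mm/yyyy dates and a "TOTAL" line followed by an amount.
DATE_RE = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")
TOTAL_RE = re.compile(r"\bTOTAL\b[^\d\n]*(\d+\.\d{2})", re.IGNORECASE)


def extract_with_rules(ocr_text):
    """Return the first date and total found by the fixed patterns, or None."""
    date = DATE_RE.search(ocr_text)
    total = TOTAL_RE.search(ocr_text)
    return {
        "date": date.group(1) if date else None,
        "total": total.group(1) if total else None,
    }


# Made-up OCR text that happens to fit the patterns.
receipt = "PERNIAGAAN ZHENG HUI\nDATE: 25/12/2018\nTOTAL RM 80.90\n"
print(extract_with_rules(receipt))

# Made-up OCR text with a different layout: the same rules find nothing.
other = "Date 2018-12-25\nAmount due: 80.90\n"
print(extract_with_rules(other))
```

Every new date format or differently worded total line needs another rule, which is why this style of extractor does not scale across layouts.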
7. What the PICK Paper Proposes
- Improves extraction ability by automatically making full use of the textual and visual features within documents.
- PICK incorporates a graph learning module into the existing graph architecture to learn a soft adjacency matrix that effectively and efficiently refines the graph structure, indicating the relationships between nodes for downstream tasks, instead of manually predefining the edge types of the graph.
- Semi-supervised learning with graph learning-convolutional networks
- PICK makes full use of the document's features, including text, image, and position features, by using graph convolution to get a richer representation for KIE.
- The graph convolution operation can exploit the relationships generated by the graph learning module and propagates information between nodes within a document.
- The learned richer representations are finally fed to a decoder to assist sequence tagging at the character level.
- GCNs (graph convolutional networks) have demonstrated huge success in unstructured data tasks.
- Two kinds of methods: spatial convolution and spectral convolution
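The two ideas in this section, learning a soft adjacency matrix and then propagating features over it, can be sketched in a few lines. This is a simplified illustration, not PICK's actual layer: it assumes the soft adjacency is a row-wise softmax over LeakyReLU(w · |v_i - v_j|), in the spirit of graph learning-convolutional networks, and it uses a plain ReLU(A · V · W) propagation step. All weights, the LeakyReLU slope, and the node features are made-up toy values.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def soft_adjacency(nodes, w):
    """Row-wise softmax over LeakyReLU(w . |v_i - v_j|); each row sums to 1."""
    adjacency = []
    for vi in nodes:
        scores = [
            leaky_relu(sum(wk * abs(a - b) for wk, a, b in zip(w, vi, vj)))
            for vj in nodes
        ]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        adjacency.append([e / total for e in exps])
    return adjacency


def graph_conv(adjacency, nodes, weight):
    """One propagation step: ReLU(A @ V @ W), written with plain lists."""
    dim_in, dim_out = len(nodes[0]), len(weight[0])
    out = []
    for row in adjacency:
        # Aggregate neighbour features weighted by the soft adjacency row.
        mixed = [sum(a * v[d] for a, v in zip(row, nodes)) for d in range(dim_in)]
        # Linear projection followed by ReLU.
        out.append([
            max(0.0, sum(mixed[d] * weight[d][k] for d in range(dim_in)))
            for k in range(dim_out)
        ])
    return out


# Toy example: three text-segment nodes with 2-d features.
nodes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
w = [-5.0, -5.0]                   # negative weights: similar nodes score higher
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity projection for readability

A = soft_adjacency(nodes, w)
H = graph_conv(A, nodes, weight)
print(A[0])
print(H[0])
```

In the toy run, node 0 attends more to the similar node 1 than to the dissimilar node 2, and no edge type had to be defined by hand. In a real model `w` and `weight` would be learned.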
8. H&H Lab (Hmean = 89.63%)
9. H&H Lab Paper
- Authors : HUST_VLRGROUP(Hui Zhang, Mengde Xu, Mingkun Yang, Zhen Zhu, Jiehua Yang) & HUAWEI_CLOUD_EI(Jing Wang, Yibin Ye, Shenggao Zhu, Dandan Tu)
- Paper : Ma, Xuezhe, and Eduard Hovy. "End-to-end sequence labeling via bi-directional lstm-cnns-crf." arXiv preprint arXiv:1603.01354 (2016)
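The CRF layer at the end of a BiLSTM-CNNs-CRF tagger picks the best tag sequence jointly rather than tag by tag. Below is a minimal sketch of that decoding step (Viterbi), assuming per-token emission scores and a tag-to-tag transition matrix are already available. The tag set and all scores are made-up toy values, not outputs of any trained model.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag index sequence.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    num_tags = len(transitions)
    best = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for scores in emissions[1:]:
        step_best, step_back = [], []
        for j in range(num_tags):
            # Best previous tag for arriving at tag j.
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            step_back.append(prev)
            step_best.append(best[prev] + transitions[prev][j] + scores[j])
        best = step_best
        backpointers.append(step_back)
    # Follow the back-pointers from the best final tag.
    tag = max(range(num_tags), key=lambda j: best[j])
    path = [tag]
    for step_back in reversed(backpointers):
        tag = step_back[tag]
        path.append(tag)
    return path[::-1]


TAGS = ["O", "B-total", "I-total"]
# Transition scores forbid "O -> I-total" with a large penalty.
transitions = [
    [0.0, 0.0, -100.0],  # from O
    [0.0, -1.0, 1.0],    # from B-total
    [0.0, -1.0, 1.0],    # from I-total
]
# Toy emission scores for four consecutive tokens.
emissions = [
    [2.0, 0.1, 0.1],
    [0.2, 1.5, 1.0],
    [0.2, 0.3, 1.2],
    [0.5, 0.2, 0.9],
]
decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)
```

The transition matrix is what lets the decoder rule out invalid sequences such as an `I-` tag directly after `O`, something independent per-token argmax cannot guarantee.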
10. Code for "End-to-end sequence labeling via bi-directional lstm-cnns-crf"
- URL : https://github.com/guillaumegenthial/sequence_tagging
- Contents :