ICDAR

SROIE : Scanned Receipts OCR and Information Extraction

 

1. Training Datasets

- Receipts from Malaysia

- Each sample is a pair of files with the same name, e.g. 1.jpg + 1.txt

- Task2 : Information Extraction for 'company', 'date', 'address', 'total'
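Each receipt is stored as an image plus a text file with the same name, so the usual first step is to match the two. The following is a minimal sketch. It assumes the files are named <stem>.jpg and <stem>.txt, as in the 1.jpg + 1.txt example. The helper `pair_samples` is hypothetical, and it only works on file names, so it never touches the disk.

```python
from pathlib import PurePath


def pair_samples(filenames):
    """Group files into (image, text) pairs that share the same stem.

    Assumes <stem>.jpg holds the receipt image and <stem>.txt its label;
    files without a partner are skipped.
    """
    images, texts = {}, {}
    for name in filenames:
        p = PurePath(name)
        suffix = p.suffix.lower()
        if suffix == ".jpg":
            images[p.stem] = name
        elif suffix == ".txt":
            texts[p.stem] = name
    # Keep only stems that have both an image and a text file.
    return [(images[s], texts[s]) for s in sorted(images.keys() & texts.keys())]


print(pair_samples(["1.jpg", "1.txt", "2.jpg", "3.txt"]))  # [('1.jpg', '1.txt')]
```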

 

2. Sample Dataset (task2train, 626 sets)

{
    "company" : "PERNIAGAAN ZHENG HUI",
    "date" : "25/12/2018",
    "address" : "NO 122.124. JALAN DEDAP 13 81100 JOHOR BAHRU",
    "total" : "80.90"
}
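Here is a sketch for reading one such label. It assumes each .txt file holds a single JSON object with the four keys shown in the sample. The helper `load_label` and the `REQUIRED_KEYS` tuple are hypothetical, and real files may need extra cleanup before they parse.

```python
import json

# The four fields of the key-information task (from the sample above).
REQUIRED_KEYS = ("company", "date", "address", "total")


def load_label(text):
    """Parse one label (assumed to be a JSON object) and check its fields."""
    record = json.loads(text)
    missing = [k for k in REQUIRED_KEYS if k not in record]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    # Assumes every value is a string; strip stray whitespace.
    return {k: record[k].strip() for k in REQUIRED_KEYS}


sample = """{
    "company" : "PERNIAGAAN ZHENG HUI",
    "date" : "25/12/2018",
    "address" : "NO 122.124. JALAN DEDAP 13 81100 JOHOR BAHRU",
    "total" : "80.90"
}"""
label = load_label(sample)
print(label["total"])  # 80.90
```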

 

3. Task3 - Key Information Extraction from Scanned Receipts

- Task Description

- Evaluation Protocol : F1 score
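The following is a simplified sketch of the F1 computation. It assumes an extracted field counts as correct only when both its key and its exact value match the ground truth, and that precision and recall are accumulated over all receipts before F1 is taken. The official evaluation script may normalize text differently, and `kie_scores` is a hypothetical name.

```python
def kie_scores(predictions, ground_truths):
    """Micro precision / recall / F1 over (field, value) pairs.

    predictions, ground_truths: lists of dicts (one per receipt, same order).
    Empty values are treated as "not extracted".
    """
    tp = n_pred = n_gt = 0
    for pred, gt in zip(predictions, ground_truths):
        pred_pairs = {(k, v) for k, v in pred.items() if v}
        gt_pairs = {(k, v) for k, v in gt.items() if v}
        tp += len(pred_pairs & gt_pairs)  # key AND value must match exactly
        n_pred += len(pred_pairs)
        n_gt += len(gt_pairs)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gt if n_gt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gt = {"company": "PERNIAGAAN ZHENG HUI", "date": "25/12/2018",
      "address": "NO 122.124. JALAN DEDAP 13 81100 JOHOR BAHRU", "total": "80.90"}
pred = {"company": "PERNIAGAAN ZHENG HUI", "date": "25/12/2018",
        "address": "", "total": "80.00"}  # address missed, total wrong
scores = kie_scores([pred], [gt])
print([round(x, 3) for x in scores])  # [0.667, 0.5, 0.571]
```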

 

4. Ranking Table: 1st Place

- HIK_OCR_Exclude_ocr_mismatch (Recall : 96.33%, Precision : 98.38%, Hmean : 97.34%)

- Apparently from Hik-Vision (Hikvision)
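Hmean is the harmonic mean of precision and recall, i.e. the F1 score. That makes the reported figures easy to check. `hmean` below is just an illustrative helper.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (the F1 score, 'Hmean')."""
    return 2 * precision * recall / (precision + recall)


# Reported for HIK_OCR_Exclude_ocr_mismatch, in percent.
print(round(hmean(98.38, 96.33), 2))  # 97.34
```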

 

5. PICK-PAPCIC & XZMU (Ranking Table: 3rd Place)

- PICK : Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks

- PAPCIC : Ping An Property & Casualty Insurance Company

- XZMU : Xuzhou Medical University

 

6. PICK Paper Overview

- Handles complex document layouts for KIE by combining graph learning with the graph convolution operation, yielding a richer semantic representation that contains textual and visual features and the global layout without ambiguity

- Treating KIE as a sequence tagging or NER problem ignores most of the valuable visual and non-sequential information (text, position, layout, image)

- Core question: how to fully and efficiently exploit both the textual and visual features of documents to get a richer semantic representation, which in many cases is crucial for extracting key information without ambiguity, while keeping the method extensible

- Drawback of traditional approaches (hand-crafted features such as regex and template matching): they are not extensible

- Drawback of sequence taggers (KIE solved as NER): they operate only on plain text and do not incorporate visual information or the global document layout to get a richer representation

- Drawback of LayoutLM (Pre-training of Text and Layout for Document Image Understanding): it does not consider the latent relationship between two text segments, and pre-training the model needs adequate data and is time-consuming and inefficient

- Earlier graph-based methods predefine a graph and combine textual and visual information through graph convolution operations

- Predefined-graph pipeline: Input(Text + Box) → Encoder(BiLSTM) → Graph Module(GCN) → Decoder(BiLSTM+CRF) → Output(Entity)

- PICK pipeline: Input(Text + Box + Image) → Encoder(CNN+Transformer) → Graph Module(GLCN) → Decoder(BiLSTM+CRF) → Output(Entity)

- Drawback of GraphIE (A graph-based framework for information extraction): it needs prior knowledge and extensive human effort to predefine the task-specific edge types and the adjacency matrix of the graph, which is challenging, subjective, and time-consuming

- Graph convolution for multimodal information extraction from visually rich documents
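Both pipelines above end in a BiLSTM+CRF decoder. Framing KIE as sequence tagging therefore means turning entity labels into per-character tags. The following is a minimal sketch using the IOB (BIO) scheme. It assumes each entity value appears verbatim in the text segment and tags only its first occurrence. The helper `char_bio_tags` is hypothetical.

```python
def char_bio_tags(text, entities):
    """Tag each character of a text segment with IOB labels.

    entities: list of (value, label) pairs. The first occurrence of each value
    gets B-<label> on its first character and I-<label> on the rest; every
    other character is 'O'.
    """
    tags = ["O"] * len(text)
    for value, label in entities:
        if not value:
            continue
        start = text.find(value)
        if start < 0:
            continue  # value does not occur in this segment
        tags[start] = f"B-{label}"
        for i in range(start + 1, start + len(value)):
            tags[i] = f"I-{label}"
    return tags


segment = "TOTAL 80.90"
tags = char_bio_tags(segment, [("80.90", "total")])
print(tags[5:8])  # ['O', 'B-total', 'I-total']
```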

 

7. What PICK Proposes

- Improves extraction ability by automatically making full use of the textual and visual features within documents

- Instead of artificially predefining the graph's edge types, PICK incorporates a graph learning module into the existing graph architecture to learn a soft adjacency matrix. This effectively and efficiently refines the graph structure that indicates the relationships between nodes for downstream tasks

- Related work: Semi-supervised learning with graph learning-convolutional networks (GLCN)

- PICK makes full use of the document's features, including text, image, and position, by using graph convolution to get a richer representation for KIE

- The graph convolution operation is powerful at exploiting the relationships produced by the graph learning module and propagates information between nodes within a document

- The learned richer representations are finally fed to a decoder that performs sequence tagging at the character level

- GCNs (graph convolutional networks) have demonstrated huge success in tasks on unstructured data

- Graph convolution methods fall into two families: spatial convolution methods and spectral convolution methods
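PICK's graph module learns the adjacency matrix instead of fixing it by hand. The sketch below follows the graph-learning formulation of the GLCN paper, A_ij = softmax over j of ReLU(a · |v_i - v_j|). That step is followed by one simple graph convolution layer, H = ReLU(A V W). This is a simplified NumPy illustration with random weights, not PICK's exact module. PICK adds further components, such as layout-based relation features, and GLCN's regularizing graph-learning loss and all training are omitted. Function names are hypothetical.

```python
import numpy as np


def learn_soft_adjacency(node_feats, a):
    """GLCN-style graph learning: a soft adjacency matrix from node features.

    A[i, j] = softmax_j(ReLU(a . |v_i - v_j|)), so each row sums to 1.
    """
    diff = np.abs(node_feats[:, None, :] - node_feats[None, :, :])  # (N, N, D)
    scores = np.maximum(diff @ a, 0.0)                              # (N, N)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


def graph_conv(adj, node_feats, weight):
    """One graph convolution layer: aggregate neighbours via adj, project, ReLU."""
    return np.maximum(adj @ node_feats @ weight, 0.0)


rng = np.random.default_rng(0)
num_nodes, dim, out_dim = 4, 8, 16
v = rng.normal(size=(num_nodes, dim))  # stand-in for fused per-segment features
a = rng.normal(size=dim)               # learnable vector (random here)
W = rng.normal(size=(dim, out_dim))    # learnable weight (random here)

A = learn_soft_adjacency(v, a)
H = graph_conv(A, v, W)
print(A.shape, H.shape)  # (4, 4) (4, 16)
```

Because the adjacency is a function of the node features, gradients can flow into `a` during training, so the graph structure is learned along with the rest of the model rather than set by a human.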

 

8. H&H Lab (Hmean = 89.63%)

 

9. H&H Lab Paper

- Authors : HUST_VLRGROUP(Hui Zhang, Mengde Xu, Mingkun Yang, Zhen Zhu, Jiehua Yang) & HUAWEI_CLOUD_EI(Jing Wang, Yibin Ye, Shenggao Zhu, Dandan Tu)

- Paper : Ma, Xuezhe, and Eduard Hovy. "End-to-end sequence labeling via bi-directional lstm-cnns-crf." arXiv preprint arXiv:1603.01354 (2016)
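The cited BiLSTM-CNNs-CRF model, like the BiLSTM+CRF decoders in the pipelines above, picks the final tag sequence with a linear-chain CRF. The following is a minimal sketch of Viterbi decoding over per-token emission scores and tag-to-tag transition scores. It is simplified: there are no start/end transitions, and CRF training (the forward algorithm) is omitted. `viterbi_decode` is a hypothetical name, and the tag set O, B, I is an illustrative assumption.

```python
import numpy as np


def viterbi_decode(emissions, transitions):
    """Most likely tag sequence under a linear-chain CRF.

    emissions:   (T, K) per-token tag scores, e.g. from a BiLSTM encoder
    transitions: (K, K) score of moving from tag i to tag j
    Returns (best tag indices, best path score).
    """
    T, K = emissions.shape
    score = emissions[0].copy()  # best score of a path ending in each tag at t=0
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # candidate[i, j]: best path ending in tag i at t-1, then tag j at t
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1], float(score.max())


# Tags: 0 = O, 1 = B, 2 = I. Forbid O -> I with a large negative transition.
transitions = np.zeros((3, 3))
transitions[0, 2] = -1e4
emissions = np.array([[1.0, 0.0, 0.0],
                      [0.0, 0.0, 2.0]])
print(viterbi_decode(emissions, transitions))  # ([1, 2], 2.0)
```

Picking the highest emission per token independently would give O followed by I, an invalid sequence. The transition scores make the decoder choose B, I instead, which is the reason for putting a CRF on top of the BiLSTM.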

 

10. "End-to-end sequence labeling via bi-directional lstm-cnns-crf" Code

- Repository : https://github.com/guillaumegenthial/sequence_tagging

- Contents : 
