CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents

ref: https://github.com/DevashishPrasad/CascadeTabNet/blob/master/README.md

Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish Visave, Kavita Sultanpure, CVPR2020

Abstract

CascadeTabNet: The presented approach

We focus on using a small amount of data effectively to achieve high-accuracy results. Working towards this goal, our primary strategy includes the following:

Model architecture

CascadeTabNet is a three-stage Cascade Mask R-CNN HRNet model. A backbone, such as a ResNet-50 without its last fully connected layer, is the part of the model that transforms an image into feature maps.
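To make the role of the backbone concrete, here is a minimal PyTorch sketch. It is an illustration only: it uses torchvision's ResNet-50 as a stand-in, whereas CascadeTabNet itself uses an HRNet backbone inside the mmdetection framework, and the input size is arbitrary.

```python
import torch
import torchvision

# ResNet-50 pre-trained on ImageNet, with the average-pooling and fully connected
# classification layers removed so that it outputs spatial feature maps.
resnet = torchvision.models.resnet50(weights="IMAGENET1K_V1")
backbone = torch.nn.Sequential(*list(resnet.children())[:-2])

image = torch.randn(1, 3, 800, 800)   # dummy document image tensor
feature_maps = backbone(image)        # shape: (1, 2048, 25, 25)
print(feature_maps.shape)
```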

Iterative transfer learning

  1. A two-stage transfer learning strategy is used to make a single model learn end-to-end table recognition from a small amount of data (a simplified sketch of the two iterations follows this list).

    • In the first iteration of transfer learning, we initialize our CNN model with the weights of a model pre-trained on ImageNet and COCO before training. After training, the CNN successfully predicts table detection masks for the tables in the images.

    • In the second iteration, the model is fine-tuned again on a smaller dataset to accomplish the even more specific task of predicting cell masks in borderless tables, along with detecting tables according to their type.

  2. We create a general dataset for the general task of table detection. We add images of different document types, such as Word and LaTeX documents, to this dataset. These documents contain tables of various types: bordered, semi-bordered, and borderless.

    • A bordered table is one for which an algorithm can use just the line positions to estimate the cells and overall structure of the table.
    • If some of the lines are missing, it becomes difficult for a line-detection-based algorithm to separate the adjacent cells of the table. We call such a table, in which some lines are not present, a semi-bordered table.
    • A borderless table is one that does not have any lines.
  3. This new dataset contains slightly more advanced annotations, directing the model to detect tables of two types with their labels (two classes), bordered and borderless (semi-bordered tables are labelled as borderless), as well as to predict cell masks for borderless tables (three classes in total).
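The two-iteration scheme described in item 1 can be sketched roughly as below. This is not the authors' training code (they use Cascade Mask R-CNN HRNet within mmdetection); torchvision's Mask R-CNN is used as a stand-in, and the data loaders, class counts, epochs, and learning rates are placeholder assumptions.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_model(num_classes):
    """Mask R-CNN pre-trained on COCO, with box/mask heads replaced for num_classes
    (background included)."""
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, num_classes)
    return model

def fine_tune(model, data_loader, epochs, lr):
    """One transfer-learning iteration: minimize the summed detection and mask losses."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=1e-4)
    model.train()
    for _ in range(epochs):
        for images, targets in data_loader:
            loss_dict = model(images, targets)   # classification, box, mask, RPN losses
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Iteration 1: start from COCO weights and learn plain table detection
# (1 table class + background) on the general dataset.
model = build_model(num_classes=2)
# fine_tune(model, general_table_loader, epochs=10, lr=0.005)   # hypothetical loader

# Iteration 2: keep the learned backbone/RPN weights, swap the heads for the finer task
# (bordered table, borderless table, cell + background) and fine-tune on the smaller dataset.
kept = {k: v for k, v in model.state_dict().items()
        if not k.startswith("roi_heads.box_predictor")
        and not k.startswith("roi_heads.mask_predictor")}
model = build_model(num_classes=4)
model.load_state_dict(kept, strict=False)
# fine_tune(model, cell_mask_loader, epochs=10, lr=0.001)        # hypothetical loader
```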

Pipeline

In the bordered branch, the detected line positions are used to estimate the cells and overall structure of the table.

In the borderless branch, the cell masks predicted by the model are used directly; cells missed by the model are recovered using line estimation and a contour-based text detection algorithm.
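For the bordered branch, line-based cell recovery could look roughly like the OpenCV sketch below. The kernel sizes and area threshold are illustrative assumptions, not the paper's exact algorithm.

```python
import cv2

def bordered_table_cells(table_gray):
    """Estimate cell bounding boxes of a bordered table from its ruling lines."""
    # Binarize so that ink (lines and text) becomes white foreground.
    binary = cv2.adaptiveThreshold(255 - table_gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY, 15, -2)
    h, w = table_gray.shape[:2]

    # Keep only long horizontal / vertical strokes via morphological opening.
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (max(w // 20, 1), 1))
    v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, max(h // 20, 1)))
    horizontal = cv2.morphologyEx(binary, cv2.MORPH_OPEN, h_kernel)
    vertical = cv2.morphologyEx(binary, cv2.MORPH_OPEN, v_kernel)

    # The union of both line maps forms the grid; each enclosed white region is a cell.
    grid = cv2.add(horizontal, vertical)
    contours, _ = cv2.findContours(255 - grid, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 100]
```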

Image Transformation and data augmentation
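As an example of the kind of transformation applied, the dilation transform referenced in the results below can be sketched as follows; the kernel size and iteration count are illustrative assumptions, not the paper's exact settings.

```python
import cv2
import numpy as np

def dilation_transform(image_gray, kernel_size=(2, 2)):
    """Thicken dark text and line strokes by dilating the inverted (ink-as-foreground) image."""
    inverted = cv2.bitwise_not(image_gray)
    kernel = np.ones(kernel_size, np.uint8)
    dilated = cv2.dilate(inverted, kernel, iterations=1)
    return cv2.bitwise_not(dilated)
```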

Results and Analysis

To create the general dataset for the table detection task, we merge three datasets: ICDAR 19 (cTDaR), Marmot, and a GitHub dataset.

Preliminary Analysis

Evaluation metrics for the ICDAR 19 dataset are based on IoU (Intersection over Union) to evaluate the performance of table region detection. The results show that both image transformation techniques used for data augmentation help the model learn more effectively.
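For reference, IoU between a predicted and a ground-truth region can be computed as in this minimal sketch for axis-aligned boxes:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)
```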

Table detection evaluation

First, we fine-tune Cascade Mask R-CNN HRNet on the ICDAR 19 Track A training set with dilation transform augmentation, and evaluate it on the modern Track A test set.

Evaluation metrics for the TableBank table detection dataset are based on calculating precision, recall, and F1-score.
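These metrics follow their standard definitions; a minimal sketch:

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Standard detection metrics computed from matched and unmatched predictions."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```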

Evaluation metrics for ICDAR 2013 are based on the completeness and purity of the sub-objects of a table: a region is complete if it includes all sub-objects of the corresponding ground-truth region, and pure if it includes only sub-objects of that region.

Table structure recognition evaluation

For each cell, the method is required to return the coordinates of a polygon defining the convex hull of the cell's contents.
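A predicted binary cell mask can be converted into such a polygon, for example, with OpenCV's convex hull. This is a minimal sketch that takes the hull of the mask's foreground pixels as a stand-in for the cell's contents; the authors' post-processing may differ.

```python
import cv2
import numpy as np

def cell_mask_to_polygon(cell_mask):
    """Convert a binary cell mask into a convex-hull polygon of its foreground pixels."""
    points = cv2.findNonZero(cell_mask.astype(np.uint8))  # (N, 1, 2) foreground coordinates
    if points is None:
        return None
    hull = cv2.convexHull(points)                          # convex hull of those points
    return hull.reshape(-1, 2)                             # list of (x, y) vertices
```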

The model predicts accurate cell masks for most of the borderless tables. For images where the model misses some cell predictions (Fig. 5c), we recover the missing cells using the line estimation and contour-based text detection algorithm. The model fails badly on some images (Fig. 5d).