ref: https://github.com/DevashishPrasad/CascadeTabNet/blob/master/README.md
In this paper, we present an improved deep-learning-based end-to-end approach for solving both problems of table detection and structure recognition using a single Convolutional Neural Network (CNN) model.
We propose CascadeTabNet: a Cascade Mask Region-based CNN High-Resolution Network (Cascade Mask R-CNN HRNet) based model that detects table regions and, at the same time, recognizes the structural body cells within the detected tables.
We focus on using a small amount of data effectively to achieve highly accurate results. Working towards this goal, our primary strategy includes:
Cascade RCNN
HRNetV2p: The original HRNet architecture was enhanced for semantic segmentation to form HRNetV2; a feature pyramid was then built over HRNetV2 for object detection, yielding HRNetV2p.
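The two steps can be illustrated with a toy numpy sketch of the tensor shapes involved (this is not HRNet code: the real network fuses branches with learned convolutions, whereas here plain upsampling and average pooling stand in for them):

```python
import numpy as np

def hrnetv2_concat(feats):
    # HRNetV2 idea: bring every branch output to the highest resolution
    # and concatenate along the channel axis (nearest-neighbour upsampling
    # stands in for the network's learned fusion).
    h, w = feats[0].shape[1:]
    ups = []
    for f in feats:
        scale = h // f.shape[1]
        ups.append(f.repeat(scale, axis=1).repeat(scale, axis=2))
    return np.concatenate(ups, axis=0)

def hrnetv2p_pyramid(rep, levels=4):
    # HRNetV2p idea: downsample the fused representation to successively
    # halved resolutions, producing a feature pyramid for detection.
    pyramid = [rep]
    for _ in range(levels - 1):
        c, h, w = pyramid[-1].shape
        pooled = pyramid[-1].reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
        pyramid.append(pooled)
    return pyramid
```

Running this on four branches of shapes `(4,32,32)`, `(8,16,16)`, `(16,8,8)`, `(32,4,4)` gives one fused `(60,32,32)` map and a four-level pyramid at resolutions 32, 16, 8, 4.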
CascadeTabNet is a three-stage Cascade Mask R-CNN HRNet model. A backbone, such as a ResNet-50 without its last fully connected layer, is the part of the model that transforms an image into feature maps.
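As a rough orientation, an mmdetection-style configuration for such a model might look like the following skeleton (field values here are illustrative placeholders; the repository's own config files are the authority for the exact fields and hyperparameters):

```python
# Illustrative mmdetection-style config skeleton for a three-stage
# Cascade Mask R-CNN with an HRNet backbone; values are examples only.
model = dict(
    type='CascadeRCNN',          # cascade detector
    backbone=dict(type='HRNet'),  # high-resolution backbone
    neck=dict(type='HRFPN'),      # feature pyramid over HRNetV2 (HRNetV2p)
    roi_head=dict(
        type='CascadeRoIHead',
        num_stages=3,             # three cascaded box heads
        mask_head=dict(type='FCNMaskHead'),  # instance mask prediction
    ),
)
```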
The two-stage transfer learning strategy is used to make a single model learn end to end table recognition using a small amount of data.
In the first iteration of transfer learning, we initialize our CNN model with weights pre-trained on ImageNet and COCO before training. After training, the CNN successfully predicts table masks for the tables in the images.
In the second iteration, the model is fine-tuned again on a smaller dataset to accomplish the even more specific task of predicting cell masks in borderless tables, along with detecting tables according to their types.
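A minimal runnable sketch of this two-iteration schedule (the `finetune` stub stands in for the actual mmdetection training loop, and the learning rates are illustrative, not the paper's values):

```python
def finetune(model, dataset, lr):
    # Placeholder: in practice this runs a full training loop; here it
    # just records which dataset the model was trained on, and at what lr.
    return dict(model, history=model["history"] + [(dataset, lr)])

def two_stage_transfer(model, general_dataset, specific_dataset):
    # Iteration 1: start from pre-trained weights and learn to segment
    # whole tables on the larger general dataset.
    model = finetune(model, general_dataset, lr=1e-3)
    # Iteration 2: fine-tune on the smaller dataset for the more specific
    # task (bordered/borderless labels plus borderless cell masks); a
    # lower learning rate helps preserve the stage-1 features.
    model = finetune(model, specific_dataset, lr=1e-4)
    return model
```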
We create a general dataset for the general task of table detection. We add images of different types of documents, such as Word and LaTeX documents, to this dataset. These documents contain tables of various types: bordered, semi-bordered, and borderless.
This new dataset contains slightly more advanced annotations that direct the model to detect tables of two types with their labels (two classes), bordered and borderless (borderless here includes semi-bordered), as well as to predict cell masks for borderless tables (three classes in total).
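In COCO-style annotation terms, the three classes could be encoded as follows (the names and ids are illustrative, not the dataset's actual labels):

```python
# Hypothetical COCO-style category list for the three classes described
# above; the actual dataset may use different names and ids.
categories = [
    {"id": 1, "name": "bordered"},    # bordered tables
    {"id": 2, "name": "borderless"},  # borderless and semi-bordered tables
    {"id": 3, "name": "cell"},        # cell masks, borderless tables only
]
```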
In the bordered branch, cells are extracted from the detected table using a conventional line-detection-based algorithm, since the visible borders make the cell boundaries explicit.
In the borderless branch, the cell masks predicted by the model are used, with missing cells recovered by line estimation and a contour-based text detection algorithm.
To create the general dataset for the table detection task, we merge three datasets: ICDAR 19 (cTDaR), Marmot, and Github.
Evaluation metrics for the ICDAR 19 dataset are based on IoU (Intersection over Union) to evaluate table region detection performance. The results show that both image transformation techniques used for data augmentation help the model learn more effectively.
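For reference, the IoU of two axis-aligned boxes is the area of their intersection divided by the area of their union:

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2); returns intersection-over-union in [0, 1].
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```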
First, we fine-tune Cascade Mask R-CNN HRNet on the ICDAR 19 Track A training set with dilation transform augmentation; the following results were obtained on the modern Track A test set.
Evaluation metrics for the TableBank table detection dataset are based on calculating Precision, Recall, and F1.
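These three quantities follow directly from the true-positive, false-positive, and false-negative counts of the matched detections:

```python
def detection_prf(tp, fp, fn):
    # Precision: fraction of predicted tables that are correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of ground-truth tables that were found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```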
Evaluation metrics for ICDAR 2013 are based on the completeness and purity of the sub-objects of a table.
For each cell, it is required to return the coordinates of a polygon defining the convex hull of the cell’s contents.
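One standard way to obtain such a polygon from the pixel coordinates of a cell's contents is the monotone-chain convex hull algorithm (a generic sketch; the evaluation toolkit may compute hulls differently):

```python
def convex_hull(points):
    # Andrew's monotone chain: returns hull vertices in counter-clockwise
    # order, starting from the lowest-leftmost point.
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Cross product of vectors o->a and o->b (positive = left turn).
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                 # build lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):       # build upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each half (it repeats the other half's start).
    return lower[:-1] + upper[:-1]
```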
It predicts accurate cell masks for most of the borderless tables. For some images where the model misses some of the cell predictions (5 c.), we correct them using line estimation and a contour-based text detection algorithm. The model fails badly on some images (5 d.).
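The line-estimation step is not spelled out here; as a rough illustration of the idea, missing row separators can be interpolated by assuming approximately uniform row spacing (a hypothetical stand-in, not the paper's actual algorithm):

```python
import statistics

def estimate_missing_rows(row_ys):
    # Given sorted y-coordinates of detected row separators, infer the
    # separators the model missed: any gap near a multiple of the median
    # gap is filled with evenly spaced interpolated separators.
    # Hypothetical sketch of line estimation, not the paper's method.
    gaps = [b - a for a, b in zip(row_ys, row_ys[1:])]
    step = statistics.median(gaps)
    filled = [row_ys[0]]
    for y in row_ys[1:]:
        gap = y - filled[-1]
        n = max(1, round(gap / step))     # how many rows this gap spans
        for k in range(1, n):             # insert n-1 missing separators
            filled.append(filled[-1] + gap / n)
        filled.append(y)
    return filled
```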