Illustration of complex table detection results. Blue and green rectangles mark the ground-truth bounding boxes and the bounding boxes predicted by CDeC-Net, respectively.
The proposed network is a multistage extension of Mask R-CNN with a dual backbone that uses deformable convolution, designed to detect tables of varying scale with high detection accuracy at higher IoU thresholds.
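To make the role of the IoU threshold concrete, the following minimal sketch computes the intersection over union of a predicted and a ground-truth box and checks whether the prediction counts as a correct detection at several thresholds. The box coordinates, the thresholds, and the helper names `iou` and `is_true_positive` are illustrative assumptions, not values or code from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, truth, threshold):
    """A prediction is accepted when its IoU with the ground truth reaches the threshold."""
    return iou(pred, truth) >= threshold


# Hypothetical boxes: the prediction misses the bottom strip of the table.
ground_truth = (100, 100, 500, 300)
prediction = (100, 100, 500, 260)

overlap = iou(prediction, ground_truth)
for threshold in (0.5, 0.75, 0.9):
    print(threshold, is_true_positive(prediction, ground_truth, threshold))
```

With these assumed boxes the IoU is 0.8, so the detection is accepted at thresholds 0.5 and 0.75 but rejected at 0.9. A detector must localize table boundaries tightly to remain accurate at higher thresholds.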
Our solution has three important properties:
Cascade R-CNN
We use a dual-backbone architecture that creates composite connections between the parallel stages of two adjacent ResNeXt-101 backbones (one called the assistant backbone and the other the lead backbone).
CBNetV2: A Composite Backbone Network Architecture for Object Detection
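The data flow of a composite connection can be sketched with toy features. In this simplified sketch, which is an assumption-laden illustration rather than the CDeC-Net implementation, each stage is a plain Python callable, features are lists of floats of equal length, and the hypothetical `composite` function stands in for whatever transformation a real network applies to the assistant feature (for example, channel projection and resizing) before fusing it into the lead backbone.

```python
def run_dual_backbone(x, assistant_stages, lead_stages, composite):
    """Run an assistant and a lead backbone joined by per-stage composite connections."""
    # 1) The assistant backbone runs on its own; keep every stage output.
    assistant_outputs = []
    feat = x
    for stage in assistant_stages:
        feat = stage(feat)
        assistant_outputs.append(feat)

    # 2) The lead backbone fuses the assistant feature of the parallel stage
    #    into its own stage input (element-wise sum) before applying the stage.
    lead_outputs = []
    feat = x
    for stage, assist_feat in zip(lead_stages, assistant_outputs):
        fused = [a + composite(b) for a, b in zip(feat, assist_feat)]
        feat = stage(fused)
        lead_outputs.append(feat)
    return lead_outputs


def double(feature):
    """Toy stage: doubles every feature value."""
    return [2.0 * v for v in feature]


def halve(value):
    """Toy composite function: scales the assistant feature by 0.5."""
    return 0.5 * value


lead_features = run_dual_backbone([1.0, 2.0], [double, double], [double, double], halve)
print(lead_features)  # [[4.0, 8.0], [12.0, 24.0]]
```

Only the lead backbone's outputs are returned, reflecting the idea that the assistant backbone exists to enrich the lead backbone's features, which are then passed on to the detection head.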
We replace the fixed-receptive-field CNN with a deformable CNN [22] in each of the two backbones of our dual-backbone architecture. The grid is deformable because each grid point can be shifted by a learnable offset.
Deformable Convolution
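As an illustration of how a deformable grid samples its input, the sketch below evaluates a single 3x3 deformable convolution response in pure Python. Each kernel tap reads the input at its regular grid position plus a (dy, dx) offset, using bilinear interpolation for fractional positions. In a trained network the offsets are predicted by an extra convolution layer; here they are passed in by hand, and the function names and toy input are assumptions made for this example.

```python
import math


def bilinear(image, y, x):
    """Sample a 2-D list at fractional (y, x); pixels outside the image count as 0."""
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < len(image) and 0 <= xx < len(image[0]):
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * image[yy][xx]
    return value


def deformable_conv_at(image, weights, offsets, cy, cx):
    """Response of a 3x3 deformable convolution centred at (cy, cx).

    weights: 3x3 kernel; offsets: 3x3 grid of (dy, dx) pairs, one per tap.
    """
    total = 0.0
    for i, dy in enumerate((-1, 0, 1)):
        for j, dx in enumerate((-1, 0, 1)):
            off_y, off_x = offsets[i][j]
            total += weights[i][j] * bilinear(image, cy + dy + off_y, cx + dx + off_x)
    return total


# Toy input: a 5x5 horizontal ramp where each pixel equals its column index.
ramp = [[float(c) for c in range(5)] for _ in range(5)]
ones = [[1.0] * 3 for _ in range(3)]
no_offsets = [[(0.0, 0.0)] * 3 for _ in range(3)]
shift_right = [[(0.0, 0.5)] * 3 for _ in range(3)]

regular = deformable_conv_at(ramp, ones, no_offsets, 2, 2)
shifted = deformable_conv_at(ramp, ones, shift_right, 2, 2)
print(regular, shifted)  # 18.0 22.5
```

With zero offsets the result equals an ordinary 3x3 convolution. Shifting every tap half a pixel to the right moves the receptive field along the ramp and raises the response, which is the mechanism that lets the sampling grid adapt to tables of different scales and layouts.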
We observe from the table that CDeC-Net outperforms state-of-the-art techniques on the ICDAR-2013, UNLV, Marmot, TableBank, and PubLayNet datasets.
Illustrates comparison between the proposed CDeC-Net and state-of-the-art techniques on the ICDAR-2013 dataset.
Illustrates comparison between the proposed CDeC-Net and state-of-the-art techniques on the ICDAR-2019 dataset.
Shows examples where CDeC-Net fails to accurately detect the tables.
Our single model, CDeC-Net‡, fails to predict bounding boxes corresponding to tables present in the documents.