Remote (or mobile) enrollment for services such as banking is becoming increasingly common. During these processes, users are typically asked to photograph an identity document as proof of identity. For this to work securely, the system must automatically check basic verification properties of the document and perform text recognition. Moreover, capture conditions are often difficult: contrasting backgrounds, varied lighting, angles, and perspectives. In this work, we propose a machine-learning-based pipeline for processing document images under such conditions. The pipeline combines several analysis modules and visual cues to verify the type and authenticity of the documents. We evaluate our approach on identity documents of the Republic of Colombia.
Interest in remote enrollment and onboarding processes is growing as a result of the widespread availability of mobile devices and internet connectivity. As part of identity verification, such services often require images of identity documents (IDs). Because organizations must confirm their users' identities, remote ID verification systems need security measures to thwart identity theft. In practice, several issues must be resolved before large-scale onboarding systems can recognize a document as authentic. First, the system must localize the document and extract the relevant data from images captured by users in unpredictable settings, with varying backgrounds, angles, and mobile-camera capabilities.
Identity document verification seeks to establish whether an input image corresponds to an authentic document class and whether the document is valid. Before verification, the document must be localized in the input image and processed, so that the authenticity-checking system receives a standardized input. To locate the document, most earlier studies rely on word recognition or image processing.
The proposed document-analysis pipeline consists of two modules (see Fig. 1). The first module handles the pre-processing required for in-the-wild smartphone document capture. The second module extracts local and global descriptors from the input image in order to perform (a) matching against the predicted identity document class and (b) a basic assessment of the document's authenticity.
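The two-module structure can be sketched as a minimal skeleton. The names below (`acquire_document`, `verify_document`, `VerificationResult`, the class label `"co_id_card"`) are illustrative assumptions, not from the paper, and the stub bodies stand in for the components described in the following sections:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class VerificationResult:
    predicted_class: str
    authentic: bool


def acquire_document(image: np.ndarray) -> np.ndarray:
    """Module 1: locate the document, remove the background, and warp it
    to a frontal view (placeholder: returns the image unchanged)."""
    return image


def verify_document(document: np.ndarray) -> VerificationResult:
    """Module 2: extract global/local descriptors and match them against
    the expected document class (placeholder result)."""
    return VerificationResult(predicted_class="co_id_card", authentic=True)


def pipeline(image: np.ndarray) -> VerificationResult:
    # Module 2 always receives the standardized output of Module 1.
    return verify_document(acquire_document(image))


result = pipeline(np.zeros((8, 8, 3), dtype=np.uint8))
```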
Module 1: Document Acquisition
Deep Learning Background-Removal Model: To locate the document in the image, we use semantic segmentation, which assigns each pixel to one of two classes: identity document or background.
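As a toy illustration (not the paper's network), the per-pixel probabilities produced by such a model can be thresholded into the two classes; the probability map below is invented:

```python
import numpy as np


def probabilities_to_mask(probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn per-pixel document probabilities (H x W, values in [0, 1])
    into a binary mask: 1 = identity document, 0 = background."""
    return (probs >= threshold).astype(np.uint8)


# Toy 4x4 probability map where the centre is "document".
probs = np.array([
    [0.1, 0.2, 0.2, 0.1],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.8, 0.9, 0.2],
    [0.1, 0.1, 0.2, 0.1],
])
mask = probabilities_to_mask(probs)
```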
First, we extract the contour of the document's border. A linear regression is then fitted to each side of the contour, and the intersections of the four resulting lines are identified as the document's corners. From the chosen corners, a geometric transformation matrix is estimated, and a warp-perspective operation applies this matrix to convert the original image into a well-oriented document.
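The geometric step can be sketched with numpy alone. The paper applies a warp-perspective tool (such as OpenCV's `cv2.warpPerspective`) to the whole image; the sketch below only shows how the 3x3 transformation matrix can be estimated from the four corners, and checks that it maps them onto an axis-aligned rectangle. The corner coordinates are made up for illustration:

```python
import numpy as np


def perspective_matrix(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Solve for the 3x3 homography H mapping four source corners to
    four destination corners (the same role as OpenCV's
    cv2.getPerspectiveTransform). Each correspondence (x, y) -> (u, v)
    contributes two linear equations in the eight unknowns of H."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)  # fix H[2, 2] = 1


def apply_homography(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply H to 2-D points using homogeneous coordinates."""
    ones = np.ones((len(pts), 1))
    mapped = (H @ np.hstack([pts, ones]).T).T
    return mapped[:, :2] / mapped[:, 2:3]


# Tilted document corners in the photo -> axis-aligned 400x250 card.
src = np.array([[20, 30], [380, 60], [360, 280], [40, 260]], float)
dst = np.array([[0, 0], [400, 0], [400, 250], [0, 250]], float)
H = perspective_matrix(src, dst)
```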
Module 2: Document Verification
The document verification process classifies a set of features that best characterize the visual and structural information of the input image. These features should distinguish the original document class from others and support basic checks of the document's authenticity. Features that describe the entire image are referred to as global features.
The first global feature compares the grayscale histogram of the input image against that of an authentic document image, which serves as the ground truth. To handle variability in lighting conditions, the histograms are normalized using min-max feature scaling. Histogram similarity is measured with the Wasserstein distance (WD), a metric based on the theory of optimal transport between distributions.
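A minimal sketch of this feature, assuming the min-max-scaled histograms are renormalized to sum to one so that the closed-form 1-D Wasserstein distance (the L1 distance between cumulative distributions) applies; the 4-bin counts are invented for illustration:

```python
import numpy as np


def min_max_scale(h: np.ndarray) -> np.ndarray:
    """Min-max feature scaling to reduce sensitivity to lighting."""
    return (h - h.min()) / (h.max() - h.min())


def wasserstein_1d(h1: np.ndarray, h2: np.ndarray) -> float:
    """1-D Wasserstein distance between two histograms over the same
    bins: renormalize to distributions, then take the L1 distance
    between their cumulative sums."""
    p1 = h1 / h1.sum()
    p2 = h2 / h2.sum()
    return float(np.abs(np.cumsum(p1) - np.cumsum(p2)).sum())


# Toy grayscale histograms: input image vs. ground-truth document.
input_hist = min_max_scale(np.array([10.0, 40.0, 30.0, 20.0]))
truth_hist = min_max_scale(np.array([12.0, 38.0, 32.0, 18.0]))
distance = wasserstein_1d(input_hist, truth_hist)
identical = wasserstein_1d(truth_hist, truth_hist)
```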
The evaluation dataset consists of 101 Colombian identity documents collected with the subjects' voluntary cooperation. The participants photographed the documents with their own smartphones, without any constraints.
Acquisition of Documents
Background Removal: We trained a deep neural network using synthetic data added to the dataset described in Sect. 4.1.
A total of 2254 negative instances, consisting of random images and 0-filled masks, were included in the dataset for training, along with empty backgrounds containing no ID documents.
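A sketch of how such training pairs might be synthesized; the helper names and sizes are illustrative assumptions, and the actual synthetic-data generation is likely more elaborate (perspective distortion, blending, etc.):

```python
import numpy as np

rng = np.random.default_rng(0)


def negative_example(size):
    """A random background with no ID document and its 0-filled mask."""
    image = rng.integers(0, 256, size=(*size, 3), dtype=np.uint8)
    mask = np.zeros(size, dtype=np.uint8)
    return image, mask


def positive_example(background, document, top, left):
    """Paste a document crop onto a background and build the matching
    binary mask (1 where the document was pasted)."""
    image = background.copy()
    mask = np.zeros(background.shape[:2], dtype=np.uint8)
    h, w = document.shape[:2]
    image[top:top + h, left:left + w] = document
    mask[top:top + h, left:left + w] = 1
    return image, mask


neg_img, neg_mask = negative_example((64, 64))
doc = np.full((16, 24, 3), 200, dtype=np.uint8)  # flat "document" crop
img, mask = positive_example(neg_img, doc, top=10, left=20)
```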
Several factors were varied during training to obtain the best results. The input was tested with both grayscale and color images, as well as with a smaller, more balanced dataset of only 4766 images. Binary cross-entropy (BCE), the standard loss for binary classification problems, was used as the loss function, although a Jaccard-based loss was also evaluated.
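The two losses can be written compactly for per-pixel probabilities; this is a generic numpy sketch, not the paper's implementation (which presumably operates on framework tensors), and the toy masks below are invented:

```python
import numpy as np


def bce_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross-entropy over per-pixel document probabilities."""
    p = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    return float(-(target * np.log(p) + (1 - target) * np.log(1 - p)).mean())


def jaccard_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """1 - soft IoU between predicted probabilities and the binary mask."""
    inter = (pred * target).sum()
    union = pred.sum() + target.sum() - inter
    return float(1.0 - (inter + eps) / (union + eps))


# Toy 2x2 masks: `good` agrees with the target, `bad` is inverted.
target = np.array([[1.0, 1.0], [0.0, 0.0]])
good = np.array([[0.9, 0.8], [0.1, 0.2]])
bad = np.array([[0.1, 0.2], [0.9, 0.8]])
```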
We proposed a pipeline for identity document authentication analysis. A document verification solution with a deep-learning-based capture module, able to remove the background in challenging situations, was developed and tested. The case study's findings demonstrate the potential of these methods for seamless enrollment procedures. In future work, we plan to examine whether the proposed pipeline can be easily adapted to handle more document types and larger datasets.