Why Semi-Supervised ML algorithms work well to prevent identity theft: Lydia Miller


Identity fraud is a growing problem in a world where more and more services are going digital. Shopping, social interactions, medical and health services, even banking are all digital now!

According to Javelin's 2018 Identity Fraud Study, a record 16.7 million adults in the US experienced identity fraud. The study reports a surge in fraudulent transactions, data breaches, and identity theft.

This is where digital identity has a key role to play, as personal identity becomes tougher to verify in cyberspace. The main reasons are the lack of trusted standards, the absence of face-to-face interaction, and the high volume of identity fraud taking place in the digital sphere.

In a telephone interview, Lydia Miller, Senior Director and a leader of digital strategy and growth analytics at Tata Consultancy, North America, talked about why identity authentication is a high priority in establishing trust between two parties in a digital space, as trust is the foundation of a thriving online business. With over 17 years of experience devising strategies that help businesses worldwide, big and small, integrate their services on digital platforms, Lydia has been at the forefront of digital transformation in the business world.

Role of AI in preventing identity theft

Machine learning and deep learning, used properly, can authenticate identities accurately and at scale. Identity documents such as driver's licenses and passports are first scanned and stored in the system, often remotely through mobile devices. They then go through additional authentication tests to confirm they are genuine: checks of microprint text and security threads, validation of special paper and ink, comparison between OCR data, barcodes, and magnetic strips, data-validity tests, and biometrics or facial recognition to link the individual to the ID credential.
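As a rough illustration, the battery of checks described above can be sketched as a pipeline in which a document is accepted only if every test passes. All function names, field names, and sample values below are hypothetical, not part of any real product.

```python
# Minimal sketch of a multi-check ID authentication pipeline.
# Every check function and field name here is hypothetical.

def check_ocr_vs_barcode(doc):
    # The printed (OCR) data and the barcode payload should agree.
    return doc["ocr_text"] == doc["barcode_data"]

def check_expiry(doc):
    # A simple data-validity test: the document must not be expired.
    return doc["expiry_year"] >= 2024

def authenticate(doc):
    checks = [check_ocr_vs_barcode, check_expiry]
    # A document is accepted only if every check passes.
    return all(check(doc) for check in checks)

sample = {"ocr_text": "DOE JOHN 1990-01-01",
          "barcode_data": "DOE JOHN 1990-01-01",
          "expiry_year": 2027}
print(authenticate(sample))  # True when all checks pass
```

A real system would add the physical checks named above (microprint, security threads, paper and ink) as further entries in the `checks` list.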

Using machine learning for authentication makes the process quicker, more efficient, and more accurate than manual checks.

According to Miller, these ML solutions tend to have an automated data-storing system built in. This internal data-collection mechanism is anonymous and can store all the information required for authentication. Since the machine already has hundreds of authentic samples to compare documents against, it efficiently identifies flaws and can say whether or not a submitted document is authentic.

However, there is a catch: the system is only as good as its data.

“In the real world we have different forms of IDs: passports, resident cards, driving licenses, press cards. For a machine to understand these and the variations in them is a monumental task that requires a lot of historic data for comparison,” Miller added.

The business growth strategist and consultant, who has spent over 15 years helping companies big and small, including Fortune 500s, develop successful strategies for digital transformation, feels that digital ID is no longer a luxury but a necessity. It requires a lot of data, though, and that in itself is a challenge.

Why we need data in the development of an optimal AI solution

To effectively authenticate documents uploaded by a user, the AI system needs historic metadata to compare against during authentication. This data must be collected and fed into the system so the machine learning software can train itself to detect and analyze complex patterns and output a result. In other words, the metadata used in this process contains information from past documents already run through the system, and those details form the foundation for any analysis the software performs.
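One way to picture how historic metadata seeds the comparison: patterns observed on past genuine documents become the baseline against which a new document is checked. The fields and formats below are invented purely for illustration.

```python
# Toy illustration (all fields and formats hypothetical) of historic
# metadata forming the baseline for checking new documents.
import re

# Metadata extracted from documents already run through the system.
historic = [
    {"id_format": r"[A-Z]{2}\d{6}", "country": "US"},
    {"id_format": r"[A-Z]{2}\d{6}", "country": "US"},
]

# Build the baseline: ID-number formats seen on past genuine documents.
known_formats = {m["id_format"] for m in historic}

def plausible(id_number):
    # A new document is plausible if its ID number matches a
    # historically observed format.
    return any(re.fullmatch(fmt, id_number) for fmt in known_formats)

print(plausible("AB123456"))  # True: matches a known format
print(plausible("12345"))     # False: no historic format matches
```

The more historic documents feed the baseline, the more document types and variations the system can recognize, which is exactly why data volume matters here.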

Even then, we cannot guarantee that the outcome will be 100 percent accurate, as it may be difficult for the software to detect:

· wear and tear

· other physical damage

· manufacturing errors or defects

· minor design changes

This means that even a valid document may fail the test and come out as invalid, because the software could not recognize the slight distortions in the document.

Therefore, it is important to continuously update and optimize the software's library so that the outcome of the authentication process stays reliable. A robust document library against which to compare captured IDs is of utmost importance: it cuts down the time machines need to process data on their own and maximizes data-extraction and authentication capabilities.
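The false-reject problem above is essentially a matching-tolerance problem. As a toy illustration, exact matching rejects a document with a single worn character, while similarity matching against the reference library, with a threshold, does not. The reference string and the 0.9 threshold here are invented for the sketch.

```python
# Toy illustration (hypothetical reference and threshold) of exact
# matching vs. tolerant matching against a stored authentic sample.
from difflib import SequenceMatcher

REFERENCE = "REPUBLIC OF EXAMPLE - IDENTITY CARD"  # stored authentic sample

def matches(captured, reference, threshold=0.9):
    # A ratio of 1.0 means identical; wear and tear lowers it slightly.
    score = SequenceMatcher(None, captured, reference).ratio()
    return score >= threshold

worn = "REPUBL1C OF EXAMPLE - IDENTITY CARD"  # one damaged character
print(worn == REFERENCE)          # False: exact match rejects it
print(matches(worn, REFERENCE))   # True: tolerant match accepts it
```

Real systems compare image features rather than strings, but the trade-off is the same: the threshold must be loose enough to tolerate wear and tear yet tight enough to catch forgeries, which is why the reference library must keep growing.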

Why use a semi-supervised ML model over a fully automated one

Relying completely on an automated machine learning algorithm can have devastating results, as even valid documents with a little wear and tear may not be recognized as valid.

Semi-supervised machine learning enables us to make adjustments without interfering with the information that is required to authenticate the identification documents. To do this, a feedback loop can be created.

This loop lets you consistently add new data into the algorithm and test its outcomes to check that they are consistent and improving. The results are then fed back into the algorithm so that the software continues to learn and adjust.
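A common way to build such a loop, sketched below under assumed names and thresholds (none of which come from the interview), is to let the model pseudo-label the documents it is confident about and route uncertain cases to a human reviewer, whose decisions re-enter the training data.

```python
# Hypothetical sketch of a semi-supervised feedback loop: confident
# predictions become new training examples; uncertain cases are
# escalated to a human reviewer.

CONFIDENCE_THRESHOLD = 0.8  # assumed cut-off, not from the article

def model_score(doc):
    # Stand-in for a trained model; returns P(document is authentic).
    return doc["score"]

def feedback_loop(docs, training_set, review_queue):
    for doc in docs:
        score = model_score(doc)
        if score >= CONFIDENCE_THRESHOLD or score <= 1 - CONFIDENCE_THRESHOLD:
            # Confident either way: pseudo-label and add to training data.
            training_set.append((doc, score >= CONFIDENCE_THRESHOLD))
        else:
            # Uncertain: a human reviews it; their verdict is later
            # appended to training_set, closing the loop.
            review_queue.append(doc)
    return training_set, review_queue

training, queue = feedback_loop(
    [{"score": 0.95}, {"score": 0.5}, {"score": 0.1}], [], [])
print(len(training), len(queue))  # 2 1
```

Retraining on the growing `training_set` is what lets the software "continue to learn and adjust" between releases.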

A fully automated ML algorithm will not be as efficient as a semi-supervised one, as it has to rely solely on the historic data available to it. A semi-supervised model, by contrast, allows constant supervision, error identification in the system, and continuous updates that keep the software efficient.