THE IMPORTANCE OF HUMANS IN MACHINE-LEARNING-BASED FRAUD SYSTEMS

By Nitesh Kumar, Head of Data Science, Affirm

Fraud detection is one of the most challenging prediction problems for a machine learning system. At its heart, machine learning for fraud identification is a complex function that maps a set of attributes (IP address, address match, name match, typing speed, device, etc.) to a fraud outcome (such as identity theft). This function is learned from historical training examples of attribute-outcome pairs.

In order to train accurate fraud detection models, a large number of known fraud outcomes are required. In practice, no business wants to collect fraud outcomes by leaving its systems vulnerable to such attacks, which is where humans must step in. Human experts trained at detecting and confirming fraud provide the machine with valuable outcome labels. Unlike applications such as click prediction, where the outcome variable (the click, for example) can be recorded automatically, it often requires human intervention to confirm whether a transaction or an application was truly fraudulent. Beyond this fundamental requirement, there are other factors that make it imperative for humans to be closely involved in machine-learning-based fraud systems.

Machines can only predict future behavior that's representative of the past

The effectiveness of a machine learning system depends on how well the model generalizes, that is, how it performs on previously unseen instances or inputs. This generalization depends on how well the training sample represents the unseen instances the model acts on. Unfortunately, the fraud game is inherently adversarial, so the problem isn't stationary. As the system gets better at stopping old fraud strategies by learning from historical examples, fraudsters develop novel attack vectors to beat the system. This severely limits the generalizability and shelf life of a fraud detection model, so people are required to constantly monitor its actions and performance.

Humans are aware of context and capable of logical reasoning

Machines cannot incorporate new information well; they look only at what they are trained to look at. For example, if there is a security breach at a large phone service or email provider, a human can take that knowledge into account while reviewing cases. The machine, however, cannot spontaneously update itself to react appropriately to the new information. Oftentimes, such new information results from a feature that didn't even exist when the model was trained, such as a vulnerable update to existing software. In practice, when this happens, humans design and deploy stop-gap rules to protect the newly discovered blind spot. These rules provide relief to the fraud detection system while the model is updated to incorporate the changed situation.

Humans are also capable of trying out complex approaches as they review cases. It is common for reviewers to contact the applicant to confirm whether fraud occurred. During the course of the conversation, the human expert might choose to do a variety of things. The human could (a) ask the applicant to answer some questions associated with their past, (b) ask for their social security number, or (c) ask the user to complete a set of tasks sent through email in order to
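To make the earlier point about learning a function from attribute-outcome pairs concrete, here is a minimal sketch, not Affirm's actual system: a small classifier trained on hypothetical features, with labels assumed to come from human-confirmed fraud reviews.

```python
# Minimal sketch: learning a fraud classifier from labeled
# attribute-outcome pairs. All feature names and values are
# hypothetical illustrations, not real fraud signals.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Each row: [address_match, name_match, typing_speed_zscore, new_device]
# Each label: 1 if human review confirmed fraud, 0 otherwise.
X = np.array([
    [1, 1, 0.2, 0],
    [0, 1, 1.8, 1],
    [1, 0, 2.5, 1],
    [1, 1, -0.3, 0],
    [0, 0, 3.1, 1],
    [1, 1, 0.0, 0],
    [0, 1, 2.2, 1],
    [1, 1, 0.5, 0],
])
y = np.array([0, 1, 1, 0, 1, 0, 1, 0])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)

# The output is a fraud probability, not a final decision;
# borderline scores would typically be routed to human review.
print(model.predict_proba(X_test)[:, 1])
```

In practice such a model would be trained on far more examples and features, but the structure is the same: attributes in, a learned fraud score out, with human-confirmed outcomes supplying the labels.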
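The monitoring the article calls for can be as simple as tracking the model's precision on human-confirmed outcomes and alerting when it drifts. A hedged sketch, with hypothetical numbers and thresholds:

```python
# Sketch of drift monitoring: compare the model's recent precision
# on human-confirmed outcomes against its historical baseline.
# Baseline, tolerance, and counts are hypothetical.

def weekly_precision(flagged, confirmed_fraud):
    """Share of model-flagged cases that reviewers confirmed as fraud."""
    return confirmed_fraud / flagged if flagged else 0.0

BASELINE_PRECISION = 0.80   # measured when the model shipped
ALERT_DROP = 0.15           # tolerated degradation before escalating

recent = weekly_precision(flagged=200, confirmed_fraud=120)  # 0.60
if recent < BASELINE_PRECISION - ALERT_DROP:
    print(f"Alert: precision fell to {recent:.2f}; "
          "fraud tactics may have shifted -- escalate to analysts.")
```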
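The stop-gap rules described above can be pictured as a thin, human-authored layer sitting in front of the model's score. A minimal sketch, assuming a hypothetical breach at an email provider; the domain list and score bands are illustrations, not a real policy:

```python
# Sketch of a stop-gap rule layer protecting a model blind spot.
# Domain list and thresholds are hypothetical.
BREACHED_EMAIL_DOMAINS = {"example-breached-mail.com"}

def decide(application, model_score):
    """Combine human-authored rules with the model's fraud score."""
    # Rule deployed after a breach the model has never seen:
    # route affected applications to manual review regardless of score.
    domain = application["email"].split("@")[-1].lower()
    if domain in BREACHED_EMAIL_DOMAINS:
        return "manual_review"

    # Otherwise fall back to the model with simple score bands.
    if model_score > 0.9:
        return "decline"
    if model_score > 0.5:
        return "manual_review"
    return "approve"

# Example: the stop-gap rule fires even though the model score is low.
print(decide({"email": "user@example-breached-mail.com"}, 0.12))
```

Because the rule sits outside the model, it can be deployed within hours of new information arriving and retired once the retrained model covers the blind spot.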