Data Mining & Analysis vs. Predictive Modeling

Complementary but Different Facets of Risk Management
Fraud prevention and risk management professionals are familiar with both data mining/analysis and predictive modeling as tools of the trade, and the two are often discussed together. In practice, although data analysis is often part of predictive modeling, the two represent different facets of risk management.

Data Mining and Analysis
Data mining and analysis has been used as a fraud detection technique for decades. The general concept is that historical data is gathered and analyzed in order to understand it better. Several methodologies have proven effective, especially in the field of forensic accounting. Some of the more common techniques include filters, expressions/equations, gap detection, statistical analysis, duplicate detection, sorting/indexing, summarization, stratification, cross tabulation/pivot tables, aging, joining/relating, trend analysis, regression analysis, parallel simulation, Benford’s Law, digital analysis, and sampling, used alone or in combination.
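As an illustration of one technique from the list, here is a minimal sketch of a Benford's Law first-digit test. Benford's Law says that in many naturally occurring sets of numbers, the leading digit d appears with probability log10(1 + 1/d). The function names and the idea of passing in a plain list of amounts are assumptions made for this example, not part of any particular tool.

```python
import math
from collections import Counter


def benford_expected(digit):
    """Expected proportion of values whose first digit is `digit` (1-9)."""
    return math.log10(1 + 1 / digit)


def first_digit(amount):
    """Return the leading non-zero digit of a non-zero number."""
    # Scientific notation puts the leading digit first, e.g. '4.5000000000e+02'.
    return int(f"{abs(amount):.10e}"[0])


def benford_deviation(amounts):
    """Map each digit 1-9 to (observed proportion, expected proportion)."""
    leading = [first_digit(a) for a in amounts if a != 0]  # zero has no leading digit
    if not leading:
        raise ValueError("need at least one non-zero amount")
    counts = Counter(leading)
    total = len(leading)
    return {d: (counts.get(d, 0) / total, benford_expected(d)) for d in range(1, 10)}
```

Digits whose observed proportion sits far from the expected one are candidates for a closer look, not proof of fraud. The test is generally only meaningful on reasonably large data sets that span several orders of magnitude.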

The general idea is that performing one or more of these techniques surfaces suspicious or outlier data that may indicate fraudulent or abusive behavior. Alternatively, data that is already known to be fraudulent may be analyzed in an attempt to find similarities or profiles that seem to be tied to fraud. Numerous software applications have been created to assist with data analysis work. CaseWare International’s IDEA and ACL Services’ ACL Desktop Edition are two examples. Many data analysis efforts fall back on the “poor man’s data analysis tool,” otherwise known as Microsoft Excel. Those who have used Excel extensively will testify that it can be a very effective data analysis tool. It just might take a bit more manual intervention than the “prepared” data analysis programs mentioned earlier.
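Two of the simpler techniques named earlier, duplicate detection and gap detection, can be sketched in a few lines. The record layout (dictionaries with hypothetical fields such as vendor, invoice, and amount) is an assumption made purely for illustration.

```python
from collections import Counter


def find_duplicates(records, key_fields):
    """Return the key tuples that appear in more than one record."""
    keys = Counter(tuple(r[f] for f in key_fields) for r in records)
    return sorted(k for k, n in keys.items() if n > 1)


def find_gaps(sequence_numbers):
    """Return the integers missing from an otherwise consecutive sequence."""
    present = set(sequence_numbers)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]
```

For example, calling find_duplicates on a list of invoices with key fields vendor and amount returns every vendor/amount pair billed more than once, and find_gaps([1001, 1002, 1004, 1006]) returns [1003, 1005]. Either result is only a starting point for investigation, since repeated amounts and missing numbers often have innocent explanations.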

Predictive Modeling
Over the years, many modeling methodologies have been developed in the finance sector in an effort to put data mining to use by predicting and preventing the occurrence of fraud. Industry-specific SaaS and enterprise offerings have been created to address particular pain points. In the Card Not Present realm, CyberSource’s Decision Manager, 41st Parameter’s FraudNet, and Accertify’s Interceptas are a few among many offerings focused on fraud prevention.

Reactive vs. Proactive
The conceptual differences between data mining and analysis and predictive modeling can be illustrated as follows:

Proactive              Reactive
Predictive Modeling    Data Mining & Analysis
Fraud Prevention       Fraud Detection

If my main concern is the detection of existing or historical fraud, I will be most interested in data mining and analysis. On the other hand, if I am focused on preventing fraud before (or while) it occurs, I will spend most of my effort on predictive modeling. And of course, if my risk management agenda encompasses both concerns, I will create policies that include both reactive and proactive activities.

Although predictive modeling is a separate activity, it is not divorced from the data. In fact, for predictive models to be effective, they must be based on assumptions drawn from analyzing the available data. The difference is that on one hand you are mining and analyzing data, and on the other hand you are creating a model or function that determines what happens when particular types of data are encountered. Think of predictive modeling as the “next step” after data mining and analysis has occurred. Now that you know what the fraud looks like, you are trying to predict and prevent it from happening again.
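The hand-off from analysis to modeling can be sketched as two small functions: one that summarizes historical data, and one that turns the summary into a rule applied to new transactions. This is a deliberately simplified illustration under assumed field names (an is_fraud flag and a single categorical feature). It is not how any of the commercial products mentioned above work.

```python
from collections import defaultdict


def learn_fraud_rates(history, feature):
    """Analysis step: historical fraud rate for each value of one feature."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for record in history:
        value = record[feature]
        totals[value] += 1
        if record["is_fraud"]:
            frauds[value] += 1
    return {value: frauds[value] / totals[value] for value in totals}


def make_model(rates, threshold, default_rate=0.0):
    """Modeling step: build a function that flags new feature values for review."""
    def should_review(feature_value):
        # Values never seen in the historical data fall back to default_rate.
        return rates.get(feature_value, default_rate) >= threshold
    return should_review
```

Here learn_fraud_rates is the reactive part (understanding what past fraud looked like), and the function returned by make_model is the proactive part (deciding, as each new transaction arrives, whether it resembles that past fraud). A real model would combine many features and be validated against data it was not built from.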