Data Mining & Analysis vs. Predictive Modeling

Complementary but Different Facets of Risk Management
Fraud prevention and risk management professionals are familiar with both data mining/analysis and predictive modeling as tools of the trade, and the two are often discussed together. In practice, although data analysis is often part of predictive modeling, the two represent different facets of risk management.

Data Mining and Analysis
Data mining and analysis has been used as a fraud detection technique for decades. The general concept is that historical data is gathered and analyzed in an effort to understand it better. Several methodologies have proven effective in data analysis, especially in the field of forensic accounting. Some of the more common techniques include the use of filters, expressions/equations, gap detection, statistical analysis, duplicate detection, sorting/indexing, summarization, stratification, cross tabulation/pivot tables, aging, joining/relating, trend analysis, regression analysis, parallel simulation, Benford’s Law, digital analysis, sampling, or a combination of these techniques.

The general idea is that applying one or more of these techniques surfaces suspicious or outlier data that may indicate fraudulent or abusive behavior. Alternatively, data that is already known to be fraudulent may be analyzed in an attempt to determine similarities or profiles that seem to be tied to fraud. Numerous software applications have been created to assist with the data analysis work. CaseWare International’s IDEA and ACL Services’ ACL Desktop Edition are two examples. Many data analysis efforts fall back on the “poor man’s data analysis tool”, otherwise known as Microsoft Excel. Those who have used Excel extensively will testify that it can be a very effective data analysis tool. It just might take a bit more manual intervention than some of the “prepared” data analysis programs mentioned earlier.
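To make a few of these techniques concrete, here is a minimal Python sketch of duplicate detection, gap detection, and a Benford's Law first-digit comparison. The invoice records, field names, and function names are all made up for illustration; this is not how IDEA, ACL, or any other product implements them.

```python
import math
from collections import Counter

def find_duplicates(records, key):
    """Duplicate detection: return key values that occur more than once."""
    counts = Counter(r[key] for r in records)
    return sorted(k for k, n in counts.items() if n > 1)

def find_gaps(numbers):
    """Gap detection: return integers missing from a numbered sequence."""
    present = set(numbers)
    return [n for n in range(min(present), max(present) + 1) if n not in present]

def first_digit_profile(amounts):
    """Benford's Law: map each leading digit to (observed share, expected share).

    The expected share of leading digit d is log10(1 + 1/d).
    """
    # Scientific notation puts the first significant digit at position 0.
    digits = [int(f"{abs(a):e}"[0]) for a in amounts if a != 0]
    if not digits:
        raise ValueError("no non-zero amounts to analyze")
    counts = Counter(digits)
    return {
        d: (counts[d] / len(digits), math.log10(1 + 1 / d))
        for d in range(1, 10)
    }

# Hypothetical sample data, far too small for a meaningful Benford test.
invoices = [
    {"invoice_no": 1001, "amount": 120.50},
    {"invoice_no": 1002, "amount": 980.00},
    {"invoice_no": 1002, "amount": 980.00},
    {"invoice_no": 1005, "amount": 19.99},
]

duplicates = find_duplicates(invoices, "invoice_no")   # [1002]
gaps = find_gaps([r["invoice_no"] for r in invoices])  # [1003, 1004]
profile = first_digit_profile([r["amount"] for r in invoices])
```

None of these results proves fraud on its own. A duplicate invoice number, a missing number in a sequence, or a leading-digit distribution far from the expected one is simply a reason to look more closely.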

Predictive Modeling
Over the years, many modeling methodologies have been developed in the finance sector in an effort to put data mining to use by predicting and preventing the occurrence of fraud. Industry-specific SaaS and enterprise offerings have been created to address particular pain points. In the Card Not Present realm, CyberSource’s Decision Manager, 41st Parameter’s FraudNet, and Accertify’s Interceptas are a few among many offerings focused on fraud prevention.

Reactive vs. Proactive
The conceptual differences between data mining and analysis vs. predictive modeling could be illustrated as follows:

Proactive            | Reactive
Predictive Modeling  | Data Mining & Analysis
Fraud Prevention     | Fraud Detection

If my main concern is the detection of existing or historical fraud, I will be most interested in data mining and analysis. On the other hand, if I am focused on preventing fraud before (or while) it occurs, I will spend most of my efforts on predictive modeling. And of course, if my risk management agenda encompasses both concerns, then I will create policies that include both reactive and proactive activities.

Although predictive modeling is a separate activity, it is not divorced from the data. In fact, for predictive models to be effective they must be based on assumptions made from analyzing available data. The difference is that on one hand you are mining and analyzing data and on the other hand you are creating a model or function that determines what occurs if particular types of data are encountered. Think of predictive modeling as the “next step” after data mining and analysis has occurred. Now that you know what the fraud looks like, you are trying to predict and prevent it from happening again.
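The hand-off from analysis to modeling can be sketched in Python. The example below is a deliberately naive, frequency-based score, assuming a hypothetical set of labeled historical transactions; the field names, the sample data, and the 0.5 review threshold are invented for illustration, and commercial products use far more sophisticated methods.

```python
from collections import defaultdict

def learn_fraud_rates(history, fields):
    """Analysis step: observed fraud rate for each (field, value) pair."""
    seen = defaultdict(lambda: [0, 0])  # (field, value) -> [fraud count, total]
    for txn in history:
        for field in fields:
            stats = seen[(field, txn[field])]
            stats[0] += txn["is_fraud"]  # True counts as 1, False as 0
            stats[1] += 1
    return {key: fraud / total for key, (fraud, total) in seen.items()}

def score(txn, rates, fields, default=0.0):
    """Modeling step: average the learned rates for a new transaction's values."""
    return sum(rates.get((f, txn[f]), default) for f in fields) / len(fields)

FIELDS = ("country", "shipping")
REVIEW_THRESHOLD = 0.5  # hypothetical cut-off for manual review

# Hypothetical labeled history produced by earlier detection work.
history = [
    {"country": "US", "shipping": "standard", "is_fraud": False},
    {"country": "US", "shipping": "overnight", "is_fraud": False},
    {"country": "XX", "shipping": "overnight", "is_fraud": True},
    {"country": "XX", "shipping": "overnight", "is_fraud": True},
]

rates = learn_fraud_rates(history, FIELDS)
risky = score({"country": "XX", "shipping": "overnight"}, rates, FIELDS)  # about 0.83
safe = score({"country": "US", "shipping": "standard"}, rates, FIELDS)    # 0.0
needs_review = risky >= REVIEW_THRESHOLD
```

The first function is the reactive half: it only describes what has already happened. The second is the proactive half: it applies that description to a transaction that has not yet been accepted.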

Is CDI Finding its Way Into the Norm?

How many people remember the Big Brother scare surrounding the Processor Serial Number (PSN) embedded in Pentium III (and some Pentium II) processors back in 1999-2000? Although some in the technical community argued that the PSN was not a solid identifier, as it could be easily masked (or, conversely, “forced” to reveal itself), Intel created quite the scare among large groups of people. Eventually, in April of 2000, the company announced that it would not include the PSN in the forthcoming 1.5GHz Willamette chip. An anonymous Intel engineer was quoted telling Wired magazine, “The gains that it could give us for the proposed line of security features were not sufficient to overcome the bad rep it would give us.”

Jumping ahead nine years, in mid-September of 2009 I noticed an announcement by ThreatMetrix touting the opposite reaction to the idea of tracking a device. Evidently, a study done by the Ponemon Institute found positive consumer reaction to the concept of CDI (Client Device Identification, sometimes called device ID or device fingerprinting) as part of a fraud prevention/consumer protection strategy. The article states that a significant percentage of surveyed individuals are more amenable to having their computer profiled or identified than to having to remember a password or submit to other typical security measures.

If the attitude expressed by the respondents in the Ponemon study is representative at all of the populace as a whole, could it mean the idea of device identification is no longer a scare to consumers?

The key may rest on whether Personally Identifiable Information (PII) is associated with the device IDs being created. The Ponemon study reveals that consumers are comfortable with a device ID concept as long as personal information is not tied to it.

This is pretty much what today’s device identification vendors are marketing. The technology is intended to create a unique identifier for a device without the need to collect any PII. Some of the device ID elements may be used to tell the technology vendors specific information that is critical for judging the threat level of a transaction (for example, IP geo-location information, time differentiation, browser language, etc.). This information can be scored in some way or forwarded directly to a client company to assist them with filtering suspicious transactions. Since the client company often has individual account information for its visitors, it may combine device ID information with its own customer data to provide an even deeper profile (for example, account-to-device relationships).
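As a rough illustration of the idea, the Python sketch below hashes a handful of non-PII attributes into an identifier and flags two crude mismatches against IP geo-location. The attribute names, the choice of SHA-256, and the mismatch rules are all my own assumptions; actual vendors collect different, proprietary signals and do not publish their methods.

```python
import hashlib

# Hypothetical attribute set; none of these fields identifies a person.
FINGERPRINT_FIELDS = ("user_agent", "screen", "timezone_offset", "language", "fonts")

def device_id(attrs):
    """Hash a fixed, ordered set of device attributes into a stable identifier."""
    canonical = "|".join(f"{f}={attrs.get(f, '')}" for f in FINGERPRINT_FIELDS)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def risk_signals(attrs, ip_country, ip_utc_offset):
    """Flag crude mismatches between the device and its IP geo-location."""
    signals = []
    # Device clock offset (minutes from UTC) differs from the IP's location.
    if attrs["timezone_offset"] != ip_utc_offset:
        signals.append("timezone_mismatch")
    # Browser locale such as "en-US" does not end with the IP's country code.
    if not attrs["language"].lower().endswith(ip_country.lower()):
        signals.append("language_mismatch")
    return signals

# Hypothetical device attributes as a vendor script might report them.
attrs = {
    "user_agent": "Mozilla/5.0 (example)",
    "screen": "1920x1080x24",
    "timezone_offset": -300,
    "language": "en-US",
    "fonts": "Arial,Verdana",
}

fingerprint = device_id(attrs)
consistent = risk_signals(attrs, ip_country="US", ip_utc_offset=-300)
suspicious = risk_signals(attrs, ip_country="RU", ip_utc_offset=180)

# Client-side only: the company may link the ID to its own account records.
account_devices = {"account-42": {fingerprint}}
```

The account-to-device mapping on the last line lives with the client company, not the vendor, which is where the deeper profile described above would be built.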

Critics of device ID complain that a unique fingerprint is not always attainable, and that savvy users can spoof, change, or substitute a device ID. In response to the first concern, how many fraud prevention technologies are 100% accurate? And wouldn’t the absence of a device ID be cause for concern by itself, depending on the application? As for the second concern, which fraud prevention technologies are immune to user tampering of any kind? Moreover, most CDI vendors can tell when a device ID has been tampered with in some way, so the confidence level is not degraded significantly (would a device ID that had been tampered with, or that came back differently than expected, not be cause for suspicion?).

As is frequently stated by fraud prevention professionals, “there is no silver bullet”. The same holds true for CDI. As always, the winning solution is the combination of various technologies in a layering effect. Despite the fact that CDI has inherent weaknesses, as do all of the prior fraud prevention technologies, it is providing tremendous benefit to many companies, ranging from credit and loan issuers to social networking sites to online retailers. This is especially true when layering it with other effective technologies.

As online business continues to expand, it is pleasing to see consumer fear of new technologies, including device fingerprinting, beginning to diminish. I believe that CDI, and other related technologies that tie into the actual devices being used, will become one of the most effective, powerful tools in preventing online fraud and abuse. As long as CDI is used responsibly, including maintaining concern for where and how PII elements fit into the picture, consumers and businesses alike will see significant benefits from this technology.

Could Shared Monitoring be the New Compliance?

Recently, as detailed by Anthony Freed of Information-Security-Resources.com, Larry Clinton of the Internet Security Alliance presented information to Congress regarding security and protecting privacy in cyberspace.

First of all, it is encouraging to hear that these kinds of discussions are being presented in D.C. Thanks to Larry Clinton and his team for representing these very important issues!

I agree with the feel of Larry’s suggestions — that it is not necessarily “compliance” that will resolve our concerns, and that more practical means must be established. If this is so, I would recommend ongoing monitoring as the key. And if monitoring is the key, how does this affect businesses, individuals, and personal privacy? And what role does government play, if any? Can we balance good monitoring and security with privacy?

My laptop is monitored constantly by security software. In return for the service, I voluntarily give up some information. However, this information is about my system and not me personally (other than standard billing info, which is public anyway, minus the credit card data). Do you think a similar solution could be implemented business-wide, to help monitor and keep businesses free from harmful attacks?

Perhaps “compliance”, in such a model, would be gained by agreeing to opt in to the monitoring system. Going along with one of Larry’s future objectives – information sharing – threats exposed in such a system could become immediately beneficial to other businesses that are hooked in.

Some companies are already attempting this strategy. The general concept is to create a sort of “reputation” around the data elements of the transaction. The more unique the data elements and the more clients use (and contribute to) the reputation, the more valuable the reputation becomes. Reputation can be tied to elements such as an IP address (as with MaxMind), a client device ID (CDI, as with 41st Parameter, Kount, or iovation), a credit card number (as with Visa’s neural network), and so on.

Ostensibly, the most unique and valuable data element would be the client device ID. It provides a much more concrete identification mechanism than other, more dynamic and changeable elements such as email address, shipping/billing address, name, phone number, etc. Thus, gathering device IDs, and especially sharing them, would provide an excellent foundation for a monitoring system. Ideally, both government and private sectors would contribute to the system, which would provide real-time updates and warnings concerning devices that were previously known to be used in fraudulent activities.

But what of privacy concerns? An intrinsic benefit of CDI is that it does not hold Personally Identifiable Information (PII) within it. You’re just looking at the device – and ideally the reputation surrounding it – rather than the person or private information behind the device. The privacy concern becomes moot. Granted, any client looking at the transaction has private information on their end (a retailer looking at the invoice, for example), and they could easily connect the PII and CDI together for their own purposes, but the PII portion would not be shared within the overarching monitoring system.
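A shared reputation system of this kind might look something like the Python sketch below. The class, its methods, and the member and device names are hypothetical; I am only illustrating that the shared layer can work with device IDs alone while each member keeps its PII to itself.

```python
from collections import defaultdict

class SharedReputation:
    """Hypothetical shared registry keyed by device ID; it stores no PII."""

    def __init__(self):
        # device_id -> set of member names that reported fraud from it
        self._reports = defaultdict(set)

    def report_fraud(self, member, device_id):
        """A participating member flags a device it saw used in fraud."""
        self._reports[device_id].add(member)

    def reputation(self, device_id):
        """Number of distinct members that have reported this device."""
        return len(self._reports.get(device_id, ()))

registry = SharedReputation()
registry.report_fraud("retailer_a", "device-123")
registry.report_fraud("lender_b", "device-123")
registry.report_fraud("lender_b", "device-123")  # repeat report, counted once

# Private to one retailer and never sent to the registry.
retailer_a_customers = {"device-123": "invoice 5567, J. Doe"}

known_bad = registry.reputation("device-123")  # 2
unknown = registry.reputation("device-999")    # 0
```

Counting distinct reporting members, rather than raw reports, is one simple way to keep a single participant from dominating a device's reputation.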

Coming full circle to the role of government: were it to adopt such a monitoring system and require businesses to take part in it as a new kind of security “compliance”, we might see a positive shift away from the bookshelf-breaking paper-based compliance of the past.