Friday, April 12, 2013

Louhia's Brief Introduction to Data Mining

Today I attended a data mining event held at the local Protomo. The guys from Louhia Consulting Oy gave us a brief introduction to the topic. In this post I will go through the main observations I made. Overall the presentation was well prepared and gave me some new insight into the topic.

I've discussed this topic before when I showed you how to use Node.js to scrape the web. Interestingly, Louhia apparently doesn't do web analytics or scraping at the moment. Rather, they focus on deriving results from client, and sometimes public, data.

Louhia is still a small and fresh (established 2009) company that aims to grow into a major player in the Finnish market. They are tool agnostic and seem to use whatever suits their purposes. Of the major tools they use, they mentioned at least SPSS, Rapid-i, SAS, Cognos, R, QlikView, Pentaho and a few others. I have covered R earlier on this blog. Suffice it to say it is a nice little language for statistical analysis.

When to Mine Data?

From a business point of view, it is not feasible to apply data mining techniques if you do not have data in the first place. Data mining is something well-established, mature companies may want to use to optimize their operations. They often gather data already. Data mining provides the means to extract value out of it.

The Louhia guys made it clear that for data mining to be useful, certain criteria must be met. These criteria are the following:

  1. Data exists (you need something to mine)
  2. Suitable mathematical models and algorithms exist. These may vary based on the domain, and in some cases you need to work hard to develop suitable ones
  3. There is a reasonable way to visualize the mined data. This is something that helps decision makers (such as you when you decide whether or not to go out)
  4. The insight gained leads to decisions

They put particular emphasis on point 4. If there is no will to use the insight to support decisions, there is not much point in data mining business-wise. This means there has to be a certain level of commitment on the client side, as this affects the value proposition in commercial terms.

Case - Insurance Sales

They showed how an insurance company can benefit from data mining. Insurance companies gather a lot of data by default, so the first requirement is definitely fulfilled. There are also models and algorithms suitable for the domain, and visualizing the data can produce material that is useful for decisions. In this case the decision had to do with marketing. The insurance company wanted to know how it should spend its money on marketing a certain additional insurance.

By default it would have cost around 500 000 euros to launch the marketing campaign. This effort would have yielded a million euros in profit and a conversion rate of 3% (converted as in "bought insurance"). So in goes a euro and out come two. Not that great a ratio. By using data mining techniques they managed to establish 11 variables affecting the way people buy insurance (down from 86 originally). As a result they were able to estimate the probability that a given person would buy the insurance.
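
To make the idea of a purchase probability concrete, here is a minimal sketch in Python. The presentation did not say which model Louhia used. I'm assuming a logistic regression purely for illustration, and the variable names, weights and the example customer below are all made up.

```python
import math

# Hypothetical weights for a few of the variables. The real model had 11
# variables, and neither they nor their weights were given in the talk.
WEIGHTS = {"age": 0.02, "has_home_insurance": 1.0}
BIAS = -1.0


def purchase_probability(features, weights=WEIGHTS, bias=BIAS):
    """Return a probability between 0 and 1 using the logistic function."""
    # Weighted sum of the customer's features. Missing features count as 0.
    z = bias + sum(w * features.get(name, 0.0) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


# An invented customer: 50 years old with an existing home insurance policy.
customer = {"age": 50, "has_home_insurance": 1}
print(round(purchase_probability(customer), 2))
```

In practice the weights would be fitted from historical sales data rather than picked by hand.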

This information led to significant improvements in effectiveness. They decided to target people for whom the probability of purchase was at least 50%. In practice this led to a tenfold increase in conversion rate (from 3% to 30%) and consequently decreased marketing costs to 50 000 euros. Even with consulting costs (let's say 20 000 euros) this is a substantial improvement over the original.
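
The arithmetic can be sketched as follows. Note the assumption: the talk only gave the costs and the conversion rates, so I assume the targeted campaign still brings in the same million euros (a tenth of the contacts at ten times the conversion rate gives the same number of buyers). The 20 000 euro consulting fee is my guess from above, not a quoted figure.

```python
def return_per_euro(total_return, marketing_cost, extra_cost=0.0):
    """Euros coming out for every euro going in."""
    return total_return / (marketing_cost + extra_cost)


TOTAL_RETURN = 1_000_000  # assumed to be the same for both campaigns

# Original campaign: everyone is contacted, 3% conversion.
baseline = return_per_euro(TOTAL_RETURN, 500_000)

# Targeted campaign: only likely buyers are contacted, 30% conversion.
targeted = return_per_euro(TOTAL_RETURN, 50_000)

# Targeted campaign with the guessed consulting fee included.
targeted_with_fee = return_per_euro(TOTAL_RETURN, 50_000, extra_cost=20_000)

print(baseline)                     # 2.0
print(targeted)                     # 20.0
print(round(targeted_with_fee, 2))  # 14.29
```

Under these assumptions each euro spent returns roughly fourteen instead of two, even after paying the consultants.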

I think this is a very good example of how to apply the techniques. And the techniques have their uses beyond the domain of business. For instance, it was shown that they may be applied to improve the way the healthcare system selects people for cancer screening, improving the overall efficiency of the system. There are some ethical issues to consider of course, but in theory data mining can be a great tool for this sort of thing.

Conclusion

Data mining doesn't always yield results. Sometimes you can still end up with nothing. The data doesn't always hide patterns. You might just end up with random noise. Still, I'm convinced that the techniques will be used on a larger scale in the future. At least in Finland the market is still emerging. Pretty much only the larger companies have the interest and resources to leverage it to improve their business.

If there is one area that seems kind of hot to me at the moment, it's analytics. Apparently the guys get hired pretty fast, and some even end up in the gaming sector. You can probably infer why. It's not just about web analytics. There's a whole world of analytics out there.