Geek Out: Crowdsourcing Data Science With Kaggle

Remember those science fairs back in junior high school?

Depending on how geeky you were, projects ranged from growing mold on potatoes (not geeky at all) to determining the thermodynamic solubility of nanotubes that could be used to build a bridge to the moon (uber-geeky and an actual science fair winner).

Enter Kaggle, a startup that runs grown-up competitions for data scientists.

These brainiacs work to solve a company’s data questions in an attempt to win the company-sponsored financial reward or just for the pleasure of showing off.

The idea is that a little crowdsourcing competition will come up with the best predictive and analytic models for things such as traffic forecasting, chess ratings, and predictive medical diagnoses.

Founded in 2010, Kaggle boasts a community of “tens of thousands” of experts from over 100 countries and 200 universities in such quantitative fields as computer science, econometrics, statistics, math and physics, as well as in a variety of industries including insurance, finance, science and technology.

In addition to the competitions, these experts use Kaggle as a sort of “office water cooler” to collaborate and exchange ideas. There’s no membership fee to join Kaggle; the company makes its money by charging the sponsor of each competition to solve a data problem. Kaggle also offers competitions for academic institutions free of charge, which helps further broaden its community of experts.

Why would a company pay Kaggle to run a contest? For the same reason startups launch Kickstarter or other crowdsourcing campaigns: greater access to skilled and interested people without an associated growth in overhead. As ZDNet reports, the platform yields a multitude of approaches that can be winnowed down to the technique that ultimately proves “best of breed” and most effective.

Instead of capital, companies are looking for top experts to solve a problem. Kaggle not only provides those experts without high consulting fees, it also has them compete against one another to come up with the best, most elegant solution. The motivation isn’t just getting paid for providing a service, but proving your expertise against your leading peers.

Cade Metz in Wired quotes David Kirby, a University of California professor and Kaggle competition winner, as saying, “The prizes are relatively small. You’re doing it for the challenge. And the glory.” The format also fosters a rivalry that pushes every competitor, which is why Kaggle maintains a leaderboard for each competition that updates as new answers are submitted (proposed solutions can be revised continually right up until the contest deadline). Momchil Georgiev says that whenever someone takes a top spot on the leaderboard he’s thinking, “what do they know that I don’t? And I push harder.”

Competing for Top Ranking

As Thomas Goetz points out in The Atlantic, “That everyone-has-a-chance ethos means that any competitor, no matter how isolated they may be, can be judged by their talents against the top rank of the field…Kaggle ranking has become an essential metric in the world of data science. Employers like American Express and the New York Times have begun listing a Kaggle rank as an essential qualification.”

Indeed, Facebook uses Kaggle competitions as part of its recruiting strategy. Its first competition attracted 3,550 entries. The prize: an interview for a job as a data scientist at Facebook. Facebook not only gained insights into its social network data from multiple perspectives, it also had a recruiting tool to identify “best of the best” prospective candidates.

A World of Experts at Your Disposal

Even if you aren’t hiring data scientists, the pool of data scientists and breadth of expertise available through Kaggle is likely to far exceed any company’s internal resources, and at a fraction of the cost. Consider Allstate, one of the world’s largest insurance companies. For the price of a mere $10,000 prize, Kaggle competitors came up with a risk algorithm that predicts the probability of an insured having an accident, a 340 percent improvement in predictive accuracy over the company’s best internal algorithm. As Peter Diamandis points out in XPRIZE, the results were shocking: “Allstate’s actuarial department is amongst the best in the world, and unless you are familiar with the insurance marketplace it’s hard to understand how huge of an achievement this Kaggle competition represents.”

While companies stand to benefit from tapping into Kaggle competitions, Kaggle itself has had some difficulty further monetizing its platform. As Wired reports, even with $11 million in venture capital funding, its efforts to offer consulting services with end-to-end solutions that go beyond the scope of a single competition are floundering. The company has closed its energy consulting business and has cut staff. That may be in part because of the downturn in global oil prices and, with it, a decline in energy research and development. Or Kaggle could just be way ahead of its time. Only time, and the data, can tell.
