16. Kaggle: Getting Quant Brains to Play Data Games

Context

Four years ago, 28-year-old Australian Anthony Goldbloom won The Economist’s annual essay-writing competition. The prize was a 3-month internship, during which Anthony worked on data science and big data. Even more exciting, he got to interview many people from different companies for his market research. Before long he realized that many of these companies were in dire need of strong, efficient predictive modeling, and that the teams responsible for it were not equipped to handle the demand. This made him think about a business model in which companies would pay for work on a meritocratic basis. Drawing on this idea and his work experience in macroeconomic modeling, Anthony created Kaggle in 2010.

For those to whom the name Kaggle [1] does not ring a bell, it is a competition-based platform for predictive modeling and analytics. Companies and researchers publish their data and proposed problems, and top-notch statisticians and data scientists (also known as Kagglers) from around the globe produce models based on them. This ensures that companies have an array of solutions to choose from. Right now, Kaggle has around 200,000 scientists worldwide.

[1] The name “Kaggle” was selected from a list generated with the help of an algorithm that iterated phonetic domain names.

Kaggle received universal acclaim in a very short period. Anthony was twice named to Forbes’ annual “30 Under 30” list of young technology leaders. Fast Company featured him in its “Who’s Next” series as one of the innovative thinkers who are changing the future of business. He was also a featured speaker at the 2013 Data 2.0 Summit.

WOW: How Does Kaggle Handle the Boggle?

Kaggle started as a crowdsourcing website for data science competitions. After its move to San Francisco, Kaggle introduced “Kaggle Connect,” a new platform that links companies to the growing community of data scientists. It thus moved from a crowdsourcing website to a marketplace. Competitions are still held before scientists enter the Kaggle platform: the data scientists are ranked objectively, and the best of them are invited to join and are matched with client companies. Performers are rated by measuring the accuracy of their final models or solutions, and the best performers can monetize their expertise.
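The idea of ranking participants objectively by the accuracy of their models can be illustrated with a small sketch. This is not Kaggle's actual scoring code; the function names, team names, and data below are hypothetical, and plain classification accuracy is assumed as the metric purely for illustration.

```python
from typing import Dict, List, Tuple


def accuracy(predictions: List[int], truth: List[int]) -> float:
    """Fraction of predictions that match the held-out true labels."""
    if len(predictions) != len(truth):
        raise ValueError("predictions and truth must have the same length")
    if not truth:
        raise ValueError("truth must not be empty")
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)


def leaderboard(
    submissions: Dict[str, List[int]], truth: List[int]
) -> List[Tuple[str, float]]:
    """Score every submission and sort best-first (ties broken by name)."""
    scores = [(name, accuracy(preds, truth)) for name, preds in submissions.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical held-out labels and three hypothetical submissions.
truth = [1, 0, 1, 1, 0]
submissions = {
    "team_a": [1, 0, 1, 0, 0],
    "team_b": [1, 0, 1, 1, 0],
    "team_c": [0, 0, 1, 0, 0],
}

ranking = leaderboard(submissions, truth)
for position, (name, score) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {score:.2f}")
```

Because every submission is scored against the same held-out labels, the ordering depends only on model performance, which is the sense in which such a ranking is objective.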

The Kaggle scientists who use the platform fall mainly into three groups: those who participate in competitions for fun rather than income, academics who want experience in dealing with real-world problems, and people who rely on Kaggle and their Kaggle reputation as a full-time source of income. The platform is entirely free for data scientists. Companies, however, pay Kaggle for its services.

SO WHAT Makes Kaggle a Positive Outlier?

Kaggle offers an exciting, innovative venue for finding solutions to data science problems. Compared with other crowdsourcing solutions in the market, it stands out for the assortment of key roles it plays.

Kaggle as a Marketplace

Kaggle offers a platform in the labor market for creating a multitude of solutions from big data, allowing companies to choose the optimal one among them. Previously, such choices relied heavily on gut feeling and intuition. Kaggle thus gives companies a stage for predictive modeling and, consequently, for making objective decisions.

Kaggle as a Benchmark

Kaggle has also replaced traditional resumes as a more substantial and valuable indicator of proficiency. Work and its value in the marketplace have become quantifiable, in terms of both outcomes and process. “Kaggle ranking” has become an essential metric in data science today. Some companies, such as American Express and The New York Times, list Kaggle rank as an essential qualification in their job advertisements for data scientists.

Kaggle as a Job Market/Recruiting Platform

Kaggle has also addressed the cost of hiring the best people in data science. Through Kaggle, companies can use the services of the smartest people in the world. The success of Kaggle has started inspiring entrepreneurs from other fields, ranging from designers to doctors, to follow its example. Many new disruptive marketplaces, quite similar to Kaggle, are now emerging. “99designs,” a content-based community for designers, and “HealthTap,” a community of doctors who use their spare time to give health advice, are two examples.

Kaggle Compared to Adjacent Companies

Compared with adjacent companies in the same domain, Kaggle improves upon the open innovation models of “InnoCentive” and “NineSigma” by introducing the element of competition. Competitors can see the results of others in real time, which raises motivation. Moreover, unlike these companies, Kaggle markets crowdsourcing more as a career choice than as a hobby.

Kaggle as a New Corporate Style

The corporate approach of Kaggle is state-of-the-art as well. Statistical modeling has not yet been able to process big data. Even for companies that are aware of and interested in predictive modeling, deciding on a single best option can be difficult. With Kaggle competitions, these companies can match their needs against a variety of solutions and so find the best approach. The ultimate quality control of the solutions and research designs proposed in the Kaggle community thus rests with the customer companies. Goldbloom compares this strategy with the architectural design competitions held to award big property development contracts. Kaggle could be quite useful for companies in a vast range of arenas, from banking to law. Table 16.1 lists some of the competitions on Kaggle.

[Table 16.1 (image): A Few Examples of Kaggle Competitions, Compiled from Company Website]

OOMPH: Kaggle as a Crowdsourcing Venue

Unlike crowdsourcing companies in more established fields (e.g., creative writing and pharmaceuticals), Kaggle is creating a disruption in an emerging field. Kaggle offers professionals a chance to compete and contribute in the field of data science, irrespective of their location or status. Figure 16.1 shows some examples of customers from different fields and the results achieved via Kaggle. Apart from providing a platform for the resource pool, Kaggle also helps participants by offering them opportunities to exchange techniques with others and thereby advance their own skill sets. Moreover, unlike in temp agencies, meritocracy (not bureaucracy or job seniority) is the philosophy behind Kaggle. Kagglers are thus not at the bottom of the job pyramid; rather, they are at the apex.

[Figure 16.1 (image): Some Kaggle customers and the results delivered, compiled from published sources and company website]

Challenges

One challenge Kaggle faces is the doubt companies have about Kagglers and their suitability for solving industry-specific or company-specific problems. The main question is how someone with no background in a particular industry or domain could solve such problems. Anthony replies to these skeptics by pointing out that Kaggle is a collaboration between the data scientists and the experts. The latter bring in the business content, whereas the former join forces and offer valuable solutions based on this content and data.

However, Kaggle changed its stance on this issue in January 2014, when it decided to move away from its generalist outlook and focus more on one particular industry, oil and gas. “There is a lot to learn in each area,” says Mr. Goldbloom. “We will still do competitions, they are a great engine for finding talent. But our returns as a business will be higher if we focus.” (Excerpt from The New York Times interview.)

Even though it cannot be called a challenge per se, another criticism of Kaggle is that it is just another spin on crowdsourcing, which has been around for decades. Quite recently, many other online companies have used crowdsourcing to get people’s jobs done. Kaggle, however, counters this criticism with two claims. First, it does not incorporate work from everybody: only expert professionals who want to compete and win join Kaggle. Second, Kaggle does not merely produce an incidental outcome; rather, it creates a novel, disruptive marketplace for work.

Yet another question put to Kaggle concerns the sufficiency of the data in the competition package. What if more suitable data exists elsewhere? This is indeed a limitation of the competition model. Perhaps one day data processing will evolve to a level where data is available to everyone involved, in real time. The drawback then would be concerns about privacy and data security.

Outlook

Since the day of its conception, when Anthony could not afford to purchase a domain name for his website, Kaggle has made quantum leaps in the field of data science crowdsourcing. As Ben Hamner (Chief Scientist, Kaggle) says, with deep learning and single machines, anything and everything from analyzing images to chemoinformatics is possible today. Kaggle’s success story vouches for this.

From the Perspective of Harikesh Nair, Professor of Marketing, Stanford Graduate School of Business

From an Interview by Dr. Markus Paukku, January 27, 2015

Kaggle is a novel platform currently able to take advantage of two market needs. First, there is an increasing demand by companies for a type of computational social science and statistical expertise that has primarily been the preserve of academia. Second, there is a labor market shortage of these talented data scientists. Indeed, companies rarely have the in-house combination of talent with both the domain knowledge and the requisite statistical know-how by which to leverage big data. Kaggle, together with its competitors, has been successful in filling this gap in the market.

Traditional corporate structures and cultures are not well equipped to experiment with innovative methods. Investing time and resources in novel means to solve problems is a challenge for many companies. Incumbent best practices and processes can even limit the management or bootstrapping of creativity within the organization. Companies have often tried to stoke creativity through skunkworks or through policies of budgeting time earmarked for innovation. However, many companies have found this approach to be an expensive endeavor yielding uneven results.

Moving a project outside the company’s organizational boundary allows external resources, unconstrained by a set of corporate practices and processes, the freedom to solve the problem. Typically this kind of external problem solving has been the domain of hired consultants. However, using a platform such as Kaggle allows a company to conduct a massive search and leverage the power of the crowd to find a solution. Through Kaggle, multiple project groups can compete for the best solution, a feature rarely seen in the process by which consulting firms are selected. The openness and transparency of the platform also allow external groups to build cumulatively on each other’s solutions.

Using an innovative platform such as Kaggle does require some stretch by the client companies. The company must be sold on the value of big data analytics, overcome internal inertia, and be open to the implementation of external, crowdsourced solutions. This requires buy-in from senior stakeholders on the client’s corporate ladder and a willingness to experiment with novel practices. First and foremost, however, before a solution can be found, the company must have the capacity to identify and frame a problem that can benefit from big data analytics and the answers provided by Kaggle.

Going forward, what does the future hold for Kaggle? It will take some time for the market to reach equilibrium, as the data scientists in demand are trained and companies develop in-house big data processes. Thus, Kaggle may see competitors enter the attractive market. Kaggle may address this challenge by splitting the verticals it serves and delivering more value by differentiating its offering by sector. Also, the growth of Kaggle’s network may well provide the platform with further network effects, raising the switching costs for its current clients and making the company more competitive as an incumbent in a nascent space.

The nature of the analytics work demanded will also change. Currently, data science largely relies on methods that identify correlation models, in which X happens and an impact is measured on Y. In the future, however, more nuanced questions will need answering: not just how something is happening, but why. These challenges will require not just today’s statistical know-how but also specific domain knowledge, which is challenging for specialized data scientists to gain from outside companies and industries.
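The kind of correlation model described above, in which X happens and an impact is measured on Y, can be sketched with a Pearson correlation coefficient. The variable names and figures below are hypothetical and chosen only for illustration; they do not come from the interview.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: X is weekly ad spend, Y is weekly sales.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson(ad_spend, sales)
print(f"correlation between ad spend and sales: {r:.3f}")
```

A coefficient close to 1 shows that the two series move together, but it says nothing about why. Answering that question is where the domain knowledge mentioned above comes in.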

From the Perspective of Milena Mend, a Student of Organization and Culture at the University of St. Gallen, on Imaginary Futures

What Will Crowdsourcing Be Like in 2025? And Which Implications Can We Derive for Today?

I proudly present an interview with Anthony Goldbloom, who founded Kaggle back in 2010. The interview takes place on the third of March 2025 in San Francisco, and the interviewer is M. Mend, a journalist for the international press.

Mend: Mr Goldbloom, thank you for your time, I am very glad to have you here. Now, Kaggle has had outstanding success compared to other crowdsourcing business models and former competitors like Microtask or Quirky, which have vanished by now. As you are the founder of Kaggle, my first question to you is: How do you explain the ongoing success of Kaggle?

Goldbloom: Well, I think we put all our eggs in one basket and it turned out to be the right one. The trend of big data and the need for scientific data analysis started about 20 years ago and has grown ever since, as data quantity grew tremendously. On top of that, data quality also improved. As we were the first marketplace to connect data specialists with challenges from the private and public sectors, we were able to get the most talented workforce for the challenges. Today, we still have Kagglers working for us who started 10 years ago, and these people crunch data through our website. They do this for a living. So I think one success factor is that our business model offers a solution to a problem that is widespread, while the knowledge needed to solve it is scarce. We offer the scarce knowledge. And the other success factor is our long-term relationship with our workforce.

The interview is based on the three cases of Kaggle, Quirky, and Microtask.

Mend: So what did you do differently to build such a solid and loyal workforce?

Goldbloom: We work with experts, and we value and understand their needs. Being a Kaggler comes with a lifestyle and an attitude to life. Kagglers are highly independent, very curious, have high intrinsic motivation, and want to solve the big problems of the world. On top of that, they are very competitive and like to win. So first of all, we have always paid attention to the sort of challenges we put out into the cloud, because what Kagglers value most is the good, real databases they get access to and the challenges from different industries that we announce. That was the lesson we learned in 2014, when we tried to focus solely on the oil and gas sector. We realized that our success comes from the variety of industries we collaborate with. So we stayed a generalist after all, and this is how we still work today.

But, of course, if a Kaggler wishes to become an expert in one field, we offer enough challenges in each area to become a specialist. We have come to notice, however, that Kagglers hate routine. After a while, if a Kaggler only crunches data from one industry, it becomes routine and he or she wants to switch industries, which is totally fine with us. That is also the reason why very few Kagglers ever decide to work for one big company exclusively.

Mend: In what ways does Kaggler education© add to your success?

Goldbloom: Oh, in a great way. We opened Kaggler education©, an online educational system, just when so-called MOOCs (massive open online courses) were on the rise. We sensed that Kagglers are individuals with scarce and unique skills that will rise in demand. So at first, we offered educational and learning content for our Kagglers only and shared the knowledge of our best Kagglers internally. But in 2017, we decided to expand into education. The reasons were twofold. First, there are Kagglers out there who find it highly satisfying to pass on their knowledge to people, and we have an interest in keeping them happy so they continue working for us. Second, we sensed that school education does not have the necessary focus on big data analysis and predictive modeling, but we and the world need these skills.

So today, we can reach schoolchildren from all around the world through our online seminars. To promote our courses, we collaborate closely with schools. A school can buy our course for its education portfolio, and we get great feedback. And of course, it is also a recruiting tool. We make sure that we get to know the best data scientists, and if they want, they can join the great Kaggler community.

Mend: And then how do you explain that Kaggle was not bought by one of the three giants (Google, Facebook, and Amazon) that today almost exclusively dominate the Internet?

Goldbloom: Well, let me admit that we got offers for mergers and takeovers from all three of them. However, the model of success for us is not merging, but collaboration. We put out challenges from all the giants, and Kaggle is also dependent on the data and the real-world challenges that come from these companies. However, it is in the interest of all of us that Kaggle stays an independent corporation. The reason for that is the workforce. Five years ago, in 2020, the official union of crowdworkers was founded. This union fights for the rights of workers, and the fact is that Kagglers value the heterogeneity of challenges and datasets. They do not want to work for one company exclusively. They love their independent life and fight to stay independent.

Mend: I am sure you have heard of Quirky. Back in 2015 some people predicted a paradigm shift in product innovation and production. Why do you think Quirky no longer exists today?

Goldbloom: Quirky is a very interesting case, and the founder Ben Kaufman is a good friend of mine, so I am happy to tell you what happened, in my opinion. I think the crux of the matter was that the basic premise of Quirky did not turn out to be true. Ben’s focus has always been on execution, whereas his real unique selling proposition was ideas and the access to a crowd that was motivated enough to submit them.

Even though the products that Quirky produced in the beginning were supposed to solve problems, the truth is that these problems were too small to make it big. They were personal problems and nice-to-have products, not real paradigm shifters. Ben realized that he could never reach the economies of scale of the big players in any manufacturing industry, and that is why he decided to collaborate with GE. Out of this, WINK was created as a subsidiary, and this was a move in the right direction. So today, Quirky does not exist anymore. But its body of thought lives on in the various subsidiaries that serve as idea creators for the big players in their industries. BILAN is one of them, collecting ideas for the car industry. The subsidiaries now serve as a kind of needs databank of engaged customers.

Mend: And then what about Microtask, why is it not in the crowdsourcing business anymore?

Goldbloom: Well, Microtask was interesting, but the algorithm they developed in 2018 was so clever that they eliminated themselves in a way. They used to convert audio files and paper documents into digital files by scattering small tasks around the world and combining human intelligence with machine learning. They really pushed machine learning to the extreme, and they finally managed to replace humans in the document processing industry. So today, they don’t need the crowd anymore. On top of that, they were accused of exploiting their workforce. So it was a good idea for them to rely only on the algorithm, which is now their one and only product. But as digitization of documents is no longer a big need nowadays, I am curious what they will do next, to be honest.

Mend: Mr Goldbloom, on behalf of the international press, I would really like to thank you for your insights and valuable contributions. I wish you all the best for your future and hope to see you another time.

Goldbloom: Sure, no worries. Have a great time in San Francisco.

Jump in Time

The learnings for today (4th of March, 2015) are:

1. For the Outliers: Offering a marketplace and distributing problems to be solved is not enough. A marketplace needs to be loaded with products and challenges. Microtask had no incumbents but abolished itself through technology. Quirky had to realize that it would never really be able to compete with the economies of scale of the incumbents. Kaggle had no incumbents and spotted a real trend.

2. For the Incumbents: The big stay big and get bigger and the smaller companies have to collaborate with them or get bought by them.

3. For the Workforce: Crowdworking, or so-called cloudworking, will become synonymous with a lifestyle, one in which a person has a passion for something and can live it out through the Internet from anywhere in the world. Cloudworkers meet in shared working places and build a large community. They also contribute to a new kind of tourism.

All in all what we learn from the outliers now is that if you are an entrepreneur, either make an idea big enough that one of the incumbents buys you (Quirky) or offer a stand-alone solution that really adds value (Kaggle). Don’t let technology win over humans, because then we eliminate ourselves (Microtask).

And my question for the future is: What will humans do with all that free time when algorithms and robots (like the robot vacuum cleaner) take care of all our problems, work, and duties?
