Why 3 celebrity data scientists are willing to work for free — for you

Hadoop-in-the-cloud startup Mortar Data is on a mission to bring recommendation engines to the masses, and it has recruited three well-known data scientists to aid its cause. On Wednesday, the company will start accepting applications on its website from companies that would like to have Mortar Data, along with Hilary Mason, IA Ventures Scientist-in-Residence Drew Conway and freelancer (and former OKCupid data scientist) Max Shron, build a custom recommendation system for them.

The way it works, said Mortar Co-founder and CEO K Young, is that his company will choose eight companies (in addition to the two it has been working with already) to implement custom systems based on their specific needs and businesses. Mason, Conway and Shron will split their time among the 10 total companies, but will be much more than advisers — they’ll actually dig into the data and work hands-on to ensure the right techniques and algorithms are applied in the right places.

The applicant companies will keep any custom code, but the ultimate goal from Mortar’s perspective is to learn some best practices and create reusable building blocks that will let anyone create recommendation engines without pre-existing data science knowledge. Recommendation engines are commonplace on large web sites (Netflix, Spotify, iTunes, Google, Amazon, LinkedIn, Eventbrite and the list goes on) but smaller companies can sometimes struggle to do them, or to do them well. Young hopes Mortar can establish an open source reference architecture of sorts that makes it easy to implement everything from building data pipelines to the actual algorithms that power recommendations.

“They’re really common and they’re really useful, but they’re really hard,” he said. “That’s why [a reference implementation] hasn’t been done before.”

They can get pretty complex, as evidenced by this Netflix example.

Presently, Young explained, anyone wanting to build a recommendation system probably knows some of the algorithms to begin with and then gets to work researching how to implement them with specific processing frameworks (e.g., MapReduce) and on their specific data. Alternatively, they might have to hire a consultant who helps them build the recommendation engine. Either way, he noted, they’re probably not open sourcing it at the end because it’s presumed too valuable a competitive edge.

Mortar Data’s recommendation framework will be based on Pig, Python and Java, just like the company’s flagship platform for creating Hadoop jobs. Those languages will make the implementation more accessible and customizable by more people, Young said.
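Mortar's framework itself is not shown here, so the following is only a rough illustration of the kind of algorithm a recommendation engine might start from: a minimal item co-occurrence sketch in plain Python. The data, the function names (`cooccurrence_counts`, `recommend`) and the scoring rule are all hypothetical choices made for this example, not anything attributed to Mortar, Young or the three data scientists.

```python
from collections import defaultdict
from itertools import permutations


def cooccurrence_counts(baskets):
    """Count how often each ordered pair of items is liked by the same user."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in baskets.values():
        for a, b in permutations(set(items), 2):
            counts[a][b] += 1
    return counts


def recommend(user, baskets, counts, top_n=3):
    """Score unseen items by co-occurrence with items the user already has."""
    seen = set(baskets[user])
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Sort by score (descending), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical interaction data: user -> items they liked.
baskets = {
    "ann": ["jazz", "blues", "soul"],
    "bob": ["jazz", "blues"],
    "cat": ["blues", "rock"],
    "dan": ["jazz"],
}
counts = cooccurrence_counts(baskets)
print(recommend("dan", baskets, counts))  # ['blues', 'soul']
```

On a real data set the counting step is the part that would typically be distributed (for example as a MapReduce or Pig job), while the scoring logic stays much the same.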

Really, he added, any web site or service that has multiple customers and deals with multiple entities — be they restaurants, songs, dating profiles, artisan necklaces, what have you — should have some sort of recommendation engine to help provide a more-intelligent customer experience. “It should become so ubiquitous that any service you go to knows enough about you to put forward the things you actually want to see,” Young said.

There is, however, one catch to Mortar’s plans as they stand: Because the service is hosted on Amazon Web Services, anyone interested in having Mason, Conway, Shron and Mortar work on their systems must have their data in AWS or be able to move it there. The initial reference implementation will likely be AWS-centric, too, but Young hopes contributors will use it and share methods for running it atop other platforms.

Feature image of Hilary Mason at Structure: Data 2011 courtesy of Pinar Ozger.