No, you don’t need a ton of data to do deep learning


There are a couple of seemingly contradictory memes making the rounds in the deep learning field. One is that you need a truly epic amount of data to do interesting work. The other is that many subject areas do have a ton of data, but it isn’t just lying around for data scientists to snarf up.

On this week’s Structure Show podcast, Enlitic Founder and CEO Jeremy Howard and Senior Data Scientist Ahna Girshick address those topics and more.

Girshick, our first guest to have worked with Philip Glass and Björk on music visualizations, said that scads of MRIs, CAT scans and X-rays are created, but once they have served their primary purpose (diagnosing your bum knee, say), they are squirreled away in some PACS system, never to see the light of day again.

All of that data is useful for machine learning algorithms, or would be, if it were accessible, she said.

Ahna Girshick, Enlitic’s senior data scientist.

Girshick and Howard agreed that while deep learning (the process of a computer teaching itself how to solve a problem) gets better as the data set grows, there’s no reason to hold off on working with it while waiting for that data to become available.

“While more data can be better, I think this is stopping people from trying to use big data,” Howard said. He cited a recent Kaggle competition on facial keypoint recognition that used 7,000 images, in which “the top algorithms are nearly perfectly accurate.”

Companies like Baidu and Google say you need mountains of data because they have mountains of data available, he said. “I don’t think people should be put off trying to use deep learning just because they don’t have a lot of data.”

Enlitic is using deep learning to deliver medical diagnoses faster and to improve medical outcomes for millions of underserved people.

It’s a fascinating discussion, so please check it out. Girshick will speak more about what Enlitic is doing at Structure Data next month.

And if you want to hear what’s going on with Pivotal’s big data portfolio, Derrick Harris has the latest. Oh, and Microsoft makes a bold play for startups by ponying up $500K in Azure cloud credits, starting with the Y Combinator Winter 2015 class. That ups the ante significantly compared to what Amazon Web Services, Google and IBM offer. Your move, boys.



Hosts: Barb Darrow and Derrick Harris.

