Everything you ever wanted to know about Yahoo’s Hadoop spinoff Hortonworks

Hortonworks, the Hadoop startup that spun off from Yahoo in 2011, has been a bit of a mystery. Top personnel have changed places without so much as a press release, venture capital investment hasn’t been disclosed, and there are semi-regular rumors about the company spurning acquisition offers (namely from Microsoft and Intel).

During a wide-ranging interview last week, Hortonworks CEO Rob Bearden touched on all of these things, as well as the even bigger question of whether it’s possible to make money in the Hadoop space. Here’s what he had to say.

On fundraising

Actually, there’s not much to say here that this slide doesn’t tell (near the bottom): Hortonworks raised $23 million from Benchmark and Yahoo before launching in June 2011, a $25 million round led by Index Ventures in November 2011, and a $50 million round led by Dragoneer Investment Group and Tenaya Capital in June 2013.

Bearden said he’s never shied away from disclosing this information. Maybe no one ever thought to ask.

HW 1

On acquisition rumors and plans to go public

“If we were gonna sell it,” Bearden said, “I would have sold it a long time ago.”

Rather, he added, “I started this company from day one to go public.” Everyone he has hired, all the money he’s raised (and from whom he’s taken it), every product decision — they’ve all been carried out in furtherance of that mission. Bearden thinks the company is actually ahead of trajectory and could IPO in five to seven quarters.

“I’ve made it extraordinarily clear to the partners: … ‘Make no mistake, we’re not here to try to finesse you to make us an offer,’” he noted.

On profitability and partners

Hortonworks, which has more than 240 employees, runs its operations and manages its cash in an “extraordinarily efficient” way, Bearden said. In fact, he added, “We’ll actually cross over into cash-neutrality early next year.”

Aside from its $98 million in venture capital, the company has more than 120 paying customers, with about 75 under “production subscriptions.” Support subscriptions account for about 70 percent of revenue, while training and consulting account for the other 30 percent. Hortonworks, of course, is entirely open source and releases all of its code back to the Apache Hadoop project.

Part of that 30 percent comes from Hortonworks’ more than 140 technology partners. Some of them, such as Microsoft, Teradata and Rackspace, have based their Hadoop product lineups on the Hortonworks Data Platform and could help spur a significant increase in its market share against chief rival and presumed market leader Cloudera. In the near future, Bearden said, the company will announce some “tier one of the tier one” companies as partners.

In three or four quarters, Bearden said, Hortonworks could reach a point “where the channel really is generating a more significant portion of the revenue.”

I thought Bearden gave a fair answer when I asked what would happen if a company like Microsoft, full of smart engineers and already building some of its own tools atop Hadoop, decided it didn’t need to pay Hortonworks anymore. He acknowledged there’s always the “threat/option” that this could happen, but noted that Microsoft already tried that and came around on open source. And, he said, working closely with Hortonworks means Microsoft has a better chance of seeing its wishes incorporated into the core Apache Hadoop code.

Of course, Hortonworks’ business model is entirely open source. “If we ever don’t add value,” Bearden said, “they’re never in a position where they can’t just take [the code] and move it forward.”

On how customers are using it

The good news for Hortonworks, Bearden said, is that the majority of the Fortune 500, along with many smaller companies, are already doing pilots with Hadoop and getting familiar with how it works. Hortonworks’ sales cycle really kicks in when customers need help taking it to the next step.

“If we have to sell somebody on why they should use Hadoop,” he said, “that’s a bad place for us to spend our time.”

Hortonworks Vice President of Strategy Shaun Connolly was also on the call with Bearden, and he added some insights about how those customers are deploying Hadoop. More interesting than the industry segments, which are in the usual-suspect categories, are the growth patterns: For new applications or extensions of existing applications, Hadoop clusters usually start around 10 nodes and move up to about 80 nodes. A lot of line-of-business deployments fit this bill.

In bigger environments, where customers want Hadoop to act like a company-wide data repository, Connolly said clusters usually begin around 50 to 100 nodes and can grow into the thousands of nodes.

On co-founder Eric Baldeschwieler’s departure

Baldeschwieler’s move to CTO from CEO really was planned, Bearden said, and turned out to be a pretty smooth transition. But it was rushed. “The company and the tech moved very, very quickly,” Bearden said.

It became clear much faster than anyone anticipated that building Hadoop to be an enterprise platform and building it to run workloads at a web property are two very different things. The technological skills are the same, but enterprise customers have different expectations. That’s also why Hortonworks brought in an “enterprise-grade” vice president of engineering in Greg Pavlik, Bearden noted.

As for Baldeschwieler’s decision to leave earlier in August: “Very candidly … Eric just got frustrated that he didn’t have the span of control that he once had.”

I’ve reached out to Baldeschwieler for a comment, but have not received a response.