Enrico Signoretti chats with Poojan Kumar of Clumio on the concepts and ideas behind building data protection.
Poojan Kumar is the co-founder and CEO at Clumio. Poojan brings 18 years of experience in cloud computing and storage and is known for seeing an opportunity for change, innovating and capitalizing on it. Poojan founded and built PernixData, which was acquired by Nutanix in 2016; he then served as Vice President of Engineering and Products. Earlier in his career, he was Head of Data Products at VMware and a founder of Oracle Exadata.
Enrico Signoretti: Welcome, everybody. This is Voices in Data Storage brought to you by GigaOm, and I am Enrico Signoretti. Today we will talk about data protection. Data protection is cool again. There are a lot of startups in this space now. It happened a few years back with a bunch of startups rebuilding the idea of data protection, so getting backups and appliances, simplified processes, and so on.
Now, we have a new wave of innovation coming from the cloud as well. I remember when I started doing this job. It was almost 30 years ago, and backup was the easiest thing that you can think of. You were able to stop your job to do a backup copy. When you had the issues with your files, you had the time to stop, think about your backup, retrieve your data, and start again.
Everything has changed of course, because capacities that we deal with today and the SLAs, everything. To talk about this topic today, I invited Poojan Kumar, CEO and founder of Clumio. Hi, Poojan, how are you?
Poojan Kumar: Good, Enrico. How are you?
I'm very fine. Thank you very much for joining me today. Poojan, maybe we can start with a short introduction about you and about your company.
Absolutely. So my background, really quick: [I’ve] been in the enterprise industry for 20 years. I grew up, from a career perspective, in Oracle, building Exadata for Oracle. That was something that I co-founded along with other folks. Spent some time at VMware after that, and then decided to go and do my first company as founder/CEO, which was PernixData, which was acquired by Nutanix four and a half years from its inception.
I spent a couple years at Nutanix leading their products and engineering initiatives and then decided to leave to jump-start Clumio about two years ago. That's been my journey, really quickly. I’m very passionate about building enterprise products and companies and we look forward to doing that with Clumio here.
Specifically on Clumio, we are a two-year-old company. We just came out of stealth a few weeks ago in August of 2019 and [we’re] really going after data protection in a very innovative and different way versus anybody else out there.
I started the episode talking very briefly about how backup has changed in the last 30 years. I don't know if you agree with me, but it was very, very easy at the beginning – with personal computing and the first several applications, everything was easy. You had time to stop and make a copy of your small database, whatever it was. Then we started to think about backup windows and we started to think about continuous backup. The number of applications rose, as well as the number of devices and complexity in general. We have challenges that we didn't have even ten years ago, right?
Absolutely. More and more, I think even the computing paradigm itself has changed big time because if you look now at a typical enterprise today, data itself...is not just sitting within their data centers on premises. More and more applications are built in the cloud, so data's sitting there for a typical enterprise today, and also data is sitting in other software as a service (SaaS) platforms like Salesforce and Office 365 and many others. The data itself is very fragmented in today's enterprise.
Yeah, in fact, we have a lot of silos. We worked so many years. In the early 2000s, we talked about silos in a data center. We are going back to this silos concept, I mean, cloud silos, and the data center is a silo, per se, even if virtualization has now removed the silos in the data center, at least from the infrastructure perspective. Now we have another silo that is Amazon AWS, for example, or yes, you mentioned SaaS applications. Office 365 is a huge silo, right?
Absolutely right, and so for a typical enterprise, somebody ultimately has to come in and essentially deliver something across all of these silos. These silos are no longer even co-located the way they were back in the day before virtualization came in. At least back then the silos were co-located, and now these silos are physically apart and sitting in different platforms. Somebody has to come in and put it all together.
Then there is the type of data that we are managing in these silos. We have different types of data that we manage in the silos. Everything gets complicated, not only because they are in different locations. There are different types of data. There are different types of SLAs. We didn't mention yet the fact that in some organizations, the mobile part of the client is very, very important, meaning it now accounts for more than 60 or 80% of the devices that they have in the organization.
How does Clumio address these kind of issues?
Yeah, so if you look at our vision, if you look at the last 20 years, every use case out there has gone through its own journey of getting SaaS-ified, so to speak. It all started off with something like Salesforce going in, delivering CRM as SaaS, and then the journey kept continuing with Workday doing it for HR management, ServiceNow doing it for your incident management, and so on and so forth. Then most recently, Snowflake Computing doing it for warehousing.
Our vision essentially was to combine these two things: the fact that, as we discussed, the data is no longer in one place for an enterprise, and the fact that we are in a world where you need simplification and enterprises, more and more, want to go and consume SaaS services whenever possible and really free themselves from the mundane, so to speak. Who in 2019 raises their hand and says I want Microsoft Exchange on premises and I want to buy a bunch of Dell servers and really deliver email to my enterprise? Nobody does that. People want to go to the cloud to either G Suite or Office 365 and basically get email for their enterprise; same thing for CRM as we discussed. People go to Salesforce and get CRM for their enterprise.
We basically said, “What if you could do something similar for essentially delivering backups and data protection to the enterprise and doing it not only in a manner that is truly service-ified but also doing it with a single pane of glass across all of the data sources: on premises, in cloud, and SaaS?” That was the founding vision of Clumio in terms of going in and building the service on top of the public cloud and really delivering a secure enterprise backup as a service across all of these data sources. That's like our step one in terms of what we have started doing.
Let's concentrate a little bit on ‘step one,’ because I know that there is a ‘step two’ in your vision. Step one means – just to do a quick recap – you have this SaaS solution that is now able to back up almost everything on premises, in the cloud, and what else? The problem is, from my point of view – and I want to challenge you on this – when I do backups of my on-premises installation, I need performance. I'm okay with the performance on cloud-to-cloud, but actually when I do something on premises, there is a lot of trouble. There are a lot of files. There are a lot of terabytes, or sometimes even more, to back up. How do you address these kinds of problems?
Yeah, so that's where I think, if you look at the last couple of years, we've been really going and building the platform the right way. If you look at the data problems at the peak, you have to go and essentially build all the technologies that can utilize the network really well and really be able to deliver data reduction technologies like deduplication and compression across multiple use cases really well. Doing this as a service requires you to do it in a multi-tenant fashion, so to speak.
When you look at a solution like ours, we have spent a lot of time and energy to make sure that there's so much efficiency in the system. Obviously there is one time you have to go in and push data into the cloud, which is the seeding stage, so to speak. After that, it is about capturing changes, making sure only the changes are getting shipped to the cloud, and making sure that the changes are getting deduplicated and compressed, so effectively transferring a byte only if it absolutely needs to get transferred. We do it across data centers within the same customer because obviously, our state management is in the cloud, so we can dedupe across data centers and so on and so forth. All of the technologies have to get built to truly do it in an efficient manner.
Obviously we’re also relying on one other thing that has come a long way in a few years, which is the network. Thanks to all the investments of the big public cloud vendors, there is AWS Direct Connect, which is a 10-gig network to the cloud. It now boasts about 15,000 customers. For not big dollars, you can effectively get direct connects to the cloud. We see all of our enterprise customers basically having it. Utilizing the network bandwidth that they already have to the cloud, but utilizing it effectively, is what enables some of these things to happen today.
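As a rough illustration of the "transfer a byte only if it absolutely needs to get transferred" idea, here is a minimal Python sketch of chunk-level deduplication with compression. It is not Clumio's implementation: the `ChunkStore` class, the `backup` and `restore` functions, and the fixed 4 KB chunk size are all assumptions made up for this example.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; an assumption for this sketch


class ChunkStore:
    """Toy stand-in for a cloud-side store shared by all of one customer's sites."""

    def __init__(self):
        self.chunks = {}  # SHA-256 hex digest -> compressed chunk bytes

    def has(self, digest):
        return digest in self.chunks

    def put(self, digest, compressed):
        self.chunks[digest] = compressed


def backup(data, store):
    """Ship only chunks the store has not seen yet.

    Returns (recipe, bytes_sent), where recipe is the ordered list of
    chunk digests needed to rebuild the data.
    """
    recipe = []
    bytes_sent = 0
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if not store.has(digest):
            # New content: compress it and count it as transferred.
            compressed = zlib.compress(chunk)
            store.put(digest, compressed)
            bytes_sent += len(compressed)
        recipe.append(digest)
    return recipe, bytes_sent


def restore(recipe, store):
    """Rebuild the original bytes from a recipe of chunk digests."""
    return b"".join(zlib.decompress(store.chunks[d]) for d in recipe)
```

In this sketch the first `backup` call plays the role of seeding, and later calls send only chunks whose digests are new. Because the store is keyed by content hash, a second data center belonging to the same customer would also skip any chunk the first one already uploaded.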
The answer is optimization on one side and availability of more total bandwidth on the other. At the same time, I really love the fact that technology's evolving and gives us more opportunities to develop these new models. On the other side – so what about security? How can I be sure that dealing with a SaaS provider, putting all my data in your hands, is safe enough for my organization?
So the thing is, you can look at our thinking. We have put security first. One of the reasons we actually won the Best of Show Award at VMworld in August was because we looked to basically put security front and center in our offering. Some of the things we do: making sure that any data that leaves the customer environment is encrypted in flight, and it's encrypted at rest, so to speak, in the cloud. We also have built our own compliance initiatives and we've done all of the penetration testing on the SaaS platform. We've shared our penetration testing reports with our customers, and so on and so forth. We have basically put so much emphasis on security to make sure that customers who are on-boarding on our platform do realize that our posture is ‘security first.’
Again, by having my data already copied to the cloud, so it's no longer only on my premises even though we keep copies there, I can think about it as being a step closer to having a disaster recovery solution.
Absolutely, that's the other thing. The fact that the data exists – one of our early customers was the city of Davenport, an interesting story. Immediately after they bought our service, they effectively had a big disaster there where there was all the flooding and things like that in that city. Guess what? Because of all of that, at the end of the day, their data centers and stuff like that, they had to go through some outages, but they were sure of the fact that their data, which is the most important asset, is already safe because it was sitting in the cloud with our service.
The fact that you have a service like that that is essentially taking this over basically provides you with this ability. It's not a true disaster recovery service yet because we don't allow you to run the environment completely; we don't allow you to do failback and all of those things. But the fact that the data is there gives a lot of peace of mind that you can essentially go in and instantiate things and do things yourself, which historically was expensive. People had to buy extra hardware, extra infrastructure, extra data centers to essentially have these things before.
Another thing – maybe this is the step two that we didn't talk about yet – is that having that on the cloud opens up other opportunities for the future. The fact that you can consolidate everything, SaaS, on-prem, other devices that are in the field, all together in a single, large repository that is practically unlimited in terms of capacity gives you other opportunities for the future.
The first thing that I have in mind is data management. You can enable with this huge repository analytics and many other things. You can crawl in your data and find things to find out what is happening, what your users are doing and probably more.
Yeah, and that's really the big opportunity that we’re after. When we chose data protection, ultimately we were thinking more of going and building a data management platform and really delivering services on top of this platform that help the customer get more out of the data that we are managing and protecting on their behalf.
The reason this becomes possible for the first time – because we know a lot of vendors that had this vision on the on-premises side, but they were limited because of this piece of hardware and it could do only this much on the piece of hardware. They couldn't take real advantage because the piece of hardware was already committed to doing its job, so to speak, but we don't have that problem. The beauty of our architecture, which is cloud-native, is that it essentially sucks in all the data, and the data is basically sitting in the object storage of the public cloud. Ultimately, behind the scenes, we have the ability to instantiate the data, summon a bunch of compute from the public cloud at a fraction of the cost, and really deliver other adjacent services on top of the data, like the analytics services we can deliver.
We tell customers more about their own environments, – what's being used, what's not being used, being able to index all of the data that we do today to really provide file-level recovery on top of the platform but really go in, look at all the metadata, and really give that intelligence. More and more, the intelligence becomes even more important because the data sources themselves are so disparate and especially in the public cloud world, it's a wild, wild west right now for a lot of enterprises because they don't know what's really going on.
An ability to put all of these things together at one place and really leverage the compute capabilities of the public cloud is where our step two becomes super interesting. You'll see basically – we have a ton of things to do, obviously, to nail the data protection piece right now in terms of where we are as a company. We're a very young company, but that's where you'll see us going more and more, delivering new and interesting things on top of the platform we have built.
Every time I think about these things, the very first thing that I think of is GDPR or similar regulations that are popping up everywhere in the world. It's almost two years in Europe now, but a similar regulation is coming up in California. Having the ability to understand what you're storing maybe gives you also information to understand if you are compliant or not, or security around ransomware attacks.
By analyzing what is happening day after day on your backups, you can understand how the files are changing. If there are too many file changes in a single day, maybe there is something wrong, or even more than that, I don't know. Imagination is the only limit when you have all your data. It's more a data science kind of problem than a backup problem, right?
Absolutely, and again, I think you can look at some of these other use cases that we can power. Not all of this needs to get built by us. This is where we can essentially harness some very interesting partnerships in the cloud, where we have built this platform, the data is sitting on this platform, and then there are APIs available on the platform so that other vendors can effectively run their own apps, so to speak, on this platform. Some apps are built by us, some are delivered by our partners, and that's where it becomes a platform.
Oh, right, in fact if you expose your API to third parties and to your users directly, some of them could find your APIs useful for a very, very niche use case. That's always a great thing to do. The platform today gives you backup for on-premises environments. I'm sure it covers VMware environments, for example, but looking at the entire vision that we talked about, what are the next steps for Clumio? What can we expect in the next 6 to 12 months from the company?
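To make the "too many file changes in a single day" idea concrete, here is a small Python sketch that flags days whose changed-file count sits far above a trailing baseline. It is only one simple way to do this, not something described in the conversation: the function name `flag_unusual_days`, the 7-day window, and the threshold of three standard deviations are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def flag_unusual_days(changed_per_day, window=7, threshold=3.0):
    """Return indexes of days whose change count is far above the trailing baseline.

    changed_per_day: number of files changed in each daily backup, oldest first.
    window and threshold are illustrative defaults, not tuned values.
    """
    flagged = []
    for i in range(window, len(changed_per_day)):
        history = changed_per_day[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        # Guard against a perfectly flat history, where stdev is zero.
        limit = baseline + threshold * max(spread, 1.0)
        if changed_per_day[i] > limit:
            flagged.append(i)
    return flagged
```

Called with a series such as `[100, 110, 95, 105, 98, 102, 107, 101, 5000, 104]`, the function flags only the day with 5000 changes. A real detector would likely look at more signals than raw counts, for example file types or entropy of the changed data, but the day-over-day comparison is the core of what Enrico describes.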
Yeah, so what we have delivered so far is the bare minimum right now. The service has been GA for a few months now. We basically started off with VMware on-prem. Very soon after that, we added support for VMC, which is the whole VMware on AWS initiative that VMware has had for a couple of years now. We are seeing a tremendous amount of traction there. As you can imagine, VMC fundamentally relies on the fact that it's a service; it's running across multiple regions on AWS, obviously completely managed by VMware. You can't take a legacy solution, run it there, manage it yourself, and back things up that way; it doesn't work like that.
The fact that we can essentially deliver that one-click service to protect the VMC environment is huge. We can do it across regions, ultimately across clouds, too, which is why a lot of customers using VMC are seeing us as the only solution that can truly provide backup for all the mission-critical workloads that they want to move into VMC. Any time somebody wants to move some mission-critical stuff into the cloud, guess what? Backup becomes very, very important. That's something we've already done. We already have support for VMC and we’re working very closely with our partners like VMware to really give great joint solutions to our customers.
Then as you get into more and more applications built in the public cloud itself, you'll see some very interesting announcements come out from us by the end of this year, around the re:Invent time frame, which really allow people to do something similar for applications built only on AWS. It's a different world, as we all know. With the public cloud and applications built within the cloud using native services in the cloud – being able to run across multiple accounts, multiple regions – you can't expect that a legacy solution is something a customer will want to use, because they'll have to install, manage, and upgrade the software and run it 24-by-7 across their accounts, across their regions. It doesn't scale. It has to be delivered as a true service, which is what AWS takes a lot of pride in. You’ll see us basically making a big dent and essentially making our first foray into the world of native public cloud, not just VMware.
That's something you will see come from us. Then as we get into next year, we'll start going in the direction of the SaaS apps that we discussed in the earlier part of the conversation and really go and deliver the true single pane of glass across on-prem that we are doing, the cloud-native that we'll do shortly, and then our first SaaS app.
Then obviously the service is only in the US right now. There's a ton to do to seamlessly keep going across all the regions that AWS is bringing online literally every few months, and ultimately we're expanding the platform to be a true multi-cloud platform, which is how we designed it from the get-go, and really go and do something similar on other public clouds because we've got to deliver this as a true multi-cloud platform. Obviously at the end of the day, today it's only on top of one public cloud. That's what you'll see from us in the next few months.
The most intriguing part was the fact that you're working on doing data protection for the public cloud correctly. I mean, there are several solutions already out there that claim that they protect workloads on the public cloud but actually, if you look into them, they are just doing copies of EC2 machines and it is not enough. Most of the data is not in those instances. If you want to make it work and make it better, you need to back up databases in the cloud, especially in a complex environment with, as you said, multiple users, multiple locations, multiple everything. That could be a game-changer from this point of view. There are a lot of enterprises waiting for solutions like that.
Exactly, and a true full solution in the cloud also has to do it such that you're not just a glorified snapshot manager. Basically just building a piece of control plane software that, by the way, the customer still has to manage and run across accounts, across regions, and that is really just managing snapshots, is not the way to deliver true backups, because then you're not truly protecting your environment the way you’re protecting on premises. There is a lot of awareness right now about all of this.
Now there is a lot of awareness about backing up their SaaS solutions. People incorrectly assume that yeah, I'm using a SaaS solution and I know my data is backed up. That's not true. It's available; that doesn't mean it's backed up, especially if there was an attack or there are mistakes and stuff like that. Those are the things. We are basically going and attacking the problem the right way and delivering the right solution for our customers.
Can I ask you a little bit more about the internal architecture of the product? You mentioned AWS many times, so it's developed on top of AWS. How does it work?
Yeah, so we have announced it today as running on top of AWS, but if you look at our underlying architecture, there's nothing that ties us to that public cloud. We are basically deliver[ing] it with a set of abstractions internally so that we can essentially go and deliver it as a multi-cloud platform as we keep going. That's how we have built it internally.
Whenever we access a service, we have our own abstraction layer on top of it so we can essentially tie that abstraction layer to a different cloud if need be. If you look at how we have done this – we have basically done four things here. First, we have basically taken the functionality of the backup itself, all the scheduling, policies, tying it to data sources and stuff like that. That's one piece that a typical backup software vendor does.
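The internal abstractions mentioned here could look something like the following Python sketch, in which backup logic depends only on a provider-neutral interface. Every name in it (`ObjectStore`, `InMemoryObjectStore`, `BackupService`) is hypothetical and invented for this example; nothing in the conversation describes Clumio's actual interfaces.

```python
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Hypothetical provider-neutral interface; not Clumio's actual API."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> bytes:
        ...


class InMemoryObjectStore(ObjectStore):
    """Stand-in backend for the sketch; a real one would wrap a cloud SDK."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


class BackupService:
    """Backup logic that only sees the interface, never a specific cloud."""

    def __init__(self, store: ObjectStore):
        self._store = store

    def save(self, tenant: str, name: str, data: bytes) -> str:
        # Prefix keys with the tenant so tenants never share a namespace.
        key = f"{tenant}/{name}"
        self._store.put(key, data)
        return key

    def load(self, key: str) -> bytes:
        return self._store.get(key)
```

With this shape, moving to another cloud means writing one more `ObjectStore` subclass and passing it to `BackupService`; the scheduling, policy, and catalog code above the interface would not need to change.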
Then we have basically gone – number two is we have built this deduplication, compression, encryption engine on top of the public cloud, on top of object storage at scale that was traditionally done in a piece of hardware on the on-premises side. Then we have gone and delivered all of these pieces as a true multi-tenant solution so that every customer, when they on-board on the platform, they're obviously thinking of this as their own tenant but ultimately behind the scenes, it's a multi-tenant platform and hence, we can on-board a customer in less than 30 minutes because we have truly built it like a multi-tenant platform from the get-go while delivering all the features around a Google search for the data and data recovery and so on and so forth.
Last but not least, we have delivered all of this as a true service offering. Today, we have the service getting updated live with a new feature around bandwidth throttling, where customers can essentially throttle the bandwidth they want us to use in the BFE network connection, stuff like that. You get this feature without lifting a finger. You get this feature without even logging out and logging in. You just get this feature by refreshing your screen and suddenly, you see this pop up.
Delivering as a true service requires all of the CI/CD internally so that when developers write all that code on their laptops, it shows up in the service in the cloud, which was historically only done by the Netflixes and Googles of the world on the consumer side. We are basically delivering this on the enterprise side. All of these four things coming together is what basically is the Clumio platform as it stands today.
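A customer-set bandwidth limit like the one described above is commonly enforced with a token bucket. The Python sketch below shows that general technique under stated assumptions; the conversation does not say how Clumio implements the feature, and the `TokenBucket` class, its parameters, and the injectable `clock` argument are all made up for this example.

```python
import time


class TokenBucket:
    """Caps average throughput at rate_bytes_per_sec, with a burst allowance."""

    def __init__(self, rate_bytes_per_sec, burst_bytes, clock=time.monotonic):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last check, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def wait_time(self, nbytes):
        """Seconds the sender should wait before sending nbytes (0.0 if allowed now)."""
        self._refill()
        if nbytes <= self.tokens:
            return 0.0
        return (nbytes - self.tokens) / self.rate

    def consume(self, nbytes):
        """Take tokens if available; return True when the send may proceed."""
        self._refill()
        if nbytes > self.tokens:
            return False
        self.tokens -= nbytes
        return True
```

An uploader would call `consume(len(chunk))` before each send and sleep for `wait_time(len(chunk))` when it returns `False`. Sends larger than `burst_bytes` can never succeed in this sketch, so chunks would need to be sized below the burst allowance.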
It's very interesting that you build the platform to be multi-cloud because in the medium long-term, it will allow your customers to save also on the egress fees, so if you can deliver the platform from Amazon for Amazon customers, it will be less expensive than having it deployed on Amazon for Azure customers. You can match the customers with the platform that they use the most probably.
Absolutely, and that is almost table stakes in the cloud world. You cannot expect the customer to essentially bear the egress cost. That's a non-starter. Obviously there are customers who might be willing to pay extra to have that cloud-to-cloud; that's one class of customers. But for everybody else, you have to provide a same-region, same-cloud experience. It also, by the way, requires you to truly scale out the platform so that it can run across multiple regions, because there are the GDPR reasons that you pointed out and there are reasons around egress, so that we can essentially run across the tens of regions on each cloud that are available today, around the clock.
That was a very, very nice conversation, Poojan. I really appreciate your time today. Maybe we can wrap up this episode with a few links on Clumio and maybe you can also share with us your Twitter handle if someone wants to keep the conversation alive on Twitter.
Absolutely, you can find a lot about us obviously on our webpage, Clumio.com. Our Clumio Twitter handle is pretty active. ‘Clumioinc’ is our Clumio Twitter handle. My personal handle is my first name. It's a very unique name, looks like, so I managed to get my personal handle. @Poojan is my personal handle, so yeah, I'd love to continue this conversation and Enrico, it was a phenomenal conversation I've been wanting to do for many years, so I really appreciate doing this. Thank you very much.
Thank you, and for our listeners, if you want to know more about data protection in a hybrid cloud environment, I'm working on research for GigaOm that covers exactly these topics. It will be out at the end of the year, so stay tuned for more information about that. Bye-bye.