Database startup MemSQL said today that it has open sourced a new data transfer tool called MemSQL Loader, which helps users move vast quantities of data from sources like Amazon S3 and the Hadoop Distributed File System (HDFS) into either a MemSQL or MySQL database.
While moving data from one source to another may seem relatively straightforward, there are a lot of nuts and bolts in the process; if one thing goes awry, the whole endeavor can fail. For example, if you’re trying to move thousands of files and one fails to transfer for some reason, you may have to start the process over again and hope all goes well, according to the MemSQL announcement.
MemSQL Loader is essentially an automation tool that lets users set up multiple transfers and queues that can restart “at a specific file in case of any import issues,” the release stated.
From the MemSQL blog post explaining the tool:
“MemSQL Loader lets you load files from Amazon S3, the Hadoop Distributed File System (HDFS), and the local filesystem. You can specify all of the files you want to load with one command, and MemSQL Loader will take care of deduplicating files, parallelizing the workload, retrying files if they fail to load, and more.”
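To make those three behaviors concrete, here is a minimal Python sketch of a loader that deduplicates a file list, loads files in parallel and retries failures. It is not MemSQL Loader's code or its command-line interface; the names `load_files`, `load_with_retries` and `load_one` are hypothetical, and `load_one` stands in for whatever actually copies a single file into the database.

```python
from concurrent.futures import ThreadPoolExecutor


def load_with_retries(path, load_one, max_attempts):
    """Try to load one file up to max_attempts times; return (path, succeeded)."""
    for _ in range(max_attempts):
        try:
            load_one(path)  # hypothetical callback that loads a single file
            return path, True
        except Exception:
            continue  # treat any error as retryable in this sketch
    return path, False


def load_files(paths, load_one, max_workers=4, max_attempts=3):
    """Deduplicate paths, load them in parallel, and report what failed."""
    # dict.fromkeys drops duplicate paths while keeping first-seen order.
    unique_paths = list(dict.fromkeys(paths))

    # A thread pool spreads the per-file loads across several workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(
            pool.map(
                lambda p: load_with_retries(p, load_one, max_attempts),
                unique_paths,
            )
        )

    loaded = [path for path, ok in results if ok]
    failed = [path for path, ok in results if not ok]
    return loaded, failed
```

Because the function returns the list of files that still failed after all retries, a later run could be pointed at just those files rather than the whole batch, which is similar in spirit to the restart behavior the release describes.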
The new tool is available as open source under the MIT License and can be downloaded from GitHub.
MemSQL has been on a roll launching new tools and features since its 2012 inception. In September, Gigaom’s Derrick Harris reported that MemSQL now supports cross-data-center replication, which is good for disaster recovery in case a database takes a hit. Cross-data-center replication also helps distribute the load across two data centers, which could cut down on latency and boost performance.