Apixio Is Bringing Big Data to Medical Records in the Cloud

Startup medical search company Apixio is trying to save lives by bringing advanced analytics to medical records, giving doctors a patient’s relevant medical history via a simple cloud-based search engine. Underneath the familiar veneer, the Menlo Park, Calif.-based company is applying machine learning and natural-language processing, two techniques that caught the public eye during the hype over IBM’s Jeopardy!-playing Watson system, to analyze each patient’s complete medical records so doctors can get a sense of the patient’s full history, even as the patient moves from one doctor to another. The goal is to make information-sharing among medical providers not only more prevalent, but also far more intelligent.

The foundation of Apixio’s business is its Medical Information Navigation Engine (MINE), a web-based application that analyzes structured and unstructured patient data to return relevant results when a care provider enters search terms. The way it works, as explained by CEO Shawn Dastmalchi, is that doctors and personnel within hospital ecosystems (i.e., the doctors, hospitals and clinics that operate under the same banner) upload data from their existing data-capture systems to Apixio’s cloud-based servers. The data can be pretty much anything, from forms to CT scan images to email messages. Medical personnel then search by patient, generally including a specific symptom or health issue, and the MINE system retrieves everything in that patient’s records related to that specific concern. MINE also uses semantic association to determine whether multiple results refer to the same thing and presents them as a combined result, so the searcher isn’t unnecessarily overwhelmed by data. According to Vishnu Vyas, a natural-language scientist at Apixio, the end product is like Google for doctors, only better, because it’s patient-centric and determines how data relate to one another.
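The combining step can be pictured with a small sketch. This is not Apixio’s code: the synonym table, the function name combine_hits and the sample file names are all hypothetical, and a real system would learn these associations rather than hard-code them. It only shows how hits that use different words for the same thing might collapse into one result.

```python
from collections import defaultdict

# Hypothetical synonym table: surface term -> shared concept label.
# The entries are illustrative and not taken from Apixio's system.
CONCEPT_OF = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "high blood pressure": "hypertension",
    "hypertension": "hypertension",
}


def combine_hits(hits):
    """Group (document, term) hits by the concept each term refers to."""
    combined = defaultdict(list)
    for document, term in hits:
        key = term.strip().lower()
        concept = CONCEPT_OF.get(key, key)  # unknown terms stand alone
        combined[concept].append(document)
    return dict(combined)


# Three hits from one patient's records, two of which mean the same thing.
hits = [
    ("discharge_summary.txt", "Heart attack"),
    ("billing_form.pdf", "MI"),
    ("clinic_note.txt", "Hypertension"),
]
grouped = combine_hits(hits)
```

Here the discharge summary and the billing form end up under a single entry, so the searcher would see two results instead of three.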

Bob Rogers, Apixio’s chief scientist, explained the importance of machine learning and unstructured-data analysis in the medical field. He said that because medicine is full of ontologies (area-specific terminology for everything from billing to scan results), any sort of search engine must be able to create degrees of association between the various ontologies, as well as everyday language. For example, when a doctor types a patient’s name and “chest pain” into the search box, MINE is able to find ontological references to chest pain that bear little resemblance to the actual term.
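A toy version of the “chest pain” example might look like the following. It is a sketch under stated assumptions, not a description of how MINE works: the RELATED_TERMS table, the record layout and the function names are hypothetical, and the related terms (including the diagnosis code) are illustrative stand-ins for associations a real engine would derive from the ontologies themselves.

```python
import re

# Hypothetical cross-vocabulary links for one everyday phrase. The entries
# (a clinical term, a shorthand, a diagnosis code) are illustrative only.
RELATED_TERMS = {
    "chest pain": {"chest pain", "angina", "cp", "r07.9"},
}


def mentions(text, term):
    """True if term appears in text as a whole token, ignoring case."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def search_patient(records, patient, query):
    """Return one patient's records that mention the query or a related term."""
    terms = RELATED_TERMS.get(query.lower(), {query.lower()})
    return [
        record
        for record in records
        if record["patient"] == patient
        and any(mentions(record["text"], term) for term in terms)
    ]


records = [
    {"patient": "Jane Doe", "source": "clinic note",
     "text": "Patient reports angina on exertion."},
    {"patient": "Jane Doe", "source": "billing", "text": "Dx code R07.9"},
    {"patient": "Jane Doe", "source": "email",
     "text": "Follow-up scheduled for knee surgery."},
    {"patient": "John Roe", "source": "clinic note",
     "text": "Chest pain resolved."},
]
found = [r["source"] for r in search_patient(records, "Jane Doe", "chest pain")]
```

Neither matching record contains the words “chest pain,” yet both come back, while the other patient’s note is left out because the search is scoped to one patient.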

Apixio is aiming to expand its capabilities to allow for population-wide searches and more-advanced queries, too, and it’s funding a project at Stanford University’s Center for Biomedical Informatics Research (CBIR) to help meet this goal. The CBIR team is applying “machine learning and natural language processing … approaches to unstructured data with the semantics encoded in medical ontologies to discover valuable knowledge from the unstructured portion of a medical record.” In layman’s terms, Rogers explained, this project is focused in large part on being able to determine cause-and-effect relationships from information included in the ontological data. Vyas said this might help hospital administrators, for example, determine who among their patient population needs to come in for a particular procedure based on a variety of seemingly unrelated factors. These could be particular drugs, symptoms or lifestyle choices that are associated with a particular condition, even if the patient hasn’t been diagnosed with the condition.
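To make the population-wide idea concrete, here is a minimal sketch. It assumes the associations have already been discovered, which is the hard part the Stanford project is working on, and every name in it (ASSOCIATED_FACTORS, flag_patients, the placeholder labels such as drug_a and condition_x, and the threshold of two factors) is hypothetical.

```python
# Hypothetical factors assumed to be associated with "condition_x".
# Placeholder labels are used on purpose; no medical claim is intended.
ASSOCIATED_FACTORS = {"drug_a", "symptom_b", "lifestyle_c"}


def flag_patients(population, condition, min_factors=2):
    """Names of undiagnosed patients showing at least min_factors factors."""
    flagged = []
    for name, record in population.items():
        if condition in record["diagnoses"]:
            continue  # already diagnosed, so no need to flag
        shared = ASSOCIATED_FACTORS & record["observations"]
        if len(shared) >= min_factors:
            flagged.append(name)
    return sorted(flagged)


population = {
    "patient_a": {"diagnoses": set(),
                  "observations": {"drug_a", "symptom_b", "lifestyle_c"}},
    "patient_b": {"diagnoses": {"condition_x"},
                  "observations": {"drug_a", "symptom_b"}},
    "patient_c": {"diagnoses": set(), "observations": {"symptom_b"}},
}
to_call_in = flag_patients(population, "condition_x")
```

Only patient_a is flagged: patient_b already has the diagnosis, and patient_c shows just one of the associated factors.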

IBM has been talking a lot lately about putting the question-answering technology that powers Watson to work in the health care field, but although Watson shares underlying technologies with Apixio, the two serve very different purposes. As I’ve explained before, Watson is better suited for situations more like the Jeopardy! competition that made it famous: providing possible answers (or diagnoses) based on specific questions. Instead of being populated with patient data like Apixio is, medicine-specific Watson systems will likely be filled with data from medical journals and other sources of general medical information.

Technologically, Apixio has a lot going for it in terms of big data expertise and cost-effectiveness. Apixio’s production servers run in Amazon EC2, although Dastmalchi says the company will eventually distribute them across multiple clouds for the sake of availability. It uses Hadoop and Pig for much of its analytic workload and the NoSQL Cassandra database as the infrastructure to serve search queries. The team has quite a bit of experience with Hadoop, because Vyas is a former Yahoo employee who used the technology heavily, as is Board Member and “Platform Thought Leader” Farzad Nazem, who spent 10 years as Yahoo CTO before retiring in 2007. Ultimately, Rogers said, Apixio wants to be able to return search results from hundreds of millions of patient records at sub-hundredth-millisecond latencies.

At present, Rogers says Apixio’s customers have uploaded data for millions of patients. As of the close of its Series A round in July, Apixio had raised $2 million from unattributed sources.

CT scan image courtesy of Wikimedia Commons contributor Stevenfruitsmaak.