With ‘GitHub for data’, Gable.ai wants to connect software engineers and ML developers

AI applications are booming. But to prevent them from breaking down, the data flowing into those apps must be high-quality — that is, reliable, complete and accurate.

That’s the problem Gable.ai is poised to solve as the Seattle-based startup launches out of stealth today with $7 million in seed funding. The company says it offers the first data collaboration platform that lets software and data/ML developers create and manage repeatable, high-quality data assets. Investors, who include founders of other data companies such as Kaggle and Hex, have dubbed it the “GitHub for data.”

“GitHub is actually impacting culture — it’s helping software engineers around the company communicate more effectively with each other,” said Chad Sanderson, CEO and co-founder of Gable.ai. “But it doesn’t exist for data at all.”

Gable.ai’s platform allows data producers and data consumers to work together, he told VentureBeat. It helps software and data developers avoid breaking changes to critical data workflows in their existing data infrastructure. Features include data asset detection by connecting data sources to the platform; data contract creation to establish data asset owners and set meaningful constraints; and data contract enforcement through continuous integration/continuous deployment (CI/CD) in GitHub.
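
To make the idea of a data contract concrete, here is a minimal Python sketch of the kind of check a CI job could run before a schema change is merged. It is an illustration only, not Gable.ai’s implementation or API: the `DataContract` and `FieldRule` classes, the `find_violations` function, and the `shipments` asset with its owner and fields are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    """Constraint on one field of a data asset (hypothetical shape)."""
    dtype: str
    nullable: bool = False


@dataclass(frozen=True)
class DataContract:
    """Illustrative contract: an asset, its owner, and per-field rules."""
    asset: str
    owner: str
    fields: dict[str, FieldRule]


def find_violations(contract: DataContract,
                    proposed_schema: dict[str, FieldRule]) -> list[str]:
    """Compare a proposed schema with the contract and list breaking changes."""
    violations = []
    for name, rule in contract.fields.items():
        proposed = proposed_schema.get(name)
        if proposed is None:
            violations.append(f"{contract.asset}.{name}: field removed")
        elif proposed.dtype != rule.dtype:
            violations.append(
                f"{contract.asset}.{name}: type changed "
                f"{rule.dtype} -> {proposed.dtype}"
            )
        elif proposed.nullable and not rule.nullable:
            violations.append(f"{contract.asset}.{name}: became nullable")
    return violations


# Hypothetical contract owned by a producing team.
shipments_contract = DataContract(
    asset="shipments",
    owner="freight-platform-team",
    fields={
        "shipment_id": FieldRule("string"),
        "delivered_at": FieldRule("timestamp", nullable=True),
        "price_usd": FieldRule("decimal"),
    },
)

# Schema as it would look after a proposed code change.
proposed = {
    "shipment_id": FieldRule("string"),
    "delivered_at": FieldRule("timestamp", nullable=True),
    "price_usd": FieldRule("float"),  # type change downstream models may not expect
}

violations = find_violations(shipments_contract, proposed)
for message in violations:
    print(message)
```

In a setup like this, a CI step would fail the build whenever the list of violations is non-empty, so the producing engineer and the contract owner talk before the change ships rather than after a model breaks.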


The founders led the data division at Convoy

Before founding Gable.ai, Sanderson and his co-founders, Adrian Kreuziger and Daniel Dicker, led the data division at Convoy, a $4 billion digital freight network that moves thousands of truckloads daily through an optimized, connected network of carriers across the country. Complex data about shipments, shippers, facilities, carriers, trucks, contracts and prices came in fast and furious.

While the company had a modern data stack built on the latest technology, no one trusted the data: there were constant data quality issues, outages for valuable models, and billions of rows of unusable data.

“When our data science team and analytics team were trying to answer a simple question like ‘how many shipments did we make in the last 30 days?’, all that complexity made it almost impossible,” Sanderson said. “And that was the problem with machine learning: the models were very sensitive, and the data scientist needed to figure out exactly what data from this very complex system had to be fed into that model. When the data quality was off, when something suddenly changed, all these sensitive models started to break down and all the predictions they made were wrong.”

Ultimately, he explained, the problem was a communication gap between software engineers and ML developers. “Once we helped bridge that gap, we saw a rapid improvement in data quality,” he said.

To scale AI, communication issues around changes in data must be solved, Sanderson emphasized.

“If you don’t have a change management system for your data, you’re not going to be able to scale AI — you just can’t,” he explained. “The way the Googles and the Metas and the Amazons solved the problem was to throw bodies at the problem. When a new machine learning model is shipped, there need to be two, three, four data engineers in the room.” But at a company like Convoy, he explained, “we didn’t have the ability to do that. Our data engineering team was six people.”

A new part of the data stack

Data contracts are a completely new category that Gable.ai aims to establish as an emerging data primitive, that is, a basic data type. Over the past few months, Sanderson has built a community called “Data Quality Camp” of 8,000+ engaged data practitioners around these new concepts.

These concepts are meant to be a significant step toward reshaping the data landscape and to become a new part of companies’ data stacks, said Apoorva Pandhi, managing director at Zetta Venture Partners, which led the funding round.

“The founders of all the successful data companies, be it dbt Labs, Monte Carlo, Hex, Kaggle, Hightouch, Great Expectations, have invested in the company and affirmed the fact that it is an integral part of the data stack,” he said.

VentureBeat’s mission is to be a digital town square for technical decision makers to gain knowledge about transformative enterprise technologies and practices.
