Because at this point in the evolution of AI, the success and the future of real-world AI applications hinge on the quality of data. While ML/AI hardware infrastructure and modelling have made significant progress, it is time to move AI’s center of mass towards data. A campaign was needed to shift the community’s thinking, and there is no one better than Andrew to lead it.
The Akridata team couldn’t agree more! We, too, believe in Data Centric AI, and join this campaign to do our part to help drive this crucial shift.
As Andrew has said, data is the food of AI. Just as a person’s diet is critical to health and performance, how we fuel AI models is crucial to their success. The quality of data has a huge impact on how well the whole system works. Learning demands good and relevant data. Inference demands recognizing new data.
Why a Campaign?
Data is the most under-valued and de-glamorized aspect of AI, according to a Google White Paper that included a survey of 53 AI practitioners. In a sampling of recently published papers on AI, only 1% covered data-centric themes while 99% discussed model-centric topics.
Data work is unglamorous and underserved, yet increasingly critical, which makes for a challenging situation. Clearly, the ML/AI community needs more investment and strategic thinking around this new way forward. It also needs more evangelism and advocacy. So, like any great thought-leadership idea, a campaign was organized to drive change in thought and action. Key initiatives include a competition to select three winners who get the best performance out of a fixed AI model through data quality improvements; a new online course curriculum for Data Centric AI; and the NeurIPS Data-Centric AI workshop.
All these efforts are designed to drive an expansion, if not a full-blown pivot, of AI away from repeatedly tweaking models against fixed datasets, to improving the quality of data so the model does not have to make up for what’s lacking in data.
We at Akridata are big believers in the importance of the data side of the AI equation. For our part, we just launched a revolutionary new product, the Akridata Edge Data Platform, designed to fuel the autonomous world and make it easier for ML teams to ensure data quality and efficiency. What makes it revolutionary? Akridata enables the decentralized, holistic and end-to-end management of AI workflows and dataflows across the Edge, Core and Cloud – an industry first.
AI becomes complex as the scope of automation and decision making becomes more sophisticated. It turns out that real-world AI needs to see, feel and hear its surroundings, and that means a steady stream of rich, unstructured and uncurated data – the food of production AI.
Real-world AI data is very different from the nicely curated datasets that have characterized the work to date in ML. These fixed benchmark datasets have built-in redundancies, outliers and edge cases, each of which must be carefully considered in production AI. ML teams that want to move their work into production, rather than improve model quality as an “abstract goal,” have no choice but to work with the realities of production AI: the large, uncurated data volumes that underlie their problem domain. That’s why production AI deployment is such a challenge and why getting datasets “AI-ready” is crucial to moving AI to the next phase of real-world application.
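To make the idea of redundancies and outliers concrete, here is a minimal sketch in Python of one way they might be surfaced in a small batch of numeric readings. It is an illustration only, not a description of how the Akridata platform works: the function names, the sample readings and the 3.5 threshold are all assumptions chosen for the example, and the outlier rule used is the common modified z-score based on the median absolute deviation.

```python
from statistics import median


def drop_duplicates(values):
    """Remove exact repeats while keeping first-seen order (hypothetical helper)."""
    seen = set()
    unique = []
    for v in values:
        if v not in seen:
            seen.add(v)
            unique.append(v)
    return unique


def split_outliers(values, threshold=3.5):
    """Split values into (inliers, outliers) using the modified z-score,
    which is based on the median absolute deviation (MAD).
    The threshold of 3.5 is an assumed, commonly used default."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: nothing can be scored as an outlier.
        return list(values), []
    inliers, outliers = [], []
    for v in values:
        score = 0.6745 * abs(v - med) / mad
        (outliers if score > threshold else inliers).append(v)
    return inliers, outliers


# Made-up sensor readings with two repeats and one extreme value.
readings = [10.1, 10.3, 10.1, 9.8, 10.0, 55.0, 10.2, 9.8]
unique = drop_duplicates(readings)
inliers, outliers = split_outliers(unique)
print("unique:", unique)
print("flagged for review:", outliers)
```

Note that the sketch flags outliers rather than deleting them. In production AI an extreme value may be exactly the edge case a model most needs to see, so it deserves review, not automatic removal.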
Autonomy is an Exascale-Class Data Challenge
The emerging autonomous world depends on an avalanche of rich streaming real-time data. That’s a massive challenge. We see it as an Exascale-class data problem. In fact, we joined Exascale Day (October 18th, as in 10 to the power of 18) to help shed light on the importance of solving this data problem to drive AI forward.
Akridata has also spent the last three years focused on solving this AI data management challenge: how to manage the avalanche of continuous rich data generated at the Edge, how to find the relevant “scenes” to create “AI-Ready Data,” how to set it all up for learning and then inference, how the data flows to the cloud and back, how it enables “edge commerce,” and how to manage a holistic data environment that is geographically distributed and logically decentralized.
Data-Centric Edge-to-Core-to-Cloud Platform
Just last month we launched Akridata, the world’s first Edge Data Platform for Data Centric AI, which creates and manages a global namespace, publish/subscribe, smart data pipelines, and AI workflows spanning Edge-Core-Cloud resources. Think of Akridata as a revolutionary new decentralized database designed to manage the intensity of real-world AI data. It’s a smart software layer that helps Data Scientists and ML engineers organize, catalog, find, track-and-trace, and process “good relevant” data to drive AI into production.
In Data-Centric AI, data scientists are concerned with data volume, quality and consistency. They also spend an inordinate amount of time, 80% by most estimates, on data prep. The Akridata platform has been shown to deliver ten times faster access to the right data and two times better productivity for data scientists and ML engineers. With Akridata’s solution, real-world AI-enabled products can be brought to life.
AI’s Promise & the Future of Data Science
John W. Tukey was among the first “Data Science” thought leaders, pondering the importance of data in “The Future of Data Analysis” in 1962. Data Science Journal, dedicated to publishing papers on “the management of data and databases in Science and Technology,” was launched in the early 2000s, when Big Data hit the scene and the role of Data Scientist was essentially born.
Corporations and brands are now focused on ways to monetize their data. Figuring out how best to catalog, organize, store and retrieve that data has become a mission-critical function for corporations worldwide and is no longer the sole role of IT departments. With AI, those efforts become ever more important and complex.
AI and ML hold great promise in just about every industry sector, with lofty expectations to deliver lower costs, enhanced precision, improved customer experiences and plenty of innovation. But there are major barriers, and top of the list is data. Experts estimate 31 percent of ML projects die because of a lack of access to production-ready data.
Learn more from the Akridata team, led by our co-founders, who have solved some of the most perplexing problems on the planet. Schedule a consultation to learn more about the revolutionary Akridata platform and how it can help your ML/AI and Data Science team turn your innovative AI projects into reality.