Next in Tech Ep. 197: Data Pipelines for AI
S&P Global Market Intelligence by Eric Hanselman · Podcast · 32:05
"Data pipelines are having a DevOps moment, starting with a cultural and technical shift toward continuous integration and delivery."
On S&P's Next in Tech I argued that enterprise AI is won on data pipeline quality, not model size, and that data infrastructure is having a DevOps moment right now.
Eric Hanselman hosted me on Next in Tech to extend the data-pipelines argument I had been making in writing. The patterns for getting enterprise AI right already exist in the DevOps playbook, and data teams are now rebuilding them for a different medium.
We went deep on why smaller models with localized datasets beat pushing sensitive enterprise data into massive cloud models, and on the governance and cost controls that have to sit at every stage of the pipeline.
The DevOps moment for data
Enterprise AI data infrastructure is in the same place software delivery was a decade ago. DevOps introduced continuous integration and delivery to replace brittle manual deployment, and data teams are now building pipeline automation for model training, evaluation, and governance. The cultural shift matters as much as the technical one. Organizations have to treat data delivery with the same rigor they eventually brought to code delivery.
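One way to picture that rigor is a data check that runs like a CI test: a batch is promoted only if every check is green. The following is a minimal sketch, not a description of any particular tool. The names `check_batch` and `gate`, the record shape (a list of dicts), and the 5% null-rate threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_batch(rows, required_fields, max_null_rate=0.05):
    """Run CI-style checks on a batch of records (a list of dicts).

    The threshold and the checks themselves are hypothetical examples.
    """
    results = [CheckResult("non_empty", len(rows) > 0, f"{len(rows)} rows")]
    for name in required_fields:
        missing = sum(1 for r in rows if r.get(name) is None)
        # An empty batch counts as fully null so the gate fails closed.
        rate = missing / len(rows) if rows else 1.0
        results.append(
            CheckResult(f"null_rate:{name}", rate <= max_null_rate, f"{rate:.2%} null")
        )
    return results


def gate(results):
    """Return True only if every check passed, like a green CI build."""
    return all(r.passed for r in results)
```

Calling `gate(check_batch(rows, ["id", "text"]))` before a training or evaluation job plays the same role a passing test suite plays before a deploy.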
Start small, iterate locally
I advocate starting with smaller models and localized datasets. The approach is the same one I have run since Chef and Amazon: prove the pattern works at small scale, measure what matters, then expand. Enterprises that try to solve the data pipeline problem at full scale from the start end up building fragile architectures and burning through budgets before they learn what actually works.
Pipeline quality as competitive advantage
The companies that win at enterprise AI are the ones that build the best data pipelines. Clean data in. Reliable inference out. Governance and cost controls at every stage. The data delivery layer is where the next generation of developer tools and infrastructure companies will be built.
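To make "governance and cost controls at every stage" concrete, here is a small sketch of a pipeline that refuses to run a stage that would exceed its budget and records an audit entry for every stage it does run. Everything here is hypothetical: the `Stage` and `Pipeline` classes, the per-record cost model, and the audit log format are illustrative assumptions, not a reference design.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class BudgetExceeded(Exception):
    """Raised when running a stage would push spend past the budget."""


@dataclass
class Stage:
    name: str
    run: Callable[[list], list]
    cost_per_record: float  # hypothetical unit cost for this stage


@dataclass
class Pipeline:
    stages: List[Stage]
    budget: float
    spent: float = 0.0
    audit_log: List[dict] = field(default_factory=list)

    def execute(self, records: list) -> list:
        for stage in self.stages:
            cost = stage.cost_per_record * len(records)
            # Cost control: check the budget before the stage runs.
            if self.spent + cost > self.budget:
                raise BudgetExceeded(f"{stage.name} would exceed budget")
            records = stage.run(records)
            self.spent += cost
            # Governance: record what ran, what it produced, what it cost.
            self.audit_log.append(
                {"stage": stage.name, "records_out": len(records), "cost": cost}
            )
        return records
```

Starting with a small budget and a local dataset, the audit log shows where records and money go at each stage before anything is scaled up.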
Further reading
- The Data Pipeline is the New Secret Sauce — The written articulation of this same thesis, at Heavybit in September 2024.
