Can You Trust Your Data to Power Your AI?

Dec 31, 2025
Interview

In a world racing to adopt AI, many enterprises are discovering that their greatest obstacle isn’t the technology itself, but the fragmented and inconsistent data it relies on. We’re joined by Vernon Yai, a data protection and governance expert who has spent his career helping businesses navigate these complex challenges. He specializes in building the kind of robust data foundations that not only support but accelerate AI innovation. Today, we’ll explore the deep-seated issues of data inconsistency that a recent survey of 100 senior IT leaders revealed are still plaguing businesses. We will delve into what it truly means for an organization to be “AI-ready,” how to build trust in AI through the critical lens of observability, and the practical steps leaders can take to free themselves from the costly trap of vendor lock-in.

The first episode argues that “semantic fragmentation” is still a widespread issue. Can you elaborate on the business impact of this fragmentation and provide a step-by-step example of how creating a unified semantic foundation can resolve inconsistencies and build trust within an organization?

It’s an incredibly pervasive and costly problem, one that I see cripple momentum in even the most forward-thinking companies. The business impact isn’t just about inaccurate reports; it’s about a fundamental erosion of trust. When your head of sales and your head of marketing show up to a meeting with different numbers for the same metric, like “customer acquisition cost,” the conversation derails. Instead of strategizing, they spend an hour arguing about whose data is right. This happens every day in countless organizations, wasting thousands of hours and breeding a culture of skepticism. Imagine the ripple effect: decisions are delayed, opportunities are missed, and people simply stop trusting the data.

To fix this, you need a unified semantic foundation. Let’s walk through it. First, you get the right people in a room—from sales, marketing, finance—and you facilitate a process to agree on a single, universal definition for that critical metric. This isn’t just a technical task; it’s a business negotiation. Once you have that definition, you codify it in a central semantic layer, a sort of universal translator for your data. This layer sits between your raw data and all your analytics tools. Finally, you re-route all your dashboards and reports to pull from this single source of truth. The result is transformative. Suddenly, everyone is speaking the same language. The arguments stop, and a palpable sense of confidence begins to grow.
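
To make that concrete, here is a minimal sketch of what a central semantic layer can look like in code. The metric name, formula, and field names are illustrative assumptions, not any particular product's API; the point is that the definition lives in exactly one governed place and every dashboard resolves it from there.

```python
# Minimal sketch of a central semantic layer: one governed definition of
# "customer acquisition cost" that every dashboard resolves instead of
# re-implementing its own formula. Names and fields are illustrative.
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    name: str                # canonical business name
    sql: str                 # the single agreed-upon calculation
    owner: str               # accountable business owner(s)
    certified: bool = False  # True once the governance council signs off
    synonyms: tuple = ()     # legacy names that should resolve to this metric


SEMANTIC_LAYER = {
    "customer_acquisition_cost": MetricDefinition(
        name="customer_acquisition_cost",
        sql="SUM(marketing_spend + sales_spend) / COUNT(DISTINCT new_customer_id)",
        owner="VP Marketing and VP Sales (jointly)",
        certified=True,
        synonyms=("cac", "acq_cost"),
    )
}


def resolve_metric(requested: str) -> MetricDefinition:
    """Every report and dashboard calls this instead of writing its own formula."""
    requested = requested.lower()
    for metric in SEMANTIC_LAYER.values():
        if requested == metric.name or requested in metric.synonyms:
            return metric
    raise KeyError(f"'{requested}' is not a governed metric; propose it to the council")


# Both the sales and the marketing dashboards now get the identical definition.
print(resolve_metric("cac").sql)
```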

Your second episode defines “AI-readiness” through capabilities like unified business definitions and robust governance. Could you share an anecdote where a failure in one of these fundamental areas derailed an AI initiative, and what specific governance changes were implemented to correct the course?

Absolutely. I worked with a major e-commerce company that was incredibly excited about implementing a predictive analytics model to manage inventory. The goal was to use AI to forecast demand and automate reordering to prevent stockouts. The problem was, they rushed into it without establishing robust governance or unified definitions. Their “daily sales” data was a mess; one system recorded a sale when an order was placed, while their warehouse system recorded it when the item shipped. The AI model was trying to learn from this chaotic, contradictory data, and its predictions were all over the place. Millions of dollars in inventory were on the line.

The project was a failure and had to be halted. The backlash was severe, and trust in the data science team plummeted. To get back on track, we had to go back to basics. We implemented a formal data governance council with representatives from every business unit. Their first mandate was to define and certify core business metrics, starting with “daily sales.” We then established a rule: no AI model could be deployed into production unless its training data was sourced exclusively from these certified, governed pipelines. It was a cultural shift, moving from a “fail fast” mentality to a “build it right” philosophy. It slowed them down initially, but it was the only way to build a foundation for scalable, trusted AI.
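
To show what that rule looked like in practice, here is a hedged sketch of a deployment gate. The pipeline names and the certification registry are assumptions invented for the example; the principle is simply that a model whose training inputs are not all certified never reaches production.

```python
# Sketch of the governance rule described above: a model is promoted to
# production only if every training input comes from a certified pipeline.
# Pipeline names and the registry are illustrative placeholders.
CERTIFIED_PIPELINES = {"daily_sales_certified", "inventory_levels_certified"}


def can_deploy(model_name: str, training_sources: set[str]) -> bool:
    """Block deployment if any training source is outside the certified registry."""
    uncertified = training_sources - CERTIFIED_PIPELINES
    if uncertified:
        print(f"Blocking {model_name}: uncertified sources {sorted(uncertified)}")
        return False
    print(f"{model_name} cleared for production")
    return True


# The original forecast model mixed order-time and ship-time sales feeds,
# so a gate like this would have stopped it before it touched real inventory.
can_deploy("demand_forecast_v1", {"daily_sales_raw_orders", "daily_sales_raw_shipments"})
can_deploy("demand_forecast_v2", {"daily_sales_certified", "inventory_levels_certified"})
```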

In discussing the “new imperative of observability,” your podcast highlights that confidence in AI transparency is alarmingly low. How do data lineage and explainability work together to build that trust? Please provide a concrete example of how this visibility helps an organization strengthen both security and compliance.

The low confidence is completely understandable. Executives are being asked to bet their companies on algorithms they can’t see inside of, and that’s a terrifying prospect. Data lineage and explainability are the two pillars that turn that black box into a glass box. Think of it this way: data lineage is the “what and where.” It gives you a complete, unbroken audit trail showing exactly which data sources fed the model, every transformation the data went through, and who accessed it along the way. Explainability is the “why.” It reveals the internal logic of the model—which features or variables it weighed most heavily to arrive at its decision. You need both to build real trust.

For a concrete example, let’s consider a financial institution using AI for fraud detection. A transaction gets flagged as potentially fraudulent. With data lineage, security teams can instantly see that the model used the correct, approved customer data sources and that no sensitive information was exposed. That’s a huge security win. Then, for compliance, when an auditor or a customer asks why the transaction was flagged, explainability provides the answer. It can show that the model flagged it because the transaction originated from a new location and was for an unusually high amount. This combination proves the system is not only secure but also fair and non-discriminatory, satisfying both security protocols and regulatory compliance requirements.
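
To sketch how those two pillars sit side by side for a single flagged transaction, here is a simplified example. The scoring rule, field names, and data sources are invented for illustration rather than taken from a real fraud model, but they show how the “why” (per-feature contributions) and the “what and where” (the lineage record) travel together.

```python
# Illustrative pairing of explainability (why the transaction was flagged)
# with lineage (which approved sources fed the decision). All names and the
# scoring rule are hypothetical, for demonstration only.
from datetime import datetime, timezone


def score_transaction(txn: dict) -> tuple[float, dict]:
    """Return a fraud score plus the per-feature contributions behind it."""
    contributions = {
        "new_location": 0.45 if txn["location"] not in txn["known_locations"] else 0.0,
        "amount_vs_typical": min(0.5, 0.1 * txn["amount"] / txn["typical_amount"]),
    }
    return sum(contributions.values()), contributions


txn = {
    "id": "TXN-1042",
    "amount": 9_500.00,
    "typical_amount": 300.00,
    "location": "Lagos",
    "known_locations": {"Boston", "New York"},
}

score, why = score_transaction(txn)

# Lineage: the approved sources that fed the decision, and when it was made.
lineage = {
    "transaction_id": txn["id"],
    "sources": ["core_banking.transactions (approved)", "crm.customer_profile (approved)"],
    "scored_at": datetime.now(timezone.utc).isoformat(),
}

# Explainability: the auditor-facing answer to "why was this flagged?"
print(f"score={score:.2f}", {k: round(v, 2) for k, v in why.items()})
print(lineage)
```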

The introduction mentions that embedded logic creates costly vendor lock-in. What are the common early warning signs of this problem, and what are the first three practical steps an IT leader should take to begin untangling their critical data from a specific vendor’s ecosystem?

Vendor lock-in is a subtle trap. It doesn’t happen overnight. One of the first warning signs is when your business teams can no longer self-serve. They have a simple business question, but the only way to answer it is by filing a ticket and waiting for a specialist—often a vendor consultant—to build a custom report. Another red flag is when your own IT team tells you they can’t easily integrate a new, best-of-breed tool because all your core business logic is hard-coded inside the incumbent vendor’s proprietary platform. It starts to feel like you’re paying a toll to access your own intelligence. The costs just keep climbing, and your flexibility keeps shrinking.

For an IT leader looking to escape this, the first practical step is to conduct a thorough audit to identify and map all the critical business logic currently embedded in the vendor’s tool. You have to know what you need to extract. The second step is to start architecting a universal semantic layer that is independent of any single application. This is where your business definitions and logic will live, serving as a stable, centralized hub. The third, and most crucial, step is to start small. Don’t try to boil the ocean. Pick one high-value domain, like sales analytics, and migrate its logic to your new semantic layer. Prove that you can serve the existing reports from this new, independent source. That success story will build the momentum you need to tackle the rest.
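
As a rough illustration of that first step, here is a small sketch of the kind of audit inventory I mean: mapping logic embedded in vendor reports to the governed metric it should migrate to. The report names and formulas are placeholders, not drawn from any specific tool.

```python
# Sketch of step one: inventory the business logic embedded in the vendor tool
# so you know what must move into the independent semantic layer. All report
# names and formulas below are hypothetical placeholders.
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmbeddedLogic:
    report: str                          # where the logic lives today
    calculated_field: str                # the vendor-side calculated field
    vendor_formula: str                  # proprietary expression as found in the tool
    target_metric: Optional[str] = None  # governed metric it should map to, if known


audit = [
    EmbeddedLogic("Sales Overview", "CAC", "[Spend]/[New Customers]",
                  target_metric="customer_acquisition_cost"),
    EmbeddedLogic("Exec Summary", "Net Revenue", "[Gross]-[Returns]-[Discounts]"),
]

# Prioritise the first migration: logic with a governed target goes first;
# anything unmapped flags a definition gap for the governance council.
ready = [item for item in audit if item.target_metric]
gaps = [item for item in audit if not item.target_metric]
print(f"{len(ready)} fields ready to migrate, {len(gaps)} definition gaps to close")
```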

What is your forecast for trustworthy AI over the next five years, and what single capability will be the most critical for enterprises to master in order to achieve it?

My forecast is that within five years, “trustworthy AI” will no longer be a differentiator; it will be table stakes. The era of accepting black-box AI is ending, driven by a perfect storm of regulatory pressure, customer demand for transparency, and employee ethics. We will see a major shift where proving an AI system is fair, secure, and explainable becomes a non-negotiable requirement for it to even be deployed. Companies that fail to do this won’t just face fines; they will face a catastrophic loss of customer trust that could be impossible to recover from.

The single most critical capability to master will undoubtedly be AI observability. It’s the umbrella that covers everything we’ve discussed. You simply cannot achieve trustworthy AI without it. True observability provides that complete, end-to-end visibility—from data lineage and quality monitoring to model explainability and performance tracking. It is the only way to provide the accountability that regulators, executives, and customers will demand. Mastering observability isn’t just a technical challenge; it’s the foundational capability that will separate the leaders from the laggards in the next wave of the AI revolution.
