A Guide to Shifting Data From Project to Product

Jan 22, 2026
Interview

In the complex world of data, Vernon Yai has carved out a reputation as a leading expert in data protection and governance. He brings a pragmatic, battle-tested perspective to an industry often lost in jargon, focusing on how organizations can build trust and extract real value from their data assets. His work centers on risk management and creating robust frameworks that safeguard sensitive information while empowering innovation. We sat down with him to discuss the core principles of designing modern data products, from establishing their initial value to ensuring their long-term success and adoption.

The conversation explores the practical application of foundational concepts like the GAP framework—Governance, Architecture, and People—and the DRIVE lifecycle for data products. We delve into the difficult balance between centralized standards and decentralized ownership, the architectural decisions that define data quality, and the critical importance of creating clear data contracts. Vernon also shares his insights on managing a data product post-launch and fostering a culture of continuous improvement, ensuring that these products evolve with the ever-changing needs of the business.

A core idea is that data products must deliver value, distinguishing them from data assets or projects. How should a team practically define and quantify that value before building anything? Please walk through the key steps and metrics you would use to build that initial business case.

That’s the fundamental starting point, isn’t it? If you can’t articulate the value, you’re just building another data pipeline that will likely end up in a technical debt graveyard. The first step is to stop thinking about data and start thinking about the business outcome. We need to clearly distinguish between financial and strategic impact. A financial metric might be “reduce operational costs by 15% through optimized logistics,” whereas a strategic one could be “improve customer retention by creating a more personalized experience.” The key is to make these metrics tangible. You have to ground the business case in what the consumer of the data will actually do with it. A data product isn’t just a dataset; it’s the dashboard, the API, or the ML model that someone interacts with to make a better decision. So, the value is measured by the impact of those decisions.

The GAP framework—Governance, Architecture, and People—is positioned as a foundational triad. Since “governance” can be a difficult concept to sell internally, how can leaders embed it successfully? Describe a few tactics for focusing on quality outcomes rather than on governance programs themselves.

The word “governance” often makes people cringe; they picture bureaucratic committees and endless approval cycles. I’ve found the most successful tactic is to stop trying to sell a “governance program” and instead focus the conversation entirely on delivering “high-quality, trustworthy data products.” You frame it as a non-negotiable requirement for value. Instead of saying, “We need a data governance initiative,” you say, “To trust this sales forecast dashboard, we need to ensure the data is accurate, compliant, and reliable. Let’s embed the necessary quality checks and ownership into the building process.” This shifts the focus from a political argument to a practical discussion about outcomes. Governance then becomes an embedded activity—part of the architecture that ensures robustness and part of the culture that fosters trust—rather than a separate, dreaded initiative.

The DRIVE lifecycle culminates in value extraction, while the CIA mindset emphasizes continuous improvement. How does this combination change how a team manages a data product after launch? Could you share some effective feedback loops for ensuring a product evolves with business needs?

It completely transforms the post-launch phase from a maintenance chore into an active product management cycle. The old way was “build it, ship it, and forget it until it breaks.” The combination of DRIVE and CIA means the product is never truly “done.” The launch is just the beginning of its value creation. One of the most effective feedback loops is creating a dedicated community around the product. This isn’t just a support channel; it’s a forum for users to share how they’re using the data, what new questions they have, and what’s not working. You can supplement this with regular, scheduled update sessions where the product team showcases recent improvements and gathers direct input. This ensures the product doesn’t become a static artifact but evolves in lockstep with shifting business priorities and new technological capabilities.

When deciding what to build, teams must weigh both impact and feasibility. Can you provide an example of a high-impact data product idea that was correctly deprioritized due to low feasibility? What specific architectural, people, or governance gaps typically undermine a project’s feasibility?

Certainly. I recall a retail company that wanted to build a real-time, global inventory optimization engine. The potential impact was enormous—millions in savings by preventing stockouts and overstocking. However, it was correctly deprioritized. The feasibility was incredibly low because of gaps across the entire GAP framework. Architecturally, their systems were all batch-based; they had no streaming infrastructure to handle real-time data from thousands of stores. From a people perspective, their teams were skilled in traditional data warehousing, not the complex event-driven architectures and ML modeling required. And the biggest killer was governance; each country had its own sovereign data laws and siloed systems, and there was no framework for securely sharing and integrating that data globally. The business case was great, but without the foundational readiness, it was destined for failure.

A hub-and-spoke organizational model aims to balance central standards with decentralized ownership. In this structure, what are the most common points of friction between a business-embedded Data Product Owner and central platform teams? How can they collaborate effectively to ensure both speed and quality?

The most common friction point is almost always the tension between standardization and customization. The central platform team is responsible for providing robust, scalable, and secure tools—the “paved road.” Their goal is consistency. The business-embedded Data Product Owner, on the other hand, is driven by the unique needs of their domain and wants to move fast to deliver value. They might see the central platform’s standards as restrictive or too slow. Effective collaboration hinges on clear communication and a shared understanding of trade-offs. The central team must treat the domain teams as internal customers, providing excellent documentation and support. The Data Product Owner must understand that adhering to central standards for things like security and data contracts ultimately protects their product and makes it more reliable. It works best when there’s a partnership, not a mandate.

The medallion architecture uses bronze, silver, and gold layers to mature data. How should a team decide which transformations belong in the silver layer versus the gold layer? Give an example of a business rule that would clearly belong in one and not the other, and explain your reasoning.

This is a critical distinction for maintaining a clean architecture. The silver layer is about conforming and cleansing the data, making it consistent and queryable across the enterprise. It’s where you standardize date formats, join different source tables, and validate data against enterprise-wide standards. The gold layer, however, is purpose-built for a specific business use case. It’s highly aggregated, denormalized, and refined for consumption by a dashboard or an ML model. For example, a rule like “unify all customer records from three different source systems into a single master customer entity” belongs in the silver layer. It’s a foundational, reusable transformation. In contrast, a rule like “calculate the 90-day rolling average of sales for our top 100 products” clearly belongs in the gold layer. It’s a specific business-facing aggregation designed for a sales performance dashboard and has no broader, foundational use.
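To make the split concrete, here is a minimal sketch of the two example rules in plain Python. It is an illustration, not something taken from the interview: the function names, the record shapes, and the choice to match customers on a normalized email address are all assumptions. The silver step produces a reusable master customer entity; the gold step produces one dashboard-specific aggregate.

```python
from collections import defaultdict
from datetime import date, timedelta


def silver_unify_customers(*source_systems):
    """Silver: merge customer records from several sources into one master entity.

    Matching on a trimmed, lower-cased email is an assumption made for this sketch;
    real entity resolution is usually more involved.
    """
    master = {}
    for records in source_systems:
        for rec in records:
            key = rec["email"].strip().lower()
            merged = master.setdefault(
                key, {"email": key, "name": None, "source_ids": []}
            )
            # Keep the first non-empty name seen and remember every source id.
            merged["name"] = merged["name"] or rec.get("name")
            merged["source_ids"].append(rec["id"])
    return list(master.values())


def gold_rolling_average(daily_sales, as_of, window_days=90, top_n=100):
    """Gold: average daily sales per product over a trailing window, top N only.

    daily_sales is an iterable of (product, day, amount) tuples. Days without
    sales count as zero because the total is divided by the window length.
    """
    start = as_of - timedelta(days=window_days - 1)
    totals = defaultdict(float)
    for product, day, amount in daily_sales:
        if start <= day <= as_of:
            totals[product] += amount
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return {product: total / window_days for product, total in ranked}
```

The silver function knows nothing about any particular dashboard, so other products can reuse its output. The gold function bakes in a window length and a ranking cutoff that only make sense for one consumer, which is why it sits in the later layer.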

Effective data retrieval relies on clear data contracts between producers and consumers. What are the essential, non-negotiable components of a data contract? Can you share an anecdote about what happens when a contract is poorly defined or not enforced, and how to prevent it?

The non-negotiable components are schema definition, data quality expectations, and a change management protocol. The schema must define every field, its data type, and whether it can be null. Quality expectations should specify things like freshness, completeness, and accuracy thresholds. And crucially, there must be a process for how schema changes are communicated and deployed. I saw a situation where a source system team changed a timestamp field from UTC to a local timezone without notice. It was a subtle change not covered in their informal agreement. The result was catastrophic: a financial reporting data product started showing transactions happening hours before they actually did, completely breaking the downstream reconciliation models. This could have been prevented with a formal data contract, enforced through automated schema validation in the pipeline, which would have immediately flagged the change and stopped the faulty data from corrupting everything downstream.
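The three components can be sketched as a small contract object plus a validation step that runs before data moves downstream. This is a hypothetical illustration using only the Python standard library: the class names, field names, and the `require_utc` flag are assumptions, not part of any specific tool. Note that a type-only schema check would not catch the UTC-to-local change described above, since the field is still a timestamp, so this sketch assumes the contract also records timezone semantics for the field.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Field:
    name: str
    type: type
    nullable: bool = False
    require_utc: bool = False  # only meaningful for datetime fields


@dataclass(frozen=True)
class DataContract:
    version: str              # bumped through the change management process
    fields: tuple             # schema definition
    max_staleness: timedelta  # freshness expectation
    freshness_field: str


def validate(contract, rows, now):
    """Return a list of violations; an empty list means the batch passes."""
    violations = []
    expected = {f.name for f in contract.fields}
    for i, row in enumerate(rows):
        if set(row) != expected:
            violations.append(f"row {i}: fields {sorted(row)} do not match contract")
            continue
        for f in contract.fields:
            value = row[f.name]
            if value is None:
                if not f.nullable:
                    violations.append(f"row {i}: {f.name} must not be null")
                continue
            if not isinstance(value, f.type):
                violations.append(f"row {i}: {f.name} expected {f.type.__name__}")
                continue
            # Naive datetimes return None here, local zones a non-zero offset.
            if f.require_utc and value.utcoffset() != timedelta(0):
                violations.append(f"row {i}: {f.name} must be timezone-aware UTC")
    # Freshness: compare the newest timezone-aware timestamp against `now`.
    stamps = [r.get(contract.freshness_field) for r in rows]
    aware = [s for s in stamps if isinstance(s, datetime) and s.utcoffset() is not None]
    if aware and now - max(aware) > contract.max_staleness:
        violations.append("batch is older than the contract's freshness limit")
    return violations
```

A pipeline step would call `validate` on each incoming batch and refuse to publish when the list is non-empty. In this sketch, a producer that starts sending local-time values would need to agree a new contract version first, which is the change management protocol in practice.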

Product adoption is treated as a critical final step, suggesting tactics like creating “change ambassadors.” Beyond identifying willing volunteers, what specific support, training, and communication channels do these ambassadors need from the product team to be truly effective in their roles?

Identifying volunteers is just the first step; empowering them is what makes the strategy work. First, they need exclusive access to the product team. This could be a dedicated chat channel or bi-weekly office hours where they can get their questions answered directly and provide unfiltered feedback. Second, they need advanced training and a sneak peek at the product roadmap. This makes them feel like insiders and equips them to answer tougher questions from their peers. Finally, they need tangible resources. This includes well-crafted documentation, short video tutorials, and one-page summaries of “what’s in it for you” that they can easily share. By equipping them with knowledge, access, and materials, you turn them from enthusiastic users into genuine, effective extensions of your product team, building trust and adoption from within the business.

What is your forecast for the future of data products?

My forecast is that the line between “data teams” and “product teams” will continue to blur until it practically disappears. We’re moving away from a world where data is a technical backend service and into one where data is a first-class, value-driving product. This means that roles like the Data Product Owner will become as standard and essential as a traditional software Product Manager. The most successful organizations will be those that fully embrace this mindset, embedding data expertise directly into their business domains and measuring the success of their data initiatives not by the volume of data processed, but by the tangible business value extracted. The focus will shift entirely from building pipelines to delivering trusted, usable, and continuously improving products that people rely on to do their jobs better every single day.
