Vernon Yai has spent his career safeguarding data and reshaping how organizations govern it, and in 2019 he stepped into a CIDO role at Malaysia’s largest property developer to turn that rigor into real-world outcomes. In this conversation, he reflects on moving from land and bricks to code and models, the stubborn analog gaps in inspections and legal workflows, and why “data first, AI second” is not a slogan but a sequencing discipline. We explore how to influence contractors you can’t mandate, build Innovation Squads that ship outcomes in 8–12 weeks, and embed controls that satisfy sovereignty without strangling speed. Threaded through are two grounded missions—expanding homeownership for M40 families and using AI to reduce construction defects—plus the venture-building lessons drawn from partnering with Antler Ibex.
You took on the CIDO role in 2019 at Malaysia’s largest property developer; what surprised you most in year one, which bets underperformed or exceeded expectations, and what metrics do you use today to judge whether transformation is actually moving the needle?
Year one, the biggest surprise was how dozens of disconnected systems could quietly dictate culture—work moved at the speed of the slowest integration, not the boldest idea. Our front-end digitization bet exceeded expectations, compressing discovery-to-booking steps in visible ways within the first 6–12 months, but the site and legal back-ends lagged. The underperformer was any AI built before data pipelines; that lesson cemented “data first, AI second.” Today I look at a tight set: cycle-time deltas across key handoffs, defect-to-handover trends, and the percentage of sprints (8–12 weeks) that land in production—not demos—because production is the only scoreboard that matters six years on.
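To make that scoreboard concrete, here is a minimal sketch of how those metrics might be computed from a sprint log; the field names and figures are invented for illustration, not drawn from the developer's actual systems.
```python
from statistics import mean

# Illustrative sprint log; field names are hypothetical.
sprints = [
    {"name": "defect-ai-1",  "reached_production": True,  "cycle_days_before": 21, "cycle_days_after": 14},
    {"name": "legal-docs-1", "reached_production": False, "cycle_days_before": 30, "cycle_days_after": 30},
    {"name": "booking-2",    "reached_production": True,  "cycle_days_before": 12, "cycle_days_after": 7},
]

# "Production is the only scoreboard": share of sprints that shipped.
production_rate = mean(s["reached_production"] for s in sprints)

# Cycle-time delta across key handoffs, averaged over sprints.
cycle_delta = mean(s["cycle_days_before"] - s["cycle_days_after"] for s in sprints)

print(f"sprints landing in production: {production_rate:.0%}")
print(f"avg handoff cycle-time reduction: {cycle_delta:.1f} days")
```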
Many pilots never reach production; what are the top three failure modes you’ve seen, can you share a project that stalled and how you revived it, and what stage-gates now govern scale-up?
The top three failure modes: building on brittle data, solving symptoms instead of outcomes, and skipping change management because “the pilot worked.” One AI inspection pilot stalled when our image capture varied by site; the model drifted as lighting and angles shifted. We rebooted with a data standards sprint first—defining capture protocols and QA—then retrained and only moved to production after a multi-site A/B run. Our stage-gates are now explicit: Gate 1 is problem clarity and baseline; Gate 2 is data readiness; Gate 3 is a controlled 8–12 week pilot with pre-set KPIs; Gate 4 is production hardening with rollback and cost-to-serve verified.
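As a sketch of how such gates can be encoded so they are honored rather than skipped, the snippet below models the four gates as an ordered checklist; the criteria wording and the blocking example are illustrative assumptions, not the actual governance artifacts.
```python
from dataclasses import dataclass

# Hypothetical encoding of the four stage-gates described above.
@dataclass
class GateResult:
    name: str
    passed: bool
    notes: str = ""

GATES = [
    "G1: problem clarity and measured baseline",
    "G2: data readiness (capture protocol, QA, lineage)",
    "G3: controlled 8-12 week pilot hits pre-set KPIs",
    "G4: production hardening (rollback plan, cost-to-serve verified)",
]

def next_gate(results: list[GateResult]) -> str | None:
    """Return the first gate not yet passed; gates must pass in order."""
    for gate in GATES:
        hit = next((r for r in results if r.name == gate), None)
        if hit is None or not hit.passed:
            return gate
    return None  # all gates passed: ready to scale

results = [GateResult(GATES[0], True),
           GateResult(GATES[1], False, "image capture protocol not yet enforced")]
print("blocked at:", next_gate(results))
```
The point of the structure is the ordering: a pilot cannot reach Gate 3 while Gate 2 is open, which is exactly what the stalled inspection pilot taught.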
Contractors can’t be mandated, only persuaded; what’s your playbook for influence without authority, which incentives or penalties actually changed behavior, and what adoption rates have you achieved by trade or region?
I start with shared pain and visible wins—if a foreman sees defects drop between week 8 and week 12, adoption becomes self-enforcing. We used milestone-linked bonuses tied to verified digital submissions and defect closure SLAs, and we withheld progress claims when mandatory safety checks weren’t logged digitally. More than penalties, transparency shifted behavior: league tables by site created peer pressure across trades. Adoption remained uneven by region, but once two challenge statements—homeownership and defect reduction—became shared narratives, momentum spread beyond any single mandate.
Middle-income (M40) families often face affordability and credit hurdles; how are you modeling risk and eligibility, what datasets and consents are essential, and which outcome metrics—approval times, default rates, or take-up—best prove progress?
We modeled risk around stability, not just scores: income consistency, expense rhythms, and tenancy histories where consented. Essential datasets were applicant-declared data, verified income records, and repayment proxies; every data element flowed only with explicit, time-bounded consent. We tuned the model to simulate stress at construction milestones, since cash-flow shocks often emerge then. Progress shows up first in approval times—measured in days, not weeks—and ultimately in take-up and sustained repayment, but we never move without the audits that a data protection specialist would insist on.
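A worked example helps show what “simulating stress at construction milestones” can mean in practice. The sketch below, with entirely hypothetical figures and a deliberately simple cash-buffer walk, is one plausible shape of such a check, not the actual model.
```python
# Minimal sketch of milestone stress-testing for affordability; real models
# would use consented, verified data and richer income/expense signals.
def survives_milestones(monthly_income, monthly_expenses, buffer,
                        milestone_shocks):
    """Walk milestone by milestone: each shock hits the cash buffer;
    the applicant passes only if the buffer never goes negative."""
    slack = monthly_income - monthly_expenses
    for shock in milestone_shocks:
        buffer += slack   # slack accumulates between milestones
        buffer -= shock   # e.g. progress-payment step-up, fit-out costs
        if buffer < 0:
            return False
    return True

# Illustrative applicant: RM6,500 income, RM5,200 expenses, RM4,000 buffer,
# three construction-milestone shocks.
print(survives_milestones(6500, 5200, 4000, [3000, 4500, 2500]))
```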
Reducing construction defects with AI sounds straightforward; which defect categories are you prioritizing, how do you capture reliable site data, and what baseline-to-target defect rates or rework hours are you tracking?
We started with visible, high-frequency categories—surface finish, alignment, and mechanical, electrical, and plumbing (MEP) terminations—before tackling latent issues. Reliability came from standardizing capture: fixed vantage points, lighting heuristics, and a minimum frame set per area, all enforced over an 8–12 week learning window. We tracked defects per unit from baseline to handover and correlated with rework hours logged by trade. Targets were staged reductions across sprints rather than a single cliff-drop; the key was seeing trend lines bend consistently by the second sprint.
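For readers who want to see the trend-line arithmetic, here is a small sketch that computes sprint-over-sprint reductions and the defects-to-rework correlation; all numbers are invented for illustration.
```python
# Hypothetical series: baseline, then three 8-12 week sprints.
defects_per_unit = [4.2, 3.6, 2.9, 2.4]
rework_hours     = [310, 265, 220, 180]   # logged by trade, same periods

def pearson(xs, ys):
    """Pearson correlation, computed directly for self-containment."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Staged reductions per sprint rather than a single cliff-drop.
deltas = [round(a - b, 2) for a, b in zip(defects_per_unit, defects_per_unit[1:])]
print("sprint-over-sprint reductions:", deltas)
print("defects vs rework correlation:", round(pearson(defects_per_unit, rework_hours), 3))
```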
The homebuying front end is digitized, but inspections and legal documentation remain analog; can you map today’s end-to-end journey, pinpoint the three most stubborn handoffs, and outline a step-by-step plan to close the last mile?
The journey runs from discovery and booking to construction progress, inspection, legal completion, and handover. The stubborn handoffs are site inspections to defect rectification, contractor updates into central systems, and legal document finalization. Our plan: first, unify data schemas so inspections and legal artifacts aren’t “attachments” but structured records; second, push offline-first capture on site with automated sync; third, digitize legal steps with templated clauses and consent-driven redaction to enable secure sharing. Each step shipped in successive 8–12 week sprints to avoid a big-bang stall.
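The “structured records, not attachments” idea is easiest to see as a schema. Below is a minimal sketch of what an inspection finding might look like as a typed record; the field names are hypothetical, not the production schema.
```python
from dataclasses import dataclass
from datetime import datetime

# An inspection finding carries typed fields that rectification and legal
# steps can query, instead of living inside an opaque attachment.
@dataclass
class InspectionFinding:
    unit_id: str
    location: str                     # e.g. "kitchen, MEP riser"
    category: str                     # e.g. "surface finish"
    severity: int                     # 1 (cosmetic) .. 5 (structural)
    photo_refs: list[str]             # evidence captured on site, synced later
    captured_at: datetime
    rectified_at: datetime | None = None

    @property
    def open(self) -> bool:
        return self.rectified_at is None

finding = InspectionFinding("U-1204", "kitchen, MEP riser", "alignment", 2,
                            ["img_0041.jpg"], datetime(2024, 3, 5, 10, 30))
print(finding.open)  # True until a contractor logs rectification
```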
A single development touches dozens of disconnected systems; which systems create the worst fragmentation, how are you tackling integration (APIs, lakehouse, MDM), and what governance model secures data quality without slowing delivery?
Fragmentation peaked at the junction of project management, contractor portals, and legal repositories—each with its own truth. We used APIs at the edges, a lakehouse for analytical workloads, and MDM for core entities like unit, lot, and party. Governance is product-based: each domain has a data owner, a steward, and sprint-aligned quality SLAs. The rule is simple—if it can’t be measured in a two-week increment, it won’t be controlled in a 12-month plan.
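A bare-bones illustration of the MDM piece: source systems keep their local IDs while a master index maps each to one golden key. The entity names follow the interview; the IDs and source-system names are invented.
```python
# Minimal MDM-style mastering for core entities (unit, lot, party).
master_index = {}   # (entity_type, source, local_id) -> golden_id

def register(entity_type, source, local_id, golden_id):
    master_index[(entity_type, source, local_id)] = golden_id

def resolve(entity_type, source, local_id):
    """Edge APIs translate local IDs to the golden key on the way in."""
    return master_index.get((entity_type, source, local_id))

# The same physical unit known under three different local "truths":
register("unit", "project_mgmt",      "PM-88-12A",  "UNIT-000912")
register("unit", "contractor_portal", "CP-7741",    "UNIT-000912")
register("unit", "legal_repo",        "LOT 88/12A", "UNIT-000912")

print(resolve("unit", "legal_repo", "LOT 88/12A"))  # UNIT-000912
```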
Data first, AI second is easy to say; what sequencing, milestones, and funding model make it real, and can you share a before/after case where better data pipelines unlocked measurable AI impact?
Sequencing starts with capture, lineage, and quality thresholds; only then do we greenlight feature engineering and model training. Milestones are measurable every 8–12 weeks: schema conformance, fill-rate, drift detection, and cost-to-serve. Funding follows gates—release the next tranche only when the prior data milestone lands. Before we fixed capture standards, our defect AI stalled; after standardization, we saw stable inference across sites, turning a brittle pilot into production that endured seasonality.
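Two of those milestones, fill-rate and drift detection, are simple enough to sketch directly; the thresholds below are illustrative assumptions, not the team’s real gates.
```python
# Sketch of two data milestones named above: fill-rate and a simple
# mean-shift drift check.
def fill_rate(records, field):
    present = sum(1 for r in records if r.get(field) not in (None, ""))
    return present / len(records)

def drifted(baseline_mean, current_values, tolerance=0.15):
    """Flag drift when the current mean moves more than `tolerance`
    (relative) from the baseline established at gate time."""
    current_mean = sum(current_values) / len(current_values)
    return abs(current_mean - baseline_mean) / baseline_mean > tolerance

records = [{"defect_count": 3}, {"defect_count": None}, {"defect_count": 1}]
assert fill_rate(records, "defect_count") >= 0.6    # illustrative gate
print(drifted(baseline_mean=2.0, current_values=[2.1, 1.8, 2.3]))  # False
```
In a gated funding model, checks like these are what release the next tranche: the numbers either clear the threshold or they do not.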
Your Innovation Squads run 8–12 week sprints; how do you charter problems, select cross-functional talent, and define exit criteria, and can you walk through one sprint’s week-by-week arc and resulting KPIs?
We charter around a single job-to-be-done with a baseline and a target anchored in production impact. Squads mix product, data engineering, domain SMEs, and site voices—at least one contractor perspective is non-negotiable. Exit criteria are production deployment, measured delta, and an operations playbook. Week 1 maps the problem and baseline; Weeks 2–3 nail data contracts; Weeks 4–6 build and field-test; Weeks 7–8 harden and train; and when a sprint runs the full cycle, the remaining weeks absorb staged rollout, so by Week 12 at the latest we ship with KPIs like cycle-time reduction and defect closure velocity.
An F1 pit crew mindset meets GLC governance; how do you reconcile speed with compliance, what decision rights or escalation paths keep momentum, and can you share a moment when the model was stress-tested?
We separated experimentation from exposure—sandboxes move fast, production gates are strict. Decision rights sit with product owners for scope, security for data use, and a steering group for funding; escalations have a 48-hour SLA so issues don’t idle. During a governance review, a legal digitization release paused; we resolved it by inserting an additional audit step without breaking the 8–12 week cadence. The tension is healthy—precision and pace can coexist if roles are crisp.
Venture-building with external partners can be messy; what made the Antler Ibex model valuable, how did you handle IP and commercialization, and which investor-style milestones helped internal stakeholders commit?
The value was forced customer obsession—interviews, prototypes, and pitching sharpened our problem statements. IP sat within ring-fenced entities with contribution logs so ownership was never a debate after the fact. Commercialization hinged on milestone triggers—prototype, pilot, production—mirroring investor rounds to unlock internal commitment. Keeping the two challenge statements in focus—M40 access and defect reduction—prevented scope creep.
Site technology adoption is notoriously uneven; how do you equip foremen and subcontractors, what offline-first or mobile workflows proved essential, and which training or coaching formats delivered durable usage on site?
We issued rugged devices where needed and designed workflows for zero bars of signal—capture now, sync later. Essential flows were punch list creation, photo-based evidence, and time-stamped confirmations. Training shifted from classrooms to shoulder-to-shoulder coaching in the first 2–3 weeks of a rollout, with quick reference cards on every device. By the next 8–12 week cycle, usage stabilized because the tools saved hours, not just clicks.
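“Capture now, sync later” has a recognizable shape in code: an append-only local queue that flushes when connectivity returns. The sketch below assumes a hypothetical upload hook and file path; a real implementation would add retries and deduplication.
```python
import json, time

class OfflineQueue:
    """Offline-first punch-list capture: write locally, sync opportunistically."""

    def __init__(self, path="punch_list_queue.jsonl"):
        self.path = path

    def capture(self, entry):
        """Append immediately to local storage; never blocks on the network."""
        entry["captured_at"] = time.time()
        with open(self.path, "a") as f:
            f.write(json.dumps(entry) + "\n")

    def sync(self, upload, online):
        """Flush the queue when a signal appears; keep entries otherwise."""
        if not online():
            return 0
        with open(self.path) as f:
            entries = [json.loads(line) for line in f]
        for e in entries:
            upload(e)                 # real code would retry and dedupe
        open(self.path, "w").close()  # clear only after a full flush
        return len(entries)

q = OfflineQueue()
q.capture({"unit": "U-1204", "defect": "surface finish", "photo": "img_0041.jpg"})
print(q.sync(upload=print, online=lambda: True), "entries synced")
```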
Digitizing legal and land documents raises security and sovereignty concerns; what controls, audits, and redaction methods are non-negotiable, and how do you balance user experience with regulatory obligations?
Non-negotiables include consent tracking, field-level encryption, immutable audit trails, and strict role-based access. We use templated redaction rules so sensitive fields never leave the boundary, and we log every view, export, and share. To balance UX with sovereignty, we front-load consent in human language, minimize re-consent prompts, and pre-fill where lawful so users feel progress, not friction. Sovereignty isn’t a blocker when you build it into the architecture rather than bolting it on as an afterthought.
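As an illustration of templated redaction paired with an audit trail, the sketch below strips rule-listed fields before sharing and logs a hash of what left the boundary; the rule names and fields are hypothetical, and real controls would add encryption and role-based access on top.
```python
import hashlib, json
from datetime import datetime, timezone

# Hypothetical redaction template: fields that never leave the boundary.
REDACTION_RULES = {"legal_title_deed": {"ic_number", "bank_account", "salary"}}

AUDIT_LOG = []  # a list here; production would use an append-only store

def share(doc_type, document, recipient):
    redacted = {k: ("[REDACTED]" if k in REDACTION_RULES.get(doc_type, set())
                    else v)
                for k, v in document.items()}
    AUDIT_LOG.append({
        "action": "share",
        "doc_type": doc_type,
        "recipient": recipient,
        "at": datetime.now(timezone.utc).isoformat(),
        "doc_hash": hashlib.sha256(
            json.dumps(redacted, sort_keys=True).encode()).hexdigest(),
    })
    return redacted

deed = {"owner": "A. Tan", "ic_number": "900101-14-5678", "lot": "88/12A"}
print(share("legal_title_deed", deed, "conveyancing-partner"))
```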
Talent fuels transformation; what’s the ideal mix of product managers, data engineers, and domain experts, where do you build versus buy, and which retention or productivity metrics tell you the team is healthy?
The core is a three-way braid—product to define value, data engineers to make it real, and domain experts to keep it honest. We build the spine—data engineering and product—and buy selectively for specialized models and tooling. Health shows up in stable 8–12 week delivery, low context-switching, and the percentage of sprints that hit production rather than proofs. Retention follows meaning: when teams see two concrete outcomes—homes unlocked for M40 and defects reduced—they stay.
Proving ROI in property tech can be long-cycle; how do you construct a benefits case across pre-sales, construction, and handover, what payback periods are acceptable, and how do you reinvest savings to compound value?
We map benefits across the lifecycle: faster pre-sales conversions, reduced rework during construction, and smoother handover reducing post-completion churn. Payback is staged to match 8–12 week increments—each sprint needs a contribution to a larger, six-year arc rather than a single cliff event. Savings recycle into data foundations first, because those amplify every future initiative. Compounding value is real when your second and third sprints cost less because the plumbing already exists.
Do you have any advice for our readers?
Anchor on two or three grounded problems and say them out loud until everyone can repeat them. Commit to “data first, AI second,” and enforce it with gates you actually honor. Treat contractors as partners and design for zero-connectivity realities; persuasion beats mandates in the long run. And remember: in a world of dozens of systems and 8–12 week clocks, progress is the quiet click when pilots become production—again and again.