// THOUGHT RESEARCH SERIES 8 min read May 20, 2025

What Enterprises Really Ask
Before Trusting AI with Consumer Research

“If we save time and budget but the direction is wrong, we’ll only drift further off course.”

R&D Lead, China

“Luxury purchases sometimes have no logic — even consumers themselves can’t explain what triggered the buy. How does AI handle this unpredictable humanity?”

Research Director, Switzerland

“On social media, ten people review a product — eight probably never bought it.”

Digital Director, France

“Can this duplicate and replace traditional market research?”

Product Research Lead, India

“I don’t care about consistency. What I care about is accuracy — whether what it infers actually matches what this person really thinks.”

Consumer Insights Lead, Malaysia

“AI tends to be overly rational — it lacks the irrational, unconventional reactions of real users.”

Research Team, United States

AI consumer simulation builds models of real consumers from behavioral data, then runs research with those models as you would with real respondents. The models are called synthetic consumers. The field is new, growing fast, and has no agreed standard for measuring its own accuracy. Nobody has established how far these digital constructs are from the real people they claim to represent in a live commercial scenario.

We build synthetic consumers. Over the past year, we sat across the table from consumer research teams at some of the world's largest companies — FMCG, luxury, food & beverage, automotive, internet, beauty — as they evaluated whether this technology could play a role in understanding their customers. We recorded the technical questions from 36 of these conversations.

After removing commercial and pricing discussions, we were left with over 150 distinct technical concerns. They cluster into four questions that every enterprise works through before deciding whether to trust AI-generated consumer insights enough to act on them. Taken together they draw the map of what “trust” actually requires in this space.

150+ technical concerns

What are enterprises really asking?

Category	Share of concerns	Core question
Data sourcing	21%	“Did you find the right people to model?”
Simulation fidelity	50%	“How accurately does it reconstruct a real consumer?”
Differentiation	12%	“Why can't I just use ChatGPT?”
Security & access	18%	“Can this get through our front door?”

01. Data sourcing — “Did you find the right people?”

When enterprises learn that synthetic consumers are built from online data, the reaction comes quickly. The skepticism isn't about AI — it's about online data as a source for understanding real consumers.

The concern surfaces in three layers.

The first is noise. A battery company's R&D team described trying to research their own product category on social media — and finding nothing but a wall of sponsored content:

"There's a ton of advertising in there… inherent bias. How do you validate that? How do you prove the insight you end up with is actually correct?"

Product R&D, United States

Behind the noise is the question of who is actually speaking. A fintech research team put it plainly: influencer accounts are a business unto themselves, with manufactured personas and scripted content.

"The persona itself is probably manufactured — it's not real."

Research Team, Singapore

A model trained on this material may only be learning a performance.

The second layer is structural absence. In many categories, the people generating social media data are categorically different from the people being studied:

"The actual users of adult diapers are elderly people in their sixties and seventies who basically don't use social media. The people discussing this category online are their children. Their children's views and the elderly person's own feelings may be completely different."

CMI Lead, United States

B2B faces the same gap — procurement directors and factory floor managers don't share purchase decision logic on social platforms. Luxury has a different version: in high-consideration categories, a large share of online reviewers never purchased the product. The data surface exists; the signal underneath it is someone else's behavior.
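One way to make the structural-absence concern measurable is to compare the age mix of the accounts feeding a model with the age mix of the category's actual users. The sketch below is a minimal illustration, not a description of any vendor's method. The age bands and shares are made-up placeholder numbers, and total variation distance is only one reasonable choice of gap metric.

```python
def total_variation_distance(source_shares, target_shares):
    """Half the sum of absolute differences between two share distributions.

    0.0 means the two mixes are identical; 1.0 means they do not overlap at all.
    """
    bands = set(source_shares) | set(target_shares)
    return 0.5 * sum(
        abs(source_shares.get(b, 0.0) - target_shares.get(b, 0.0)) for b in bands
    )


# Hypothetical placeholder shares, for illustration only.
# Who is posting about the category online:
online_posters = {"18-39": 0.50, "40-59": 0.25, "60+": 0.25}
# Who actually uses the product:
actual_users = {"18-39": 0.0, "40-59": 0.25, "60+": 0.75}

gap = total_variation_distance(online_posters, actual_users)
print(f"coverage gap: {gap:.2f}")  # coverage gap: 0.50
```

A large gap does not prove the resulting insight is wrong, but it flags that the people in the data are not the people being studied.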

The third layer is about depth. When a team asks how many real accounts a single synthetic consumer is built from, or what the minimum information required for a credible one looks like, they're asking a question the field hasn't answered:

"Harry Potter took seven books to build one character. You have 300,000 personas — what's each one's 'seven books'?"

Consumer Insights Lead, Malaysia

The underlying question is foundational: what is the minimum viable representation of a person, and what does the richest case look like? Without an answer, buyers have no way to know whether what they're paying for is genuine insight or sophisticated hallucination.

02. Simulation fidelity — “How accurately does the AI reconstruct a real consumer?”

2.1 Individual reconstruction — “How far is it from the real person?”

Over half the concerns we recorded land on “fidelity.” But the word conceals significant variation — teams asking whether synthetic consumers are “realistic enough” turn out to be asking three fundamentally different questions.

The insights lead at an FMCG group drew the line immediately. He'd tested multiple AI tools and found the industry's go-to quality check — asking the same question repeatedly and seeing if the answer holds — entirely beside the point:

"I don't care about consistency — whether it gives the same answer twice. I care about accuracy — whether what it infers matches what this person actually thinks."

Consumer Insights Lead, Malaysia

Consistency is a technical metric with established benchmarks. What he's asking for — whether the model can reconstruct how a real person actually makes decisions — is what the field calls fidelity. It's a business metric, and the industry has no agreed-upon way to measure it.
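The distinction can be made concrete with a small sketch. It assumes you hold real answers from the person a synthetic consumer is modeled on. The function names and sample data are hypothetical, and exact-match scoring is a simplification that works only for closed-ended questions.

```python
from collections import Counter


def consistency(repeated_answers):
    """Share of repeated runs that agree with the most common answer."""
    top_count = Counter(repeated_answers).most_common(1)[0][1]
    return top_count / len(repeated_answers)


def accuracy(predicted, actual):
    """Share of questions where the simulated answer matches the real person's."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Hypothetical data: one question asked four times of one synthetic consumer.
runs = ["brand A", "brand A", "brand A", "brand A"]

# Hypothetical data: four questions, model answer vs. the real person's answer.
model_answers = ["brand A", "yes", "weekly", "price"]
real_answers = ["brand B", "yes", "monthly", "taste"]

print(consistency(runs))                      # 1.0
print(accuracy(model_answers, real_answers))  # 0.25
```

In this toy case the model is perfectly consistent and mostly wrong, which is exactly the situation the consistency check cannot detect.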

A luxury research director in Switzerland approached from a different angle. His team relies on depth interviews to surface the contradictions and irrational impulses that actually drive purchase decisions. His concern is that AI smooths away precisely the messiness that makes insight useful:

"AI tends to converge results toward a rational, explainable mean. But real consumer decision distributions aren't normal."

Research Director, Switzerland

Preferences and attitudes leave traces and are tractable to simulate. But an internet platform's research lead found that once you cross category boundaries, the picture shifts entirely:

"One person's decision logic for buying milk versus buying a tech device might be completely different."

Research Lead, China

This isn't the familiar attitude-versus-behavior gap. It's that a single person runs fundamentally different decision-making processes depending on the category — and whether simulation technology can follow someone across those switches remains unresolved.

Some teams pushed the questioning beyond individual accuracy into temporal and group-level concerns. An edtech company ran several rounds of testing and found the same synthetic consumer producing different answers over time:

"You ask the same question at a different time, you get a different answer — is that simulating how people change, or is the model itself unstable?"

Research Team, United States

In group settings, a different worry emerged. When multiple synthetic consumers were assembled into a focus group, the uniformity of their responses was striking:

"Every persona gave the same answer to the same question — that's not what happens when you're sitting across from real people."

CMI Lead, China

If every synthetic consumer is permanently stable and highly consistent, what a researcher holds isn't a population — it's a polished mean. The variance within groups, the drift of individuals over time — the things traditional research builds expensive longitudinal panels to capture — are precisely what AI is most likely to flatten out.
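A rough way to check for this flattening is to compare how spread out a synthetic group's answers are against a human group's answers to the same question. The sketch below uses Shannon entropy as the spread measure. That choice, the function name, and the sample answers are all assumptions made for illustration, and free-text answers would first need to be coded into categories.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (in bits) of the answer distribution; 0.0 means everyone agreed."""
    total = len(answers)
    counts = Counter(answers).values()
    return sum(-(c / total) * math.log2(c / total) for c in counts)


# Hypothetical focus-group answers to one question, for illustration only.
synthetic_group = ["too expensive"] * 4
human_group = ["too expensive", "love the design", "too expensive", "would not switch"]

print(answer_entropy(synthetic_group))  # 0.0
print(answer_entropy(human_group))      # 1.5
```

A synthetic group whose entropy sits well below the human group's on the same questions is behaving like a mean, not a population.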

2.2 Capability boundaries — “Which research scenarios work, which don't?”

The second line of questioning skips “how close” and goes straight to testing limits against specific use cases.

A food company tried using synthetic consumers to screen new flavors and hit a wall immediately: the model has no body.

"For something like food and beverage — where you need to actually taste it to have an opinion — on what basis is AI making sensory judgments?"

Food R&D Lead, China

An internet company's UX team ran into the same barrier. They need the immediate responses of users swiping, tapping, hesitating, abandoning — and the synthetic consumer has never interacted with their product:

"We need feedback from users who've actually used the product. It's never touched our interface — what's its opinion based on?"

UX Research Lead, Singapore

It's not just the outcome that matters — the interaction process itself carries information:

"Most of what we're testing isn't a static screen — it flows."

Product Lead, Singapore

New product teams face an even more extreme version. When the product hasn't launched yet, the synthetic consumer has no experience to draw on — only analogical reasoning:

"The product isn't live yet and no general-purpose LLM has ever seen this product form. Can you really get useful feedback just by describing the concept in a prompt?"

Product Innovation Team, China

A beauty brand hit yet another boundary when testing packaging visuals. The AI's choices appeared to follow its own aesthetic pattern rather than reflecting how real consumers evaluate design:

"AI seems to favor high-contrast colors when selecting packaging — we suspect that's the model's own visual bias, not how real consumers actually choose."

Brand Team, China

Scenario	Experiential cognition required
01 Food & beverage	Taste and sensory perception
02 UX & interface testing	Swipe, tap, and real-time interaction
03 Unlaunched products	Analogical reasoning beyond training data
04 Packaging & visual design	Aesthetic judgment and visual preference

These scenarios share a common structure: they require first-hand experiential cognition — the body's sensory response, the fingertips navigating an interface, the genuine reaction to encountering something for the first time. Simulation technology cannot provide this in principle. No one has mapped where these boundaries lie; each company is discovering them through trial and error.

2.3 Knowledge reliability — “Can you trust what it says?”

The third line of questioning moves past “how close” and “where does it work” to a more fundamental risk: whether the information a synthetic consumer outputs is itself reliable.

LLM hallucination — output that sounds coherent and logically consistent but is entirely fabricated — is a known problem. It's easy to catch in everyday contexts, but once professional domains are involved, the stakes rise sharply:

"People inside the industry can spot the obvious errors right away."

R&D Lead, China

The operative phrase is “inside the industry.” In one food R&D evaluation, a synthetic consumer's description of a testing method raised immediate alarm:

"The AI told us consumers could use test strips to detect it — we checked, and that's simply not possible with current technology."

R&D Team, Singapore

A similar case surfaced in a discussion of baking formulations, where the synthetic consumer confused the application contexts of a specific ingredient:

"AOP butter goes in bread and pastry, not cake. If a real person made that mistake in an interview, we'd throw out their data."

CMI Lead, China

In traditional research, human respondents getting facts wrong isn't unusual — you filter them out and continue. But when the same error comes from a synthetic consumer, nobody's first instinct is to “filter out and retry.” The instinct is to question whether the entire system is trustworthy. The same factual mistake: from a human, it's sampling noise; from a machine, it's systemic risk. That asymmetry in tolerance is itself worth examining.

At this point in our conversations, the dynamic shifted. Enterprises stopped only probing for weaknesses and started offering paths forward — could domain knowledge be injected into the model?

"Could we do some optimization on our end… help it understand rigid packaging better?"

R&D Lead, Vietnam

A healthcare platform went further, arguing that the depth of industry-specific training data requires collaboration from within the sector:

"Pharma-level corpus depth — probably only a platform like ours could provide that kind of partnership. You'd need full-text comprehension of their academic positions."

Industry Platform, China

Others proposed connecting the model to the enterprise's existing proprietary data:

"I have an ingredients database… how do I get the model to actually use that in research?"

R&D Team, India

Of the three lines of questioning in this section, this is the only one where enterprises shift from challenger to co-builder. The first two probe the ceiling of what simulation can do; this one begins discussing how to raise it.

03. Differentiation from general AI — “Why can't I just do this with ChatGPT?”

This question tends to arrive at a specific moment: after a team has become interested and is now thinking about how to justify the investment internally.

"I've been trying to give ChatGPT memory and context to simulate consumers. Every platform is doing something similar. What's the essential difference?"

Innovation Lead, Switzerland

The underlying situation is the same across these conversations: a team has already prompted a general-purpose LLM to role-play as a consumer segment, gotten plausible-sounding results, and now needs to understand whether a purpose-built system produces meaningfully different outputs, different enough to justify procurement, integration, and organizational change.

There's also the question of defensibility. An industry partner raised a concern about large consulting firms building similar tools:

"If any company with LLM capability can do the same thing, what's the actual technical barrier?"

Strategy Lead, Switzerland

At its core, this category is a procurement justification question, not a technical one. When a team asks “what's the essential difference?”, they're asking for language they can use internally — with procurement, with IT, with a CFO who already has a ChatGPT subscription. A compelling answer isn't primarily technical. It's about the standards of evidence the tool is held to, and whether those standards are visible and auditable.

04. Data security & enterprise access — “Can this even get through our front door?”

One global company's IT department blocked access to the product website on the first attempt — the company's security policy screens all AI tools by default. The evaluation ended in its first five minutes. This is not an unusual story.

The concerns fall into three categories: data flow (does input leave the organization's control?), data sovereignty (where is data stored and processed, and does that create jurisdictional compliance issues?), and training contamination (does user input feed back into external model development?). On that last point, the line is clear:

"The consumer insights we've accumulated are our core IP. Once uploaded to the platform, they can't leak out, and they won't serve as input for training external models — that's a red line."

R&D Lead, United States

Private deployment is the request behind many of these conversations — and where the gap between what enterprises need and what AI tools can currently offer is widest.

What the pattern reveals

Traditional research built trust through disclosed process: a methodology section you could read and evaluate. It stated how the sample was constructed, how participants were recruited, what the margin of error was, and which sources of error were known. AI simulation has no equivalent of a methodology section. The four categories above are the enterprise buyer's attempt to construct one, in the absence of the field providing it.

What each category is actually asking for:

Gate	Validity required	Core verification
01 Data sourcing	Population Validity	Confidence that the right people were the basis for the model. If the foundation population is misaligned or filled with noise, everything downstream is hallucination.
02 Simulation fidelity	Construct Validity	Confidence that the model captures what it claims to capture: irrational impulses, preference shifts, and decision-making detours that averages would erase.
03 Differentiation	Comparative Validity	Evidence that purpose-built simulation produces outputs materially different from general-purpose prompting, in a way that is auditable, repeatable, and traceable.
04 Security & access	Enterprise Operability	The governance infrastructure (sandboxed deployment, zero training-set contamination) that makes deployment possible at all.

These are not product-specific requirements. They are what the field needs to establish before AI consumer simulation functions as a standard research tool rather than an experimental one.

What comes next

We chose to publish these questions openly — rather than answering them one meeting at a time — because they belong to the entire field. We're working through them ourselves using train/test validation against real consumer data, blind-test protocols, and consistency scoring calibrated to human self-agreement baselines.
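To illustrate what calibrating against a human self-agreement baseline can mean, here is a minimal sketch under stated assumptions. It treats the rate at which real respondents repeat their own answers on a retest as the ceiling, and reports model-to-human agreement as a fraction of that ceiling. The formula, function names, and data are hypothetical and are not taken from any published protocol.

```python
def agreement_rate(first, second):
    """Share of items where two answer lists give the same answer."""
    return sum(a == b for a, b in zip(first, second)) / len(first)


def calibrated_score(model_vs_human, human_self_agreement):
    """Model agreement expressed as a fraction of the human retest ceiling."""
    if human_self_agreement == 0:
        raise ValueError("human self-agreement baseline must be positive")
    return min(model_vs_human / human_self_agreement, 1.0)


# Hypothetical held-out data for one respondent, five questions.
human_wave_1 = ["A", "B", "A", "C", "B"]
human_wave_2 = ["A", "B", "A", "C", "A"]  # same person, asked again later
model_answers = ["A", "B", "C", "C", "A"]

baseline = agreement_rate(human_wave_1, human_wave_2)  # 4 of 5 answers repeat
raw = agreement_rate(model_answers, human_wave_1)      # 3 of 5 answers match
print(round(calibrated_score(raw, baseline), 2))       # 0.75
```

The design choice here is that a model should not be penalized for failing to match answers that the real person would not reproduce either.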

We'll follow this map with a framework for how sampling, construction, and simulation connect as phases — where the quality of each determines the reliability of the next.
