Synthetic Human Insight Data: Generative AI Solutions for Market Research

Jul 27, 2023 | AI, Our Thinking, Technology + Media

By Maury Giles, Chief Growth Officer at Heart+Mind Strategies

Talk about generative AI is easily found everywhere you look today. And the tone and tenor seem to suggest the two camps of AI enthusiasts and skeptics are further entrenching. We are seeking practical use cases to advance what we do at Heart+Mind Strategies — synthesize patterns in data to drive stronger human connections.

The accessibility and advancement of large language models (LLMs) powered by machine learning and artificial intelligence make this technology a potent insights tool. These text-based generative AI engines convert human-created meaning through language into novel synthetic data with meaning strategic thinkers can leverage. 

Synthetic Data Defined: Artificially Generated Data

Synthetic data has long been been used, through machine learning and artificial intelligence, to address problems in data science and to support scenario exploration. In short, it is artificially generated data that imitates real data distributions, patterns, and characteristics. 

ChatGPT-4 offers this easy to follow explanation of synthetic data:

“Imagine you have a dataset of real information, like people’s names, ages, and favorite hobbies. Now, synthetic data is like a clever computer-generated version of that dataset. It’s made by a special computer program that looks at the real data and creates new, artificial data that’s very similar to the real stuff.

One of the cool things about synthetic data is that it doesn’t have any real people’s private information, so it’s safe to use and share without worrying about privacy concerns. But, even though it’s not real, it still looks and behaves like real data, which makes it super useful for different things.

For example, researchers can use synthetic data to test their new computer programs and make sure they work well. They can also use it to imagine what could happen in different situations without having to use real data. And, when there’s not enough real data, synthetic data can help by creating more examples to work with.

So, in a nutshell, synthetic data is like a smart and safe pretend version of real data that helps scientists, researchers, and computer programmers learn, create, and make better decisions.”

Common Uses of Synthetic Data

Synthetic data has been used to run data analysis while protecting privacy (personal information), augment data to increase the database size and improve modeling, develop and test algorithms, explore scenarios, share data across an organization without compromising data privacy, train artificially generated data sets, and to simulate anomalies or rare events. 

In every use case of synthetic data, a program creates data that closely mimics the real thing. This allows for experimentation, planning, and exploration in safe and creative ways.

New Human Insight Synthetic Data

Market researchers now have new synthetic data tools with generative AI that we need to learn how to use.  For example, text-based generative AI LLMs (I.e., ChatGPT and Bard) arm the insights professional with a clever new way to explore human thought, beliefs, values, and motivations. Ask the LLM a question and it will give a novel response that closely resembles what a live human would say. But the answers it “makes up” come from the LLMs understanding of meaning it has “learned” from massive language-based data sets originally created by live humans. 

Essentially, we have a tool with which we can have interactive conversations about nearly every topic that mimic human dialogue and reflect human thought. The conversations with a generative AI LLM are a reflection of what humans have written and shared about these topics. The answers are, in fact, synthetic data we can analyze to better understand people, markets, and culture.

RELATED: Practical Applications for AI in Business + Marketing

generative AI

Types of Synthetic Human Insight Data

At Heart+Mind we have already used generative AI for multiple applications of synthetic qualitative human insight data via ChatGPT and Bard. 

  1. Persona-based virtual in-depth AI interviews. These are simple role play conversations with the AI engine using the same discussion guide for a live interview with a human.
  2. Persona-based virtual focus group discussions. These are role play conversations in which the AI engine plays multiple “people” throughout a full session. A moderator conducts the exercise using the same discussion guide for a group with real people.
  3. Scenario exploration for how people, markets, and culture respond, act, or think under specific circumstances and/or conditions.

Earlier this year we compared in-depth interviews (IDIs) with the generative AI persona to IDIs with real people. The results taught us the importance of prompt engineering and the power of what can be learned. The outcomes closely mirrored up to 80% (more in some cases) of what actual humans told us in the interviews with the real people for whom the AI personas were created.

RELATED: Heart+Mind Shares Pioneering Experimental AI Study

Just recently we recreated a six-person focus group discussion using a role play activity with the generative AI tool. We gave each virtual “participant” a character with persona descriptors to direct the generative AI tool in how to inform the “dialogue” in the session. Check out the full text of the session here

We have also proven the potential for using the generative AI tool to produce unique human-centric responses under different market conditions with multiple persona definitions. This creates the possibility for synthetic data scenario planning tools.

And we are just getting started.

Meaningful Synthetic Data or Garbage Data?

An understandable critique of these generative AI tools from a trained researcher is to question the validity lacking an identifiable data source. How do we know if it is accurate and not a hallucination? 

The validity concern is irrelevant when you classify the text-based generative AI output as synthetic data or AI-generated “pretend data”. It is a reflection of meaning derived from a large language database on which the models have been trained.

In other words, we should not look at these role play interviews, group discussions, or scenario planning as definitive “truths”. Rather, they offer an informed exploration of a topic, idea, concept, or scenario harnessing the insights from documented human culture and meaning in words.

Such a tool can be a discovery exercise, a get-smart activity, a precursor to fielding custom research, or even a pre-test of a line of questioning before investing in the “real thing” with recruited participants. It also offers an incredibly immersive thinking tool to anticipate how specific target audiences may react in multiple scenarios when building strategy.

Reach Out To Explore Together

By definition, synthetic data will never be real data. But just as we have found great value in synthetic data uses with large datasets, there is great potential for synthetic text-based data with endless possibilities through existing generative AI tools.

We would love to partner with you on this journey. Reach out to Heart+Mind Strategies for creative and actionable ways to generate synthetic data we can help you use to champion the human in the strategies you create for the people you seek to connect within across your target markets.

Share This