Ask ChatGPT to recommend running shoes today. Ask again tomorrow. You will almost certainly get a different list of brands, in a different order, with different citations. A major study from SparkToro and Gumshoe found there is less than a 1-in-100 chance any two runs of the same prompt return the same brand list. The odds of getting the same list in the same order? Less than 1 in 1,000.
TL;DR: AI search results are fundamentally non-deterministic. The same query produces different brands, different citations, and different rankings on nearly every run. This is not a bug — it’s how large language models work. The good news: while individual rankings are meaningless, visibility frequency (how often your brand appears across many runs) is a stable and trackable metric.
The Scale of the Problem
The SparkToro/Gumshoe study is the most rigorous look at AI recommendation consistency to date. 600 volunteers completed 2,961 runs across ChatGPT, Claude, and Google AI, testing 12 controlled prompts and 142 human-written variations.
The results were stark:
- Less than 1% chance of getting the same brand list twice from ChatGPT or Google AI
- Less than 0.1% chance of getting the same list in the same order
Separate cross-platform research reinforces the fragmentation. SearchAtlas tracking data found only 11% domain overlap between ChatGPT and Perplexity for identical prompts, and Ahrefs research found only 12% of URLs cited by AI systems overlap with Google’s top 10 organic results.
If you are tracking your AI search visibility by checking where you rank in a single query, you are measuring noise.
Why AI Search Can’t Give the Same Answer Twice
This is not a fixable software bug. It is a core property of how large language models generate text.
LLMs work by assigning a score to every possible next word, then picking one — not always the top-scored word, but a weighted random choice from the top candidates. Even when you set AI tools to their most predictable mode (called “temperature zero”), results still vary. Research from Thinking Machines Lab found that 1,000 runs of the same prompt at temperature zero produced 80 different completions.
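The weighted random choice described above works like temperature-scaled softmax sampling. Here is a minimal Python sketch with made-up scores for the next word; the brand names and numbers are purely illustrative:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, seed=None):
    """Pick the next token by softmax sampling over model scores (logits).

    At temperature 1.0 this is a weighted random draw; lowering the
    temperature sharpens the distribution toward the top-scored token.
    """
    rng = random.Random(seed)
    scaled = [score / temperature for score in logits.values()]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(list(logits.keys()), weights=weights, k=1)[0]

# Hypothetical scores for the word after "best running ..."
logits = {"shoes": 4.0, "shorts": 2.5, "routes": 2.0, "playlist": 0.5}

# Two draws with different seeds can pick different words, which is
# exactly why the same prompt varies between runs.
print(sample_next_token(logits, temperature=1.0, seed=1))
print(sample_next_token(logits, temperature=1.0, seed=7))
```

The draw usually lands on the top-scored word, but not always, and production systems do not expose a seed you can pin.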
The primary culprit is how AI servers handle many requests at once. When multiple queries are batched on the same hardware, the floating-point numbers behind each answer get added in different orders, and those tiny rounding differences snowball into different word choices in the final text. A technical fix exists that would remove this randomness, but it roughly doubles the time needed to generate each answer, and no major platform has implemented it.
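The rounding effect comes from a basic property of computer arithmetic: floating-point addition is not associative, so grouping the same numbers differently (as happens when requests are batched in different orders) yields slightly different sums. A toy demonstration:

```python
# Floating-point addition is not associative: grouping the same numbers
# differently changes the result in the last few bits of precision.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c
right = a + (b + c)

print(left == right)   # False on standard IEEE-754 doubles
print(left - right)    # a tiny nonzero difference

# In an LLM server, millions of such sums feed each token's score.
# When batching changes the summation order, scores shift by amounts
# like the one above, and occasionally a different token wins.
```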
There are additional sources of instability. Model updates happen continuously. Retrieval-augmented generation (where AI pulls fresh content from the web before answering) uses indexes that change daily. Personalization, geographic signals, and session context all shift the output. The FINOS AI governance framework identifies non-determinism as a core risk category for AI systems precisely because it cannot be fully eliminated. NIST reaches the same conclusion in its Generative AI Profile (AI 600-1), flagging output reliability as a foundational trustworthiness concern for generative AI systems.
Citation Drift Is Real and Measurable
The instability compounds over time. SearchAtlas tracking data on citation patterns across AI platforms shows significant monthly drift:
| Platform | Monthly Citation Drift |
|---|---|
| Google AI Overviews | 59.3% |
| ChatGPT | 54.1% |
| Perplexity | 40.5% |
That means more than half the sources cited by Google AI Overviews in any given month will be replaced by different sources the following month. If your brand appeared as a citation in January, there is only about a 40% chance it will still appear in February, even if nothing about your content changed.
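The month-over-month numbers compound quickly. If you treat the measured drift rates as an independent monthly replacement probability (a simplifying assumption for illustration), the chance a citation survives n months is (1 - drift) raised to the n:

```python
# Probability a citation from month 0 is still present after n months,
# assuming each month's drift is independent (a simplifying assumption).
drift = {"Google AI Overviews": 0.593, "ChatGPT": 0.541, "Perplexity": 0.405}

for platform, d in drift.items():
    survival = [(1 - d) ** n for n in (1, 3, 6)]
    print(platform, [f"{s:.1%}" for s in survival])
```

Under that assumption, a Google AI Overviews citation has less than a half-percent chance of surviving six straight months, which is why the next section treats ongoing optimization as mandatory.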
The Good News: Visibility Frequency Is Stable
Here is where the data gets encouraging. While ranking position is meaningless, the set of brands that AI considers for a given topic is relatively stable.
The SparkToro study found strong consistency in which brands appeared at all:
- City of Hope (cancer treatment): appeared in 97% of 71 ChatGPT runs
- SmartSites (e-commerce agency): appeared in 89% of 95 Google AI runs
- Top headphone brands: 55-77% visibility across hundreds of runs
The same study tested 142 different human-written prompts about headphones. People phrased the question in wildly different ways — almost no overlap in wording. Yet AI tools recognized the same intent and returned brands from the same consistent pool.
This tells us something important: AI systems maintain a stable “consideration set” for each topic. Your goal is not to rank #1 in a single query. Your goal is to appear in that consideration set as frequently as possible. Understanding what GEO actually means is the first step toward building that consistent presence.
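The consideration-set behavior is easy to reproduce in a simulation. Drawing ranked lists at random from a fixed weighted pool (the brand names and weights below are invented) shows exact orderings almost never repeating while per-brand appearance frequency stays stable:

```python
import random
from collections import Counter

rng = random.Random(42)

# Hypothetical consideration set: each brand's weight is its chance
# of appearing in any single simulated answer.
pool = {"BrandA": 0.9, "BrandB": 0.7, "BrandC": 0.6, "BrandD": 0.3, "BrandE": 0.2}

def one_run():
    """One simulated AI answer: each brand appears with its own
    probability, and the surviving brands come back in shuffled order."""
    picked = [b for b, w in pool.items() if rng.random() < w]
    rng.shuffle(picked)
    return tuple(picked)

runs = [one_run() for _ in range(1000)]

# Exact ordered lists almost never repeat...
print(len(set(runs)), "distinct ordered lists in 1000 runs")

# ...but how often each brand appears tracks its weight closely.
counts = Counter(b for run in runs for b in run)
for brand, w in pool.items():
    print(brand, f"{counts[brand] / 1000:.0%} (weight {w:.0%})")
```

Position is noise; frequency is signal. That is the same pattern the SparkToro data shows at scale.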
What You Can Do
Stop tracking single-query rankings. A single AI query tells you almost nothing. You need 60-100+ runs per prompt to get statistically meaningful visibility data.
Measure visibility percentage instead. Track how often your brand appears across many runs of relevant prompts. A brand that shows up in 70% of runs has strong AI visibility. One that shows up in 15% of runs has work to do. That ratio is stable and actionable.
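The 60-100+ run guidance follows from binomial sampling error: each run is a yes/no observation of whether your brand appeared, so the margin of error on the visibility percentage shrinks with the square root of the run count. A sketch of the math, using a standard normal-approximation confidence interval (one common choice, not the only one):

```python
import math

def visibility_interval(appearances, runs, z=1.96):
    """Visibility rate with a 95% normal-approximation confidence interval."""
    p = appearances / runs
    margin = z * math.sqrt(p * (1 - p) / runs)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# 10 runs: the interval is too wide to say anything useful.
print(visibility_interval(7, 10))    # roughly 70% plus or minus 28 points

# 100 runs at the same rate: the margin narrows to about 9 points.
print(visibility_interval(70, 100))
```

At 10 runs you cannot distinguish a 45% brand from a 95% brand; at 100 runs you can, which is what makes the visibility metric actionable.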
Optimize for the consideration set, not position. Focus on the signals that get your brand into the pool AI draws from: topical authority, structured content, strong citation networks, and consistent entity information (your brand name, descriptions, and facts appearing the same way across the web).
Monitor across platforms. With only 11% overlap between ChatGPT and Perplexity citations, visibility on one platform does not guarantee visibility on another. Track each platform separately.
Expect and plan for citation drift. If more than half your citations change monthly, ongoing optimization is not optional. Freshness, regular content updates, and sustained authority signals matter more than any one-time fix. AI visibility has a measurable shelf life — understanding the decay mechanics helps you build a sustainable refresh strategy.
Common Questions
Why does AI search give different answers every time?
AI tools don't look up answers — they generate each word based on weighted randomness. Even when set to their most consistent mode, tiny math differences in how servers handle parallel requests cause different outputs. On top of that, the sources AI pulls from change daily, so the same question reliably gets a different answer each time.
Can I track my brand's ranking in AI search results?
Single-query rankings are statistically meaningless — there is less than a 1-in-100 chance of getting the same brand list twice. Instead, track visibility percentage: how often your brand appears across 60-100+ runs of the same prompt. This metric is stable and actionable.
How often do AI search citations change?
Citation drift is significant across all major platforms. Google AI Overviews shows 59.3% monthly citation drift, ChatGPT shows 54.1%, and Perplexity shows 40.5%. This means ongoing content optimization is essential — a single citation today may disappear next month even if your content hasn't changed.
Key Takeaways
- AI search results are non-deterministic by design, not by accident. The same prompt returns different brands, rankings, and citations on nearly every run.
- There is less than a 1-in-100 chance of getting the same brand list twice from ChatGPT or Google AI.
- Single-query rank tracking is meaningless. You need 60-100+ runs per prompt for statistical validity.
- Visibility percentage (how often you appear) is a stable, trackable metric. Ranking position is not.
- Citation drift exceeds 50% monthly on Google AI Overviews and ChatGPT. Ongoing optimization is required.
- AI systems maintain a stable “consideration set” for each topic. Your goal is to be in that set consistently.
- Cross-platform visibility varies dramatically, with only 11% domain overlap between ChatGPT and Perplexity.
References
- SparkToro & Gumshoe. “New Research: AIs Are Highly Inconsistent When Recommending Brands”. January 2026.
- Search Engine Journal. “AI Recommendations Change With Nearly Every Query”. 2026.
- Thinking Machines Lab. “Defeating Nondeterminism in LLM Inference”.
- FlowHunt. “Defeating Non-Determinism in LLMs”.
- SearchAtlas. “AI Results Keep Changing”.
- iPullRank. “Probability & AI Search”.
- FINOS. “Non-Deterministic Behaviour Risk”.
- NIST. “Generative AI Profile (AI 600-1)”. July 2024.