June 2, 2026·8 min read·Technical

How Google's AI Actually Decides What to Cite — A Technical Breakdown

Lorena Ly

Founder

Most advice about ranking in AI search reads like guesswork. "Optimize for AI." "Make your content AI-friendly." But there isn't one algorithm — there's an ensemble of specialized systems, each handling retrieval, ranking, or language understanding, all running simultaneously.

This article covers how these systems actually work together to decide what gets cited in AI search, drawn from Google's own documentation and public statements from the engineers who built them.

The Two Core Technologies Behind AI Search

Retrieval-Augmented Generation (RAG)

AI Overviews don't generate answers from the model's training data alone. They use Retrieval-Augmented Generation:

1. Google's traditional ranking systems retrieve relevant pages for a query.
2. Those pages are fed to an LLM as context.
3. The LLM generates a synthesized response grounded in that retrieved content.
4. The response includes clickable citation links back to the source pages.

The LLM only works with pages the ranking systems already selected — making those ranking systems the gatekeepers of AI citations. Everything that matters for traditional search ranking also matters for AI citations.
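
To make the gatekeeping concrete, here is a minimal sketch of a retrieve-then-generate flow. It is a toy under stated assumptions, not Google's implementation: the `Page` type, the word-overlap scoring in `retrieve`, and the prompt wording are all hypothetical, and the LLM call itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str


def retrieve(query: str, index: list[Page], top_k: int = 2) -> list[Page]:
    """Stand-in for the ranking systems: score pages by shared query words."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(p.text.lower().split())), p) for p in index]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [page for score, page in ranked[:top_k] if score > 0]


def build_prompt(query: str, pages: list[Page]) -> str:
    """Number each retrieved page so an answer could cite it as [1], [2], ..."""
    context = "\n".join(f"[{i}] {p.url}: {p.text}" for i, p in enumerate(pages, 1))
    return f"Answer using only these sources.\n{context}\nQuestion: {query}"


def answer(query: str, index: list[Page]) -> dict:
    pages = retrieve(query, index)
    prompt = build_prompt(query, pages)
    # A real system would send `prompt` to an LLM here. This sketch stops at
    # the prompt and returns the only URLs the model could possibly cite.
    return {"prompt": prompt, "citable_urls": [p.url for p in pages]}


# Hypothetical two-page index.
index = [
    Page("https://example.com/crm", "crm tools for small agencies"),
    Page("https://example.com/recipes", "weeknight pasta recipes"),
]
result = answer("best crm for agencies", index)
print(result["citable_urls"])
```

The page that never makes it through `retrieve` never appears in the prompt, so no amount of generation quality downstream can cite it.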

Query Fan-Out

When a user asks a complex question, Google's AI generates related sub-queries to fetch additional results. A query like "best CRM for small agencies that integrates with Slack" might trigger sub-queries about CRM comparisons, Slack integrations, and small business software — pulling results from each. Your content doesn't need to match the exact query typed; it needs to match the sub-queries the system generates.
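
A rough sketch of the fan-out idea, assuming a hypothetical keyword-to-facet map in place of the model that generates sub-queries, and a toy dictionary in place of a real search index:

```python
def fan_out(query: str) -> list[str]:
    """Hypothetical sub-query generator; a real system would use a model."""
    facets = {
        "crm": "crm comparisons",
        "slack": "slack integrations",
        "small": "small business software",
    }
    words = query.lower().split()
    return [sub for key, sub in facets.items() if key in words]


def search(sub_query: str, index: dict[str, list[str]]) -> list[str]:
    """Toy lookup: return the URLs filed under a sub-query."""
    return index.get(sub_query, [])


def fan_out_search(query: str, index: dict[str, list[str]]) -> list[str]:
    """Run every sub-query and merge the results, keeping first-seen order."""
    merged: list[str] = []
    for sub in fan_out(query):
        for url in search(sub, index):
            if url not in merged:
                merged.append(url)
    return merged


# Hypothetical index keyed by sub-query.
index = {
    "crm comparisons": ["example.com/crm-roundup"],
    "slack integrations": ["example.com/slack-apps", "example.com/crm-roundup"],
    "small business software": ["example.com/smb-stack"],
}
query = "best CRM for small agencies that integrates with Slack"
print(fan_out_search(query, index))
```

None of the three pages matches the full query as typed. Each one enters the merged result because it matches a single sub-query.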

The Ranking Systems Ensemble

As Pandu Nayak (Google Fellow and VP of Search) put it: "Search runs on hundreds of algorithms and machine learning models... Each algorithm and model has a specialized role." Here are the major ones.

RankBrain (2015) — Concept-Based Ranking

RankBrain was Google's first deep learning system deployed in Search. It maps queries to conceptual meanings rather than matching keywords literally: a search for "consumer at the highest level of a food chain" maps to "apex predator" even though those words never appear in the query. It handles ranking (ordering results by conceptual relevance), not retrieval. Content that thoroughly covers a concept is therefore better positioned to rank highly in the retrieval set that feeds AI Overviews.

Neural Matching (2018) — Concept-Based Retrieval

Where RankBrain orders results, Neural Matching finds them. It understands "super fuzzy" concept representations, matching entire queries to entire pages even when surface-level terms don't overlap. For example, a search for "insights how to manage a green" gets connected to pages about color-based personality management frameworks — despite the query being nearly incomprehensible on its face. If Neural Matching doesn't recognize your content as conceptually relevant, the AI never sees it and can never cite it.
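
The difference between keyword matching and concept matching can be illustrated with a toy comparison. Everything here is an assumption for illustration: the two pages, the three hand-made concept dimensions, and the vector values. Real systems learn these representations and do not publish them.

```python
import math

QUERY = "insights how to manage a green"

# Hypothetical pages. The "vec" values are hand-made concept vectors with
# assumed dimensions (people management, personality frameworks, gardening).
PAGES = {
    "personality-guide": {
        "text": "working with people of different color personality types",
        "vec": (0.7, 0.9, 0.0),
    },
    "lawn-care": {
        "text": "how to keep a lawn green",
        "vec": (0.0, 0.0, 1.0),
    },
}
QUERY_VEC = (0.6, 0.8, 0.1)  # assumed encoding of QUERY, not a real model output


def keyword_overlap(query: str, text: str) -> int:
    """Count distinct words shared by the query and the page text."""
    return len(set(query.split()) & set(text.split()))


def cosine(a: tuple, b: tuple) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


by_keywords = max(PAGES, key=lambda p: keyword_overlap(QUERY, PAGES[p]["text"]))
by_concepts = max(PAGES, key=lambda p: cosine(QUERY_VEC, PAGES[p]["vec"]))
print(by_keywords)
print(by_concepts)
```

In this toy setup, word overlap favors the lawn page because it shares "how", "to", "a", and "green" with the query, while the concept vectors favor the personality guide despite zero shared words.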

BERT (2019) — The Language Understanding Layer

BERT handles both retrieval and ranking for nearly every English-language query. Its key capability: understanding how small words change meaning. A search for "can you get medicine for someone pharmacy" hinges on "for someone" — previous systems might have dropped those as filler. BERT understands they change the entire query. Content with natural, precise language where word relationships are clear aligns with what BERT is optimized to understand.

System	Year	Primary Role	Scope
RankBrain	2015	Ranking (ordering results)	Concept-to-query matching
Neural Matching	2018	Retrieval (finding candidates)	Fuzzy concept matching
BERT	2019	Both retrieval and ranking	Language understanding for ~every English query
MUM	2021	Specific applications only	Multimodal, multilingual understanding

MUM (2021) — The Most Powerful System Google Barely Uses

Google describes MUM as 1,000 times more powerful than BERT. It is trained across 75 languages and processes both text and images. But Google deploys it only for narrow applications: vaccine information, Google Lens, crisis responses. It is not used for general search ranking or retrieval. For today's AI citations, BERT, Neural Matching, and RankBrain are doing the heavy lifting.

The Supporting Cast — Systems That Shape the Candidate Set

Beyond the four major systems, several others directly shape which pages feed AI Overviews.

PageRank still matters. Links from authoritative sources remain a core trust signal. No amount of content quality overcomes a complete absence of authority signals.

Original Content Systems elevate original reporting and research over derivative content. When five sites cover the same story, these systems are designed to favor the site that broke it over the sites that merely cite it.

Passage Ranking evaluates specific sections within a page independently. A 3,000-word guide can have each section match different queries and be independently cited by AI Overviews. Well-organized pages with clear headings and self-contained passages are structurally more citable.
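
A small sketch of passage-level matching, assuming a hypothetical guide whose sections are stored by heading and a toy word-overlap score in place of a real relevance model:

```python
def score(query: str, passage: str) -> int:
    """Toy relevance: number of distinct query words found in the passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def best_passage(query: str, sections: dict[str, str]) -> str:
    """Score each section on its own and return the best-matching heading."""
    return max(sections, key=lambda heading: score(query, sections[heading]))


# Hypothetical page: heading -> section text.
guide = {
    "Pricing": "plans start with a free tier and scale by seat",
    "Slack integration": "connect slack to get deal alerts in channels",
    "Data migration": "import contacts from a csv export or another crm",
}
first = best_passage("slack deal alerts", guide)
second = best_passage("import csv contacts", guide)
print(first)
print(second)
```

The same page surfaces for two unrelated queries because each section is scored independently, which is why self-contained sections matter more than the page's overall topic.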

Freshness Systems weight recent content for time-sensitive queries. A page last updated in 2023 loses to one updated this quarter when freshness matters.

Helpful Content Signals were absorbed into core ranking in March 2024. People-first content is now rewarded by the core systems themselves; ranking-first content is penalized.

SpamBrain acts as a negative filter. Pages flagged for manipulative link building, auto-generated filler, or cloaking never make it into the retrieval set.

15% of Queries Have Never Been Searched Before

Google has confirmed that 15% of daily queries are completely new — hundreds of millions of never-before-seen searches every day. Pre-programmed rules can't handle queries that don't exist yet; concept matching can. This is why Google invested in RankBrain, Neural Matching, and BERT rather than maintaining lookup tables of queries to results.

For content strategy, this means you cannot anticipate every query that might surface your content — and you don't need to. If your content comprehensively covers a concept with the right depth, structure, and authority signals, Google's systems can match it to queries no one has imagined yet. This is the technical reality behind "write for topics, not keywords."

The Ensemble Effect — Why No Single Trick Works

All of these systems operate simultaneously. A page needs to perform well across conceptual relevance (RankBrain, Neural Matching), language precision (BERT), authority (PageRank), originality (Original Content Systems), structure (Passage Ranking), freshness, helpfulness, and spam signals — all at once.

Gaming one system while neglecting others doesn't work. The only strategy that performs well across the entire ensemble: create genuinely useful, original, well-structured content that demonstrates real expertise. That's not marketing advice — it's an engineering constraint.
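
One way to picture why a single strong signal cannot carry a page is a score that multiplies signals rather than adding them. The sketch below uses a geometric mean. This is purely an illustrative assumption: Google does not publish how its systems are combined, and the signal names and values are invented for the example.

```python
import math

# Assumed signal names, one per system family discussed above.
SIGNALS = [
    "relevance", "language", "authority", "originality",
    "structure", "freshness", "helpfulness", "spam_free",
]


def ensemble_score(page: dict[str, float]) -> float:
    """Geometric mean of all signals, so weak signals drag the total down."""
    values = [page[name] for name in SIGNALS]
    return math.prod(values) ** (1 / len(values))


# A page that is solid everywhere vs. one that maxes out a single signal.
balanced = {name: 0.7 for name in SIGNALS}
gamed = {**{name: 0.3 for name in SIGNALS}, "relevance": 1.0}

print(round(ensemble_score(balanced), 2))
print(round(ensemble_score(gamed), 2))
```

Under this assumed scoring, the page that is consistently good outscores the page with one perfect signal and seven weak ones.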

What This Means for AI Citations

Concept coverage beats keyword targeting. RankBrain and Neural Matching operate on concepts, not keywords. A page that thoroughly covers "customer data platform implementation for mid-market retailers" will match queries that use none of those specific words — as long as the conceptual coverage is comprehensive. Write naturally, use your audience's terminology, cover adjacent concepts.

Comprehensive pages beat scattered thin pages. Passage ranking lets Google cite specific sections of a longer page for different queries. One thorough, well-structured page can rank for dozens of queries through passage-level matching. Create definitive resources with clear headings where each section is a self-contained, citable passage — not separate pages for every keyword variation.

Original research creates structural advantages. Proprietary data, original surveys, first-hand case studies, and novel frameworks all register as source material rather than derivative commentary. If your content summarizes what others already published, those others get citation priority.

Authority still requires real links. Content without backlinks from relevant, authoritative sources faces a structural disadvantage regardless of quality. This isn't about link building in the traditional SEO sense — it's about creating content valuable enough that industry publications and professional communities reference it naturally.

Traditional search ranking and AI visibility are the same problem. AI Overviews use RAG — the AI generates responses from pages that Google's existing ranking systems retrieved. There is no separate "AI algorithm." Improve your traditional search ranking and your AI visibility improves with it. The two are architecturally coupled.

The Technical Reality vs. The Marketing Narrative

Marketing says AI search is a new paradigm requiring entirely new strategies, and one system (often MUM or "the AI") decides what to cite. The reality: AI search features are built on top of the same ranking systems that power traditional search. The AI doesn't think — it generates responses from retrieved pages. The retrieval is done by ranking systems that evaluate well-understood signals. Hundreds of specialized systems run simultaneously; no single system is in charge.

This distinction determines where you invest effort. Chasing "AI optimization" as a separate discipline leads to busywork. Investing in content quality, topical authority, and genuine expertise compounds across every ranking system — including the ones that feed AI search.

A Framework for Thinking About AI Citations

Based on how these systems actually work, here's a framework for evaluating whether your content is positioned to earn AI citations:

Question	System(s) Evaluating This	What Good Looks Like
Does your content comprehensively cover the concept?	RankBrain, Neural Matching	Thorough topical coverage using natural language, not keyword stuffing
Does your language precisely match user intent?	BERT	Clear, specific writing where word relationships convey accurate meaning
Is your content the original source?	Original Content Systems	First-party data, original research, novel analysis
Do authoritative sources reference your content?	PageRank	Earned editorial links from industry publications and experts
Are sections clearly structured and self-contained?	Passage Ranking	Clear headings, each section independently answers a specific question
Is your content current?	Freshness Systems	Regular updates, especially for time-sensitive topics
Was your content created for readers?	Helpful Content signals (core)	People-first content with genuine utility, not ranking-first content
Are your practices clean?	SpamBrain	No manipulative link schemes, no auto-generated filler content

Every "yes" makes your page more likely to appear in the retrieval set that feeds AI-generated responses. Every "no" is a reason the AI might cite a competitor instead.
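
The framework can also be kept as a simple audit script. This is a sketch for internal use, with shortened question labels that are my own paraphrases of the table rows; the answers are supplied by a human reviewer, not computed.

```python
# Shortened question label -> system(s) that evaluate it, per the table above.
FRAMEWORK = {
    "comprehensive concept coverage": "RankBrain, Neural Matching",
    "language matches user intent": "BERT",
    "original source": "Original Content Systems",
    "referenced by authoritative sources": "PageRank",
    "structured, self-contained sections": "Passage Ranking",
    "current": "Freshness Systems",
    "created for readers": "Helpful Content signals (core)",
    "clean practices": "SpamBrain",
}


def audit(answers: dict[str, bool]) -> list[str]:
    """List each question answered "no" with the system that evaluates it."""
    return [f"{q} ({FRAMEWORK[q]})" for q in FRAMEWORK if not answers.get(q, False)]


# Hypothetical review of one page: everything passes except two checks.
answers = {question: True for question in FRAMEWORK}
answers["original source"] = False
answers["current"] = False
gaps = audit(answers)
for gap in gaps:
    print(gap)
```

Unanswered questions count as "no", so a partially completed review errs toward flagging gaps.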

Where This Goes Next

The ensemble will evolve — MUM's capabilities will likely expand, new systems will be added — but the architectural pattern is set: multiple specialized systems evaluating different quality signals simultaneously. Sustainable AI citation performance comes from the same place sustainable search performance always has: being the most authoritative, original, and comprehensive source on topics within your domain.

The ranking system details in this article are drawn from Google's Ranking Systems Guide, Google's AI Optimization Guide, and Pandu Nayak's blog post "How AI Powers Great Search Results" (February 2022). System descriptions reflect their documented roles as of the publication date.