GEO: The New Rules of Search Visibility

Most SEO advice is stuck in 2020.

Optimize your title tag. Build backlinks. Hit keyword density targets. That stuff still matters. But there's a parallel game now, and most people aren't playing it.

When someone asks ChatGPT "what's the best framework for X" or runs a query on Perplexity, the answer doesn't come from a ranked list of ten blue links. It comes from a synthesized response that cites a handful of sources. Your page is either one of those citations, or it doesn't exist.

That shift — from ranking to citation — is what Generative Engine Optimization is about. I've been applying these principles to a new project built from scratch with GEO in mind, and retrofitting an existing one that was built before any of this mattered. The difference between starting fresh and retrofitting teaches you a lot about what's essential versus what's nice-to-have.

What GEO actually is

GEO — Generative Engine Optimization — was formalized in a 2023 paper from Princeton, IIT Delhi, Georgia Tech, and the Allen Institute for AI, later accepted at KDD 2024. The researchers built GEO-bench (10,000+ queries across domains), tested nine content optimization methods against generative engines, and measured how each affected visibility in AI-generated responses.

The headline finding: the right methods can boost your visibility in AI responses by up to 40%. The top three — citing sources, adding statistics, and including expert quotations — each delivered 30-40% visibility improvements when validated on Perplexity.ai. But here's the part most summaries skip — keyword stuffing, the bread and butter of old-school SEO, actually decreased visibility by roughly 10%. AI engines penalize it.

The nine methods tested, grouped by impact tier:

| Impact tier | Methods | Visibility boost |
| --- | --- | --- |
| Highest | Cite sources, statistics, quotations | +30-40% each |
| Medium | Authoritative tone, easy-to-understand language, technical terms | +18-25% each |
| Moderate | Unique vocabulary, fluency optimization | +15-30% each |
| Negative | Keyword stuffing | -10% |

The best combination: fluency + statistics, which outperformed any single method by an additional 5.5%. Clean writing with real numbers. Not complicated. Just rigorous.

This tracks with what I've seen in practice. The content that gets cited isn't the most keyword-optimized — it's the most useful. Clear structure, real data, direct answers.

Each AI engine wants something different

Here's where it gets interesting. These aren't interchangeable systems. Each AI search engine has its own index, its own ranking signals, and its own quirks.

ChatGPT

ChatGPT uses a Bing-based web index for real-time retrieval. Two factors dominate what gets cited: domain authority and recency. High-trust domains with strong backlink profiles get cited far more often than third-party aggregators. Freshness matters — ChatGPT noticeably favors recently updated content over stale pages, even if the older page has better information.

The other thing I've observed: content-answer fit. ChatGPT tends to cite content that matches the style of its own responses — clear, direct, conversational. If your page reads like a well-structured Wikipedia article, it's more likely to be cited than a marketing page with the same information. Your content needs to be easy for the model to extract and paraphrase, not just topically relevant.

Perplexity

Perplexity uses retrieval-augmented generation (RAG) across its own index and Google. It has aggressive quality filtering — results that don't meet a relevance threshold get discarded entirely before the model ever sees them.

What I've found works best: FAQ schema and semantic relevance. Pages with FAQPage JSON-LD markup get cited more often. Clear, atomic paragraphs — single ideas per paragraph, easy to extract and quote — outperform long-winded explanations. Publicly hosted PDF documents also seem to get preferential treatment.

You also need to explicitly allow PerplexityBot in your robots.txt. Sounds obvious, but I've seen sites that block it without realizing.

Google AI Overview

Google's AI Overviews (formerly SGE) layer AI responses on top of traditional search, using Gemini for reasoning alongside E-E-A-T evaluation and structured data signals.

The key stat: in an Authoritas study of 1,000 commercial keywords, 93.8% of the URLs cited in AI Overviews came from outside the page 1 organic results, and only 4.5% exactly matched a page 1 organic URL. Ranking well in traditional Google isn't enough — AI Overviews pull from a different pool. Structured data, topical authority (content clusters with strong internal linking), and authoritative citations all carry weight.

If you're already doing E-E-A-T well — author bios, credentials, first-hand experience — you're ahead. But don't assume your page 1 ranking translates to AI Overview visibility. It probably doesn't.

Claude

This one surprised me. Claude uses Brave Search, not Google or Bing. If your site isn't indexed by Brave, Claude can't find you. One analysis found an 87% correlation between Claude's citations and Brave's top results — making it the most predictable AI engine to optimize for, if you know where to look.

Claude's selection is also extremely picky. According to Cloudflare Radar data from mid-2025, Anthropic's crawl-to-refer ratio was 38,065:1 — meaning Claude crawls roughly 38,000 pages for every one it sends a visitor back to. That's down from 286,000:1 earlier in the year (improved dramatically after Claude launched web search), but it's still the most selective among major platforms. What gets through: high factual density, clear structure, verifiable data points.

You need to allow both ClaudeBot and anthropic-ai in your robots.txt.

Microsoft Copilot

Copilot pulls from the Bing index. If you're not in Bing, you don't exist for Copilot. Beyond that, it rewards fast page speed (under 2 seconds), clear entity definitions, and presence in the Microsoft ecosystem — LinkedIn profiles, GitHub repos, and other Microsoft-adjacent platforms give a signal boost.

What actually moves the needle

After working through optimizations on both a greenfield project and a retrofit, here's what I've found matters most — in order of impact.

1. Let the bots in

Table stakes, but it's where most people fail. Your robots.txt should allow all major AI crawlers:

User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: Bingbot
Allow: /

If you're on a platform that auto-generates robots.txt, check what it's actually serving. I've seen defaults that block everything except Googlebot.
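One quick sanity check is to run the robots.txt your site actually serves through Python's standard-library parser. A minimal sketch: the bot names are the real user-agent tokens, and the function takes the raw robots.txt text rather than fetching it.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens for the major AI crawlers.
AI_BOTS = ["GPTBot", "PerplexityBot", "ClaudeBot", "anthropic-ai", "Bingbot"]

def check_ai_bot_access(robots_txt: str) -> dict:
    """Return {bot: allowed-to-fetch-homepage} for a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, "/") for bot in AI_BOTS}
```

Fetch your live robots.txt, pass its text in, and treat any False entry as a crawler you're silently blocking.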

2. Schema markup — especially FAQPage

FAQPage schema consistently delivers across platforms. It's one of the highest-signal structured data types for AI engines. The FAQ format maps directly to how these systems want to present information: question, direct answer, done.

A basic implementation:

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is generative engine optimization?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "GEO is the practice of optimizing content to be cited by AI search engines like ChatGPT, Perplexity, and Claude. Unlike traditional SEO which focuses on ranking in search results, GEO focuses on being selected as a source in AI-generated responses."
    }
  }]
}

Start every answer with a direct response. Not "Well, it depends..." — that gets skipped. "GEO is the practice of..." — that gets cited.
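If you maintain FAQ content as plain question-answer pairs, the JSON-LD can be generated rather than hand-written, which keeps the markup in sync with the visible page. A sketch, assuming a simple list of (question, answer) tuples:

```python
import json

def faq_schema(pairs):
    """Build a FAQPage JSON-LD object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

# Serialize for a <script type="application/ld+json"> tag:
# json.dumps(faq_schema(pairs), indent=2)
```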

3. Answer-first content structure

AI engines extract information top-down. Put the answer at the top of the section, then provide context. This is the opposite of how most blog posts are written (build-up, then reveal). For GEO:

  • H1 with the primary topic
  • First paragraph answers the core question directly
  • Subsequent paragraphs add depth, examples, data
  • H2/H3 hierarchy that's clean enough to parse programmatically

Think of your headings as an API. If someone extracted just your H1, H2s, and the first sentence under each — would the content still make sense? That's what AI engines are doing.
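One way to apply that test is to extract the outline programmatically and read it back. A rough standard-library sketch that pulls each h1-h3 plus the first sentence of the paragraph that follows it:

```python
from html.parser import HTMLParser

class OutlineExtractor(HTMLParser):
    """Collect h1-h3 text plus the first sentence of the <p> after each."""
    def __init__(self):
        super().__init__()
        self.outline = []      # list of [tag, heading_text, first_sentence]
        self._capture = None   # tag currently being buffered
        self._buffer = []
        self._want_para = False

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3") or (tag == "p" and self._want_para):
            self._capture = tag
            self._buffer = []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._capture:
            return
        text = "".join(self._buffer).strip()
        if tag in ("h1", "h2", "h3"):
            self.outline.append([tag, text, ""])
            self._want_para = True
        elif self.outline:  # first <p> after a heading: keep first sentence
            self.outline[-1][2] = text.split(". ")[0]
            self._want_para = False
        self._capture = None
```

Feed it a page, print the outline, and ask whether the skeleton alone still answers the question.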

4. Statistics and citations

The Princeton research showed citations and statistics each boost visibility by 30-40%. In practice, these work best when they're specific and sourced:

  • Vague: "Most companies use AI now."
  • Specific: "According to [source], 67% of Fortune 500 companies use AI chatbots for customer service, handling 85% of routine inquiries without human intervention."

The second version is extractable, quotable, and verifiable. AI engines favor all three qualities.
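You can even lint drafts for this. A crude heuristic, and nothing more than that: treat a sentence as citable only if it contains both a number and some source-attribution phrasing.

```python
import re

def is_extractable(sentence: str) -> bool:
    """Rough check: a citable statistic names a source and a number."""
    has_number = bool(re.search(r"\d", sentence))
    has_source = bool(
        re.search(r"\b(according to|reported|study|survey)\b", sentence, re.I)
    )
    return has_number and has_source

# is_extractable("Most companies use AI now.")                      → False
# is_extractable("According to the survey, 67% use AI chatbots.")   → True
```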

5. Keep content fresh

Recency matters across AI engines. ChatGPT noticeably favors recently updated content, and Google's AI Overviews use freshness signals too. If a post is more than a few months old, update it. Add a "last updated" date. Refresh statistics. Sometimes updating a few data points and the dateModified in your schema is enough to signal freshness.
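If your pages carry Article or WebPage schema, the dateModified refresh is scriptable. A minimal sketch that rewrites the field in a JSON-LD blob:

```python
import json
from datetime import date

def touch_date_modified(jsonld: str) -> str:
    """Set dateModified in a JSON-LD string to today's ISO 8601 date."""
    data = json.loads(jsonld)
    data["dateModified"] = date.today().isoformat()
    return json.dumps(data, indent=2)
```

Run it only when the content actually changed; bumping the date on an untouched page is the gaming-the-system end of the recency treadmill.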

A real optimization, start to finish

To make this concrete: I worked through a full optimization on a developer tools site. Starting point was rough — no schema markup, basic meta tags, no FAQ section, AI bots not configured.

Keyword research revealed the brand name conflicted with an industrial automation protocol in English-speaking markets. The fix: pivot to long-tail keywords. "AI agent skills for solopreneurs" instead of the brand acronym. Match user intent, not your brand identity.

Schema implementation added four types: WebPage, SoftwareApplication, FAQPage (12 questions), and Organization. The FAQPage alone made the content eligible for rich results and gave AI engines 12 extractable question-answer pairs.

GEO methods applied:

  • Statistics bar: "10+ Skills | 5 Platforms | One-Click Install | 100% Open Source" — specific, scannable, quotable
  • Answer-first FAQ structure — every answer opens with a direct statement
  • Authoritative tone in the hero section
  • Citations to official platform documentation

Before/after:

| Metric | Before | After |
| --- | --- | --- |
| Schema types | 0 | 4 |
| FAQ items | 0 | 12 |
| Meta description length | 20 chars | 155 chars |
| AI bot access configured | No | Yes |
| Rich results eligible | No | Yes |

Based on the Princeton GEO research, applying multiple methods (citations, statistics, answer-first structure, authoritative tone) should yield meaningful visibility improvements — the paper measured up to 40% for combined methods. The real proof takes time; you need to monitor actual citation rates over months. But the structural improvements are measurable immediately.

What I got wrong about GEO optimization

I'll be honest about what I'm still figuring out.

Brave Search indexing is opaque. You can't just submit a sitemap to Brave the way you can with Google or Bing. Getting indexed by Brave (and therefore visible to Claude) is less predictable. I haven't found a reliable way to accelerate it.

Citation tracking is hard. There's no "Search Console for AI engines." You can manually test queries, but systematically measuring your citation rate across ChatGPT, Perplexity, Claude, and Google AI Overview? No good tooling exists yet. It's a lot of manual spot-checking.

The recency treadmill is real. AI engines favor fresh content, which means you're incentivized to keep updating — but there's a point where constant updates feel like gaming the system rather than genuinely improving content. I don't have a clean answer for where that line is.

Platform fragmentation adds overhead. Optimizing for five different AI engines plus traditional Google means a lot of surface area. In practice, I focus on the universal wins (schema, bot access, answer-first structure, statistics) and only go platform-specific when there's a clear gap.

Why GEO is just good technical writing

GEO isn't some exotic new discipline. It's what good technical writing has always been. Clear structure. Real data. Direct answers. Proper citations. The difference is that now there's a measurable reward for doing it — AI engines can quantify what "useful content" looks like, and they're selecting for it.

The sites that do well in AI search are the same ones that were already doing good work. GEO just makes the invisible visible.

If you're writing content with real expertise, structuring it clearly, and including verifiable data — you're already most of the way there. The schema markup, bot access, and platform-specific tweaks are the last mile. Important, but not the hard part.

The hard part is having something worth citing in the first place.