The 20 Checks That Determine Whether AI Cites Your Content

Learn the exact criteria AI engines evaluate before citing content. Master these 20 checks to increase your visibility in ChatGPT, Perplexity, Claude, and other AI platforms.

Oli Guei · 14 min read

AI citation checks are the specific criteria that language models like ChatGPT, Perplexity, and Claude evaluate before deciding whether to quote or reference your content in their responses. These checks span five dimensions: answerability, citability, entity alignment, freshness, and parseability. Understanding them is the difference between being cited and being invisible.


Key Takeaways

  • AI engines evaluate content against 20 specific checks across 5 dimensions before citing it
  • Answerability (definitional lede, heading structure, key takeaways) determines if AI can extract a clean answer
  • Citability (sourced statistics, authoritative links, expert attribution) signals trustworthiness
  • Freshness signals (visible dates, temporal markers) reduce AI’s risk of citing outdated information
  • Schema markup (JSON-LD with correct types) is the API that makes your content machine-readable
  • Pages passing all checks get cited disproportionately more than pages passing most checks

I spent three months trying to figure out why some pages get cited by AI engines and others get ignored.

Not ranked. Not surfaced in search. Cited. As in, ChatGPT or Perplexity or Claude actually pulling a quote or fact from your page and attributing it to you in their response.

This started because I noticed something strange. Two articles covering the same topic, roughly the same quality, similar domain authority. One gets cited repeatedly across AI platforms. The other might as well not exist.

At first I assumed it was random. Maybe the AI just happened to crawl one and not the other. But the more I dug into this, the more I realized there’s nothing random about it. These systems are evaluating content against specific criteria before deciding whether to cite it.

After analyzing hundreds of pages that get cited versus those that don’t, I’ve identified 20 checks that seem to matter. Some are obvious. Others surprised me. And a few completely changed how I think about content.

Why this matters now

Before I get into the checks, here’s why I think this is worth your attention.

ChatGPT went from 300 million weekly users in December 2024 to 800 million by October 2025, according to OpenAI’s official announcements. That’s nearly 3x growth in under a year. Perplexity is processing millions of queries daily. And Google’s AI Overviews now appear in roughly 16% of US desktop searches as of Q4 2025, according to Search Engine Land’s analysis.

The traffic isn’t disappearing. It’s just going somewhere else.

What’s changing is how people find information. Instead of clicking through ten blue links, they’re asking an AI and getting a synthesized answer. If your content isn’t being cited in those answers, you’re invisible to a growing share of discovery.

Here’s the uncomfortable part: ranking well on Google doesn’t automatically mean AI engines will cite you. Research from Seer Interactive found that 87% of SearchGPT citations match Bing’s top 10 organic results, but that still leaves a meaningful gap where traditional rankings don’t translate. I’ve seen plenty of pages sitting at position one that never get mentioned in ChatGPT responses. The criteria are related but not identical.

So what are AI engines actually looking for?

The 20 checks

I’ve grouped these into five categories based on what they’re evaluating. Think of each category as a different lens the AI is using to decide whether your content is worth citing.


Answerability: Can AI lift a clean answer from this page?

This is the foundation. If an AI can’t extract a clear, confident answer from your content, it won’t cite you. Simple as that.

1. The definitional lede

Your first paragraph needs to define the topic clearly. Not tease it. Not build suspense. Define it.

I’ve found the sweet spot is 50 to 150 words. Too short and you haven’t said anything useful. Too long and the definition gets buried in context. AI systems seem to scan for that opening paragraph and use it as a quick test: does this page know what it’s talking about?

Here’s a pattern that works: “[Thing] is a [category] that helps [audience] achieve [goal].” Direct. Unambiguous. Easy for machines to parse.
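If you want to sanity-check your own ledes, this is roughly the heuristic I use. It only tests length and an "is a/an/the" defining pattern; real engines evaluate far more than this, so treat it as a smoke test, not a verdict:

```python
import re

def check_definitional_lede(text: str, min_words: int = 50, max_words: int = 150) -> bool:
    """Heuristic test for a definitional opening paragraph.

    Checks only two signals: the first paragraph falls in the
    50-150 word sweet spot, and it contains a defining
    "[Thing] is a [category]" style pattern.
    """
    first_para = text.strip().split("\n\n")[0]
    word_count = len(first_para.split())
    defines = re.search(r"\b(is|are) (a|an|the)\b", first_para) is not None
    return min_words <= word_count <= max_words and defines
```

A lede that fails this check usually fails for one of two reasons: it teases instead of defines, or it buries the definition past the first paragraph break.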

2. H1 and topic alignment

This one sounds basic, but I see it fail constantly. Your H1 needs to match what the page is actually about.

If your title says “Complete Guide to Email Marketing” but your H1 is “Welcome to Our Blog” or something generic, you’ve created confusion. AI engines use the H1 as a signal of what the page covers. When it doesn’t match the content, trust drops.

I aim for at least 60% word overlap between the title and H1. Anything below 30% and you’re essentially telling AI systems that even you don’t know what this page is about.
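The 60% target is easy to measure. Here’s a minimal sketch of the overlap ratio I use; it’s my own heuristic, not a documented engine metric:

```python
def word_overlap(title: str, h1: str) -> float:
    """Fraction of title words that also appear in the H1 (case-insensitive)."""
    title_words = set(title.lower().split())
    h1_words = set(h1.lower().split())
    if not title_words:
        return 0.0
    return len(title_words & h1_words) / len(title_words)
```

For example, a title of “Complete Guide to Email Marketing” against an H1 of “Email Marketing Guide” scores 0.6, right at the threshold.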

3. The heading ladder

This is about structure. H1 leads to H2, H2 leads to H3. No skipping levels. No starting with H3 because you liked how it looked.

AI engines use your heading hierarchy as a table of contents. When you skip from H1 directly to H4, or have multiple H1s on a page, you’re breaking the outline. The content becomes harder to navigate and extract from.

Single H1. Logical progression down through H2s and H3s. It’s basic HTML semantic structure, but I’m constantly surprised how many pages get this wrong.
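The ladder rules above can be checked mechanically. Given the ordered list of heading levels extracted from a page’s h1–h6 tags, this sketch flags the three failures I see most:

```python
def heading_ladder_issues(levels: list[int]) -> list[str]:
    """Flag hierarchy problems in an ordered list of heading levels.

    Expects e.g. [1, 2, 3, 3, 2] extracted from the page's h1-h6 tags.
    """
    issues = []
    if levels.count(1) != 1:
        issues.append("page should have exactly one H1")
    if levels and levels[0] != 1:
        issues.append("first heading should be the H1")
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:  # e.g. H1 followed directly by H4
            issues.append(f"level skip: H{prev} -> H{cur}")
    return issues
```

A clean page returns an empty list; anything else is a structural fix worth making before worrying about the subtler checks.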

4. Procedural structure

If you’re writing how-to content, you need actual steps. Not vague suggestions. Numbered steps with action verbs.

I look for at least three distinct steps with imperative language. “Click the settings icon” beats “You might want to consider looking at settings.” AI systems trained on instructional content have learned to recognize this pattern.

If your page isn’t procedural content, this check doesn’t apply. But if you’re explaining how to do something and don’t have numbered steps, you’re making it harder to get cited.

5. Key takeaways block

This is a quick win that most people miss.

Adding a labeled section called “Key Takeaways” or “Summary” or “TL;DR” gives AI systems a pre-packaged answer block. They can lift it directly without having to synthesize from multiple paragraphs.

I put these at the top or bottom of every article now. Three to five bullets summarizing the main points. It takes five minutes and dramatically increases the chance of being cited.


Citability: Does this page supply verifiable facts?

AI engines have been trained on content that cites sources. They’ve learned to pattern match what “trustworthy” looks like. If your content makes claims without backing them up, it registers as less reliable.

6. Numeric claims need citations

Every statistic, percentage, or specific figure should have a citation within about 200 characters.

I used to sprinkle stats throughout articles without much thought about attribution. Now I treat every number as a claim that needs a source. “Content marketing costs 62% less than traditional marketing [DemandMetric]” beats “Content marketing costs 62% less than traditional marketing.”

The citation doesn’t have to be a footnote. Inline links work fine. The point is proximity. The source needs to be close to the claim.
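The 200-character proximity rule is also auditable. This is a crude sketch that scans raw HTML for numeric claims and flags any without a nearby link; a real audit would need to skip dates, version numbers, and markup attributes:

```python
import re

def uncited_numbers(html: str, window: int = 200) -> list[str]:
    """Return numeric claims with no <a href> tag within `window` characters."""
    results = []
    for m in re.finditer(r"\d[\d,.]*%?", html):
        start = max(0, m.start() - window)
        end = m.end() + window
        if "<a " not in html[start:end]:
            results.append(m.group())
    return results
```

Running this over a draft before publishing catches the stats I sprinkled in without attribution.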

7. Authoritative outbound links

Not all links are equal. AI systems seem to weight links to .gov, .edu, research institutions, and official documentation more heavily than random blog posts.

I aim for at least two authoritative outbound sources per piece. Government data, university research, official product documentation. These signal that you’re grounding your claims in verifiable information.

Research from Stanford’s Internet Observatory and MIT’s Media Lab on how LLMs evaluate source credibility suggests that citation to institutional sources correlates with higher trust scores in model training.

8. Expert attribution

Anonymous content gets deprioritized. Having a named author with visible credentials helps.

This can be a byline at the top, an author bio at the bottom, or expert quotes within the content. The key is that there’s a real person attached to the claims being made.

I’ve noticed that pages with clear authorship get cited more often than pages that read like they came from a faceless brand. AI systems are looking for signals of expertise, and names are one of those signals. This aligns with Google’s E-E-A-T guidelines, which AI systems have been trained on.

9. Citation recency

Your sources need to be recent. I use three years as a rough threshold.

If you’re citing research from 2019 to make claims in 2026, that’s a freshness problem. AI engines are paranoid about citing outdated information. They’re trained to prefer recent sources.

This doesn’t mean every citation needs to be from the last six months. But if more than 30% of your sources are older than three years, you might be signaling that the information is stale.


Entity Alignment: Can AI confidently identify who or what this is about?

Knowledge graphs power AI responses. If the entity you’re writing about isn’t clearly disambiguated, AI can’t confidently attribute information to the right subject.

10. sameAs disambiguation

This is a technical one, but it matters.

In your JSON-LD schema, the sameAs property should point to authoritative references for the entity. Wikipedia page. Wikidata entry. LinkedIn profile. Official company page.

This helps AI systems connect your page to the broader knowledge graph. Without it, you’re an isolated data point. With it, you’re part of a verified network of information.
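Here’s what that looks like in practice, built as a Python dict and serialized to JSON-LD. The entity name and every URL below are placeholders; swap in the real Wikipedia, Wikidata, and LinkedIn references for your entity:

```python
import json

# Minimal Organization node with sameAs disambiguation.
# All names and URLs are placeholders for illustration.
org_schema = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Corp",
    "url": "https://example.com",
    "sameAs": [
        "https://en.wikipedia.org/wiki/Example",
        "https://www.wikidata.org/wiki/Q0",
        "https://www.linkedin.com/company/example",
    ],
}

# This string goes inside a <script type="application/ld+json"> tag.
json_ld = json.dumps(org_schema, indent=2)
```

Three or four sameAs references are usually enough; the point is to connect your page to identifiers the knowledge graph already trusts.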

11. Name consistency

The entity name should be consistent across title, H1, URL, and schema.

If your company is “Apple Corp” in the title, “Apple Corporation” in the H1, “apple-inc” in the URL, and “Apple” in the schema, you’ve created ambiguity. Is this one entity or four different ones?

Pick one canonical form of the name and use it in the title, H1, URL slug, and schema. Consistency signals clarity.

12. Publisher identity

There should be a visible Organization or Person in your schema and on the page itself.

Copyright notice in the footer. “Published by” attribution. Author bio. These signals help AI systems understand who’s responsible for the content.

Anonymous pages from unclear sources don’t inspire confidence. Make it obvious who published this.


Freshness: Is this content demonstrably current?

AI engines are terrified of citing outdated information. They’re trained to look for recency signals. If you don’t provide them explicitly, you’re leaving it to chance.

13. Visible and schema dates

You need both a visible “Updated” date on the page and dateModified in your JSON-LD schema.

One without the other is incomplete. The visible date tells human readers when this was last touched. The schema date tells machines. AI systems check both.

I put “Last updated: [Month Year]” near the top of every article and ensure it matches the schema.
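Deriving both from a single source of truth keeps them from drifting apart. A sketch of how I do it, with a placeholder date:

```python
import json
from datetime import date

last_updated = date(2026, 1, 15)  # placeholder date for illustration

# The visible line for human readers near the top of the article...
visible = f"Last updated: {last_updated.strftime('%B %Y')}"

# ...and the matching machine-readable field in the schema.
article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "dateModified": last_updated.isoformat(),
}
```

Because both values come from `last_updated`, updating one place updates both signals.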

14. Temporal context markers

This one surprised me.

Phrases like “As of January 2026” near key statistics dramatically increase citation confidence. It’s not enough to cite a source. You need to timestamp the claim.

I now add temporal context to any statistic that might change over time. “ChatGPT has 800 million weekly users as of October 2025.” The AI knows exactly how fresh that data point is.

15. Link density

Pages with more internal and external links tend to signal active maintenance.

I aim for at least five links per piece. This shows the content is connected to current information, not an orphaned page that hasn’t been touched in years.


Parseability: Can AI easily extract and structure this information?

This is where technical SEO meets AI optimization. Schema markup is essentially the API that AI engines use to understand your content. Without it, they’re guessing.

16. Type-appropriate schema

Your JSON-LD should use the right type for your content.

Article or BlogPosting for editorial content. HowTo for tutorials. FAQPage for Q&A content. Product for product pages.

Generic WebPage schema is better than nothing, but specific types give AI systems more context to work with.

17. Valid JSON-LD syntax

Broken schema is worse than no schema.

Check that your @context and @type are correct. Validate with Google’s Rich Results Test. I’ve seen pages with schema that looked fine but had syntax errors that made it completely unparseable.
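Before reaching for the Rich Results Test, a basic syntax pass catches the two failures I see most: JSON that doesn’t parse at all, and missing @context or @type. This is a minimal sketch, not a full schema validator:

```python
import json

def validate_json_ld(raw: str) -> list[str]:
    """Syntax-level check of a JSON-LD blob: parseable, object-shaped,
    and carrying @context and @type. Not a substitute for a full validator."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e.msg}"]
    if not isinstance(data, dict):
        return ["top-level value should be an object"]
    errors = []
    if data.get("@context") != "https://schema.org":
        errors.append("missing or wrong @context")
    if "@type" not in data:
        errors.append("missing @type")
    return errors
```

An empty list means the blob at least parses cleanly; a single stray comma or unescaped quote is enough to make an otherwise good schema block invisible.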

18. FAQ schema for Q&A content

If your page has three or more question headings, you should have FAQPage schema.

AI systems love FAQ content because it’s pre-structured in question and answer format. Adding the schema makes it even easier to extract and cite.
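Generating the FAQPage node from your question headings keeps the schema in sync with the page. The questions and answers below are illustrative, not from a real page:

```python
# Hypothetical Q&A pairs; each question heading and its answer
# paragraph on the page becomes one Question entity.
faqs = [
    ("What are AI citation checks?",
     "Criteria AI engines evaluate before citing a page."),
    ("How many checks are there?",
     "Twenty, across five dimensions."),
]

faq_schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": q,
            "acceptedAnswer": {"@type": "Answer", "text": a},
        }
        for q, a in faqs
    ],
}
```

The answer text in the schema should match the visible answer on the page; mismatches between the two undermine the trust the schema is meant to build.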

19. Article schema for editorial content

Blog posts and articles need Article or BlogPosting schema with headline, author, and dates.

This isn’t optional anymore. AI systems use these properties to understand what the content is, who wrote it, and when. Missing any of them reduces your citation chances.

20. HowTo schema for procedural content

If you’re explaining how to do something, HowTo schema with explicit steps helps AI systems understand the structure.

Each step becomes a discrete, citable unit. Without the schema, AI has to infer where steps begin and end.
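The same generate-from-content approach works for HowTo. Each numbered step on the page maps to a HowToStep with an explicit position; the step text here is illustrative:

```python
# Hypothetical steps; each numbered instruction on the page
# becomes one discrete, citable HowToStep.
steps = [
    "Open the page template",
    "Add a Key Takeaways block",
    "Validate the schema",
]

howto_schema = {
    "@context": "https://schema.org",
    "@type": "HowTo",
    "name": "Prepare a page for AI citation",
    "step": [
        {"@type": "HowToStep", "position": i, "text": text}
        for i, text in enumerate(steps, start=1)
    ],
}
```

With explicit positions, the AI never has to guess where one step ends and the next begins.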


The pattern behind the checks

Looking at these 20 checks, there’s a common thread: AI engines are trying to answer the question “Can I confidently cite this?”

Confidence comes from:

Clarity. The topic is unambiguous. The structure is logical. The entity is identified.

Verifiability. Claims have sources. Statistics have citations. Authors have names.

Freshness. Dates are visible. Sources are recent. Links are active.

Parseability. Schema is present. Markup is valid. Structure is machine-readable.

Pages that fail on any of these dimensions become riskier to cite. And AI systems are trained to be conservative. When in doubt, they cite something else.

The compounding effect

Here’s what I didn’t expect when I started this research: the checks compound.

A page that passes 18 out of 20 checks doesn’t get cited 90% as often as a page that passes all 20. It might get cited 50% as often. Or less.

Each failed check seems to multiply the risk in the AI’s evaluation. Missing your author bio might not matter if everything else is perfect. But missing your author bio plus having stale citations plus lacking a summary block? Now you’re three signals away from being citation-worthy.

This is why partial optimization doesn’t work well. You can’t just add FAQ schema and call it done. The system is looking for consistent signals across all these dimensions.
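One way to picture the compounding, purely as a mental model and not a measured formula, is multiplicative risk: each failed check multiplies your citation confidence by some factor below one. The 0.7 weight here is an arbitrary illustration:

```python
def citation_confidence(passed: int, total: int = 20, weight: float = 0.7) -> float:
    """Toy multiplicative model of compounding check failures.

    Each failed check multiplies confidence by `weight`. The 0.7
    value is an arbitrary illustration, not a measured constant.
    """
    return weight ** (total - passed)
```

Under this toy model, passing 18 of 20 checks leaves you at 0.49, roughly half the confidence of a perfect page rather than 90% of it, which matches the pattern I observed.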

What I’m doing with this

I’ve started auditing every piece of content I publish against these 20 checks. It takes about 15 minutes per page to do it manually.

The quick wins are adding key takeaways blocks, timestamping statistics, and fixing heading hierarchy. These take five minutes each and have an outsized impact.

The harder fixes involve schema markup, citation structure, and entity disambiguation. These require more technical work but pay off in the long run.

I’m also building tools to automate this. Manually checking 20 criteria across hundreds of pages isn’t sustainable. But the methodology itself is what matters. Once you know what AI engines are looking for, you can optimize for it.

The shift that’s happening

I don’t think AI citation is replacing traditional SEO. But I do think it’s becoming a parallel channel that matters.

When someone asks ChatGPT “what’s the best way to [do thing]” and your content gets cited, that’s a referral you didn’t have to pay for. That’s visibility in a channel growing faster than any of us predicted.

The 20 checks aren’t magic. They’re just observable patterns in what AI engines choose to cite. Following them won’t guarantee anything. But ignoring them means you’re leaving visibility on the table.

And right now, most content on the web ignores them completely. That’s an opportunity for anyone willing to pay attention.


I’m building Genrank to automate these checks across your entire site. If you want to know when we launch, join the waitlist.
