
The Technical Blueprint for AI Citation: JSON-LD, Semantic HTML, and What Actually Matters

81% of AI-cited pages use schema markup, but content-specific types like FAQPage appear in less than 2%. Discover which JSON-LD and HTML structure choices actually drive AI citations, and what the industry gets wrong.

Oli Guei


Most advice about getting cited by AI boils down to “create great content.” That’s true, but it’s not helpful if you’re a technical SEO or content lead trying to figure out what to actually implement.

The AI systems powering Google’s AI Overviews, ChatGPT, and Perplexity don’t read your page like a human. They parse it like a machine. They’re looking for the fastest, most confident path to a quotable answer. If your content is structurally ambiguous, the AI will move on to a competitor whose page is technically cleaner.

At Genrank, we’ve built an AEO Audit feature that tracks schema presence across cited and non-cited pages, and we’ve analyzed the technical markup patterns of content that consistently earns AI citations. What we’ve found is that the technical fundamentals matter, but not always in the ways the industry assumes.

The schema question

There’s a stat that gets cited a lot in AEO circles: 81% of AI-cited pages use some form of schema markup [1]. On the surface, this sounds like schema is essential for getting cited.

But that’s not quite what the data shows.

AccuraCast’s research dug deeper into the types of schema present on cited pages [1]. What they found was revealing: the most common schema type was Person (58.9% of cited sources), which helps establish author authority but doesn’t describe the page content itself. Meanwhile, FAQPage schema appeared on only 1.8% of cited sources. Product schema showed up on 6.9%. HowTo and Review schema were present in less than 1%.

The implication is important: having schema correlates with getting cited, but having the “right” content-focused schema types doesn’t seem to matter as much as the industry assumes. What likely matters more is that sites with schema tend to be technically well-maintained overall: good HTML structure, clear hierarchy, fast loading, proper metadata.

Schema is a signal of technical quality, not a magic ticket to citations.

What we check in our AEO Audits

When we audit a page for citation readiness at Genrank, we look at several technical factors. Here’s what actually moves the needle:

Structural schema presence. We check for appropriate JSON-LD schema types (Article, HowTo, FAQPage, Product) depending on the content. Schema markup helps AI understand content type and structure. But we’re checking for appropriateness, not just presence. Adding FAQPage schema to a page that isn’t actually an FAQ doesn’t help.

Valid JSON-LD syntax. This sounds basic, but we see it broken constantly. We validate that the JSON-LD has correct syntax and the required fields (@context, @type). Invalid structured data can’t be parsed, so AI systems can’t interpret the markup at all. As far as they’re concerned it isn’t there, even though the schema is technically present on the page.

Article schema completeness. For editorial content, we check that Article schema includes the required properties: headline, author, datePublished. This helps AI understand and properly attribute your content. Missing author information is a common gap and it matters because AI systems weight authorship signals heavily when deciding what to trust.

FAQ schema alignment. We check for FAQPage schema when FAQ content is actually detected on the page. The schema should match the content. If you have questions and answers on the page but no FAQ schema, you’re missing an opportunity. If you have FAQ schema but no actual Q&A content, you’re creating a trust problem.

Structure and readability. We check for good content structure: subheadings, lists, short paragraphs. Well-structured content is easier for AI to parse and extract answers from. This is less about schema and more about basic HTML hierarchy.
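
The syntax check described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not Genrank's actual audit code: the names JsonLdExtractor and audit_jsonld are hypothetical, and the handling of a top-level @graph wrapper is an assumption about how such a check might treat grouped items.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def audit_jsonld(html):
    """Return (block_index, problem) tuples; an empty list means nothing was flagged."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    problems = []
    for i, raw in enumerate(parser.blocks):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((i, f"invalid JSON: {exc.msg}"))
            continue
        nodes = data if isinstance(data, list) else [data]
        for node in nodes:
            if not isinstance(node, dict):
                problems.append((i, "top-level item is not an object"))
                continue
            if "@context" not in node:
                problems.append((i, "missing @context"))
            # Assumption: a top-level @graph list holds the typed items.
            graph = node.get("@graph")
            items = graph if isinstance(graph, list) else [node]
            for item in items:
                if not isinstance(item, dict) or "@type" not in item:
                    problems.append((i, "missing @type"))
    return problems
```

Calling audit_jsonld(page_html) on a fetched page returns one entry per problem, so a trailing comma or a missing @type shows up before a crawler ever sees it.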

The HTML structure question

Here’s a finding we hold with limited confidence: so far, we haven’t found concrete patterns in HTML structure that reliably correlate with citation success.

That might sound surprising. You’d expect that using semantic elements like <article>, <section>, <dl> for definitions, or <details>/<summary> for Q&A would give you an edge. And they might. But in our data, the signal isn’t strong enough to make definitive claims.

What we can say is that the basics matter: clean heading hierarchy (H1 → H2 → H3, not jumping around), proper use of lists for list content, paragraphs that aren’t walls of text. These aren’t exotic optimizations - they’re just good HTML.
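
A heading hierarchy that "jumps around" is easy to detect mechanically. The sketch below is one possible check, assuming that a jump means a heading more than one level deeper than the one before it; the function name skipped_heading_levels is illustrative.

```python
import re
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records the level (1-6) of each heading in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if re.fullmatch(r"h[1-6]", tag):
            self.levels.append(int(tag[1]))


def skipped_heading_levels(html):
    """Return a message for every heading that skips one or more levels."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    problems = []
    for prev, cur in zip(parser.levels, parser.levels[1:]):
        # Moving back up (h3 -> h2) is fine; only deeper jumps are flagged.
        if cur > prev + 1:
            problems.append(f"h{prev} followed by h{cur} skips a level")
    return problems
```

Going from an H2 straight to an H4 is reported, while closing a subsection and returning to an H2 is not.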

The more important factor seems to be content clarity rather than markup sophistication. A page with basic HTML but a crystal-clear answer in the first paragraph will often beat a page with perfect semantic markup but a buried answer.

The definitive answer block

If there’s one technical pattern that does seem to matter, it’s this: putting a clear, direct answer near the top of the page.

AI engines are trained to extract short, authoritative definitions. If you don’t provide one, the AI will synthesize one from your text, often poorly and often by pulling from the wrong section.

The best practice is straightforward: place a concise answer to the page’s core question in the first 1-2 paragraphs. In our experience, answers of 40 to 60 words tend to perform well. They’re long enough to be complete, short enough to be extractable. Not a teaser. Not an “in this article we’ll explore…” throat-clearing. The actual answer.

The markup for this can be simple. A clearly labelled section works:

<section aria-label="Definition">
  <h2>What is Answer Engine Optimization?</h2>
  <p>Answer Engine Optimization (AEO) is the practice of optimizing 
  content to appear in AI-generated answers from systems like ChatGPT, 
  Google AI Overviews, and Perplexity. Unlike traditional SEO, which 
  focuses on ranking in search results, AEO focuses on being cited 
  as a trusted source within synthesized answers.</p>
</section>

You can also use a definition list if the content genuinely is a definition:

<dl>
  <dt>Answer Engine Optimization</dt>
  <dd>The practice of optimizing content to appear in AI-generated 
  answers from systems like ChatGPT, Google AI Overviews, and 
  Perplexity...</dd>
</dl>

Both work. What matters is that the answer is explicit, early, and unambiguous, not which markup pattern you choose.
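
The 40-60 word guideline for the opening answer can be checked automatically. The following sketch assumes the answer lives in the first <p> element of the page and counts whitespace-separated words; both are simplifications, and the helper name answer_length is hypothetical.

```python
from html.parser import HTMLParser


class FirstParagraph(HTMLParser):
    """Captures the text of the first <p> element and ignores the rest."""

    def __init__(self):
        super().__init__()
        self.text = None
        self._parts = None

    def handle_starttag(self, tag, attrs):
        if tag == "p" and self.text is None and self._parts is None:
            self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._parts is not None:
            # Collapse runs of whitespace left over from line-wrapped markup.
            self.text = " ".join("".join(self._parts).split())
            self._parts = None


def answer_length(html, low=40, high=60):
    """Return (word_count, in_range) for the first paragraph of the page."""
    parser = FirstParagraph()
    parser.feed(html)
    parser.close()
    words = (parser.text or "").split()
    return len(words), low <= len(words) <= high
```

The bounds are parameters because the 40-60 range is an observation from experience rather than a hard rule.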

JSON-LD implementation that matters

Based on what we see in our audits, here’s what actually matters for JSON-LD implementation:

Get the basics right first. Before worrying about advanced schema types, make sure your Article schema is complete and valid. That means:

{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "The Technical Blueprint for AI Citation",
  "author": {
    "@type": "Person",
    "name": "Your Name",
    "url": "https://yoursite.com/about"
  },
  "datePublished": "2026-01-20",
  "dateModified": "2026-01-20",
  "publisher": {
    "@type": "Organization",
    "name": "Your Company",
    "logo": {
      "@type": "ImageObject",
      "url": "https://yoursite.com/logo.png"
    }
  }
}
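
A completeness check for the Article object above might look like the sketch below. It tests for the three properties this article treats as required (headline, author, datePublished) and, as an added assumption, flags an author object that has no name. The function name is illustrative.

```python
REQUIRED_ARTICLE_PROPERTIES = ("headline", "author", "datePublished")


def missing_article_properties(node):
    """Return the expected properties that are absent or empty in a parsed Article."""
    missing = [prop for prop in REQUIRED_ARTICLE_PROPERTIES if not node.get(prop)]
    author = node.get("author")
    # An author object without a name gives nothing to attribute the page to.
    if isinstance(author, dict) and not author.get("name"):
        missing.append("author.name")
    return missing
```

Pass it the dictionary produced by parsing the JSON-LD block; an empty list means nothing is missing.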

Match schema to content type. If your page is a how-to guide, use HowTo schema. If it’s an FAQ, use FAQPage. If it’s a product page, use Product. Don’t add schema types that don’t match what’s actually on the page.
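
For the FAQ case, the alignment between schema and content can be approximated with a simple heuristic. The sketch below assumes that a page "has FAQ content" when at least three headings end in a question mark; that threshold and the function name faq_alignment are hypothetical, and a real detector would likely be more careful.

```python
def faq_alignment(headings, schema_types, min_questions=3):
    """Compare detected Q&A content against the presence of FAQPage schema.

    headings: heading texts taken from the page.
    schema_types: the @type values found in the page's JSON-LD.
    """
    question_count = sum(1 for text in headings if text.strip().endswith("?"))
    has_faq_content = question_count >= min_questions
    has_faq_schema = "FAQPage" in schema_types
    if has_faq_content and not has_faq_schema:
        return "missing_schema"  # a missed opportunity
    if has_faq_schema and not has_faq_content:
        return "schema_without_content"  # a trust problem
    return "aligned"
```

The two failure values correspond to the two mismatches: Q&A content without schema, and schema without Q&A content.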

Use the about property for entity alignment. This is underutilized. The about property explicitly tells AI what your content is about, linking it to known entities:

{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "The Technical Blueprint for AI Citation",
  "about": {
    "@type": "Thing",
    "name": "Answer Engine Optimization",
    "sameAs": "https://en.wikipedia.org/wiki/Search_engine_optimization"
  }
}

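The sameAs property is particularly useful because it links your content to a known, trusted entity definition, giving the AI a high-confidence signal about what you’re discussing. It works best when the target page describes exactly the entity you name. Where no dedicated reference page exists, as with the broader SEO page used here, treat the link as an approximate signal.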

Validate before deploying. Use Google’s Rich Results Test or Schema.org’s validator. Broken JSON-LD is worse than no JSON-LD because it signals technical sloppiness to crawlers.
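
One way to reduce syntax errors before validation is to generate the script tag from a data structure instead of editing JSON by hand. The sketch below is an illustration under that assumption; render_jsonld_script is a hypothetical helper, not part of any library.

```python
import json


def render_jsonld_script(data):
    """Serialize a dict to a JSON-LD <script> tag that is valid JSON by construction."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "<" so a string value containing "</script>" cannot end the tag early.
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

The output still needs to go through a validator, since well-formed JSON says nothing about whether the schema types and properties are correct.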

What we’ve learned building Genrank

After analyzing over 500,000 generative queries and more than 1 million citations, here’s what we’ve concluded about technical markup:

Schema helps, but it’s not the primary factor. The AccuraCast data showing 81% of cited pages have schema [1] probably reflects that well-maintained sites tend to have schema, not that schema itself drives citations. Focus on schema as part of overall technical quality, not as a silver bullet.

Content clarity beats markup sophistication. A page with a clear answer in plain HTML will often outperform a page with elaborate semantic markup but a buried or ambiguous answer. Structure your content for humans and machines simultaneously because both want the same thing: clarity.

The basics matter most. Valid JSON-LD, complete Article schema, proper heading hierarchy, explicit answers early in the content. These aren’t exciting optimizations, but they’re the ones that consistently correlate with citation success.

Match your schema to your content. Don’t add FAQPage schema to a page that isn’t an FAQ. Don’t add HowTo schema to a page that isn’t a guide. AI systems can detect mismatches, and it hurts your credibility.

The technical side of AEO isn’t about finding clever markup tricks. It’s about removing friction and making it as easy as possible for AI systems to understand what your content is, who wrote it, and what answer it provides.

A practical checklist

Based on our AEO Audit criteria, here’s what to check on any page you want optimized for AI citation:

  1. Is there a clear answer in the first 1-2 paragraphs? Not a teaser - the actual answer.

  2. Is your JSON-LD valid? Run it through a validator. Syntax errors are common and fatal.

  3. Does your Article schema include headline, author, and datePublished, plus publisher where you have one? These are the minimum fields for proper attribution.

  4. Does your schema type match your content type? Article for articles, HowTo for guides, FAQPage for FAQs, Product for products.

  5. Is your heading hierarchy clean? H1 → H2 → H3 in logical order, not jumping around.

  6. Are lists marked up as lists? Use <ul> or <ol>, not paragraphs with bullet characters.

  7. Is the content structured in short, scannable sections? Walls of text are hard for humans and machines alike.
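
Item 6 in the checklist can be checked with a short script. The sketch below flags <p> elements whose text starts with a bullet-like character, on the assumption that these are list items written as paragraphs. The prefix set and the function name are illustrative, and implicitly closed <p> tags are not handled.

```python
from html.parser import HTMLParser

# Characters commonly typed by hand in place of real list markup (an assumption).
BULLET_PREFIXES = ("• ", "· ", "- ", "* ")


class ParagraphCollector(HTMLParser):
    """Collects the stripped text of every explicitly closed <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._parts = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._parts is not None:
            self.paragraphs.append("".join(self._parts).strip())
            self._parts = None


def fake_list_items(html):
    """Return paragraphs that look like list items typed with bullet characters."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return [text for text in parser.paragraphs if text.startswith(BULLET_PREFIXES)]
```

Anything it returns is a candidate for conversion to <ul> or <ol> markup.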

None of this is exotic. That’s the point. The technical foundation for AI citation is the same as the technical foundation for a well-built website. Get the basics right, and you’re ahead of most of the web.

References

[1] AccuraCast, “Does Schema Markup Increase Generative Search Visibility?” https://www.accuracast.com/articles/optimisation/schema-markup-impact-ai-search/ - Research analyzing 9,000 citations across ChatGPT, Google AI Overviews, and Perplexity, finding that 81% of cited pages use schema markup, but content-specific schema types (FAQPage, Product, HowTo, Review) each appear in less than 7% of cited sources.
