The Technical AEO Checklist: Schema, llms.txt & Structured Data for 2026

  • Author: Neha Garg
  • Published: June 15, 2026 12:24 pm
  • Read time: 8 min


    AI answer engines such as ChatGPT, Gemini and Perplexity are now where your buyers research vendors. If a machine cannot read your page cleanly, it quotes a competitor instead. This checklist covers the nine technical fixes that determine whether AI engines can find, parse and cite your site: opening your robots.txt to AI crawlers, serving real HTML instead of JavaScript shells, adding JSON-LD schema, keeping that schema consistent with the visible page, writing answers before context, staying indexed in Bing, building entity signals, keeping pages fast and publishing an llms.txt. Fix the first three and you close most of the gap. The rest is finishing work.

    Search has split into two jobs. People still type queries into Google, but more of them now ask ChatGPT, Gemini, Perplexity or Claude and read the answer without clicking anything. For a business in India, that shift is not theoretical. ChatGPT crossed 180 million monthly users here at the start of 2026 and Gemini was close behind at 118 million. India also reports the highest AI adoption rate of any major market at 59 percent. The buyers you want are already asking machines first.

    Most AI-visibility problems are not content problems. They are technical. A page can be well-written and still be invisible to ChatGPT, Perplexity or Google AI Overviews because a crawler cannot reach it, cannot parse it, or cannot trust what it finds. This checklist walks your dev team through each fix, ordered by impact, so if you only ship the first few items you still close most of the gap.

    TL;DR

    AEO (Answer Engine Optimisation) is the practice of structuring your site so AI engines can extract, verify and cite your content. The technical layer is the foundation. Get it wrong and nothing else you do matters. Schema and structured data carry most of the weight; answer-first content and llms.txt complete the picture.

    Why Indian businesses should care right now

    AI search referrals are growing 130 to 150 percent year over year as of early 2026. Google AI Overviews now appear on more than 40 percent of queries. Technology, SaaS and finance lead this shift, while local service businesses trail behind, which leaves real room for early movers in those slower categories.

    Picture how a buyer researches a vendor today. They open ChatGPT, ask for the top NetSuite partners in India and read a short list with reasons. They may never visit a search results page. A NetSuite partner in Mumbai or a fintech in Bengaluru that gets cited by ChatGPT today builds an advantage that is hard to copy later. Schema and clean structure are how you get into those answers.

     

    1. Open the right crawlers in robots.txt

    Start here, because if the bots cannot fetch you, nothing else counts. Check your robots.txt for accidental blocks on these agents:

    • OpenAI: GPTBot, OAI-SearchBot, ChatGPT-User
    • Anthropic: ClaudeBot, Claude-SearchBot
    • Perplexity: PerplexityBot
    • Google: Google-Extended (controls training use)

    A useful distinction: training crawlers (GPTBot, ClaudeBot) feed model training, while search crawlers (OAI-SearchBot, Claude-SearchBot, PerplexityBot) fetch content to answer live user questions. If you want to appear in AI answers but not contribute to model training, you can allow the search bots and block the training ones. Decide on purpose; do not let a default security rule decide for you.
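    A minimal Python sketch of that split, assuming you want live-answer bots in and training bots out. It writes the robots.txt groups and then checks them with the standard library's urllib.robotparser, which approximates how a compliant crawler reads the file. The bot lists mirror the names above; the page URL is a placeholder, and Google-Extended is a robots.txt control token rather than a crawler that fetches pages.

```python
from urllib.robotparser import RobotFileParser

# Assumed policy: appear in live AI answers, stay out of model training.
SEARCH_BOTS = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"]
# Google-Extended is a control token, not a crawler that fetches pages.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended"]


def build_robots_txt(search_bots, training_bots):
    """Return robots.txt text: one group allows search bots, one blocks training bots."""
    lines = [f"User-agent: {bot}" for bot in search_bots]
    lines += ["Allow: /", ""]
    lines += [f"User-agent: {bot}" for bot in training_bots]
    lines += ["Disallow: /", ""]
    # Every other agent keeps normal access (an empty Disallow allows everything).
    lines += ["User-agent: *", "Disallow:"]
    return "\n".join(lines) + "\n"


def check_policy(robots_txt, url="https://www.example.com/pricing"):
    """Map each named bot to True (may fetch url) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in SEARCH_BOTS + TRAINING_BOTS}


if __name__ == "__main__":
    robots = build_robots_txt(SEARCH_BOTS, TRAINING_BOTS)
    print(robots)
    for bot, allowed in check_policy(robots).items():
        print(f"{bot:18} {'allowed' if allowed else 'blocked'}")
```

    Several User-agent lines can share one rule set, which keeps the file short and makes the intent of each group obvious when someone audits it later.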

    Pro tip: Search your robots.txt for ‘Disallow: /’ in the ‘User-agent: *’ group. That rule blocks every compliant bot, AI crawlers included, unless a bot has its own more specific group. It is the most common reason a well-built site never appears in AI answers.
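    The same check can be scripted. This Python sketch is an illustration, not a full RFC 9309 parser: it reports the line numbers where ‘Disallow: /’ sits in a ‘User-agent: *’ group and, separately, lists which of the AI agents named above urllib.robotparser would block. The function names and the placeholder URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
           "Claude-SearchBot", "PerplexityBot", "Google-Extended"]


def blanket_block_lines(robots_txt):
    """Return line numbers where 'Disallow: /' applies to the 'User-agent: *' group."""
    hits, agents, in_rules = [], [], False
    for lineno, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                hits.append(lineno)
    return hits


def blocked_ai_bots(robots_txt, url="https://www.example.com/"):
    """List the AI agents that urllib.robotparser says may not fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /\n"
    print(blanket_block_lines(sample))
    print(blocked_ai_bots(sample))
```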

    2. Serve your content in HTML, not only JavaScript

    This is the single biggest technical failure teams overlook. As of mid-2026, none of the major AI crawlers execute JavaScript: not GPTBot, OAI-SearchBot, ClaudeBot or PerplexityBot. They read your raw HTML and leave. Google’s Gemini is the main exception, since it borrows Googlebot’s rendering pipeline.

    If your site is a client-rendered React, Vue or Angular app, AI crawlers often see an empty shell: a div tag and some script references. Your content never loads for them. The fix is to render your important content server-side (SSR) or with static generation (SSG) so the words exist in the initial HTML response. This is the highest-leverage change on the list after crawler access.

    What to check

    View source on your homepage in a browser (Ctrl+U). If the body contains only script tags and an empty root div, AI crawlers see the same empty page. SSR or SSG puts real text in that initial response.
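    To repeat that view-source check across many pages, here is a rough Python sketch: fetch the raw HTML without running any JavaScript, ignore head, script, style and template content, and count the words left. A near-zero count suggests the empty-shell problem described above. The user-agent string and the 50-word cut-off are arbitrary assumptions, and some sites serve different HTML to different agents, so treat the result as a first pass.

```python
import sys
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _VisibleText(HTMLParser):
    """Count words present in raw HTML, skipping elements a reader never sees."""
    SKIP = {"head", "script", "style", "template"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.words += len(data.split())


def visible_word_count(html):
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return parser.words


def fetch_raw_html(url, timeout=10):
    """Fetch HTML the way a non-rendering crawler would: one request, no JavaScript."""
    req = Request(url, headers={"User-Agent": "raw-html-check/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Usage: python raw_html_check.py URL [URL ...]
    for url in sys.argv[1:]:
        count = visible_word_count(fetch_raw_html(url))
        note = "  (likely an empty shell)" if count < 50 else ""  # arbitrary cut-off
        print(f"{count:6} words  {url}{note}")
```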

    3. Ship clean JSON-LD structured data

    Schema markup is code, usually written in JSON-LD, that labels what your content means. It tells an answer engine that a block of text is a question, a price, an author or a business. Pages with FAQPage markup are around 3.2 times more likely to appear in Google AI Overviews. Sites with complete, accurate structured data are also cited noticeably more often; one analysis puts the lift at up to 3.1 times.

    Use JSON-LD, the format the industry standardised on (around 90 percent share). It sits in a script tag, decoupled from your visible HTML, so bots extract it cleanly. Ship these types as your baseline:

    • FAQPage and QAPage: Pair clear questions with short direct answers. These feed AI Overviews, voice results and chat answers.
    • Article and BlogPosting: Name the author, the publish date and the publisher. This builds the attribution chain that E-E-A-T depends on.
    • Organization and Person: Define your brand and your experts as real entities with consistent details across the web.
    • Product and Offer: Expose price, availability and reviews so commerce answers stay accurate.
    • HowTo: Lay out steps in order so an engine can repeat instructions without jumbling them.
    • BreadcrumbList: Show where a page sits in your site so engines understand your structure.
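    As an illustration of the FAQPage type, here is a small Python sketch that turns question and answer pairs into a JSON-LD script tag. The property names (mainEntity, Question, acceptedAnswer, Answer) are schema.org vocabulary; the helper itself is hypothetical. Generating the markup with json.dumps rather than hand-editing it avoids the quoting and trailing-comma mistakes that make JSON-LD unparseable.

```python
import json


def faq_page_jsonld(pairs):
    """Return a <script> tag holding schema.org FAQPage markup for (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    body = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so an answer containing "</script>" cannot end the tag early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(faq_page_jsonld([
        ("Does schema markup directly improve Google rankings?",
         "No. Schema is not a ranking factor, but it makes content eligible "
         "for rich results and easier for AI engines to cite."),
    ]))
```

    The questions and answers you pass in should be the same ones printed on the page; the markup only labels what is already visible.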

    TL;DR

    Schema does not boost rankings on its own. It makes your content machine-readable, which raises your odds of being the page an AI quotes. Start with FAQPage, Article, Organization and Product.

    4. Match schema to what is actually on the page

    AI engines now cross-check your markup against your visible content. Schema that claims a price, rating or fact the page does not show can cause the whole page to be skipped. Two rules apply:

    • Only mark up what a user can actually see on the page.
    • Validate every change. A small JSON-LD syntax error can make a bot ignore the entire page.

    Run pages through Google’s Rich Results Test and the Schema.org validator before shipping. A practical example: a SaaS firm publishes a pricing page in plain prose with no markup. An answer engine has to infer the price, the plan names and the currency from sentences. Add Product and Offer schema and the same engine reads the price as a labelled value it can quote with confidence. Same content, very different machine confidence.
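    A rough Python sketch of that cross-check for the pricing example: it pulls every application/ld+json block out of the HTML, reports blocks that fail to parse, and flags any Offer price that does not appear in the visible text. The matching is deliberately naive (thousands separators are dropped, nothing else is normalised, and @graph wrappers are not walked), so treat a flag as a prompt to look rather than a verdict. The function names are hypothetical, and this complements rather than replaces the Rich Results Test.

```python
import json
from html.parser import HTMLParser


class _PageSplitter(HTMLParser):
    """Separate JSON-LD bodies from the text a visitor actually sees."""

    def __init__(self):
        super().__init__()
        self.jsonld, self.visible = [], []
        self._mode = None  # "jsonld", "skip" or None (visible text)

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            is_ld = dict(attrs).get("type") == "application/ld+json"
            self._mode = "jsonld" if is_ld else "skip"
            if is_ld:
                self.jsonld.append("")
        elif tag == "style":
            self._mode = "skip"

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._mode = None

    def handle_data(self, data):
        if self._mode == "jsonld":
            self.jsonld[-1] += data
        elif self._mode is None:
            self.visible.append(data)


def audit_offers(html):
    """Return problems: unparseable JSON-LD blocks or Offer prices not shown on the page."""
    page = _PageSplitter()
    page.feed(html)
    page.close()
    visible = " ".join(page.visible).replace(",", "")  # "4,999" becomes "4999"
    problems = []
    for i, raw in enumerate(page.jsonld):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as err:
            problems.append(f"block {i}: invalid JSON ({err.msg})")
            continue
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            offers = item.get("offers", [])
            for offer in offers if isinstance(offers, list) else [offers]:
                price = str(offer.get("price", "")) if isinstance(offer, dict) else ""
                if price and price not in visible:
                    problems.append(f"block {i}: price {price} not visible on page")
    return problems
```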

     

    5. Write answers first – the step most teams skip

    Schema labels your content. Formatting decides whether the content is worth lifting. The pattern that works in 2026 is straightforward: answer the question in the first one or two sentences of every section, in about 40 to 60 words, then add the detail below.

    Think about how an engine reads a page. It scans for a clean, self-contained answer it can quote. A section that opens with three paragraphs of background gives it nothing to grab. A section that opens with a direct answer gives it exactly what it needs. This habit helps human readers too, since a clear opening respects their time and reduces bounce.

    Pro tip: Write each H2 as a real question your buyer asks, then answer it in the first sentence. ‘What does NetSuite implementation cost in India?’ beats ‘Pricing considerations.’ The question matches how people speak to AI, and the answer sits ready to be quoted. This single habit can double your AI citation rate on existing content.
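    A minimal Python sketch of that habit as a lint rule, assuming you can export each section from your CMS as a (heading, opening paragraph) pair. It flags headings that are not phrased as questions and openings outside the 40 to 60 word range recommended in this section. The input shape and function name are hypothetical, and a trailing question mark is only a proxy for a real buyer question.

```python
def lint_sections(sections, min_words=40, max_words=60):
    """Check (heading, opening_paragraph) pairs against the answer-first pattern."""
    report = []
    for heading, opening in sections:
        issues = []
        if not heading.rstrip().endswith("?"):
            issues.append("heading is not phrased as a question")
        words = len(opening.split())
        if not min_words <= words <= max_words:
            issues.append(f"opening answer is {words} words (target {min_words}-{max_words})")
        report.append((heading, issues))
    return report


if __name__ == "__main__":
    sections = [
        ("Pricing considerations", "Several factors shape the final number."),
        ("What does NetSuite implementation cost in India?", "(your 40 to 60 word answer here)"),
    ]
    for heading, issues in lint_sections(sections):
        print(f"{heading}: {'; '.join(issues) or 'ok'}")
```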

    6. Keep a clean sitemap and stay indexable in Bing

    Two frequently missed signals:

    • sitemap.xml: List your important URLs and keep the file current. It helps crawlers prioritise what to fetch, and its lastmod dates tell them when content was last updated.
    • Bing indexability: This matters more than most teams expect. Roughly 92 percent of ChatGPT’s web-search answers draw on Bing’s index. If you are not in Bing, you are missing from a huge share of ChatGPT answers. Verify your site in Bing Webmaster Tools and resolve any crawl errors there.
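    If your CMS does not already produce one, a sitemap is simple to generate. This Python sketch writes the urlset, url, loc and lastmod elements defined by the sitemaps.org protocol from a list of (URL, last-modified date) pairs. The page list is a placeholder; in practice you would feed it from the same source that publishes the pages, so the dates stay honest.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """pages: iterable of (absolute_url, date) pairs. Returns sitemap XML as text."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and <
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()  # W3C date, e.g. 2026-06-15
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        ("https://www.example.com/", date(2026, 6, 15)),
        ("https://www.example.com/pricing", date(2026, 6, 1)),
    ]))
```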

     

    7. Strengthen entity and ‘about’ signals

    AI engines want to know what and who your brand is, with confidence. Help them by treating your organization as a defined entity:

    • Use sameAs in your Organization schema to link your verified profiles, such as LinkedIn, Crunchbase and your official social accounts.
    • Keep your name, category and description consistent across your site and third-party listings.
    • Maintain a clear About page that states plainly what you do and who you serve.
    • Link an author Person entity to each article and a publisher Organization to your site.

    Consistent entity signals reduce the chance an AI describes you incorrectly or confuses you with a similarly named brand. For Indian B2B firms, this is especially important: a Bengaluru fintech and a Singapore firm with similar names can look identical to a model that lacks clear sameAs anchors.
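    A minimal Python sketch of those links, with placeholder values: an Organization node carries a stable @id and its sameAs profiles, and each Article names a Person author and points its publisher at that same @id so engines can join the two. Every URL, name and helper here is hypothetical; substitute your real, verified profiles.

```python
import json


def organization_jsonld(name, site_url, description, same_as):
    """Organization entity with a stable @id and verified external profiles."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "@id": site_url.rstrip("/") + "/#organization",
        "name": name,
        "url": site_url,
        "description": description,
        "sameAs": list(same_as),
    }


def article_jsonld(headline, date_published, author_name, author_url, publisher_id):
    """Article linked to a Person author and to the publisher Organization by @id."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,
        "author": {"@type": "Person", "name": author_name, "url": author_url},
        "publisher": {"@id": publisher_id},
    }


if __name__ == "__main__":
    org = organization_jsonld(
        "Example Fintech Pvt Ltd",  # hypothetical brand
        "https://www.example.com/",
        "B2B payments software for Indian SMEs.",
        [
            "https://www.linkedin.com/company/example",
            "https://www.crunchbase.com/organization/example",
        ],
    )
    article = article_jsonld(
        "The Technical AEO Checklist", "2026-06-15",
        "Jane Doe", "https://www.example.com/team/jane-doe", org["@id"],
    )
    print(json.dumps([org, article], indent=2))
```

    Reusing one @id across every page is what turns scattered mentions into a single entity; if the name or description drifts between pages, the sameAs anchors lose much of their value.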

     

    8. Watch page speed and crawler timeouts

    AI crawlers are impatient. Many use tight timeouts of roughly one to five seconds and will not wait for a slow page. If your server is slow or your critical content loads late, the bot may skip you entirely. Treat Core Web Vitals, server response time and overall page speed optimisation as AEO signals, not only UX metrics.
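    A rough way to watch the server-response part of this from your own machine: the Python sketch below times how long the first body byte takes to arrive, including DNS, connection and TLS setup, and compares it with the 800 ms TTFB target from the checklist further down and a one-second crawler timeout taken from the low end of the range above. Real crawlers fetch from their own networks, so treat the numbers as indicative; the thresholds and the user-agent string are assumptions.

```python
import sys
import time
from urllib.request import Request, urlopen


def time_to_first_byte(url, timeout=5.0):
    """Seconds from sending the request until the first body byte arrives (approximate TTFB)."""
    req = Request(url, headers={"User-Agent": "ttfb-check/0.1"})
    start = time.perf_counter()
    with urlopen(req, timeout=timeout) as resp:
        resp.read(1)  # block until one byte of the body is available
        return time.perf_counter() - start


def verdict(ttfb_seconds, budget=0.8, crawler_timeout=1.0):
    """Classify a TTFB against the budget and a tight crawler timeout (both assumptions)."""
    if ttfb_seconds >= crawler_timeout:
        return "risk: slower than a tight crawler timeout"
    if ttfb_seconds > budget:
        return "slow: over the TTFB budget"
    return "ok"


if __name__ == "__main__":
    # Usage: python ttfb_check.py URL [URL ...]
    for url in sys.argv[1:]:
        samples = sorted(time_to_first_byte(url) for _ in range(5))
        median = samples[len(samples) // 2]  # median resists one slow outlier
        print(f"{median * 1000:6.0f} ms  {verdict(median)}  {url}")
```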

     

    9. Add llms.txt with realistic expectations

    llms.txt is a plain text file you place at the root of your domain. It links to your most important pages with short descriptions so AI agents can find them quickly. Jeremy Howard of Answer.AI proposed it in late 2024 and it sits alongside robots.txt rather than replacing it.

    Here is the honest picture as of May 2026. Adoption sits near 10 percent of sites. Crawler logs show that the major AI crawlers, including GPTBot, ClaudeBot and PerplexityBot, mostly skip it and read your HTML directly. Google has compared it to the old meta keywords tag. But the cost is half a day of work and the upside is genuine optionality: coding agents like Cursor, Claude Code and GitHub Copilot already fetch llms.txt when pointed at documentation.

    • Does: Give a clean map of your key pages for agents that choose to read it.
    • Does: Help coding and documentation agents route to the right content.
    • Does not: Improve your Google rankings.
    • Does not: Force any AI crawler to use it.
    • Does not: Replace robots.txt, which still controls real access.

    Pro tip: Generate llms.txt from a single source of truth and rebuild it on every deploy. A stale file pointing to dead pages is worse than no file at all.
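    Following that tip, here is a minimal Python sketch that renders llms.txt from one list of pages, so a deploy step can rebuild it. The layout (an H1 with the site name, a blockquote summary, then H2 sections of Markdown links with short notes) follows the llms.txt proposal; the Page type, the sample pages and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    description: str
    section: str  # H2 group the link is listed under


def build_llms_txt(site_name, summary, pages):
    """Render llms.txt Markdown: H1 name, blockquote summary, link lists per section."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    sections = {}
    for page in pages:  # dicts keep insertion order, so sections follow the input
        sections.setdefault(page.section, []).append(page)
    for section, items in sections.items():
        lines += [f"## {section}", ""]
        lines += [f"- [{p.title}]({p.url}): {p.description}" for p in items]
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    pages = [
        Page("Services", "https://www.example.com/services", "What we build and for whom", "Key pages"),
        Page("Pricing", "https://www.example.com/pricing", "Plans, prices and currency", "Key pages"),
        Page("AEO checklist", "https://www.example.com/blog/aeo-checklist", "Technical AEO steps", "Guides"),
    ]
    print(build_llms_txt("Example Co", "B2B consultancy in Mumbai.", pages))
```

    If the page list comes from the same data that drives your sitemap, a page removed from the site also drops out of llms.txt on the next deploy.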

    The complete technical AEO checklist for 2026

    Ordered by impact. Ship the first three and you close most of the gap before you touch anything else.

    1. Crawlers: Audit robots.txt. Allow OAI-SearchBot, Claude-SearchBot, PerplexityBot. Decide intentionally on training bots.
    2. Rendering: Confirm key pages serve real text in initial HTML. Add SSR or SSG to any page your SEO cares about.
    3. Schema: Audit your top 20 pages with the Rich Results Test. Add JSON-LD for FAQPage, Article, Organization and Product where each fits.
    4. Schema accuracy: Make sure on-page questions and answers match your schema word for word. Validate before shipping.
    5. Answer-first writing: Rewrite every key section to open with a 40 to 60 word direct answer. Reframe H2 headings as questions.
    6. Sitemap and Bing: Keep sitemap.xml current. Verify your site in Bing Webmaster Tools and resolve crawl errors.
    7. Entity signals: Add sameAs links to Organization schema. Build a clear About page. Link author Person entities to articles.
    8. Speed: Bring TTFB under 800ms on key pages. Fix Core Web Vitals issues that cause crawler timeouts.
    9. llms.txt: Publish a file listing priority pages with short descriptions. Regenerate it on every deploy.
    10. Measurement: Track AI Overview impressions in Search Console. Use LLM tracking tools to run your target questions through ChatGPT, Gemini and Perplexity monthly and check citations.

     

    Schema vs llms.txt: where to spend effort first

    Both get lumped together as AI optimisation. They solve different problems, so the order matters.

    • Schema and structured data: Proven, read by every major engine, directly tied to AI citations. Most of your effort belongs here.
    • Answer-first content: Equally important and free to implement. Pair it with schema work.
    • llms.txt: Low cost and low yield today, with genuine upside later. Worth shipping, not worth obsessing over.

    If you have limited hours this quarter, the order is clear: fix schema, rewrite your answers, then add llms.txt as a finishing touch.

     

    Frequently asked questions

    Does schema markup directly improve Google rankings?

    No. Google has confirmed schema is not a ranking factor. What it does is make your content eligible for rich results and far easier for AI engines to read and cite. Any benefit is indirect, through higher click-through rates and stronger AI visibility.

     

    Why isn’t my well-built site appearing in ChatGPT answers?

    The two most common technical causes are a robots.txt block on AI crawlers and client-side JavaScript rendering that leaves AI bots an empty page. Check both first. Bing indexability is a frequent third cause, since roughly 92 percent of ChatGPT search answers draw on Bing’s index. Verify your site in Bing Webmaster Tools.

     

    Which schema type matters most for AI search?

    FAQPage tends to give the best return because it pairs questions with short answers that engines can quote directly. Article and Organization schema come next, since they build the author and brand trust that E-E-A-T relies on. For commerce pages, Product and Offer schema is essential.

     

    Is llms.txt worth adding in 2026?

    Yes, with realistic expectations. Most AI crawlers do not read it yet and it will not lift your traffic this quarter. But it costs little to set up and coding agents already use it, so it is sensible insurance for the agentic web that is forming. Ship it after items 1 through 8 are solid.

     

    How often should we audit our structured data?

    Quarterly is a sensible cadence. Schema decays as your content and products change, and stale or mismatched markup can cause pages to be skipped. Re-validate after any significant template or content change.

     

    How do I measure whether AEO work is paying off?

    Track AI-specific signals, not only keyword positions. Watch AI Overview impressions in Search Console, run your target questions through ChatGPT, Gemini and Perplexity to see whether you are cited, and monitor referral traffic from AI sources. Citations are the new rankings.
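    Once you have collected the answers to your target questions each month, whether by hand or exported from a tracking tool, the citation check itself is easy to script. This Python sketch assumes citations appear as URLs in the answer text and counts an answer as citing you only when your exact domain (ignoring a leading www.) appears; subdomains and unlinked brand mentions are not counted. The input format and function names are hypothetical.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def cited_domains(answer_text):
    """Domains of every URL in an answer, with a leading 'www.' removed."""
    return {
        (urlparse(url.rstrip(".,;:")).hostname or "").removeprefix("www.")
        for url in URL_RE.findall(answer_text)
    }


def citation_rate(answers_by_engine, your_domain):
    """Share of answers per engine that link to your_domain."""
    target = your_domain.lower().removeprefix("www.")
    rates = {}
    for engine, answers in answers_by_engine.items():
        hits = sum(1 for text in answers if target in cited_domains(text))
        rates[engine] = hits / len(answers) if answers else 0.0
    return rates


if __name__ == "__main__":
    answers = {
        "ChatGPT": ["Top partners include https://www.example.com/netsuite and others."],
        "Perplexity": ["See https://competitor.example.org/ for details."],
    }
    print(citation_rate(answers, "example.com"))
```

    Tracking the same fixed question set every month matters more than the exact scoring; a rising share is the signal you are looking for.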

     

    The Takeaway

    Technical AEO is not a rebrand of SEO. It is the layer that decides whether a machine can read your work and trust it enough to repeat it. Schema and structured data carry most of the weight. Answer-first writing makes your content easy to lift. llms.txt is a small, smart bet on where search is heading.

    The brands that win AI visibility in India will not be the loudest. They will be the clearest. AEO success is mostly unglamorous engineering: open the right crawlers, serve real HTML, ship accurate JSON-LD, stay in Bing, write answers first and keep your pages fast. Do those well and you have solved most of the visibility problem before you touch a single trendy tactic.

    At White Bunnie we treat that clarity as an engineering problem: our AI SEO work is clean schema, sharp answers and a site an answer engine can navigate without friction. Work through the checklist above, fix the gaps and you give every major engine a reason to quote you instead of the next result.

