Web Development • HIPAA-Compliant Websites • SEO • AI Search Optimization (407) 409-8383   |   [email protected]
AI Search Optimization

LLMs.txt Optimization for AI Discovery

A hand-authored llms.txt at the root of your domain that gives AI crawlers and AI agents a curated map of what your business actually does, and a robots.txt that explicitly welcomes the crawlers that read it.


Overview

Search engines have had robots.txt for thirty years and sitemap.xml for nearly twenty. AI engines, until recently, had to figure out a site by crawling it. llms.txt, proposed by Jeremy Howard in late 2024 and adopted by a growing number of major documentation, tooling, and content sites, is the AI-era equivalent: a curated, markdown-formatted file at the root of a domain that says "here's what this site is, here are the pages that matter, in this order, with this context."

Adoption isn't universal yet. Some AI crawlers explicitly look for llms.txt; others don't. But two things are true regardless: the cost of shipping one is essentially zero, and the file can be read by AI agents that users point at your site (Claude with browsing, custom RAG setups, agent-style research tools) even when no major AI crawler reads it directly.

Our llms.txt service hand-authors the file, ships an AI-aware robots.txt alongside it, and (for content-heavy sites) optionally delivers a longer llms-full.txt with the actual content of priority pages in markdown.

What is llms.txt?

llms.txt is a proposed standard for a markdown file served at the root of a domain (e.g., https://example.com/llms.txt). Its structure is defined: an H1 with the site or project name, a blockquote with a one-sentence summary, optional context paragraphs, then H2 sections containing curated links to the most important pages on the site, each with a short description.
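
The structure described above can be checked mechanically. The following is a minimal sketch, not an official validator: the sample file, its site name, and its URLs are invented for illustration, and the helper names (`parse_llms_txt`, `structure_problems`) are hypothetical. It only confirms that an H1, a blockquote summary, and H2 sections with links are present.

```python
import re

# Hypothetical sample file; the site name, URLs, and descriptions are invented.
SAMPLE_LLMS_TXT = """\
# Example Co

> Example Co builds and maintains websites for small clinics.

Pages are listed in priority order.

## Core Services

- [Web Development](https://example.com/web-development): Custom site builds
- [SEO](https://example.com/seo): Technical and content SEO

## About

- [Company](https://example.com/about): Who we are
"""

# Matches "- [Title](url)" with an optional ": description" suffix.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text):
    """Split an llms.txt document into title, summary, and H2 link sections."""
    title = None
    summary = None
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("> ") and summary is None and current is None:
            summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append(
                    (match["title"], match["url"], match["desc"] or "")
                )
    return {"title": title, "summary": summary, "sections": sections}


def structure_problems(parsed):
    """Return human-readable problems; an empty list means the basics are present."""
    problems = []
    if not parsed["title"]:
        problems.append("missing H1 title")
    if not parsed["summary"]:
        problems.append("missing blockquote summary")
    if not parsed["sections"]:
        problems.append("no H2 sections")
    for name, links in parsed["sections"].items():
        if not links:
            problems.append(f"section '{name}' has no links")
    return problems


parsed = parse_llms_txt(SAMPLE_LLMS_TXT)
print(parsed["title"], list(parsed["sections"]), structure_problems(parsed))
```

A check like this catches missing pieces, but it says nothing about whether the right pages were chosen; that part stays a human judgment.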

The format is designed to be parseable by humans and large language models alike, prioritizes signal over completeness, and is meant to be hand-curated rather than auto-generated. Think of it as a hand-written README for your entire site, written for an AI to read directly.

How we work

  1. Site taxonomy review: Map the site's actual content surface, including services, sub-services, hub topics, and key content pieces. The llms.txt structure follows this taxonomy, so getting it right matters more than the writing.
  2. Curated section design: Decide which sections appear in the llms.txt and in what order. Most sites have 4 to 8 sections (Core Services, Sub-services per area, Resources, About). Each section gets curated links with one-line descriptions.
  3. Authoring and review: The file is written by hand, reviewed against the official spec, and validated as parseable markdown. Average length: 60 to 120 lines. Anything longer is usually a sign that the extra material belongs in llms-full.txt instead.
  4. AI-aware robots.txt: We update robots.txt to explicitly allow GPTBot, ClaudeBot, anthropic-ai, Google-Extended, PerplexityBot, CCBot, Applebot-Extended, and other AI crawlers. Sites that block AI crawlers by accident are not unusual; we fix that.
  5. Versioning and maintenance: llms.txt is committed alongside your site code, carries a dated version in a comment header, and is updated whenever services change. Quarterly review is included in any maintenance retainer.
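
As a rough illustration of step 4, the sketch below builds a robots.txt with one explicit allow group per AI crawler and confirms the result with Python's standard-library parser. The crawler tokens come from the list in step 4; the helper name `build_robots_txt`, the example.com URLs, and the permissive default group are assumptions for illustration, and a real site would merge these rules with its existing policy.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from step 4; extend as needed.
AI_CRAWLERS = [
    "GPTBot", "ClaudeBot", "anthropic-ai", "Google-Extended",
    "PerplexityBot", "CCBot", "Applebot-Extended",
]


def build_robots_txt(crawlers, sitemap_url=None):
    """Return robots.txt text with one explicit allow group per crawler."""
    lines = []
    for agent in crawlers:
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    # Default group for all other crawlers (an assumption; adjust to site policy).
    lines += ["User-agent: *", "Allow: /", ""]
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines).rstrip() + "\n"


robots_txt = build_robots_txt(AI_CRAWLERS, "https://example.com/sitemap.xml")

# Sanity check: every listed crawler should be allowed to fetch /llms.txt.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
allowed = {
    agent: parser.can_fetch(agent, "https://example.com/llms.txt")
    for agent in AI_CRAWLERS
}
print(robots_txt)
print(allowed)
```

Naming each crawler explicitly is redundant when the default group already allows everything, but it makes the intent visible to anyone who edits the file later.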

What this service includes

  • Site taxonomy review and section design
  • Hand-authored llms.txt at /llms.txt
  • Optional llms-full.txt for content-heavy sites
  • AI-aware robots.txt with explicit crawler allows
  • Versioned, comment-headered, parseable file
  • Markdown validation against the llms.txt spec
  • Linked-page reachability and 200-status check
  • Quarterly review included in maintenance retainers
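
The linked-page reachability check in the list above could look something like the following sketch. It is an assumption about one reasonable approach, not a description of our internal tooling: the function names are hypothetical, the sample links are invented, and the demonstration uses a stubbed status lookup so it runs without network access. Note that `urllib` follows redirects, so a redirected URL reports the status of its final destination.

```python
import re
import urllib.error
import urllib.request

# Captures the target of markdown links of the form [text](http...).
MD_LINK_RE = re.compile(r"\[[^\]]+\]\((https?://[^)\s]+)\)")


def extract_urls(llms_txt):
    """Return the unique http(s) link targets in document order."""
    seen = []
    for url in MD_LINK_RE.findall(llms_txt):
        if url not in seen:
            seen.append(url)
    return seen


def http_status(url, timeout=10):
    """Send a HEAD request and return the status code (None on network failure)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:
        # Covers URLError, DNS failures, and timeouts.
        return None


def broken_links(llms_txt, fetch_status=http_status):
    """Map each URL that did not return 200 to the status that was observed."""
    results = {}
    for url in extract_urls(llms_txt):
        status = fetch_status(url)
        if status != 200:
            results[url] = status
    return results


# Offline demonstration with a stubbed fetcher, so nothing here touches the network.
sample = (
    "- [Home](https://example.com/): Start here\n"
    "- [Old](https://example.com/old): Retired page\n"
)
fake_statuses = {"https://example.com/": 200, "https://example.com/old": 404}
report = broken_links(sample, fetch_status=fake_statuses.get)
print(report)
```

Passing the status lookup in as a parameter keeps the check testable; in real use the default `http_status` would be called against the live site.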

llms.txt vs. sitemap.xml vs. robots.txt

Three discovery files, three different jobs.

| | robots.txt | sitemap.xml | llms.txt |
|---|---|---|---|
| Audience | All crawlers | Search-engine crawlers | AI engines and agents |
| Format | Plain text rules | XML | Markdown |
| Purpose | Access rules | URL inventory | Curated overview |
| Curation | None | Comprehensive (every URL) | Highly curated (priority URLs) |
| Adoption (2025) | Universal | Universal | Growing |

Engagement example

A specialty B2B services firm had no llms.txt and a robots.txt that (by oversight) blocked GPTBot and ClaudeBot, which meant those crawlers could not access the site directly. We hand-authored a 95-line llms.txt covering their service taxonomy, fixed the robots.txt to explicitly allow major AI crawlers, and added a quarterly-review note to their maintenance retainer.
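
An accidental block of this kind is easy to detect before it costs visibility. The sketch below is a hypothetical audit, assuming a robots.txt similar to the one described; the file contents, the crawler list, and the `blocked_crawlers` helper are all invented for illustration and are not the client's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt resembling the situation described above.
BLOCKING_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

CRAWLERS_TO_CHECK = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]


def blocked_crawlers(robots_txt, crawlers, url="https://example.com/"):
    """Return the crawlers that the given robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in crawlers if not parser.can_fetch(agent, url)]


blocked = blocked_crawlers(BLOCKING_ROBOTS_TXT, CRAWLERS_TO_CHECK)
print(blocked)
```

Running the same function against the corrected robots.txt should return an empty list, which makes it a convenient regression check for the quarterly review.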

  • 1 hand-authored llms.txt at the root
  • ~15 AI crawlers now explicitly allowed
  • 2 crawlers unblocked (previously blocked by accident)

Representative engagement. Client identity withheld for privacy.

Frequently asked questions

What is llms.txt?
llms.txt is a proposed standard for a markdown-formatted file at the root of a domain (/llms.txt) that gives AI systems a curated, structured overview of the site's most important content. It is analogous to robots.txt or sitemap.xml, but designed to be read by large language models. It was proposed by Jeremy Howard in late 2024 and has been adopted by a growing number of major sites since.

Do AI crawlers actually read llms.txt?
It's evolving. Some AI crawlers and tools explicitly look for llms.txt; others don't yet. The strategic argument for shipping one anyway is that the file costs almost nothing to maintain, may be read directly by AI tools, and can be read by AI agents that users point at your site for due-diligence research. It's a low-cost asymmetric bet.

What is the difference between llms.txt and llms-full.txt?
Two related files are in common use. llms.txt is the short, navigable index: an H1 site name, a blockquote tagline, and curated link sections pointing to the most important pages. llms-full.txt is the long-form companion that may include the actual content of those pages in markdown form. We typically deliver llms.txt and recommend llms-full.txt only for content-heavy sites where it adds real value.

How is llms.txt different from sitemap.xml and robots.txt?
sitemap.xml is a comprehensive list of every URL the site wants crawled. robots.txt is a set of access rules. llms.txt is a curated, hierarchically organized markdown overview of the site's most important content, with descriptions. In essence, it says "here's what this site is about and where the canonical pages live, in a format an AI can read directly."

Should llms.txt be hand-authored or auto-generated?
Hand-authored, almost always. The whole point of llms.txt is curation: which pages matter, in what order, with what one-line descriptions. An auto-generated version that lists every URL defeats the purpose. We hand-author the file from your service taxonomy, version it, and update it as services change.

No llms.txt yet? Want to ship one this month?

Send your URL. We'll hand-author an llms.txt against your service taxonomy and ship it alongside an updated robots.txt, typically within two weeks.