robots.txt tells crawlers which paths they may and may not fetch. Most
reputable AI and search crawlers publish a user-agent token (for example
GPTBot or Googlebot) that you can address with a named group there. Rankly
checks every AI-bot and search-engine request against those rules and flags
the requests that ignore them.
How it works
Fetch your robots.txt
For hosts with AI or search-crawler traffic, Rankly reads your live
robots.txt and caches it for a few hours.
Match the bot's group
Rankly finds the User-agent group in your robots.txt that matches the
crawler making the request.
Check the path
Using standard rules (RFC 9309, longest-matching path wins), Rankly checks
whether the requested path is disallowed for that crawler.
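The matching and path-check steps can be sketched in a few lines of Python. This is a minimal illustration of the RFC 9309 rules, not Rankly's actual implementation: the names `Group`, `parse_robots`, `rules_for`, and `is_allowed` are hypothetical, the sample robots.txt is made up, and fetching, caching, and percent-encoding normalization are left out. It assumes the crawler's token (such as GPTBot) has already been extracted from the request.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Group:
    """One robots.txt group: user-agent tokens plus (kind, pattern) rules."""
    agents: list = field(default_factory=list)
    rules: list = field(default_factory=list)


def parse_robots(text):
    """Split robots.txt text into groups (hypothetical helper, simplified)."""
    groups, current, last_was_agent = [], None, False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # Consecutive User-agent lines share one group.
            if current is None or not last_was_agent:
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
            last_was_agent = True
        elif key in ("allow", "disallow") and current is not None:
            if value:  # an empty pattern matches nothing
                current.rules.append((key, value))
            last_was_agent = False
    return groups


def rules_for(groups, token):
    """Rules of the groups naming this crawler, else of the '*' groups."""
    token = token.lower()
    named = [g for g in groups if token in g.agents]
    chosen = named or [g for g in groups if "*" in g.agents]
    return [rule for g in chosen for rule in g.rules]


def _to_regex(pattern):
    """Translate robots.txt wildcards: '*' is any run, trailing '$' anchors."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Longest matching pattern wins; on a tie, Allow beats Disallow."""
    best_len, allowed = -1, True  # no matching rule means allowed
    for kind, pattern in rules:
        if _to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# Made-up robots.txt used only for this example.
ROBOTS = """
User-agent: GPTBot
Disallow: /private/
Allow: /private/press/

User-agent: *
Disallow: /tmp/
"""
groups = parse_robots(ROBOTS)
gptbot_rules = rules_for(groups, "GPTBot")
```

With this sample file, a GPTBot request for `/private/report.html` would count as a violation, while `/private/press/launch.html` would not, because the longer Allow pattern wins. GPTBot is also not bound by the `*` group's `/tmp/` rule, since a crawler with its own named group uses only that group.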
Where the rules come from
Rankly keeps a curated, regularly synced catalog of how the major AI
crawlers identify themselves and whether each one claims to respect
robots.txt. That catalog is built from the public ai.robots.txt project
plus Rankly’s own Agent Directory, so new crawlers are recognized as they
appear.
How it reads in your dashboard
You see, per crawler:
- Whether it fetched your robots.txt before crawling (a good sign).
- The count of robots.txt violations (paths it fetched despite being disallowed).
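As a rough illustration of how those two per-crawler values could be derived from a request log, here is a small sketch. It is an assumption-laden example, not Rankly's pipeline: the `summarize` function, the `(crawler, path, disallowed)` record shape, and the crawler name ExampleBot are all hypothetical, and "fetched robots.txt before crawling" is interpreted here as a `/robots.txt` request arriving before the crawler's first other request.

```python
def summarize(requests):
    """Build {crawler: (fetched_robots_first, violations)}.

    `requests` is an iterable of (crawler, path, disallowed) tuples in
    chronological order; `disallowed` is the result of the path check.
    """
    state = {}
    for crawler, path, disallowed in requests:
        row = state.setdefault(
            crawler, {"robots_first": False, "crawled": False, "violations": 0}
        )
        if path == "/robots.txt":
            # Only counts if it precedes every other request from this crawler.
            if not row["crawled"]:
                row["robots_first"] = True
        else:
            row["crawled"] = True
            if disallowed:
                row["violations"] += 1
    return {c: (r["robots_first"], r["violations"]) for c, r in state.items()}


# Made-up log: GPTBot reads robots.txt first, ExampleBot does not.
log = [
    ("GPTBot", "/robots.txt", False),
    ("GPTBot", "/private/a", True),
    ("GPTBot", "/blog", False),
    ("ExampleBot", "/private/a", True),
    ("ExampleBot", "/robots.txt", False),
]
summary = summarize(log)
```

In this made-up log, both crawlers have one violation each, but only GPTBot gets credit for fetching robots.txt before crawling.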