Your robots.txt tells crawlers which paths they may and may not fetch. Most reputable AI and search crawlers publish a user-agent token (for example GPTBot or Googlebot) that you can address with its own group of rules. Rankly checks every AI-bot and search-engine request against those rules and flags the requests that ignore them.

How it works

1. Fetch your robots.txt

For hosts with AI or search-crawler traffic, Rankly reads your live robots.txt and caches it for a few hours.

2. Match the bot's group

Rankly finds the User-agent group in your robots.txt that matches the crawler making the request.

3. Check the path

Using the standard matching rules of RFC 9309 (the longest matching path wins), Rankly checks whether the requested path is disallowed for that crawler.

4. Flag a violation

If the crawler fetched a path it was told not to fetch, the request is flagged as a robots.txt violation. If your robots.txt is missing or can’t be read, nothing is flagged: Rankly is safe by default and never invents a violation.
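
The matching and flagging steps above can be sketched in a few lines of Python. This is a minimal illustration of RFC 9309-style matching, not Rankly's actual implementation. The function names (parse_groups, is_disallowed, is_violation) are hypothetical, wildcards (* and $) inside paths are ignored, user-agent tokens are compared by exact case-insensitive equality, and the fallback to the "*" group follows RFC 9309 and is assumed rather than confirmed for Rankly.

```python
from typing import Optional


def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercase User-agent token to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current: list[str] = []    # tokens of the group currently being read
    reading_agents = False     # True while consecutive User-agent lines stack up
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()    # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if not reading_agents:
                current = []                   # a new group starts here
            reading_agents = True
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow"):
            reading_agents = False
            for token in current:
                groups[token].append((key, value))
    return groups


def is_disallowed(groups: dict[str, list[tuple[str, str]]],
                  bot_token: str, path: str) -> bool:
    """Longest matching path wins; on a tie, allow beats disallow."""
    rules = groups.get(bot_token.lower())
    if rules is None:
        rules = groups.get("*", [])            # assumed fallback, per RFC 9309
    best_len, verdict = -1, "allow"
    for directive, prefix in rules:
        if prefix and path.startswith(prefix): # an empty path matches nothing
            longer = len(prefix) > best_len
            tie_allow = len(prefix) == best_len and directive == "allow"
            if longer or tie_allow:
                best_len, verdict = len(prefix), directive
    return verdict == "disallow"


def is_violation(robots_txt: Optional[str], bot_token: str, path: str) -> bool:
    """Safe by default: a missing or unreadable robots.txt flags nothing."""
    if robots_txt is None:
        return False
    return is_disallowed(parse_groups(robots_txt), bot_token, path)


ROBOTS = """\
User-agent: GPTBot
Disallow: /private/
Allow: /private/press/

User-agent: *
Disallow: /tmp/
"""

print(is_violation(ROBOTS, "GPTBot", "/private/report.html"))     # True
print(is_violation(ROBOTS, "GPTBot", "/private/press/kit.html"))  # False
print(is_violation(None, "GPTBot", "/private/report.html"))       # False
```

In the second call, the longer Allow rule (/private/press/) overrides the shorter Disallow rule (/private/), so no violation is flagged.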

Where the rules come from

Rankly keeps a curated, regularly synced catalog of how the major AI crawlers identify themselves and whether each one claims to respect robots.txt. That catalog is built from the public ai.robots.txt project plus Rankly’s own Agent Directory, so new crawlers are recognized as they appear.
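
To make the idea of a catalog concrete, here is a small sketch of what one entry might hold and how a request's User-agent header could be matched against it. The CrawlerEntry type, its field names, the identify function, and the sample values are all hypothetical and illustrative. They are not taken from Rankly's catalog or from the ai.robots.txt project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CrawlerEntry:
    name: str                                  # display name of the crawler
    user_agent_token: str                      # token it sends in User-agent
    claims_robots_compliance: Optional[bool]   # None means "unknown"


def identify(user_agent_header: str,
             catalog: list[CrawlerEntry]) -> Optional[CrawlerEntry]:
    """Return the first entry whose token appears in the header, if any."""
    header = user_agent_header.lower()
    for entry in catalog:
        if entry.user_agent_token.lower() in header:
            return entry
    return None


# Illustrative entries only; ExampleBot is a made-up crawler.
CATALOG = [
    CrawlerEntry("GPTBot", "GPTBot", True),
    CrawlerEntry("ExampleBot", "ExampleBot", None),
]

# A simplified example header, not a verbatim vendor string.
ua = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"
entry = identify(ua, CATALOG)
print(entry.name if entry else "unknown crawler")
```

A substring match like this identifies who a request claims to be. It does not prove the request really came from that vendor.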

How it reads in your dashboard

You see, per crawler:
  • Whether it fetched your robots.txt before crawling (a good sign).
  • The count of robots.txt violations, that is, requests for paths it was disallowed from fetching.
A crawler that never reads your robots.txt and then fetches disallowed paths is worth watching. Pair this with verification to tell whether it’s the real vendor misbehaving or an impersonator.
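
As a rough illustration of how these two per-crawler signals could be derived from a request log, consider the sketch below. The log shape (crawler, path, violation flag), the names CrawlerSummary, summarize, and worth_watching, and the reading of "before crawling" as "before any other request from that crawler" are assumptions made for this example, not a description of Rankly's internals.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CrawlerSummary:
    read_robots_first: bool = False   # robots.txt fetched before any page
    pages_fetched: int = 0            # requests other than /robots.txt
    violations: int = 0               # requests flagged as violations


def summarize(
    requests: Iterable[tuple[str, str, bool]],
) -> dict[str, CrawlerSummary]:
    """Aggregate time-ordered (crawler, path, violated) records per crawler."""
    summaries: dict[str, CrawlerSummary] = defaultdict(CrawlerSummary)
    for crawler, path, violated in requests:
        summary = summaries[crawler]
        if path == "/robots.txt":
            if summary.pages_fetched == 0:
                summary.read_robots_first = True
        else:
            summary.pages_fetched += 1
            summary.violations += int(violated)
    return dict(summaries)


def worth_watching(summary: CrawlerSummary) -> bool:
    """Did not read robots.txt first, then fetched disallowed paths."""
    return not summary.read_robots_first and summary.violations > 0


# Hypothetical log; ExampleBot is a made-up crawler.
log = [
    ("GPTBot", "/robots.txt", False),
    ("GPTBot", "/blog/", False),
    ("ExampleBot", "/private/a", True),
    ("ExampleBot", "/private/b", True),
]

for name, summary in summarize(log).items():
    print(name, summary, worth_watching(summary))
```

Here GPTBot reads robots.txt first and has no violations, while ExampleBot skips robots.txt and collects two violations, so only ExampleBot is marked as worth watching.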