The easiest way to answer this question is to explain what I use this tool for.
First off, I like to help robots out on my websites by giving them instructions via robots.txt files.
To be sure I'm giving the instructions the robots need (even if the instruction is simply "keep out!", like the short example after this paragraph), I like to check my files with this tool.
Second, I wrote this tool to make the experience a little more interactive whenever I'm considering crawling a site for some reason.
The tool saves me time and makes the files easier to read, especially the longer ones.
On that note, I recommend taking other concerns into consideration before crawling (read our disclaimer for more on that).
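Since the "keep out!" case comes up so often, here's a minimal sketch of what that file looks like; this is standard Robots Exclusion Standard syntax, nothing site-specific:

    # Ask every compliant robot to stay out of the entire site
    User-agent: *
    Disallow: /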
Can I crawl a website if this site says it's OK?
As with many things in life, it's not that easy.
While we do our best to help you interpret a robots.txt file, the file itself is only one of the things to consider before crawling a site.
If you'd like to read up on the topic, we keep a list of other considerations on our disclaimer page.
What's a robots.txt file anyways?
A robots.txt file contains instructions for robots to use when crawling your website, and is defined by the Robots Exclusion Standard.
Note that the standard is optional for robots, and malicious robots may even use a robots.txt file to discover which parts of a site hold information you'd rather keep out of reach.
While the standard cannot be enforced, it is obeyed by most major search engine web crawlers.
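To make that concrete, here's a minimal sketch of a robots.txt file; the /private/ path is a made-up example, not anything standard:

    # Applies to every robot; /private/ is a hypothetical example path
    User-agent: *
    Disallow: /private/

This tells any robot that honors the standard to crawl the whole site except the /private/ directory.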
What's a user agent and why are they mentioned in robots.txt files?
A user agent is a string of text that a client (whether browser or bot) uses to identify itself to a website.
Robots.txt files contain "User-Agent" directives that let a website give different instructions to different user agents.
For example, you may want to give Googlebot one set of instructions about which pages to crawl or avoid, and BingBot a totally different set.
You differentiate between them in the robots.txt file by starting each group of rules with its own "User-Agent" directive, as in the sketch below.
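For example (the paths are hypothetical, chosen just for illustration), a file with separate rules for Googlebot, BingBot, and everyone else might look like this:

    # Rules for Google's crawler; /search-results/ is a hypothetical path
    User-agent: Googlebot
    Disallow: /search-results/

    # Rules for Bing's crawler; /archive/ is a hypothetical path
    User-agent: Bingbot
    Disallow: /archive/

    # Every other robot may crawl everything (an empty Disallow allows all)
    User-agent: *
    Disallow:

Each "User-Agent" line opens a new group, and the rules that follow apply only to robots whose user agent matches it.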