About "Can I Crawl?"
This tool attempts to answer the question "Can I crawl a website?" from the perspective of the website's robots.txt file. Many times I have either created a robots.txt file for one of my own sites or needed to read another site's robots.txt file, and wanted something a little more interactive than a plain text file. Some robots.txt files are hard to understand because of their sheer size or because of invalid or misused directives, and I wanted a way to cut to the chase and understand a file's directives at a glance instead of staring at it and parsing it mentally. The robots.txt standard itself is not particularly difficult to understand; the files can simply be cumbersome, and a more straightforward representation speeds things up.
Bottom line: the tool primarily aims to make robots.txt files quicker to read through color coding, sectioning, hyperlinking the URLs in the file, and helping the user understand the meaning of each directive.
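For readers who want to ask the same "can I crawl?" question from a script, here is a minimal sketch using Python's standard-library urllib.robotparser. It is not how this tool works internally; the helper name can_crawl, the bot name and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url.

    can_crawl is a hypothetical helper name; the parsing and matching
    are done entirely by the standard-library RobotFileParser.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative file contents, not taken from any real site.
    sample = "User-agent: *\nDisallow: /private/\nAllow: /\n"
    print(can_crawl(sample, "ExampleBot", "https://example.com/private/page"))
    print(can_crawl(sample, "ExampleBot", "https://example.com/index.html"))
```

A yes/no answer like this is only the robots.txt perspective, which is why the tool focuses on helping you read the file itself.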
Here is a list of features that this tool provides:
Divides files into sections
Since robots.txt files can contain different directives for different user agents, we divide the files by User-Agent directives into sections.
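As a rough illustration of the idea (a sketch, not this tool's actual code), the sectioning can be done by starting a new group each time a run of User-agent lines begins. The function name split_into_sections and the dictionary layout are assumptions made for this example, and as a simplification any directive that is not a User-agent line is attached to the most recent section.

```python
def split_into_sections(robots_txt):
    """Group robots.txt lines into sections keyed by User-agent lines.

    Hypothetical helper for illustration. Each section is a dict with
    "agents" (list of user-agent values) and "rules" (list of
    (directive, value) tuples, directive lower-cased).
    """
    sections = []
    current = None
    prev_was_agent = False
    for raw in robots_txt.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # Consecutive User-agent lines share one section.
            if not prev_was_agent:
                current = {"agents": [], "rules": []}
                sections.append(current)
            current["agents"].append(value)
            prev_was_agent = True
        else:
            # Directives that appear before any User-agent line get
            # their own section with an empty agent list.
            if current is None:
                current = {"agents": [], "rules": []}
                sections.append(current)
            current["rules"].append((field, value))
            prev_was_agent = False
    return sections
```

Two User-agent lines in a row end up in the same section, since the rules that follow apply to both.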
Color coding for each directive
The different directives in a robots.txt file have different meanings, so we color-code each unique directive to help readers skim for relevant sections.
Each row is parsed and described in plain English: just click or tap a row in the file, and the description will slide into view.
It may seem like a simple feature, but it solves one of the biggest frustrations of reading a robots.txt file. This has saved me quite a few clicks!
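To give a feel for what a plain-English description of a row might involve, here is a small sketch. The function name describe and the wording of each description are invented for this example and are not the text the tool actually displays; only a handful of common directives are covered.

```python
def describe(line):
    """Return a plain-English description of one robots.txt line.

    Hypothetical helper; the wording is illustrative only.
    """
    # Ignore trailing comments and surrounding whitespace.
    text = line.split("#", 1)[0].strip()
    if ":" not in text:
        return "This line is not a recognizable directive."
    field, value = (part.strip() for part in text.split(":", 1))
    key = field.lower()
    if key == "user-agent":
        who = "all crawlers" if value == "*" else f"the crawler named {value}"
        return f"The rules below apply to {who}."
    if key == "disallow":
        if not value:
            # An empty Disallow value blocks nothing.
            return "Nothing is disallowed for this section."
        return f"Crawlers in this section should not fetch paths starting with {value}."
    if key == "allow":
        return f"Crawlers in this section may fetch paths starting with {value}."
    if key == "sitemap":
        return f"A sitemap is available at {value}."
    return f"'{field}' is not a directive this sketch recognizes."
```

For example, describe("User-agent: *") returns "The rules below apply to all crawlers."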
Toggle between "Interactive" and "Raw" views
When the tool retrieves a robots.txt file, it displays the file in "Interactive" view, which contains the features described above. There is also a "Raw" view that users can toggle to if they prefer plain-text viewing.
One more thing...
Just in case you've made it this far into the site without reading our disclaimer, please take a moment to read it. Not only do we not accept liability for your crawling decisions, we also advise you to take other concerns into consideration before crawling a site.