Creating a robots.txt file lets you take control of how the pages on your site are (or aren’t) accessed by search engines. The goal is to block certain pages or parts of your site from search engine crawlers. That may seem counterintuitive (you want search engines to find your web pages, right?), but it’s a solid SEO tactic.
Here’s a quick guide on how and when to create a robots.txt file, and why it’s so important to get it exactly right.
How the robots.txt file works.
Search engines send “spiders” to gather information from your site. This information is used to index your web pages in search results where users can find them. These spiders are also referred to as robots.
When search engine spiders visit your site, the robots.txt file tells them which sections or pages of your site they shouldn’t crawl. Some types of content you may not want spiders to crawl include the following (a sample file follows the list):
- Duplicate content: A printer-friendly version of a page or a copy of a manufacturer’s description of a product you sell.
- Non-public pages: A development or staging site.
- Pages along the user path: A page displayed to site visitors in response to actions they’ve taken, such as a thank-you page.
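For example, a minimal robots.txt along these lines would keep spiders away from all three kinds of content above. This is only a sketch, and the paths are hypothetical placeholders; substitute the directories your own site actually uses:

```
# Applies to all crawlers
User-agent: *

# Duplicate content: printer-friendly versions of pages
Disallow: /print/

# Non-public pages: a staging copy of the site
Disallow: /staging/

# Pages along the user path: the thank-you page shown after a form submission
Disallow: /thank-you/
```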
Creating a robots.txt file.
Your robots.txt file lives in your site’s root directory. Google Search Console provides step-by-step instructions for creating a new robots.txt file or editing an existing one, along with a testing tool you can use to verify that a URL has been properly blocked. Once you’re done, set a recurring reminder to review the file so that it stays properly configured as your site changes.
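To confirm the file is live, you can request it straight from your site’s root; www.example.com below is a placeholder for your own domain:

```
curl https://www.example.com/robots.txt
```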
Also keep a few things in mind:
- Be very careful: a single stray character can disallow access to your entire website (see the sketch after this list).
- According to Google in 2014, “Disallowing crawling of Javascript or CSS files in your site’s robots.txt directly harms how well our algorithms render and index your content and can result in suboptimal rankings.” Leave JavaScript and CSS files unblocked; on a WordPress site these assets live in /wp-includes/ and /wp-content/, so be sure not to disallow those folders.
- Important: Configuring a robots.txt file does not ensure that blocked URLs won’t be indexed. This can happen if a search engine discovers the URL some other way, generally by following a link to the disallowed page. Using the noindex meta tag (shown below) is the only foolproof way to prevent a page from being indexed.
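To illustrate the first point, the difference between allowing everything and blocking everything is one character. In this sketch, the two groups are alternative files, not one file:

```
# Version A: an empty Disallow value matches nothing, so the entire site stays crawlable
User-agent: *
Disallow:

# Version B: a bare slash matches every URL, so the entire site is blocked
User-agent: *
Disallow: /
```

And for the last point, the noindex meta tag goes in the <head> of the page you want kept out of the index:

```
<meta name="robots" content="noindex">
```

Keep in mind that a crawler has to be able to fetch the page to see this tag, so don’t also block that page in robots.txt if you’re relying on noindex.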
Search engines can still crawl your site without a robots.txt file, but taking the time to create one supports good SEO results and gives you tighter control over how your site is accessed and crawled.
Susan Sisler