Whether you like it or not, your digital presence is controlled by robots. They crawl the web looking for content, and when they find it, they read it, classify it, and make it easier for humans to discover. Keep the web crawlers happy with the robots meta tag and they will reward you by making your website easier to find. Anger them, and they'll make sure nobody sees your website.
In this article, we are going to look at how to make the web crawling robots happy. Our primary focus will be examining the robots meta tag and the effects it has on our friendly neighborhood crawlers.
The Robots Meta Tag
The Internet is a marvelous thing. Behind every beautiful website that a human sees, a computer (or robot, or crawler) sees something else entirely. This can include additional information, guidance on what type of content is being presented, additional formats the content is available in, and much more. Every HTML webpage is divided into two distinct sections: the head, which contains information about the webpage you are viewing, and the body, which contains the content you are viewing.
Within the head of every HTML page is a plethora of data that tells robots additional information about the page that may not be useful to a human reader. One such piece of information is a meta tag called robots that instructs search engine web crawlers on whether or not to index that particular page.
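To make the head and body split concrete, here is a minimal sketch of a page. The title and paragraph text are placeholders invented for illustration; only the robots meta element is the subject of this article.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <!-- Information about the page, read mostly by browsers and crawlers -->
    <meta charset="utf-8">
    <title>Example page</title>
    <!-- Instruction for search engine crawlers -->
    <meta name="robots" content="index, follow">
  </head>
  <body>
    <!-- The content a human visitor actually sees -->
    <p>Hello, world.</p>
  </body>
</html>
```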
The robots meta tag on most websites looks something like <meta name="robots" content="index, follow">. Have you ever wondered what that actually means and whether there are other options you could put in there?
Do You Need the Robots Meta Tag?
Technically, no. If your website omits the robots meta tag, search engine crawlers will take this as an invitation to crawl and index all the content on your website. Each search engine decides how it crawls and what it indexes in its own way, so while the tag is not required, including it makes your intent explicit.
In the next section, we’ll take a look at the different options and parameters we have when it comes to working with the robots meta tag.
Robots Meta Tag Options
The robots meta tag has a variety of options or parameters that you can pass into it depending on the desired outcome. In this section, we’ll take a look at some of these parameters as well as common configurations and the effects they’ll have on search engine crawlers or robots that stumble upon your website.
index - this tells the search engine crawler that the page should be indexed
noindex - the opposite of index, will tell the search engine crawler to not index the page
follow - tells the crawler to visit any links it finds on the page
nofollow - the opposite of follow, tells the crawler that it should not visit any links it finds on the page
nosnippet - tells the search engine crawler to not display the description of the particular page in its search results
noarchive - tells the search engine that it should not show a cached copy of the page in its search results
noimageindex - tells the search engine crawler that it should not index any of the images it finds on the page
all - this is a shortcut for index, follow and is assumed to be the default if no specific robots meta tag is provided
none - this is a shortcut for noindex, nofollow which means the page would not be indexed by search engine crawlers and the crawler would not go to any of the links it finds on the page
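As a quick illustration of the two shortcuts at the end of the list, each pair of tags below is intended to express the same instruction. This is a sketch rather than a guarantee: support for all and none may differ between search engines, so the explicit form is the safer choice when in doubt.

```html
<!-- These two tags are meant to be equivalent -->
<meta name="robots" content="all">
<meta name="robots" content="index, follow">

<!-- And so are these two -->
<meta name="robots" content="none">
<meta name="robots" content="noindex, nofollow">
```

A real page would carry only one of these tags, not all four.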
There are additional parameters such as notranslate, unavailable_after, and others that have varying levels of support across search engines, so we won’t dwell too much on them. Out of all the parameters, the four most important ones to consider, and the ones that all search engines agree on, are index, noindex, follow, and nofollow. Let’s first look at how the tag is put together, then at common configurations and their effects.
Anatomy of the Robots Meta Tag
The robots meta tag is a simple HTML meta element with two attributes: name and content. It looks like this:
<meta name="robots" content="index">
The name attribute is most often set to robots, which addresses all search engine crawlers at once. You can be more specific, though. For example, to give an instruction only to Google's crawler, set the name attribute to googlebot instead of robots. Keep in mind that this does not restrict other crawlers: a crawler that is not addressed by the tag simply falls back to its default behavior. In that case, your meta tag would look like:
<meta name="googlebot" content="index">
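A page can also carry a general tag and a crawler-specific tag side by side. The sketch below assumes you want every crawler to index the page and follow its links, while additionally asking Google's crawler not to show a snippet. How a given search engine combines a general tag with one addressed to it by name is up to that engine, so treat the combined effect as an assumption to verify against its documentation.

```html
<head>
  <!-- Applies to every crawler that honors the robots meta tag -->
  <meta name="robots" content="index, follow">
  <!-- Addressed only to Google's crawler -->
  <meta name="googlebot" content="nosnippet">
</head>
```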
The content attribute is where you pass in one or more parameters that tell the search engine robot how to behave. To pass multiple parameters, separate them with commas, such as index, follow, noarchive. The order of the parameters doesn’t matter, but take care not to combine opposing parameters such as index, noindex, since crawlers may resolve the conflict differently.
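For instance, the first two tags below list the same three parameters in a different order and should be read the same way by crawlers. The particular combination is just an illustration.

```html
<!-- Same three parameters, different order -->
<meta name="robots" content="index, follow, noarchive">
<meta name="robots" content="noarchive, follow, index">

<!-- Avoid: opposing parameters in one tag -->
<meta name="robots" content="index, noindex">
```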
Index, Follow - Best for SEO
The index, follow configuration for the robots meta tag is the most common and widely used for websites that wish to be indexed. The tag in its entirety looks like this:
<meta name="robots" content="index, follow">
This configuration will tell the search engine crawler to index the current page, as well as click through to any of the links it finds on the page.
The index, nofollow configuration will tell the search engine crawler to index the current page that it is on, but to not click through to any of the links it finds on the page. The configuration looks like:
<meta name="robots" content="index, nofollow">
You would typically use this configuration on pages that link to privileged or paid content. It can also make sense to use nofollow when the links on the page add no value.
If you don’t want a page indexed at all, you would use the noindex, nofollow configuration on it, which looks like:
<meta name="robots" content="noindex, nofollow">
Why in the world would you want that though? Good question. One common reason to choose this configuration is that you have multiple versions of your website, such as a production version and a testing version. You would not want the test version of the site to be indexed, as it would duplicate the production content.
Finally, we have the noindex, follow configuration that looks like:
<meta name="robots" content="noindex, follow">
You may use this particular configuration on pages that don’t provide any significant value to a reader, but the links contained within it do.
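To recap, here are the four configurations from this section side by side, with the effect described above noted in a comment. A page would use only one of them.

```html
<!-- Index the page and follow its links -->
<meta name="robots" content="index, follow">

<!-- Index the page, but do not follow its links -->
<meta name="robots" content="index, nofollow">

<!-- Do not index the page and do not follow its links -->
<meta name="robots" content="noindex, nofollow">

<!-- Do not index the page, but do follow its links -->
<meta name="robots" content="noindex, follow">
```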
Search Engine Crawlers Are Complex
In this article, we shared some common practices and ways to utilize the robots meta tag on your website. One important takeaway is that the meta tag tells the search engine crawlers how they should behave and for the most part they will listen. The caveat though is that the search engine crawlers are not technically bound by the content in the robots meta tag and can behave however they want.
Not having a robots meta tag means that when a search engine crawler stumbles upon your website, it will index it and follow all the links within. This behavior is not guaranteed, though, so when in doubt, and as long as you want the page indexed, you can’t go wrong with <meta name="robots" content="index, follow">.