Meta robots

What are meta robots in SEO? This is the first episode in a series of posts and videos that I will organize under the category SEO Guides: basic manuals for digging deeper.

In the header of a web page, we usually put information that is highly relevant to search engines but invisible to the user. The meta robots tag is an HTML tag that gives instructions to search engines.

It is usually the best way to control the indexing behavior of each URL.

Meta robots are a classic of technical SEO, one of the first things to learn when you start in this world.

It is a very simple and basic subject, but we must be very clear about the concepts if we do not want to make mistakes, whether directly through changes to the site itself or through any modification made in one of the SEO plugins we usually install in WordPress.

As Fernando Maciá points out in his digital marketing dictionary on his Human Level website:

“Meta robots allows you to control how a page should be indexed and how it is displayed to users on the search results page.”

Fernando Maciá

It doesn’t get any clearer than that.

Also, as you will see, with robots.txt we block a URL completely, while with meta robots we can have a URL that still passes link juice or popularity but that we decide not to show in Google's index.

Tag syntax

Very simple: it is a meta tag placed in the <head> of the page, and these are the options we can define:
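As a reference, the general form of the tag is the following, where the content attribute takes the directives described in the sections below (the two values here are just placeholders):

<meta name="robots" content="directive1, directive2"/>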

noindex, follow

<meta name="robots" content="noindex, follow"/>

In this case, with noindex we tell search engines NOT to index this content, but that they can follow the links.

By letting the links be followed, we maintain the transfer of link value and the associated popularity juice.

This is the most typical solution when you want to avoid indexing a URL that may be considered thin content or duplicate content of other sections of your website.

This is very common on internal search results pages, which generate a different URL for each search term, and on tag archives, author archives, etc.

index, nofollow

<meta name="robots" content="index, nofollow"/>

In this case the opposite is true: we tell search engines that they can index this URL but should NOT follow the links, so the links will not transmit their value in the usual way.

It is the ideal combination when you do not vouch for the links on a specific URL; imagine pages created by users, for example in a forum.

noindex, nofollow

<meta name="robots" content="noindex, nofollow"/>

We prevent both indexing and the following of links. It is a form of total blocking of that URL. Its use is not very common.

index, follow

There is a fourth combination, index, follow, but it is not necessary to add this tag because it describes the default behavior: the URL is discovered, its links are followed, and its content is indexed by search engines.

However, I think it is positive to mark it explicitly, something that some technologies, such as Shopify, do not do by default.
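If you do want to state it explicitly, the tag looks like this:

<meta name="robots" content="index, follow"/>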

Is there a difference between robots.txt and meta robots at the crawling level?

Of course, remember that robots.txt is usually one of the first files that search engines will check.

If you mark a disallow for a directory in that file, in principle Google will not waste time crawling that directory, whereas to obey a noindex tag it has to crawl the URL first.
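For context, a minimal robots.txt sketch (the directory name and pattern are made-up examples) could look like this:

# Block all crawlers from an entire directory
User-agent: *
Disallow: /example-directory/
# Patterns are also supported, e.g. any URL containing a search parameter
Disallow: /*?s=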

In addition, with robots.txt we can define patterns (imagine blocking whole directories or subsets of URLs), while the meta robots tag goes on each individual URL.

What should we take into account about these two ways of controlling crawling and indexing?

For example, if we block a directory in robots.txt, Google will not waste time crawling that section. But if that section receives an external link, it is quite likely that Google will still index those URLs (without being able to read their content), since the disallow only blocks crawling, not indexing.

It is therefore important to place the directives we want in the meta robots tag, as a way to control the final indexing Google makes of our website.

Other directives for meta robots

We can use more directives; some examples:

  • archive / noarchive: whether or not to store a copy of the page in the search engine's cache.
  • noimageindex: do not index the images on the page.
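These directives can be combined in a single tag, separated by commas. A sketch for a hypothetical page that should be indexed but whose images and cached copy we want to keep out:

<meta name="robots" content="noimageindex, noarchive"/>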

And there are a few more directives, with less frequent uses, that Google makes available to us on its help page for developers.

How else can we control indexing?

Canonical tag in meta robots


As you can imagine, it has nothing to do with the Catholic Church, canonizations or saints of any kind.

In 2009, Google, Yahoo and Microsoft agreed on and designed a tag that would allow them to simplify the problem of duplicate content.

From an SEO point of view, canonical tags are similar to a 301 redirect. Let me explain:

Imagine that we have a URL:

«https://meloquitandelasmanos.com/zapatochulo/»

And this is an online store that creates a variation of the URL when you change an attribute of the product, for example the color; the URL would then be:

«https://meloquitandelasmanos.com/zapatochulo#rojo/»

Or even, when sending a mailing or creating an advertising campaign on Facebook, the URL ends up carrying the typical tags that allow you to analyze the campaigns, for example:

«https://meloquitandelasmanos.com/zapatochulo?utm_source=facebook&utm_campaign=zapatos-verano»

For practical purposes, search engines may interpret these URLs as duplicate content, because they treat them as different URLs with the same content.

To solve this, we only need to put a rel="canonical" tag in the header of these pages indicating the original URL, and search engines will understand that all the variations are the same page.
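Following the shoe-store example above, each of the URL variations would carry this tag in its header (a sketch, assuming the first URL is the one we want indexed):

<link rel="canonical" href="https://meloquitandelasmanos.com/zapatochulo/" />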

Note, as you know and I have already mentioned, that search engines can sometimes be “capricious”, so canonical tags are taken only as a suggestion, not as a directive.

If you are on WordPress and have an SEO plugin, you can rest assured: by default, SEO plugins usually add the canonical tag automatically.

To check it, just right-click and view the source code of the page. Look for the following tag in the header of your website:

<link rel="canonical" href="https://wajari.com/" />

You can also install one of the many SEO browser extensions; just by clicking on its icon you will see all this information.

Curious? Interesting? I hope so.

This is a short chapter, but it covers basic principles of crawling and indexing that are highly relevant to the SEO of our web pages.

So now you know what it means when you configure these options in any of the SEO plugins. Live long and prosper!
