{"id":10601,"date":"2023-08-22T10:47:40","date_gmt":"2023-08-22T08:47:40","guid":{"rendered":"https:\/\/wajari.com\/blog\/robots-txt\/"},"modified":"2023-08-22T13:12:52","modified_gmt":"2023-08-22T11:12:52","slug":"robots","status":"publish","type":"post","link":"https:\/\/wajari.com\/en\/blog\/robots\/","title":{"rendered":"Robots.txt"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_82_2 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of contents<\/p>\n<\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/wajari.com\/en\/blog\/robots\/#What_is_robotstxt\" >What is robots.txt?<\/a><\/li><li 
class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/wajari.com\/en\/blog\/robots\/#Why_is_robotstxt_used\" >Why is robots.txt used?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/wajari.com\/en\/blog\/robots\/#How_to_create_a_robotstxt_file\" >How to create a robots.txt file?<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/wajari.com\/en\/blog\/robots\/#Rules_and_syntax_of_robotstxt\" >Rules and syntax of robots.txt<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/wajari.com\/en\/blog\/robots\/#What_happens_to_the_default_robotstxt_in_WordPress\" >What happens to the default robots.txt in WordPress?<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/wajari.com\/en\/blog\/robots\/#Should_we_use_the_default_robotstxt\" >Should we use the default robots.txt?<\/a><\/li><\/ul><\/li><\/ul><\/nav><\/div>\n<div id=\"rank-math-faq\" class=\"rank-math-block\">\n<div class=\"rank-math-list \">\n<div id=\"faq-question-1692692814776\" class=\"rank-math-list-item\">\n<h2 class=\"rank-math-question \"><span class=\"ez-toc-section\" id=\"What_is_robotstxt\"><\/span>What is robots.txt?  <span class=\"ez-toc-section-end\"><\/span><\/h2>\n<div class=\"rank-math-answer \">\n\n<p>The robots.txt is a text file located in the root directory of a website. 
It is used to tell search engine robots which pages they should not crawl.<\/p>\n\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Why_is_robotstxt_used\"><\/span>Why is robots.txt used?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>There are several reasons to use <strong>robots.txt<\/strong>. The most common are:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>To keep search engines from crawling pages that are unfinished or not ready to appear in search results.<\/li>\n\n\n\n<li>To keep crawlers out of areas such as the administrator control panel. This is a request to well-behaved robots, not a security measure.<\/li>\n\n\n\n<li>To stop search engines from spending time crawling pages that are not important.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"How_to_create_a_robotstxt_file\"><\/span>How to create a robots.txt file?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Creating a <strong>robots.txt<\/strong> file is very simple. You can write it in any text editor.  <\/p>\n\n\n\n<p>The robots.txt file is plain text, so you do not need any programming language.<\/p>\n\n\n\n<p>The content of the <strong>robots.txt<\/strong> file is equally simple.<\/p>\n\n\n\n<p>It is useful for &#8220;blocking&#8221; certain areas or files of your website from search engines, for example to avoid having duplicate content indexed.<\/p>\n\n\n\n<p>I always put <strong>blocking or control<\/strong> in quotation marks because it does not always work as we expect.  
<\/p>\n\n\n\n<p>When it does not, we must use other methods such as <a href=\"https:\/\/wajari.com\/en\/blog\/meta-robots\/\" data-type=\"post\" data-id=\"10570\">meta robots<\/a> in the page head.<\/p>\n\n\n\n<figure class=\"wp-block-embed aligncenter is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Robots.txt en SEO\" width=\"1200\" height=\"675\" src=\"https:\/\/www.youtube.com\/embed\/oASa56fu-4Y?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Rules_and_syntax_of_robotstxt\"><\/span>Rules and syntax of robots.txt<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>The <strong>robots.txt<\/strong> file follows a few simple rules, but they can easily lead us into mistakes:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Each rule needs the name of the robot (<em>user-agent<\/em>) and the action.<\/li>\n\n\n\n<li>The action can be of two types: <em>disallow<\/em> and <em>allow<\/em>.<\/li>\n\n\n\n<li>What really matters is the disallow. The allow is an exception within a disallow. It makes no sense to allow everything, because crawling everything is already the default behavior of a search engine (good advice: be minimalist).  
<\/li>\n\n\n\n<li>It is a plain text file (.txt), not HTML.<\/li>\n\n\n\n<li>The file name is always lowercase: robots.txt. The paths in the rules are case-sensitive.<\/li>\n\n\n\n<li>Blank lines may separate the groups for different agents, but there should be none between the directives of a group.<\/li>\n\n\n\n<li>We can add comments with the hash sign (#); search engines ignore them.<\/li>\n\n\n\n<li>It is highly recommended to include the location of your sitemap.xml.<\/li>\n<\/ol>\n\n\n\n<p>Let&#8217;s look at some examples and read them line by line:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>User-agent: Googlebot\nDisallow:\n\nUser-agent: Bingbot\nDisallow: \/\n\n# We block robots from this directory\nUser-agent: *\nDisallow: \/cosasprohibidas\/<\/code><\/pre>\n\n\n\n<p>In the example above we block nothing for <strong>Google<\/strong> (the disallow is empty) and we block the whole site for <strong>Bing<\/strong> by using a slash (\/), which stands for the root of the site and everything below it.<\/p>\n\n\n\n<p>In the third group, as the comment points out, we block every other agent from the directory \/cosasprohibidas\/.<\/p>\n\n\n\n<p>When we want to address all robots, we use the asterisk ( * ).<\/p>\n\n\n\n<p>There are other directives, such as <em>crawl-delay<\/em>. You can see it in the following example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>User-agent: Bingbot\nCrawl-delay: 5<\/code><\/pre>\n\n\n\n<p>The <strong>crawl-delay<\/strong> sets a pause in seconds between requests so that crawling does not overload the server. Not every crawler honors it; Googlebot, for one, ignores it.<\/p>\n\n\n\n<p>It is unnecessary for most websites. 
It makes sense for large websites or media outlets with a lot of traffic.<\/p>\n\n\n\n<p>The <strong>robots.txt<\/strong> file accepts simple patterns with <strong>wildcards<\/strong> (not full regular expressions), which is very useful when we want to block certain directories of our website.<\/p>\n\n\n\n<p>For example, the asterisk ( * ) blocks directories that start with the same word: \/carpeta*\/ blocks the directories carpeta1, carpeta2, etc.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>User-agent: Googlebot\nDisallow: \/carpeta*\/<\/code><\/pre>\n\n\n\n<p>The <strong>dollar sign<\/strong> marks the end of the URL, which lets us block, for example, a file extension such as pdf or gif: \/*.pdf$.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>User-agent: Bingbot\nDisallow: \/*.pdf$<\/code><\/pre>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"What_happens_to_the_default_robotstxt_in_WordPress\"><\/span>What happens to the default robots.txt in WordPress?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The <strong>WordPress default robots.txt<\/strong> is good because:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>It does not block any frontend resource (the public part of the site).<\/li>\n\n\n\n<li>It blocks all backend resources (the administrative part of the site), with one exception:<\/li>\n\n\n\n<li><strong>admin-ajax.php<\/strong>, which supports plugins and themes and can be used in the public part of the website.<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>User-agent: *\nDisallow: \/wp-admin\/\nAllow: \/wp-admin\/admin-ajax.php<\/code><\/pre>\n\n\n\n<p>The WordPress default robots.txt also teaches us an important point of syntax. Because the more specific &#8220;<strong>allow<\/strong>&#8221; overrides the broader disallow, we block everything in <em>wp-admin<\/em> except <em>admin-ajax.php<\/em>.<\/p>\n\n\n\n<p>A curious detail. 
WordPress takes care of creating a virtual robots.txt.  <\/p>\n\n\n\n<p>The file does not really exist in the <em>public_html<\/em> folder where you usually upload all the site files. Until you upload a real file, it stays virtual. Curious, isn&#8217;t it?<\/p>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Should_we_use_the_default_robotstxt\"><\/span>Should we use the default robots.txt?<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Why not?  <\/p>\n\n\n\n<p>The one thing worth adding is the <strong>sitemap<\/strong>, and SEO plugins usually add it automatically.<\/p>\n\n\n\n<p>I believe that less is more. It is good to be minimalist in this file, because search engines tend to check it frequently. Think carefully about whether you want to block something and why.<\/p>\n\n\n\n<p>I usually block the <em>feeds<\/em> so that they do not appear in the <strong>Search Console<\/strong> reports, but I tend to keep the rules to a minimum. Here is my robots.txt:  <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>User-agent: *\nDisallow: \/wp-admin\/\nAllow: \/wp-admin\/admin-ajax.php\n\nDisallow: \/wp-includes\/\nAllow: \/wp-includes\/*.js\nAllow: \/wp-includes\/*.css\n\nDisallow: *trackback\nDisallow: \/feed\/\nDisallow: *\/feed\/\n\nSitemap: https:\/\/wajari.com\/sitemap_index.xml <\/code><\/pre>\n\n\n\n<p>As you can see, I don&#8217;t tend to add a lot of &#8220;fluff&#8221;.<\/p>\n\n\n\n<p>I do not think more is necessary for the vast majority of websites, and we must not forget that Google fetches this file almost every day, so let&#8217;s think in terms of efficiency.<\/p>\n\n\n\n<p>Use it when you want to block something. Remember: with great power comes great responsibility.<\/p>\n\n\n\n<p>Always check that it is working properly. Here are some tools for you to try. 
Don&#8217;t just copy and paste without rhyme or reason; work out for yourself why each rule is there.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"http:\/\/tools.seobook.com\/robots-txt\/analyzer\/\" rel=\"noopener\">SEO Tool<\/a> to analyze a robots.txt file<\/li>\n\n\n\n<li><a href=\"https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/robots\/intro?hl=es-419\" rel=\"noopener\">Official Google documentation on robots.txt<\/a>, which explains it very well.<\/li>\n<\/ul>\n\n\n\n<p>An important detail:  <\/p>\n\n\n\n<p>Remember that Google <s>does whatever it wants<\/s>.  <\/p>\n\n\n\n<p>So, as I told you in the meta robots video, if you want to keep a page or section out of the search engine results, you must also use <em>noindex<\/em>, because the search engine can reach the page directly, for example through a link, without your directive in this simple file stopping it from being indexed. Keep in mind that a crawler can only see the noindex on a page it is allowed to crawl.<\/p>\n\n\n\n<p>You will see that the file is so simple that it will give you no trouble, and it is part of the essence of SEO. If you have any questions, I will be glad to read your comments \ud83d\ude09  <\/p>\n\n\n\n<p>Live long and prosper!<\/p>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n","protected":false},"excerpt":{"rendered":"<p>The robots.txt is a text file located in the root directory of a website. It is used to tell search engine robots which pages they should not crawl. 
I tell you all about it in this SEO guide<\/p>\n","protected":false},"author":1,"featured_media":10594,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"_uag_custom_page_level_css":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[70,87],"tags":[71,88],"class_list":["post-10601","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-seo-en","category-guides","tag-google-en","tag-search-engines"],"featured_image_urls_v2":{"full":["https:\/\/wajari.com\/wp-content\/uploads\/2023\/08\/2023-08-robots.png",640,426,false],"thumbnail":["https:\/\/wajari.com\/wp-content\/uploads\/2023\/08\/2023-08-robots-150x150.png",150,150,true],"medium":["https:\/\/wajari.com\/wp-content\/uploads\/2023\/08\/2023-08-robots-300x200.png",300,200,true],"medium_large":["https:\/\/wajari.com\/wp-content\/uploads\/2023\/08\/2023-08-robots.png",640,426,false],"large":["https:\/\/wajari.com\/wp-content\/uploads\/2023\/08\/2023-08-robots.png",640,426,false],"1536x1536":["https:\/\/wajari.com\/wp-content\/uploads\/2023\/08\/2023-08-robots.png",640,426,false],"2048x2048":["https:\/\/wajari.com\/wp-content\/uploads\/2023\/08\/2023-08-robots.png",640,426,false]},"post_excerpt_stackable_v2":"<p>The robots.txt is a text file located in the root directory of a website. It is used to tell search engine robots which pages they should not crawl. 
I tell you all about it in this SEO guide<\/p>\n","category_list_v2":"<a href=\"https:\/\/wajari.com\/en\/categoria\/seo-en\/\" rel=\"category tag\">SEO<\/a>, <a href=\"https:\/\/wajari.com\/en\/categoria\/guides\/\" rel=\"category tag\">Guides<\/a>","author_info_v2":{"name":"Wajari Vel\u00e1squez","url":"https:\/\/wajari.com\/en\/author\/wajari\/"},"comments_num_v2":"0 comments","jetpack_featured_media_url":"https:\/\/wajari.com\/wp-content\/uploads\/2023\/08\/2023-08-robots.png","uagb_featured_image_src":{"full":["https:\/\/wajari.com\/wp-content\/uploads\/2023\/08\/2023-08-robots.png",640,426,false],"thumbnail":["https:\/\/wajari.com\/wp-content\/uploads\/2023\/08\/2023-08-robots-150x150.png",150,150,true],"medium":["https:\/\/wajari.com\/wp-content\/uploads\/2023\/08\/2023-08-robots-300x200.png",300,200,true],"medium_large":["https:\/\/wajari.com\/wp-content\/uploads\/2023\/08\/2023-08-robots.png",640,426,false],"large":["https:\/\/wajari.com\/wp-content\/uploads\/2023\/08\/2023-08-robots.png",640,426,false],"1536x1536":["https:\/\/wajari.com\/wp-content\/uploads\/2023\/08\/2023-08-robots.png",640,426,false],"2048x2048":["https:\/\/wajari.com\/wp-content\/uploads\/2023\/08\/2023-08-robots.png",640,426,false]},"uagb_author_info":{"display_name":"Wajari Vel\u00e1squez","author_link":"https:\/\/wajari.com\/en\/author\/wajari\/"},"uagb_comment_info":0,"uagb_excerpt":"The robots.txt is a text file located in the root directory of a website. It is used to tell search engine robots which pages they should not crawl. 
I tell you all about it in this SEO guide","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/wajari.com\/en\/wp-json\/wp\/v2\/posts\/10601","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wajari.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wajari.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wajari.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/wajari.com\/en\/wp-json\/wp\/v2\/comments?post=10601"}],"version-history":[{"count":4,"href":"https:\/\/wajari.com\/en\/wp-json\/wp\/v2\/posts\/10601\/revisions"}],"predecessor-version":[{"id":10605,"href":"https:\/\/wajari.com\/en\/wp-json\/wp\/v2\/posts\/10601\/revisions\/10605"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wajari.com\/en\/wp-json\/wp\/v2\/media\/10594"}],"wp:attachment":[{"href":"https:\/\/wajari.com\/en\/wp-json\/wp\/v2\/media?parent=10601"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wajari.com\/en\/wp-json\/wp\/v2\/categories?post=10601"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wajari.com\/en\/wp-json\/wp\/v2\/tags?post=10601"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}