What To Do When A robots.txt File Prevents Googlebot From Crawling Your Site?

You have landed on this page because you want to know what to do when a robots.txt file prevents Googlebot from crawling your site.

A robots.txt file preventing Googlebot from crawling your site is one of the most common problems faced by bloggers and website owners. Ranking in Google search is the lifeline of any blog or website at present, and if a robots.txt file on your server prevents Googlebot from crawling your site, it means Google's crawlers are not able to fetch your pages.

For your reference, below are a few errors we usually encounter due to a malfunctioning robots.txt file.

Error 1: http://example.com/: Googlebot can't access your site

Over the last 24 hours, Googlebot encountered 255 errors while attempting to access your robots.txt. To ensure that we didn’t crawl any pages listed in that file, we postponed our crawl. Your site’s overall robots.txt error rate is 66.2%.

Error 2: Errors with other crawlers like woorank.com

This URL cannot be reviewed.

There may be several reasons for this: the URL does not exist, the website does not allow WooRank to make reviews, the website denies both our HEAD and GET requests, the DNS sends us into an infinite loop, or the robots.txt file does not allow access to our bot.

Error 3: The Biggest Confusion

The URL http://yoursite.com/robots.txt shows a working robots.txt file on your server, and the URL structure suggests the file is present in the root of the site. (Bots and crawlers do not read or execute a robots.txt file if it is placed in a subfolder.) BUT when people check the root folder, they often complain that they cannot find any robots.txt file there, even though the URL shows it present and working.

We will give you proper solutions for all of these errors, but before that, it is worth having a short discussion on the robots.txt file: what these files are and what the pros and cons of using them are, so that any beginner who lands on this page can get maximum help from this information.

A) What robots.txt Files Are And Their Utility

In very simple words, robots.txt is a file that gives hints to crawlers not to access specific content or URLs on your site or server. If you do not need to keep any content private, and do not want to prevent anything from being indexed by Google, then you do not need a "robots.txt" file at all.
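For example, a minimal robots.txt (the /private/ directory below is just an illustration) looks like this:

User-agent: *
Disallow: /private/

The * wildcard addresses all crawlers, and the Disallow line asks them not to fetch anything under /private/; everything else stays crawlable.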

B) robots.txt File Physically Present At Root Folder Of Your Site

If you have a robots.txt file physically present in the root folder of your site, then trust me, your life has become really easy. I recommend that you first ask your host to fix that file, or you can edit it yourself using the proper code format [for proper code, refer to the later part of this article]. Plugins like "All in One SEO" also have built-in options to create and customize the robots.txt file.

C) robots.txt File Virtually Present But Not In Root Folder

This is the toughest part of the problem. You need to fix your robots.txt file, but it is not actually available at the root of your site. It is just a virtual file, and without making changes to it you cannot get rid of the issue with Googlebot. Even if you do not have a problem with Googlebot, you may still need to make changes to the robots.txt file to keep some content private.

So, considering this problem, we suggest the best solutions below. Please try them one by one as a checklist, and we are sure your problem will be solved.

1. Go to Dashboard >> Privacy >> "My blog is visible to anyone". (In newer WordPress versions this setting lives under Settings >> Reading; make sure "Discourage search engines from indexing this site" is unchecked.)

If, at the time of installing WordPress or later, you chose the option to block search engines from indexing your site, then even though there is no physical robots.txt file on your server, WordPress still serves a virtual "robots.txt" file at the /robots.txt URL, and it disallows everything.
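If you cannot reach the dashboard at all, here is a minimal sketch (run it once from your theme's functions.php or a one-off script, then remove it) that flips the same setting through WordPress's real update_option() function:

<?php
// 'blog_public' is the option the privacy setting writes to.
// '1' = visible to everyone, '0' = blocked from search engines.
update_option( 'blog_public', '1' );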

2. Plugins like Google XML Sitemaps Generator also create a virtual "robots.txt" file, so follow the solutions below.

a) On the plugin's settings page, uncheck the box saying "Add sitemap URL to the virtual robots.txt file."
b) You can add to the robots.txt output by adding some code in this plugin's "sitemap-core.php" file.

Find the line in your "sitemap-core.php" file identical to the code below:

echo "\nSitemap: " . $smUrl . "\n";

This line adds the sitemap to the robots.txt file. Now add this code to allow all robots for now, so that your code looks like this:

echo "\nUser-agent: *";
echo "\nAllow: /\n";
echo "\nSitemap: " . $smUrl . "\n";

You can add as much content as you want there according to your needs.

3. Fixing the virtual "robots.txt" file directly in WordPress.

The actual virtual robots.txt output comes from the do_robots() function in the functions.php file in the /wp-includes/ folder.

Find the block in the "functions.php" file inside the wp-includes folder identical to the code below:

function do_robots() {
	header( 'Content-Type: text/plain; charset=utf-8' );

	do_action( 'do_robotstxt' );

	if ( '0' == get_option( 'blog_public' ) ) {
		echo "User-agent: *\n";
		echo "Disallow: /\n";
	} else {
		echo "User-agent: *\n";
		echo "Disallow:\n";
	}
}

Below is just a sample of a working robots.txt function; feel free to change the directories to suit your needs. (Note that the original sample duplicated the /wp-includes rule and printed the same rules whether or not the blog was private; the version below keeps the "block everything when private" behavior of the original function.)

function do_robots() {
	header( 'Content-Type: text/plain; charset=utf-8' );

	do_action( 'do_robotstxt' );

	if ( '0' == get_option( 'blog_public' ) ) {
		// Blog is set to private: block everything.
		echo "User-agent: *\n";
		echo "Disallow: /\n";
	} else {
		// Blog is public: block only the directories you choose.
		echo "User-agent: *";
		echo "\nDisallow: /wp-admin";
		echo "\nDisallow: /wp-includes";
		echo "\nDisallow: /wp-content";
		echo "\nDisallow: /stylesheets";
		echo "\nDisallow: /_db_backups";
		echo "\nDisallow: /cgi";
		echo "\nDisallow: /store\n";
	}
}
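Also keep in mind that editing core files is risky: every WordPress update overwrites wp-includes/functions.php and wipes out your changes. A safer sketch (the function name my_custom_robots_rules is our own illustration) uses WordPress's built-in robots_txt filter from your theme's functions.php or a small plugin:

// Append rules to the virtual robots.txt without touching core files.
// $output is the robots.txt text WordPress is about to serve;
// $public is the current value of the blog_public option.
function my_custom_robots_rules( $output, $public ) {
	$output .= "Disallow: /_db_backups\n";
	$output .= "Disallow: /cgi\n";
	return $output;
}
add_filter( 'robots_txt', 'my_custom_robots_rules', 10, 2 );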

4. Disabling the virtual robots.txt file so that it does not stop Googlebot from crawling your site.

Create a proper robots.txt file in the root folder of the site and make sure it allows crawlers. Googlebot recognizes a robots.txt file present in the root directory as the legitimate file, and it will automatically ignore the virtual robots.txt file.
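For example, a minimal physical robots.txt that allows everything and advertises your sitemap (replace yoursite.com with your own domain) could look like this:

User-agent: *
Disallow:

Sitemap: http://www.yoursite.com/sitemap.xml

An empty Disallow line means nothing is blocked, so all crawlers are allowed everywhere.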

Testing: log in to your Google Webmaster Tools panel, go to "Fetch as Google" and try fetching the URL http://www.yoursite.com/robots.txt. If the URL is fetched successfully, then your error is properly solved, and Googlebot is now able to crawl, fetch and index your website easily.

Actually, Googlebot wants http://www.yoursite.com/robots.txt to return an HTTP 200 status code, which means a robots.txt file is available, and it allows Googlebot to crawl your site.

OR

Googlebot is equally happy if http://www.yoursite.com/robots.txt returns an HTTP 404 status code, which means no robots.txt file is available, so Googlebot can crawl anything on your site.
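If you want to check the status code yourself from within WordPress, here is a small sketch using the WordPress HTTP API functions wp_remote_head() and wp_remote_retrieve_response_code() (the URL is a placeholder):

// Fetch only the headers of the robots.txt URL and print the status code.
$response = wp_remote_head( 'http://www.yoursite.com/robots.txt' );

if ( is_wp_error( $response ) ) {
	echo 'Request failed: ' . $response->get_error_message();
} else {
	// 200 means robots.txt exists; 404 means none, so everything is crawlable.
	echo wp_remote_retrieve_response_code( $response );
}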

With all of the suggestions mentioned above, we are sure your robots.txt issue will be solved. But since many SEO plugins provide robots.txt file creation features, if the above solutions do not work for you, we recommend contacting your host provider, or someone very experienced in this field, to get professional assistance without damaging your site or its features.

Please like, share & comment!
