Control how Google indexes your website with a simple robots.txt file

on 16 March 2018, 10:06:04 AM

A robots.txt file allows you to control how web crawlers, like Google, Bing, etc., access parts of your website. The file itself sits in the root folder of your website and adheres to the Robots Exclusion Protocol. This protocol allows you to control access to your website by URL or by the type of crawler.

Not all crawlers/spiders follow this protocol to the letter, and some, such as spambots and malware, ignore it completely.

How a robots.txt file works

Google is indexing your website and gets to the URL www.yoursite.com/news/. Just before loading this page, the spider/crawler looks for www.yoursite.com/robots.txt and finds your robots file. The format may look like the following;

Blocking All Access

User-agent: *

Disallow: /

The above, placed inside a robots.txt file, will instruct all crawlers that they should not crawl any pages on the website. To do the opposite and allow all spiders to crawl all pages on your website, your robots.txt file would look like the following;

Allow Full Access

User-agent: *

Allow: /
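You can check what either configuration permits with Python's built-in urllib.robotparser module. A minimal sketch (the www.yoursite.com URL is just a placeholder):

```python
from urllib.robotparser import RobotFileParser

# "Blocking All Access": no crawler may fetch any page.
block_all = RobotFileParser()
block_all.parse(["User-agent: *", "Disallow: /"])
print(block_all.can_fetch("*", "http://www.yoursite.com/news/"))   # False

# "Allow Full Access": every crawler may fetch every page.
allow_all = RobotFileParser()
allow_all.parse(["User-agent: *", "Allow: /"])
print(allow_all.can_fetch("*", "http://www.yoursite.com/news/"))   # True
```

This is also a handy way to test a robots.txt file locally before uploading it to your site's root folder.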

Additional examples of robots.txt files can be found below.

Block One Folder

User-agent: *

Disallow: /folder/

Block One Page

User-agent: *

Disallow: /news.html
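Targeted rules can be verified the same way with urllib.robotparser. This sketch assumes Disallow directives for one folder and one page on a hypothetical site, with everything else left crawlable:

```python
from urllib.robotparser import RobotFileParser

# Block one folder and one page; all other URLs stay crawlable.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /folder/",
    "Disallow: /news.html",
])

print(rules.can_fetch("*", "http://www.yoursite.com/folder/page.html"))  # False
print(rules.can_fetch("*", "http://www.yoursite.com/news.html"))         # False
print(rules.can_fetch("*", "http://www.yoursite.com/about.html"))        # True
```

Note that Disallow matches by URL prefix, so blocking /folder/ also blocks every page inside that folder.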

More about robots.txt 
