Using robots.txt

Developers can make their own robots.txt file and add it to the application to control how web crawling bots access their site.

It is important to note that the robots.txt file is always publicly available. Take care not to expose sensitive information when adding paths to the robots.txt file.

It is also important to note that bots aren't compelled to obey the robots.txt file, so it is not a guaranteed security measure.
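Compliance is left to the crawler: a well-behaved bot downloads the file and checks it before requesting a page. The sketch below shows roughly how such a check could be done with Python's standard urllib.robotparser module; the domain and paths are placeholders, not taken from a real site.

from urllib.robotparser import RobotFileParser

# A compliant crawler fetches the site's robots.txt before crawling.
# The domain and paths below are placeholders.
parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()

# The crawler asks whether its user agent may fetch a given path;
# nothing on the server side enforces the answer.
print(parser.can_fetch("Googlebot", "https://example.com/public/index.html"))
print(parser.can_fetch("Googlebot", "https://example.com/private/data.html"))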

Making a robots.txt file

The robots.txt file should be added to the root directory of the file manager, so that it is served from the root of the site as /robots.txt.

Inside the file, entries should be added with the following layout:

User-agent:
Allow:
Disallow:

After 'User-agent', enter the name of the robot.

After 'Allow' and 'Disallow', add the paths that are allowed and disallowed. Paths that aren't disallowed are implicitly allowed.

Wildcards (*) can be used in all three lines; for example, 'User-agent: *' applies the rules to every robot.

Examples

In the following example, the Google search bot is allowed access to all paths except those starting with /private/. This is because all paths that aren't disallowed are implicitly allowed.

User-agent: Googlebot
Disallow: /private/

In this example all bots are disallowed from every path except /public/example.gif:

User-agent: *
Disallow: /
Allow: /public/example.gif

In this example Googlebot is allowed paths starting with /public/ but is disallowed from paths ending with .gif (the $ marks the end of the path):

User-agent: Googlebot
Disallow: /*.gif$
Allow: /public/

In this example Googlebot is allowed every path except those starting with /private/, while SpamBot is disallowed from every path:

User-agent: Googlebot
Disallow: /private/

User-agent: SpamBot
Disallow: /
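
One rough way to check how rules like these are interpreted is Python's standard urllib.robotparser module, shown in the sketch below. Note that this parser only implements simple prefix matching (no wildcards), which is enough for this last example; the paths queried are placeholders.

from urllib.robotparser import RobotFileParser

# The rules from the last example above.
rules = """\
User-agent: Googlebot
Disallow: /private/

User-agent: SpamBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Googlebot", "/public/page.html"))   # True
print(parser.can_fetch("Googlebot", "/private/page.html"))  # False
print(parser.can_fetch("SpamBot", "/public/page.html"))     # False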