Robots.txt for WordPress

From LinuxReviews

This is a Robots Exclusion Standard (robots.txt) file that is well suited to a typical WordPress site.

File: robots.txt
# The rules below apply to all user-agents
User-agent: *

# Disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.tar$
Disallow: /*.tgz$
Disallow: /*.cgi$
Disallow: /*.xhtml$
 
# Disallow individual post feeds and trackbacks
Disallow: */feed/
Disallow: */trackback/
 
# Disallow all URLs that contain a ? (query strings)
Disallow: /*?*
Disallow: /*?
 
# Disallow the monthly archives
Disallow: /2004/0*
Disallow: /2005/0*
Disallow: /2006/0*
Disallow: /2007/0*
Disallow: /2006/1*
Disallow: /2007/1*
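
The rules above rely on two special characters: `*` matches any sequence of characters and a trailing `$` anchors the pattern to the end of the URL. The following Python sketch illustrates that matching behaviour for a handful of the patterns listed above. It is an illustration only, not how any particular crawler is implemented: the helper names `pattern_to_regex` and `is_blocked` are made up for this example, and it ignores `Allow` precedence and percent-encoding.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regular expression.

    '*' matches any run of characters and a trailing '$' pins the end of
    the URL. Everything else is taken literally. (Hypothetical helper.)
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(path: str, patterns: list[str]) -> bool:
    """Return True if any Disallow pattern matches from the start of the path."""
    return any(pattern_to_regex(p).match(path) for p in patterns)

# A subset of the Disallow patterns from the file above.
RULES = ["/*.php$", "/*.css$", "*/feed/", "*/trackback/", "/*?", "/2004/0*"]

# The sample paths are invented for illustration.
for path in ["/wp-login.php", "/index.php?p=12", "/my-post/feed/",
             "/2004/05/hello/", "/2008/05/hello/", "/about/"]:
    print(path, "blocked" if is_blocked(path, RULES) else "allowed")
```

Note that `/index.php?p=12` is not caught by `/*.php$`, because the URL does not end in `.php`; it is caught by the `/*?` rule instead.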

Special bots

Many sites keep a long list of bots, each with its own access settings. This is generally a bad idea. Sometimes, however, a specific bot needs more access than the default rules grant: many WordPress bloggers run advertisements from Google, in which case Google's advertisement bot has to be allowed:

File: robots.txt
# This is the ad bot for Google
User-agent: Mediapartners-Google
 
# An empty Disallow allows everything
Disallow: 
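
To see how a per-bot group overrides the general one, the two groups can be checked with Python's standard `urllib.robotparser` module. This is a rough sketch under a few assumptions: the standard-library parser only does plain prefix matching and does not understand `*` wildcards, so the archive rule is written here as `/2004/0` (which matches the same prefix) and the user-agent is written without a trailing `*`. The bot name `ExampleBot` and the sample paths are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Two groups: a general one and one for Google's ad bot.
# The wildcard-free rule /2004/0 stands in for /2004/0* from the file above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /2004/0

User-agent: Mediapartners-Google
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The ad bot has its own group with an empty Disallow, so nothing is blocked.
print(parser.can_fetch("Mediapartners-Google", "/2004/05/hello/"))

# Any other bot (hypothetical name) falls back to the "*" group.
print(parser.can_fetch("ExampleBot", "/2004/05/hello/"))
print(parser.can_fetch("ExampleBot", "/2008/05/hello/"))
```

A bot that finds a group naming it uses only that group, so the ad bot is not affected by the restrictions in the `*` group.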

See also