Robots.txt for WordPress
From LinuxReviews
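This is a Robots Exclusion Standard (robots.txt) file written for WordPress blogs. It keeps well-behaved crawlers away from files with script, stylesheet, archive and a few other extensions, from per-post feeds and trackbacks, from URLs with query strings, and from the monthly archive pages.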
File: robots.txt
# These rules apply to all user-agents
User-agent: *
# Disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.tar$
Disallow: /*.tgz$
Disallow: /*.cgi$
Disallow: /*.xhtml$
# Disallow crawling of individual post feeds and trackbacks
Disallow: /*/feed/
Disallow: /*/trackback/
# Disallow all URLs containing ?
Disallow: /*?*
Disallow: /*?
# Disallow all archived monthlies
Disallow: /2004/0*
Disallow: /2004/1*
Disallow: /2005/0*
Disallow: /2005/1*
Disallow: /2006/0*
Disallow: /2006/1*
Disallow: /2007/0*
Disallow: /2007/1*
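The `*` and `$` in these rules are pattern-matching extensions supported by major crawlers such as Googlebot (and later standardized in RFC 9309), not plain prefixes. Python's standard `urllib.robotparser` only does prefix matching, so it cannot check rules like these. The minimal sketch below reimplements the wildcard semantics with hypothetical helpers `pattern_to_regex` and `is_disallowed`: `*` matches any run of characters, a trailing `$` anchors the end of the URL, and otherwise a rule matches as a prefix of the path plus query string. Because this file uses only `Disallow` lines, the sketch ignores `Allow` rules and the longest-match precedence crawlers apply when both are present.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regular expression.

    '*' matches any run of characters and a trailing '$' anchors the end of
    the URL; everything else is literal. re.match() anchors the start, so an
    unanchored pattern behaves as a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(url_path: str, disallow_rules) -> bool:
    """Return True if any non-empty Disallow pattern matches path + query."""
    return any(
        pattern_to_regex(rule).match(url_path)
        for rule in disallow_rules
        if rule  # an empty "Disallow:" blocks nothing
    )


# A few of the rules from the file above.
RULES = ["/*.php$", "/*.js$", "/*.css$", "/*?", "/2006/1*"]

for path in ["/wp-login.php", "/style.css?ver=2", "/?p=123",
             "/2006/11/", "/hello-world/"]:
    print(path, "blocked" if is_disallowed(path, RULES) else "allowed")
```

Only `/hello-world/` comes out allowed. Note that `/style.css?ver=2` is blocked by the `/*?` rule, not by `/*.css$`: the `$` anchor requires the URL to end in `.css`, so a stylesheet URL carrying a version query string slips past the extension rule.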
Special bots
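Many sites maintain a long list of bots, each with its own access settings. This is generally a bad idea: such lists are hard to keep current, and bots that ignore robots.txt are not stopped by it anyway. Sometimes, however, you need to give a particular bot more access than the default rules allow. Many WordPress bloggers run advertisements from Google (AdSense), in which case you have to allow Google's advertising bot, Mediapartners-Google: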
File: robots.txt
# This is the ad bot for Google
User-agent: Mediapartners-Google*
# Allow everything
Disallow:
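A crawler does not add its own group on top of the `*` group. Under RFC 9309 it obeys only the groups whose `User-agent` line names it (matched case-insensitively) and falls back to `*` only when none does. That is why the short Mediapartners-Google group is enough, and its empty `Disallow:` blocks nothing. The sketch below illustrates that selection on a cut-down copy of the file, using hypothetical helpers `parse_groups` and `rules_for`. It assumes crawlers read `Mediapartners-Google*` as the token `Mediapartners-Google`, ignoring the trailing `*`; that is an assumption about crawler behaviour, not something robots.txt guarantees.

```python
# A cut-down copy of the file: the catch-all group plus the ad-bot group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*.php$
Disallow: /*?

# This is the ad bot for Google
User-agent: Mediapartners-Google*
# Allow everything
Disallow:
"""


def parse_groups(text):
    """Split robots.txt text into (agent_tokens, disallow_rules) groups."""
    groups, agents, rules = [], [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            # Assumption: a trailing '*' on a product token is ignored.
            agents.append(value.rstrip("*").lower() or "*")
        elif field == "disallow" and agents:
            rules.append(value)  # may be "" (an empty rule blocks nothing)
    if agents:
        groups.append((agents, rules))
    return groups


def rules_for(product_token, groups):
    """Return the Disallow rules a crawler obeys.

    Groups naming the crawler (case-insensitively) are merged; the '*'
    groups apply only when no group names it.
    """
    token = product_token.lower()
    matched = [rules for agents, rules in groups if token in agents]
    if not matched:
        matched = [rules for agents, rules in groups if "*" in agents]
    return [rule for rules in matched for rule in rules]


groups = parse_groups(ROBOTS_TXT)
print(rules_for("Mediapartners-Google", groups))  # ['']
print(rules_for("Googlebot", groups))             # ['/*.php$', '/*?']
```

Here Googlebot falls back to the `*` rules, while the ad bot gets a single empty rule and may fetch every page.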