
RSS Tutorial for Content Publishers and Webmasters

This tutorial explains the features and benefits of a Web format called RSS, and gives a brief technical overview of it. The reader is assumed to have some familiarity with XML and other Web technologies.


  1. Introducing RSS
    1. What's in an RSS feed?
    2. How do people use feeds?
    3. Why should I make an RSS feed available?
    4. But isn't that giving away my content?
  2. Choosing Content for RSS Feeds
  3. Publishing RSS
  4. Telling People About Your Feed
  5. RSS Versions and Modules
    1. RSS 0.9x
    2. RSS 1.0
    3. Dublin Core Module
  6. Tips for Generating Good RSS Feeds
  7. RSS Tools and Validators
  8. Aggregators and other RSS Clients
  9. More Information about RSS
  10. About this Document

Introducing RSS

Think about all of the information that you access on the Web on a day-to-day basis: news headlines, search results, "What's New", job vacancies, and so forth. A large amount of this content can be thought of as a list; although it probably isn't in HTML <li> elements, the information is list-oriented.

Most people need to track a number of these lists, but it becomes difficult once there are more than a handful of sources. This is because they have to go to each page, load it, remember how it's formatted, and find where they last left off in the list.

RSS is an XML-based format that allows the syndication of lists of hyperlinks, along with other information, or metadata, that helps viewers decide whether they want to follow the link.

RSS allows people's computers to fetch and understand the information, so that all of the lists they're interested in can be tracked and personalized for them. It is a format that's intended for use by computers on behalf of people, rather than being directly presented to them (like HTML).

To enable this, a Web site will make an RSS feed, or channel, available, just like any other file or resource on the server. Once a feed is available, computers can regularly fetch the file to get the most recent items on the list. Most often, people will do this with an aggregator, a program that manages a number of lists and presents them in a single interface.

RSS can also be used for other kinds of list-oriented information, such as syndicating the content itself (often weblogs) along with the links. However, this tutorial focuses on the use of RSS for syndication of links.

What's in an RSS feed?

A feed contains a list of items, each of which is identified by a link. Each item can have any amount of metadata associated with it.

The most basic metadata supported by RSS includes a title for the link and a description of it; when syndicating news headlines, these fields might be used for the story title and the first paragraph or a summary. A simple item might look like this:

<item>
  <title>Earth Invaded</title>
  <link>http://news.example.com/2004/12/17/invasion</link>
  <description>The earth was attacked by an invasion fleet 
  from halfway across the galaxy; luckily, a fatal 
  miscalculation of scale resulted in the entire armada 
  being eaten by a small dog.</description>

</item>

Additionally, the feed itself can have metadata associated with it, so that it can be given a title (e.g., "Bob's news headlines"), description, and other fields like publisher and copyright terms.
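
As a rough illustration of this structure, the following Python sketch builds a small feed with the standard library's xml.etree.ElementTree. The function name build_feed, the dictionary layout and the sample values are assumptions made for this example; it is one possible way to produce such a file, not a recommended generator.

```python
import xml.etree.ElementTree as ET


def build_feed(channel_meta, items):
    """Return an RSS 0.91-style document as a string.

    channel_meta and each entry of items are plain dicts mapping
    element names (title, link, description) to their text.
    """
    rss = ET.Element("rss", version="0.91")
    channel = ET.SubElement(rss, "channel")
    # Feed-level metadata (title, link, description) comes first.
    for name, text in channel_meta.items():
        ET.SubElement(channel, name).text = text
    # Then one <item> per list entry, each identified by its link.
    for entry in items:
        item = ET.SubElement(channel, "item")
        for name, text in entry.items():
            ET.SubElement(item, name).text = text
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_feed(
    {"title": "Bob's news headlines",
     "link": "http://news.example.com/",
     "description": "Headlines from Bob"},
    [{"title": "Earth Invaded",
      "link": "http://news.example.com/2004/12/17/invasion",
      "description": "The earth was attacked by an invasion fleet."}],
)
print(feed_xml)
```

Because the elements are created through the library rather than by pasting strings together, special characters in titles and descriptions are escaped automatically.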

For an idea of what full feeds look like, see 'RSS Versions and Modules'.

How do people use feeds?

Aggregators are the most common use of RSS feeds, and there are several types. Web aggregators (sometimes called portals) present feeds in a Web page; My Yahoo is a well-known example of this. Aggregators have also been integrated into e-mail clients and users' desktops, or offered as standalone, dedicated software. See 'Aggregators and other RSS Clients' for more information.

Aggregators can offer a variety of special features, including combining several related feeds into a single view, hiding items that the viewer has already seen, and categorizing feeds and items.
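
To make two of these features concrete, here is a minimal Python sketch of how an aggregator might combine several feeds into one view while hiding items the viewer has already seen. The function new_items, the feed names and the sample links are hypothetical; a real aggregator would also fetch and parse the feeds and store its state between runs.

```python
def new_items(feeds, seen_links):
    """Merge items from several feeds, skipping links already seen.

    feeds maps a feed name to a list of (title, link) pairs;
    seen_links is a set that is updated in place.
    """
    fresh = []
    for feed_name, items in feeds.items():
        for title, link in items:
            if link in seen_links:
                continue  # the viewer has already been shown this item
            seen_links.add(link)
            fresh.append((feed_name, title, link))
    return fresh


seen = set()
feeds = {
    "World": [("Earth Invaded",
               "http://news.example.com/2004/12/17/invasion")],
    "Pets": [("Small Dog Saves Planet",
              "http://news.example.com/2004/12/17/dog")],
}
first_poll = new_items(feeds, seen)   # both items are new
second_poll = new_items(feeds, seen)  # nothing has changed since
print(len(first_poll), len(second_poll))
```

Note that the sketch relies on each item having its own link, which is how it recognizes items it has shown before.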

Other uses of RSS feeds include site tracking by search engines and other software; because the feed is machine-readable, the search software doesn't have to figure out which parts of the site are important and which parts are just the navigation and presentation. You may also choose to allow people to republish your feeds on their Web sites, giving them the ability to represent your content as they require.

Why should I make an RSS feed available?

Your viewers will thank you, and there will be more of them, because RSS allows them to see your site without going out of their way to visit it.

While this seems bad at first glance, it actually improves your site's visibility; by making it easier for your users to keep up with your site - allowing them to see it the way they want to - it's more likely that they'll know when something that interests them is available on your site.

For example, imagine that your company announces a new product or feature every month or two. Without a feed, your viewers have to remember to come to your site and see if they find anything new - if they have time. If you provide a feed for them, they can point their aggregator or other software at it, and it will give them a link and a description of developments at your site almost as soon as they happen.

News is similar; because there are so many sources of news on the Internet, most of your viewers won't come to your site every day. By providing an RSS feed, you are in front of them constantly, improving the chances that they'll click through to an article that catches their eye.

But isn't that giving away my content?

No! You still retain copyright on your content if you wish to.

By supplying an RSS feed, you can control what information is syndicated in the feed, whether it's a full article or just a teaser. Your content can still be protected by your current access control mechanisms; only the links and metadata are distributed. You can also protect the RSS feed itself with SSL encryption and HTTP username/password authentication, if you'd like.

In many ways, RSS is similar to the subscription newsletters that many sites offer to keep viewers up-to-date. The big difference is that they don't have to supply an e-mail address, lowering the barrier of privacy concerns, while still giving you a direct channel to your viewers. Also, they get to see the content in the manner that's most convenient to them, which means that you get more eyes looking at your content.

Choosing Content for RSS Feeds

Any list-oriented information on your site that your viewers might be interested in tracking or reusing is a good candidate for an RSS feed. This can encompass news headlines and press releases, job listings, conference calendars and rankings (like 'top 10' lists).

For example:

  • News & Announcements - headlines, notices and any list of announcements that are added to over time
  • Document listings - lists of added or changed pages, so that people don't need to constantly check for different content
  • Bookmarks and other external links - while most people use RSS for sharing links from their own sites, it's a natural fit for sharing lists of external links
  • Calendars - listings of past or upcoming events, deadlines or holidays
  • Mailing lists - to complement a Web-based archive of public or private e-mail lists
  • Search results - to let people track changing or new results to their searches
  • Databases - job listings, software releases, etc.

While it's a good start to have a 'master feed' for your site that lists recent news and events, don't stop there. Generally, each area of your site that features a changing list of information should have a corresponding feed; this allows viewers to precisely target their interests.

For example, if your news site has pages for World news, national news, local news, business, sports, etc., there should be a feed for each of these sections.

If your site offers a personalized view of data (e.g., people can choose categories of information that will show up on their home page), offer this as a feed, so that the viewers' Web pages match the content of their feeds.

A great example of this is Apple's iTunes Music Store RSS feed generator; you can customize it based on your preferences, and the views it allows match those provided in the Music Store itself.

Finally, remember that feeds are just as - if not more - useful on an Intranet as they are on the Internet. RSS can be a powerful tool for sharing and integrating information inside a company.

Publishing RSS

There are a number of ways to generate a feed from your content. First of all, explore your content management system - it might already have an option to generate an RSS feed.

If that option isn't available, you have a number of choices:

  • Self-scraping - The easiest way to publish a feed from existing content. Scraping tools fetch your Web page and pull out the relevant parts for the feed, so that you don't have to change your publishing system. Some use regular expressions or XPath expressions, while others require you to mark up your page with minimal hints (usually using <div> or <span> tags) that help it decide what should be put into the feed.
  • Feed integration - If your site is dynamically generated (using languages like Perl, Python or PHP), it may have an RSS library available, so that you can integrate the feed into your publishing process.
  • Starting with the feed - Alternatively, you can manage the list-oriented parts of your content in the RSS feed itself, and generate your Web pages (as well as other content, like e-mail lists) from the feed. This has the advantage of always having the correct information in the feed, and tools like XSLT make this option easy, especially if you're starting from scratch.
  • Third party scraping - If none of these options work for you, some people on the Web will scrape your site for you and make the feed available. Be warned, however, that this is never as reliable or accurate as doing it yourself, because they don't know the details of your content or your system. Also, using third parties introduces another point of failure in the delivery process; problems there (network, server or business) will cause your feed to be unavailable.
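
As an illustration of the self-scraping option with markup hints, the sketch below uses Python's standard html.parser module to pull item titles and links out of a page. The class name HintScraper and the hint class="rss-item" are invented for this example; real scraping tools define their own hint conventions.

```python
from html.parser import HTMLParser


class HintScraper(HTMLParser):
    """Collect (title, link) pairs from <a class="rss-item"> elements.

    The class name "rss-item" is a made-up hint for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.items = []
        self._href = None  # href of the hinted link being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "rss-item" in classes:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.items.append(("".join(self._text).strip(), self._href))
            self._href = None


page = """
<ul>
  <li><a class="rss-item" href="/2002/09/02">News for September the Second</a></li>
  <li><a href="/about">About us</a></li>
  <li><a class="rss-item" href="/2002/09/01">News for September the First</a></li>
</ul>
"""
scraper = HintScraper()
scraper.feed(page)
print(scraper.items)
```

Only the hinted links are collected; the navigation link to "About us" is ignored, which is the point of marking up the page.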

For more information about all of these options, see "RSS Tools and Validators" and "More Information about RSS".

Telling People About Your Feed

An important step after publishing a feed is letting your viewers know that it exists; there are a lot of feeds available on the Web now, but it's hard to find them, making it difficult for viewers to utilize them.

Pages that have an associated RSS feed should clearly indicate this to viewers by using a link containing text like 'RSS feed'. For example,

<a type="application/rss+xml" href="feed.rss">RSS feed for this page</a>

where 'feed.rss' is the URL for the feed. The 'type' attribute tells browsers that this is a link to an RSS feed in a way that they understand.

Additionally, some programs look for a link in the <head> section of your HTML. To support this, include a <link> tag:

<head>
  <title>My Page</title>

  <link rel="alternate" type="application/rss+xml" 
   href="feed.rss" title="RSS feed for My Page">
</head>

These links should be placed on the Web page that is most similar to the feed content; this enables people to find them as they browse.
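
To show what such programs do with the <link> tag, here is a small Python sketch that scans a page for RSS links using the standard html.parser and urllib.parse modules. The class name FeedLinkFinder and the page URL are assumptions for this example; actual clients may apply looser or stricter matching.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FeedLinkFinder(HTMLParser):
    """Collect the URL of every RSS <link rel="alternate"> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if (tag == "link" and "alternate" in rel
                and attrs.get("type") == "application/rss+xml"
                and attrs.get("href")):
            # Resolve relative hrefs such as "feed.rss" against the page URL.
            self.feeds.append(urljoin(self.base_url, attrs["href"]))


page = """<head>
  <title>My Page</title>
  <link rel="alternate" type="application/rss+xml"
   href="feed.rss" title="RSS feed for My Page">
</head>"""

finder = FeedLinkFinder("http://example.com/news/")
finder.feed(page)
print(finder.feeds)
```

The relative href is resolved against the URL of the page that carries it, which is why the link belongs on the page closest to the feed's content.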

Finally, there are a number of guides and registries for RSS feeds that people can search and browse through, much like the Yahoo directory for Web sites; it's a good idea to register your feed. See "More Information about RSS" for more information.

RSS Versions and Modules

There are two main versions of the RSS format in use today: RSS 0.9x and RSS 1.0. Although the numbers might lead you to believe that 1.0 replaces 0.9x, both are being actively used and developed. Each version has its benefits and drawbacks; RSS 0.9x is known for its simplicity, while RSS 1.0 is more extensible and fully specified. Both formats are XML-based and have the same basic structure.

People tend to get into heated discussions about the better format. Ultimately, it's a choice you shouldn't worry too much over; good RSS tools and aggregators will understand both formats. This section presents a quick overview of each; for more information, see their specifications and supporting materials.

RSS 0.9x

RSS 0.9x (the 'x' is for the last digit; as of writing, RSS 0.94 is in development) was designed by Netscape Communications and UserLand Software, and is championed by UserLand's Dave Winer. In this version, RSS stands for "Really Simple Syndication," and simplicity is its focus.

This branch of RSS is based on RSS 0.91, which was first documented at Netscape and later refined by UserLand.

Included in 0.92 - the latest stable version - are channel metadata like link, title and description; image, which allows you to specify a thumbnail image to display with the feed; webMaster and managingEditor, to identify who's responsible for the feed; and lastBuildDate, which shows when the feed was last updated. Items have the standard link, title and description metadata, as well as other, more experimental facilities like enclosure, which allows attachments to be automatically downloaded (don't expect these features to be supported by all aggregators, however).

RSS 0.9x takes a versioned approach to extensibility; new features are added by declaring a new version of RSS in the 0.9 series. Winer controls the release of new versions, so if you have suggestions about the future of RSS 0.9x, it's best to talk to him.

Here's an example of a minimal RSS 0.9x feed:

<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example Channel</title>
    <link>http://example.com/</link>
    <description>My example channel</description>
    <item>
       <title>News for September the Second</title>
       <link>http://example.com/2002/09/02</link>
       <description>other things happened today</description>
    </item>
    <item>
       <title>News for September the First</title>
       <link>http://example.com/2002/09/01</link>
    </item>
  </channel>
</rss>
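
For a sense of how little a client needs to do with this format, the following Python sketch reads a 0.9x-style feed with the standard library's xml.etree.ElementTree. The function name read_channel and the embedded sample feed are assumptions for this example; a real client would also fetch the file over HTTP and cope with malformed input.

```python
import xml.etree.ElementTree as ET

FEED = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example Channel</title>
    <link>http://example.com/</link>
    <description>My example channel</description>
    <item>
      <title>News for September the Second</title>
      <link>http://example.com/2002/09/02</link>
      <description>other things happened today</description>
    </item>
    <item>
      <title>News for September the First</title>
      <link>http://example.com/2002/09/01</link>
    </item>
  </channel>
</rss>"""


def read_channel(xml_text):
    """Return the channel title and a list of item dictionaries."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    items = [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # description is optional, so fall back to an empty string
            "description": item.findtext("description", default=""),
        }
        for item in channel.findall("item")
    ]
    return channel.findtext("title"), items


channel_title, channel_items = read_channel(FEED)
print(channel_title)
for entry in channel_items:
    print(entry["title"], entry["link"])
```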

RSS 1.0

RSS 1.0 stands for "RDF Site Summary." This flavor of RSS incorporates RDF, a Web standard for metadata. Because RSS 1.0 uses RDF, any RDF processor can understand RSS without knowing anything about it in particular.

RSS 1.0 also uses XML Namespaces to allow extensions - called RSS Modules - to be added without worrying about conflicts. This is because RSS 1.0 doesn't use a central person for extending the format; instead, namespaces are used to describe a space for your own extensions. For example, if you had an ISBN module to track books, it might look like this:

<item xmlns:book="http://namespace.example.com/book/1.0"
 rdf:about="http://www.amazon.com/exec/obidos/tg/detail/-/0553575376">
  <title>Excession</title>
  <link>http://www.amazon.com/exec/obidos/tg/detail/-/0553575376</link>
  <book:isbn>0553575376</book:isbn>
</item>

Generally, though, you should look for available RSS Modules, rather than defining your own, unless you're sure that what you need doesn't exist.
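
The sketch below suggests how a consumer might read a namespaced extension element alongside the core RSS elements, using Python's xml.etree.ElementTree. The book namespace is the made-up example from the text, and the enclosing <rdf:RDF> wrapper and the prefix map NS are added here only so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# Prefixes used for lookups; they need not match those in the document.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "book": "http://namespace.example.com/book/1.0",  # hypothetical module
}

DOC = """<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://purl.org/rss/1.0/"
 xmlns:book="http://namespace.example.com/book/1.0">
  <item rdf:about="http://www.amazon.com/exec/obidos/tg/detail/-/0553575376">
    <title>Excession</title>
    <link>http://www.amazon.com/exec/obidos/tg/detail/-/0553575376</link>
    <book:isbn>0553575376</book:isbn>
  </item>
</rdf:RDF>"""

root = ET.fromstring(DOC)
item = root.find("rss:item", NS)
title = item.findtext("rss:title", namespaces=NS)
# Elements from the module are found through their own namespace.
isbn = item.findtext("book:isbn", namespaces=NS)
print(title, isbn)
```

A client that knows nothing about the book module simply never asks for book:isbn and still reads the title and link, which is what lets modules be added without conflicts.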

RSS 1.0 feeds look very similar to RSS 0.9x feeds, with a few key differences:

  • The entire feed is wrapped in <rdf:RDF> ... </rdf:RDF> elements (so that processors know that it's RDF)
  • Each <item> has an rdf:about attribute that usually, but not always, matches the <link>; this assigns an identifier to each item
  • There's an <items> element in the channel metadata that contains a list of items in the channel, so that RDF processors can keep track of the relationship between the items
  • Some metadata uses the rdf:resource attribute to carry links, instead of putting it inside the element.

RSS 1.0 is developed and maintained by an ad hoc group of interested people; see their Web site for more information about RSS 1.0 and RSS Modules. See below for an example of an RSS 1.0 feed.

Dublin Core Module

The best-known example of an RSS Module is the Dublin Core Module. The Dublin Core is a set of metadata elements, developed by librarians and information scientists, that standardizes common properties useful for describing documents, among other things. The Dublin Core Module uses these elements to attach information to both feeds (in the channel metadata) and to individual items.

This module includes useful elements like dc:date, for associating dates with items, dc:subject, which can be useful for categorizing items or feeds, and dc:rights, for dictating the intellectual property rights associated with an item or a feed.

Here's an example of a minimal RSS 1.0 feed that uses the Dublin Core Module:

<?xml version="1.0"?>

<rdf:RDF 
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://purl.org/rss/1.0/"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
>
  <channel rdf:about="http://example.com/news.rss">
    <title>Example Channel</title>
    <link>http://example.com/</link>
    <description>My example channel</description>

    <items>
      <rdf:Seq>
        <rdf:li resource="http://example.com/2002/09/01/"/>
        <rdf:li resource="http://example.com/2002/09/02/"/>
      </rdf:Seq>
    </items>

  </channel>
  <item rdf:about="http://example.com/2002/09/01/">
     <title>News for September the First</title>
     <link>http://example.com/2002/09/01/</link>
     <description>other things happened today</description>

     <dc:date>2002-09-01</dc:date>
  </item>
  <item rdf:about="http://example.com/2002/09/02/">
     <title>News for September the Second</title>
     <link>http://example.com/2002/09/02/</link>

     <dc:date>2002-09-02</dc:date>
  </item>
</rdf:RDF>

As you can see, RSS 1.0 is a bit more verbose than 0.9x, mostly because it needs to be compatible with other versions of RSS while containing the markup that RDF processors need.
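
As a hedged illustration of how the extra markup can be used, the Python sketch below reads an RSS 1.0 feed in the order given by the channel's <items> sequence and picks up each item's dc:date. The function name read_rss10 and the shortened sample feed are assumptions for this example; a full RDF processor would do considerably more.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {
    "rdf": RDF,
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

FEED_10 = """<?xml version="1.0"?>
<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://purl.org/rss/1.0/"
 xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://example.com/news.rss">
    <title>Example Channel</title>
    <link>http://example.com/</link>
    <description>My example channel</description>
    <items>
      <rdf:Seq>
        <rdf:li resource="http://example.com/2002/09/01/"/>
        <rdf:li resource="http://example.com/2002/09/02/"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.com/2002/09/01/">
    <title>News for September the First</title>
    <link>http://example.com/2002/09/01/</link>
    <dc:date>2002-09-01</dc:date>
  </item>
  <item rdf:about="http://example.com/2002/09/02/">
    <title>News for September the Second</title>
    <link>http://example.com/2002/09/02/</link>
    <dc:date>2002-09-02</dc:date>
  </item>
</rdf:RDF>"""


def read_rss10(xml_text):
    """Return (title, dc:date) pairs in the order listed in <items>."""
    root = ET.fromstring(xml_text)
    # Index every <item> by its rdf:about identifier.
    by_id = {
        item.get("{%s}about" % RDF): item
        for item in root.findall("rss:item", NS)
    }
    ordered = []
    for li in root.findall("rss:channel/rss:items/rdf:Seq/rdf:li", NS):
        # Accept both the unprefixed and the rdf:-prefixed attribute.
        ref = li.get("resource") or li.get("{%s}resource" % RDF)
        item = by_id.get(ref)
        if item is not None:
            ordered.append((item.findtext("rss:title", namespaces=NS),
                            item.findtext("dc:date", namespaces=NS)))
    return ordered


dated_items = read_rss10(FEED_10)
print(dated_items)
```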

Tips for Generating Good RSS Feeds

RSS is easy to work with, but like any new format, you may encounter some problems in using it. This section attempts to address the most common issues that arise when generating a feed.

  • Meaningful Links - Give every item in your feed a distinct URL in the <link> tag, so that software can tell the difference between items, and recognize items that it's already seen. If two items really point at the same page, you can use different fragment identifiers; e.g., http://www.example.com/#x2002-09-01 and http://www.example.com/#x2002-09-02.
  • Meaningful Metadata - Try to make the metadata useful on its own; for example, if you only include a short <title>, people may not know what the link is about. By the same token, if you shove an entire article into <description>, it'll crowd people's view of the feed, and they're less likely to stay interested in what you have to say. Generally, you want to put enough into the feed to help someone decide whether they should follow the link.
  • Encoding HTML - Although it's tempting, refrain from including HTML markup (like <a href="...">, <b> or <p>) in your RSS feed; because you don't know how it will be presented, doing so can prevent your feed from being displayed correctly. If you need to include a tag in the text of the feed (e.g., the title of an item is "Ode to <title>"), make sure you escape ampersands and angle brackets (so that it would be "Ode to &lt;title&gt;").
  • XML Entities - Remember that XML predefines only five entities (&lt;, &gt;, &amp;, &apos; and &quot;), unlike HTML; therefore, you won't have &nbsp;, &copy; and other common entities available. You can define them in the XML, or alternatively use numeric character references or a character encoding that makes what you need available.
  • Character Encoding - Some software generates feeds using Windows character sets, and sometimes mislabels them. The safest thing to do is to encode your feed as UTF-8 and check it by parsing it with an XML parser.
  • Version Compatibility - RSS 1.0 generators need to take special steps to ensure compatibility with 0.9x parsers; most importantly, use the default namespace for RSS. See the 1.0 spec for more information.
  • Communicating with Viewers - Don't use items in your feed to communicate to your users; for example, some feeds have been known to use the <description> to dictate copyright terms. Use the appropriate element or module.
  • Communicating with Machines - Likewise, use the appropriate HTTP status codes if your feed has relocated (usually, 301 Moved Permanently) or is no longer available (410 Gone or 404 Not Found).
  • Making your Feed Cache-Friendly - Successful RSS feeds see a fair amount of traffic because clients poll them often to see if they've changed. To support the load, Web Caching can help; see the caching tutorial.
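
Several of these tips can be checked mechanically. The sketch below, which assumes a feed assembled from strings in Python, escapes item text with the standard library's xml.sax.saxutils.escape and then confirms that the UTF-8 result parses as XML. The helper names make_item and is_well_formed are invented for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def make_item(title, link):
    """Build an <item>, escaping &, < and > in the supplied text."""
    return "<item><title>{}</title><link>{}</link></item>".format(
        escape(title), escape(link))


def is_well_formed(xml_bytes):
    """Return True if an XML parser accepts the feed."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


item_xml = make_item("Ode to <title>", "http://www.example.com/#x2002-09-01")
feed_bytes = ('<?xml version="1.0" encoding="UTF-8"?>'
              '<rss version="0.91"><channel>' + item_xml +
              '</channel></rss>').encode("utf-8")

print(item_xml)
print(is_well_formed(feed_bytes))
# An HTML-only entity such as &nbsp; is rejected by the XML parser.
print(is_well_formed(b"<rss><channel>&nbsp;</channel></rss>"))
```

Running a parser over the finished feed is a cheap way to catch unescaped markup, undefined entities and mislabelled character encodings before subscribers do.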

RSS Tools and Validators

This is an incomplete list of tools for creating RSS feeds, and checking them to make sure that you've done so correctly. Note that there are many more libraries that help parsing RSS; these haven't been included here because this tutorial focuses on the Webmaster, not consumers of RSS.

  • xpath2rss - A tool for scraping Web sites using XPath expressions (a method of selecting parts of HTML and XML documents).
  • RSS.py - A Python library for generating and parsing RSS.
  • XML::RSS - A Perl module for generating and parsing RSS.
  • Orchard RSS - Work with feeds as a collection of nodes; support for Python, Perl and C.
  • Site Summaries in XHTML - An online service (also available as an XSLT stylesheet) that uses hints in your HTML to generate a feed.
  • myRSS - An online, third-party automated scraping service. Doesn't require any special markup.
  • Online RSS 0.9x Validator - Check your 0.9x feeds; from UserLand.
  • Online RSS 1.0 Validator - Check your 1.0 RSS feeds; includes module support. From Leigh Dodds.
  • Online RSS 1.0 Validator - Another 1.0 validator, from Dave Beckett.

Aggregators and Other RSS Clients

This is an incomplete listing of aggregators and other consumers of RSS content. For more, see "More Information about RSS."

  • Headline Viewer - The original desktop aggregator. For most versions of Windows.
  • SharpReader - Windows-based desktop aggregator; many features.
  • NetNewsWire - A newer, standalone desktop aggregator for MacOS X.
  • Radio UserLand - Hybrid desktop/Web aggregator and weblogging tool. Windows and Macintosh (7.5.5+ and OSX).
  • Meerkat - A Web-based aggregator by O'Reilly.
  • News is Free - Another Web-based aggregator which also does some third-party scraping.
  • Apache JetSpeed - An Enterprise-class Java Portal that supports RSS.
  • Daypop - A search engine for RSS-based news.

More Information about RSS

  • Syndicated content - Good list of best practices for creating an RSS feed.
  • Syndic8 - A community effort to gather, validate and search feeds with lots of other information.
  • RSS Workshop - A well-regarded introduction to publishing RSS feeds, from the state of Utah Online Services division.
  • Content Syndication with XML and RSS - RSS information and a forthcoming book by Ben Hammersley.
  • RSSInfo - Lists aggregators, toolsets and RSS-related news.
  • RSS Devcenter - O'Reilly's Web portal for all things RSS.

About this Document

This document is Copyright © 2002-2005 Mark Nottingham <mnot@pobox.com>. This work is licensed under a Creative Commons License.

If you do mirror this document, please send e-mail to the address above, so that you can be informed of updates.

All trademarks within are property of their respective holders.

Although the author believes the contents to be accurate at the time of publication, no liability is assumed for them, their application or any consequences thereof. If any misrepresentations, errors or other need for clarification is found, please contact the author.

The latest revision of this document can always be obtained from http://www.mnot.net/rss/tutorial/

Version 0.84 -- March 25, 2005



The Feed Validator is a great way to check whether your RSS feed is made correctly.

