
How XML Enables Web Syndication

Jamene Brooks-Kieffer
Resource Linking Librarian
K-State Libraries

This informal paper documents my process of learning about Web syndication, better known as RSS feeds. The original version of this paper was created for Dr. Gary Burnett's course on Web Development & Administration at Florida State University's College of Information in October 2005. In January 2007, I revised it for K-State Libraries.

Web syndication has become an incredibly popular means of distributing information on the Web. Competing syndication protocols - including RSS 0.91, RSS 1.0, RSS 2.0, and Atom - deliver newly updated information to users who have subscribed to it. That information could be the daily headlines from the New York Times, updates to a personal blog, new product notifications from a favorite retailer, or a regular and provocative podcast.

To users, an update from any of these very different sources appears as a headline in their blog reader or news aggregator - perhaps Bloglines or NewsGator. How do the developers at these diverse sources make their data all look the same to the user? The transmitted data must be structured in some common way. Under the hood of all the syndication protocols, collectively referred to here as RSS, XML quietly organizes and labels the information that makes up the syndication feed. In this paper, I look into how XML enables RSS.

True to its nature, XML does not do anything - it is not acting on the data input for syndication. Rather, XML acts more like file drawers - storing, labeling, and separating pieces of information that the RSS feed is going to syndicate. First, the XML must announce its presence at the beginning of the document:

<?xml version="1.0" encoding="iso-8859-1"?>

XML 1.0 does not strictly require it, but a document that is meant to be valid, and not merely well-formed, must also declare a formal Document Type Definition (DTD). Since the XML is being used in an application, the declaration should refer to the DTD for that application. This document type declaration from an XML.com news feed demonstrates:

<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

Many RSS documents that I examined for this paper skip the DTD. They go directly to identifying the version of RSS used for the document. This sparse definition from the Benton Foundation news feed demonstrates:

<rss version="2.0" xml:base="http://benton.org">

RSS 2.0 has no official DTD, so a feed like this cannot be validated against one. To check a feed's correctness, refer instead to the RSS 2.0 standard, located at http://blogs.law.harvard.edu/tech/rss.
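To see how a program might act on this version announcement, here is a minimal Python sketch using the standard library's xml.etree.ElementTree module. The feed text and the function name rss_version are my own illustrations, not code from any real aggregator.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal feed; only the root element matters for this check.
FEED = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0">
  <channel>
    <title>Some Important Blog</title>
    <link>http://www.someblog.com</link>
    <description>Daily diary entries, reviews, and rants</description>
  </channel>
</rss>"""


def rss_version(feed_bytes):
    """Return the version attribute of the root <rss> element, or None."""
    root = ET.fromstring(feed_bytes)
    if root.tag != "rss":
        return None  # not an RSS 0.9x/2.0 style document
    return root.get("version")


print(rss_version(FEED))
```

A reader could use a check like this to decide which set of elements to expect before it looks any further into the document.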

Once the XML and RSS versions are announced, XML’s work of data structuring can begin. The body of the RSS document is called a channel. The <channel> element is the parent of all the other elements that describe the feed information - think of a television or radio station with one location and identity, but many different programs. Three child elements are required inside <channel>: <title>, <link>, and <description>. These are the three critical things a user needs to make sense of a feed. Other descriptive elements are optional. Here is a sample of these three child elements organizing a feed:

<?xml version="1.0" encoding="iso-8859-1"?>
<!-- RSS 2.0 specification: http://blogs.law.harvard.edu/tech/rss#roadmap -->
<rss version="2.0">
  <channel>
    <title>Some Important Blog</title>
    <link>http://www.someblog.com</link>
    <description>Daily diary entries, reviews, and rants</description>
  </channel>
</rss>
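As a rough illustration of why these three elements matter to software as well as to people, the following Python sketch reads them out of a channel and complains if any is missing. The names REQUIRED and channel_summary are hypothetical, and the check is only my reading of the "required" rule, not an official validator.

```python
import xml.etree.ElementTree as ET

# The three child elements a channel must carry.
REQUIRED = ("title", "link", "description")

# Hypothetical feed text modeled on the sample blog.
FEED = """<rss version="2.0">
  <channel>
    <title>Some Important Blog</title>
    <link>http://www.someblog.com</link>
    <description>Daily diary entries, reviews, and rants</description>
  </channel>
</rss>"""


def channel_summary(feed_text):
    """Return the required channel fields as a dict, or raise ValueError."""
    channel = ET.fromstring(feed_text).find("channel")
    if channel is None:
        raise ValueError("no <channel> element found")
    summary = {name: channel.findtext(name) for name in REQUIRED}
    missing = [name for name, value in summary.items() if value is None]
    if missing:
        raise ValueError("channel is missing: " + ", ".join(missing))
    return summary


print(channel_summary(FEED))
```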

These lines of code create an XML description of this blog’s existence, but there are no updates yet, and updates are the point of syndication. To add updates, we need the <item> element. This element is also a child of <channel>, and is the parent of its own sub-children - elements that provide more details about the update. While the initial <title>, <link>, and <description> will probably only appear once for a given feed, <item> and its sub-children will appear repeatedly, once for every update that is posted to the site and channeled through the feed. Note that <title>, <link>, and <description> are the likely sub-children of <item>, although only one of <title> or <description> is required. Other descriptive elements are optional. Here is the sample RSS feed with a posted update:

<?xml version="1.0" encoding="iso-8859-1"?>
<!-- RSS 2.0 specification: http://blogs.law.harvard.edu/tech/rss#roadmap -->
<rss version="2.0">
  <channel>
    <title>Some Important Blog</title>
    <link>http://www.someblog.com</link>
    <description>Daily diary entries, reviews, and rants</description>
    <item>
      <title>Review: Wallace &amp; Gromit movie</title>
      <link>http://www.someblog.com/reviews/wallace.htm</link>
      <description>Nick Park shines, Dreamworks needs BBC lessons</description>
    </item>
  </channel>
</rss>

These <item></item> elements will continue to appear with every update to the syndicated portion of the blog or other source.
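Here is a small Python sketch of how a reader might walk through the repeated <item> elements and pull out a headline for each. The second item and the function name headlines are invented for illustration. Note that an ampersand inside element text has to be written as &amp; in the XML; the parser turns it back into a plain & for display.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed text. The second <item> is invented for illustration.
FEED = """<rss version="2.0">
  <channel>
    <title>Some Important Blog</title>
    <link>http://www.someblog.com</link>
    <description>Daily diary entries, reviews, and rants</description>
    <item>
      <title>Review: Wallace &amp; Gromit movie</title>
      <link>http://www.someblog.com/reviews/wallace.htm</link>
      <description>Nick Park shines, Dreamworks needs BBC lessons</description>
    </item>
    <item>
      <title>Diary: a quiet Tuesday</title>
      <link>http://www.someblog.com/diary/tuesday.htm</link>
    </item>
  </channel>
</rss>"""


def headlines(feed_text):
    """Return one (title, link) pair per <item>, in document order."""
    channel = ET.fromstring(feed_text).find("channel")
    if channel is None:
        return []
    return [(item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")]


for title, link in headlines(FEED):
    print(title, "->", link)
```

If the ampersand were left unescaped, ElementTree would reject the whole feed with a ParseError, which is one reason feed authors have to be careful with titles.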

With this simple set of required elements, XML succeeds in structuring all kinds of data for web syndication. The developer chooses how much or little a user sees in each headline. The simple sample feed shows a title, link, and brief description, but extensive headlines are possible that include author, category, publication date, and other elements.
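To make the idea of a more extensive headline concrete, this Python sketch builds an <item> with the optional author, category, and publication date elements. The function name build_item, the author address, the category, and the date are all hypothetical values I made up for the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_item(title, link, description,
               author=None, category=None, published=None):
    """Build an <item> element; optional fields are added only when given."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "description").text = description
    if author is not None:
        ET.SubElement(item, "author").text = author
    if category is not None:
        ET.SubElement(item, "category").text = category
    if published is not None:
        # RSS 2.0 expects RFC 822 style dates; format_datetime emits the
        # closely related RFC 2822 form with a four-digit year.
        ET.SubElement(item, "pubDate").text = format_datetime(published)
    return item


# All optional values below are invented for illustration.
item = build_item(
    "Review: Wallace & Gromit movie",
    "http://www.someblog.com/reviews/wallace.htm",
    "Nick Park shines, Dreamworks needs BBC lessons",
    author="reviewer@someblog.com (Hypothetical Reviewer)",
    category="Reviews",
    published=datetime(2005, 10, 8, 12, 0, tzinfo=timezone.utc),
)
print(ET.tostring(item, encoding="unicode"))
```

Building the element through a library, instead of pasting strings together, means the ampersand in the title is escaped automatically when the XML is written out.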

By looking “under the hood” of RSS, we can see that the protocols work well because they use ubiquitous ideas about how data can be organized. These ideas, such as “title” and “description,” are easily given labels and containers with XML elements.
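Because every feed uses the same labels, one short routine can read headlines from very different sources. The Python sketch below assumes two made-up feeds, a news site and a blog, and a hypothetical function merged_headlines; it is only meant to show the shared structure at work, not how Bloglines or any other aggregator is actually written.

```python
import xml.etree.ElementTree as ET

# Two hypothetical feeds from very different kinds of sources.
NEWS_FEED = """<rss version="2.0"><channel>
  <title>Example Daily News</title>
  <link>http://news.example.com</link>
  <description>Daily headlines</description>
  <item>
    <title>City council approves new library branch</title>
    <link>http://news.example.com/library.htm</link>
  </item>
</channel></rss>"""

BLOG_FEED = """<rss version="2.0"><channel>
  <title>Some Important Blog</title>
  <link>http://www.someblog.com</link>
  <description>Daily diary entries, reviews, and rants</description>
  <item>
    <title>Review: Wallace &amp; Gromit movie</title>
    <link>http://www.someblog.com/reviews/wallace.htm</link>
  </item>
</channel></rss>"""


def merged_headlines(feeds):
    """Return (source title, item title) pairs across all feeds, in order."""
    rows = []
    for feed_text in feeds:
        channel = ET.fromstring(feed_text).find("channel")
        if channel is None:
            continue  # skip anything that is not shaped like an RSS channel
        source = channel.findtext("title")
        for item in channel.findall("item"):
            rows.append((source, item.findtext("title")))
    return rows


for source, headline in merged_headlines([NEWS_FEED, BLOG_FEED]):
    print(source + ": " + headline)
```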

RSS is flexible enough to meet the needs of both small and large information-oriented Web sites, but it is not appropriate for all sites. Syndication pushes “headlines” at a user, who can opt to click to the site for more information. Information-oriented Web sites that remain relatively static will not need RSS to keep their users updated. Sites that update frequently, however, are good candidates for success with RSS. Syndicating these sites’ updates will enable users to:

The Annotated XML Specification is an excellent resource for understanding the interior workings of XML. Created by Tim Bray, the annotated specification displays the unedited text of the XML 1.0 spec alongside comments and explanations in plain English. Because XML and RSS are so tightly intertwined, this source is a great place to begin learning.

There is a great deal of “buzz” about what syndication can do for libraries. The more we understand about how this technology works, the better able we will be to use these tools appropriately.

Sources Consulted:

Pilgrim, M. (2002, December 18). What is RSS? [Dive into XML column]. O’Reilly Media’s XML.com. Retrieved October 9, 2005, from http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html


Copyright © 2005-2007 Jamene Brooks-Kieffer
