Thursday, August 3, 2006

Namespaced Extensions in Feeds



Feeds can be used for more than just text; they can embed pictures, podcasts and video. There are even more esoteric bits of data that can be attached to feeds, like the geographic location that a post is about, the number of comments it has received and the (legal) license its contents are available under. To make all of this information easily parseable by computers, it is usually added as extra elements and attributes in XML namespaces. For example, the Media RSS namespace is used to add more information about videos and pictures, like dimensions, duration and a thumbnail.
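As an illustration of how a feed reader might pick up such an extension, here is a minimal sketch using Python's standard ElementTree parser on a small, made-up RSS 2.0 item carrying Media RSS elements. The sample feed, the example.com URLs and the `media_info` helper are hypothetical; the point is that extension elements are matched by their namespace URI (`http://search.yahoo.com/mrss/` for Media RSS), not by whatever prefix the feed happens to use.

```python
import xml.etree.ElementTree as ET

# Media RSS namespace URI. The "media:" prefix in a feed is arbitrary;
# ElementTree matches on the URI, not the prefix.
MRSS = "http://search.yahoo.com/mrss/"

# A hypothetical RSS 2.0 feed with Media RSS extension elements.
FEED = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Example videos</title>
    <item>
      <title>Sample clip</title>
      <media:content url="http://example.com/clip.flv" type="video/x-flv"
                     width="320" height="240" duration="95"/>
      <media:thumbnail url="http://example.com/clip.jpg" width="120" height="90"/>
    </item>
  </channel>
</rss>"""

def media_info(feed_xml):
    """Hypothetical helper: describe the media attached to each item."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        # Namespaced names are written "{namespace-uri}localname" in ElementTree.
        content = item.find(f"{{{MRSS}}}content")
        thumb = item.find(f"{{{MRSS}}}thumbnail")
        if content is None:
            continue  # plain item without Media RSS data
        results.append({
            "title": item.findtext("title"),
            "url": content.get("url"),
            "width": int(content.get("width", 0)),
            "height": int(content.get("height", 0)),
            "duration": int(content.get("duration", 0)),  # seconds
            "thumbnail": thumb.get("url") if thumb is not None else None,
        })
    return results

for info in media_info(FEED):
    print(info)
```

A reader that does not understand Media RSS can simply ignore these elements, which is what makes namespaced extensions safe to add to existing feeds.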

This usually isn't of direct interest to end users; it's just a matter of which namespaced extensions a feed reader supports, and the more the merrier. However, since there are quite a few of them out there, developers must make trade-offs and decide which to support first. One easy way to prioritize extension support is to see which ones are used most often.

I wrote a small MapReduce program to go over our BigTable and get the top 50 namespaces based on the number of feeds that use them. We only looked at feeds that have at least one subscriber, i.e. the "feeds that matter." Note that the default namespaces for syndication feed formats (e.g. the one for Atom 1.0) are excluded, since I was interested only in extensions to the elements that are already expected to be in a feed.
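The shape of such a count can be sketched in a few lines of Python. This is not the program described above: an in-memory dict of feed URL to feed XML stands in for BigTable, and the map, shuffle and reduce steps run sequentially in one process. The map step emits each extension namespace once per feed (so a feed using Dublin Core fifty times still counts once), and the reduce step sums those counts. The set of excluded "default" namespaces is an assumption; the post only names Atom 1.0 explicitly.

```python
import heapq
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed exclusion list; the post only names Atom 1.0 explicitly.
DEFAULT_NAMESPACES = {
    "http://www.w3.org/2005/Atom",                  # Atom 1.0
    "http://purl.org/rss/1.0/",                     # RSS 1.0
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",  # RDF, required by RSS 1.0
    "http://www.w3.org/XML/1998/namespace",         # reserved xml: prefix (xml:lang)
}

def namespace_of(name):
    """Return the URI from an ElementTree '{uri}local' name, or None."""
    if name.startswith("{"):
        return name[1:name.index("}")]
    return None

def map_feed(feed_url, feed_xml):
    """Map: emit (namespace, 1) once per extension namespace used in a feed."""
    root = ET.fromstring(feed_xml)
    used = set()
    for elem in root.iter():
        for name in [elem.tag, *elem.attrib]:  # element name plus attribute names
            ns = namespace_of(name)
            if ns and ns not in DEFAULT_NAMESPACES:
                used.add(ns)
    for ns in used:
        yield ns, 1

def reduce_counts(ns, counts):
    """Reduce: number of feeds that use this namespace."""
    return ns, sum(counts)

def top_namespaces(feeds, n=50):
    """Run map, shuffle and reduce in memory; return (percent, ns) pairs."""
    grouped = defaultdict(list)  # shuffle: group emitted values by key
    for url, xml in feeds.items():
        for ns, one in map_feed(url, xml):
            grouped[ns].append(one)
    totals = [reduce_counts(ns, counts) for ns, counts in grouped.items()]
    best = heapq.nlargest(n, totals, key=lambda kv: kv[1])
    return [(100.0 * count / len(feeds), ns) for ns, count in best]

# Two hypothetical feeds: RSS 2.0 with Dublin Core, Atom 1.0 with Atom Threading.
FEEDS = {
    "http://example.com/a.xml": """<rss version="2.0"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
        <channel><item><dc:creator>Ann</dc:creator></item></channel></rss>""",
    "http://example.com/b.xml": """<feed xmlns="http://www.w3.org/2005/Atom"
        xmlns:thr="http://purl.org/syndication/thread/1.0">
        <entry><thr:total>3</thr:total></entry></feed>""",
}

for pct, ns in top_namespaces(FEEDS):
    print(f"{pct:6.2f}%  {ns}")
```

Each of the two sample feeds uses one extension, so both namespaces come out at 50%. Counting distinct namespaces per feed, rather than raw element occurrences, is what makes the percentages in the table below comparable across feeds of very different sizes.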

We thought this information might be of interest to others, the way our analysis of XML errors and web authoring statistics have been. If I have missed anything, or if you have any feedback, a message in our discussion group or a link to this blog post is the best way to reach us.

% of Feeds   Namespace
29.36% Dublin Core
15.71% XHTML
11.92% Blogger Atom API Extensions
11.88% Blogger Draft Extension
11.16% RSS 1.0 Content Module
8.39% Well-Formed Web Comment API
5.35% RSS 1.0 Administrative Module
3.85% FeedBurner Extensions
3.74% MSN Spaces
3.66% Slash
3.59% RSS 1.0 Syndication Module
2.50% iTunes
2.49% LiveJournal RSS Module 1.0
2.33% Dublin Core Terms
2.27% Microsoft Simple List Extensions
2.00% Yahoo Media RSS
1.24% RSS 1.0 Taxonomy Module
1.06% TrackBack Module for RSS 1.0/2.0
1.04% creativeCommons RSS Module
0.92% OpenSearch
0.68% Basic Geo (WGS84 lat/long) Vocabulary
0.54% Atom Threading
0.42% Creative Commons (RDF)
0.39% Technorati API
0.36% Google Calendar
0.31% Google GData
0.28% Feed History
0.28% eBay urn:ebay:apis:eBLBaseComponents
0.27% Pheed
0.23% RSS 1.0 Annotation Module
0.21% PRISM
0.18% Bulkfeeds
0.16% Atom Indexing urn:atom-extension:indexing
0.15% AOL Journals
0.14% Jive Forums
0.13% Yahoo! Weather
0.11% RSSWriter Manifest
0.11% FOAF Vocabulary
0.10% Feedster
0.10% Google Picasa Web
0.09% RSS 1.0 Link Module
0.09% Buzznet
0.09% Digg
0.09% PubSub
0.09% Snaplog PhotoBlog RSS extension
0.08% XSL
0.07% Hatena XML Namespace
0.07% iTunes Music Store
0.07% Furl
0.06% Google Base
0.06% Web Wiz Forums