Friday, April 23, 2010

Syndication Formats Demystified (Kind Of)

Describing the origins of Atom, CNET characterized the motivation of its creators as follows: "Winer's opponents are seeking a new format that would clarify RSS ambiguities, consolidate its multiple versions, expand its capabilities, and fall under the auspices of a traditional standards organization." (Wiki)

This statement led me to develop the following equation:

$$C_1 = C_0 e^{n}$$

where:

  • $C_0$ is the initial complexity of the problem
  • $C_1$ is the final complexity of the problem
  • $n$ is the number of engineers simplifying the problem
  • $e$ is the exponential constant

This gives the nice exponential curve that seems to describe the situation: every engineer added to the simplification effort multiplies the complexity by e.
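Plugging in numbers, purely for illustration, a team of three engineers gives:

$$C_1 = C_0 e^{3} \approx 20.09\,C_0$$

So the problem ends up roughly twenty times as complex as when they started.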

To decode RSS files, we must first understand the formats and discover how they differ. There are eight RSS formats, which, together with Atom, fall into three distinct categories. The name of each format links back to its specification page for future reference.

The RDF (or RSS 1.*) branch includes the following versions:

  • RSS 0.90 was the original Netscape RSS version. This RSS was called RDF Site Summary, but was based on an early working draft of the RDF standard, and was not compatible with the final RDF Recommendation.
  • RSS 1.0 is an open format by the RSS-DEV Working Group, again standing for RDF Site Summary. RSS 1.0 is an RDF format like RSS 0.90, but not fully compatible with it, since 1.0 is based on the final RDF 1.0 Recommendation.
  • RSS 1.1 is also an open format, intended to update and replace RSS 1.0. The specification is an independent draft not supported or endorsed in any way by the RSS-DEV Working Group or any other organization.
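Because the RDF-branch versions look much alike at a glance, their XML namespaces are what tell them apart. Here is a minimal sketch. It assumes the namespace URIs published in each specification: RSS 0.90 uses http://my.netscape.com/rdf/simple/0.9/, RSS 1.0 uses http://purl.org/rss/1.0/, and RSS 1.1 uses http://purl.org/net/rss1.1# with a `Channel` root element. `DetectRdfVersion` is a hypothetical name and is not the detection code from my next post.

```vbnet
Imports System.Xml

' Sketch: distinguish RSS 0.90, 1.0 and 1.1 by namespace.
' DetectRdfVersion is a hypothetical name; URIs assumed from the specs.
Module RdfBranchDetector
    Private Const RdfNs As String = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    Private Const Rss090Ns As String = "http://my.netscape.com/rdf/simple/0.9/"
    Private Const Rss10Ns As String = "http://purl.org/rss/1.0/"
    Private Const Rss11Ns As String = "http://purl.org/net/rss1.1#"

    ' Returns "0.90", "1.0", "1.1", or Nothing if not RDF-branch RSS.
    Public Function DetectRdfVersion(ByVal doc As XmlDocument) As String
        Dim root As XmlElement = doc.DocumentElement
        If root Is Nothing Then Return Nothing

        ' RSS 1.1 drops the rdf:RDF wrapper; its root is <Channel>.
        If root.LocalName = "Channel" AndAlso root.NamespaceURI = Rss11Ns Then Return "1.1"

        ' RSS 0.90 and 1.0 both use an rdf:RDF root element...
        If root.LocalName <> "RDF" OrElse root.NamespaceURI <> RdfNs Then Return Nothing

        ' ...and differ in the namespace of the <channel> element inside it.
        For Each child As XmlNode In root.ChildNodes
            If child.NodeType = XmlNodeType.Element AndAlso child.LocalName = "channel" Then
                If child.NamespaceURI = Rss10Ns Then Return "1.0"
                If child.NamespaceURI = Rss090Ns Then Return "0.90"
            End If
        Next
        Return Nothing
    End Function
End Module
```

Usage, once a feed has been loaded into an `XmlDocument`: `Console.WriteLine(DetectRdfVersion(doc))`.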

The RSS 2.* branch (initially UserLand, now Harvard) includes the following versions:

  • RSS 0.91 is the simplified RSS version released by Netscape, and also the version number of the simplified version originally championed by Dave Winer from UserLand Software. The Netscape version was now called Rich Site Summary; this was no longer an RDF format, but was relatively easy to use.
  • RSS 0.92 through 0.94 are expansions of the RSS 0.91 format, which are mostly compatible with each other and with Winer's version of RSS 0.91, but are not compatible with RSS 0.90.
  • RSS 2.0.1 has the internal version number 2.0. RSS 2.0.1 was proclaimed to be "frozen", but was still updated shortly after release without changing the version number. RSS now stood for Really Simple Syndication. The major change in this version is an explicit extension mechanism using XML namespaces.
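The RSS 0.91 through 2.0 branch is easier to recognize. Its root element is an `<rss>` element with no namespace, and that element's `version` attribute carries the number. Here is a minimal sketch. `DetectRssVersion` is a hypothetical name. As noted above, an RSS 2.0.1 feed still reports "2.0".

```vbnet
Imports System.Xml

' Sketch: read the version attribute of an un-namespaced <rss> root.
' DetectRssVersion is a hypothetical name.
Module RssBranchDetector
    ' Returns e.g. "0.91", "0.92" or "2.0", or Nothing if this is not
    ' a feed from the RSS 0.91 through 2.0 branch.
    Public Function DetectRssVersion(ByVal doc As XmlDocument) As String
        Dim root As XmlElement = doc.DocumentElement
        If root Is Nothing OrElse root.LocalName <> "rss" OrElse root.NamespaceURI <> "" Then
            Return Nothing
        End If
        ' GetAttribute returns an empty string when the attribute is missing.
        Dim version As String = root.GetAttribute("version")
        If version = "" Then Return Nothing
        Return version
    End Function
End Module
```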

From Wikipedia: http://en.wikipedia.org/wiki/RSS#Variants

Atom 1.0 

From Wikipedia: http://en.wikipedia.org/wiki/Atom_(standard)#Initial_work
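An Atom 1.0 feed (RFC 4287) is identified by a `<feed>` root element in the http://www.w3.org/2005/Atom namespace. Here is a minimal sketch. `IsAtom10` is a hypothetical name.

```vbnet
Imports System.Xml

' Sketch: recognize an Atom 1.0 document by its root element.
' IsAtom10 is a hypothetical name.
Module AtomDetector
    Private Const Atom10Ns As String = "http://www.w3.org/2005/Atom"

    ' True when the root is <feed> in the Atom 1.0 namespace.
    Public Function IsAtom10(ByVal doc As XmlDocument) As Boolean
        Dim root As XmlElement = doc.DocumentElement
        Return root IsNot Nothing AndAlso root.LocalName = "feed" AndAlso root.NamespaceURI = Atom10Ns
    End Function
End Module
```

Atom also names things differently from RSS. Items are `<entry>` elements, and the description is usually in `<summary>` or `<content>`. The link is the `href` attribute of a `<link>` element rather than the element's text.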

My next post will show and explain the code I used to detect which feed format I'm working with, and how to use that information to add the Title, Link, and Description to the Items collection of CRSSItem objects. Needless to say, the detection code is longer than the decoding code.

Monday, April 19, 2010

A Class for RSS Data

The first goal in decoding the various forms of RSS files is to create a base class that encapsulates the functionality and data of an RSS file. The basic structure of an RSS file is a channel with a title, link, and description, plus several items, each with its own title, link, and description. That structure makes the initial shape of the base class obvious. Note, too, that the various formats can carry data beyond these three fields; for instance, the channel may contain a webmaster's email address. Such information is mostly metadata by definition, and we should be able to deal with it in any instance of the class if we so choose. With this in mind, I created the CRSSItem class in the Class Designer.
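In code, the three core fields might look something like this minimal sketch. It assumes simple read/write String properties named Title, Link, and Description, and the members of the actual class may differ. The auto-implemented properties require VB 2010.

```vbnet
' Sketch of CRSSItem's three core fields (member names are assumptions).
' Auto-implemented properties with initializers need VB 2010 or later.
Public Class CRSSItem
    ' Title of the channel or item
    Public Property Title As String = ""
    ' URL of the underlying web page
    Public Property Link As String = ""
    ' Summary text shown to the reader
    Public Property Description As String = ""
End Class
```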


I decided to handle metadata by creating a Collection object to store the data as name/value pairs, plus a flag that tells the class whether that data should be captured or ignored.

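' Name/value store for metadata (Microsoft.VisualBasic.Collection); New avoids a NullReferenceException
Private m_col_metadata As New Collection
' True if metadata should be captured, False if it should be ignored
Private m_hasmetadata As Boolean = False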

The MetaInfo and SetMetaInfo methods expose the m_col_metadata Collection to the calling application. The MetaInfo method takes a key naming a metadata field and returns the corresponding value from the Collection.

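Public Function MetaInfo(ByVal key As String) As String
    ' Return the value stored under key, or Nothing if the key is absent
    If Not m_col_metadata.Contains(key) Then Return Nothing
    Return CStr(m_col_metadata(key))
End Function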

The SetMetaInfo method simply adds the value for a given key, replacing any value already stored under that key.

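Public Sub SetMetaInfo(ByVal key As String, ByVal data As String)
    ' Collection keys must be unique, so remove any existing entry first
    If m_col_metadata.Contains(key) Then
        m_col_metadata.Remove(key)
    End If
    ' Collection.Add takes the item first and the key second
    m_col_metadata.Add(data, key)
End Sub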
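As written, SetMetaInfo never consults the m_hasmetadata flag. One way to make the flag take effect is for SetMetaInfo to check it before storing anything. This sketch assumes that behavior. It also swaps the VB Collection for a Generic.Dictionary(Of String, String), which avoids the Object-to-String conversion and handles add-or-replace through its indexer. MetadataStore and CaptureMetadata are hypothetical names.

```vbnet
Imports System.Collections.Generic

' Sketch: a metadata store that honors a capture flag.
' MetadataStore and CaptureMetadata are hypothetical names.
Public Class MetadataStore
    Private ReadOnly m_metadata As New Dictionary(Of String, String)()
    Private m_hasmetadata As Boolean = False

    ' When False, SetMetaInfo discards incoming metadata.
    Public Property CaptureMetadata() As Boolean
        Get
            Return m_hasmetadata
        End Get
        Set(ByVal value As Boolean)
            m_hasmetadata = value
        End Set
    End Property

    ' Adds or replaces the value for key, but only while capture is on.
    Public Sub SetMetaInfo(ByVal key As String, ByVal data As String)
        If Not m_hasmetadata Then Return
        m_metadata(key) = data ' the indexer adds a new key or overwrites an existing one
    End Sub

    ' Returns the value for key, or Nothing if it was never stored.
    Public Function MetaInfo(ByVal key As String) As String
        Dim value As String = Nothing
        If m_metadata.TryGetValue(key, value) Then Return value
        Return Nothing
    End Function
End Class
```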

Not shown in the diagram are the OnError and Progress events I added to improve communication with the calling application. The OnError event lets the application handle any error the class cannot recover from on its own. The Progress event can pass status information back to the application during long-running, blocking calls, which allows for status bar updates and such.
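Here is a minimal sketch of the event pattern. It assumes an OnError event that carries the Exception and a Progress event that carries a status string. CFeedWorker and ProcessFeed are hypothetical stand-ins for CRSSItem and one of its long-running calls.

```vbnet
Imports System

' Sketch: declaring and raising OnError and Progress events.
' CFeedWorker, ProcessFeed and the event signatures are hypothetical.
Public Class CFeedWorker
    ' Raised for errors the class cannot recover from on its own.
    Public Event OnError(ByVal ex As Exception)
    ' Raised during long work so the caller can update a status bar.
    Public Event Progress(ByVal message As String)

    Public Sub ProcessFeed(ByVal itemCount As Integer)
        Try
            If itemCount < 0 Then Throw New ArgumentOutOfRangeException("itemCount")
            For i As Integer = 1 To itemCount
                ' ... per-item work would go here ...
                RaiseEvent Progress(String.Format("Processed item {0} of {1}", i, itemCount))
            Next
        Catch ex As Exception
            ' Hand the problem to the calling application instead of crashing.
            RaiseEvent OnError(ex)
        End Try
    End Sub
End Class
```

A Windows Forms application would typically declare the object `WithEvents` and update its status bar in the Progress handler.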

Now that we have created our base class, we can use VB class inheritance to create a class that processes the RSS feed and stores the data in a collection of CRSSItem objects.



CRSSFeed inherits from CRSSItem using the following statement:

Inherits CRSSItem

I added an Items property to expose the collection of RSS items to the calling application.  The collection of RSS Items is a Generic.Dictionary collection created like this:

Private m_col_items As New System.Collections.Generic.Dictionary(Of Integer, CRSSItem)


Public ReadOnly Property Items() As System.Collections.Generic.Dictionary(Of Integer, CRSSItem)
    Get
        Return m_col_items ' expose the item collection to the caller
    End Get
End Property

This lets the calling application loop over Items.Values with For Each.
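Here is a minimal, self-contained sketch of that loop. The two classes are trimmed-down stand-ins for CRSSItem and CRSSFeed and contain only the members the loop needs. ListTitles is a hypothetical helper.

```vbnet
Imports System.Collections.Generic

' Trimmed-down stand-ins for CRSSItem and CRSSFeed (members are assumptions).
Public Class CRSSItem
    Public Property Title As String = ""
    Public Property Link As String = ""
End Class

Public Class CRSSFeed
    Inherits CRSSItem
    Private m_col_items As New Dictionary(Of Integer, CRSSItem)

    Public ReadOnly Property Items() As Dictionary(Of Integer, CRSSItem)
        Get
            Return m_col_items
        End Get
    End Property
End Class

Module ItemsDemo
    ' Builds one "Title (Link)" line per item by looping over Items.Values.
    Public Function ListTitles(ByVal feed As CRSSFeed) As List(Of String)
        Dim lines As New List(Of String)
        For Each item As CRSSItem In feed.Items.Values
            lines.Add(item.Title & " (" & item.Link & ")")
        Next
        Return lines
    End Function
End Module
```

Dictionary(Of TKey, TValue) does not guarantee enumeration order. If display order matters, it is safer to loop over the keys in sorted order.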

I also added a Version property to report the detected RSS version back to the calling application. There is also a Create method, which takes a URL as an argument. In upcoming posts I will explain the various RSS formats, such as RDF, RSS 2.0, and Atom, and we will develop the Create method to put data from these files into the class structure.

Sunday, April 18, 2010

Is RSS Really That 'Simple'?

XML provides the underlying framework of Web 2.0 applications. A key application of XML is website syndication, or what was once termed 'push' technology. RSS stands for Really Simple Syndication and, in theory, provides the following items in XML format:

  • Title
  • Link
  • Description

In practice, however, this idea has been extended ad infinitum, thanks to the extensibility of RSS and XML. In essence, there are about nine ways to encapsulate these three items in an XML file. The result is a widely used concept with no bullet-proof method of decoding RSS files. I have read many bulletin board postings by programmers seeking the best way to decode RSS for a reader application, but found that many of the answers only raised more questions. I believe the best approach to the problem is an Object Oriented one.

An RSS reader application in its most basic form would:

  1. Retrieve an RSS file from a website
  2. Display a list of Titles from the file
  3. Display the Description of a given Title
  4. Provide the Link back to the underlying web page
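For the simplest case, a plain RSS 0.91 through 2.0 feed with no namespaces, steps 2 through 4 reduce to a few XPath queries. This is a minimal sketch that assumes that format. It will not find items in RDF or Atom feeds, whose elements live in XML namespaces. ReadItems and ChildText are hypothetical names.

```vbnet
Imports System.Collections.Generic
Imports System.Xml

' Sketch for un-namespaced RSS 0.91 through 2.0 feeds only.
' ReadItems and ChildText are hypothetical names.
Module SimpleRssReader
    ' Returns the text of a child element, or "" if the element is missing.
    Private Function ChildText(ByVal parent As XmlNode, ByVal name As String) As String
        Dim node As XmlNode = parent.SelectSingleNode(name)
        If node Is Nothing Then Return ""
        Return node.InnerText
    End Function

    ' One entry per <item>: index 0 = Title, 1 = Link, 2 = Description.
    Public Function ReadItems(ByVal doc As XmlDocument) As List(Of String())
        Dim result As New List(Of String())
        For Each item As XmlNode In doc.SelectNodes("/rss/channel/item")
            result.Add(New String() {ChildText(item, "title"),
                                     ChildText(item, "link"),
                                     ChildText(item, "description")})
        Next
        Return result
    End Function
End Module
```

Step 2 then lists `items(i)(0)` for each entry. Step 3 shows `items(i)(2)` for the selected title, and step 4 follows `items(i)(1)`.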

The .NET XmlDocument class can load an XML file directly from a URL, which makes the first step relatively simple. Add the following line at the top of your class file, above the class declaration:

Imports System.Xml   

Then declare an XmlDocument field in your class:

Private m_XMLDocument As New XmlDocument

To load a file, call the Load method, passing the RSS feed URL as an argument:

m_XMLDocument.Load("http://mydomain.com/myfeed.rss") ' Download and parse the feed into the XML document
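Load throws an exception if the feed cannot be fetched or is not well-formed XML, so a reader application will probably want to catch those cases. Here is a minimal sketch. It assumes that returning Nothing on failure is acceptable to the caller. TryLoadFeed is a hypothetical name, and a real application might report the exception, for example through an error event, instead of discarding it.

```vbnet
Imports System.IO
Imports System.Net
Imports System.Xml

' Sketch: load a feed from a URL or file path, returning Nothing on failure.
' TryLoadFeed is a hypothetical name.
Module FeedLoader
    Public Function TryLoadFeed(ByVal url As String) As XmlDocument
        Dim doc As New XmlDocument()
        Try
            doc.Load(url)
            Return doc
        Catch ex As WebException
            Return Nothing ' network or HTTP error
        Catch ex As XmlException
            Return Nothing ' the content was not well-formed XML
        Catch ex As IOException
            Return Nothing ' e.g. a local file that does not exist
        End Try
    End Function
End Module
```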

Once the RSS feed is loaded into the XmlDocument object, processing it is not as straightforward as one would think, because of the various versions of RSS files that can be used. This problem suggests an Object Oriented design approach. In future posts on this blog I will share the object model I used to solve the problem, explain the various formats, show where to find the information in each format, and show how to handle extension information as metadata within the classes, using dictionaries organized by namespace.

My hope is that this blog will help programmers make better use of Object Oriented Design, build applications that handle RSS effectively, and reach a mastery of Web 2.0 that leads us into the creation of Web 3.0 applications.