Monday, May 3, 2010

Source Code to Detect RSS Feed Types Explained

The code may be hard to read here on Blogger, but a link to the full source is provided at the end of this article.




This article assumes you have set up the CRSSFeed class from a previous post. However, this code can be easily adapted for any application by changing a few variable names. I strongly recommend an object-oriented approach to this problem, as it will let you RSS-enable any other apps you may develop. For more info on the CRSSFeed VB.Net class click here.



After poring over the specs for the various RSS feeds out there, I determined that I first needed to find out 3 attributes of the Document Element. These are xmlns, xmlns:rdf, and version. In fact, xmlns:rdf is so important that I may add a member variable to my class to store it for use when I build my dictionary of metadata later. Here are my first lines of RSS detection code:



Dim xmlns As String = m_XMLDocument.DocumentElement.GetAttribute("xmlns")
Dim xmlns_rdf As String = m_XMLDocument.DocumentElement.GetAttribute("xmlns:rdf")
Dim rss_version As String = m_XMLDocument.DocumentElement.GetAttribute("version")
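
One detail worth seeing in isolation is what GetAttribute gives back when an attribute is absent: an empty string rather than Nothing. The following is a minimal standalone sketch, not part of CRSSFeed; the module name, the variable names, and the sample XML are made up for illustration.

```vbnet
Imports System
Imports System.Xml

Module AttributeDemo
    Sub Main()
        ' Hypothetical minimal RSS 2.0 document, used only for illustration
        Dim doc As New XmlDocument()
        doc.LoadXml("<rss version=""2.0""><channel><title>Demo</title></channel></rss>")

        Dim root As XmlElement = doc.DocumentElement
        Dim rssVersion As String = root.GetAttribute("version")
        Dim xmlnsRdf As String = root.GetAttribute("xmlns:rdf") ' not present on this root

        Console.WriteLine(rssVersion)                     ' prints 2.0
        Console.WriteLine(String.IsNullOrEmpty(xmlnsRdf)) ' prints True
        Console.WriteLine(xmlnsRdf Is Nothing)            ' prints False
    End Sub
End Module
```

So any later test for "was the attribute there?" has to compare against an empty string, not against Nothing.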

Next, we need to process the name of the Document Element. The possible values are rdf:RDF, channel, rss, and feed. XML is case sensitive, so I run LCase on the Document Element name to match it regardless of how the feed capitalizes it. This yields the following Case structure:



Select Case LCase(m_XMLDocument.DocumentElement.Name)
    Case "rdf:rdf", "channel"
        ' Process RSS 1.0 here
    Case "rss"
        ' Process RSS 2.0 here
    Case "feed"
        ' Process Atom 1.0 here
    Case Else
        ' Unknown format handler
End Select
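
To try the root-name dispatch outside the class, here is a rough standalone sketch. FeedFamily is a hypothetical helper, not a CRSSFeed member, and the sample documents are invented; it only mirrors the Case structure above.

```vbnet
Imports System
Imports System.Xml

Module RootNameDemo
    ' Hypothetical helper: maps the root element name to a feed family label
    Function FeedFamily(ByVal xml As String) As String
        Dim doc As New XmlDocument()
        doc.LoadXml(xml)
        ' Name is the qualified name, e.g. "rdf:RDF", so lowercase it before comparing
        Select Case doc.DocumentElement.Name.ToLowerInvariant()
            Case "rdf:rdf", "channel"
                Return "RDF"
            Case "rss"
                Return "RSS"
            Case "feed"
                Return "ATOM"
            Case Else
                Return "UNKNOWN"
        End Select
    End Function

    Sub Main()
        Console.WriteLine(FeedFamily("<rss version=""2.0""><channel/></rss>"))              ' prints RSS
        Console.WriteLine(FeedFamily("<feed xmlns=""http://www.w3.org/2005/Atom""/>"))      ' prints ATOM
        Console.WriteLine(FeedFamily("<rdf:RDF xmlns:rdf=""http://www.w3.org/1999/02/22-rdf-syntax-ns#""/>")) ' prints RDF
        Console.WriteLine(FeedFamily("<html/>"))                                            ' prints UNKNOWN
    End Sub
End Module
```

Note that matching on "rdf:rdf" relies on the feed using the conventional rdf prefix. XML allows any prefix for that namespace, so a stricter check would compare the namespace URI instead.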

In the RSS 1.0 handler, we distinguish the possible versions (1.0, 0.90 and 1.1) by processing the xmlns:rdf and xmlns attributes. Note the RDF namespace must be "http://www.w3.org/1999/02/22-rdf-syntax-ns#" for a valid RSS 1.0 type feed:



m_type = RSSType.RSSType_RDF ' Set the member variable to the RDF type of feed
' Check xmlns attribute for RSS version
If xmlns_rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#" Then
    m_version = Switch(xmlns = "http://purl.org/rss/1.0/", "1.0", _
                       xmlns = "http://channel.netscape.com/rdf/simple/0.9/", "0.90", _
                       xmlns = "http://purl.org/net/rss1.1#", "1.1")
    ' RDF/RSS 1.0 spec http://web.resource.org/rss/1.0/spec#s5.2
    ' RSS 0.90 spec http://www.rssboard.org/rss-0-9-0
    ' RSS 1.1 spec http://inamidst.com/rss1.1/
    m_channel.Title = m_XMLDocument.DocumentElement.GetElementsByTagName("title").Item(0).InnerText
    m_channel.Link = m_XMLDocument.DocumentElement.GetElementsByTagName("link").Item(0).InnerText
    m_channel.Description = m_XMLDocument.DocumentElement.GetElementsByTagName("description").Item(0).InnerText
    DetectFeedType = Not IsNothing(m_version) ' make sure we had a valid XML Namespace
Else
    DetectFeedType = False
End If
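
The namespace-to-version mapping can also be written as a lookup table, which may be easier to extend than a Switch call. This is only a sketch of that alternative, not the code CRSSFeed uses; RdfVersion, BuildTable and the module name are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic

Module RdfVersionDemo
    ' Hypothetical table: default namespace URI -> RSS version string
    Private Function BuildTable() As Dictionary(Of String, String)
        Dim table As New Dictionary(Of String, String)
        table.Add("http://purl.org/rss/1.0/", "1.0")
        table.Add("http://channel.netscape.com/rdf/simple/0.9/", "0.90")
        table.Add("http://purl.org/net/rss1.1#", "1.1")
        Return table
    End Function

    Private ReadOnly RdfVersions As Dictionary(Of String, String) = BuildTable()

    ' Returns the version for a known namespace, or Nothing when it is not recognized
    Function RdfVersion(ByVal xmlns As String) As String
        If xmlns Is Nothing Then Return Nothing
        Dim version As String = Nothing
        RdfVersions.TryGetValue(xmlns, version)
        Return version
    End Function

    Sub Main()
        Console.WriteLine(RdfVersion("http://purl.org/rss/1.0/"))        ' prints 1.0
        Console.WriteLine(RdfVersion("http://example.com/") Is Nothing)  ' prints True
    End Sub
End Module
```

As with Switch, an unrecognized namespace yields Nothing, so the same "Not IsNothing" validity check still applies.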

RSS 2.x and 0.91-0.94 are easier: just grab the version attribute.



m_type = RSSType.RSSType_RSS_2_x ' Set the member variable for RSS 2.0 type of feed
' GetAttribute returns an empty string (never Nothing) when the attribute is missing
If Not String.IsNullOrEmpty(rss_version) Then ' If the RSS 2 spec ever changes we can change our code here to detect
    m_version = rss_version
    m_channel.Title = m_XMLDocument.DocumentElement.GetElementsByTagName("title").Item(0).InnerText
    m_channel.Link = m_XMLDocument.DocumentElement.GetElementsByTagName("link").Item(0).InnerText
    m_channel.Description = m_XMLDocument.DocumentElement.GetElementsByTagName("description").Item(0).InnerText
    DetectFeedType = True
End If

In this code, I only check for Atom 1.0. If for some reason you find you need to check for 0.3 and so forth, do so here. I don't bother with 0.3 because it is obsolete, but who knows what you might find out on the web. Note that there can be more than one link, so we need to process the rel attribute to find the one that has the value "alternate" and is a child of the "feed" element. RFC 4287 says a link with no rel attribute is to be treated as "alternate":



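Dim node As XmlNode
m_type = RSSType.RSSType_Atom_1_x
m_version = "1.0"
m_channel.Title = m_XMLDocument.DocumentElement.GetElementsByTagName("title").Item(0).InnerText
' There can be more than one link in an Atom 1.0 header. Need to find the one where rel="alternate"
For Each node In m_XMLDocument.DocumentElement.GetElementsByTagName("link")
    ' rel is optional; a link without it counts as rel="alternate" (RFC 4287, section 4.2.7.2)
    Dim rel As XmlNode = node.Attributes.GetNamedItem("rel")
    Dim href As XmlNode = node.Attributes.GetNamedItem("href")
    ' AndAlso/OrElse short-circuit, so a missing attribute cannot cause a NullReferenceException
    If (rel Is Nothing OrElse rel.Value = "alternate") AndAlso href IsNot Nothing AndAlso node.ParentNode.Name = "feed" Then
        m_channel.Link = href.Value
        Exit For ' Found it, let's bail...
    End If
Next
' subtitle is optional in Atom 1.0, so only read it when it is present
Dim subtitles As XmlNodeList = m_XMLDocument.DocumentElement.GetElementsByTagName("subtitle")
If subtitles.Count > 0 Then m_channel.Description = subtitles.Item(0).InnerText
DetectFeedType = True
' TODO: Find Atom version here. As the code stands now, it assumes 1.0
' Atom 1.0 spec http://tools.ietf.org/html/rfc4287#section-1.1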
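
To experiment with the link selection on its own, here is a small standalone sketch. AlternateLink is a hypothetical helper rather than part of CRSSFeed, and the sample feed is invented. It picks the first link that is a direct child of the root and whose rel is either missing or "alternate".

```vbnet
Imports System
Imports System.Xml

Module AtomLinkDemo
    ' Hypothetical helper: returns the href of the feed-level alternate link, or Nothing
    Function AlternateLink(ByVal doc As XmlDocument) As String
        For Each node As XmlNode In doc.DocumentElement.GetElementsByTagName("link")
            Dim link As XmlElement = TryCast(node, XmlElement)
            ' Skip links that belong to entries rather than to the feed itself
            If link Is Nothing OrElse link.ParentNode IsNot doc.DocumentElement Then Continue For
            ' GetAttribute returns "" when rel is absent, which Atom treats as "alternate"
            Dim rel As String = link.GetAttribute("rel")
            If rel = "" OrElse rel = "alternate" Then Return link.GetAttribute("href")
        Next
        Return Nothing
    End Function

    Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml("<feed xmlns=""http://www.w3.org/2005/Atom"">" & _
                    "<title>Demo</title>" & _
                    "<link rel=""self"" href=""http://example.org/feed.atom""/>" & _
                    "<link href=""http://example.org/""/>" & _
                    "<entry><link rel=""alternate"" href=""http://example.org/post1""/></entry>" & _
                    "</feed>")
        Console.WriteLine(AlternateLink(doc)) ' prints http://example.org/
    End Sub
End Module
```

Comparing ParentNode against the root element object, instead of comparing names, avoids depending on the root being spelled exactly "feed".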

Now we set our version string for our object and we are done:



If DetectFeedType Then
    ' Choose is 1-based, so this assumes the RSSType values for RDF, RSS and Atom are 1, 2 and 3
    m_version = Choose(m_type, "RDF", "RSS", "ATOM") & " " & m_version
    Title = m_channel.Title
    Link = m_channel.Link
    Description = m_channel.Description
End If
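
Because Choose indexes from 1, the enum values behind m_type matter. The sketch below uses a made-up enum (DemoFeedType, with values I am assuming for illustration; the real RSSType values are not shown in this post) to show how the label is built and what happens with an out-of-range index.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module ChooseDemo
    ' Hypothetical enum; the values 1..3 are an assumption made for this example
    Enum DemoFeedType
        Rdf = 1
        Rss = 2
        Atom = 3
    End Enum

    Sub Main()
        Dim feedType As DemoFeedType = DemoFeedType.Atom
        ' Choose picks the Nth choice, counting from 1
        Dim label As String = CStr(Choose(CInt(feedType), "RDF", "RSS", "ATOM"))
        Console.WriteLine(label & " 1.0")                             ' prints ATOM 1.0

        ' An index below 1 (or past the last choice) yields Nothing
        Console.WriteLine(Choose(0, "RDF", "RSS", "ATOM") Is Nothing) ' prints True
    End Sub
End Module
```

If the real enum starts at 0, the first member would map to Nothing and every other label would be off by one, so it is worth checking the enum definition against the Choose call.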

The next post will put the link, title and description in the exposed properties of our object. Now that we have detected the feed type, processing the rest of the feed is a snap. Until next time, Happy Computing!!!


To view the full code, click here.