Microsoft.NET

Expertise in .NET Technologies

Working with XPath in .NET

Posted by Ravi Varma Thumati on May 8, 2009

What is XPath?

XPath is a language for finding information in an XML document. It lets you navigate through the elements and attributes of an XML document easily and effectively. It is not a simple “find and replace” like the one in Notepad or other word-processing applications: XPath has its own rules, structure, and syntax, and they must be followed strictly. But if we take the time to understand XPath from scratch, those rules become easy to work with.

First of all, is XPath necessary or compulsory? To answer a question with another question: is SQL compulsory? Can’t we develop applications without SQL at all? I think you can already guess my answer. To be frank, XPath is not compulsory. You can do everything you need with an XML document without using XPath at all. But one should also consider common concerns in application development, such as ease of use, effectiveness, speed, productivity, and simplicity, and XPath delivers all of those when working with XML documents.

Another important point is that XPath is a non-XML language: XPath expressions are not themselves written in XML syntax. This is a common source of confusion among application developers. XPath is simply a language for querying XML documents, not XML itself. Because XPath is an abstract language, it can be used in many environments. It is used heavily throughout XSL Transformations (XSLT) to identify nodes in the input XML document, and most Document Object Model (DOM) implementations also support it for richer querying capabilities.

Working with XPath: – What is inside XPath?

Everybody knows that XML is nothing but a tree of several related (and structured) nodes of textual information.  XPath is a language for picking nodes and sets of nodes out of this tree. From the perspective of XPath, there are seven kinds of nodes:

  • The root node
  • Element nodes
  • Text nodes
  • Attribute nodes
  • Comment nodes
  • Processing instruction nodes
  • Namespace nodes

Those are not new buzzwords to any developer who knows XML. In everyday XML usage, the “root” is the topmost element of the document, every other element is nested inside it, elements carry their information as text content or as attributes, and comments are allowed as well. XPath uses much the same vocabulary, but it differs in a few important respects.

The XPath data model has several features that are not obvious. First, the tree’s root node is not the same as its root element. The tree’s root node contains the entire document, including the root element and comments and processing instructions that occur before the root element start tag or after the root element end tag.  The XPath data model does not include everything in the document. In particular, the XML declaration and DTD are not addressable via XPath. However, if the DTD provides default values for any attributes, then XPath recognizes those attributes. 

Finally, “xmlns” attributes are reported as namespace nodes. They are not considered attribute nodes, though a parser that is not namespace-aware will see them as such. Furthermore, these namespace nodes are attached to every element for which that declaration is in scope, not just to the single element where the namespace is declared.
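To see these data-model rules in practice, here is a minimal C# sketch (C# 9 top-level statements) that loads a small, hypothetical XML string into an XPathDocument and inspects the tree with XPath's count() function. It assumes the XPath 1.0 implementation in System.Xml.XPath; the sample document and variable names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

// Hypothetical sample: an XML declaration, a comment before the root
// element, and a root element that declares the namespace prefix "x".
string xml = "<?xml version='1.0'?><!-- generated -->" +
             "<invoice xmlns:x='urn:example' id='123'><item/></invoice>";
XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();

// 1. The navigator starts on the root node, which is not the root element.
Console.WriteLine(nav.NodeType);                       // Root

// 2. The root node's children are the comment and the <invoice> element;
//    the XML declaration is not part of the XPath data model.
double rootChildren = (double)nav.Evaluate("count(/node())");
double comments = (double)nav.Evaluate("count(/comment())");
double instructions = (double)nav.Evaluate("count(/processing-instruction())");

// 3. xmlns:x is not an attribute node, so only id is counted here...
double attributes = (double)nav.Evaluate("count(/invoice/@*)");

// ...but it is a namespace node, and it is in scope on the child <item> too.
var prefixes = new List<string>();
XPathNodeIterator it = nav.Select("/invoice/item/namespace::*");
while (it.MoveNext())
{
    prefixes.Add(it.Current.Name);   // a namespace node's name is its prefix
}

Console.WriteLine($"{rootChildren} {comments} {instructions} {attributes}");   // 2 1 0 1
Console.WriteLine(string.Join(",", prefixes));
```

The prefix list may also contain the built-in xml prefix, which is always in scope, so the sketch only prints that list rather than relying on its exact contents.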

XPath uses path expressions to select nodes or node-sets in an XML document. The simplest expression (or location path) is the one that selects the document’s root node: a single forward slash, /. A lot of XPath syntax was deliberately chosen to resemble the syntax of the Unix shell. Just as / is the root of a Unix file system, / is the root node of an XML document, and further steps separated by slashes move down the tree the way directory names do in a traditional file system path.
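The file-system analogy can be checked with a minimal C# sketch, assuming a hypothetical invoice fragment held in a string: / selects exactly one node, the root node, and each additional step moves one level down the tree.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

// Hypothetical fragment of the invoice document, held in a string.
string xml = "<invoice id='123'><item><sku>100</sku></item></invoice>";
XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();

// "/" selects exactly one node: the root node (like / in a Unix file system).
XPathNodeIterator root = nav.Select("/");
int rootCount = root.Count;
root.MoveNext();
Console.WriteLine($"{rootCount} {root.Current.NodeType}");   // 1 Root

// Each additional step moves one level down the tree, like a directory path.
XPathNavigator sku = nav.SelectSingleNode("/invoice/item/sku");
Console.WriteLine(sku.Value);                                // 100
```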

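XPath also includes a library of built-in functions. XPath 1.0, the version implemented by the System.Xml classes described below, defines 27 core functions for node-sets, strings, numbers, and Boolean values (for example count(), concat(), contains(), and sum()). XPath 2.0 greatly expands this library with functions for date and time comparison, QName manipulation, sequence manipulation, and more, but the .NET Framework classes discussed in this article do not support XPath 2.0.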
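A minimal C# sketch calling a few XPath 1.0 core functions through XPathNavigator.Evaluate. The invoice document used later in this article is inlined as a string here for convenience; everything else is illustrative.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

// The invoice document from this article, held in a string for convenience.
string xml =
    "<invoice id='123'>" +
    "<item><sku>100</sku><price>9.95</price></item>" +
    "<item><sku>101</sku><price>29.95</price></item>" +
    "</invoice>";
XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();

// Node-set function: count()
double items = (double)nav.Evaluate("count(/invoice/item)");
// Number function: sum() converts each price to a number and adds them
double total = (double)nav.Evaluate("sum(/invoice/item/price)");
// String function: concat(), with the id attribute converted to a string
string label = (string)nav.Evaluate("concat('Invoice #', /invoice/@id)");
// Boolean function: not(), wrapped around a contains() predicate
bool noSku200 = (bool)nav.Evaluate("not(/invoice/item[contains(sku, '200')])");

Console.WriteLine($"{items} items, total {total}, {label}, {noSku200}");
```

Because XPath numbers are floating-point, the printed total may show rounding noise, so compare it with a tolerance rather than exact equality.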

Working with XPath: – XPath with a simple example

The XPath 1.0 type system is very simple; it has only four types:

  • Node-set (an unordered collection of nodes without duplicates)
  • Boolean (true or false)
  • Number (a double-precision floating-point number; XPath 1.0 has no separate integer type)
  • String (a sequence of characters)
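In .NET, XPathNavigator.Evaluate returns each of these four types as a CLR object: XPathNodeIterator for a node-set, bool, double, or string. A minimal C# sketch of that mapping, using a hypothetical invoice fragment and a hypothetical helper named XPathTypeOf:

```csharp
using System;
using System.IO;
using System.Xml.XPath;

// Hypothetical fragment of the invoice document, held in a string.
string xml = "<invoice id='123'><item><price>9.95</price></item></invoice>";
XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();

// Hypothetical helper: maps the CLR object returned by Evaluate to its XPath type.
static string XPathTypeOf(object result) => result switch
{
    XPathNodeIterator => "node-set",
    bool => "boolean",
    double => "number",
    string => "string",
    _ => "unknown"
};

object nodeSet = nav.Evaluate("/invoice/item");           // location path
object boolean = nav.Evaluate("/invoice/@id = '123'");    // comparison
object number = nav.Evaluate("/invoice/item/price * 2");  // arithmetic
object text = nav.Evaluate("string(/invoice/@id)");       // string conversion

foreach (object result in new[] { nodeSet, boolean, number, text })
{
    Console.WriteLine(XPathTypeOf(result));   // node-set, boolean, number, string
}
```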

Let us consider a small XML document (invoice.xml) containing the following information:

<invoice id='123'>
       <item>
           <sku>100</sku>
           <price>9.95</price>
       </item>
       <item>
           <sku>101</sku>
           <price>29.95</price>
       </item>
</invoice>

The hierarchy starts at the root node (written as /) and continues with the invoice element, then item, sku, price, and so on. The following XPath expression identifies the two price elements:

/invoice/item/price

The above type of expression is called a “location path”.  Location path expressions look like file system paths, only they navigate through the XPath tree model to identify a set of nodes (known as a node-set).  A location path expression yields a node-set.  Location paths can be absolute or relative.  Absolute location paths begin with a forward slash (/) whereas relative location paths do not.
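The difference between absolute and relative location paths shows up as soon as the navigator is positioned somewhere other than the root. A minimal C# sketch, with the invoice document inlined as a string:

```csharp
using System;
using System.IO;
using System.Xml.XPath;

// The invoice document from this article, held in a string for convenience.
string xml =
    "<invoice id='123'>" +
    "<item><sku>100</sku><price>9.95</price></item>" +
    "<item><sku>101</sku><price>29.95</price></item>" +
    "</invoice>";
XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();

// Absolute path: always evaluated from the root node.
XPathNodeIterator absolute = nav.Select("/invoice/item/price");

// Relative path: evaluated from the current (context) node, here <invoice>.
XPathNavigator invoice = nav.SelectSingleNode("/invoice");
XPathNodeIterator relative = invoice.Select("item/price");

// The same relative path from the root node finds nothing,
// because the root node has no <item> children.
XPathNodeIterator fromRoot = nav.Select("item/price");

Console.WriteLine($"{absolute.Count} {relative.Count} {fromRoot.Count}");   // 2 2 0
```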

XPath can be used with a variety of XML processors, including the MSXML DOM, .NET, and JAXP. The following simple JavaScript snippet uses the MSXML DOM to search for elements in an XML document with XPath:

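// doc is an MSXML DOMDocument that has already loaded invoice.xml
doc.setProperty("SelectionLanguage", "XPath"); // required in MSXML 3.0; XPath is the default in later versions
var nl = doc.selectNodes("/invoice/item/price");
for (var i = 0; i < nl.length; i++)
{
    // do some processing here, e.g. read nl.item(i).text
}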
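For comparison, the .NET DOM offers a close analogue of MSXML's selectNodes: XmlDocument.SelectNodes, which returns an XmlNodeList. A minimal C# sketch, with the invoice document loaded from a string rather than from invoice.xml:

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(
    "<invoice id='123'>" +
    "<item><sku>100</sku><price>9.95</price></item>" +
    "<item><sku>101</sku><price>29.95</price></item>" +
    "</invoice>");

// SelectNodes evaluates the XPath expression and returns the matches as an XmlNodeList.
XmlNodeList nl = doc.SelectNodes("/invoice/item/price");
for (int i = 0; i < nl.Count; i++)
{
    // do some processing here; this sketch just prints each price
    Console.WriteLine(nl[i].InnerText);
}
```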

Working with XPath: – XPath related classes in .NET

The .NET Framework provides XML support through the System.Xml namespace, and the XPath classes live in System.Xml.XPath. If we need to work with XPath, the following classes are the most helpful:

  • XPathNavigator
  • XPathNodeIterator
  • XPathExpression
  • XPathDocument
  • XPathException

The XPathNavigator class provides a cursor model for navigating and querying a data store with random access. The XPathNodeIterator class lets you iterate over a set of nodes selected by an XPath method such as Select. The XPathExpression class encapsulates a compiled XPath expression; an XPathExpression object is returned when you call the Compile method, and the Select, Evaluate, and Matches methods accept it. The XPathDocument class provides a fast, read-only, in-memory cache that is highly optimized for XPath and XSLT processing. XPathException is the exception that is thrown when an error occurs while processing an XPath expression.
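A minimal C# sketch of two of these classes working together: an expression is compiled once into an XPathExpression (here with the static XPathExpression.Compile method; XPathNavigator.Compile also exists) and then reused, while a malformed expression surfaces as an XPathException. The XML string and the broken expression are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

string xml = "<invoice><item><price>9.95</price></item><item><price>29.95</price></item></invoice>";
XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();

// Compile once; the parsed form can be reused with Select, Evaluate, or Matches.
XPathExpression prices = XPathExpression.Compile("/invoice/item/price");
Console.WriteLine(prices.ReturnType);   // NodeSet
int matched = nav.Select(prices).Count;

// A syntax error in the expression is reported as an XPathException.
string error = "";
try
{
    XPathExpression.Compile("/invoice/item[");
}
catch (XPathException ex)
{
    error = ex.Message;
}

Console.WriteLine($"{matched} matches; error: {error}");
```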

Apart from the above, there is one more interface, IXPathNavigable. It defines a single method, CreateNavigator, which returns an XPathNavigator object, so any class that implements this interface lets you create navigators over its data.

To create an XPathNavigator object for an XML document, call the CreateNavigator method of the XmlNode or XPathDocument class, both of which implement the IXPathNavigable interface. CreateNavigator returns an XPathNavigator object, which you can then use to perform XPath queries. You can use XPathNavigator to select a set of nodes from any data store that implements the IXPathNavigable interface. A data store is the source of the data: for example, a file loaded into an XPathDocument, an XmlDocument object, or a DataSet (holding data from a database, say) exposed through an XmlDataDocument. You can also write your own implementation of the XPathNavigator class to query other data stores.
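Because both XPathDocument and XmlDocument implement IXPathNavigable, one piece of query code can serve either store. A minimal C# sketch, where CountPrices is a hypothetical helper and the XML string is illustrative:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

string xml = "<invoice><item><price>9.95</price></item><item><price>29.95</price></item></invoice>";

// Hypothetical helper: works with any IXPathNavigable data store.
static int CountPrices(IXPathNavigable store)
{
    XPathNavigator nav = store.CreateNavigator();
    return nav.Select("/invoice/item/price").Count;
}

var xpathDoc = new XPathDocument(new StringReader(xml));   // read-only, optimized for XPath/XSLT
var domDoc = new XmlDocument();                             // full DOM (XmlDocument derives from XmlNode)
domDoc.LoadXml(xml);

Console.WriteLine(CountPrices(xpathDoc));   // 2
Console.WriteLine(CountPrices(domDoc));     // 2
```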

The XPathNavigator object reads data from an XML document using a cursor that can move forward and backward through the nodes, and it also provides random access to nodes. Whether you can edit the document through it depends on the underlying store: a navigator created from an XPathDocument is read-only (its CanEdit property returns false), whereas, starting with .NET Framework 2.0, a navigator created from an XmlDocument is editable and supports methods such as SetValue.
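A minimal C# sketch of that difference, assuming .NET Framework 2.0 or later and a hypothetical one-item invoice: the XPathDocument navigator reports CanEdit as false, while the XmlDocument navigator can change a price in place.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

string xml = "<invoice><item><price>9.95</price></item></invoice>";

// Navigator over XPathDocument: read-only.
XPathNavigator readOnly = new XPathDocument(new StringReader(xml)).CreateNavigator();
Console.WriteLine(readOnly.CanEdit);   // False

// Navigator over XmlDocument: editable.
var dom = new XmlDocument();
dom.LoadXml(xml);
XPathNavigator editable = dom.CreateNavigator();
Console.WriteLine(editable.CanEdit);   // True

// Move to the price element and replace its content through the navigator.
XPathNavigator price = editable.SelectSingleNode("/invoice/item/price");
price.SetValue("12.50");
Console.WriteLine(dom.OuterXml);       // <invoice><item><price>12.50</price></item></invoice>
```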

Working with XPath: – Examining XPath with a simple VB.NET/C# example

You can use the Select method of the XPathNavigator object to select the set of nodes from any store that implements the IXPathNavigable interface. The Select method returns an object of the XPathNodeIterator class. You can then use the object of the XPathNodeIterator class to iterate through the selected nodes.

After you have an XPathNodeIterator object, you can navigate within the selected set of nodes. The following code shows how to create an XPathNavigator object on an XML document, select a set of nodes by using the Select method, and iterate through the set of nodes.

Imports System
Imports System.Xml.XPath

Module Program
    Sub Main()
        ' Load invoice.xml into a read-only store optimized for XPath queries
        Dim doc As New XPathDocument("invoice.xml")
        Dim navigator As XPathNavigator = doc.CreateNavigator()

        ' Select the price element of every item
        Dim iterator As XPathNodeIterator = navigator.Select("/invoice/item/price")
        While iterator.MoveNext()
            ' Current is a navigator positioned on the matched node
            Console.WriteLine(iterator.Current.Name)
            Console.WriteLine(iterator.Current.Value)
        End While
    End Sub
End Module

The C# version is very similar:

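using System;
using System.Xml.XPath;

class Program
{
    static void Main()
    {
        // Load invoice.xml into a read-only store optimized for XPath queries
        XPathDocument doc = new XPathDocument("invoice.xml");
        XPathNavigator navigator = doc.CreateNavigator();

        // Select the price element of every item
        XPathNodeIterator iterator = navigator.Select("/invoice/item/price");
        while (iterator.MoveNext())
        {
            // Current is a navigator positioned on the matched node
            Console.WriteLine(iterator.Current.Name);
            Console.WriteLine(iterator.Current.Value);
        }
    }
}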
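Once the iterator is in place, the matched nodes can also be read as typed values, and SelectSingleNode fetches just the first match. A minimal C# sketch along those lines, with the invoice document inlined as a string instead of loaded from invoice.xml:

```csharp
using System;
using System.IO;
using System.Xml.XPath;

string xml =
    "<invoice id='123'>" +
    "<item><sku>100</sku><price>9.95</price></item>" +
    "<item><sku>101</sku><price>29.95</price></item>" +
    "</invoice>";
XPathNavigator navigator = new XPathDocument(new StringReader(xml)).CreateNavigator();

// Typed access: ValueAsDouble converts each price's text to a double.
double total = 0;
XPathNodeIterator iterator = navigator.Select("/invoice/item/price");
while (iterator.MoveNext())
{
    total += iterator.Current.ValueAsDouble;
}

// SelectSingleNode returns the first match (or null): the price of SKU 101.
XPathNavigator price101 = navigator.SelectSingleNode("/invoice/item[sku='101']/price");
Console.WriteLine($"{total} {price101?.Value}");
```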
