Objective:
In the previous article, XML, WEB SERVICES AND API'S IN JAVA, we learned about the role of XML in the Java platform, the concept of web services, web services standards, and the APIs and tools for developing Java services. In this article we will learn how to parse an XML file using SAX.

Parsing an XML file using SAX:
In real-life applications, you will want to use the SAX parser to process XML data and do something useful with it. This section examines an example JAXP program, SAXLocalNameCount, that counts the elements in an XML document using only the localName component of each element name. Namespace names are ignored for simplicity. This example also shows how to use a SAX ErrorHandler.

Creating the Skeleton:
The SAXLocalNameCount program is created in a file named SAXLocalNameCount.java.

public class SAXLocalNameCount {
   static public void main(String[] args) {
       // ...
   }
}

Because you will run it standalone, you need a main() method. And you need command-line arguments so that you can tell the application which file to process.

Importing Classes:
The import statements for the classes the application will use are the following.

package sax;
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

import java.util.*;
import java.io.*;

public class SAXLocalNameCount {
   // ...
}

The javax.xml.parsers package contains the SAXParserFactory class, which creates the parser instance used. The factory throws a ParserConfigurationException if it cannot produce a parser that matches the specified configuration of options. (You will see more about the configuration options later.) The javax.xml.parsers package also contains the SAXParser class, which is what the factory returns for parsing. The org.xml.sax package defines the interfaces used by the SAX parser. The org.xml.sax.helpers package contains DefaultHandler, the class that the application extends to handle the SAX events the parser generates. The classes in java.util and java.io are needed to provide hash tables and output.

Setting Up I/O:
The first order of business is to process the command-line arguments, which at this stage only serve to get the name of the file to process. The following code in the main method tells the application which file you want SAXLocalNameCount to process.

static public void main(String[] args) throws Exception {
   String filename = null;

   for (int i = 0; i < args.length; i++) {
       filename = args[i];
       if (i != args.length - 1) {
           usage();
       }
   }

   if (filename == null) {
       usage();
   }
}

This code sets the main method to throw an Exception when it encounters problems, and defines the command-line options which are required to tell the application the name of the XML file to be processed. Other command line arguments in this part of the code will be examined later in this lesson, when we start looking at validation.
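
The loop above accepts exactly one argument: zero arguments leave filename as null, and any extra argument triggers usage(). Because usage() ends the program, that check is hard to exercise in isolation. The following is a minimal sketch of the same rule as a separate method; the class name ArgsDemo and the method parseFilename() are hypothetical and are not part of SAXLocalNameCount.

```java
class ArgsDemo {
    // Hypothetical variant of the argument check: instead of calling usage()
    // and exiting, it throws so that the caller decides how to report the error.
    static String parseFilename(String[] args) {
        if (args.length != 1) {
            throw new IllegalArgumentException("Usage: SAXLocalNameCount <file.xml>");
        }
        return args[0];
    }

    public static void main(String[] args) {
        // "sample.xml" is a made-up file name used only for illustration.
        System.out.println(parseFilename(new String[] {"sample.xml"})); // sample.xml
        try {
            parseFilename(new String[] {"a.xml", "b.xml"});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```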

The filename String that you give when you run the application is converted to a file: URL string by an internal method, convertToFileURL(), which uses java.io.File to resolve the absolute path. This is done by the following code in SAXLocalNameCount.

public class SAXLocalNameCount {
   private static String convertToFileURL(String filename) {
       String path = new File(filename).getAbsolutePath();
       if (File.separatorChar != '/') {
           path = path.replace(File.separatorChar, '/');
       }

       if (!path.startsWith("/")) {
           path = "/" + path;
       }
       return "file:" + path;
   }

   // ...
}
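
To see what convertToFileURL() produces, the sketch below copies the method into a small standalone class and prints the result for a relative file name. The class name FileUrlDemo and the file name sample.xml are assumptions made for illustration; the file does not need to exist. The second line of output uses File.toURI(), a standard java.io.File method, for comparison.

```java
import java.io.File;

class FileUrlDemo {
    // Same logic as convertToFileURL() above, copied so the sketch is self-contained.
    static String convertToFileURL(String filename) {
        String path = new File(filename).getAbsolutePath();
        if (File.separatorChar != '/') {
            path = path.replace(File.separatorChar, '/');
        }
        if (!path.startsWith("/")) {
            path = "/" + path;
        }
        return "file:" + path;
    }

    public static void main(String[] args) {
        // Prints "file:" followed by the absolute path of sample.xml
        // in the current working directory.
        System.out.println(convertToFileURL("sample.xml"));
        // java.io.File can build a comparable URI itself.
        System.out.println(new File("sample.xml").toURI());
    }
}
```

The exact output depends on the directory the program is run from, so no fixed result is shown here.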

If incorrect command-line arguments are specified when the program is run, the SAXLocalNameCount application's usage() method is invoked to print the correct options onscreen.

private static void usage() {
   System.err.println("Usage: SAXLocalNameCount <file.xml>");
   System.err.println("       -usage or -help = this message");
   System.exit(1);
}

Further usage() options will be examined later in this lesson, when validation is addressed.

Implementing the ContentHandler Interface:

The most important interface in SAXLocalNameCount is ContentHandler. This interface requires a number of methods that the SAX parser invokes in response to various parsing events. The major event-handling methods are: startDocument, endDocument, startElement, and endElement.

The easiest way to implement this interface is to extend the DefaultHandler class, defined in the org.xml.sax.helpers package. That class provides do-nothing methods for all the ContentHandler events. The example program extends that class.

public class SAXLocalNameCount extends DefaultHandler {
   // ...
}
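
As an illustration of how the event-handling methods fit together, here is a minimal sketch of a handler that counts elements by local name. It is an assumption about how such a handler could look, not the article's own implementation: the class name LocalNameCounter, the helper count(), and the sample XML string are all hypothetical. The parser setup it relies on is explained in the "Setting up the Parser" section.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class LocalNameCounter extends DefaultHandler {
    // Sorted map so that the printed result has a stable order.
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startDocument() throws SAXException {
        counts.clear(); // start fresh for every document
    }

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) throws SAXException {
        // Only the local name is used; the namespace URI is ignored.
        counts.merge(localName, 1, Integer::sum);
    }

    Map<String, Integer> getCounts() {
        return counts;
    }

    // Hypothetical helper: parses XML held in a String and returns the counts.
    static Map<String, Integer> count(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // required for localName to be populated
        SAXParser saxParser = spf.newSAXParser();
        LocalNameCounter handler = new LocalNameCounter();
        saxParser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getCounts();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><item/><item/><note>hi</note></order>";
        System.out.println(count(xml)); // {item=2, note=1, order=1}
    }
}
```

Only startDocument and startElement are overridden here; every other event falls through to the do-nothing methods inherited from DefaultHandler.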

Handling Special Characters:

In XML, an entity is an XML structure (or plain text) that has a name. Referencing the entity by name causes it to be inserted into the document in place of the entity reference. To create an entity reference, you surround the entity name with an ampersand and a semicolon:
&entityName;

When you are handling large blocks of XML or HTML that include many special characters, you can use a CDATA section. A CDATA section works like <pre>...</pre> in HTML, only more so: all white space in a CDATA section is significant, and characters in it are not interpreted as XML. A CDATA section starts with <![CDATA[ and ends with ]]>.

An example of a CDATA section, taken from the sample XML file install-dir/jaxp-1_4_2-release-date/samples/data/REC-xml-19980210.xml, is shown below.

<p><termdef id="dt-cdsection" term="CDATA Section"><term>CDATA sections</term> may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<code>&lt;![CDATA[</code>" and end with the string "<code>]]&gt;</code>"

Once parsed, this text would be displayed as follows:

CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>".

The existence of CDATA makes the proper echoing of XML a bit tricky. If the text to be output is not in a CDATA section, then any angle brackets, ampersands, and other special characters in the text should be replaced with the appropriate entity reference. (Replacing left angle brackets and ampersands is most important; other characters will be interpreted properly without misleading the parser.) But if the output text is in a CDATA section, then the substitutions should not occur, resulting in text like that in the earlier example. In a simple program such as our SAXLocalNameCount application, this is not particularly serious. But many XML-filtering applications will want to keep track of whether the text appears in a CDATA section, so that they can treat special characters properly.
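
The following sketch illustrates the point about echoing. It parses a short document that mixes an entity reference with a CDATA section, collects the character data the parser reports, and then re-escapes it for output outside a CDATA section. The class name SpecialCharsDemo, the helpers escape() and parsedText(), and the sample document are hypothetical and are not part of SAXLocalNameCount.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SpecialCharsDemo {
    // Hypothetical helper: replaces characters that should not appear literally
    // in echoed character data outside a CDATA section. The ampersand is
    // replaced first so that the other replacements are not escaped twice.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Collects all character data the parser reports for a document.
    static String parsedText(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<p>1 &lt; 2 <![CDATA[<b> & </b>]]></p>";
        String text = parsedText(xml);
        System.out.println(text);         // 1 < 2 <b> & </b>
        System.out.println(escape(text)); // 1 &lt; 2 &lt;b&gt; &amp; &lt;/b&gt;
    }
}
```

By the time the text reaches characters(), the entity reference and the CDATA content look the same, which is why a filter that needs to preserve CDATA sections has to track them separately.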

Setting up the Parser:

The following code sets up the parser and gets it started:

static public void main(String[] args) throws Exception {

   // Code to parse command-line arguments
   //(shown above)
   // ...

   SAXParserFactory spf = SAXParserFactory.newInstance();
   spf.setNamespaceAware(true);
   SAXParser saxParser = spf.newSAXParser();
}

These lines of code create a SAXParserFactory instance, as determined by the setting of the javax.xml.parsers.SAXParserFactory system property. The factory is configured to support XML namespaces by calling setNamespaceAware(true), and a SAXParser instance is then obtained from the factory by invoking its newSAXParser() method.
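
To show what namespace awareness changes for the handler, the sketch below parses a one-element document with a prefixed name and reports the localName and qName passed to startElement. The class name NamespaceAwareDemo, the helper rootNames(), and the namespace URI urn:example are assumptions made for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceAwareDemo {
    // Returns "localName|qName" for the root element of the given document.
    static String rootNames(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(namespaceAware);
        SAXParser saxParser = spf.newSAXParser();

        StringBuilder seen = new StringBuilder();
        saxParser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (seen.length() == 0) { // record the first element only
                    seen.append(localName).append('|').append(qName);
                }
            }
        });
        return seen.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<x:doc xmlns:x=\"urn:example\"/>";
        System.out.println(rootNames(xml, true));  // doc|x:doc
        // Without namespace processing, localName is typically reported as empty.
        System.out.println(rootNames(xml, false));
    }
}
```

Because SAXLocalNameCount counts elements by localName, the call to setNamespaceAware(true) is what makes that value available to the handler.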




Varun Kapila
