XML Tutorial

From Free Pascal wiki
Revision as of 11:23, 9 August 2011 by BigChimp (talk | contribs) (FPC trunk writes encoding attribute)

Introduction

The Extensible Markup Language (XML) is a W3C-recommended language created to interchange information between different systems. It is a text-based way to store information. Modern data interchange languages such as XHTML, as well as most web services technologies, are based on XML.

Free Pascal currently provides XML support through a set of units called "XMLRead", "XMLWrite" and "DOM", which are part of the Free Component Library (FCL) shipped with the Free Pascal Compiler. The FCL is already on the default search path of the compiler in Lazarus, so you only need to add the units to your uses clause to get XML support. The FCL is not documented currently (October 2005), so this short tutorial aims to introduce XML access using those units.

The XML DOM (Document Object Model) is a set of standardized objects that provide a similar interface for using XML on different languages and systems. The standard only specifies the methods, properties and other interface parts of the object, leaving the implementation free for different languages. The FCL currently fully supports the XML DOM 1.0.

Examples

Below is a list of XML data manipulation examples of growing complexity. The units needed to compile the example code (and any other XML code) are DOM, XMLRead, XMLWrite, XMLCfg, XMLUtils and XMLStreaming, although not every example needs all of them.

Reading a text node

For Delphi Programmers: Note that when working with TXMLDocument, the text within a Node is considered a separate TEXT Node. As a result, you must access a node's text value as a separate node. Alternatively, the TextContent property may be used to retrieve content of all text nodes beneath the given one, concatenated together.

The ReadXMLFile procedure always creates a new TXMLDocument, so you don't have to create it beforehand. However, be sure to destroy the document by calling Free when you are done.

For instance, consider the following XML:

<xml><?xml version="1.0"?>
<request>
  <request_type>PUT_FILE</request_type>
  <username>123</username>
  <password>abc</password>
</request></xml>

The following code example shows both the correct and the incorrect ways of getting the value of the text node (add the units XMLRead and DOM to the used units list):

<delphi>var
  PassNode: TDOMNode;
  Doc: TXMLDocument;
begin
  try
    // Read in xml file from disk
    ReadXMLFile(Doc, 'test.xml');
    // Retrieve the "password" node
    PassNode := Doc.DocumentElement.FindNode('password');
    // Write out value of the selected node
    WriteLn(PassNode.NodeValue); // will be blank
    // The text of the node is actually a separate child node
    WriteLn(PassNode.FirstChild.NodeValue); // correctly prints "abc"
    // alternatively
    WriteLn(PassNode.TextContent);
  finally
    // finally, free the document
    Doc.Free;
  end;
end;</delphi>

Note that ReadXMLFile(...) ignores all leading whitespace characters when parsing a document. The section "Whitespace characters" below describes how to keep them.

Printing the names of nodes and attributes

A quick note on navigating the DOM tree: when you need to access nodes in sequence, it is best to use the FirstChild and NextSibling properties (to iterate forward), or LastChild and PreviousSibling (to iterate backward). For random access it is possible to use ChildNodes or GetElementsByTagName, but these create a TDOMNodeList object which must eventually be freed. This differs from other DOM implementations like MSXML, because the FCL implementation is object-based, not interface-based.
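
As a minimal sketch of both access styles, the program below walks the children of the root element backward with LastChild and PreviousSibling, and then fetches the same elements through GetElementsByTagName, freeing the returned list as described above. The sample XML, the program name and the variable names are assumptions made for illustration only.

<delphi>program WalkNodes;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, XMLRead;

const
  // Hypothetical sample data, written without whitespace between elements
  // so that every child of the root is an element node
  SampleXML =
    '<?xml version="1.0"?>' +
    '<images directory="mydir">' +
    '<imageNode URL="a.jpg"/>' +
    '<imageNode URL="b.jpg"/>' +
    '<imageNode URL="c.jpg"/>' +
    '</images>';

var
  Source: TStringStream;
  Doc: TXMLDocument;
  Child: TDOMNode;
  List: TDOMNodeList;
  i: Integer;
begin
  Source := TStringStream.Create(SampleXML);
  try
    ReadXMLFile(Doc, Source);
    try
      // Sequential access, from the last child to the first
      Child := Doc.DocumentElement.LastChild;
      while Assigned(Child) do
      begin
        if Child is TDOMElement then
          WriteLn(Child.NodeName, ' ', TDOMElement(Child).GetAttribute('URL'));
        Child := Child.PreviousSibling;
      end;

      // Random access through a node list, which is freed after use
      List := Doc.DocumentElement.GetElementsByTagName('imageNode');
      try
        for i := 0 to List.Count - 1 do
          WriteLn(i, ': ', TDOMElement(List.Item[i]).GetAttribute('URL'));
      finally
        List.Free;
      end;
    finally
      Doc.Free;
    end;
  finally
    Source.Free;
  end;
end.</delphi>

The sibling-based loop allocates nothing, which is why it is the preferred choice when the nodes are simply visited in order.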

The following example shows how to print the names of nodes to a TMemo placed on a form.

Below is the XML file called 'test.xml':

<xml><?xml version="1.0"?>
<images directory="mydir">
  <imageNode URL="graphic.jpg" title="">
    <Peca DestinoX="0" DestinoY="0">Pecacastelo.jpg1.swf</Peca>
    <Peca DestinoX="0" DestinoY="86">Pecacastelo.jpg2.swf</Peca>
  </imageNode>
</images></xml>

And here is the Pascal code that performs the task:

<delphi>var
  Doc: TXMLDocument;
  Child: TDOMNode;
  j: Integer;
begin
  try
    ReadXMLFile(Doc, 'test.xml');
    Memo.Lines.Clear;
    // using FirstChild and NextSibling properties
    Child := Doc.DocumentElement.FirstChild;
    while Assigned(Child) do
    begin
      Memo.Lines.Add(Child.NodeName + ' ' + Child.Attributes.Item[0].NodeValue);
      // using ChildNodes method
      with Child.ChildNodes do
      try
        for j := 0 to (Count - 1) do
          Memo.Lines.Add(format('%s %s (%s=%s; %s=%s)',
                                [
                                  Item[j].NodeName,
                                  Item[j].FirstChild.NodeValue,
                                  Item[j].Attributes.Item[0].NodeName,  // 1st attribute details
                                  Item[j].Attributes.Item[0].NodeValue,
                                  Item[j].Attributes.Item[1].NodeName,  // 2nd attribute details
                                  Item[j].Attributes.Item[1].NodeValue
                                ]));
      finally
        Free;
      end;
      Child := Child.NextSibling;
    end;
  finally
    Doc.Free;
  end;
end;</delphi>

This will print:

imageNode graphic.jpg
Peca Pecacastelo.jpg1.swf (DestinoX=0; DestinoY=0)
Peca Pecacastelo.jpg2.swf (DestinoX=0; DestinoY=86)

Populating a TreeView with XML

One common use of XML files is to parse them and show their contents in a tree-like format. You can find the TTreeView component on the "Common Controls" tab in Lazarus.

The procedure below takes an XML document previously loaded from a file or generated in code, and populates a TreeView with its contents. The caption of each tree node is the value of the first attribute of the corresponding XML node.

<delphi>procedure TForm1.XML2Tree(tree: TTreeView; XMLDoc: TXMLDocument);
var
  iNode: TDOMNode;

  procedure ProcessNode(Node: TDOMNode; TreeNode: TTreeNode);
  var
    cNode: TDOMNode;
    s: string;
  begin
    if Node = nil then Exit; // Stops if reached a leaf

    // Adds a node to the tree
    if Node.HasAttributes and (Node.Attributes.Length > 0) then
      s := Node.Attributes[0].NodeValue
    else
      s := '';  // no attributes: use an empty caption
    TreeNode := tree.Items.AddChild(TreeNode, s);
    // Goes to the child node
    cNode := Node.FirstChild;
    // Processes all child nodes
    while cNode <> nil do
    begin
      ProcessNode(cNode, TreeNode);
      cNode := cNode.NextSibling;
    end;
  end;

begin
  iNode := XMLDoc.DocumentElement.FirstChild;
  while iNode <> nil do
  begin
    ProcessNode(iNode, nil); // Recursive
    iNode := iNode.NextSibling;
  end;
end;</delphi>

Modifying an XML document

The first thing to remember is that TDOMDocument is the "handle" to the DOM. You can get an instance of this class by creating one or by loading an XML document.

Nodes, on the other hand, cannot be created like a normal object. You *must* use the methods provided by TDOMDocument to create them, and later use other methods to put them in the correct place in the tree. This is because, in the DOM, nodes must be "owned" by a specific document.

Below are some common methods from TDOMDocument:

<delphi>function CreateElement(const tagName: DOMString): TDOMElement; virtual;
function CreateTextNode(const data: DOMString): TDOMText;
function CreateCDATASection(const data: DOMString): TDOMCDATASection; virtual;
function CreateAttribute(const name: DOMString): TDOMAttr; virtual;</delphi>

CreateElement creates a new element node.

CreateTextNode creates a text node, which holds the text value of an element.

CreateAttribute creates an attribute node that can be attached to an element.

CreateCDATASection creates a CDATA section: regular XML markup characters such as <> are not interpreted within the CDATA section. See the Wikipedia article on CDATA.
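
The four methods can be combined as in the following minimal sketch, which builds a small document and writes it to disk. The element and attribute names and the file name 'notes.xml' are made up for illustration, and SetAttributeNode (a TDOMElement method from the DOM unit) is used here to attach the attribute; it is not part of the method list above.

<delphi>program BuildNodes;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, XMLWrite;

var
  Doc: TXMLDocument;
  Root, Note: TDOMElement;
  Attr: TDOMAttr;
begin
  Doc := TXMLDocument.Create;
  try
    // CreateElement: every node is created by the document that owns it...
    Root := Doc.CreateElement('notes');
    // ...and only becomes part of the tree once it is appended
    Doc.AppendChild(Root);

    Note := Doc.CreateElement('note');
    Root.AppendChild(Note);

    // CreateAttribute: build the attribute node, then attach it to the element
    Attr := Doc.CreateAttribute('lang');
    Attr.NodeValue := 'en';
    Note.SetAttributeNode(Attr);

    // CreateTextNode: ordinary character data
    Note.AppendChild(Doc.CreateTextNode('plain text, '));

    // CreateCDATASection: markup characters are kept literally
    Note.AppendChild(Doc.CreateCDATASection('markup such as <b> stays literal'));

    // 'notes.xml' is a hypothetical output file name
    WriteXMLFile(Doc, 'notes.xml');
  finally
    Doc.Free;  // freeing the document also frees the nodes it owns
  end;
end.</delphi>

For simple cases, calling SetAttribute on the element, as the later examples on this page do, is shorter than creating the attribute node by hand.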

Here is an example method that locates the selected item in a TTreeView and then inserts a child node into the XML document it represents. The TreeView must previously have been filled with the contents of an XML file using the XML2Tree procedure.

<delphi>procedure TForm1.actAddChildNode(Sender: TObject);
var
  position: Integer;
  NovoNo: TDomNode;
begin
  {*******************************************************************
  *  Detects the selected element
  *******************************************************************}
  if TreeView1.Selected = nil then Exit;
  if TreeView1.Selected.Level = 0 then
  begin
    position := TreeView1.Selected.Index;
    NovoNo := XMLDoc.CreateElement('item');
    TDOMElement(NovoNo).SetAttribute('nome', 'Item');
    TDOMElement(NovoNo).SetAttribute('arquivo', 'Arquivo');
    with XMLDoc.DocumentElement.ChildNodes do
    begin
      Item[position].AppendChild(NovoNo);
      Free;
    end;
    {*******************************************************************
    *  Updates the TreeView
    *******************************************************************}
    TreeView1.Items.Clear;
    XML2Tree(TreeView1, XMLDoc);
  end
  else if TreeView1.Selected.Level >= 1 then
  begin
    {*******************************************************************
    *  This procedure only works on the first level of the tree,
    *  but can easily be modified to work for any number of levels
    *******************************************************************}
  end;
end;</delphi>

Create a TXMLDocument from a string

Given an XML document in a string variable MyXMLString, the following code will create its DOM:

<delphi>var
  S: TStringStream;
  XML: TXMLDocument;
begin
  S := TStringStream.Create(MyXMLString);
  try
    // Read complete XML document
    ReadXMLFile(XML, S);
    // Alternatively (instead of the call above): read only an XML fragment
    // into an existing node
    // ReadXMLFragment(AParentNode, S);
  finally
    S.Free;
  end;
end;</delphi>
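
For completeness, here is a self-contained sketch of the same idea as a whole program. It parses a string, prints the name of the root element, and serializes the document back into a second string stream with WriteXMLFile. The sample string and the program and variable names are assumptions for illustration.

<delphi>program StringRoundTrip;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, XMLRead, XMLWrite;

const
  // Hypothetical input; in real code this would come from elsewhere
  MyXMLString =
    '<?xml version="1.0"?>' +
    '<request><username>123</username></request>';

var
  InStream, OutStream: TStringStream;
  Doc: TXMLDocument;
begin
  InStream := TStringStream.Create(MyXMLString);
  OutStream := TStringStream.Create('');
  try
    // Parse the string; ReadXMLFile creates the document for us
    ReadXMLFile(Doc, InStream);
    try
      WriteLn('Root element: ', Doc.DocumentElement.NodeName);
      // Serialize the DOM back into a string
      WriteXMLFile(Doc, OutStream);
      WriteLn(OutStream.DataString);
    finally
      Doc.Free;  // the caller owns the document created by ReadXMLFile
    end;
  finally
    OutStream.Free;
    InStream.Free;
  end;
end.</delphi>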

Validating a document

DTD validation was added to the FCL XML parser in March 2007. Validation checks that the logical structure of the document conforms to a set of predefined rules, called a Document Type Definition (DTD).

Here is an example of an XML document with a DTD:

<xml><?xml version='1.0'?>
<!DOCTYPE root [
<!ELEMENT root (child)+ >
<!ELEMENT child (#PCDATA)>
]>
<root>
  <child>This is a first child.</child>
  <child>And this is the second one.</child>
</root></xml>

This DTD specifies that the 'root' element must have one or more 'child' elements, and that 'child' elements may contain only character data. If the parser detects any violations of these rules, it will report them.

Loading such a document is slightly more complicated. Let's assume we have XML data in a TStream object:

<delphi>procedure TMyObject.DOMFromStream(AStream: TStream);
var
  Parser: TDOMParser;
  Src: TXMLInputSource;
  TheDoc: TXMLDocument;
begin
  // create a parser object (outside the try block, so that the
  // finally section never frees an uninitialized variable)
  Parser := TDOMParser.Create;
  try
    // and the input source
    Src := TXMLInputSource.Create(AStream);
    try
      // we want validation
      Parser.Options.Validate := True;
      // assign an error handler which will receive notifications
      Parser.OnError := @ErrorHandler;
      // now do the job
      Parser.Parse(Src, TheDoc);
      // ... use TheDoc here, and free it when you are done with it
    finally
      // ...and clean up
      Src.Free;
    end;
  finally
    Parser.Free;
  end;
end;

procedure TMyObject.ErrorHandler(E: EXMLReadError);
begin
  if E.Severity = esError then  // we are interested in validation errors only
    writeln(E.Message);
end;</delphi>

Whitespace characters

If you want to preserve leading whitespace characters in node texts, the method above (using TDOMParser) is the way to load your XML document. Leading whitespace characters are ignored by default, which is why the ReadXMLFile(...) procedure never returns any leading whitespace characters in node texts. Before calling Parser.Parse(Src, TheDoc), insert the line

<delphi>Parser.Options.PreserveWhitespace := True;</delphi>

This will force the parser to return all whitespace characters. This includes all the newline characters that exist in an XML document to make it more readable!
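
The following sketch compares both settings by parsing the same string twice and counting the child nodes of the root element. The sample string and the helper names ParseText, CountChildren and Report are hypothetical. With PreserveWhitespace enabled, the whitespace between the elements should show up as additional text nodes, so the second count is expected to be larger than the first.

<delphi>program KeepWhitespace;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, XMLRead;

const
  // Hypothetical sample with whitespace between the elements
  SampleXML = '<list>  <item>one</item>  <item>two</item>  </list>';

// Parses AXML with TDOMParser; the caller must free the returned document
function ParseText(const AXML: string; AKeep: Boolean): TXMLDocument;
var
  Stream: TStringStream;
  Parser: TDOMParser;
  Src: TXMLInputSource;
begin
  Result := nil;
  Stream := TStringStream.Create(AXML);
  try
    Parser := TDOMParser.Create;
    try
      Src := TXMLInputSource.Create(Stream);
      try
        Parser.Options.PreserveWhitespace := AKeep;
        Parser.Parse(Src, Result);
      finally
        Src.Free;
      end;
    finally
      Parser.Free;
    end;
  finally
    Stream.Free;
  end;
end;

// Counts direct children by walking the sibling chain (no node list needed)
function CountChildren(ANode: TDOMNode): Integer;
var
  Child: TDOMNode;
begin
  Result := 0;
  Child := ANode.FirstChild;
  while Assigned(Child) do
  begin
    Inc(Result);
    Child := Child.NextSibling;
  end;
end;

procedure Report(AKeep: Boolean);
var
  Doc: TXMLDocument;
begin
  Doc := ParseText(SampleXML, AKeep);
  try
    WriteLn('PreserveWhitespace=', AKeep, ': ',
            CountChildren(Doc.DocumentElement), ' child nodes');
  finally
    Doc.Free;
  end;
end;

begin
  Report(False);
  Report(True);
end.</delphi>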

Generating an XML file

Below is the complete code to write an XML file. (This was taken from a tutorial in the DeveLazarus blog.) Remember to include the DOM and XMLWrite units in your uses clause.

<delphi>unit Unit1;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, LResources, Forms, Controls, Graphics, Dialogs, StdCtrls,
  DOM, XMLWrite;

type

  { TForm1 }

  TForm1 = class(TForm)
    Button1: TButton;
    Label1: TLabel;
    Label2: TLabel;
    procedure Button1Click(Sender: TObject);
  private
    { private declarations }
  public
    { public declarations }
  end;

var
  Form1: TForm1;

implementation

{ TForm1 }

procedure TForm1.Button1Click(Sender: TObject);
var
  Doc: TXMLDocument;                        // the document
  RootNode, parentNode, nofilho: TDOMNode;  // the nodes
begin
  // Create a document (before the try block, so that Doc is always
  // valid when the finally section runs)
  Doc := TXMLDocument.Create;
  try
    // Create a root node
    RootNode := Doc.CreateElement('register');
    Doc.AppendChild(RootNode);                           // save root node

    // Create a parent node
    RootNode := Doc.DocumentElement;
    parentNode := Doc.CreateElement('usuario');
    TDOMElement(parentNode).SetAttribute('id', '001');   // create attributes of the parent node
    RootNode.AppendChild(parentNode);                    // save parent node

    // Create a child node
    parentNode := Doc.CreateElement('nome');             // create a child node
    // TDOMElement(parentNode).SetAttribute('sexo', 'M');  // create attributes
    nofilho := Doc.CreateTextNode('Fernando');           // create the text value of the node
    parentNode.AppendChild(nofilho);                     // save node
    // insert the child node into its parent ('usuario'); FirstChild is used
    // instead of ChildNodes.Item[0] so that no node list has to be freed
    RootNode.FirstChild.AppendChild(parentNode);

    // Create a child node
    parentNode := Doc.CreateElement('idade');            // create a child node
    // TDOMElement(parentNode).SetAttribute('ano', '1976'); // create attributes
    nofilho := Doc.CreateTextNode('32');                 // create the text value of the node
    parentNode.AppendChild(nofilho);                     // save node
    RootNode.FirstChild.AppendChild(parentNode);         // insert the child node into its parent

    WriteXMLFile(Doc, 'test.xml');                       // write to XML
  finally
    Doc.Free;                                            // free memory
  end;
end;

initialization
  {$I unit1.lrs}

end.</delphi>

The result will be the XML file below:

<xml><?xml version="1.0"?>
<register>
  <usuario id="001">
    <nome>Fernando</nome>
    <idade>32</idade>
  </usuario>
</register></xml>

--Fernandosinesio 22:28, 24 April 2008 (CEST)fernandosinesio@gmail.com

Encoding

Starting from SVN revision 12582, the XML reader is able to process data in any encoding by using external decoders. See XML_Decoders for more details.

According to the XML standard, the encoding attribute in the first line of the XML is optional if the actual encoding is UTF-8 (presumably without a BOM, or byte order mark) or UTF-16 (with a UTF-16 BOM).

As of version 2.4 of Free Pascal, TXMLDocument has an encoding property, but it is ignored. WriteXMLFile always uses UTF-8 and doesn't generate an encoding attribute in the first line of the XML file.

FPC versions from current trunk do explicitly write a UTF-8 encoding attribute, as this is needed by some programs that cannot handle the XML without it.

See also

External Links