PHP XML Expat Parser
The built-in Expat parser makes it
possible to process XML documents in PHP.
What is XML?
XML is used to describe data and focuses on what the data is. An XML file describes the structure of the data.
In XML, no tags are predefined. You
must define your own tags.
What is Expat?
To read, update, create, or manipulate an XML document, you need an XML parser.
There are two basic types of XML
parsers:
- Tree-based parser: This parser transforms an XML document into a tree structure. It analyzes the whole document, and provides access to the tree elements. e.g. the Document Object Model (DOM)
- Event-based parser: Views an XML document as a series of events. When a specific event occurs, it calls a function to handle it
The Expat parser is an event-based
parser.
Event-based parsers focus on the content of an XML document, not its structure. Because they do not build the whole document tree in memory, event-based parsers can typically access data faster than tree-based parsers.
Look at the following XML fragment:
<from>Jani</from>
An event-based parser reports the
XML above as a series of three events:
- Start element: from
- Character data, value: Jani
- Close element: from
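The three events can be observed directly. The following is a minimal sketch, not part of the original example: it parses the fragment from a string and records each event in a hypothetical $events array. Case folding is switched off here (an assumption made for readability) so that element names are reported as written.

```php
<?php
// Record every event the parser reports for a one-element fragment.
$events = [];

$parser = xml_parser_create();
// Report element names as written instead of the default upper-casing.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = "start element: $name";
    },
    function ($parser, $name) use (&$events) {
        $events[] = "end element: $name";
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$events) {
        $events[] = "character data: $data";
    }
);

// The third argument marks this chunk as the final piece of the document.
xml_parse($parser, "<from>Jani</from>", true);

print_r($events);
```

Each handler only appends to the array, so the order of the entries is the order in which the parser fired the events.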
The XML example above contains
well-formed XML. However, the example is not valid XML, because there is no
Document Type Definition (DTD) associated with it.
However, this makes no difference
when using the Expat parser. Expat is a non-validating parser, and ignores any
DTDs.
As an event-based, non-validating
XML parser, Expat is fast and small, and a perfect match for PHP web
applications.
Note: XML documents must be well-formed or Expat will generate an
error.
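As a small illustration of that note, the sketch below feeds the parser a deliberately broken string (a hypothetical input with a missing closing tag) and reports the error. The exact error text depends on the underlying XML library, so none is shown here.

```php
<?php
// The closing </to> tag is missing, so this string is not well-formed.
$broken = "<note><to>Tove</note>";

$parser = xml_parser_create();
$ok = xml_parse($parser, $broken, true);

if ($ok !== 1) {
    // xml_parse() returned 0: look up the error code and describe it.
    $code = xml_get_error_code($parser);
    printf(
        "XML Error: %s at line %d\n",
        xml_error_string($code),
        xml_get_current_line_number($parser)
    );
}
```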
Installation
The XML Expat parser functions are part of PHP's bundled XML extension, which is enabled by default. On a standard PHP build, no installation is needed to use these functions.
An XML File
The XML file below will be used in
our example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Initializing the XML Parser
We want to initialize the XML parser
in PHP, define some handlers for different XML events, and then parse the XML
file.
Example
<?php
// Initialize the XML parser
$parser = xml_parser_create();

// Function to use at the start of an element.
// Element names arrive in upper case because case folding is on by default.
function start($parser, $element_name, $element_attrs)
{
    switch ($element_name) {
        case "NOTE":
            echo "-- Note --<br />";
            break;
        case "TO":
            echo "To: ";
            break;
        case "FROM":
            echo "From: ";
            break;
        case "HEADING":
            echo "Heading: ";
            break;
        case "BODY":
            echo "Message: ";
            break;
    }
}

// Function to use at the end of an element
function stop($parser, $element_name)
{
    echo "<br />";
}

// Function to use when finding character data
function char($parser, $data)
{
    echo $data;
}

// Specify element handlers
xml_set_element_handler($parser, "start", "stop");

// Specify character data handler
xml_set_character_data_handler($parser, "char");

// Open the XML file and stop if it cannot be read
$fp = fopen("test.xml", "r") or die("Could not open test.xml");

// Read and parse the file in 4096-byte chunks
while ($data = fread($fp, 4096)) {
    xml_parse($parser, $data, feof($fp)) or
        die(sprintf("XML Error: %s at line %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)));
}

// Close the file
fclose($fp);

// Free the XML parser (this call has no effect since PHP 8.0)
xml_parser_free($parser);
?>
The output of the code above will
be:
-- Note --
To: Tove
From: Jani
Heading: Reminder
Message: Don't forget me this weekend!
How it works:
- Initialize the XML parser with the xml_parser_create() function
- Create functions to use with the different event handlers
- Add the xml_set_element_handler() function to specify which function will be executed when the parser encounters the opening and closing tags
- Add the xml_set_character_data_handler() function to specify which function will execute when the parser encounters character data
- Parse the file "test.xml" with the xml_parse() function
- In case of an error, use the xml_get_error_code() and xml_error_string() functions to convert the XML error to a textual description
- Call the xml_parser_free() function to release the memory allocated with the xml_parser_create() function
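The same steps can also collect data instead of printing it. The sketch below is an illustrative variant, not the tutorial's own code: it parses the note from a string rather than from "test.xml", turns off case folding so that names stay lower case, and stores the text of each child element in a hypothetical $note array.

```php
<?php
// The note document as a string (XML declaration omitted for brevity).
$xml = <<<XML
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
XML;

$note = [];       // collected element name => text
$current = null;  // name of the child element currently open, if any

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$note, &$current) {
        // Skip the root element; track every child of <note>.
        if ($name !== "note") {
            $current = $name;
            $note[$name] = "";
        }
    },
    function ($parser, $name) use (&$current) {
        $current = null;
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$note, &$current) {
        // Text may arrive in several pieces, so append rather than assign.
        if ($current !== null) {
            $note[$current] .= $data;
        }
    }
);

xml_parse($parser, $xml, true) or die("XML Error");
print_r($note);
```

Whitespace between the elements also reaches the character data handler. It is ignored here because no child element is open at that point.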
PHP XML DOM
The built-in DOM parser makes it
possible to process XML documents in PHP.
What is DOM?
The W3C DOM provides a standard set
of objects for HTML and XML documents, and a standard interface for accessing
and manipulating them.
The W3C DOM is separated into different parts (Core, XML, and HTML) and different levels (DOM Level 1/2/3):
* Core DOM - defines a standard set of objects for any structured document
* XML DOM - defines a standard set of objects for XML documents
* HTML DOM - defines a standard set of objects for HTML documents
XML Parsing
To read, update, create, or manipulate an XML document, you need an XML parser.
There are two basic types of XML
parsers:
- Tree-based parser: This parser transforms an XML document into a tree structure. It analyzes the whole document, and provides access to the tree elements
- Event-based parser: Views an XML document as a series of events. When a specific event occurs, it calls a function to handle it
The DOM parser is a tree-based parser.
Look at the following XML document fragment:
<?xml version="1.0" encoding="ISO-8859-1"?>
<from>Jani</from>
The XML DOM sees the XML above as a
tree structure:
- Level 1: XML Document
- Level 2: Root element: <from>
- Level 3: Text node: "Jani"
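The three levels can be inspected with a few property reads. This is a minimal sketch that loads the fragment from a string with loadXML() (the XML declaration is left out for brevity) and walks from the document node down to the text.

```php
<?php
$xmlDoc = new DOMDocument();
$xmlDoc->loadXML("<from>Jani</from>");

$root = $xmlDoc->documentElement;  // Level 2: the root element
$text = $root->firstChild;         // Level 3: the text inside the root

echo $xmlDoc->nodeName, "\n";      // Level 1: the document node
echo $root->nodeName, "\n";
echo $text->nodeName, " = ", $text->nodeValue, "\n";
```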
Installation
The DOM XML parser functions are
part of the PHP core. There is no installation needed to use these functions.
An XML File
The XML file below will be used in
our example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Load and Output XML
We want to initialize the XML parser, load the XML, and output it:
Example
<?php
$xmlDoc = new DOMDocument();
$xmlDoc->load("note.xml");
print $xmlDoc->saveXML();
?>
The output of the code above will
be:
Tove Jani Reminder Don't forget me
this weekend!
If you select "View source" in the browser window, you will see the following XML:
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
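The example above creates a DOMDocument object and loads the XML from "note.xml" into it.
Then the saveXML() method puts the internal XML document into a string, so we can output it. The browser shows only the text because it treats the unknown XML tags as markup and does not display them.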
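If the tags themselves should be visible in the browser, the string returned by saveXML() can be escaped first. The following is a small sketch under that assumption. It uses a shortened, hypothetical note loaded from a string, and the variable name $xmlString is illustrative.

```php
<?php
$xmlDoc = new DOMDocument();
$xmlDoc->loadXML("<note><to>Tove</to><from>Jani</from></note>");

// saveXML() returns the whole document as a string.
$xmlString = $xmlDoc->saveXML();

// Escape <, > and & so the browser displays the markup as text.
echo "<pre>" . htmlspecialchars($xmlString) . "</pre>";
```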
Looping through XML
We want to initialize the XML parser,
load the XML, and loop through all elements of the <note> element:
Example
<?php
$xmlDoc = new DOMDocument();
$xmlDoc->load("note.xml");
$x = $xmlDoc->documentElement;
foreach ($x->childNodes as $item) {
    print $item->nodeName . " = " . $item->nodeValue . "<br />";
}
?>
The output of the code above will
be:
#text =
to = Tove
#text =
from = Jani
#text =
heading = Reminder
#text =
body = Don't forget me this weekend!
#text =
In the example above you see that there are empty text nodes between the elements.
When XML is written, it often contains whitespace (such as line breaks) between the nodes. The XML DOM parser treats this whitespace as ordinary text nodes, and if you are not aware of them, they can cause problems.
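One way to avoid the whitespace-only text nodes is to check each node's type while looping. The sketch below assumes a shortened note loaded from a string and keeps only element nodes. The $names array is a hypothetical helper for collecting the results.

```php
<?php
// Line breaks between the elements become whitespace text nodes.
$xml = "<note>\n<to>Tove</to>\n<from>Jani</from>\n</note>";

$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xml);

$names = [];
foreach ($xmlDoc->documentElement->childNodes as $item) {
    // Skip everything that is not an element, including "#text" nodes.
    if ($item->nodeType === XML_ELEMENT_NODE) {
        $names[] = $item->nodeName . " = " . $item->nodeValue;
    }
}
print_r($names);
```

Another option is to set the preserveWhiteSpace property of the DOMDocument to false before loading, which asks the parser to drop such blank nodes.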
PHP SimpleXML
SimpleXML handles the most common XML tasks and leaves the rest for other extensions.
What is SimpleXML?
SimpleXML was introduced in PHP 5. It is an easy way of getting an element's attributes and text, if you know the XML document's layout.
Compared to DOM or the Expat parser,
SimpleXML just takes a few lines of code to read text data from an element.
SimpleXML converts the XML document
into an object, like this:
- Elements - Are converted to properties of the SimpleXMLElement object. When there is more than one element with the same name on one level, they can be accessed like an array
- Attributes - Are accessed using array-style syntax, where the index is the attribute name
- Element Data - Text data from elements is converted to strings. If an element has more than one text node, they are arranged in the order they are found
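The three access styles can be sketched on a small sample. The XML below is hypothetical: the "date" attribute and the second <to> element are not part of the tutorial's note file and are added only to show attribute and repeated-element access.

```php
<?php
// Hypothetical sample: "date" and the second <to> are for illustration only.
$xml = simplexml_load_string(
    '<note date="2024-01-05"><to>Tove</to><to>Ola</to><from>Jani</from></note>'
);

$from   = (string) $xml->from;    // element read as a property
$date   = (string) $xml['date'];  // attribute read with array-style syntax
$second = (string) $xml->to[1];   // repeated elements are indexed from 0
$count  = count($xml->to);        // number of <to> elements

echo "$from, $date, $second, $count\n";
```

The (string) casts matter because each access returns a SimpleXMLElement object rather than a plain string.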
SimpleXML is fast and easy to use
when performing basic tasks like:
- Reading XML files
- Extracting data from XML strings
- Editing text nodes or attributes
However, when dealing with advanced
XML, like namespaces, you are better off using the Expat parser or the XML DOM.
Installation
As of PHP 5.0, the SimpleXML
functions are part of the PHP core. There is no installation needed to use
these functions.
Using SimpleXML
Below is an XML file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
We want to output the element names
and data from the XML file above.
Here's what to do:
- Load the XML file
- Get the name of the first element
- Create a loop that will trigger on each child node, using the children() function
- Output the element name and data for each child node
Example
<?php
$xml = simplexml_load_file("test.xml");
echo $xml->getName() . "<br />";
foreach($xml->children() as $child)
{
echo $child->getName() . ": " . $child . "<br />";
}
?>
The output of the code above will
be:
note
to: Tove
from: Jani
heading: Reminder
body: Don't forget me this weekend!