Code: Parsing an RSS document
Now that you have an understanding of RSS document structure, let's take a look at some code that will parse and display it.
The following code is a collection of five functions that were added to the Code Gallery by uncleozzy. We will take a look at them one by one and examine how they work.
Globals
The functions we are about to look at require some variables in the global scope. The first thing that must be done is to declare and initialize those variables.
$_item = array();
$_depth = array();
$_tags = array("dummy");
/* "dummy" prevents unnecessary subtraction
 * in the $_depth indexes */
Function initArray()
This function initializes the $_item array by ensuring that all the proper keys are in place and that they all point to an empty string. The author makes very liberal use of this function.
This function will be called each time an opening tag is found for an image, item, or channel. It is also used after the closing tag of each and at the onset of the entire parsing routine. (This may be tending towards overkill.)
function initArray()
{
    global $_item;
    /* reset every key the display code reads to an empty string */
    $_item = array ("TITLE"=>"", "LINK"=>"",
                    "DESCRIPTION"=>"", "URL"=>"");
}
Function startElement()
When using the XML functionality of PHP, you must specify a function to be called each time an opening tag is encountered. This function serves that purpose.
As you can see by examining this function, if an opening tag is found for an item, channel, or image element, initArray is called. (The names are compared in uppercase because PHP's XML parser folds tag names to uppercase by default.) Then, whether one of those opening tags was found or not, the parser's entry in the $_depth array is incremented and the name of the opening tag is pushed onto the $_tags array.
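function startElement($parser, $name, $attrs)
{
    global $_depth, $_tags, $_item;
    if (($name=="ITEM") || ($name=="CHANNEL")
        || ($name=="IMAGE")) {
        initArray();
    }
    /* $parser is used as the array key, which relies on it being
     * a resource, as it was in the PHP versions this code targets */
    $_depth[$parser]++;
    array_push($_tags, $name);
}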
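To see the tag stack grow as the parser descends into a document, here is a minimal sketch. It is not part of the gallery code: the variable names $trace and $stack and the sample XML are made up for illustration, and it uses closures instead of global variables so that it runs on its own.

```php
<?php
// Hypothetical demo: record the tag stack each time an opening tag is seen.
$trace = array();   // one line per opening tag, showing the stack at that moment
$stack = array();   // plays the role of $_tags, without the "dummy" entry

$start = function ($parser, $name, $attrs) use (&$trace, &$stack) {
    $stack[] = $name;                      // push the tag name
    $trace[] = implode(" > ", $stack);     // snapshot of the current nesting
};
$end = function ($parser, $name) use (&$stack) {
    array_pop($stack);                     // leave the element
};

// Made-up sample document with lowercase tag names.
$xml = "<rss><channel><item><title>Hi</title></item></channel></rss>";

$p = xml_parser_create();
xml_set_element_handler($p, $start, $end);
xml_parse($p, $xml, true);

// Case folding is on by default, so the names are reported in uppercase.
echo implode("\n", $trace), "\n";
```

Each line of output shows the nesting at the moment an opening tag was reported, which is the same information startElement keeps in $_tags.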
Function endElement()
Just as you must specify a function to be called for opening tags, you must also specify a function that is called when a closing tag is found. The endElement function handles the actual display of our data. As each closing tag is encountered, this function displays the data that corresponds to that tag. First though, it pops the top element off of the $_tags array, then it decrements the $_depth array. As mentioned earlier, it also calls the initArray function after displaying the pertinent data.
function endElement($parser, $name)
{
    global $_depth, $_tags, $_item;
    array_pop($_tags);
    $_depth[$parser]--;
    switch ($name) {
        case "ITEM":
            /* one linked headline per item */
            echo "<p><a href='{$_item['LINK']}'>" .
                 "{$_item['TITLE']}</a></p>\n";
            initArray();
            break;
        case "IMAGE":
            /* the channel image, linked to the channel */
            echo "<a href='{$_item['LINK']}'>" .
                 "<img src='{$_item['URL']}' " .
                 "alt='{$_item['TITLE']}' border='0'></a>\n<br />\n";
            initArray();
            break;
        case "CHANNEL":
            echo "<h3>{$_item['TITLE']}</h3>\n";
            initArray();
            break;
    }
}
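The gallery code echoes feed text straight into the page. If you do not fully trust the feed, you may want to escape that text first. The following is only a sketch of one way to do that: renderItemLink() is a hypothetical helper, not one of the five functions, and the sample data is invented.

```php
<?php
// Hypothetical helper (not in the original code): build the item link
// with the feed text escaped for use inside HTML.
function renderItemLink(array $item)
{
    // ENT_QUOTES also escapes single quotes, which matters because the
    // attribute values below are delimited by single quotes.
    $link  = htmlspecialchars($item['LINK'], ENT_QUOTES, 'UTF-8');
    $title = htmlspecialchars($item['TITLE'], ENT_QUOTES, 'UTF-8');
    return "<p><a href='{$link}'>{$title}</a></p>\n";
}

// Invented sample using the same keys that initArray() sets up.
$sample = array(
    "TITLE"       => "Q&A: <b>PHP</b> tips",
    "LINK"        => "http://example.com/?a=1&b=2",
    "DESCRIPTION" => "",
    "URL"         => "",
);

$html = renderItemLink($sample);
echo $html;
```

Returning a string instead of echoing it is a design choice made here so the result can be inspected; inside endElement you would simply echo it.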
Function parseData()
This function is where the data is actually stored in the $_item array. The parser invokes it whenever it encounters character data, which is basically anything that is not an element tag.
The first thing this function does is determine whether the data is only whitespace. If it is, it does not bother to store it. Otherwise, leading whitespace is stripped and the text is appended to the $_item entry named after the tag currently being processed.
function parseData($parser, $text)
{
    global $_depth, $_tags, $_item;
    $crap = preg_replace ("/\s/", "", $text);
    /* is the data just whitespace?
       if so, we don't want it! */
    if ($crap !== "") {
        $text = preg_replace ("/^\s+/", "", $text);
        /* get rid of leading whitespace */
        $tag = $_tags[$_depth[$parser]];
        /* $tag is the name of the element we are currently inside */
        if (!empty($_item[$tag])) {
            $_item[$tag] .= $text;
        } else {
            $_item[$tag] = $text;
        }
    }
}
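Why append rather than simply assign? The parser is allowed to deliver the text of one element in several pieces, so the handler may run more than once between an opening and a closing tag. The sketch below illustrates this under stated assumptions: the sample XML and the names $chunks and $text are invented, and how the text is split depends on the underlying XML library.

```php
<?php
// Hypothetical demo: collect every piece of character data the parser reports.
$chunks = array();  // each individual call to the handler
$text   = "";       // the pieces joined back together

$p = xml_parser_create();
xml_set_character_data_handler(
    $p,
    function ($parser, $data) use (&$chunks, &$text) {
        $chunks[] = $data;
        $text    .= $data;   // appending rebuilds the full text
    }
);

// Made-up element whose text contains entity references.
xml_parse($p, "<title>AT&amp;T news &amp; views</title>", true);

echo count($chunks), " call(s): ", $text, "\n";
```

Note that if the text does arrive in pieces, stripping leading whitespace from every piece, as parseData does, could remove a space that belongs in the middle of the text.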
Function parseRDF()
This function is the wrapper function for all the others. When using this function, you do not need to worry about calling any others.
It starts off by creating the parser, and then it calls the initArray function. The functions for handling the opening and closing tags, and the data itself, are then registered. Next, the function opens the specified file and feeds it to the parser in 4096-byte blocks, stopping with an error message if the document is not well-formed XML. (The parser does not check that the document is actually RSS.) The file is then closed and the memory from the parser freed.
function parseRDF($file)
{
    global $_depth, $_tags, $_item;
    $xml_parser = xml_parser_create();
    initArray();
    /* Set up event handlers */
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    xml_set_character_data_handler($xml_parser, "parseData");
    /* Open up the file */
    $fp = fopen ($file, "r") or die ("Could not open $file for input");
    /* Feed the document to the parser one block at a time */
    while ($data = fread ($fp, 4096)) {
        if (!xml_parse($xml_parser, $data, feof($fp))) {
            die (sprintf("XML error: %s at line %d",
                xml_error_string(xml_get_error_code ($xml_parser)),
                xml_get_current_line_number($xml_parser)));
        }
    }
    fclose($fp);
    xml_parser_free($xml_parser);
}
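The read loop works because xml_parse accepts a document in pieces and only needs to be told which piece is the last one. Here is a minimal sketch of that idea using a string instead of a file. The sample feed, the 16-byte piece size, and the names $opened and $ok are assumptions chosen for illustration.

```php
<?php
// Made-up feed, held in a string so the example needs no file or network.
$feed = "<rss><channel><title>Demo</title>"
      . "<item><title>First</title></item>"
      . "<item><title>Second</title></item>"
      . "</channel></rss>";

$opened = 0;  // number of <item> opening tags seen
$p = xml_parser_create();
xml_set_element_handler(
    $p,
    function ($parser, $name, $attrs) use (&$opened) {
        if ($name === "ITEM") {
            $opened++;
        }
    },
    function ($parser, $name) {
        // nothing to do on closing tags in this demo
    }
);

// Split the feed into small pieces; tags will straddle piece boundaries.
$pieces = str_split($feed, 16);
$last   = count($pieces) - 1;
$ok     = true;
foreach ($pieces as $i => $piece) {
    // The third argument tells the parser whether this is the final piece.
    if (!xml_parse($p, $piece, $i === $last)) {
        $ok = false;
        echo "XML error: ", xml_error_string(xml_get_error_code($p)), "\n";
        break;
    }
}

echo "items seen: $opened\n";
```

The parser keeps its state between calls, so a tag that is cut in half by a piece boundary is still reported once, when it is complete.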
Example Use
As I mentioned earlier, using this set of functions is as easy as calling the parseRDF function. All you need to do is pass a URL for the RSS document, or a path to a local file. (Passing a URL works only if PHP's allow_url_fopen setting is enabled.)
<?php
parseRDF("http://www.zend.com/news.rss");
?>