Introducing Deathshadow's RSS Parser for PHP

So, you want an RSS parser for PHP that doesn't suck? Well then, I have some good news for you, and no, it's not a bridge for sale in Brooklyn. One of my (online) friends, Jason "Deathshadow" Knight, made an RSS parser for PHP users that will not only cache an RSS feed on a timer (every hour in the example below, or whenever you tell it to), but will also serve the cached copy to your site's visitors. Without a cache, every single page view on your site would fetch every feed all over again, so serving a cached copy of each feed the script parses will save the feed hosts a lot of bandwidth; something the feed owners will thank you for.

This RSS parser consists of three PHP scripts, each of which does a different job. The first PHP file is the actual Web page, and is pretty much bare (save for the script calls and the paths to the feeds themselves). Your Web page will be built around this file. The second script is the skin of the parser. This is the script that creates the HTML output (in this case a series of lists) of the parsed feeds. The third file is the functions library. This is what makes the entire parser tick. Each script file will be explained in depth below.

File 1: The Web Page ( name-of-web-page.php )

		
<?php 
	/* 
	Example of Deathshadow's RSS Reader 

	Shows the functionality of the RSS_Echo_Array function 

	Valid array values are: 
		url - self explanatory, ***MANDATORY*** 
		id - a unique ID to be assigned to the wrapping DIV, optional 
		class - class for the wrapping DIV, optional 
		limit - maximum number to show from that source 
		refresh - time in minutes before updating the cached copy. 
			a value of zero indicates do not cache 
			recommended value is 120 [default]. 

	Jason M. Knight (aka Deathshadow) Sept 2006 
	http://battletech.hopto.org 
	deathshadow60@hotmail.com 
	*/ 

	$settings['rss_skin']='rss_skin.php'; 

	require_once('rss_read.php'); 

	$rss_list=array( 
		array(    'url' => 'http://www.battlecorps.com/BC2/rss.php', 
						'id' => 'battlecorps', 
						'limit' => 4, 
						'refresh' => 60 
		), 
		 
		array(    'url' => 'http://www.camospecs.com/RSS.asp', 
						'id' => 'camospecs', 
						'limit' => 4, 
						'refresh' => 60 
		), 
		 
		array(    'url' => 'http://forums.classicbattletech.com/index.php?board=7;sa=news;type=rss;action=.xml', 
						'id' => 'cbt', 
						'limit' => 4, 
						'refresh' => 60 
		), 
			 
		array(    'url' => 'http://www.lordsofthebattlefield.com/news/index.xml', 
						'id' => 'lotb', 
						'limit' => 4, 
						'refresh' => 60 
		), 
		 
		array(    'url' => 'http://www.solaris7.com/Boards/rss.asp', 
						'id' => 'solaris', 
						'limit' => 4, 
						'refresh' => 60 
		) 
	); 
?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head> 
<title>Classic BattleTech RSS News Feed Parser (Beta)</title>
<meta http-equiv="imagetoolbar" content="no" /> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<meta name="keywords" content="Classic BattleTech, News, RSS, Web Feed" /> 
<meta name="description" content="RSS News Feed Syndication for Classic BattleTech Web Sites" /> 
<link rel="stylesheet" type="text/css" href="stylesheet.css" media="screen" /> 
<style type="text/css" media="screen"> 
#camospecs { 
	clear: left; 
} 

#camospecs img { 
	display: block; 
	margin-top: 1em; 
	margin-bottom: -1em; 
} 

</style> 
</head> 
<body> 
<h1>Irian News Interstellar</h1> <!-- just a temporary header --> 
<em>A Division of Irian Media Interstellar</em>
<?php 

RSS_Echo_Array($rss_list); 

?> 
</body> 
</html>
		
	

This is the actual "Web page" that will contain the feeds. At the top is a snippet of PHP code which contains a setting that names the skin file, a "require_once" statement that loads the parser, and an array of feeds.

The file names in the skin setting and the "require_once" statement are nothing more than relative paths. You can change them to reflect your parser's location on the server the same way you would link images, stylesheets, and scripts.

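For example, if your parser files (which are coming later) are in a folder called "rss" just below your root directory (where your home page is), you would put $_SERVER['DOCUMENT_ROOT'] . '/rss/' before the filename. (The bare $DOCUMENT_ROOT variable only exists when register_globals is switched on, and that setting was removed in PHP 5.4, so the $_SERVER version is the safe one to use.)

Here's how the rss_skin setting would look (for example) with the document root in front of it:

		
$settings['rss_skin']=$_SERVER['DOCUMENT_ROOT'] . '/rss/rss_skin.php';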
		
	
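If you'd rather not type the folder out in every path, a tiny helper can build it for you. This is only a sketch and not part of Jason's parser: the function name rss_parser_path() and the "rss" folder name are my own assumptions, so adjust them to match your setup.

```php
<?php
// Hypothetical helper: builds an absolute path to a file in the "rss" folder.
// $root defaults to the server's document root when it is available.
function rss_parser_path($file, $root = null) {
	if ($root === null) {
		$root = isset($_SERVER['DOCUMENT_ROOT']) ? $_SERVER['DOCUMENT_ROOT'] : '';
	}
	// rtrim() avoids a doubled slash when the root already ends in one
	return rtrim($root, '/') . '/rss/' . $file;
}

echo rss_parser_path('rss_skin.php', '/var/www/html/'), "\n";
// prints "/var/www/html/rss/rss_skin.php"
```

On a live page you would leave the second argument off and write something like $settings['rss_skin']=rss_parser_path('rss_skin.php');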

You'll also notice that the array at the bottom of the PHP script (above the HTML code) lists the feeds, along with their IDs, entry limits, and cache timers.

		
array(    'url' => 'http://www.solaris7.com/Boards/rss.asp', 
	'id' => 'solaris', 
	'limit' => 4, 
	'refresh' => 60 
)
		
	

This is one of the arrays nested inside the feed list, and it contains four entries: the link to the feed itself (in this case, to the RSS feed on www.solaris7.com), a unique ID I gave to the feed (in this case "solaris" but it can be anything, as long as you follow the standard HTML rules for IDs), a limit on the number of feed entries (I set this feed to show four entries; you can have more or fewer, depending on your needs), and a refresh rate (in this case, 60 minutes; all refresh rates are in minutes).
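Only the url is mandatory. If you leave out the limit or the refresh rate, the parser falls back to a limit of 0 (show everything) and a refresh of 120 minutes. Here is a rough sketch of that fallback; rss_entry_with_defaults() is a name I made up for illustration and is not a function in the parser.

```php
<?php
// Hypothetical helper (not part of the parser): fill in the optional
// keys with the same defaults RSS_Echo_Array() uses when they are missing.
function rss_entry_with_defaults(array $entry) {
	if (!array_key_exists('url', $entry)) {
		// 'url' is the only mandatory key
		throw new InvalidArgumentException('Feed entry needs a url');
	}
	// The + operator keeps keys already in $entry and only adds missing ones
	return $entry + array('limit' => 0, 'refresh' => 120);
}

$entry = rss_entry_with_defaults(array(
	'url' => 'http://www.solaris7.com/Boards/rss.asp',
	'id'  => 'solaris'
));
echo $entry['limit'], ' ', $entry['refresh'], "\n"; // prints "0 120"
```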

Below all that is the HTML (or in this case, XHTML) code. This is where you put your Web page. If you're more comfortable using HTML 4.01 instead of XHTML 1.0 Strict, feel free to change it.

Just take note of the PHP code directly above the closing BODY tag. This is the function that calls the parser and generates the HTML version of the feeds (I asked Jason to use unordered lists).

You can call your Web page anything you want, as long as it has a .php extension instead of .htm or .html.

Also please note that the embedded stylesheet was for local testing purposes only. That can be removed safely.

Next up, the RSS skin script.

File 2: The Parser Skin ( rss_skin.php )

		
<?php 
/* 
Skin template for Deathshadow's RSS Reader 
called by rss_read.php 
Jason M. Knight (aka Deathshadow) Sept 2006 
http://battletech.hopto.org 
deathshadow60@hotmail.com 
*/ 

$allow_tags='<b><i><u><em><strong><img><a><br><p><br/><br />'; 

function rss_header($item) { 
	global $allow_tags; 
	return 
		(array_key_exists('title',$item) 
			? "\t<h3>". 
				(array_key_exists('link',$item) 
					?    '<a href="'.htmlspecialchars($item['link']).'">' 
					: '' 
				). 
				strip_tags($item['title'],$allow_tags). 
				(array_key_exists('link',$item) 
					? '</a>' 
					: '' 
				)."</h3>\n" 
			: '' 
		). 
		(array_key_exists('description',$item) 
			? "\t<p>".strip_tags($item['description'],$allow_tags)."</p>\n" 
			: '' 
		). 
		"\t<ul>\n" 
	; 
} 

function rss_item($item) { 
	global $allow_tags; 
	return "\t\t<li>\n". 
		(array_key_exists('title',$item) 
			? "\t\t\t<h4>\n\t\t\t\t". 
				(array_key_exists('link',$item) 
					?    '<a href="'.htmlspecialchars($item['link']).'">'."\n\t\t\t\t\t" 
					: '' 
				). 
				strip_tags($item['title'],$allow_tags)."\n". 
				(array_key_exists('link',$item) 
					? "\t\t\t\t</a>\n" 
					: '' 
				)."\t\t\t</h4>\n" 
			: '' 
		). 
		(array_key_exists('author',$item) 
			?    "\t\t\tBy: ".strip_tags($item['author'],$allow_tags)."<br />\n" 
			: '' 
		). 
		(array_key_exists('pubdate',$item) 
			? "\t\t\tOn: ".strip_tags($item['pubdate'],$allow_tags)."<br />\n" 
			: '' 
		). 
		(array_key_exists('description',$item) 
			? "\t\t\t<p>".strip_tags($item['description'],$allow_tags)."</p>\n" 
			: '' 
		). 
	"\t\t</li>\n\n"; 
} 

function rss_footer() { 
	return "\t</ul>\n"; 
} 

?>
		
	

This is pretty straightforward. At the top of the script is a single variable, $allow_tags. This variable defines which HTML tags you will ALLOW other people to keep in their RSS feeds when the feeds are parsed into your page. RSS wasn't set up to carry HTML markup, but everyone and their uncle does it anyway, so we have to protect ourselves from people inserting malicious code (like scripts) into RSS feeds that get run through parsers like this one.

It also helps because some people will use presentational HTML code, and this will help remove it. One of the tweaks Jason was supposed to make was to include a list of "permitted" HTML attributes as well, but either he never did it, or he hasn't gotten it to me yet. Until that happens, keep in mind that strip_tags() leaves every attribute on the allowed tags untouched, so things like inline event handlers will still get through.
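To see what the tag list does and doesn't catch, here is a small experiment. The sample markup and the helper rss_drop_event_attributes() are made up for this example and are not part of Jason's skin. The helper is a crude stopgap that strips on* attributes from tags using a regular expression; it assumes PHP 5.3 or later, it is not a full attribute whitelist, and it does nothing about javascript: links.

```php
<?php
$allow_tags = '<b><i><u><em><strong><img><a><br><p>';

// Made-up feed text with a script tag and two inline event handlers
$dirty = '<p onclick="evil()">Hi <script>alert(1)</script>'
       . '<a href="http://example.com/" onmouseover="evil()">link</a></p>';

// strip_tags() removes the <script> tags but keeps the attributes
// on every tag that appears in $allow_tags
$stripped = strip_tags($dirty, $allow_tags);

// Hypothetical helper: remove on* attributes, looking only inside tags
// so ordinary text between tags is left alone
function rss_drop_event_attributes($html) {
	return preg_replace_callback(
		'/<[a-z][^>]*>/i',
		function ($m) {
			return preg_replace(
				'/\s+on[a-z]+\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/i',
				'',
				$m[0]
			);
		},
		$html
	);
}

$safer = rss_drop_event_attributes($stripped);
echo $stripped, "\n", $safer, "\n";
```

If you run it, compare the two lines of output: the first still carries the onclick and onmouseover handlers, the second should not.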

Nothing else needs to be changed. If you want to change something, send me a private message at SitePoint Forums with a copy of the web page that you want the parser to appear in, and I'll make the appropriate changes for you.

If you choose to do it yourself though, I cannot accept any responsibility for any damage you may cause (standard disclaimer #1).

This file goes in whatever folder you chose for your RSS parser (I HIGHLY recommend putting it in a folder called "rss" or "rss-parser" - you'll find out why in the next part).

File 3: The RSS Reader ( rss_read.php )

		
<?php 
/* 
Function Library for Deathshadow's RSS Reader 

require_once this library then call 
	function RSS_Parse_Url($rss_url,$limit) 
or 
	RSS_Echo_Array($rss_array) 
To parse RSS code. 

Jason M. Knight (aka Deathshadow) Sept 2006 
http://battletech.hopto.org 
deathshadow60@hotmail.com 
*/ 


/* 
	Do not override debug output here, 
	override AFTER you require_once 
	this library. 
*/ 
$debug=false; 

require_once($settings['rss_skin']); 

function RSS_Failure($rss_url,$reason) { 
	echo 'RSS News Feed for ',$rss_url,' Failed - Reason:',$reason,'<br />'; 
} 

function RSS_Parse_List($tag_list,$limit=0) { 
	$result=''; 
	$this_tag=array(); 
	$header=array(); 
	$in_item=false; 
	$in_channel=false; 
	$in_tag=false; 
	$retval=''; 
	$t=0; 
	$n=0; 
	while ($t<count($tag_list)) { 
		$element_tag=strtolower($tag_list[$t]['tag']); 
		$element_type=strtolower($tag_list[$t]['type']); 
		switch ($element_tag) { 
			case 'channel': 
				switch ($element_type) { 
					case 'open': 
						$in_channel=true; 
					break; 
					case 'close': 
						$in_channel=false; 
					break; 
				} 
			break; 
			case 'item': 
				switch ($element_type) { 
					case 'open': 
						array_splice($this_tag,0); 
						$in_item=true; 
					break; 
					case 'close': 
						$retval.=rss_item($this_tag); 
						array_splice($this_tag,0); 
						$in_item=false; 
						$n++; 
						if ($n==$limit) { 
							$t=count($tag_list); 
						} 
					break;     
				} 
			break; 
			default: 
				switch ($element_type) { 
					case 'open': 
						if ($in_item) { 
							if ($in_tag) { 
								$this_tag[$current_tag].='<'.$element_tag.'>'; 
							} else { 
								$in_tag=true; 
								$current_tag=$element_tag; 
								$this_tag[$current_tag]=$tag_list[$t]['value']; 
							} 
						} 
					break; 
					case 'close': 
						if ($in_item) { 
							if ($in_tag) { 
								if ($current_tag==$element_tag) { 
									$in_tag=false; 
								} else { 
									$this_tag[$current_tag].='</'.$element_tag.'>'; 
								} 
							} 
						} 
					break; 
					case 'cdata': 
						if ($in_tag) { 
							$this_tag[$current_tag].=$tag_list[$t]['value']; 
						} 
					break; 
					case 'complete': 
						if ($in_tag) { 
							$this_tag[$current_tag].='<'.$element_tag.'>'.$tag_list[$t]['value'].'</'.$element_tag.'>'; 
						} else if ($in_item) { 
							if (array_key_exists($element_tag,$this_tag)) { 
								$this_tag[$element_tag].=$tag_list[$t]['value']; 
							} else $this_tag[$element_tag]=$tag_list[$t]['value']; 
						} else if ($in_channel) { 
							if (array_key_exists($element_tag,$header)) { 
								$header[$element_tag].=$tag_list[$t]['value']; 
							} else $header[$element_tag]=$tag_list[$t]['value']; 
						} 
					break; 
				} 
			break; 
		} 
		$t++; 
	} 
	return rss_header($header).$retval.rss_footer(); 
} 

$bad_chars=array('://','/','\\',':','?','&','=','.',',','"',"'"); 

function RSS_Parse_Url($rss_url,$limit=0,$reefer=120) { 
	global $bad_chars,$debug; 
	 
	// we'll handle it from here 
	$old_reporting=error_reporting(E_ERROR | E_WARNING | E_PARSE); 
	 
	$cached=false; 
	if ($reefer>0) { 
		$safe_name=dirname(__FILE__).'/cache/'.str_replace($bad_chars,'_',$rss_url).'.cacheRSS'; 
		if (file_exists($safe_name)) { 
			$age=(time()-filemtime($safe_name))/60; 
			if ($age<$reefer) { 
				$cached=true; 
				$data=file_get_contents($safe_name); 
				if ($debug) echo '$debug - read from cache<br />'; 
			} 
		} 
	} 
	if (!$cached) { 
		$data=file_get_contents($rss_url); 
		if ($data) { 
			if ($debug) echo '$debug - read from source<br />'; 
			if ($reefer>0) { 
				$handle=fopen($safe_name,'w'); 
				if ($handle) { 
					fwrite($handle,$data); 
					fclose($handle); 
				} else if ($debug) RSS_Failure($rss_url,'Cache Write Failed'); 
			} 
		} 
	} 
	$output=''; 
	if ($data) { 
		$parser=xml_parser_create(); 
		xml_parser_set_option($parser,XML_OPTION_CASE_FOLDING,0); 
		xml_parser_set_option($parser,XML_OPTION_SKIP_WHITE,1); 
		xml_parse_into_struct($parser,$data,$results) or RSS_Failure($rss_url,'XML Parse Failed'); 
		xml_parser_free($parser); 
		$output=RSS_Parse_List($results,$limit); 
	} else RSS_Failure($rss_url,'URL Read Failed'); 
	// restore the caller's error reporting level before handing back the HTML 
	error_reporting($old_reporting); 
	return $output; 
} 

function RSS_Echo_Array($rss_array) { 
	foreach ($rss_array as $rss_item) { 
		$limit=(array_key_exists('limit',$rss_item) ? $rss_item['limit'] : 0); 
		$reefer=(array_key_exists('refresh',$rss_item) ? $rss_item['refresh'] : 120); 
		echo 
			'<div'. 
			(array_key_exists('class',$rss_item) 
				? ' class="'.$rss_item['class'].'"' 
				: '' 
			). 
			(array_key_exists('id',$rss_item) 
				? ' id="'.$rss_item['id'].'"' 
				: '' 
			).">\n". 
			RSS_Parse_Url($rss_item['url'],$limit,$reefer). 
			"</div>\n" 
		; 
	} 
} 

?>
		
	

Ok, this is simple enough. Don't change a thing. Not even the file name (I should have told you not to change the file name on the last one either).

Just drop this in and let it do its job.
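If you're curious what the reader is actually chewing on, PHP's xml_parse_into_struct() flattens a feed into a list of entries, each with a tag name, a type (open, close, complete, or cdata), and sometimes a value. RSS_Parse_List walks that list. The sketch below shows the list for a tiny feed I made up; rss_demo_struct() is just a name for this example, and it assumes PHP's XML extension is installed.

```php
<?php
// A tiny stand-in feed (made up for this example), written without
// whitespace between tags so no filler entries show up in the list
$xml = '<rss><channel><title>Demo Feed</title>'
     . '<item><title>First Post</title><link>http://example.com/1</link></item>'
     . '</channel></rss>';

// Hypothetical helper: parse the XML with the same two options the reader sets
function rss_demo_struct($xml) {
	$parser = xml_parser_create();
	xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
	xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
	$tags = array();
	xml_parse_into_struct($parser, $xml, $tags);
	return $tags;
}

// Print one line per entry, indented by nesting level
foreach (rss_demo_struct($xml) as $tag) {
	echo str_repeat('  ', $tag['level'] - 1), $tag['tag'], ' (', $tag['type'], ')',
		isset($tag['value']) ? ': ' . $tag['value'] : '', "\n";
}
```

The "complete" entries are the ones that carry the text, which is why the reader copies their values into the item it is building.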

There is one other thing you need to do though. You need to create a folder INSIDE the folder your RSS skin and read files are held in, name the folder "cache" and give that folder full read-write permissions. On Unix-like servers (which is what most Apache hosts run), this is done with chmod, and the permission value to set is 777.

All you have to do is start up your FTP program, go to the folder you named "cache" and change its properties to 777.
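If the feeds show up but never seem to get cached, the folder permissions are the first thing to check. Here is a quick sketch you could drop into a scratch PHP file to test it; rss_cache_dir_ready() is a name I invented, and the path you pass in is whatever your own "cache" folder happens to be.

```php
<?php
// Hypothetical check (not part of the parser): true only when the
// folder exists and PHP is allowed to write files into it
function rss_cache_dir_ready($dir) {
	return is_dir($dir) && is_writable($dir);
}

// Assumes the cache folder sits next to this scratch file
$cache_dir = dirname(__FILE__) . '/cache';
echo rss_cache_dir_ready($cache_dir)
	? "cache folder is writable\n"
	: "cache folder is missing or not writable\n";
```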

If you need instructions on how to do that with your FTP program, tell me what program you use (Internet Explorer does not count, sorry) and I'll see what I can do. I use FileZilla exclusively, so I can give you a very quick answer if that's what you use.
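One last thing, for anyone wondering what ends up in that cache folder. The reader turns each feed URL into a file name by swapping the awkward characters for underscores, and it treats a cached file as fresh while its age in minutes is below the refresh value. The sketch below mirrors that logic in two stand-alone functions; rss_cache_name() and rss_cache_is_fresh() are names I made up and do not exist in rss_read.php.

```php
<?php
// Same character list the reader uses to flatten a URL into a file name
$bad_chars = array('://','/','\\',':','?','&','=','.',',','"',"'");

// Hypothetical helper: the file name a feed gets inside the cache folder
function rss_cache_name($rss_url, $bad_chars) {
	return str_replace($bad_chars, '_', $rss_url) . '.cacheRSS';
}

// Hypothetical helper: true while the file is younger than the refresh time.
// A refresh of zero means "do not cache", so nothing is ever fresh.
function rss_cache_is_fresh($file, $refresh_minutes) {
	if ($refresh_minutes <= 0 || !file_exists($file)) {
		return false;
	}
	clearstatcache(); // make sure filemtime() is not a stale cached value
	$age = (time() - filemtime($file)) / 60;
	return $age < $refresh_minutes;
}

echo rss_cache_name('http://www.solaris7.com/Boards/rss.asp', $bad_chars), "\n";
// prints "http_www_solaris7_com_Boards_rss_asp.cacheRSS"
```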