Tuesday, October 6, 2009

Using Magpie to parse Blogger feeds

I recently redesigned my website and one thing I changed was putting my blog on the front page instead of just linking to it. But what's cool (at least, according to my own geeky tastes) is that my blog is hosted on Blogger--not on my website! In this post, I'm going to walk through how I did this.

1. Blogger setup
First, make sure your Blogger feeds are configured to include the full content of each blog post (as opposed to just the first paragraph or so). I think this is the default setting, but to check, go to the Settings page and click on the "Site Feed" tab. The "Allow Blog Feeds" option should be set to "Full".



2. Atom feed URL
Now, you must get the URL of your blog's Atom feed. Go to your blog's homepage and view the source (in Firefox, this is under "View > Page Source"). Look for a "link" tag that looks like the one below, and grab the value of its "href" attribute.
<link rel="alternate" type="application/atom+xml" title="mangstacular - Atom" href="http://mangstacular.blogspot.com/feeds/posts/default" />

Note: In this tutorial, I use the Atom feed. The RSS feed also has all the same information--it's just arranged differently.
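
If you'd rather not read the page source by hand, the same lookup can be scripted. Below is a rough sketch using PHP's built-in DOM extension. The function name findAtomFeedUrl is made up for this example, and the sample markup is just the tag shown above wrapped in a minimal page.

```php
<?php
// Hypothetical helper (not part of Blogger or Magpie): returns the href of the
// first Atom <link rel="alternate"> tag in an HTML page, or null if none exists.
function findAtomFeedUrl(string $html): ?string
{
    if (trim($html) === '') {
        return null; // nothing to parse
    }

    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = $xpath->query('//link[@rel="alternate"][@type="application/atom+xml"]');
    foreach ($links as $link) {
        $href = $link->getAttribute('href');
        if ($href !== '') {
            return $href;
        }
    }
    return null;
}

// Sample markup: the <link> tag from above inside a minimal page.
$html = '<html><head><link rel="alternate" type="application/atom+xml" '
      . 'title="mangstacular - Atom" '
      . 'href="http://mangstacular.blogspot.com/feeds/posts/default" /></head>'
      . '<body></body></html>';

echo findAtomFeedUrl($html), "\n";
```

In practice you would pass in the HTML of your blog's homepage instead of the sample string.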


3. Magpie setup
Download Magpie. This is a wonderfully easy-to-use RSS/Atom parser written in PHP which we'll use to parse the Atom feed.

You'll want to configure Magpie to cache the Atom feed so that your website doesn't have to download the feed from Blogger every time someone visits your page. Magpie keeps the cached copy for a set amount of time (one hour by default, controlled by the MAGPIE_CACHE_AGE setting) and, once that time is up, asks the server whether the feed has changed (i.e. whether there's a new post or a new comment) before downloading it again. Open the "rss_fetch.inc" file and add these two lines somewhere at the top:
define('MAGPIE_CACHE_ON', true);
define('MAGPIE_CACHE_DIR', 'cache');

This will turn on caching and instruct Magpie to save the cached Atom file in the directory you specify.
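
If you'd rather not edit Magpie's own files, the same constants can instead be defined in your own script before Magpie is included, since Magpie only fills in defaults for settings that aren't defined yet. Here is a minimal sketch, assuming a "cache" directory that sits next to your script. The 30-minute cache age is only an example value.

```php
<?php
// Define the cache settings before Magpie is loaded. The defined() checks
// avoid "constant already defined" problems if the settings exist elsewhere.
if (!defined('MAGPIE_CACHE_ON')) {
    define('MAGPIE_CACHE_ON', true);
}
if (!defined('MAGPIE_CACHE_DIR')) {
    define('MAGPIE_CACHE_DIR', __DIR__ . '/cache'); // assumed location
}
if (!defined('MAGPIE_CACHE_AGE')) {
    define('MAGPIE_CACHE_AGE', 30 * 60); // cache lifetime in seconds (example value)
}

// Load Magpie only after the defines above, for example:
// require_once('magpierss/rss_fetch.inc');
```

An absolute path for the cache directory is a deliberate choice here: a relative path like 'cache' is resolved against the current working directory, which may not be the directory your script lives in.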

Note: Be sure that the permissions of the cache directory allow your web server to write to it. The way you do this varies from server to server, based on the user that your PHP process runs under, but here are the commands that I had to run:
chmod 775 cache
chgrp web cache

What you do NOT want to do is make the folder world-writable (e.g. "chmod 777"). While this would work, it would also let every other user account and process on the server write to that directory--not a good thing, especially on shared hosting.
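
To double-check the permissions from PHP itself, here is a small sketch that reports whether the cache directory exists, whether the PHP process can write to it, and whether it is writable by everyone. The function name checkCacheDir and the directory location are assumptions made for this example.

```php
<?php
// Hypothetical helper: returns a list of problems found with a cache
// directory, or an empty array when the directory looks usable.
function checkCacheDir(string $dir): array
{
    $problems = [];
    if (!is_dir($dir)) {
        $problems[] = "$dir does not exist or is not a directory";
        return $problems;
    }
    if (!is_writable($dir)) {
        $problems[] = "$dir is not writable by the PHP process";
    }
    // fileperms() returns the mode bits; 0002 is the "other users may write" bit.
    if ((fileperms($dir) & 0002) !== 0) {
        $problems[] = "$dir is world-writable";
    }
    return $problems;
}

// Assumed location: a "cache" directory next to this script.
foreach (checkCacheDir(__DIR__ . '/cache') as $problem) {
    echo "Warning: $problem\n";
}
```

Run it through the web server rather than from the command line, since the two may run PHP as different users.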


4. Parse the feed
Now you're ready to write the code that fetches and parses the feed. Simply calling Magpie's "fetch_rss" function will parse the feed and return an object whose "items" property is an array with one associative array per blog post (despite "rss" being in the function name, it will also parse Atom feeds). No need to deal with any XML. Below is some sample code.

Note: There are a couple of quirks, which may be Blogger-specific--be sure to read the code comments.

require_once('magpierss/rss_fetch.inc');

$atom = fetch_rss('http://mangstacular.blogspot.com/feeds/posts/default');
if (!$atom) {
    //fetch_rss() returns false if the feed couldn't be fetched or parsed
    die('Could not load the blog feed: ' . magpie_error());
}

foreach ($atom->items as $item) {
    //var_dump($item); //see all the stuff that's in each item

    $date = date('F j, Y', strtotime($item['published']));
    $title = $item['title'];
    $content = $item['atom_content']; //no need to run html_entity_decode() or anything
    $url = $item['link'];

    //because there are two <link> tags whose "rel" attributes are the same...
    //...it stuffs the "href" attributes from both tags into this one string...
    //...so you must extract the URL you want...
    //...which in my case is the URL to the comments page (the one starting with "https")
    $commentsUrl = substr($item['link_replies'], strpos($item['link_replies'], 'https'));

    $numberOfComments = $item['thr']['total'];

    //generate HTML for the entry
    //...
}
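
To make the last two steps more concrete, here is one possible sketch of the URL extraction and the HTML generation. It is not the code my site uses. The function names lastUrlIn and renderEntry, the markup, and the sample item (including its URLs and IDs) are all made up for illustration. It assumes the URL you want is the last one in the concatenated string, that no URL contains another URL inside it, and that the title is plain text that still needs escaping.

```php
<?php
// Hypothetical helper: returns the last URL in a string of URLs that were
// concatenated without a separator, or null if the string contains no URL.
function lastUrlIn(string $joined): ?string
{
    if (!preg_match_all('~https?://~', $joined, $matches, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    $last = end($matches[0]); // [matched text, byte offset]
    return substr($joined, $last[1]);
}

// Hypothetical renderer: builds an HTML snippet for one feed item.
// The post body is inserted as-is because it is already HTML.
function renderEntry(array $item): string
{
    $esc = function (string $text): string {
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    };
    $date = date('F j, Y', strtotime($item['published']));
    $commentsUrl = lastUrlIn($item['link_replies']);
    $count = (int) $item['thr']['total'];

    $html = '<div class="post">' . "\n"
          . '<h2><a href="' . $esc($item['link']) . '">' . $esc($item['title']) . "</a></h2>\n"
          . '<p class="date">' . $esc($date) . "</p>\n"
          . $item['atom_content'] . "\n";
    if ($commentsUrl !== null) {
        $html .= '<p><a href="' . $esc($commentsUrl) . '">' . $count . " comment(s)</a></p>\n";
    }
    return $html . "</div>\n";
}

// Made-up item in the shape used by the loop above; URLs and IDs are placeholders.
$item = [
    'published'    => '2009-10-06T09:30:00.000-04:00',
    'title'        => 'Using Magpie & Blogger',
    'link'         => 'http://example.com/2009/10/sample-post.html',
    'atom_content' => '<p>Hello, world.</p>',
    'link_replies' => 'http://example.com/feeds/1/comments/default'
                    . 'https://example.com/comments?post=1&blog=2',
    'thr'          => ['total' => '3'],
];

echo renderEntry($item);
```

Inside the loop, you would call renderEntry($item) for each item and echo the result.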


And now it looks like you're running fancy WordPress software! ;)
