Sunday, November 30, 2008

My Woot! Google gadget

Though I have known about Woot for a while now, I only recently started tracking it on a daily basis, and I decided I needed a Woot gadget for my Google home page. A look at the API reveals that Google gadgets are quite easy to create, as you can see in the "Hello World" example below.

<?xml version="1.0" encoding="UTF-8" ?>
<Module>
  <ModulePrefs title="hello world example" />
  <Content type="html">
    <![CDATA[
      Hello, world!
    ]]>
  </Content>
</Module>

Inside the CDATA section you can put whatever HTML you want. I wanted HTML containing the URL of the main Woot image, the title, and the price. These are all easily scraped from the Woot website. I used the HTML Agility Pack, found on CodePlex:

http://www.codeplex.com/htmlagilitypack

The HTML Agility Pack lets you parse HTML documents using XPath. This is an amazingly intuitive mechanism for scraping, and it got me what I wanted in just a few minutes.
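To make the XPath idea concrete without touching the network, here is a minimal sketch that runs the same kind of query against an inline HTML string. The markup, the `XPathDemo` class, and the `ReadDeal` helper are invented for illustration; only `HtmlDocument.LoadHtml`, `SelectSingleNode`, `InnerText`, and `GetAttributeValue` come from the HTML Agility Pack.

```csharp
using System;
using HtmlAgilityPack;

static class XPathDemo
{
    // Hypothetical helper: pulls title, price and image URL out of an HTML string.
    public static string[] ReadDeal(string html)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of downloading a page

        // "//" searches the whole document; "[@id='...']" filters on an attribute.
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//h3[@id='TitleHeader']");
        HtmlNode price = doc.DocumentNode.SelectSingleNode("//span[@id='PriceSpan']");
        HtmlNode image = doc.DocumentNode.SelectSingleNode("//img[@class='salePic']");

        return new[]
        {
            title.InnerText,
            price.InnerText,
            image.GetAttributeValue("src", "")
        };
    }

    static void Main()
    {
        // Made-up markup that only mimics the ids and class used in this post.
        string html =
            "<html><body>" +
            "<h3 id='TitleHeader'>Example Widget</h3>" +
            "<span id='PriceSpan'>$9.99</span>" +
            "<img class='salePic' src='http://example.com/widget.jpg' />" +
            "</body></html>";

        foreach (string value in ReadDeal(html))
        {
            Console.WriteLine(value);
        }
    }
}
```

Testing the XPath expressions against a saved copy of the page like this is a convenient way to tune them before pointing the code at the live site.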

The code to retrieve the title, image path, and price is listed below.



HtmlWeb hw = new HtmlWeb();

string url = @"http://www.woot.com";
HtmlDocument doc = hw.Load(url);

HtmlNode ImageNode = doc.DocumentNode.SelectSingleNode("//img[@class='salePic']");
HtmlNode PriceNode = doc.DocumentNode.SelectSingleNode("//span[@id='PriceSpan']");
HtmlNode TitleNode = doc.DocumentNode.SelectSingleNode("//h3[@id='TitleHeader']");

string imgSrc = ImageNode.Attributes["src"].Value;
string priceSpan = PriceNode.InnerText;
string titleText = TitleNode.InnerText;
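One caveat with the snippet above: `SelectSingleNode` returns null when the XPath matches nothing, so a change to the page's markup would surface as a `NullReferenceException`. The following is only a sketch of one way to guard against that. The `SafeScrape` class, its two helpers, and the "n/a" fallback are hypothetical names chosen for this example, not part of the HTML Agility Pack.

```csharp
using System;
using HtmlAgilityPack;

static class SafeScrape
{
    // Hypothetical helper: the node's text, or a fallback if the XPath matches nothing.
    public static string TextOrDefault(HtmlDocument doc, string xpath, string fallback)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? fallback : node.InnerText.Trim();
    }

    // Hypothetical helper: an attribute value, or a fallback if the node or attribute is missing.
    public static string AttributeOrDefault(HtmlDocument doc, string xpath, string attribute, string fallback)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? fallback : node.GetAttributeValue(attribute, fallback);
    }

    static void Main()
    {
        // Made-up markup with the price element deliberately missing.
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml("<html><body><h3 id='TitleHeader'>Example Widget</h3></body></html>");

        Console.WriteLine(TextOrDefault(doc, "//h3[@id='TitleHeader']", "n/a"));
        Console.WriteLine(TextOrDefault(doc, "//span[@id='PriceSpan']", "n/a"));
        Console.WriteLine(AttributeOrDefault(doc, "//img[@class='salePic']", "src", ""));
    }
}
```

Whether to show a fallback or skip rendering altogether is a design choice; for a home-page gadget, a placeholder is probably friendlier than an error page.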



As you can see, the HTML Agility Pack takes all the pain out of screen scraping a website. Just write the output to the CDATA section and you are done. I did this in an aspx page by removing all the standard markup and replacing it with the XML surrounding the CDATA section.



<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="wootGadget.aspx.cs" Inherits="MyGames.Gadgets.wootGadget" %>
<?xml version="1.0" encoding="UTF-8"?>
<Module>
  <ModulePrefs title="woot!" />
  <Content type="html"><![CDATA[
    <% GetWoot(); %>
  ]]></Content>
</Module>




The code behind.



using System;
using System.Collections;
using System.Configuration;
using System.Data;
using System.Linq;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.HtmlControls;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Xml.Linq;
using HtmlAgilityPack;

namespace MyGames.Gadgets
{
    public partial class wootGadget : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
        }

        // Called from the .aspx markup; writes the gadget HTML into the CDATA section.
        public void GetWoot()
        {
            HtmlWeb hw = new HtmlWeb();

            // Download and parse the Woot front page.
            string url = @"http://www.woot.com";
            HtmlDocument doc = hw.Load(url);

            //HtmlNode node = doc.DocumentNode.SelectSingleNode("/html/body/div[position()=1]/div[position()=1]");
            HtmlNode ImageNode = doc.DocumentNode.SelectSingleNode("//img[@class='salePic']");
            HtmlNode PriceNode = doc.DocumentNode.SelectSingleNode("//span[@id='PriceSpan']");
            HtmlNode TitleNode = doc.DocumentNode.SelectSingleNode("//h3[@id='TitleHeader']");

            string imgSrc = ImageNode.Attributes["src"].Value;
            string priceSpan = PriceNode.InnerText;
            string titleText = TitleNode.InnerText;

            // Emit the gadget body: heading, linked image, title, and price.
            Response.Write("<span><strong>woot!</strong></span><br />");
            Response.Write("<a href='http://www.woot.com/' target='_blank' ><img src='" + imgSrc + "' height='150' style='border:none;float:left;' /></a>");
            Response.Write("<span style='font-size:.8em;'>" + titleText + "</span><br />");
            Response.Write("<span><strong>" + priceSpan + "</strong></span>");
        }
    }
}
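The code-behind concatenates scraped text straight into the gadget's HTML, so a title containing a quote or an angle bracket could break the markup. Below is a minimal sketch of building the same fragment as a string with the values HTML-encoded first. The `GadgetMarkup` class and its methods are hypothetical, and the decode-then-encode step assumes the scraped text may still carry entities such as `&amp;` from the source page; only `System.Net.WebUtility` and `StringBuilder` are real framework types.

```csharp
using System;
using System.Net;
using System.Text;

static class GadgetMarkup
{
    // Hypothetical helper: decode any existing entities, then encode once,
    // so text such as "&amp;" is not double-encoded (an assumption about the input).
    static string Encode(string value)
    {
        return WebUtility.HtmlEncode(WebUtility.HtmlDecode(value ?? ""));
    }

    // Builds the same fragment the code-behind writes, with encoded values.
    public static string Build(string imgSrc, string titleText, string priceText)
    {
        var sb = new StringBuilder();
        sb.Append("<span><strong>woot!</strong></span><br />");
        sb.Append("<a href='http://www.woot.com/' target='_blank'><img src='")
          .Append(Encode(imgSrc))
          .Append("' height='150' style='border:none;float:left;' /></a>");
        sb.Append("<span style='font-size:.8em;'>")
          .Append(Encode(titleText))
          .Append("</span><br />");
        sb.Append("<span><strong>")
          .Append(Encode(priceText))
          .Append("</strong></span>");
        return sb.ToString();
    }

    static void Main()
    {
        // Made-up values chosen to include characters that need escaping.
        Console.WriteLine(Build("http://example.com/a.jpg", "Tom & Jerry's <Set>", "$9.99"));
    }
}
```

In the page above, `GetWoot()` could then end with a single `Response.Write(GadgetMarkup.Build(imgSrc, titleText, priceSpan));`, which also keeps the markup in one place if the layout changes later.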