Sunday, November 30, 2008

My Woot! Google gadget

Though I have known about woot for a while now, I only recently started tracking it daily and decided I needed a woot gadget for my Google home page. A look at the API reveals that Google gadgets are quite easy to create, as you can see in the "Hello World" example below.

<?xml version="1.0" encoding="UTF-8" ?>
<Module>
  <ModulePrefs title="hello world example" />
  <Content type="html">
    <![CDATA[
      Hello, world!
    ]]>
  </Content>
</Module>




Inside the CDATA section you can put whatever HTML you want. I wanted HTML containing the URL of the main woot image, the title, and the price. These are all easily scraped from the woot website. I used the HTML Agility Pack, found on CodePlex.

The HTML Agility Pack lets you parse HTML documents using XPath. This is an amazingly intuitive mechanism for scraping, and it let me get what I wanted in just a few minutes.

The code to retrieve the title, image path, and price is listed below.

// Download the page and parse it into a document
HtmlWeb hw = new HtmlWeb();
string url = @"";
HtmlDocument doc = hw.Load(url);

// Locate the three elements of interest with XPath
HtmlNode ImageNode = doc.DocumentNode.SelectSingleNode("//img[@class='salePic']");
HtmlNode PriceNode = doc.DocumentNode.SelectSingleNode("//span[@id='PriceSpan']");
HtmlNode TitleNode = doc.DocumentNode.SelectSingleNode("//h3[@id='TitleHeader']");

// Pull out the image path, price, and title
string imgSrc = ImageNode.Attributes["src"].Value;
string priceSpan = PriceNode.InnerText;
string titleText = TitleNode.InnerText;

As you can see, using the HTML Agility Pack takes all the pain out of screen scraping a website. Just write the output to the CDATA section and you are done. I did this in an aspx page by removing all the standard markup and replacing it with the XML that surrounds the CDATA section.

<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="wootGadget.aspx.cs" Inherits="MyGames.Gadgets.wootGadget" %>
<?xml version="1.0" encoding="UTF-8"?>
<Module>
  <ModulePrefs title="woot!" />
  <Content type="html"><![CDATA[
    <% GetWoot(); %>
  ]]></Content>
</Module>



The code-behind:

using System;
using System.Collections;
using System.Configuration;
using System.Data;
using System.Linq;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.HtmlControls;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Xml.Linq;
using HtmlAgilityPack;

namespace MyGames.Gadgets
{
    public partial class wootGadget : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
        }

        // Called from the aspx page; writes the gadget HTML into the CDATA section.
        public void GetWoot()
        {
            HtmlWeb hw = new HtmlWeb();
            string url = @"";
            HtmlDocument doc = hw.Load(url);

            //HtmlNode node = doc.DocumentNode.SelectSingleNode("/html/body/div[position()=1]/div[position()=1]");
            HtmlNode ImageNode = doc.DocumentNode.SelectSingleNode("//img[@class='salePic']");
            HtmlNode PriceNode = doc.DocumentNode.SelectSingleNode("//span[@id='PriceSpan']");
            HtmlNode TitleNode = doc.DocumentNode.SelectSingleNode("//h3[@id='TitleHeader']");

            string imgSrc = ImageNode.Attributes["src"].Value;
            string priceSpan = PriceNode.InnerText;
            string titleText = TitleNode.InnerText;

            // Emit the heading, the linked image, the title, and the price
            Response.Write("<span><strong>woot!</strong></span><br />");
            Response.Write("<a href='' target='_blank' ><img src='" + imgSrc + "' height='150' style='border:none;float:left;' /></a>");
            Response.Write("<span style='font-size:.8em;'>" + titleText + "</span><br />");
            Response.Write("<span><strong>" + priceSpan + "</strong></span>");
        }
    }
}
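
One thing the code above does not guard against: SelectSingleNode returns null when nothing matches, so a change to woot's markup would make the gadget throw a NullReferenceException. Below is a minimal sketch of a more defensive lookup, not what the gadget actually runs. The WootScrape class, its TextOrDefault and AttributeOrDefault helpers, the fallback strings, and the inline sample markup are all made up for illustration, and it assumes the HtmlAgilityPack assembly is referenced.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper class, not part of the original gadget.
public static class WootScrape
{
    // Returns the trimmed inner text of the first node matching xpath,
    // or fallback when nothing matches.
    public static string TextOrDefault(HtmlDocument doc, string xpath, string fallback)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? fallback : node.InnerText.Trim();
    }

    // Returns the named attribute of the first node matching xpath,
    // or fallback when the node or the attribute is missing.
    public static string AttributeOrDefault(HtmlDocument doc, string xpath, string name, string fallback)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? fallback : node.GetAttributeValue(name, fallback);
    }

    public static void Main()
    {
        // Inline sample markup standing in for the live page (assumed, not real woot HTML).
        // It deliberately has no img element, to exercise the fallback path.
        var doc = new HtmlDocument();
        doc.LoadHtml("<h3 id='TitleHeader'>Sample Item</h3><span id='PriceSpan'>$9.99</span>");

        Console.WriteLine(TextOrDefault(doc, "//h3[@id='TitleHeader']", "(no title)"));
        Console.WriteLine(TextOrDefault(doc, "//span[@id='PriceSpan']", "(no price)"));
        Console.WriteLine(AttributeOrDefault(doc, "//img[@class='salePic']", "src", "(no image)"));
    }
}
```

With the sample markup, the title and price lookups find their nodes, while the image lookup falls back to its default because there is no matching img element. In the gadget itself, the same helpers could replace the direct Attributes["src"].Value and InnerText reads in GetWoot.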



