My Woot! Google gadget
Though I have known about Woot for a while, I recently started checking it daily and decided I needed a Woot gadget for my Google homepage. A look at the Google Gadgets API shows that gadgets are easy to create, as you can see in the "Hello World" example below.
<?xml version="1.0" encoding="UTF-8" ?>
<Module>
  <ModulePrefs title="hello world example" />
  <Content type="html">
    <![CDATA[
      Hello, world!
    ]]>
  </Content>
</Module>
Inside the CDATA section you can put whatever HTML you want. I wanted HTML containing the URL of Woot's main product image, the product title, and the price, all of which are easy to scrape from the Woot website. For the scraping I used the HTML Agility Pack, found on CodePlex:
http://www.codeplex.com/htmlagilitypack
The HTML Agility Pack lets you parse HTML documents, even malformed ones, and query them with XPath. It is an amazingly intuitive way to scrape a page, and it let me extract what I needed in just a few minutes.
The code to retrieve the title, image path, and price is listed below:
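The HTML Agility Pack is a third-party library, so as a self-contained illustration of what XPath queries like the ones in the scraping code select, here is a minimal sketch that runs the same three expressions through the .NET Framework's own System.Xml. The page fragment, the XPathDemo class, and the Extract method are hypothetical. The real Woot markup is not guaranteed to be well-formed XML, which is why the Agility Pack is the better tool for the live page.

```csharp
using System.Xml;

// Hypothetical demo: evaluates the post's three XPath expressions against an
// invented, well-formed fragment. Woot's real markup may differ and need not be valid XML.
public static class XPathDemo
{
    public const string SampleHtml =
        "<html><body>" +
        "<h3 id='TitleHeader'>Sample Gadget</h3>" +
        "<img class='salePic' src='http://example.com/sale.jpg' />" +
        "<span id='PriceSpan'>$19.99</span>" +
        "</body></html>";

    public static (string ImgSrc, string Price, string Title) Extract(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);

        // Same expressions the post passes to HtmlNode.SelectSingleNode.
        XmlNode image = doc.SelectSingleNode("//img[@class='salePic']");
        XmlNode price = doc.SelectSingleNode("//span[@id='PriceSpan']");
        XmlNode title = doc.SelectSingleNode("//h3[@id='TitleHeader']");

        // SelectSingleNode returns null when nothing matches, so propagate null safely.
        return (image?.Attributes?["src"]?.Value, price?.InnerText, title?.InnerText);
    }
}
```

Calling `XPathDemo.Extract(XPathDemo.SampleHtml)` should give the image URL, "$19.99", and "Sample Gadget"; a page without those elements gives nulls instead of throwing.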
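// Requires the HtmlAgilityPack library (using HtmlAgilityPack;)
HtmlWeb hw = new HtmlWeb();
string url = @"http://www.woot.com";
HtmlDocument doc = hw.Load(url);

// Each query returns null if nothing matches (for example after a site redesign).
HtmlNode ImageNode = doc.DocumentNode.SelectSingleNode("//img[@class='salePic']");
HtmlNode PriceNode = doc.DocumentNode.SelectSingleNode("//span[@id='PriceSpan']");
HtmlNode TitleNode = doc.DocumentNode.SelectSingleNode("//h3[@id='TitleHeader']");

// Fall back to empty strings rather than throwing a NullReferenceException.
string imgSrc = ImageNode != null ? ImageNode.GetAttributeValue("src", "") : "";
string priceSpan = PriceNode != null ? PriceNode.InnerText : "";
string titleText = TitleNode != null ? TitleNode.InnerText : "";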
As you can see, the HTML Agility Pack takes most of the pain out of screen scraping a website. All that remains is to write the output into the CDATA section. I did this in an .aspx page by removing all of the standard markup and replacing it with the gadget XML, with a call to GetWoot() inside the CDATA section.
<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="wootGadget.aspx.cs" Inherits="MyGames.Gadgets.wootGadget" %>
<?xml version="1.0" encoding="UTF-8"?>
<Module>
  <ModulePrefs title="woot!" />
  <Content type="html"><![CDATA[
    <% GetWoot(); %>
  ]]></Content>
</Module>
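If you would rather not hand-write the module XML around the CDATA section, LINQ to XML can build the same structure. This is a sketch under the assumption that the gadget's HTML is produced as a string first; GadgetXml and BuildModule are hypothetical names, not part of any Google gadget API. XCData makes the serializer emit the HTML as a CDATA section.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper: builds a gadget <Module> document equivalent to the .aspx markup.
public static class GadgetXml
{
    public static string BuildModule(string title, string html)
    {
        var doc = new XDocument(
            new XDeclaration("1.0", "UTF-8", null),
            new XElement("Module",
                new XElement("ModulePrefs", new XAttribute("title", title)),
                new XElement("Content",
                    new XAttribute("type", "html"),
                    new XCData(html)))); // serialized as <![CDATA[ ... ]]>

        // XDocument.ToString() omits the XML declaration, so prepend it explicitly.
        return doc.Declaration + Environment.NewLine + doc.ToString();
    }
}
```

For example, `GadgetXml.BuildModule("woot!", "<b>Hello, world!</b>")` yields a Module document whose Content element wraps the HTML in a CDATA section.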
The code-behind:
using System;
using System.Collections;
using System.Configuration;
using System.Data;
using System.Linq;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.HtmlControls;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Xml.Linq;
using HtmlAgilityPack;

namespace MyGames.Gadgets
{
    public partial class wootGadget : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
        }

        // Called from the .aspx markup inside the CDATA section; writes the gadget HTML.
        public void GetWoot()
        {
            HtmlWeb hw = new HtmlWeb();
            string url = @"http://www.woot.com";
            HtmlDocument doc = hw.Load(url);

            //HtmlNode node = doc.DocumentNode.SelectSingleNode("/html/body/div[position()=1]/div[position()=1]");
            HtmlNode ImageNode = doc.DocumentNode.SelectSingleNode("//img[@class='salePic']");
            HtmlNode PriceNode = doc.DocumentNode.SelectSingleNode("//span[@id='PriceSpan']");
            HtmlNode TitleNode = doc.DocumentNode.SelectSingleNode("//h3[@id='TitleHeader']");

            // SelectSingleNode returns null when nothing matches, so fall back to empty strings.
            string imgSrc = ImageNode != null ? ImageNode.GetAttributeValue("src", "") : "";
            string priceSpan = PriceNode != null ? PriceNode.InnerText : "";
            string titleText = TitleNode != null ? TitleNode.InnerText : "";

            Response.Write("<span><strong>woot!</strong></span><br />");
            Response.Write("<a href='http://www.woot.com/' target='_blank' ><img src='" + imgSrc + "' height='150' style='border:none;float:left;' /></a>");
            Response.Write("<span style='font-size:.8em;'>" + titleText + "</span><br />");
            Response.Write("<span><strong>" + priceSpan + "</strong></span>");
        }
    }
}
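GetWoot() concatenates the scraped text straight into the markup, so a title containing characters such as & or < could break the gadget's HTML. Here is a minimal sketch of the same fragment with the values HTML-encoded using System.Net.WebUtility. WootHtml and Build are hypothetical names, and the decode-then-encode step assumes the scraped text may still contain entities such as &amp;, which would otherwise be double-encoded.

```csharp
using System.Net;
using System.Text;

// Hypothetical helper: builds the same HTML fragment as GetWoot(), but encodes scraped values.
public static class WootHtml
{
    // Decode any existing entities, then encode once. WebUtility.HtmlEncode also
    // encodes quotes, so a value cannot break out of the single-quoted src attribute.
    private static string Clean(string value) =>
        WebUtility.HtmlEncode(WebUtility.HtmlDecode(value ?? string.Empty).Trim());

    public static string Build(string imgSrc, string title, string price)
    {
        var sb = new StringBuilder();
        sb.Append("<span><strong>woot!</strong></span><br />");
        sb.Append("<a href='http://www.woot.com/' target='_blank'>");
        sb.Append("<img src='").Append(Clean(imgSrc))
          .Append("' height='150' style='border:none;float:left;' /></a>");
        sb.Append("<span style='font-size:.8em;'>").Append(Clean(title)).Append("</span><br />");
        sb.Append("<span><strong>").Append(Clean(price)).Append("</strong></span>");
        return sb.ToString();
    }
}
```

In GetWoot(), the four Response.Write calls could then become a single `Response.Write(WootHtml.Build(imgSrc, titleText, priceSpan));`.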