XPath support for the Html Agility Pack on Windows Phone

The latest breaks in both Urban Dictionary and Fail++ were caused by HTML markup changes on the respective sites.

I kept putting off adding the ability to push parsing logic from my server because I was lazy and kept thinking “I will do it next release”. Well – the changes caught me with my pants down – the inability to update parsing “on the fly” forced me to rush out a new version of the app and wait a few days until the marketplace published it. A great option for getting data out of HTML is RegEx – the Windows Phone platform supports it and it works pretty well… if you are smart enough or patient enough to use it on HTML. I am neither.

So instead, I decided to add XPath support for the great Html Agility Pack parser that exists on CodePlex. I understand XPath and I generally don’t feel like my brain explodes when I author it.

Here’s the repackaged code for your consumption. There are a few caveats:

  1. It does not support all axis types – specifically, it does not support “previous node”.
  2. It does not support all XPath functions – you can see a full list in the FunctionNode.cs file.
  3. It does support a few functions that are not in the XPath spec: some string parsing functions, RegEx support and .OuterHtml/.InnerHtml support, among others.
  4. It probably won’t pass any XPath tests, but then again, it’s working on HTML and not XML, so I am not too worried about it.

I am going to upload this to CodePlex as soon as it’s back online.

The sample app inside the solution has a few simple examples, but overall, it’s what you would expect.
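
For a rough idea of the shape, here is a minimal sketch. It assumes the repackaged library exposes the same LoadHtml and SelectNodes members as the desktop Html Agility Pack. The class name XPathSketch, the method name ExtractText and the markup in the usage line are made up for illustration and are not taken from the sample app.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of the package.
public static class XPathSketch
{
    // Parses an HTML string and returns the trimmed text of every node
    // matched by the given XPath expression.
    public static List<string> ExtractText(string html, string xpath)
    {
        var document = new HtmlDocument();
        // LoadHtml takes the markup itself; Load expects a file path.
        document.LoadHtml(html);

        var results = new List<string>();

        // Assumption: as in the desktop HAP, SelectNodes returns null
        // (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = document.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return results;
        }

        foreach (HtmlNode node in nodes)
        {
            results.Add(node.InnerText.Trim());
        }
        return results;
    }
}
```

A call could look like `XPathSketch.ExtractText("<div class='word'>foo</div>", "//div[@class='word']")`. Because the expression is just a string, it could be downloaded from a server instead of being compiled into the app, which is the whole point of this exercise.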

Oh, also, I didn’t write a lot of comments – sorry about that.

This project uses the XPathParser project on CodePlex as source code and references version 1.4 of HAP (the binary is in the package – I did not test it with more recent versions).

Download


21 Responses to XPath support for the Html Agility Pack on Windows Phone

  1. Pingback: XPath support for the Html Agility Pack on Windows Phone

  2. Pingback: XPath support for the Html Agility Pack on Windows Phone – www.nalli.net

  3. lars says:

    Hey,
    I downloaded it and it is brilliant,
    but I have a few questions:
    first, SelectNodes doesn’t work,
    and what is socialebola.lib.haphelper in the test file?

    thanks
    Lars

  4. lars says:

    Can I send you a piece of code I am working on? I can’t get the XPath working for it.

  5. Johnny says:

    Awesome.

    Until Microsoft provides official support for XPath in Windows Phone, I will use yours.

  6. Any plans to update CodePlex with this code?

  7. Pingback: Some RegEx stuff « cartesian product

  8. Ambious says:

    Hello from the future.
    When I try to load a document (like so):
    HtmlDocument document = new HtmlDocument();
    document.Load(strResult);
    I get an error:
    Attempt to access the method failed: System.IO.StreamReader..ctor(System.String, System.Text.Encoding)

    Any idea what I’m doing wrong?

    • SocialEbola says:

      Hello! Have you invented flying cars yet? If not… WHAT THE HELL ARE YOU WAITING FOR?!

      As for loading HTML…

      Try calling .LoadHtml() – .Load loads a file.

      • Ambious says:

        Thank you for answering.
        The flying cars aren’t happening because they run on renewable energy and the oil companies bought all the patents and prototypes and shot them into the sun.
        As for the problem I’m having 😛
        I tried .LoadHtml() but then strangely I got an empty stream (or at least it seemed empty).
        Either I’m not manipulating it right or something else is wrong, but I know this is pretty obscure, so I’ll try to toy around with it some more and report back if I make any progress.
        Thanks for the heads up anyway.

  9. Ambious says:

    OK, I managed to ‘load’ it; I’m just having trouble parsing it. I’m still mastering XPath.
    Anyway, I now have a different show-stopper issue: encoding.
    The HTML I load contains Hebrew text (encoding is Windows-1255), and when I parse it the encoding gets lost and the text turns into gibberish. Is there any way to avoid this?

  10. Pingback: 7 Days with Windows Phone 7: Day 4 - Development - Ambious

  11. Pingback: Seven Days with Windows Phone 7 [Day Four] | Newsgeek

  12. Aleksi says:

    .SelectNodes isn’t working for me either, what could be the reason? The test app runs just fine.
