The latest breakages in both Urban Dictionary and Fail++ were caused by HTML markup changes on the respective sites.
I kept putting off adding the ability to push parsing logic from my server because I was lazy and kept thinking “I will do it next release”. Well, the changes caught me with my pants down: since I couldn’t update the parsing “on the fly”, I had to rush out a new version of the app and then wait a few days for the marketplace to publish it. A great option for getting data out of HTML is RegEx. The Windows Phone platform supports it and it works pretty well, if you are smart enough or patient enough to use it on HTML. I am neither.
So instead, I decided to add XPath support to the great Html Agility Pack parser, which is hosted on CodePlex. I understand XPath, and writing it generally doesn’t make me feel like my brain is about to explode.
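The payoff of keeping the parsing rules on the server is that a markup change only means editing a rule, not shipping a new build. Here is a minimal sketch of that idea, written in Python purely for illustration. The rule format (a JSON object mapping field names to XPath expressions), the field names, and the sample markup are all hypothetical, and the standard library's `xml.etree.ElementTree` stands in for the Html Agility Pack. ElementTree only accepts well-formed markup and supports a limited XPath subset, so it is not a replacement for a real HTML parser.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical rules, as they might be downloaded from the server at startup.
# Changing these strings changes the parsing without publishing a new app version.
RULES_JSON = """
{
  "word": ".//div[@class='word']/a",
  "definition": ".//div[@class='definition']"
}
"""

# Hypothetical, well-formed sample markup (ElementTree cannot parse messy real-world HTML).
SAMPLE_PAGE = """
<html><body>
  <div class="entry">
    <div class="word"><a href="/define?term=foo">foo</a></div>
    <div class="definition">A placeholder name.</div>
  </div>
</body></html>
"""

def extract(page_markup, rules):
    """Apply each XPath rule to the page and collect the stripped text of every match."""
    root = ET.fromstring(page_markup)
    result = {}
    for field, xpath in rules.items():
        # findall understands only a subset of XPath: paths, //, [@attr='value'], and a few more.
        result[field] = [(node.text or "").strip() for node in root.findall(xpath)]
    return result

rules = json.loads(RULES_JSON)
print(extract(SAMPLE_PAGE, rules))
# {'word': ['foo'], 'definition': ['A placeholder name.']}
```

If the site renames its CSS classes, only the rule strings on the server need to change; the app itself stays the same.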
Here’s the repackaged code for your consumption. There are a few caveats:
- It does not support all axes; specifically, it does not support “previous node”.
- It does not support all XPath functions – you can see a full list in the FunctionNode.cs file.
- It does support a few functions that are not in the XPath spec, among them some string-parsing functions, RegEx support, and .OuterHtml/.InnerHtml support.
- It probably won’t pass any XPath conformance tests, but then again, it works on HTML, not XML, so I am not too worried about it.
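When an expression would need the node that comes before a match, one workaround is to select the match with XPath and then step backwards in code. Below is a minimal sketch, assuming “previous node” means something like XPath's `preceding-sibling` axis. It again uses Python's `xml.etree.ElementTree` as a stand-in (its XPath subset has no preceding axes either); the helper `preceding_sibling` and the `<dl>` markup are hypothetical.

```python
import xml.etree.ElementTree as ET

def preceding_sibling(root, target):
    """Hypothetical helper: return the element right before `target` under the same parent, or None."""
    # ElementTree nodes carry no parent pointer, so scan every element's children for the target.
    for parent in root.iter():
        previous = None
        for child in parent:
            if child is target:
                return previous
            previous = child
    return None

# Hypothetical markup: each definition (<dd>) directly follows the term (<dt>) it belongs to.
markup = "<dl><dt>foo</dt><dd>first</dd><dt>bar</dt><dd>second</dd></dl>"
root = ET.fromstring(markup)

for definition in root.findall(".//dd"):
    term = preceding_sibling(root, definition)
    print(term.text, "->", definition.text)
# foo -> first
# bar -> second
```

The scan is linear in the size of the tree per lookup, which is fine for a single page but worth caching (for example, a child-to-parent map) if it runs in a tight loop.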
I am going to upload this to CodePlex as soon as it’s back online.
The sample app inside the solution has a few simple examples, but overall, it’s what you would expect.
Oh, also, I didn’t write a lot of comments – sorry about that.