HTML is notoriously difficult to parse, and doing it in Java has usually been a pain. Yes, I know there are parsers (like JTidy and NekoHTML) that try to create a proper DOM, but I’ve been waiting for something more lightweight.
Enter Jsoup. It feels like a mix of jQuery and Beautiful Soup (for Python).
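// Parse the response body into a Jsoup document
String html = response.getContentAsString();
Document document = Jsoup.parse(html);
// Select by id with a CSS selector, much as you would in jQuery
Elements elements = document.select("#errorRef");
// Expect exactly one match, containing the expected error reference
assertThat(elements.size(), equalTo(1));
assertThat(elements.first().text(), equalTo(errorRef));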
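For anyone who wants to try the selectors outside of a test, here is a minimal standalone sketch of the same idea. The HTML string, the class name and the two helper methods are made up for illustration; only `Jsoup.parse`, `select`, `first`, `text` and `size` come from Jsoup itself.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupSelectSketch {

    // Hypothetical error page, standing in for whatever the response returns.
    static final String HTML =
        "<html><body>"
        + "<p id=\"errorRef\">ERR-1234</p>"
        + "<ul class=\"links\">"
        + "<li><a href=\"/home\">Home</a></li>"
        + "<li><a href=\"/help\">Help</a></li>"
        + "</ul>"
        + "</body></html>";

    // Returns the text of the first element matching the CSS selector,
    // or null when nothing matches.
    static String firstText(String html, String cssSelector) {
        Document document = Jsoup.parse(html);
        Element first = document.select(cssSelector).first();
        return first == null ? null : first.text();
    }

    // Counts the elements matching the CSS selector.
    static int count(String html, String cssSelector) {
        return Jsoup.parse(html).select(cssSelector).size();
    }

    public static void main(String[] args) {
        System.out.println(firstText(HTML, "#errorRef"));
        System.out.println(count(HTML, "ul.links a[href]"));
    }
}
```

The selector strings are ordinary CSS selectors, so `#errorRef` picks an element by id and `ul.links a[href]` picks every link with an `href` inside the list.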
Mmm, full flavour, none of the fat.
#1 by Murray on 18 September 2010 - 4:20 pm
Hmm, looks a bit friendlier than TagSoup, which is what I normally use. Thanks for the tip.
#2 by Murray on 9 March 2011 - 5:24 pm
Finally got around to trying this and love it. Especially the jQuery-like selects. So much nicer than XPath and DOM traversal.
#3 by Tom on 9 March 2011 - 8:46 pm
Thanks, Murray. It seems pretty speedy as well. Using it right now on a project as part of my integration tests. Keeping the build under 2 minutes.
#4 by Kim Pollex on 15 May 2012 - 10:59 pm
No DOM-compatible interface means not even having XPath, nor can one use JAXB, Castor, XStream et al. This is a *huge* disadvantage.
#5 by Tom on 17 May 2012 - 12:38 pm
That depends on what you are using it for. I use Jsoup for tests & website scraping. Don’t usually have a need for JAXB, Castor, XStream or even XPath in these situations.