Askari and Java prototyping
Nov 22 4:48:43
The two most useful things I've found about my Rhino wrapper Askari are:
Recently, I came across a nice example that let me exploit both benefits. I wanted to extract data from a series of web pages, which unfortunately were served as HTML rather than XHTML.
So I tracked down the JTidy library and had Askari load it automatically by adding the following lines to lib-askari.conf:
; JTidy interface
jar,jtidy\jtidy.jar
Then in the Askari interpreter I could load the file, convert it to XML using JTidy, select the part of the file I wanted, and write the results to disk:
var xhtmlns = new Namespace("http://www.w3.org/1999/xhtml");

for (var i = 1; i < 100; i++) {
    var text = readUrl("http://myurl.com?id=" + i);
    var page = new XML(tidy(text));
    // E4X query: every XHTML div whose id attribute is "results"
    var email = page..xhtmlns::div.(@id == "results");
    writeFile("email" + i + ".html", email.toXMLString());
}

// Run raw HTML through JTidy and return the cleaned-up XHTML as a string.
function tidy(s) {
    var JTidy = JavaImporter();
    JTidy.importPackage(Packages.org.w3c.tidy);
    var tidy = new JTidy.Tidy();
    tidy.setXHTML(true);
    tidy.setNumEntities(true);
    tidy.setQuiet(true);
    tidy.setDocType("omit");
    tidy.setShowWarnings(false);
    // Wrap the JavaScript string in a java.lang.String to get at getBytes()
    var inStream = new java.io.ByteArrayInputStream(new java.lang.String(s).getBytes());
    var outStream = new java.io.ByteArrayOutputStream();
    tidy.parse(inStream, outStream);
    return outStream.toString();
}
Most of the tricky bits are in the Java type wrangling (converting strings to byte streams and back isn't a normal concern in JavaScript). Nonetheless, it's still an easy way to take a Java library for a quick test drive.
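For comparison, here is a minimal sketch in plain Java of the string-to-stream round trip that the tidy() function performs from the script side. It is only an illustration: the class name StreamRoundTrip and the copy method are hypothetical, copy merely stands in for the tidy.parse(inStream, outStream) call so the example needs no JTidy jar, and the explicit UTF-8 charset is an assumption (the script above relies on the platform default).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Askari or JTidy.
final class StreamRoundTrip {

    // Stand-in for tidy.parse(in, out): copies the bytes through unchanged.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    // String -> bytes -> InputStream -> OutputStream -> bytes -> String.
    static String roundTrip(String s) {
        InputStream in = new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            copy(in, out);
        } catch (IOException e) {
            // In-memory streams should not fail, but the stream API declares it.
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<p>caf\u00e9</p>"));
    }
}
```

Naming the charset on both conversions keeps non-ASCII text intact regardless of the machine's default encoding, which may matter when the fetched pages contain accented characters.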