Askari and Java prototyping

The two most useful things I've found about my Rhino wrapper Askari are:

  • it gives me the easiest way I've found to quickly load, interpret and transform XML data
  • it allows me to use Java libraries without compiling Java code

Recently, I came across a nice example that let me exploit both benefits. I wanted to extract data from a series of web pages, but unfortunately they were served as HTML rather than XHTML.

So I tracked down the JTidy library and configured Askari to load it automatically by adding the following lines to lib-askari.conf:

; JTidy interface

Then in the Askari interpreter I could load the file, convert it to XML using JTidy, select the part of the file I wanted, and write the results to disk:

var xhtmlns = new Namespace("");

for (var i = 1; i < 100; i++) {
  var text = readUrl("" + i);
  var page = new XML(tidy(text));
  var email = page..xhtmlns::div.(@id == "results");
  writeFile("email" + i + ".html", email.toXMLString());
}

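function tidy(s) {
  // Assumes the JTidy classes live in the org.w3c.tidy package
  var JTidy = JavaImporter(Packages.org.w3c.tidy);
  var tidy = new JTidy.Tidy();
  // Tidy.parse works on Java streams, so wrap the string's bytes in one
  var inStream = new java.io.ByteArrayInputStream(new java.lang.String(s).getBytes());
  var outStream = new java.io.ByteArrayOutputStream();
  tidy.parse(inStream, outStream);
  return outStream.toString();
}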

Most of the tricky bits are in the Java type wrangling, such as turning a JavaScript string into Java bytes and streams, which isn't a normal concern in JavaScript. Nonetheless, it's still an easy way to take a Java library for a quick test drive.
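
When test-driving a library this way, it can help to try the loop without touching the network or JTidy at all. The following is only a sketch of one way to do that: the fetch, clean, select and save steps from the loop above are passed in as functions, so stubs can stand in for the real ones. The names processPages, fetch, clean, select and save are hypothetical and not part of Askari or JTidy.

```javascript
// Hypothetical helper, not part of Askari: runs the four steps of the
// extraction loop for pages first..last, with each step supplied by the caller.
function processPages(first, last, steps) {
  var saved = [];
  for (var i = first; i <= last; i++) {
    var raw = steps.fetch(i);             // in Askari, this would be the readUrl call
    var clean = steps.clean(raw);         // in Askari, this would be tidy(raw)
    var fragment = steps.select(clean);   // in Askari, the E4X query for the results div
    if (fragment) {                       // skip pages with nothing to extract
      var name = "email" + i + ".html";
      steps.save(name, fragment);         // in Askari, this would be writeFile
      saved.push(name);
    }
  }
  return saved;                           // names of the files that were written
}

// Stub steps, for trying the control flow without any Java classes loaded
var written = [];
var names = processPages(1, 3, {
  fetch: function (i) { return "<p>page " + i; },
  clean: function (raw) { return raw + "</p>"; },
  select: function (clean) { return clean.indexOf("2") === -1 ? clean : null; },
  save: function (name, body) { written.push(name + ":" + body); }
});
```

Inside the Askari interpreter, the same function could be handed the real readUrl, tidy and writeFile steps from the listing above. Unlike the original loop, this sketch skips a page when the select step returns nothing, which is an added assumption about what is wanted.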