The two most useful things I've found about my Rhino wrapper Askari are:
- it gives me the easiest way I've found to quickly load, interpret and transform XML data
- it allows me to use Java libraries without compiling Java code
Recently, I came across a nice example that let me exploit both benefits. I wanted to extract data from a series of web pages, but unfortunately they were served as HTML rather than XHTML.
So I tracked down the JTidy library and had Askari load it automatically by adding the following lines to lib-askari.conf:
; JTidy interface
jar,jtidy\jtidy.jar
Then in the Askari interpreter I could load the file, convert it to XML using JTidy, select the part of the file I wanted, and write the results to disk:
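var xhtmlns = new Namespace("http://www.w3.org/1999/xhtml");
for (var i = 1; i < 100; i++) {
    var text = readUrl("http://myurl.com?id=" + i);
    // JTidy turns the raw HTML into XHTML, which E4X can then parse
    var page = new XML(tidy(text));
    // select every XHTML div, at any depth, whose id is "results"
    var email = page..xhtmlns::div.(@id == "results");
    writeFile("email" + i + ".html", email.toXMLString());
}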
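// Convert (possibly malformed) HTML text to XHTML using JTidy.
function tidy(s) {
    var JTidy = JavaImporter();
    JTidy.importPackage(Packages.org.w3c.tidy);
    // named "tidier" so the local variable does not shadow this function
    var tidier = new JTidy.Tidy();
    tidier.setXHTML(true);          // emit XHTML rather than HTML
    tidier.setNumEntities(true);    // prefer numeric entities over named ones
    tidier.setQuiet(true);
    tidier.setDocType("omit");      // leave out the DOCTYPE declaration
    tidier.setShowWarnings(false);
    // Wrap the JavaScript string in a java.lang.String to reach getBytes();
    // getBytes() and toString() below both use the platform default charset.
    var inStream = new java.io.ByteArrayInputStream(new java.lang.String(s).getBytes());
    var outStream = new java.io.ByteArrayOutputStream();
    tidier.parse(inStream, outStream);
    return outStream.toString();
}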
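Most of the tricky bits are in the Java type wrangling, which isn't normally a concern in JavaScript: for example, the JavaScript string has to be wrapped in a java.lang.String before getBytes() can be called on it. Nonetheless, it's still an easy way to take a Java library for a quick test drive.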
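One thing the loop above does not handle is a page that fails to download or parse: an uncaught exception would presumably stop the whole run. Here is a minimal sketch of one way around that, which collects failures and keeps going. The names scrapePages and io are hypothetical and not part of Askari; the four operations are passed in as plain functions so the loop can be tried with stand-ins, as below, without a network or JTidy.

```javascript
// Sketch only: scrapePages and the `io` object are hypothetical names.
// io.fetch, io.convert, io.select and io.save stand for the four steps
// of the original loop (download, tidy, pick the fragment, write to disk).
function scrapePages(firstId, lastId, io) {
    var failures = [];
    for (var i = firstId; i <= lastId; i++) {
        try {
            var text = io.fetch("http://myurl.com?id=" + i);
            var fragment = io.select(io.convert(text));
            io.save("email" + i + ".html", fragment);
        } catch (e) {
            // Record the failure and carry on with the next page.
            failures.push({ id: i, error: String(e) });
        }
    }
    return failures;
}

// Stand-in functions: page 2 is made to fail on purpose.
var saved = {};
var failures = scrapePages(1, 3, {
    fetch: function (url) {
        if (url === "http://myurl.com?id=2") { throw new Error("timeout"); }
        return "<p>page " + url + "</p>";
    },
    convert: function (text) { return text; },
    select: function (xhtml) { return xhtml; },
    save: function (name, content) { saved[name] = content; }
});
```

In the interpreter, I would expect to pass readUrl, tidy and writeFile for fetch, convert and save, with the E4X selection wrapped in a small function for select, though I have not checked how those built-ins behave when called as properties of another object.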