Archive for the ‘Software Development’ Category.

Readability vs Runtime Feedback

In my Scala explorations I also came across the problem of testing and verifying mocks. In Scala it is trivial to pass around predicates. However, these are function objects: they can be applied, but they reveal nothing about their implementation. So while the code is readable, the feedback when a check fails can be quite bad.

Groovy’s assert statement is a shining example of how both goals can be attained. Just consider the following assertion:

assert ["100","Test", 11].contains("10"+"1")

It is beautiful to read (as opposed to, say, Hamcrest, which makes me use its own syntax to construct an expression tree). When I execute it, I also get really good feedback:

Scala Compiler Quirks

Recently I have been dabbling with Scala a bit. As it happens, I found a few quirks in the compiler.

So does the following bit of code compile and, if so, what does it print?

object Example {
    val x = y
    val y = true
 
    def main(args: Array[String]) {
        print(x);
    }
}

Yes, it compiles, and you’ll get false, which is plain wrong. In my opinion a compiler error would have been fine; ideally, as there are no cycles, it would just have printed true.
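
For comparison, here is a minimal Java sketch of what is presumably the same effect: fields are initialised in textual order, so a read that happens before the assignment sees the default value false. The class InitOrder and the helper readY are made up for illustration. The detour through a method is needed because javac rejects a direct x = y as an illegal forward reference.

```java
// Sketch only: reproduces the initialisation-order effect in plain Java.
class InitOrder {
    // Static initialisers run top to bottom, so readY() executes
    // before the assignment to y below has happened.
    static final boolean x = readY();
    static boolean y = true;

    // Reading y through a method sidesteps javac's forward-reference check.
    static boolean readY() {
        return y;
    }

    public static void main(String[] args) {
        System.out.println(x); // prints false
        System.out.println(y); // prints true
    }
}
```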

Another major selling point of Scala is its pattern matching mechanism. A very convenient feature is the compiler’s ability to print a warning if the cases provided are not exhaustive. A very simple case of such a non-exhaustive match is this:

object CaseExample {
      def main(args: Array[String]) {
          val b = args.length == 1
          b match {
              case true => print("True")
          }
 
      }
  }

Unfortunately the compiler doesn’t complain at all. After some googling I came across SI-3111. My own problem was slightly, but only slightly, more complicated: I matched tuples of Options against Some and None.

We have been bitten by both problems, and they led to really subtle bugs. This is really unfortunate, because the compiler can do a lot more in a language like Scala. The first problem is even caught by javac.

A third shortcoming of Scala can be easily fixed: set the tab size to four spaces and your code will look a lot more structured.

To me it seems like the Scala compiler unnecessarily discredits the idea of strong static checks through a somewhat quirky implementation.

Don’t use DITA if you don’t have to

I am currently working for a large organisation that seems to embrace DITA for its documentation. In theory this is a nice thing. There is a wealth of tools to edit DITA files and to transform them to all sorts of formats. The reference implementation is the DITA Open Toolkit. First of all, I am not impressed with the documentation. There is no proper tutorial and no good overview. And these are people who are into writing documentation!

The second problem is the poor quality of the tooling. It is a pile of Ant build files, XSLTs and obscure jar files. This stuff is really brittle. Also it seems that people are not really concerned about having a decent continuous integration process. Documentation is rendered on some developer box that has the “right” version of the toolkit installed. Clearly not everything that has the name Darwin on the box is good. The Practices of Proper Christian Programming have certainly not been followed. Countless hours have been wasted, are being wasted and will be wasted. This is very unfortunate, because documentation is often treated as a third-class artefact that gets created as an afterthought using terribly blunt tools.

One decent resource is actually DITA for the Impatient. Generally I would advise using a custom tool chain to build your documentation, probably something involving a templating language. Transformations should be implemented in a real programming language. As a backend, things like LaTeX come to mind. Also, Ant is actually a crappy tool even for compiling Java code, so providing decent command line utilities is key for a toolkit that claims to be universal and extensible.

Verdict: Stay clear of DITA.

Getting notified when a job is done

I really like the interaction of the command line with more advanced user interfaces. So today I finally got around to writing a little wrapper script for the Mac that uses the say utility to notify me with speech output when a program has finished. The problem I was trying to solve was getting feedback when my command line build is done. Obviously I also wanted to know whether the build was successful or not. The first solution looked like this:

./build.sh && say success || say fail

That actually works quite nicely. But then I found it so useful that I wrote a wrapper script which, on top of the speech notification, returns an error when the wrapped program fails. It looks like this:

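#!/bin/bash
# Announce the failure and signal it to the caller.
fail ()
{
        say fail
        exit 1
}

# Run the wrapped command; quoting keeps arguments with spaces intact.
"$@"
if [ $? -ne 0 ]; then
        fail
fi
say success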

After placing this as tell in my path I can just run

tell ./build.sh

Exception Handling – The Catch Block should go Last

I have just stumbled across a piece of code like this:

    Object getSomeValue() {
        Object value = null;
        try {
           value = errorProneOperation();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return value;
    }

Now I find this really awkward. Why initialise something to null to keep the compiler happy?
Surely you want to do this:

    Object getSomeValue() {
        try {
           Object value = errorProneOperation();
           return value;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

No useless initialisation here. Also the scope of the local variable is smaller, which is good. I think there is a general rule here, which is: “There should be nothing after your catch-handler apart from a potential finally”. As with all rules it might be broken, but only with good reason. I don’t want to see that crappy initialisation to null again.

JSON Builder – Fun with Generics

On the train back to Berlin I spiked a little fluent JSON builder in Java. Here is one of my acceptance tests:

JsonBuilder<NULL> builder = new JsonBuilder<NULL>()
        .addObject("name")
            .addProperty("first", "Holden")
            .addProperty("last", "Caulfield")
        .end()
        .addArray("contact")
            .addPrimitive("00447903217666")
            .addObject()
                .addProperty("street", "5 Mayton St")
            .end()
            .addPrimitive("004915151183666")
        .end()
        .addProperty("date", "2011-12-12");
 
JsonObject clientFile = builder.build();

This yields:

{
    "name":
        {
         "first":"Holden",
         "last":"Caulfield"
        },
    "contact":["00447903217666",{"street":"5 Mayton St"},"004915151183666"],
    "date":"2011-12-12"
}

I am wondering whether people find the nesting of builders with end() useful.
The interesting thing is the type parameter: there is a JsonBuilder and a JsonArrayBuilder that can be nested arbitrarily, yet the end() call always returns the enclosing type.

The type is recursive ;-).
In JsonBuilder we have:

class JsonBuilder <P> {
    public JsonArrayBuilder<JsonBuilder<P>> addArray(String key); 
    public JsonBuilder<JsonBuilder<P>> addObject(String name);
    public P end();
}

And in JsonArrayBuilder things are similar:

class JsonArrayBuilder<P>{   
    public JsonBuilder<JsonArrayBuilder<P>> addObject();
    public JsonArrayBuilder<JsonArrayBuilder<P>> addArray();
    public P end();
}

I was quite surprised that this thing works. To start with, I defined a NULL type for the instance at the root level. Obviously this could be hidden in a static factory method or a subclass.
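
To make the recursive type parameter concrete, here is a self-contained sketch of the idea. It is not the code from the spike: the names ObjectBuilder, ArrayBuilder, root() and JsonBuilderDemo are invented here, Void stands in for the NULL type, and the builders emit a plain String without any escaping instead of a JsonObject.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// P is the type of the enclosing builder; end() hands control back to it.
class ObjectBuilder<P> {
    private final P parent;
    private final Consumer<String> sink; // receives the finished JSON text on end()
    private final List<String> members = new ArrayList<>();

    ObjectBuilder(P parent, Consumer<String> sink) {
        this.parent = parent;
        this.sink = sink;
    }

    // Static factory for the root level: no parent, nobody to report to.
    static ObjectBuilder<Void> root() {
        return new ObjectBuilder<>(null, json -> { });
    }

    ObjectBuilder<P> addProperty(String key, String value) {
        members.add(quote(key) + ":" + quote(value));
        return this;
    }

    // The nested builder remembers this builder's full type as its parent type.
    ObjectBuilder<ObjectBuilder<P>> addObject(String key) {
        return new ObjectBuilder<>(this, json -> members.add(quote(key) + ":" + json));
    }

    ArrayBuilder<ObjectBuilder<P>> addArray(String key) {
        return new ArrayBuilder<>(this, json -> members.add(quote(key) + ":" + json));
    }

    P end() {
        sink.accept(build());
        return parent;
    }

    String build() {
        return "{" + String.join(",", members) + "}";
    }

    // Deliberately naive: no escaping of special characters.
    static String quote(String s) {
        return "\"" + s + "\"";
    }
}

class ArrayBuilder<P> {
    private final P parent;
    private final Consumer<String> sink;
    private final List<String> elements = new ArrayList<>();

    ArrayBuilder(P parent, Consumer<String> sink) {
        this.parent = parent;
        this.sink = sink;
    }

    ArrayBuilder<P> addPrimitive(String value) {
        elements.add(ObjectBuilder.quote(value));
        return this;
    }

    ObjectBuilder<ArrayBuilder<P>> addObject() {
        return new ObjectBuilder<>(this, elements::add);
    }

    ArrayBuilder<ArrayBuilder<P>> addArray() {
        return new ArrayBuilder<>(this, elements::add);
    }

    P end() {
        sink.accept("[" + String.join(",", elements) + "]");
        return parent;
    }
}

class JsonBuilderDemo {
    // Mirrors the acceptance test above, using the sketch's own names.
    static String sample() {
        return ObjectBuilder.root()
            .addObject("name")
                .addProperty("first", "Holden")
                .addProperty("last", "Caulfield")
            .end()
            .addArray("contact")
                .addPrimitive("00447903217666")
                .addObject()
                    .addProperty("street", "5 Mayton St")
                .end()
                .addPrimitive("004915151183666")
            .end()
            .addProperty("date", "2011-12-12")
            .build();
    }

    public static void main(String[] args) {
        System.out.println(sample());
    }
}
```

Because end() returns P, the compiler tracks the nesting depth: calling end() on the root builder yields a Void, on which no further builder method can be called.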

Processing large XML files with Shell Scripts

I recently did some work around analysing XML files for data imports. This kind of task is usually well suited to taco bell programming. Now XML is not easily manipulated with standard Unix utilities, so I looked for a way to run XQuery queries against my files.

The first thing I found was the eXist XML database. It has a reasonable HTTP interface and hence can be scripted using curl. The trouble with eXist is that you first have to import your data into the database, and it doesn’t deal with big files (~500MB) gracefully, at least not in an ad-hoc naïve fashion.

After some research I found xqilla, a command line utility that operates on files and easily filters hundreds of megabytes.

Also I found XQuery really nice. It doesn’t use angle brackets. So here is a little example to make my point. Say I want to extract some information from this XML input (bigfile.xml):

<?xml version="1.0" encoding="UTF-8"?>
<library>
    <book author="J.D. Salinger" title="The Catcher in the Rye" lang="en">
        <isbn>0-316-76953-3</isbn>
    </book>
    <book author="Joseph Heller" title="Catch-22" lang="en">
        <isbn>0-684-83339-5</isbn>
    </book>
    <book author="Ödön von Horváth" title="Jugend ohne Gott" lang="de">
        <isbn>3-518-18807-0</isbn>
    </book>
</library>

Now I create the following xquery and write it to a file called test.xquery.

for $x in doc("bigfile.xml")/library/book
where $x/@lang = "en"
return 
  concat(
    data($x/@title),
    " by ",
    data($x/@author),
    ": ",
    data($x/isbn)
  )

Now running

xqilla test.xquery

yields this:

The Catcher in the Rye by J.D. Salinger: 0-316-76953-3
Catch-22 by Joseph Heller: 0-684-83339-5

If you are running multiple queries against a single file, eXist will probably be faster, provided you create the right indices. But for just filtering XML in a single pass, xqilla is a tool to consider.

Reading the Classics – The CLU Reference Manual

Last year I started reading or rereading some of the classical texts in computer science. The first one was the CLU Reference Manual by Prof. Liskov et al.

The book and the language were conceived in the seventies. CLU is object-based; the central concept is the abstract data type, essentially encapsulated objects without inheritance. It does, however, support parametric polymorphism, AKA generics. There is also a really nice exception mechanism, and there is the notion of iterators that support yield return, just in case you thought MS invented that. There is also a small amount of really effective syntactic sugar. The syntax to call “methods” is a bit queer, but on the whole it’s a very interesting language.

The book is very terse. Beautiful. There is not only advice on how to do exception handling (if only I had had that kind of instruction when I learnt Java), but also on avoiding singletons. Prof. Waldschmidt once mentioned to us that CLU rather than Java should be used to teach programming. I think he was spot on. My Java didn’t take off until I understood how to use delegation and composition. If you don’t have inheritance you might learn that lesson faster.

It’s a shame that CLU has been abandoned. Imagine a world, where C hadn’t been used for systems programming…

Java Lib to Launch External Processes

I recently redesigned some of the code I tend to use to spawn external processes (pdflatex anyone?) in Java. The implementation is still a bit buggy, but I am more interested in people’s opinions about the API (non-blocking killable invocations are not yet supported). The project on Google Code is called jproc. Here is the cookbook so far:

To launch an external program we’ll use a ProcBuilder. The run method
builds and spawns the actual process and blocks until the process exits.
The process takes care of writing the output to a stream (as opposed to the standard
facilities in the JDK, which expect the client to actively consume the
output from an input stream):

ByteArrayOutputStream output = new ByteArrayOutputStream();
 
new ProcBuilder("echo")
        .withArg("Hello World!")
        .withOutputStream(output)
        .run();
 
assertEquals("Hello World!\n", output.toString());
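
For contrast, this is roughly what the same echo call looks like with only the JDK’s ProcessBuilder, where the caller has to drain the process output itself. It is a sketch rather than a drop-in replacement: the class PlainJdkEcho and its echo method are hypothetical, and it assumes an echo executable on the path.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class PlainJdkEcho {
    // Hypothetical helper: runs "echo <text>" and returns whatever it printed.
    static String echo(String text) {
        try {
            Process process = new ProcessBuilder("echo", text)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            ByteArrayOutputStream output = new ByteArrayOutputStream();
            // The caller must actively read the stream; otherwise the child
            // can block once the pipe buffer is full.
            try (InputStream in = process.getInputStream()) {
                byte[] buffer = new byte[4096];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    output.write(buffer, 0, read);
                }
            }
            int exitValue = process.waitFor();
            if (exitValue != 0) {
                throw new IllegalStateException("echo exited with " + exitValue);
            }
            return output.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(echo("Hello World!"));
    }
}
```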

The input can be read from an arbitrary input stream, like this:

ByteArrayInputStream input = new ByteArrayInputStream("Hello cruel World".getBytes());
 
ProcResult result = new ProcBuilder("wc")
        .withArgs("-w")
        .withInputStream(input).run();
 
assertEquals("3", result.getOutputString().trim());

If all you want to get is the string that gets returned and if there
is not a lot of data, using streams is quite cumbersome. So for convenience,
if no stream is provided the output is captured by default and can be
obtained from the result.

ProcResult result = new ProcBuilder("echo")
                            .withArg("Hello World!")
                            .run();
 
assertEquals("Hello World!\n", result.getOutputString());
assertEquals(0, result.getExitValue());
assertEquals("echo \"Hello World!\"", result.getProcString());

For providing input there is a convenience method too:

ProcResult result = new ProcBuilder("cat")
   .withInput("This is a string").run();
 
assertEquals("This is a string", result.getOutputString());

Some external programs use environment variables. These can also
be set using the withVar method:

ProcResult result = new ProcBuilder("bash")
                            .withArgs("-c", "echo $MYVAR")
                            .withVar("MYVAR","my value").run();
 
assertEquals("my value\n", result.getOutputString());
assertEquals("bash -c \"echo $MYVAR\"", result.getProcString());

A common use case for external programs is batch processing of data.
Such programs can always run into difficulties. Therefore a timeout can be
specified. There is a default timeout of 5000ms. If the program does not terminate within the timeout
interval, it will be terminated and the failure is indicated through
an exception:

ProcBuilder builder = new ProcBuilder("sleep")
        .withArg("2")
        .withTimeoutMillis(1000);
try {
    builder.run();
    fail("Should time out");
}
catch (TimeoutException ex){
    assertEquals("Process 'sleep 2' timed out after 1000ms.", ex.getMessage());
}

Even if the process does not time out, we might be interested in the
execution time. It is also available through the result:

ProcResult result = new ProcBuilder("sleep")
        .withArg("0.5")
        .withTimeoutMillis(1000)
        .run();
 
assertTrue(result.getExecutionTime() > 500 && result.getExecutionTime() < 1000);

By default the new program is spawned in the working directory of
the parent process. This can be overridden:

ProcResult result = new ProcBuilder("pwd")
        .withWorkingDirectory(new File("/"))
        .run();
 
assertEquals("/\n", result.getOutputString());

It is a time-honoured tradition that programs signal a failure
by returning a non-zero exit value. However, in Java failure is
signalled through exceptions. Non-zero exit values therefore
get translated into an exception that also grants access to
the output on standard error.

ProcBuilder builder = new ProcBuilder("ls")
                            .withArg("xyz");
try {
    builder.run();
    fail("Should throw exception");
} catch (ExternalProcessFailureException ex){
    assertEquals("ls: xyz: No such file or directory\n", ex.getStderr());
    assertEquals(1, ex.getExitValue());
    assertEquals("ls xyz", ex.getCommand());
    assertTrue(ex.getTime() > 0);
}

Input and output can also be provided as byte[].
ProcBuilder also copes with large amounts of
data.

int MEGA = 1024 * 1024;
byte[] data = new byte[4 * MEGA];
for (int i = 0; i < data.length; i++) {
    data[i] = (byte) Math.round(Math.random() * 255 - 128);
}
 
ProcResult result = new ProcBuilder("gzip")
   .withInput(data)
   .run();
 
assertTrue(result.getOutputBytes().length > 2 * MEGA);

The same builder instance can be used to build and spawn
several processes:

ProcBuilder builder = new ProcBuilder("uuidgen");
String uuid1 = builder.run().getOutputString();
String uuid2 = builder.run().getOutputString();
 
assertNotNull(uuid1);
assertNotNull(uuid2);
assertTrue(!uuid1.equals(uuid2));

For convenience there is also a static method that just runs a
program and captures the output:

String output = ProcBuilder.run("echo", "Hello World!");
 
assertEquals("Hello World!\n", output);

Also there is a static method that filters a given string through
a program:

String output = ProcBuilder.filter("x y z", "sed", "s/y/a/");
 
assertEquals("x a z\n", output);

Join

Note to self: join works only on text files that are sorted on the join field.