Scala Compiler Quirks

Recently I have been dabbling with Scala a bit, and along the way I found a few quirks in its compiler.

So, does the following bit of code compile and, if so, what does it print?

object Example {
    val x = y
    val y = true

    def main(args: Array[String]): Unit = {
        print(x)
    }
}

Yes, it compiles, and it prints false, which is plain wrong. In my opinion a compiler error would have been fine. Ideally, since there are no cycles, it would simply have printed true.
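For comparison, here is a minimal Java sketch; the class and method names are made up for illustration. javac rejects x = y as an illegal forward reference. Reading y through a method slips past that check and reproduces the Scala behaviour. Static fields are initialised in textual order, so y still holds its default value false when x is initialised.

```java
public class ForwardReference {
    // static boolean x = y;   // javac rejects this line: "illegal forward reference"
    static boolean x = peekY(); // legal, but runs before y has been assigned
    static boolean y = true;

    // Reading y through a method hides the forward reference from javac's check.
    static boolean peekY() {
        return y;
    }

    public static void main(String[] args) {
        System.out.println(x); // false: y still held its default value when x was initialised
        System.out.println(y); // true
    }
}
```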

Another major selling point of Scala is its pattern matching mechanism. A very convenient feature is the compiler’s ability to print a warning if the cases provided are not exhaustive. A very simple example of such a non-exhaustive match is this:

object CaseExample {
    def main(args: Array[String]): Unit = {
        val b = args.length == 1
        b match {
            case true => print("True")
        }
    }
}

Unfortunately, the compiler doesn’t complain at all. After some googling I came across SI-3111. My own problem was slightly, but only slightly, more complicated: I matched tuples of Options against Some and None.
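The kind of check I was hoping for does exist in other compilers. The following is a rough Java 14+ analogue of matching a tuple of Options; the enum and method names are hypothetical. A switch expression over an enum must cover every constant, so deleting any of the four cases below turns into a compile error rather than a runtime surprise. This is only a sketch. The classifying method itself is not checked, so it is no replacement for real exhaustiveness checking on patterns.

```java
import java.util.Optional;

public class OptionPairs {
    // One constant per Some/None combination of a pair of Optionals.
    enum Shape { BOTH, FIRST_ONLY, SECOND_ONLY, NEITHER }

    static Shape shapeOf(Optional<?> a, Optional<?> b) {
        if (a.isPresent()) {
            return b.isPresent() ? Shape.BOTH : Shape.FIRST_ONLY;
        }
        return b.isPresent() ? Shape.SECOND_ONLY : Shape.NEITHER;
    }

    static String describe(Optional<String> a, Optional<String> b) {
        // A switch expression over an enum must handle every constant (or have a default);
        // removing any case below makes javac reject this method.
        return switch (shapeOf(a, b)) {
            case BOTH -> a.get() + " and " + b.get();
            case FIRST_ONLY -> a.get() + " only";
            case SECOND_ONLY -> b.get() + " only";
            case NEITHER -> "nothing";
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(Optional.of("x"), Optional.empty())); // x only
        System.out.println(describe(Optional.empty(), Optional.empty())); // nothing
    }
}
```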

We have been bitten by both problems, and they led to really subtle bugs. This is unfortunate, because in a language like Scala the compiler could do a lot more. Even javac catches the first problem.

A third shortcoming of Scala is easily fixed: set the tab size to four spaces and your code will look a lot more structured.

To me it seems that the Scala compiler unnecessarily discredits the idea of strong static checks with a somewhat quirky implementation.

Posted in Software Development | 1 Comment

Don’t use DITA if you don’t have to

I am currently working for a large organisation that seems to embrace DITA for its documentation. In theory this is a nice thing: there is a wealth of tools to edit DITA files and to transform them into all sorts of formats. The reference implementation is the DITA Open Toolkit. First of all, I am not impressed with its documentation. There is no proper tutorial and no good overview. And these are people who are into writing documentation!

The second problem is the poor quality of the tooling. It is a pile of Ant build files, XSLTs and obscure jar files, and it is really brittle. It also seems that people are not really concerned about having a decent continuous integration process: documentation is rendered on some developer box that has the “right” version of the toolkit installed. Clearly not everything that has the name Darwin on the box is good. The Practices of Proper Christian Programming have certainly not been followed. Countless hours have been wasted, are being wasted and will be wasted. This is very unfortunate, because documentation is often treated as a third-class artefact that gets created as an afterthought using terribly blunt tools.

One decent resource is actually DITA for the Impatient. Generally, I would advise using a custom tool chain to build your documentation, probably something involving a templating language. Transformations should be implemented in a real programming language, and as a backend something like LaTeX comes to mind. Also, Ant is actually a crappy tool even for compiling Java code, so providing decent command line utilities is key for a toolkit that claims to be universal and extensible.

Verdict: Stay clear of DITA.

Posted in Software Development | 1 Comment

Getting notified when a job is done

I really like it when the command line interacts with richer user interfaces. So today I finally got around to writing a little wrapper script for the Mac that uses the say utility to announce when a program has finished. The problem I was trying to solve was getting feedback when my command line build is done. Obviously I also wanted to know whether the build had succeeded or not. The first solution looked like this:

./build.sh && say success || say fail

That actually works quite nicely. But I found it so useful that I wrote a wrapper script which, on top of the speech notification, returns an error when the wrapped program fails. It looks like this:

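#!/bin/bash
# Run the wrapped command with its arguments passed through unchanged
# ("$@" keeps quoting intact, unlike eval $@).
"$@"
status=$?

if [ $status -ne 0 ]; then
    say fail
    exit $status   # propagate the wrapped command's failure
fi
say success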

After saving this as an executable file called tell somewhere on my PATH, I can just run

tell ./build.sh

Posted in Software Development | 2 Comments

Exception Handling – The Catch Block should go Last

I have just stumbled across a piece of code like this:

    Object getSomeValue() {
        Object value = null;
        try {
           value = errorProneOperation();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return value;
    }

Now I find this really awkward. Why initialise something to null just to keep the compiler happy? Surely you want to do this:

    Object getSomeValue() {
        try {
           Object value = errorProneOperation();
           return value;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

No useless initialisation here. The scope of the local variable is also smaller, which is good. I think there is a general rule here: “There should be nothing after your catch handler apart from a potential finally block.” As with all rules, it may be broken, but only for a good reason. I don’t want to see that crappy initialisation to null again.
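Here is a self-contained version of the second snippet, as a sketch. errorProneOperation() is not shown in the post, so this assumes a hypothetical readText method that reads a file with Files.readString (Java 11+), which throws IOException. The value is returned straight from the try block, and nothing follows the catch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CatchLast {
    // Hypothetical stand-in for errorProneOperation(): reading a file can throw IOException.
    static String readText(Path path) {
        try {
            return Files.readString(path); // the value never outlives the try block
        } catch (IOException e) {
            throw new IllegalStateException(e); // nothing follows the catch
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("catch-last", ".txt");
        Files.writeString(file, "hello");
        System.out.println(readText(file)); // hello
        Files.delete(file);
        try {
            readText(file); // the file is gone now, so the IOException gets wrapped
        } catch (IllegalStateException e) {
            System.out.println("wrapped " + e.getCause().getClass().getSimpleName());
        }
    }
}
```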

Posted in Software Development | 5 Comments

JSON Builder – Fun with Generics

On the train back to Berlin I spiked a little fluent JSON builder in Java. Here is one of my acceptance tests:

JsonBuilder<NULL> builder = new JsonBuilder<NULL>()
        .addObject("name")
            .addProperty("first", "Holden")
            .addProperty("last", "Caulfield")
        .end()
        .addArray("contact")
            .addPrimitive("00447903217666")
            .addObject()
                .addProperty("street", "5 Mayton St")
            .end()
            .addPrimitive("004915151183666")
        .end()
        .addProperty("date", "2011-12-12");
 
JsonObject clientFile = builder.build();

This yields:

{
    "name":
        {
         "first":"Holden",
         "last":"Caulfield"
        },
    "contact":["00447903217666",{"street":"5 Mayton St"},"004915151183666"],
    "date":"2011-12-12"
}

I am wondering whether people find the nesting of builders with end() useful. The interesting part is the type parameter: there is a JsonBuilder and a JsonArrayBuilder, which can be nested arbitrarily, yet the end() call always returns the enclosing type.

The type is recursive ;-).
In JsonBuilder we have:

class JsonBuilder<P> {
    public JsonArrayBuilder<JsonBuilder<P>> addArray(String key);
    public JsonBuilder<JsonBuilder<P>> addObject(String name);
    public P end();
}

And in JsonArrayBuilder things are similar:

class JsonArrayBuilder<P> {
    public JsonBuilder<JsonArrayBuilder<P>> addObject();
    public JsonArrayBuilder<JsonArrayBuilder<P>> addArray();
    public P end();
}

I was quite surprised that this works. To start with, I defined a NULL type for the builder at the root level. Obviously this could be hidden in a static factory method or a subclass.
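The builder internals are not part of this post. What follows is a minimal sketch of one way to wire up the recursive type parameter, not the implementation behind the acceptance test above. Its simplifications are assumptions of the sketch:

- It renders to a plain String instead of a JsonObject.
- It escapes only quotes and backslashes.
- Only JsonBuilder has a build() method.

Each builder keeps a reference to its parent of type P, and end() simply hands that parent back. The compiler therefore knows the exact enclosing type at every step of the chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class JsonBuilderSketch {
    // Marker type for the root builder: there is no enclosing builder to return to.
    static final class NULL { private NULL() {} }

    // Minimal escaping (backslashes and quotes only); enough for this sketch.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a JSON object. P is the type of the enclosing builder that end() returns.
    static class JsonBuilder<P> {
        private final P parent;
        // Members render lazily, so a nested builder can keep growing until build().
        private final List<Supplier<String>> members = new ArrayList<>();

        JsonBuilder(P parent) { this.parent = parent; }

        JsonBuilder<P> addProperty(String key, String value) {
            members.add(() -> quote(key) + ":" + quote(value));
            return this;
        }

        JsonBuilder<JsonBuilder<P>> addObject(String key) {
            JsonBuilder<JsonBuilder<P>> child = new JsonBuilder<>(this);
            members.add(() -> quote(key) + ":" + child.render());
            return child;
        }

        JsonArrayBuilder<JsonBuilder<P>> addArray(String key) {
            JsonArrayBuilder<JsonBuilder<P>> child = new JsonArrayBuilder<>(this);
            members.add(() -> quote(key) + ":" + child.render());
            return child;
        }

        P end() { return parent; }

        String render() {
            return members.stream().map(Supplier::get).collect(Collectors.joining(",", "{", "}"));
        }

        String build() { return render(); }
    }

    // Builds a JSON array, using the same recursive trick as JsonBuilder.
    static class JsonArrayBuilder<P> {
        private final P parent;
        private final List<Supplier<String>> elements = new ArrayList<>();

        JsonArrayBuilder(P parent) { this.parent = parent; }

        JsonArrayBuilder<P> addPrimitive(String value) {
            elements.add(() -> quote(value));
            return this;
        }

        JsonBuilder<JsonArrayBuilder<P>> addObject() {
            JsonBuilder<JsonArrayBuilder<P>> child = new JsonBuilder<>(this);
            elements.add(child::render);
            return child;
        }

        JsonArrayBuilder<JsonArrayBuilder<P>> addArray() {
            JsonArrayBuilder<JsonArrayBuilder<P>> child = new JsonArrayBuilder<>(this);
            elements.add(child::render);
            return child;
        }

        P end() { return parent; }

        String render() {
            return elements.stream().map(Supplier::get).collect(Collectors.joining(",", "[", "]"));
        }
    }

    public static void main(String[] args) {
        String json = new JsonBuilder<NULL>(null)
            .addObject("name")
                .addProperty("first", "Holden")
                .addProperty("last", "Caulfield")
            .end()
            .addArray("contact")
                .addPrimitive("00447903217666")
                .addObject()
                    .addProperty("street", "5 Mayton St")
                .end()
                .addPrimitive("004915151183666")
            .end()
            .addProperty("date", "2011-12-12")
            .build();
        System.out.println(json);
    }
}
```

Running main prints the JSON from above on a single line. In this sketch, calling end() on the root builder returns the null parent. That is one more reason to hide the NULL type behind a factory method.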

Posted in Software Development | Leave a comment

Processing large XML files with Shell Scripts

I recently did some work analysing XML files for data imports. This kind of task is usually well suited to Taco Bell programming. But XML is not easily manipulated with standard Unix utilities, so I looked for a way to run XQueries against my files.

The first thing I found was the eXist XML database. It has a reasonable HTTP interface and hence can be scripted using curl. The trouble with eXist is that you first have to import your data into the database, and it doesn’t deal with big files (~500MB) gracefully, at least not when used in an ad-hoc, naïve fashion.

After some research I found XQilla, a command line utility which operates on files and easily filters hundreds of megabytes.

I also found XQuery really nice: it doesn’t use angle brackets. Here is a little example to make my point. Say I want to extract some information from this XML input (bigfile.xml):

<?xml version="1.0" encoding="UTF-8"?>
<library>
    <book author="J.D. Salinger" title="The Catcher in the Rye" lang="en">
        <isbn>0-316-76953-3</isbn>
    </book>
    <book author="Joseph Heller" title="Catch-22" lang="en">
        <isbn>0-684-83339-5</isbn>
    </book>
    <book author="Ödön von Horváth" title="Jugend ohne Gott" lang="de">
        <isbn>3-518-18807-0</isbn>
    </book>
</library>

Now I write the following XQuery to a file called test.xquery.

for $x in ./library/book
where $x/@lang = "en"
return 
  concat(
    data($x/@title),
    " by ",
    data($x/@author),
    ": ",
    data($x/isbn)
  )

Now running

xqilla test.xquery -i bigfile.xml

yields this:

The Catcher in the Rye by J.D. Salinger: 0-316-76953-3
Catch-22 by Joseph Heller: 0-684-83339-5

If you are running multiple queries against a single file, eXist will probably be faster, provided you create the right indices. But for just filtering XML in a single pass, xqilla is a tool to consider.
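If you would rather stay on the JVM, the same single-pass filter can be sketched with the StAX streaming API that ships with the JDK (javax.xml.stream). This is plain Java rather than XQuery, and the BookFilter class is made up for illustration. It reads the input as a stream of parse events instead of building a tree of the whole document. The sketch assumes that the file is UTF-8 and that each book carries its isbn as a child element, as in the sample above.

```java
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class BookFilter {
    // Returns "title by author: isbn" for every book with lang="en", in document order.
    static List<String> englishBooks(Reader in) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            List<String> out = new ArrayList<>();
            String title = null;  // non-null only while inside an English <book>
            String author = null;
            while (r.hasNext()) {
                // Only start tags matter here; text, comments and end tags are skipped.
                if (r.next() != XMLStreamConstants.START_ELEMENT) continue;
                if (r.getLocalName().equals("book")) {
                    boolean english = "en".equals(r.getAttributeValue(null, "lang"));
                    title = english ? r.getAttributeValue(null, "title") : null;
                    author = english ? r.getAttributeValue(null, "author") : null;
                } else if (r.getLocalName().equals("isbn") && title != null) {
                    // getElementText() consumes the text and leaves the reader on </isbn>.
                    out.add(title + " by " + author + ": " + r.getElementText());
                }
            }
            r.close();
            return out;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Use the file named on the command line (read as UTF-8), else a small inline sample.
        try (Reader in = args.length > 0
                ? Files.newBufferedReader(Path.of(args[0]))
                : new StringReader("<library>"
                    + "<book author=\"J.D. Salinger\" title=\"The Catcher in the Rye\" lang=\"en\">"
                    + "<isbn>0-316-76953-3</isbn></book>"
                    + "<book author=\"Joseph Heller\" title=\"Catch-22\" lang=\"en\">"
                    + "<isbn>0-684-83339-5</isbn></book>"
                    + "<book author=\"\u00d6d\u00f6n von Horv\u00e1th\" title=\"Jugend ohne Gott\" lang=\"de\">"
                    + "<isbn>3-518-18807-0</isbn></book>"
                    + "</library>")) {
            englishBooks(in).forEach(System.out::println);
        }
    }
}
```

With Java 11 or later, java BookFilter.java bigfile.xml runs it directly from source.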

For ad-hoc usage it is often impractical to create a file containing the query expression. In such cases process substitution is your friend:

xqilla <(echo 'for $x in ./library/book return data($x/@title)') -i bigfile.xml

Posted in Software Development | 1 Comment