Archive for the ‘Held der Kommandozeile’ Category.

Getting notified when a job is done

I really like the interaction of the command line with more advanced user interfaces. So today I finally got around to writing a little wrapper script for the Mac that uses the say utility to notify me with speech output when a program has finished. The problem I was trying to solve was getting feedback when my command line build is done. Obviously I also wanted to know whether the build was successful or not. The first solution looked like this:

./build.sh && say success || say fail

That actually works quite nicely. I found it so useful that I wrote a wrapper script which, on top of the speech notification, returns an error when the wrapped program fails. It looks like this:

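#!/bin/bash
# Announce a failure and return an error to the caller.
fail ()
{
        say fail
        exit 1
}

# Run the wrapped command; quoting "$@" keeps arguments with spaces intact.
"$@"
if [ $? -ne 0 ]; then
        fail
fi
say success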

After placing this as tell on my path, I can just run:

tell ./build.sh
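
A possible refinement, sketched here as a shell function rather than a script: it passes the wrapped command's own exit status through instead of always returning 1. The NOTIFY variable is an assumption of this sketch and not part of the original script; it defaults to echo so the example runs anywhere, and can be set to say on a Mac.

```shell
# NOTIFY is a hypothetical setting: the program used to announce the result.
# It defaults to echo; export NOTIFY=say on a Mac for speech output.
NOTIFY="${NOTIFY:-echo}"

# tell COMMAND [ARGS...]: run the command, announce the outcome,
# and return the command's own exit status.
tell() {
    "$@"
    local status=$?
    if [ "$status" -ne 0 ]; then
        "$NOTIFY" fail
    else
        "$NOTIFY" success
    fi
    return "$status"
}
```

Used as tell ./build.sh && ./deploy.sh, the second command would only run when the build succeeded (deploy.sh is just a made-up name for illustration).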

Processing large XML files with Shell Scripts

I recently did some work around analysing xml files for data imports. This kind of task is usually well suited for taco bell programming. Now xml is not easily manipulated with standard unix utilities, so I looked for a way to run xqueries against my files.

The first thing I found was the eXist xml database. It has a reasonable http interface and hence can be scripted using curl. The trouble with eXist is that you first have to import your data into the database, and it doesn’t deal with big files (~500MB) gracefully, at least not in an ad-hoc, naïve fashion.

After some research I found xqilla, a command line utility that operates on files and easily filters hundreds of megabytes.

I also found xquery really nice. It doesn’t use angle brackets. Here is a little example to make my point. Say I want to extract some information from this xml input (bigfile.xml):

<?xml version="1.0" encoding="UTF-8"?>
<library>
    <book author="J.D. Salinger" title="The Catcher in the Rye" lang="en">
        <isbn>0-316-76953-3</isbn>
    </book>
    <book author="Joseph Heller" title="Catch-22" lang="en">
        <isbn>0-684-83339-5</isbn>
    </book>
    <book author="Ödön von Horváth" title="Jugend ohne Gott" lang="de">
        <isbn>3-518-18807-0</isbn>
    </book>
</library>

Now I create the following xquery and write it to a file called test.xquery.

for $x in doc("bigfile.xml")/library/book
where $x/@lang = "en"
return 
  concat(
    data($x/@title),
    " by ",
    data($x/@author),
    ": ",
    data($x/isbn)
  )

Now running

xqilla test.xquery

yields this:

The Catcher in the Rye by J.D. Salinger: 0-316-76953-3
Catch-22 by Joseph Heller: 0-684-83339-5

If you are doing multiple queries against a single file, eXist will probably be faster if you create the right indices, but for just filtering xml in a single pass, xqilla is a tool to consider.

Java Lib to Launch External Processes

I recently redesigned some of the code I tend to use to spawn external processes (pdflatex anyone?) in Java. The implementation is still a bit buggy, but I am more interested in people’s opinions about the API (non-blocking killable invocations are not yet supported). The project on Google Code is called jproc. Here is the cookbook so far:

To launch an external program we’ll use a ProcBuilder. The run method
builds and spawns the actual process and blocks until the process exits.
The process takes care of writing the output to a stream (as opposed to the standard
facilities in the JDK, which expect the client to actively consume the
output from an input stream):

ByteArrayOutputStream output = new ByteArrayOutputStream();
 
new ProcBuilder("echo")
        .withArg("Hello World!")
        .withOutputStream(output)
        .run();
 
assertEquals("Hello World!\n", output.toString());

The input can be read from an arbitrary input stream, like this:

ByteArrayInputStream input = new ByteArrayInputStream("Hello cruel World".getBytes());
 
ProcResult result = new ProcBuilder("wc")
        .withArgs("-w")
        .withInputStream(input).run();
 
assertEquals("3", result.getOutputString().trim());

If all you want to get is the string that gets returned and if there
is not a lot of data, using streams is quite cumbersome. So for convenience,
if no stream is provided, the output is captured by default and can be
obtained from the result:

ProcResult result = new ProcBuilder("echo")
                            .withArg("Hello World!")
                            .run();
 
assertEquals("Hello World!\n", result.getOutputString());
assertEquals(0, result.getExitValue());
assertEquals("echo \"Hello World!\"", result.getProcString());

For providing input there is a convenience method too:

ProcResult result = new ProcBuilder("cat")
   .withInput("This is a string").run();
 
assertEquals("This is a string", result.getOutputString());

Some external programs use environment variables. These can also
be set using the withVar method:

ProcResult result = new ProcBuilder("bash")
                            .withArgs("-c", "echo $MYVAR")
                            .withVar("MYVAR","my value").run();
 
assertEquals("my value\n", result.getOutputString());
assertEquals("bash -c \"echo $MYVAR\"", result.getProcString());

A common use case for external programs is batch processing of data.
These programs might always run into difficulties. Therefore a timeout can be
specified. There is a default timeout of 5000ms. If the program does not terminate within the timeout
interval, it will be terminated and the failure is indicated through
an exception:

ProcBuilder builder = new ProcBuilder("sleep")
        .withArg("2")
        .withTimeoutMillis(1000);
try {
    builder.run();
    fail("Should time out");
}
catch (TimeoutException ex){
    assertEquals("Process 'sleep 2' timed out after 1000ms.", ex.getMessage());
}

Even if the process does not time out, we might be interested in the
execution time. It is also available through the result:

ProcResult result = new ProcBuilder("sleep")
        .withArg("0.5")
        .withTimeoutMillis(1000)
        .run();
 
assertTrue(result.getExecutionTime() > 500 && result.getExecutionTime() < 1000);

By default the new program is spawned in the working directory of
the parent process. This can be overridden:

ProcResult result = new ProcBuilder("pwd")
        .withWorkingDirectory(new File("/"))
        .run();
 
assertEquals("/\n", result.getOutputString());

It is a time-honoured tradition that programs signal a failure
by returning a non-zero exit value. However, in Java failure is
signalled through exceptions. Non-zero exit values therefore
get translated into an exception that also grants access to
the output on standard error:

ProcBuilder builder = new ProcBuilder("ls")
                            .withArg("xyz");
try {
    builder.run();
    fail("Should throw exception");
} catch (ExternalProcessFailureException ex){
    assertEquals("ls: xyz: No such file or directory\n", ex.getStderr());
    assertEquals(1, ex.getExitValue());
    assertEquals("ls xyz", ex.getCommand());
    assertTrue(ex.getTime() > 0);
}

Input and output can also be provided as byte[].
ProcBuilder also copes with large amounts of
data.

int MEGA = 1024 * 1024;
byte[] data = new byte[4 * MEGA];
for (int i = 0; i < data.length; i++) {
    data[i] = (byte) Math.round(Math.random() * 255 - 128);
}
 
ProcResult result = new ProcBuilder("gzip")
   .withInput(data)
   .run();
 
assertTrue(result.getOutputBytes().length > 2 * MEGA);

The builder can build and spawn several processes from
the same builder instance:

ProcBuilder builder = new ProcBuilder("uuidgen");
String uuid1 = builder.run().getOutputString();
String uuid2 = builder.run().getOutputString();
 
assertNotNull(uuid1);
assertNotNull(uuid2);
assertTrue(!uuid1.equals(uuid2));

For convenience there is also a static method that just runs a
program and captures the output:

String output = ProcBuilder.run("echo", "Hello World!");
 
assertEquals("Hello World!\n", output);

Also there is a static method that filters a given string through
a program:

String output = ProcBuilder.filter("x y z", "sed", "s/y/a/");
 
assertEquals("x a z\n", output);

Join

Note to self: join works only on sorted text files.
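
A small sketch of what that means in practice: sort both inputs on the join field first, then join. The file names and contents here are made up for illustration.

```shell
# Two hypothetical input files keyed on the first column; names.txt is unsorted.
tmp=$(mktemp -d)
printf '2 bob\n1 alice\n' > "$tmp/names.txt"
printf '1 admin\n2 guest\n' > "$tmp/roles.txt"

# Sort each file on the join field, then join on it (field 1 by default).
sort -k1,1 "$tmp/names.txt" > "$tmp/names.sorted"
sort -k1,1 "$tmp/roles.txt" > "$tmp/roles.sorted"
joined=$(join "$tmp/names.sorted" "$tmp/roles.sorted")

echo "$joined"
rm -rf "$tmp"
```

Both files should be sorted under the same locale that join runs in, otherwise the two tools may disagree about the ordering.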

CATting multiple files

Quite often I want to pipe the content of multiple files into a command line utility. An example would be counting the lines of sql in my project. This is another case where xargs comes in handy:

find . -name "*.sql" | xargs cat | wc -l
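
The plain pipeline above splits file names on whitespace, so a file called "my schema.sql" would trip it up. A hedged variant, assuming a find and xargs that support the widely available (but non-POSIX) -print0 and -0 options, wrapped in a hypothetical helper named count_lines:

```shell
# count_lines DIR PATTERN: hypothetical helper that counts the lines of all
# regular files below DIR whose names match PATTERN.
# File names are passed NUL-separated, so spaces in names are safe.
count_lines() {
    find "$1" -name "$2" -type f -print0 | xargs -0 cat | wc -l | tr -d ' '
}
```

For the example in this post that would be count_lines . "*.sql". The trailing tr only strips the padding some wc implementations put in front of the number.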

Visualising log files with gnuplot

I recently had the pleasure of supporting a new system throughout its first month of production. This was a good opportunity to refresh my command line skills. As it happened I spent a lot of time looking at log files trying to figure out what happened to the production system. I figured that a graphical representation of the events would be nice and started using gnuplot.

First I started out with a bunch of bash scripts, using what your usual unix installation provides, but then I actually came up with some Groovy scripts to provide better abstractions. A log file generally looks somewhat like this:

04/01/1970 07:55:13 garbage
04/01/1970 09:27:48 Event 2
04/01/1970 10:01:28 garbage
04/01/1970 10:38:30 garbage
04/01/1970 10:48:36 garbage
04/01/1970 10:51:58 Event 2
04/01/1970 11:03:45 garbage
04/01/1970 11:34:03 Event 1
04/01/1970 12:24:33 garbage
05/01/1970 04:35:50 ERROR

There is a lot of garbage plus some events we might be interested in. The scripts let you specify events, e.g. by providing a regexp:

Event EVENT1 = new RegExEvent("Event 1", ~/Event 1/)
Event EVENT2 = new RegExEvent("Event 2", ~/Event 2/)
Event ERROR = new RegExEvent("Error", ~/ERROR/)

The next step is newing up a TimeLineVisualizer on these events and passing in a stream with the actual log:

def logFile = new File("test.log");
 
logFile.withInputStream {InputStream stream ->
  def visualizer = new TimeLineVisualizer([
          EVENT1,
          EVENT2,
          ERROR
  ]);
  visualizer.visualize(stream)
}

If you have the gnuplot binary on your path this will yield something like this:

[Image: timeline]

Also in some cases you would like to know which time of day events are most likely to happen. For producing histograms I created another visualizer (which currently takes only one event).

logFile.withInputStream {InputStream stream ->
  def visualizer = new HistogramVisualizer(EVENT2, HistogramVisualizer.HOUR_OF_DAY_BINS)
  visualizer.visualize(stream)
}

For the example log file, which unfortunately has an even distribution of events, we get this:

[Image: histogram]
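
For comparison, the hour-of-day binning can be roughly approximated with standard unix tools alone. This is only a sketch and not how the Groovy visualizers work internally; hour_histogram is a hypothetical helper, and it assumes the log format shown above, with the time as the second whitespace-separated field.

```shell
# hour_histogram PATTERN FILE: hypothetical helper that prints "HH count"
# for every hour of day in which a line matching PATTERN occurs.
hour_histogram() {
    grep -e "$1" "$2" |
        awk '{ count[substr($2, 1, 2)]++ }
             END { for (h in count) print h, count[h] }' |
        sort
}
```

The two-column output is the kind of data gnuplot can plot directly, for instance with a boxes style.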

The cool thing about gnuplot is that you can actually run these things in a cron job to produce daily reports (and mail them to the appropriate people) or on a continuous integration server to visualise how the system is being exercised by the test suite.

Using the clipboard effectively – “Held der Kommandozeile”

Every now and then even the über-geek can’t avoid the use of non-command-line tools. Currently I am doing a bit of production support work, so tail, grep, and awk are my friends. However, people expect emails, excel spreadsheets, and similar stuff. So what to do? Well, use the best of both worlds: pipe your output into the clipboard and vice versa. Here is how it works on different platforms:

Cygwin

If you are a software developer or any other serious user of computing gear on Windows, get cygwin!

ls | putclip # Will copy the output of ls to the clipboard
getclip | grep "ERROR 500" # Will grep on the contents of the clipboard

OSX

ls | pbcopy # Will copy the output of ls to the clipboard
pbpaste | grep "ERROR 500" # Will grep on the contents of the clipboard

Linux

ls | xsel --clipboard # Will copy the output of ls to the clipboard
xsel --clipboard | grep "ERROR 500" # Will grep on the contents of the clipboard
Alternatively you might want to try xclip.
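
If you move between these platforms a lot, the differences can be hidden behind one function. The following is a sketch only: first_available and to_clipboard are hypothetical names, and the order in which the tools are tried is an arbitrary choice.

```shell
# first_available NAME...: print the first of the given commands that is
# found on the path; return 1 if none of them is installed.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# to_clipboard: copy standard input to the clipboard with whichever of the
# tools mentioned above is installed.
to_clipboard() {
    case "$(first_available pbcopy putclip xsel xclip)" in
        pbcopy)  pbcopy ;;
        putclip) putclip ;;
        xsel)    xsel --clipboard --input ;;
        xclip)   xclip -selection clipboard ;;
        *)       echo "no clipboard tool found" >&2; return 1 ;;
    esac
}
```

With that in a shell profile, ls | to_clipboard would behave the same on all three systems.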

QuickSilver

And while we are at it, if you are a QuickSilver user you should have a look at the qs command line tool, which lets you pipe contents into qs or open qs on files.

qs mylovelyfile.ext # opens quicksilver on a file (useful for sending it by mail)

And of course I shamelessly stole that stuff from all over the place, e.g. there.

Embrace the Power of Unix

fortune | xargs cowsay

How much does your checkin weigh?

Being budget-driven, you have to have the right tools. The simplest thing I could come up with is the code scales, which tell you what you have got in your workspace:

(svn diff | grep -e "^\+[^\+]" | wc -l; svn diff | grep -e "^-[^-]" | wc -l; echo "-"; echo "p") | dc

It once again proves the point that real programmers use unix and love RPN.
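
For those without dc at hand, here is a sketch of the same scales in awk that runs the diff only once. diff_weight is a hypothetical helper; it reads a unified diff on standard input, and unlike the one-liner above it also counts added or removed blank lines.

```shell
# diff_weight: read a unified diff on stdin and print
# (added lines - removed lines). The "+++" and "---" file header lines are
# skipped, which also skips the rare content line that begins that way.
diff_weight() {
    awk '/^\+/ && !/^\+\+\+/ { added++ }
         /^-/  && !/^---/    { removed++ }
         END { print added - removed }'
}
```

Typical use would be svn diff | diff_weight.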