A++ [Eric Torreborre's Blog]: 12 October 2008

When you program for a trader you don't have time to write tests

I've been wondering for some time if the quote above, or a variant like: "you don't have time to refactor your code", was really true or not. I generally bug my colleagues when they don't write tests along with their code and I usually prophesy that they will actually save time by writing tests instead of losing it.

Unfortunately I don't have strong evidence either way. The only thing I can say is that almost every time I've tried to take shortcuts in my development work, I've been bitten by very silly bugs and regretted my recklessness!

Anyway, what would you do if you had to code at the speed of light for a drug-addict trader (and they really need drugs these days,...)? What kind of practices would you adopt to go faster? How would you prepare yourself for those situations?

With those questions in mind, I was happy to be recently involved in a one week project where the biggest part of my job was to do some "Commando programming" to create the necessary tools to support the project.

In this post, I'll do my own retrospective of that project and try to highlight the points which I think are decisive in the context of "Commando programming". Actually most of the "Ding!" points (does that ring a bell to you?) below are applicable to any programming situation. The only difference is that there's no time for preparation in a "Commando" situation. You have to be seriously fit for the job.

This post is too long, that's the Steve Yegge syndrome,... So here's a summary of the take-away points, so you can get a first impression of whether it's worth reading or not:
Ding! Know your destination
Ding! Be a fast code-reader
Ding! Know at least one scripting language really well
Ding! Know your platform ecosystem
Ding! Don't rewrite anything
Ding! Take careful risks and prototype
Ding! Have at least one high-level functional test
Ding! Test the difficult stuff
Ding! Isolate the file system
Ding! Use the console
Ding! Have lots of sample data
Ding! Time is money, concepts are time
Ding! PPP: Practice, Practice, Practice!!!
Ding! Cool down and document/review your code
Ding! Cool down and take a break

The project: go see your doctor, now!

In my company we've set up a methodology and a set of tools to assess the performance of our customers' deployments, and one of those customers asked us to come over for one week to deploy this so-called "HealthCheck" process. After the initial interviews, we determined that one essential objective of the project was to reduce the time to save a trade from 1.2 seconds to less than 500 ms. A very clear and quantifiable objective, that's a good start!

Ding! Know your destination

First point here, this may look totally obvious but still it needs to be said: the overall user objective must be very clear for everyone.

This point is important for at least 4 reasons:

It's hard to say that you're going fast if you don't know where you're going!
The fastest development in the world is the one you don't do because you don't need it
It helps a lot, as seen later, to have a clear overall objective when negotiating with yourself or with your customer the next micro-task to develop
Developing fast is not the alpha and omega of productivity. Seeing the project from its overall objective can make you realize that saving other people's time may actually be more important than saving your own

Reading the thermometer

Assessing the performance of our systems involves parsing and analyzing lots of different log files: client requests, server requests, database requests, sql execution times, workflow times, garbage collection times,... The scripts we started working with were shell scripts doing the following:

Reading and managing log files, including archiving old versions
Parsing the files and extracting the time spent for a given "quantity" (client request, SQL call,...)
Computing small statistics: minimum / average / maximum time
Creating graphs in order to be able to spot "spikes" that we will be able to label as "Red flags"
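To make these steps concrete, here is a minimal sketch of the parsing, statistics and "Red flag" parts. It is written in Python for illustration, not in the shell of the original scripts, and the log line format, the function names and the 1000 ms threshold are all assumptions of mine, not what the real logs looked like.

```python
import re
from statistics import mean

# Hypothetical log line format (an assumption): "<HH:MM:SS> <quantity name> <duration>ms"
LINE_PATTERN = re.compile(r"^(\d{2}:\d{2}:\d{2}) (\w+) (\d+)ms$")


def parse_durations(lines):
    """Return {quantity name: [durations in ms]} for the lines that match."""
    durations = {}
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match:
            _, name, millis = match.groups()
            durations.setdefault(name, []).append(int(millis))
    return durations


def summarize(values):
    """Minimum / average / maximum of a non-empty list of durations."""
    return {"min": min(values), "avg": mean(values), "max": max(values)}


def red_flags(values, threshold_ms):
    """Durations above the threshold, i.e. candidate spikes."""
    return [value for value in values if value > threshold_ms]


# Made-up sample data, only there to exercise the functions.
sample_log = [
    "15:00:01 saveTrade 450ms",
    "15:00:02 sqlUpdate 120ms",
    "15:00:03 saveTrade 1200ms",
    "15:00:04 saveTrade 600ms",
]
durations = parse_durations(sample_log)
save_trade_stats = summarize(durations["saveTrade"])
save_trade_spikes = red_flags(durations["saveTrade"], threshold_ms=1000)
```

Lines that don't match the pattern are silently skipped here, which is convenient for a sketch but hides surprises in real logs.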

The scripts we had were doing the job, but were pretty slow considering the amount of data we had to process. More than one hour could be spent just running the scripts on a subset of the log files.

Analyzing the scripts, I thought that I could implement something more effective,(...)

Ding! Be a fast code-reader

Commando programming may involve reading a lot of code that's not yours before being able to do anything else. This has been written before: we usually spend far more time reading code than writing it. Reading code faster will necessarily speed you up.

(...) more effective so I started prototyping something using Ruby. Why Ruby? Because I knew I would be fast to implement my ideas with it. Or was I?

Ding! Know at least one scripting language really well

No, I wasn't so fast,... I hadn't programmed in Ruby for quite some time and I had forgotten many of the idioms or methods of the core API. My head is so full of Scala nowadays that I would certainly have been faster using Scala for that task. The moral here is that knowing a "scripting" language is very helpful but you really need to have everything in your mental RAM in order to be effective.

This goes for API knowledge as well as "environmental" knowledge: you should spend no time figuring out how to develop/build/test/deploy your code. Being stuck for 2 hours because you don't know how to set a class path, for example, is a huge waste of time.

Anyway, I was able to show the effectiveness of using a language like Ruby to speed up the analysis time so I decided to officially port the scripts from shell to another language. But now which language should I select for my commando developments?

Choosing a language is a vast question and the criteria governing that choice may not be entirely related to the intrinsic language features. This choice may also be governed by more "contextual" considerations, like the ability of other developers to learn the language and associated tools.

In my case, Groovy was the winner for these reasons:

It has all sorts of features to help write code faster like closures, literals,...
It can access Java libraries and I was planning to integrate some Database analysis tools we had written in Java
It's one of the closest languages to Java on the JVM so other developers will be able to pick up the new scripts faster. And it is usually better known than JRuby, Scala or Jython for example

Ding! Know your platform ecosystem

I also knew, when choosing Groovy, that I would be able to use all sorts of niceties. One of them was the Ant integration which allowed me to easily reuse the exec task (see below). Knowing my way around Ant was a good thing (I know that's not exceptional for a Java developer, it's just an example ;-) ).

Rewrite everything?

That was the most "dangerous" part of this project.

I would honestly have preferred avoiding it. Since I had to replace one small part of the scripts with some new Groovy logic and also, since the scripts would need to be extended anyway, I decided to rewrite them all in Groovy.

Ding! Don't rewrite anything

Haha, I just said I did it! Well, not that you can never rewrite code or some library part, but you have to be damn sure about what you're doing. So,...

Ding! Take careful risks and prototype

Doing "Commando programming" will undoubtedly require taking risks and trying out some solutions before going on. More than ever, it needs to be done with lots of care because the last thing you want is to find yourself in the middle of the road at the end of the week, because you didn't have time to finish implementing your new "Grand Vision", whatever it is.

In my case I mitigated the risks a lot by "wrapping" the existing shell commands in my Groovy scripts. So I was essentially running the same thing but executed from Groovy! That was really worth it because in the process, I could also considerably reduce the amount of existing code thanks to the usual Groovy tricks.
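The "wrapping" idea can be sketched as follows. This is Python rather than Groovy, and the `run_shell` helper and the `echo` command are stand-ins I made up for illustration: the point is only that the old command runs unchanged while the new logic is layered on top of its output.

```python
import subprocess


def run_shell(command):
    """Run an existing shell command unchanged and return its stdout lines.

    shell=True keeps the original command line as-is; that is only
    reasonable for trusted, hard-coded commands like legacy scripts.
    """
    completed = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return completed.stdout.splitlines()


def lines_mentioning(lines, keyword):
    """New logic written in the host language, on top of the old output."""
    return [line for line in lines if keyword in line]


# "echo" stands in for one of the legacy shell commands (hypothetical).
legacy_output = run_shell("echo saveTrade 450ms")
save_trade_lines = lines_mentioning(legacy_output, "saveTrade")
```

With `check=True` a failing legacy command raises `subprocess.CalledProcessError` instead of being silently ignored, which is usually what you want when the old script is the part you trust least.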

Test or not to test

Now we come to the meat of the discussion! TDD, BDD, unit tests, code-and-see, what should be done? I honestly don't have a definitive answer but here's what I did.

Ding! Have at least one high-level functional test

You need to have at least one high-level functional test guiding you in your developments. Something which you can come back to and say: "this case still doesn't pass, I'm not finished yet". You may actually have implemented more code than just what was necessary to pass that test case but at least this primary use case should be ok.

Ding! Test the difficult stuff

Complex logic is definitely something you want to isolate and test early. If you don't unit test it thoroughly, it will usually haunt you in the most painful way: hidden inside all the rest. Besides, this complex logic usually means "value" for your customer, more than anything else, so it's worth cherishing it.

Ding! Isolate the file system

Actually, you should soon realize that any interaction with the file system is slowing you down.
Being able to mock the file system, or to write to temporary files with automatic clean up is a huge time-saver for unit testing. In a "Commando" situation you would be prepared for this and have all sorts of functions and ready-to-use designs to help with it. In my case, I had to write a lot of it,...
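As an illustration of the "temporary files with automatic clean up" option, here is a small Python sketch using the standard `tempfile` module. The `count_lines` function and the file name are hypothetical; they only play the role of "some code that reads a log file".

```python
import tempfile
from pathlib import Path


def count_lines(log_path):
    """Hypothetical code under test: counts the non-empty lines of a log file."""
    with open(log_path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def check_count_lines():
    # The directory and everything in it is deleted when the "with" block exits,
    # so the test leaves nothing behind on the file system.
    with tempfile.TemporaryDirectory() as tmp:
        log_path = Path(tmp) / "server.log"
        log_path.write_text("request 1\n\nrequest 2\n", encoding="utf-8")
        result = count_lines(log_path)
    return result, log_path.exists()


line_count, still_exists = check_count_lines()
```

Having one such helper ready before the week starts is exactly the kind of preparation that pays off in a "Commando" situation.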

Ding! Use the console

The Read-Eval-Print loop. In my developments I found it very useful to test my regular expressions interactively inside the Groovy console. Whether or not I turned those expressions into actual unit tests,... I didn't do it all the time I must say. One reason is that, once I had confidence in a regular expression and it was encapsulated in a higher-level concept, like "extractClassName", it would be right forever.
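Here is what such an encapsulation might look like, sketched in Python. The regular expression and the log line shape are my own assumptions (a Java-style fully qualified name somewhere in the line), not the expression used on the project. The expression is the kind of thing you would refine interactively in a console, then freeze behind a name.

```python
import re

# Assumption: the line contains a Java-style fully qualified class name,
# e.g. "com.acme.trade.TradeSaver" (lowercase packages, capitalized class).
QUALIFIED_CLASS = re.compile(r"\b(?:[a-z_]\w*\.)+([A-Z]\w*)")


def extract_class_name(line):
    """Return the simple class name found in the line, or None if there is none."""
    match = QUALIFIED_CLASS.search(line)
    return match.group(1) if match else None


class_name = extract_class_name("ERROR com.acme.trade.TradeSaver - timeout")
```

Callers only see `extract_class_name`, so the expression can change later without touching the rest of the script.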

Ding! Have lots of sample data

If possible try to gather as much sample data as you can and run it through your program. Does it blow up right away? Well, that's actually good, it may be the sign that you forgot a requirement. Instead of spending time trying to refine your requirements until they're exhaustive, or trying to think about all possible cases, run the system with the maximum data available. There are 2 drawbacks to this: one is the time spent re-executing large datasets and analyzing failures, the other is not knowing what to look at! Your program may not break with enough smoke to prove you wrong.

On my project I had lots of log files I could run my scripts against. And fortunately my scripts broke in very unambiguous ways, showing where I was wrong.
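A small harness helps with the first drawback: rather than stopping at the first failure, collect them all and look at them together. The sketch below is Python with a deliberately naive, hypothetical parser and made-up sample lines.

```python
def parse_duration_ms(line):
    """Hypothetical parser: expects '<name> <number>ms', raises ValueError otherwise."""
    name, raw = line.split()
    if not raw.endswith("ms"):
        raise ValueError(f"unexpected unit in {line!r}")
    return name, int(raw[:-2])


def run_against_samples(lines):
    """Feed every sample line to the parser; collect failures instead of stopping."""
    parsed, failures = [], []
    for number, line in enumerate(lines, start=1):
        try:
            parsed.append(parse_duration_ms(line))
        except ValueError as error:
            failures.append((number, line, str(error)))
    return parsed, failures


# Made-up sample data: two lines the naive parser did not anticipate.
samples = ["saveTrade 450ms", "saveTrade 1.2s", "sqlUpdate 120ms", "garbage"]
parsed, failures = run_against_samples(samples)
failed_line_numbers = [number for number, _, _ in failures]
```

Each failure is a candidate forgotten requirement (here: durations expressed in seconds, and lines that are not measurements at all).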

To design or not to design

So you're programming like hell, you don't have time to draw all those nifty UML diagrams or devise the best design patterns for the job. "We'll design when we have time."

But is it really buying you time? I don't think so. I'm not saying that you should draw diagrams or add tons of interfaces and clever indirections.

But you need design in the sense that the problem and solution domains should be very, very clear to you.

Ding! Time is money, concepts are time

So it's worth taking a bit of time to think: "What is this system really like?". Clarifying the concepts is very, very important. For us, it was first of all giving proper names to our analysis concepts:

An entity can have measurable properties ("A trade server has requests")
For each property, you can define several quantities ("each request can take some average time")
For a given "Test environment", we can run several "Test campaigns"

This helped a lot in our discussions, in structuring the scripts and planning our actions.
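Written down as code, the vocabulary above could look like the following Python sketch. The class and field names are my own transcription of the concepts, and the way campaigns hold entities is an assumption; the real scripts were not necessarily structured this way.

```python
from dataclasses import dataclass, field


@dataclass
class Quantity:
    """Something measured for a property, e.g. an average time."""
    name: str
    value_ms: float


@dataclass
class Property:
    """A measurable property of an entity, e.g. its requests."""
    name: str
    quantities: list = field(default_factory=list)


@dataclass
class Entity:
    """Something we analyze, e.g. a trade server."""
    name: str
    properties: list = field(default_factory=list)


@dataclass
class Campaign:
    """One test campaign; attaching entities to it is an assumption."""
    name: str
    entities: list = field(default_factory=list)


@dataclass
class Environment:
    """A test environment in which several campaigns are run."""
    name: str
    campaigns: list = field(default_factory=list)


# Illustrative values only.
requests = Property("requests", [Quantity("average time", 750.0)])
server = Entity("trade server", [requests])
environment = Environment("UAT", [Campaign("campaign 1", [server])])
```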

The next step was to realize that the type of queries we wanted to execute to analyze logging data looked pretty much like SQL queries: "between 15:00 and 15:03, return all update queries with a time > 300ms and group them by workflow rules". This gave us a good hint that the next step for our scripts would be to put everything in a database!
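To show how directly that sentence maps to SQL, here is a sketch using Python's built-in `sqlite3` module with an in-memory database. The table layout, column names and rows are invented for the example; we had not built this database at the time.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# Hypothetical schema: one row per logged query.
connection.execute(
    "CREATE TABLE queries "
    "(logged_at TEXT, kind TEXT, workflow_rule TEXT, time_ms INTEGER)"
)
connection.executemany(
    "INSERT INTO queries VALUES (?, ?, ?, ?)",
    [
        ("15:00:30", "update", "validate", 450),
        ("15:01:10", "update", "validate", 320),
        ("15:01:45", "select", "validate", 900),
        ("15:02:05", "update", "enrich", 150),
        ("15:02:40", "update", "enrich", 610),
        ("15:05:00", "update", "enrich", 700),
    ],
)

# "Between 15:00 and 15:03, return all update queries with a time > 300ms
# and group them by workflow rule."
slow_updates = connection.execute(
    """
    SELECT workflow_rule, COUNT(*), MAX(time_ms)
    FROM queries
    WHERE logged_at BETWEEN '15:00:00' AND '15:03:00'
      AND kind = 'update'
      AND time_ms > 300
    GROUP BY workflow_rule
    ORDER BY workflow_rule
    """
).fetchall()
connection.close()
```

Storing the time as a zero-padded `HH:MM:SS` string keeps the sketch short, since such strings compare in chronological order; a real schema would more likely use full timestamps.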

Coding

Consider this:

"I know where I'm going, I have a clear mind about my system and its concepts, I know how to write minimal and effective tests, is there anything more I could do now to speed up my coding?"

Let's reformulate the sentence above:

"I know I want a gold medal at the Olympics, I know I'll get it by being a table tennis champion and I know the rules of the game, I have a very good racket, is there anything more I could do now to be the best at table tennis?"

Ding! PPP: Practice, Practice, Practice!!!

That's so obvious I'm ashamed to have to write it,... but that makes a real difference for "Commando programming":

technique: practice your regular-expression-fu for example, or know how to write a parser for a simple language (I planned to do that for the verbose gc logs but we had neither the time nor the interest to rewrite this part)
libraries: no time should be spent looking at the API docs
tools: if you need remote debugging, it should be fast to set up, because you did it thousands of times before
computer science: you should have at least a good sense of the complexity of your algorithms
integration: how to call Excel from Groovy to create graphics was one of our coding issues
system / network tools: that wasn't too necessary on our project, but I can imagine this making a real difference in other settings

Cooling down for a minute

I also noticed two things of equal importance during that week. First of all, when I tried to go fast I couldn't help but notice entropy. Yes my code was doing the job, but at the same time it wasn't always consistent, documented, properly named or refactored.

Ding! Cool down and document/review your code

This really can look like a pure loss of time. But I really observed that it wasn't. For 2 reasons:

Actually it doesn't take that long to review/document/refactor the code (try it at home)! Except maybe for the naming part, it's a bit like playing sudoku, you just try to make things fall into place
I observed that I was much less dragged down when implementing new features on clear-cut code with obvious intentions, I had much less mental noise. You know, like: "This computeTotal function is actually also sorting the results, so I know I don't have to do it there". If computing and sorting are better separated or named, that can reduce the "mental tax" while reasoning about something else.
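The computeTotal example can be sketched like this in Python (the function names and the numbers are illustrative): each function does what its name says and nothing else, so there is no hidden side effect to remember.

```python
def compute_total(durations_ms):
    """Only sums; does not reorder or modify its input."""
    return sum(durations_ms)


def sort_slowest_first(durations_ms):
    """Only sorts; returns a new list and leaves the input untouched."""
    return sorted(durations_ms, reverse=True)


durations = [450, 1200, 600]
total = compute_total(durations)
ranked = sort_slowest_first(durations)
```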

The second thing is the equivalent of "cooling down the code",... applied to your body!

Ding! Cool down and take a break

We're not machines. I was so enthusiastic about that project that I almost coded day and night (and I was away from my family,...). But I also observed one very interesting phenomenon: time was slowing down! Sometimes I was just staring at my screen without realizing that 5 minutes had passed without doing anything substantial. That should be a clear signal for taking a break. This gives me an interesting idea: "stopwatch" programming. Prepare the list of tasks you want to program and program features in cycles of 5, 7 or 10 minutes. And watch out when you have almost empty cycles.

The doctor says: fit for service

Conclusion for the project: it was very successful (this doesn't happen all the time so I'm glad to report it). We were able to analyze and spot 3 major bottlenecks and give good recommendations so that the maximum time to save a trade was eventually around 400 ms. The tools served us, but actually our brains served us more.

But what's the biggest room in the world? The room for improvement ;-) ! I think we'll be able to transfer more of that wisdom to smarter analysis scripts in the future.

Back to the original question

When you program for a trader you don't have time to write tests

True? False? In the light of my experience on this project, the first thing I'm really convinced of is the first part of the sentence: "You don't have time"!

And if you don't have time, you can essentially do 2 things:

Observe and experiment: whatever saves you time is good. In my case, writing tests is a proven way to go faster for example
Practice, learn, invest: new tools, new languages, new libraries, contests, coding dojos,...

Ok. Thanks for listening, I leave you there, I have to go,... and practice my "Commando-fu"!


15 October 2008

Commando programming