
10 December 2013

The revenge of the chunks

This series of posts feels like a whole saga for something which should have been a quick and easy way to demonstrate the obvious superiority of functional programming over a simple loop. In the first post I wrote a special-purpose runState method to process a file with scalaz-stream and validate its internal structure. Then the second post was about defining proper scalaz-stream combinators to do the same thing, and particularly how to "chunk" the processing in order to get good performance.

However as I was writing unit tests for my requirements I realized that the problem was harder than I thought. In particular, the files I'm processing can have several sections made of HEADERs and TRAILERs. When you create chunks of lines to process this results in a number of combinations that need to be analysed. A chunk can:

  • start with a HEADER but not finish with a TRAILER which is in another chunk
  • contain lines only
  • contain lines + a TRAILER + a new HEADER
  • and so on...

For each of these cases it is necessary to use the current state and the contents of the lines to determine if the file is malformed or not. This is a lot less easy than previously.

All the combinations

This is what I came up with:

  def process(path: String, targetName: String, chunkSize: Int = 10000): String \/ File = {

    val targetPath = path.replace(".DAT", "")+targetName

    val read = 
      linesRChunk(path, chunkSize) |> 
      validateLines.map(lines => lines.mkString("\n"))

    val task = 
      ((read |> process1.intersperse("\n") |> 
      process1.utf8Encode) to io.fileChunkW(targetPath)).run

    task.attemptRun.leftMap(_.getMessage).map(_ => new File(targetPath))
  }

  /**
   * validate that the lines have the right sequence of HEADER/column names/lines/TRAILER
   * and the right number of lines
   */
  def validateLines: Process1[Vector[String], Vector[String]] = {

    // feed lines into the lines parser with a given state
    // when it's done, follow by parsing with a new state
    def parse(lines: Vector[String], state: LineState, newState: LineState) =
      emit(lines) |> linesParser(state) fby linesParser(newState)

    // parse chunks of lines
    def linesParser(state: LineState): Process1[Vector[String], Vector[String]] = {
      receive1[Vector[String], Vector[String]] { case lines =>
        lines match {
          case first +: rest if isHeader(first) =>
            if (state.openedSection) fail("A trailer is missing")
            else
              parse(lines.drop(2),
                    state.open,
                    LineState(lines.count(isHeader) > lines.count(isTrailer), 
                              lines.drop(2).size))

          case first +: rest if isTrailer(first) =>
            val expected = "\\d+".r.findFirstIn(first).map(_.toInt).getOrElse(0)

            if (!state.openedSection)             
              fail("A header is missing")
            else if (state.lineCount != expected) 
              fail(s"expected $expected lines, got ${state.lineCount}")
            else {
              val dropped = lines.drop(1)
              parse(dropped,
                    state.restart,
                    LineState(dropped.count(isHeader) > dropped.count(isTrailer), 
                              dropped.size))
            }

          case first +: rest =>
            if (!state.openedSection) fail("A header is missing")
            else {
              val (first, rest) = lines.span(line => !isTrailer(line))
              emit(first) fby
              parse(rest, state.addLines(first.size), state.addLines(lines.size))
            }

          case Vector() => halt
        }
      }
    }
    // initialise the parsing expecting a HEADER
    linesParser(LineState())
  }


  private def fail(message: String) = Halt(new Exception(message))
  private def isHeader(line: String) = line.startsWith("HEADER|")
  private def isTrailer(line: String) = line.startsWith("TRAILER|")

The bulk of the code is the validateLines process which verifies the file structure:

  • if the first line of this chunk is a HEADER the next line needs to be skipped, we know we opened a new section, and we feed the rest to the lines parser again. However we fail the process if we were not expecting a HEADER there

  • if the first line of this chunk is a TRAILER we do something similar but we also check the expected number of lines

  • otherwise we try to emit as many lines as possible until the next HEADER or TRAILER and we recurse

This is a bit complex because we need to analyse the first element of the chunk, then emit the rest and calculate the new state we will have when this whole chunk is emitted. On the other hand the processor is easy to test because I don't have to read or write files to check it. This would be a bit more difficult to do with the loop version.
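The LineState class used by validateLines is not shown in this post. A minimal sketch of what it could look like, assuming only what the code above needs (a two-argument constructor, the openedSection and lineCount fields, and the open, restart and addLines methods), is given here. The exact definition is an assumption, not the original one:

```scala
// Hypothetical reconstruction: the post does not show this class.
object LineStateSketch {
  final case class LineState(openedSection: Boolean = false, lineCount: Int = 0) {
    // a HEADER was read: a section is now open and no line has been counted yet
    def open: LineState = LineState(openedSection = true, lineCount = 0)

    // a TRAILER was read: back to the initial state, expecting a new HEADER
    def restart: LineState = LineState()

    // n more data lines have been seen in the current section
    def addLines(n: Int): LineState = copy(lineCount = lineCount + n)
  }
}
```

With a definition along these lines, LineState() is the initial "no section opened" state passed to linesParser, and each transition can be unit-tested without reading any file.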

But unfortunately not all the tests are green. One is still not passing. What if there is no ending TRAILER in the file? How can I raise an exception? There's no process to run, because there are no more lines to process! My test is pending for now, and I'll post the solution once I have it (maybe there's a smarter way to rewrite all of this?).

Is it worth it?

This was definitely worth it for me in terms of learning the scalaz-stream library. However in terms of pure programmer "productivity", for this kind of requirement, it feels like overkill. The imperative solution is very easy to come up with and there are no problems with performance. This should change once streaming parsing is available (see the roadmap). Probably this use case will just be expressed as a one-liner. In the light of this post I'm just curious how the implementation will deal with chunking.

09 December 2013

`runState` 0 - combinators 1

In my previous blog post I was trying to implement a runState method with scalaz-stream to process a file and try to validate its internal structure. That was however not a good solution because:

  • it doesn't use combinators but a special purpose runState method
  • it stackoverflows on large files!

It turns out that there is a much better way of dealing with this use case.

Combinators

First of all it is possible to propagate some state with scalaz-stream without having to write a special runState method. The following uses only combinators to do the job:

def process(path: String, targetName: String): String \/ File = {

  val HEADER  = "HEADER(.*)".r
  val TRAILER = "TRAILER\\|(\\d+)".r

  val lineOrTrailer: Process1[String, String]  = {
    def go(lines: Int): Process1[String, String] =
      receive1[String, String] {
        case TRAILER(count) => 
          if (count.toInt == lines) halt 
          else Halt(new Exception(s"Expected $count lines, but got $lines"))
        case HEADER(h)      => 
          Halt(new Exception(s"Didn't expect a HEADER here: $h"))
        case s              => 
          emit(s) fby go(lines + 1)
      }
    go(0)
  }

  val linesStructure =
    discardRegex("HEADER.*") fby
    discardLine              fby
    lineOrTrailer

  val read       = io.linesR(path) |> linesStructure
  val targetPath = path.replace(".DAT", "")+targetName
  val task       = 
    ((read |> process1.intersperse("\n") |> 
     process1.utf8Encode) to io.fileChunkW(targetPath)).run

  task.attemptRun.leftMap(_.getMessage).map(_ => new File(targetPath))
}

val discardLine = receive1[String, String] { _ => halt }

/** discard a line if it matches the expected pattern */
def discardRegex(pattern: String): Process1[String,String] = {
  val compiled = Pattern.compile(pattern)
  receive1[String, String] { line =>
    if (compiled.matcher(line).matches) halt
    else Halt(new Exception(s"Failed to parse $line, does not match regex: $pattern"))
  }
}

With the code above, processing a file amounts to:

  • reading the lines
  • analysing them with linesStructure which propagates the current state, the number of lines already processed, with a recursive method (go) calling itself
  • writing the lines to a new file

The linesStructure method almost looks like a parser combinators expression with parsers sequenced with the fby ("followed by") method.

That looks pretty good but... it performs horribly! With the good-old "loop school", it took 8 seconds to process a 700M file:

def processLoop(path: String, targetName: String): String \/ File = {

  val targetPath = path.replace(".DAT", "")+targetName
  val writer = new FileWriter(targetPath)
  val source = scala.io.Source.fromFile(new File(path))
  var count = 0
  var skipNextLine = false
  try {
    source.getLines().foreach { line =>
      if (line.startsWith("HEADER")) skipNextLine = true
      else if (skipNextLine) { skipNextLine = false }
      else if (line.startsWith("TRAILER")) {
        // parse the whole count after "TRAILER|", not just its first character
        val expected = scala.util.Try(line.drop(8).trim.toInt).getOrElse(0)
        if (expected != count) throw new Exception(s"expected $expected, got $count")
      }
      else {
        count = count + 1
        writer.write(line + "\n")
      }
    }
    // returning from inside the try block keeps a failure from being discarded
    new File(targetPath).right
  } catch {
    case t: Throwable => t.getMessage.left
  } finally {
    source.close
    writer.close
  }
}

With the nice "no variables, no loop" method it took almost... 8 minutes!

Chunky streaming

Fortunately it is possible to recover decent performance by "chunking" the lines before processing them. To do this, we need a new combinator, very close to the io.linesR combinator in scalaz-stream:

// read a file, returning one "chunk" of lines at a time
def linesRChunk(filename: String, chunkSize: Int = 10000): Process[Task, Vector[String]] =
 io.resource(Task.delay(scala.io.Source.fromFile(filename)))(src => Task.delay(src.close)) { src =>
    lazy val lines = src.getLines.sliding(chunkSize, chunkSize) // A stateful iterator
    Task.delay {
      if (lines.hasNext) lines.next.toVector
      else throw End
    }
 }
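The chunking step itself can be tried without scalaz-stream. A minimal sketch, assuming a plain Iterator of lines: calling sliding with a step equal to the window size yields non-overlapping groups, which is what grouped does as well. The ChunkSketch object and the chunks function are illustrative names only:

```scala
object ChunkSketch {
  // Split an iterator of lines into non-overlapping chunks of at most chunkSize lines.
  // The last chunk may be smaller than chunkSize.
  def chunks(lines: Iterator[String], chunkSize: Int): Iterator[Vector[String]] =
    lines.grouped(chunkSize).map(_.toVector)

  // The same thing written with sliding, as in linesRChunk above.
  def chunksWithSliding(lines: Iterator[String], chunkSize: Int): Iterator[Vector[String]] =
    lines.sliding(chunkSize, chunkSize).map(_.toVector)
}
```

Both functions are lazy: a chunk is only read from the underlying iterator when it is requested, so at most one chunk of lines is materialised at a time.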

Now we can process each chunk with:

def process(path: String, targetName: String, bufferSize: Int = 1): String \/ File = {

  val HEADER  = "HEADER(.*)".r
  val TRAILER = "TRAILER\\|(\\d+)".r

  def linesParser(state: LineState): Process1[Vector[String], Vector[String]] = {

    def onHeader(rest: Vector[String]) =
      (emit(rest) |> linesParser(ExpectLineOrTrailer(0))) fby
      linesParser(ExpectLineOrTrailer(rest.size))

    def onLines(ls: Vector[String], actual: Int) =
      emit(ls) fby linesParser(ExpectLineOrTrailer(actual + ls.size))

    def onTrailer(ls: Vector[String], count: Int, actual: Int) =
      if ((actual + ls.size) == count) emit(ls)
      else fail(new Exception(s"expected $count lines, got ${actual + ls.size}"))

    receive1[Vector[String], Vector[String]] { case lines =>
      (lines, state) match {
        case (Vector(),                  _)                      => 
          halt
        case (HEADER(_) +: cols +: rest, ExpectHeader)           => 
          onHeader(rest)
        case (_,                         ExpectHeader)           => 
          fail(new Exception("expected a header"))
        case (ls :+ TRAILER(count),      ExpectLineOrTrailer(n)) =>
          onTrailer(ls, count.toInt, n)
        case (ls,                        ExpectLineOrTrailer(n)) => 
          onLines(ls, n)
      }
    }
  }

  val targetPath = path.replace(".DAT", "")+targetName

  val read = linesRChunk(path, bufferSize) |> 
             linesParser(ExpectHeader).map(lines => lines.mkString("\n"))
  val task = 
    ((read |> process1.intersperse("\n") |> 
    process1.utf8Encode) to io.fileChunkW(targetPath)).run

  task.attemptRun.leftMap(_.getMessage).map(_ => new File(targetPath))
}

The linesParser method uses receive1 to analyse:

  • the current state: are we expecting a HEADER, or some lines followed by a TRAILER?
  • the current chunk of lines

When we expect a HEADER and we have one, we skip the row containing the column names (see onHeader), we emit the rest of the lines to the linesParser (this is the recursive call) and we change the state to ExpectLineOrTrailer. If we get some lines with no TRAILER, we emit those lines and make a recursive call to linesParser with an incremented count to signal how many lines we've emitted so far (in the onLines method). Finally, if we get some lines and a TRAILER we check that the expected number of lines is equal to the actual one before emitting the lines and stopping the processing (no more recursive call in onTrailer).

For reference, here are the state objects used to track the current processing state:

sealed trait LineState

case object ExpectHeader                            extends LineState
case class  ExpectLineOrTrailer(lineCount: Int = 0) extends LineState

This new way of processing lines gets us:

  • a readable state machine with clear transitions, which was my first objective
  • adequate performance; it takes around 10 seconds to process a 700M file which is slightly more than the processLoop version but acceptable
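The transitions of this state machine can also be exercised without any streaming library. The following is only a sketch in plain Scala, not the scalaz-stream code above: it folds over a sequence of chunks with an Either for errors. The Done state, the step and validate functions and the Result alias are assumptions introduced for the illustration:

```scala
object ChunkedValidationSketch {
  sealed trait LineState
  case object ExpectHeader extends LineState
  final case class ExpectLineOrTrailer(lineCount: Int = 0) extends LineState
  case object Done extends LineState // hypothetical: a TRAILER was read, later input is ignored

  private val Header  = "HEADER(.*)".r
  private val Trailer = """TRAILER\|(\d+)""".r

  // either an error message, or the new state with the lines emitted so far
  type Result = Either[String, (LineState, Vector[String])]

  def step(state: LineState, out: Vector[String], chunk: Vector[String]): Result =
    (chunk, state) match {
      case (Vector(), _) =>
        Right((state, out))
      case (Header(_) +: _ +: rest, ExpectHeader) =>
        // skip the HEADER and the column names, then parse the rest of the chunk
        step(ExpectLineOrTrailer(0), out, rest)
      case (_, ExpectHeader) =>
        Left("expected a header")
      case (ls :+ Trailer(count), ExpectLineOrTrailer(n)) =>
        val actual = n + ls.size
        if (BigInt(count) == BigInt(actual)) Right((Done, out ++ ls))
        else Left(s"expected $count lines, got $actual")
      case (ls, ExpectLineOrTrailer(n)) =>
        Right((ExpectLineOrTrailer(n + ls.size), out ++ ls))
      case (_, Done) =>
        Right((state, out))
    }

  // run all the chunks through the state machine, stopping at the first error
  def validate(chunks: Seq[Vector[String]]): Either[String, Vector[String]] =
    chunks
      .foldLeft[Result](Right((ExpectHeader, Vector.empty[String]))) {
        case (Right((state, out)), chunk) => step(state, out, chunk)
        case (error, _)                   => error
      }
      .map(_._2)
}
```

Like the linesParser above, this sketch expects the HEADER and the column names to arrive in the same chunk, and it accumulates all the emitted lines in memory, which a real streaming version would not do.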

One other explored avenue

It took me a loooooooooooong time to get there. I think I hit this issue when trying to use the built-in chunk combinator. When using chunk, my parser was being fed the same lines several times. For a chunk of 10 lines, I first had the first line, then the first 2, then the first 3,... Even with a modified version of chunk the performances were still very bad. This is why I wrote my own linesRChunk.

Now that I've got something working, I hope that this will save others some development time and show that it is possible to avoid loops + variables in that case!

05 December 2013

`runState` for a scalaz-stream Process

I was preparing to post this on the scalaz mailing-list but I thought that a short blog post could serve as a reference for other people as well. The following assumes that you have a good knowledge of Scalaz (at least of what's covered in my "Essence of the Iterator Pattern" post) and some familiarity with the scalaz-stream library.

My use case

What I want to do is very common, just process a bunch of files! More precisely I want to (this is slightly simplified):

  1. read some pipe delimited files

  2. validate that the files have the proper internal structure:
    one(HEADER marker)
    one(column names)
    many(lines of pipe delimited values)
    one(TRAILER marker with total number of lines since the header)

  3. output only the lines which are not markers to another file

Scalaz stream

The excellent chapter 15 of Functional Programming in Scala highlights some of the potential problems with processing files:

  • you need to make sure you are closing resources properly even in the face of exceptions
  • you want to be able to easily compose small processing functions together instead of having a gigantic loop and a bunch of variables
  • you want to control the amount of data that is in memory at any moment in time

Based on the ideas of the book, Paul Chiusano created scalaz-stream, a library providing lots of combinators for doing this kind of input/output streaming operations (and more!).

A state machine for the job

My starting point for addressing our requirements is to devise a State object representing both the expected file structure and the fact that some lines need to be filtered out. First of all I need to model the kind of lines I'm expecting when reading the file:

sealed trait LineState

case object ExpectHeader                            extends LineState
case object ExpectHeaderColumns                     extends LineState
case class  ExpectLineOrTrailer(lineCount: Int = 0) extends LineState

As you can see ExpectLineOrTrailer contains a counter to keep track of the number of lines seen so far.

Then I need a method (referred to as the State function below) to update this state when reading a new line:

def lineState(line: String): State[Throwable \/ LineState, Option[String]] =
  State { state: Throwable \/ LineState =>
    def t(message: String) = new Exception(message).left

    (state, line) match {
      case (\/-(ExpectHeader), HeaderLine(_))           =>
        (ExpectHeaderColumns.right, None)
      case (\/-(ExpectHeaderColumns), _)                =>
        (ExpectLineOrTrailer(0).right, None)
      case (\/-(ExpectHeader), _)                       =>
        (t("expecting a header"), None)
      case (\/-(ExpectLineOrTrailer(n)), HeaderLine(_)) =>
        (t("expecting a line or a trailer"), None)
      case (\/-(ExpectLineOrTrailer(n)), TrailerLine(e)) =>
        if (n == e) (ExpectHeader.right, None)
        else        (t(s"wrong number of lines, expecting $e, got $n"), None)
      case (\/-(ExpectLineOrTrailer(n)), _)             =>
        (ExpectLineOrTrailer(n + 1).right, Some(line))
      case (-\/(e), _)                                  =>
        (state, None)
    }
  }

The S type parameter (in the State[S, A] type) used to keep track of the "state" is Throwable \/ LineState. I'm using the "Left" part of the disjunction to represent processing errors. The error type itself is a Throwable. Originally I was using any type E but we'll see further down why I had to use exceptions. The value type A I extract from State[S, A] is going to be Option[String] in order to output None when I encounter a marker line.
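The HeaderLine and TrailerLine extractors used in lineState are not shown in this post. A minimal sketch of what they could look like, assuming the "HEADER|" and "TRAILER|<count>" line formats of the test file used later on (the definitions here are an assumption, not the original ones):

```scala
// Hypothetical extractors: the post uses HeaderLine and TrailerLine without defining them.
object LineExtractorsSketch {

  // matches a line like "HEADER|file" and extracts what follows the marker
  object HeaderLine {
    def unapply(line: String): Option[String] =
      if (line.startsWith("HEADER|")) Some(line.stripPrefix("HEADER|")) else None
  }

  // matches a line like "TRAILER|3" and extracts the expected number of lines
  object TrailerLine {
    private val Trailer = """TRAILER\|(\d+)""".r

    def unapply(line: String): Option[Int] = line match {
      case Trailer(count) => scala.util.Try(count.toInt).toOption
      case _              => None
    }
  }
}
```

TrailerLine extracts an Int so that the comparison n == e in lineState compares two numbers rather than a number and a string.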

This is all pretty good, functional and testable. But how can I use this state machine with a scalaz-stream Process?

runState

After much head scratching and a little help from the mailing-list (thanks Pavel!) I realized that I had to write a new driver for a Process. Something which would understand what to do with a State. Here is what I came up with:

def runState[F[_], O, S, E <: Throwable, A](p: Process[F, O])
                                           (f: O => State[E \/ S, Option[A]], initial: S)
                                           (implicit m: Monad[F], c: Catchable[F]) = {

  def go(cur: Process[F, O], init: S): F[Process[F, A]] = {
    cur match {
      case Halt(End) => m.point(Halt(End))
      case Halt(e)   => m.point(Halt(e))

      case Emit(h: Seq[O], t: Process[F, O]) => {
        println("emitting lines here!")
        val state = h.toList.traverseS(f)
        val (newState, result) = state.run(init.right)
        newState.fold (
          l => m.point(fail(l)),
          r => go(t, r).map(emitAll(result.toSeq.flatten) ++ _)
        )
      }

      case Await(req, recv, fb: Process[F, O], cl: Process[F, O]) =>
        m.bind (c.attempt(req.asInstanceOf[F[Any]])) { _.fold(
        { case End => go(fb, init)
          case e   => go(cl.causedBy(e), init) },
        o => go(recv.asInstanceOf[Any => Process[F ,O]](o), init)) }
    }
  }
  go(p, initial)
}

This deserves some comments :-)

The idea is to recursively analyse what kind of Process we're currently dealing with:

  1. if this is a Halt(End) we've terminated processing with no errors. We then return a halted process, Halt(End), in the context of F (hence the m.point operation). F is the monad that provides us input values so we can think of all the computations happening here as happening inside F (probably a scalaz.concurrent.Task when reading file lines)

  2. if this is a Halt(error) we use the Catchable instance for F to instruct the input process what to do in the case of an error (probably close the file, clean up resources,...)

  3. if this is an Emit(values, rest) we traverseS the list of values in memory with our State function and we use the initial value to get: 1. the state at the end of the traversal, 2. all the values returned by our State at each step of its execution. Note that the traversal will happen on all the values in memory, there won't be any short-circuiting if the State indicates an error. Also, this is important, the traverseS method is not trampolined. This means that we will get StackOverflow exceptions if the "chunks" that we are processing are too big. On the other hand we will avoid trampolining on each line so we should get good performances. If there was an error we stop all processing and return the error otherwise we emit all the values collected by the State appended to a recursive call to go

  4. if this is an Await Process we attempt to read input values, with c.attempt, and use the recv function to process them. We can do that "inside the F monad" by using the bind (or flatMap) method. The resulting Process is sent to go in order to be processed with the State function

Note what we do in case 3. when the newState returns an exception.left. We create a Process.fail process with the exception. This is why I used a Throwable to represent errors in the State function.

Now let's see how to use this new "driver".

Let's use it

First of all, we create a test file:

import scalaz.stream._
import Process._

val lines = """|HEADER|file
               |header1|header2
               |val11|val12
               |val21|val22
               |val21|val22
               |TRAILER|3""".stripMargin

// save 100 times the lines above in a file
fill(100)(lines).intersperse("\n").pipe(process1.utf8Encode)
  .to(io.fileChunkW("target/file.dat")).run.run

Then we read the file but we buffer 50 lines at a time to control our memory usage:

val lines = io.linesR("target/file.dat").buffer(50)

We're now ready to run the state function:

// this task processes the lines with our State function
// the initial State is `ExpectHeader` because this is what we expect the first line to be
val stateTask: Task[Process[Task, String]] = runState(lines)(lineState, ExpectHeader)

// this one outputs the lines to a result file
// separating each line with a new line and encoding it in UTF-8
val outputTask: Task[Unit] = stateTask.flatMap(_.intersperse("\n").pipe(process1.utf8Encode)
                                      .to(io.fileChunkW("target/result.dat")).run)

// if the processing throws an Exception it will be retrieved here
val result: Throwable \/ Unit = outputTask.attemptRun

When we finally run the Task, the result is either ().right if we were able to read, process, and write back to disc or exception.left if there was any error in the meantime, including when checking if the file has a valid structure.

The really cool thing about all of this is that we can now precisely control the amount of memory consumed during our processing by using the buffer method. In the example above we buffer 50 lines at a time then we process them in memory using traverseS. This is why I left a println statement in the runState method. I wanted to see "with my own eyes" how buffering was working. We could probably load more lines but the trade-off will then be that the stack that is consumed by traverseS will grow and that we might face StackOverflow exceptions.

I haven't done any benchmark yet but I can imagine lots of different ways to optimise the whole thing for our use case.

try { blog } finally { closing remarks }

I'm only scratching the surface of the scalaz-stream library and there is still a big possibility that I completely misunderstood something obvious!

First, it is important to say that you might not need to implement the runState method if you don't have complex validation requirements. There are 2 methods, chunkBy and chunkBy2, which allow you to create "chunks" of lines based on a given line (for chunkBy) or pair of lines (for chunkBy2) naturally serving as "block" delimiters in the read file (for example a pair of "HEADER" followed by a "TRAILER" in my file).

Second, it is not yet obvious to me if I should use ++ or fby when I'm emitting state-processed lines + "the rest" (in step 3 when doing: emitAll(result.toSeq.flatten) ++ _). The difference has to do with error/termination management (the fallback process of Await) and I'm still unclear on how/when to use this.

Finally I would say that the scalaz-stream library is intriguing in terms of types. A process is Process[F[_], O] where O is the type of the output and the type of the input is... nowhere? Actually it is in the Await[F[_], A, O] constructor as a forall type. That's not all. In Await you have the type of request, F[A], a function to process elements of type A: recv: A => Process[F, O] but no way to extract or map the value A from the request to pass it to the recv method! The only way to do that is to provide an additional constraint to the "driver method" by saying, for example, that there is an implicit Monad[F] somewhere. This is the first time that I see a design where we build structures and then we give them properties when we want to use them. Very unusual.

I hope this can help other people exploring the library and, who knows, some of this might end up being part of it. Let's see what Paul and others think...

27 July 2013

Endorsing the move on to Java 6

This is a short public announcement to say that, as the maintainer of an open-source project, I support the move to Java 6 this year. I encourage other OSS projects, especially in the Scala ecosystem, to support this move as well.

20 June 2013

A Zipper and Comonad example

There are some software concepts which you hear about and after some time you roughly understand what they are. But you still wonder: "where can I use this?". "Zippers" and "Comonads" are like that. This post will show an example of:

  • using a Zipper for a list
  • using the Comonad cojoin operation for the Zipper
  • using the new specs2 contain matchers to specify collection properties

The context for this example is simply to specify the behaviour of the following function:

def partition[A](seq: Seq[A])(relation: (A, A) => Boolean): Seq[NonEmptyList[A]]

Intuitively we want to partition the sequence seq into groups so that all the elements in a group have "something in common" with at least one other element. Here is a concrete example:

val near = (n1: Int, n2: Int) => math.abs(n1 - n2) <= 1
partition(Seq(1, 2, 3, 7, 8, 9))(near)

> List(NonEmptyList(1, 2, 3), NonEmptyList(7, 8, 9))

Properties

If we want to encode this behaviour with ScalaCheck properties we need to check at least 3 things:

  1. for each element in a group, there exists at least another related element in the group
  2. for each element in a group, there doesn't exist a related element in any other group
  3. 2 elements which are not related must end up in different groups

How do we translate this to some nice Scala code?

Contain matchers

"for each element in a group, there exists another related element in the same group"

prop { (list: List[Int], relation: (Int, Int) => Boolean) =>
  val groups = partition(list)(relation)
  groups must contain(relatedElements(relation)).forall
}

The property above uses a random list, a random relation, and does the partitioning into groups. We want to check that all groups satisfy the property relatedElements(relation). This is done by:

  • using the contain matcher
  • passing it the relatedElements(relation) function to check a given group
  • do this check forall groups

The relatedElements(relation) function we pass has type NEL[Int] => MatchResult[NEL[Int]] (type NEL[A] = NonEmptyList[A]) and is testing each group. What does it do? It checks that each element of a group has at least one element that is related to it.

def relatedElements(relation: (Int, Int) => Boolean) = (group: NonEmptyList[Int]) => {
  // every zipper (every focused element) must have at least one related element among the others
  group.toZipper.cojoin.toStream must contain { zipper: Zipper[Int] =>
    (zipper.lefts ++ zipper.rights) must contain(relation.curried(zipper.focus))
  }.forall
}

This function is probably a bit mysterious so we need to dissect it.

Zipper

In the relatedElements function we need to check each element of a group in relation to the other elements. This means that we need to traverse the sequence, while keeping the context of where we are in the traversal. This is exactly what a Zipper is good at!

A List Zipper is a structure which keeps the focus on one element of the list and can return the elements on the left or the elements on the right. So in the code above we transform the group into a Zipper with the toZipper method. Note that this works because the group is a NonEmptyList. This wouldn't work with a regular List because a Zipper cannot be empty, it needs something to focus on:

// a zipper for [1, 2, 3, 4, 5, 6, 7, 8, 9]
//     lefts      focus    rights
// [  [1, 2, 3]     4      [5, 6, 7, 8, 9]  ]

Now we have a Zipper that is focusing on one element of the group. But we don't want to test only one element, we want to test all of them, so we need to get all the possible zippers over the original group!

Cojoin

It turns out that there is a method doing exactly this for Zippers, it is called cojoin. I won't go here into the full explanation of what a Comonad is, but the important points are:

  • Zipper has a Comonad instance
  • Comonad has a cojoin method with this signature cojoin[A](zipper: Zipper[A]): Zipper[Zipper[A]]

Thanks to cojoin we can create a Zipper of all the Zippers, turn it into a Stream[Zipper[Int]] and do the checks that really matter to us:

def relatedElements(relation: (Int, Int) => Boolean) = (group: NonEmptyList[Int]) => {
  group.toZipper.cojoin.toStream must contain { zipper: Zipper[Int] =>
    val otherElements = zipper.lefts ++ zipper.rights
    otherElements must contain(relation.curried(zipper.focus))
  }.forall
}

We get the focus of the Zipper, an element, and we check it is related to at least one other element in that group. This is easy because the Zipper gives us all the other elements on the left and on the right.
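The same idea can be sketched without scalaz or specs2. This is a simplified illustration, not the scalaz implementation: SimpleZipper, allZippers and this Boolean-returning relatedElements are hypothetical names, and the lefts are kept in their original order here, which is not necessarily how scalaz.Zipper stores them.

```scala
object ZipperSketch {
  // Hypothetical stand-in for a list zipper: a focus plus the elements on each side
  final case class SimpleZipper[A](lefts: List[A], focus: A, rights: List[A])

  // All the zippers over a non-empty list, one per focused element
  // (the role played by cojoin.toStream in the post).
  def allZippers[A](head: A, tail: List[A]): List[SimpleZipper[A]] = {
    val all = head :: tail
    all.indices.toList.map { i =>
      SimpleZipper(all.take(i), all(i), all.drop(i + 1))
    }
  }

  // true if every element is related to at least one *other* element of the group
  def relatedElements[A](relation: (A, A) => Boolean)(head: A, tail: List[A]): Boolean =
    allZippers(head, tail).forall { zipper =>
      (zipper.lefts ++ zipper.rights).exists(other => relation(zipper.focus, other))
    }
}
```

For example, with the near relation defined earlier, the group 1, 2, 3 satisfies the check while the group 1, 2, 9 does not, because 9 is not near any other element. Note that a group made of a single element fails the check since there is no other element to be related to.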

Cobind

If you know a little bit about Monads and Comonads you know that there is a dualism between join in Monads and cojoin in Comonads. But there is also one between bind and cobind. Is it possible to use cobind then to implement the relatedElements function? Yes it is, and the result is slightly different (arguably less understandable):

def relatedElements(relation: (Int, Int) => Boolean) = (group: NonEmptyList[Int]) => {
  group.toZipper.cobind { zipper: Zipper[Int] =>
    val otherElements = zipper.lefts ++ zipper.rights
    otherElements must contain(relation.curried(zipper.focus))
  }.toStream must contain((_:MatchResult[_]).isSuccess).forall
}

In this case we cobind each zipper with a function that will check if there are related elements in the groups. This gives us back a Zipper of results and we need to make sure that it is full of success values.

Second property

"for each element in a group, there doesn't exist a related element in another group"

prop { (list: List[Int], relation: (Int, Int) => Boolean) =>
  val groups = partition(list)(relation)
  groups match {
    case Nil          => list must beEmpty
    case head :: tail => nel(head, tail).toZipper.cojoin.toStream must not contain(relatedElementsAcrossGroups(relation))
  }
}

This property applies the same technique but now across groups of elements by creating a Zipper[NonEmptyList[Int]] instead of a Zipper[Int] as before:

def relatedElementsAcrossGroups(relation: (Int, Int) => Boolean) = (groups: Zipper[NonEmptyList[Int]]) =>
  groups.focus.list must contain { e1: Int =>
    val otherGroups = (groups.lefts ++ groups.rights).map(_.list).flatten
    otherGroups must contain(relation.curried(e1))
  }

Note that the ability to "nest" the new specs2 contain matchers is very useful in this situation.

Last property

Finally the last property is much easier because it doesn't require any context to be tested. For this property we just make sure that no element is ever related to another one and check that they end up partitioned into distinct groups.

"2 elements which are not related must end up in different groups"

prop { (list: List[Int]) =>
  val neverRelated = (n1: Int, n2: Int) => false
  val groups = partition(list)(neverRelated)
  groups must have size(list.size)
}

Conclusion

Building an intuition for those crazy concepts is really what counts. For me it was "traversal with a context". Then I was finally able to spot it in my own code.

08 June 2013

Specs2 2.0 - Interpolated - RC2

This is a quick update to present the main differences with specs2 2.0-RC1. I have been fixing a few bugs but more importantly I have:

  • made the Tags trait part of the standard specification
  • removed some arguments for reporting and made the formatting of specifications more granular

This all started with an issue on Github...

Formatting

Creating reports for specifications is a bit tricky. On one hand you have different possible "styles" for the specifications: "old" acceptance style (with the ^ operator), "new" acceptance style (with interpolated strings), "unit" style... Then, on the other hand, you want to report the results in the console, where information is logged on a line-by-line basis, and in HTML files, where newlines, whitespace and indentation all need great care.

I don't think I got it quite right yet, especially for HTML, but working on issue #162 forced me to make the specs2 implementation and API a bit more flexible. In particular, in specs2 < 2.0, you could set some arguments to control the display of the specification in the console and/or HTML. For example noindent is a Specification argument saying that you don't want the automatic indentation of text and examples. And markdown = false means that you don't want text to be parsed as Markdown before being rendered to HTML.

However issue 162 shows that setting formatting properties at the level of the whole specification doesn't play well with other features like specification inclusion. I decided to fix this issue by using an existing specs2 feature: tags.

Tags and Specification

Tags in specs2 are different from tags you can find in other testing libraries. Not only can you tag single examples but you can also mark a full section of a specification with some tags. We can use this capability to select specific parts of a specification for execution but we can also use it to direct the formatting of the specification text. For example you can now write:

class MySpec extends Specification { def is = s2""" ${formatSection(verbatim = false)}
 This text uses Markdown when printed to html, however if some text is indented with 4 spaces
     it should *not* be rendered as a code block because `verbatim` is false.

  """
}

Given the versatile use of tags now, I decided to include the Tags trait, by default, in the Specification class. I resisted doing that in the past because I didn't want to encumber the Specification namespace too much with something that some users rarely use. Which leads me to the following tip on how to use the Specification class:

  • when starting a new project or prototyping some code, use the Specification class directly with all inherited features

  • when making your project more robust and production-like, create your own Spec trait, generally inheriting from the BaseSpecification class for basic features, and mix in only the traits you think you will generally use

This should give you more flexibility and choice over which specs2 features you want to use, with a minimal cost in terms of namespace footprint and compile times (because each new implicit you bring in might have an impact on performance).

API changes

The consequence of this evolution is yet another API break:

  • the Text and Example classes now use a FormattedString class containing the necessary parameters to display that string as HTML or in the console
  • for implementation reasons I have actually changed the constructor parameters of all Fragment classes to avoid storing state as private variables
  • the noindent, markdown arguments are now gone (you need to replace them with ${formatSection(flow=true)} and ${formatSection(markdown=true)}, see below)
  • the Tags trait is mixed in the Specification class so if you had methods like def tag you might get conflicts

And there are now 2 methods formatSection(flow: Boolean, markdown: Boolean, verbatim: Boolean) and formatTag(flow: Boolean, markdown: Boolean, verbatim: Boolean) to tag specification fragments with the following parameters:

  • flow: the fragment (Text or Example) shouldn't be reported with automatic indenting (default = false, set automatically to true when using s2 interpolated strings)
  • markdown: the fragment is using Markdown (default = true)
  • verbatim: indented text with more than 4 spaces must be rendered as a code block (default = true but can be set to false to solve #162)

HTML reports

I'm currently thinking that I should try out a brand new way of translating an executed specification with interpolated text into HTML. My first attempts were not completely successful and I find it hard to preserve the original layout of the specification text, especially with the Markdown translation in the middle. Yet, I must say a word on the Markdown library I'm using, Pegdown. I found this library extremely easy to adapt for my current needs (to implement the verbatim = false option) and I send my kudos to Mathias for such a great job.


This is it. Download RC2, use it and provide feedback as usual, thanks!

21 May 2013

Specs2 2.0 - Interpolated

The latest release of specs2 (2.0) deserves a little bit more than just release notes. It needs explanations, apologies and a bit of celebration!

Explanations

  • why is there another (actually several!) new style(s) of writing acceptance specifications
  • what are Scripts and ScriptTemplates
  • what has been done for compilation times
  • what you can do with Snippets
  • what is an ExampleFactory

Apologies

  • the >> / in problem
  • API breaks
  • Traversable matchers
  • AroundOutside and Fixture
  • the never-ending quest for Given/When/Then specifications

Celebration

  • compiler-checked documentation!
  • "operator-less" specifications!
  • more consistent Traversable matchers!

Explanations

Scala 2.10 is a game changer for specs2, thanks to 2 features: String interpolation and Macros.

String interpolation

Specs2 has been designed from the start with the idea that it should be immutable by default. This has led to the definition of Acceptance specifications with lots of operators, or, as someone put it elegantly, "code on the left, brainfuck on the right":

class HelloWorldSpec extends Specification { def is =

  "This is a specification to check the 'Hello world' string"            ^
                                                                         p^
    "The 'Hello world' string should"                                    ^
    "contain 11 characters"                                              ! e1^
    "start with 'Hello'"                                                 ! e2^
    "end with 'world'"                                                   ! e3^
                                                                         end
  def e1 = "Hello world" must have size(11)
  def e2 = "Hello world" must startWith("Hello")
  def e3 = "Hello world" must endWith("world")
}

Fortunately Scala 2.10 now offers a great alternative with String interpolation. In itself, String interpolation is not revolutionary. A string starting with s can have interpolated variables:

val name = "Eric"
s"Hello $name!"

Hello Eric!

But the great powers behind Scala realized that they could both provide standard String interpolation and give you the ability to make your own. Exactly what I needed to make these pesky operators disappear!

class HelloWorldSpec extends Specification { def is =         s2"""

 This is a specification to check the 'Hello world' string

 The 'Hello world' string should
   contain 11 characters                                      $e1
   start with 'Hello'                                         $e2
   end with 'world'                                           $e3
                                                              """

   def e1 = "Hello world" must have size(11)
   def e2 = "Hello world" must startWith("Hello")
   def e3 = "Hello world" must endWith("world")
}
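For readers who have not written their own interpolator yet, the mechanism can be sketched with the standard library alone. This is not the specs2 implementation: the `spec` interpolator and the `Fragment`, `Text` and `Example` types below are hypothetical, and the only real API used is `StringContext`, which exposes the literal pieces of the string as `parts`.

```scala
object MiniInterpolator {
  sealed trait Fragment
  final case class Text(value: String) extends Fragment
  final case class Example(description: String, body: () => Boolean) extends Fragment

  // `spec` is a made-up interpolator name; each interpolated variable is an example body.
  implicit class SpecContext(val sc: StringContext) {
    def spec(bodies: (() => Boolean)*): List[Fragment] = {
      val paired = sc.parts.zip(bodies).flatMap { case (text, body) =>
        // whatever sits on the same line as the variable becomes the description
        val (before, sameLine) = text.splitAt(text.lastIndexOf('\n') + 1)
        List(Text(before), Example(sameLine.trim, body))
      }
      (paired :+ Text(sc.parts.last)).toList
    }
  }

  val e1 = () => "Hello world".length == 11
  val e2 = () => "Hello world".startsWith("Hello")

  val fragments: List[Fragment] = spec"""
 The 'Hello world' string should
   contain 11 characters   $e1
   start with 'Hello'      $e2
"""

  // the descriptions recovered from the text around each variable
  val descriptions: List[String] = fragments.collect { case Example(d, _) => d }
}
```

The sketch keeps the text before each variable as a plain fragment and turns the rest of the line into the example description, which is the convention the s2 strings follow for descriptions.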

What has changed in the specification above is that text Fragments are now regular strings in the multiline s2 string and the examples are now inserted as interpolated variables. Let's explore some aspects of this new feature in more detail:

  • layout
  • examples descriptions
  • other fragments
  • implicit conversions
  • auto-examples

Layout

If you run the HelloWorldSpec you will see that the indentation of each example is respected in the output:

This is a specification to check the 'Hello world' string

The 'Hello world' string should
  + contain 11 characters
  + start with 'Hello'
  + end with 'world'

This means that you no longer have to worry about the layout of text or use the p, t, bt, end, endp formatting fragments as before.

Examples descriptions

On the other hand, the string which is taken as the example description is not as well delimited anymore, so it is now chosen by convention to be everything that is on the same line. For example this is what you get with the new interpolated string:

s2"""
My software should
  do something that is pretty long to explain,
  so long that it needs 2 lines ${ 1 must_== 1 }
"""

My software should
  do something that is pretty long to explain,
  + so long that it needs 2 lines

If you want the 2 lines to be included in the example description you will need to use the "old" form of creating an example:

s2"""
My software should
${ """do something that is pretty long to explain,
    so long that it needs 2 lines""" ! { 1 must_== 1 } }
"""

My software should
+ do something that is pretty long to explain,
    so long that it needs 2 lines

But I suspect that there will be very few times when you will want to do that.

Other fragments and variables

Inside the s2 string you can interpolate all the usual specs2 fragments: Steps, Actions, included specifications, Forms... However you will quickly realize that you cannot interpolate arbitrary objects. Indeed, apart from specs2 objects, the only other 2 types which you can use as variables are Snippets (see below) and Strings.

The restriction is there to remind you that, in general, interpolated expressions are "unsafe". If the expression you're interpolating throws an Exception, as is commonly the case with tested code, there is no way to catch that exception. If that exception is uncaught, the whole specification will fail to be built. Why is that?

Implicit conversions

When I first started to experiment with interpolated strings I thought that they could even be used to write Unit Specifications:

s2"""
This is an example of conversion using integers ${
  val (a, b) = ("1".toInt, "2".toInt)
  (a + b) must_== 3
}
"""

Unfortunately such specifications will horribly break if there is an error in one of the examples. For instance if the example was:

This is an example of conversion using integers ${
  // oops, this is going to throw a NumberFormatException!
  val (a, b) = ("!".toInt, "2".toInt) 
  (a + b) must_== 3
}

Then the whole string and the whole specification will fail to be instantiated!

The reason is that everything you interpolate is converted, through an implicit conversion, to a "SpecPart" which will be interpreted differently depending on its type. If it is a Result then we will interpret this as the body of an Example and use the preceding text as the description. If it is just a simple string then it is just inserted in the specification as a piece of text. But implicit conversions of a block of code, as above, are not converting the whole block. They are merely converting the last value! So if anything before the last value throws an Exception you will have absolutely no way to catch it and it will bubble up to the top.

That means that you need to be very prudent when interpolating arbitrary blocks. One work-around is to do something like this:

import execute.{AsResult => >>}
s2"""

This is an example of conversion using integers ${>>{
  val (a, b) = ("!".toInt, "2".toInt)
  (a + b) must_== 3
}}
  """

But you have to admit that the whole ${>>{...}} is not exactly gorgeous.
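The difference between "converting the last value" and "wrapping the whole block" can be reproduced with plain Scala. In this sketch `Part` and `asPart` are hypothetical stand-ins for the specs2 types and conversions; only the behaviour of implicit conversions applied to blocks is the point.

```scala
import scala.language.implicitConversions
import scala.util.Try

object BlockConversion {
  // Hypothetical stand-in for a specs2 part: it holds an unevaluated check.
  final class Part(body: => Boolean) {
    def execute: Try[Boolean] = Try(body)
  }

  implicit def asPart(body: => Boolean): Part = new Part(body)

  // Implicit conversion: only the last expression (a + b == 3) is wrapped,
  // so "!".toInt throws while the Part is being built.
  val viaImplicit: Try[Part] = Try {
    val part: Part = {
      val (a, b) = ("!".toInt, "2".toInt)
      a + b == 3
    }
    part
  }

  // Explicit wrapping: the whole block is passed by name and only runs inside execute.
  val viaExplicit: Try[Part] = Try {
    asPart {
      val (a, b) = ("!".toInt, "2".toInt)
      a + b == 3
    }
  }
}
```

Here `viaImplicit` is a failure at construction time, while `viaExplicit` builds successfully and only reports the NumberFormatException when `execute` is called, which is what the `${>>{...}}` form buys you.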

Auto-examples

One clear win of Scala 2.10 for specs2 is the use of macros to capture code expressions. This is particularly interesting with so-called "auto-examples". This feature is really useful when your examples are so self-descriptive that a textual description feels redundant. For example if you want to specify the String.capitalize method:

s2"""
 The `capitalize` method verifies
 ${ "hello".capitalize       === "Hello" }
 ${ "Hello".capitalize       === "Hello" }
 ${ "hello world".capitalize === "Hello world" }
"""

 The `capitalize` method verifies
 + "hello".capitalize       === "Hello"
 + "Hello".capitalize       === "Hello"
 + "hello world".capitalize === "Hello world"

It turns out that the method implementing the s2 interpolator uses a macro to extract the text of each interpolated expression and so, if on a given line there is no preceding text, we take the captured expression as the example description. It is important to note that this will only work properly if you enable the -Yrangepos scalac option (in sbt: scalacOptions in Test := Seq("-Yrangepos")).

However the drawback of using that option is the compilation speed cost which you can incur (around 10% in my own measurements). If you don't want (or you forget :-)) to use that option there is a default implementation which should do the trick in most cases but which might not capture all the text in some edge cases.

Scripts

The work on Given/When/Then specifications has led to a nice generalisation. Since the new GWT trait decouples the specification text from the steps and examples to create, we can push this idea a bit further and create "classical" specifications where the text is not annotated at all and examples are described somewhere else.

Let's see what we can do with the org.specs2.specification.script.Specification class:

import org.specs2._
import specification._

class StringSpecification extends script.Specification with Grouped { def is = s2"""

Addition
========

 It is possible to add strings with the + operator
  + one string and an empty string
  + 2 non-empty strings

Multiplication
==============

 It is also possible to duplicate a string with the * operator
  + using a positive integer duplicates the string
  + using a negative integer returns an empty string
                                                                                """

  "addition" - new group {
    eg := ("hello" + "") === "hello"
    eg := ("hello" + " world") === "hello world"
  }
  "multiplication" - new group {
    eg := ("hello" * 2) === "hellohello"
    eg := ("hello" * -1) must beEmpty
  }
}

With script.Specifications you just provide a piece of text where examples start with a + sign and you specify example groups. Example groups were introduced in a previous version of specs2 with the idea of providing standard names for examples in Acceptance specifications.

When the specification is executed, the first 2 example lines are mapped to the examples of the first group, and the example lines from the next block (as delimited with a Markdown title) are used to build examples by taking expectations in the second group (those groups are automatically given names, g1 and g2, but you can specify them yourself: "addition" - new g1 {...).
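The positional pairing described above can be sketched in a few lines of plain Scala. This is only an illustration of the convention, not the actual ScriptTemplate code: `pair`, `Plain` and `Paired` are invented names, and a line made only of `=` characters is assumed to mark the start of the next block.

```scala
object ScriptPairing {
  type Example = () => Boolean

  sealed trait Line
  final case class Plain(text: String) extends Line
  // body is None when the text has more "+" lines than the group has examples
  final case class Paired(name: String, description: String, body: Option[Example]) extends Line

  def pair(text: String, groups: Vector[Vector[Example]]): List[Line] = {
    val start = (0, 0, Vector.empty[Line])
    val (_, _, lines) = text.linesIterator.foldLeft(start) {
      case ((g, e, acc), line) =>
        val trimmed = line.trim
        if (trimmed.nonEmpty && trimmed.forall(_ == '='))
          (g + 1, 0, acc :+ Plain(line)) // a title underline opens the next group
        else if (trimmed.startsWith("+")) {
          val body = groups.lift(g - 1).flatMap(_.lift(e))
          (g, e + 1, acc :+ Paired(s"g$g.e${e + 1}", trimmed.drop(1).trim, body))
        } else
          (g, e, acc :+ Plain(line))
    }
    lines.toList
  }
}
```

An example line with no matching body ends up with `None`, which corresponds to the "pending" examples mentioned in the list of advantages below.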

This seems to be a lot of "convention over configuration" but this is actually all configurable! The script.Specification class is an example of a Script and it is associated with a ScriptTemplate which defines how to parse text to create fragments based on the information contained in the Script (we will see another example of this in action below with the GWT trait which proposes another type of Script named Scenario to define Given/When/Then steps).

There are lots of advantages in adopting this new script.Specification class:

  • it is "operator-free", there's no need to annotate your specification on the right with strange symbols

  • tags are automatically inserted for you so that it's easy to re-run a specific example or group of examples by name: test-only StringSpecification -- include g2.e1

  • examples are marked as pending if you haven't yet implemented them

  • it is configurable to accommodate other templates (you could even create Cucumber-like specifications if that's your thing!)

The obvious drawback is the decoupling between the text and the examples code. If you restructure the text you will have to restructure the examples accordingly and knowing which example is described by which piece of text is not obvious. This, or operators on the right-hand side, choose your poison :-)

Compilation times

Scala's typechecking and JVM interoperability come at a big price in terms of compilation times. Moderately-sized projects can take minutes to compile, which is very annoying for someone coming from Java or Haskell.

Bill Venners has tried to do a systematic study of which features in testing libraries seem to have the biggest impact. It turns out that implicits, traits and byname parameters have a significant impact on compilation times. Since specs2 is using those features more than any other test library, I tried to do something about it.

The easiest thing to do was to make Specification an abstract class, not a trait (and provide the SpecificationLike trait in its place). My unscientific estimation is that this single change removed 0.5 seconds per compiled file (from 313s to 237s for the specs2 build, and a memory reduction of 55Mb, from 225Mb to 170Mb).

Then, the next very significant improvement was to use interpolated specifications instead of the previous style of Acceptance specifications. The result is impressive: from 237 seconds to 150 seconds and a memory reduction of more than 120Mb, from 170Mb to 47Mb!

On the other hand, when I tried to remove some of the byname parameters (the left part of a must_== b) I didn't observe a real impact on compilation times (only 15% less memory).

The last thing I did was to remove some of the default matchers (and to add a few others). Those matchers are the "content" matchers: XmlMatchers, JsonMatchers, FileMatchers, ContentMatchers (and I added instead the TryMatchers). I did this to remove some implicits from the scope when compiling code but also to reduce the namespace footprint every time you extend the Specification class. However I couldn't see a major improvement to compile-time performance with this change.

Snippets

One frustration of software documentation writers is that it is very common to have stale or incorrect code because the API has moved on. What if it was possible to write some code, in the documentation, that will be checked by the compiler? And automatically refactored when you change a method name?

This is exactly what Snippets will do for you. When you want to capture and display a piece of code in a Specification you create a Snippet:

s2"""
This is an example of addition: ${snippet{

// who knew?
1 + 1 == 2
}}
"""

This renders as:

This is an example of addition

// who knew?
1 + 1 == 2

And yes, you guessed it right, the Snippet above was extracted by using another Snippet! I encourage you to read the documentation on Snippets to see what you can do with them. The main features are:

  • code evaluation: the last value can be displayed as a result

  • checks: the last value can be checked and reported as a failure in the Specification

  • code hiding: it is possible to hide parts of the code (initialisations, results) by enclosing them in "scissors" comments of the form // 8<--
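As a rough picture of what code hiding means, here is a text-only sketch that drops everything enclosed between two "scissors" lines. The `visible` function is hypothetical and the real Snippet processing in specs2 may treat the markers differently; the sketch only assumes that markers come in pairs, each on its own line.

```scala
object Scissors {
  val marker = "// 8<--"

  // Keep only the lines which are not enclosed between two marker lines.
  def visible(code: String): String = {
    val start = (false, Vector.empty[String])
    val (_, kept) = code.linesIterator.foldLeft(start) {
      case ((hidden, acc), line) =>
        if (line.trim.startsWith(marker)) (!hidden, acc) // toggle, and drop the marker itself
        else if (hidden) (hidden, acc)
        else (hidden, acc :+ line)
    }
    kept.mkString("\n")
  }
}
```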

Example factory

Every now and then I get a question from users who want to intercept the creation of examples and use the example description to do interesting things before or after the example execution. It is now possible to do so by providing another ExampleFactory rather than the default one:

import specification._

class PrintBeforeAfterSpec extends Specification { def is =
  "test" ! ok

  case class BeforeAfterExample(e: Example) extends BeforeAfter {
    def before = println("before "+e.desc)
    def after  = println("after "+e.desc)
  }

  override def exampleFactory = new ExampleFactory {
    def newExample(e: Example) = {
      val context = BeforeAfterExample(e)
      e.copy(body = () => context(e.body()))
    }
  }
}

The PrintBeforeAfterSpec will print the name of each example before and after executing it.

Apologies

the >> / in problem

This issue has come up at different times and one lesson is: Unit means "anything" so don't try to be too smart about it. So I owe an apology to the users for this poor API design choice and for the breaking API change that is now ensuing. Please read the thread in the Github issue to learn how to fix compile errors that would result from this change.

API breaks

While we're on the subject of API breaks, let's make a list:

  • Unit values in >> / in: now you need to explicitly declare if you mean "a list of examples created with foreach" or "a list of expectations created with foreach"

  • Specification is not a trait anymore so you should use the SpecificationLike trait instead if that's what you need (see the Compilation times section)

  • Some matchers traits have been removed from the default matchers (XML, JSON, File, Content) so you need to explicitly mix them in (see the Compilation times section)

  • The Given/When/Then functionality has been extracted as a deprecated trait specification.GivenWhenThen (see the Given/When/Then? section)

  • the negation of the Map matchers has changed (this can be considered as a fix but this might be a run-time break for some of you)

  • many of the Traversable matchers have been deprecated (see the next section)

Traversable matchers

I've had this nagging thought in my mind for some time now but it only reached my consciousness recently. I always felt that specs2 matchers for collections were a bit ad-hoc, with not-so-obvious ways to do simple things. After lots of fighting with implicit classes, overloading and subclassing, I think that I have something better to propose.

With the new API we generalize the type of checks you can perform on elements:

  • Seq(1, 2, 3) must contain(2) just checks for the presence of one element in the sequence

  • this is equivalent to writing Seq(1, 2, 3) must contain(equalTo(2)) which means that you can pass a matcher to the contain method. For example containAnyOf(1, 2, 3) is contain(anyOf(1, 2, 3)) where anyOf is just another matcher

  • and more generally, you can pass any function returning a result! Seq(1, 2, 3) must contain((i: Int) => i must beEqualTo(2)) or Seq(1, 2, 3) must contain((i: Int) => i == 2) (you can even return a ScalaCheck Prop if you want)

Then we can use combinators to specify how many times we want the check to be performed:

  • Seq(1, 2, 3) must contain(2) is equivalent to Seq(1, 2, 3) must contain(2).atLeastOnce

  • Seq(1, 2, 3) must contain(2).atMostOnce

  • Seq(1, 2, 3) must contain(be_>=(2)).atLeast(2.times)

  • Seq(1, 2, 3) must contain(be_>=(2)).between(1.times, 2.times)

This covers lots of cases where you would previously use must have oneElementLike(partialFunction) or must containMatch(...). This also can be used instead of the forall, atLeastOnce methods. For example forall(Seq(1, 2, 3)) { (i: Int) => i must be_>=(0) } is Seq(1, 2, 3) must contain((i: Int) => i must be_>=(0)).forall.
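The idea of "one check plus a combinator saying how many elements must pass" can be modelled in a few lines of plain Scala. This is a simplified sketch and not the specs2 matcher: here `contain` and `Contain` are toy definitions, a check is a plain predicate, and `atLeast` takes an Int rather than `2.times`.

```scala
object ContainSketch {
  // `accept` receives (number of passing elements, total number of elements)
  final class Contain[A](check: A => Boolean, accept: (Int, Int) => Boolean) {
    private def withCount(f: (Int, Int) => Boolean): Contain[A] = new Contain[A](check, f)

    def atLeast(n: Int): Contain[A] = withCount((passed, _) => passed >= n)
    def atMost(n: Int): Contain[A] = withCount((passed, _) => passed <= n)
    def between(lo: Int, hi: Int): Contain[A] = withCount((passed, _) => passed >= lo && passed <= hi)
    def atLeastOnce: Contain[A] = atLeast(1)
    def atMostOnce: Contain[A] = atMost(1)
    def forall: Contain[A] = withCount((passed, total) => passed == total)

    def matches(xs: Seq[A]): Boolean = accept(xs.count(check), xs.size)
  }

  // by default a single passing element is enough, as with contain(2)
  def contain[A](check: A => Boolean): Contain[A] =
    new Contain[A](check, (passed, _) => passed >= 1)
}
```

With these definitions `contain[Int](_ >= 2).atLeast(2).matches(Seq(1, 2, 3))` holds, while `contain[Int](_ >= 2).atMostOnce.matches(Seq(1, 2, 3))` does not because two elements pass the check.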

The other type of matching which you want to perform on collections is with several checks at a time. For example:

  • Seq(1, 2, 3) must contain(allOf(2, 3))

This seems similar to the previous case but the combinators you might want to use with several checks are different. exactly is one of them:

  • Seq(1, 2, 3) must contain(exactly(3, 1, 2)) // we don't expect ordered elements by default

Or inOrder

  • Seq(1, 2, 3) must contain(exactly(be_>(0), be_>(1), be_>(2)).inOrder) // with matchers here

One important thing to note though is that, when you are not using inOrder, the comparison is done greedily: we don't try all the possible combinations of input elements and checks to see if there would be a possibility for the whole expression to match.
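To see why greediness matters, here is a sketch of one possible greedy strategy next to an exhaustive one. It is an assumption about what "greedy" could look like (each value takes the first unused check it satisfies, with no backtracking) and not a transcription of the specs2 algorithm; `greedy` and `exhaustive` are names invented for the illustration.

```scala
object GreedyMatching {
  type Check[A] = A => Boolean

  // Each value consumes the first unused check it satisfies; there is no backtracking.
  def greedy[A](values: List[A], checks: List[Check[A]]): Boolean = {
    val start: Option[List[Check[A]]] = Some(checks)
    val remaining = values.foldLeft(start) { (unused, value) =>
      unused.flatMap { cs =>
        val i = cs.indexWhere(check => check(value))
        if (i < 0) None else Some(cs.patch(i, Nil, 1))
      }
    }
    remaining.exists(_.isEmpty)
  }

  // Try every assignment of values to checks.
  def exhaustive[A](values: List[A], checks: List[Check[A]]): Boolean =
    values.size == checks.size &&
      values.permutations.exists(_.zip(checks).forall { case (v, check) => check(v) })

  // the checks "greater than 0" and "greater than 1"
  val checks: List[Check[Int]] = List(_ > 0, _ > 1)
}
```

With the values `List(2, 1)`, the greedy version lets 2 consume the "greater than 0" check, leaving nothing that 1 can satisfy, so it fails even though pairing 2 with "greater than 1" and 1 with "greater than 0" would succeed. With `List(1, 2)` both strategies agree.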

Please explore this new API and report any issue (bug, compilation error) you find. Most certainly the failure reporting can be improved. The description of failures is much more centralized with this new implementation but also a bit more generic. For now, the failure messages are just listing which elements were not passing the checks but they do not output something nice like: The sequence 'Seq(1, 2, 3)' does not contain exactly the elements 4 and 3 in order: 4 is not found.

AroundOutside vs Fixture

My approach to context management in specs2 has been very progressive. First I provided the ability to insert code (and more precisely effects) before or after an Example, reproducing here standard JUnit capabilities. Then I've introduced Around to place things "in" a context, and Outside to pass data to an example. And finally AroundOutside as the ultimate combination of both capabilities.

I thought that with AroundOutside you could do whatever you needed to do, end of story. It turns out that it's not so simple. AroundOutside is not general enough because the generation of Outside data cannot be controlled by the Around context. This proved to be very problematic for me on a specific use case where I needed to re-run the same example, based on different parameters, with slightly different input data each time. AroundOutside was just not doing it. The solution? A good old Fixture. It is very simple: a Fixture[T] is a trait like this:

trait Fixture[T] {
  def apply[R : AsResult](f: T => R): Result
}

You can define an implicit fixture for all the examples:

class s extends Specification { def is = s2"""
  first example using the magic number $e1
  second example using the magic number $e2
"""

  implicit def magicNumber = new specification.Fixture[Int] {
    def apply[R : AsResult](f: Int => R) = AsResult(f(10))
  }

  def e1 = (i: Int) => i must be_>(0)
  def e2 = (i: Int) => i must be_<(100)
}

I'm not particularly happy to add this to the API because it adds to the overall API footprint and learning curve, but in some scenarios this is just indispensable.
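The use case that motivated the Fixture, re-running one example with different input data, can be sketched without specs2. The `Fixture` trait below is a simplified stand-in (it returns a Boolean instead of going through AsResult) and `each` is a hypothetical helper, not part of the specs2 API.

```scala
object FixtureSketch {
  // Simplified stand-in for specification.Fixture: no AsResult, just Boolean results.
  trait Fixture[T] {
    def apply(f: T => Boolean): Boolean
  }

  // Runs the same example against several inputs; it passes only if every run passes.
  def each[T](inputs: T*): Fixture[T] = new Fixture[T] {
    def apply(f: T => Boolean): Boolean = inputs.forall(f)
  }

  val magicNumbers: Fixture[Int] = each(10, 20, 30)

  val e1 = (i: Int) => i > 0
  val e2 = (i: Int) => i < 25
}
```

Because the fixture owns the call to the example function, it decides how many times and with which data the example runs, which is the control that AroundOutside could not give.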

Given/When/Then?

With the new "interpolated" style I had to find another way to write Given/When/Then (GWT) steps. But this is tricky. The trouble with GWT steps is that they are intrinsically dependent. You cannot have a Then step being defined before a When step for example.

The "classic" style of acceptance specification is enforcing this at compile time because, in that style, you explicitly chain calls and the types have to "align":

class GivenWhenThenSpec extends Specification with GivenWhenThen { def is =

  "A given-when-then example for a calculator"                 ^ br^
    "Given the following number: ${1}"                         ^ aNumber^
    "And a second number: ${2}"                                ^ aNumber^
    "And a third number: ${6}"                                 ^ aNumber^
    "When I use this operator: ${+}"                           ^ operator^
    "Then I should get: ${9}"                                  ^ result^
                                                               end

  val aNumber: Given[Int]                 = (_:String).toInt
  val operator: When[Seq[Int], Operation] = (numbers: Seq[Int]) => (s: String) => Operation(numbers, s)
  val result: Then[Operation]             = (operation: Operation) => (s: String) => { operation.calculate  must_== s.toInt }

  case class Operation(numbers: Seq[Int], operator: String) {
    def calculate: Int = if (operator == "+") numbers.sum else numbers.product
  }
}

We can probably do better than this. What is required?

  • to extract strings from text and transform them to well-typed values
  • to define functions using those values so that types are respected
  • to restrict the composition of functions so that a proper order of Given/When/Then is respected
  • to transform all of this into Steps and Examples

So, with apologies for coming up with yet-another-way of doing the same thing, let me introduce you to the GWT trait:

import org.specs2._                                                                                      
import specification.script.{GWT, StandardRegexStepParsers}                                                                                                         
                                                                                                         
class GWTSpec extends Specification with GWT with StandardRegexStepParsers { def is = s2"""              
                                                                                                         
 A given-when-then example for a calculator                       ${calculator.start}                   
   Given the following number: 1                                                                         
   And a second number: 2                                                                                
   And a third number: 6                                                                                 
   When I use this operator: +                                                                           
   Then I should get: 9                                                                                  
   And it should be >: 0                                          ${calculator.end}
                                                                  """

  val anOperator = readAs(".*: (.)$").and((s: String) => s)

  val calculator =
    Scenario("calculator").
      given(anInt).
      given(anInt).
      given(anInt).
      when(anOperator) { case op :: i :: j :: k :: _ => if (op == "+") (i+j+k) else (i*j*k) }.
      andThen(anInt)   { case expected :: sum :: _   => sum === expected }.
      andThen(anInt)   { case expected :: sum :: _   => sum must be_>(expected) }

}

In the specification above, calculator is a Scenario object which declares some steps through the given/when/andThen methods. The Scenario class provides a fluent interface in order to restrict the order of calls. For example, if you try to call a given step after a when step you will get a compilation error. Furthermore, steps which use values extracted from previous steps must use the proper types: what you pass to the when step has to be a partial function taking in a Shapeless HList of the right type.
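The way a fluent interface can restrict the order of calls is easy to show in miniature. This sketch is not the Scenario class: it uses a plain List[Int] instead of a Shapeless HList, and the `Givens`, `Whens`, `Thens` classes and the `givenNumber` method are invented for the illustration.

```scala
object ScenarioSketch {
  // Each stage only exposes the methods which are legal at that point.
  final class Givens(values: List[Int]) {
    def givenNumber(n: Int): Givens = new Givens(values :+ n)
    def when(operation: List[Int] => Int): Whens = new Whens(operation(values))
  }

  final class Whens(result: Int) {
    // there is no givenNumber here: calling it after when does not compile
    def andThen(check: Int => Boolean): Thens = new Thens(result, List(check(result)))
  }

  final class Thens(result: Int, checks: List[Boolean]) {
    def andThen(check: Int => Boolean): Thens = new Thens(result, checks :+ check(result))
    def passed: Boolean = checks.forall(identity)
  }

  def scenario: Givens = new Givens(Nil)

  val calculator: Thens =
    scenario.givenNumber(1).givenNumber(2).givenNumber(6).
      when(_.sum).
      andThen(_ == 9).
      andThen(_ > 0)
}
```

Writing `scenario.when(_.sum).givenNumber(1)` is rejected by the compiler because `Whens` has no such method, which is the same kind of guarantee the real Scenario gives for given steps placed after a when step.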

You will also notice that the calculator is using anInt, anOperator. Those are StepParsers, which are simple objects extracting values from a line of text and returning Either[Exception, T] depending on the correct conversion of text to a type T. By default you have access to 2 types of parsers. The first one is DelimitedStepParser which expects that values to extract are enclosed in {} delimiters (this is configurable). The other one is RegexStepParser which uses a regular expression with groups in order to know what to extract. For example anOperator defines that the operator to extract will be just after the colon at the end of the line.
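A regex-based parser of this kind fits in a few lines. The class below is a hypothetical re-creation using scala.util.matching.Regex and is not the specs2 RegexStepParser; in particular the pattern used for `anInt` here is an assumption, since the post does not show how the standard one is defined.

```scala
import scala.util.matching.Regex

object StepParserSketch {
  // Extracts the first regex group from a line and converts it to a T.
  final case class RegexStepParser[T](regex: Regex, convert: String => T) {
    def parse(line: String): Either[Exception, T] =
      regex.findFirstMatchIn(line) match {
        case Some(m) if m.groupCount >= 1 =>
          try Right(convert(m.group(1)))
          catch { case e: Exception => Left(e) }
        case _ =>
          Left(new IllegalArgumentException(s"nothing to extract from: $line"))
      }
  }

  // assumed pattern: the first (possibly negative) integer found on the line
  val anInt: RegexStepParser[Int] = RegexStepParser("""(-?\d+)""".r, _.toInt)

  // same pattern as in the post: one character after the colon at the end of the line
  val anOperator: RegexStepParser[String] = RegexStepParser(""".*: (.)$""".r, s => s)
}
```

For instance `anInt.parse("Given the following number: 1")` is `Right(1)` and `anOperator.parse("When I use this operator: +")` is `Right("+")`, whereas a line without any digit gives a `Left` holding the exception.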

Finally the calculator scenario is inserted into the s2 interpolated string to delimit the text it applies to. Scenario being a specific kind of Script, it has an associated ScriptTemplate which defines that the last lines of the text should be paired with the corresponding given/when/then method declarations. This is configurable and we can imagine other ways of pairing text to steps (see the org.specs2.specification.script.GWT.BulletTemplate class for example).

For reasons which are too long to explain here I've never been a big fan of Given/When/Then specifications and I guess that the multitude of ways to do that in specs2 shows it. I hope however that the GWT fans will find this approach satisfying and customisable to their taste.

Celebration!

I think there are some really exciting things in this upcoming specs2 release for "Executable Software Specifications" lovers.

Compiler-checked documentation

Having compiler-checked snippets is incredibly useful. I've fixed quite a few bugs in both the specs2 and Scoobi user guides and I hope that I made them more resistant to future changes that will happen through refactoring (when just renaming things for example). I'm also very happy that, thanks to macros, the ability to capture code was extended to "auto-examples". In previous specs2 versions, this was implemented by looking at stack traces and doing horrendous calculations on where a piece of code would be. This gives me the shivers every time I have to look at that code!

No operators

The second thing is Scripts and ScriptTemplates. There is a trade-off when writing specifications. On one hand we would like to read pure text, without the encumbrance of implementation code, on the other hand, when we read specification code, it's nice to have a short sentence explaining what it does. With this new release there is a continuum of solutions on this trade-off axis:

  1. you can have pure text, with no annotations but no navigation is possible to the code (with org.specs2.specification.script.Specification)
  2. you can have annotated text, with some annotations to access the code (with org.specs2.Specification)
  3. you can have text interspersed with the code (with org.specs2.mutable.Specification)

New matchers

I'm pretty happy to have new Traversable matchers covering a lot more use cases than before in a straightforward manner. I hope this will reduce the thinking time between "I need to check that" and "Ok, this is how I do it".


Please try out the new Release Candidate, report bugs, propose enhancements and have fun!