Pages

06 May 2010

Mini-Parsers to the rescue

An example of curryfication

I'm implementing a reasonably complex algorithm at the moment with different transformation phases. One transformation is a "curryfication" of some terms to be able to transform some expressions like:

f(a, b, c)

to

.(.(.(f, a), b), c)

Being test-compulsive, I want to be able to test that my transformation works. Unfortunately the data structures I'm dealing with are pretty verbose in my tests.

One of my specs examples was this:

"A composed expression, when curried" should {
"be 2 successive applications of one parameter" +
"when there are 2 parameters to the method expression" in {
ComposedExp(MethodEx(method), const :: arb :: Nil).curryfy must_==
Apply(Apply(Curry(method), Curry(const)), Curry(arb))
}
}

But Apply(Apply(Curry(method), Curry(const)), Curry(arb))) is really less readable than .(.(method, const), arb). And this can only get worse on a more complex example.

With the help of a mini-parser

So I thought that I could just write a parser to recreate a "Curried" expression from a String.

A first look at the Scala 2.8.0 Scaladoc(filter with "parser") was a bit scary. Especially because my last parser exercise was really a long time ago now.

But no, parser combinators and especially small, specific parsers like the one I wrote, are really straightforward:

object CurriedParser extends JavaTokenParsers {
val parser: Parser[Curried] = application | constant
val constant = ident ^^ { s => Curry(s) }
val application = (".(" ~> parser) ~ (", " ~> constant <~ ")") ^^ { case a ~ b =>
Apply(a, b)
}
def fromString(s: String): Curried = parser.apply(new CharSequenceReader(s)).get
}

In the snippet above I declare that:

  • my parser returns a Curried object

  • my parser is either an application or a constant (with the | operator)

  • a constant is a Java identifier (ident is a parser inherited from the JavaTokenParsers trait)

  • a constant should be transformed to a Curry object

  • an application is composed of two parts (separated by the central ~ operator). The first part is: some syntax ("(") and the result of something to parse recursively with the parser (i.e. an application or a constant). The second part is a constant, surrounded by some syntax (", " and ")"). Note that the syntactic elements are being discarded by using ~> and <~ instead of ~.

  • once an application is parsed, part 1 and part 2 are accessible as matchable object of the form a ~ b and this object can be used to build an Apply object

  • the fromString method simply passes a String to the parser and gets the result

I have to say that this parser is really rudimentary. It doesn't handle errors in the input text, there absolutely needs to be a space after the comma, and so on.

Yet it really fits my testing purpose for a minimum amount of development time.

Open question

I hope that this post can serve as an example to anyone new to Scala wanting to play with parsers and I leave an open question for senior Scala developers:

Is there a way to extends the case classes representing algebraic datatypes in Scala so that:
  • each case class has a proper toString representation (that's already the case and that's one benefit of case classes), but that representation can be overriden (for example to replace Apply(x, y) with .(x, y))

  • there is an implicit parser that is able to reconstruct the hierarchy of objects from its string representation