Combinators in C#

by EFVincent on 05/03/2010

As C# has evolved it has acquired more and more of what some people refer to has functional programming features and constructs. One such concept is the idea that a function is a "first class” value. This is a fancy way of saying that functions are values that can be passed to and returned from other functions. A function that operates on other functions is called a high-order function.

Combinators are high order functions that compose, combine, or otherwise modify functions in useful and interesting ways. These types of operations are not typically seen in C#, but expanding your problem solving toolkit to include these concepts is not only fun and interesting, but can result in new, efficient, robust solutions.

The Timer

Imagine you’ve got a method that you want to put a stopwatch on. For example, in the following code, you need to see how long it’s taking each download to complete, for debugging information.

class Program {

    static string url;

    static void Main(string[] args) {

        url = "http://microsoft.com";
        RetrieveFromWeb();

        url = "http://amazon.com";
        RetrieveFromWeb();

        url = "http://dpreview.com";
        RetrieveFromWeb();

        Console.Write("Press any key...");
        Console.ReadKey(true);
    }

    static void RetrieveFromWeb() {
        System.Net.WebClient wc = new System.Net.WebClient();
        var s = wc.DownloadString(url);
        Console.WriteLine("URL: {0} - string length is {1}", url, s.Length);
    }
}

An ugly solution would be to do this.

static void Main(string[] args) {
    var sw = new System.Diagnostics.Stopwatch();

    sw.Start();
    url = "http://microsoft.com";
    RetrieveFromWeb();
    Console.WriteLine("{0:#,##0}ms", sw.ElapsedMilliseconds);

    sw.Restart();
    url = "http://amazon.com";
    RetrieveFromWeb();
    Console.WriteLine("{0:#,##0}ms", sw.ElapsedMilliseconds);

    sw.Restart();
    url = "http://dpreview.com";
    RetrieveFromWeb();
    Console.WriteLine("{0:#,##0}ms", sw.ElapsedMilliseconds);

    Console.Write("Press any key...");
    Console.ReadKey(true);
}

Try not to get physically ill. You and I both know there’s plenty of code running around in the wild that looks a lot like this. Red flags – you see repeated code – starting, stopping, printing the time. Should we go into the RetrieveFromWeb() method and modify it for timing? Let’s not. Lets try this instead. Step one, let’s define a variable for the method we’re calling. Leaving out the timing code for now, we make this change:


static void Main(string[] args) {

    // Create a delegate, set it to the method of interest

    Action retrieveFromWeb = RetrieveFromWeb;

    url = "http://microsoft.com";
    retrieveFromWeb();

    url = "http://amazon.com";
    retrieveFromWeb();

    url = "http://dpreview.com";
    retrieveFromWeb();

    Console.Write("Press any key...");
    Console.ReadKey(true);
}

I’ve added a local variable of type Action, and set it equal to the call to the method RetrieveFromWeb(), and now we’re calling that method indirectly. It has the same exact effect, the method gets called three times. Only we’ve added a layer of indirection. We do this all the time in OO programming; for example, you might have a Person object, but rather than coding directly to that object, you create an IPerson interface, and code to that, opening up the possibility of mocking the object, decorating it, etc. Similar thing here. Rather than “binding” the call sites directly to the method, we’re binding to a variable that points to the method.

We’re doing this because now we’ve got a value that can be altered or augmented to add additional functionality. This is where a combinator comes in. This is a simple timer combinator:

class Combinators {

    public static Action Time(Action a) {
        return () => {
            var sw = new System.Diagnostics.Stopwatch();
            sw.Start();
            try {
                a();
            } finally {
                Console.WriteLine("{0:#,##0}ms", sw.ElapsedMilliseconds);
            }
        };
    }
}

This combinator takes an Action and returns a new action (aka a new function), that has timing included. At line 8 the parameter action is being called, the timing is what’s added. The only change we have to make to the main method is where the delegate is being defined:


Action retrieveFromWeb = Combinators.Time(RetrieveFromWeb);

The rest of the method stays the same, but now, there’s timing added. This is a super-simple, contrived example, but you should be starting to see what’s possible. Let’s take this a step further. Here’s an example simulating retrieval of information from the database:

static void Main(string[] args) {

    Func<string, Guid> lookupUser = LookupUser;

    var emails = new[] {
        "eric@work.com", "joel@office.com", "cole@school.com",
        "karin@job.com", "haley@home.com" };

    foreach (var em in emails) {
        var id = lookupUser(em);
        Console.WriteLine("user {0} id = {1}", em, id);
    }

    Console.Write("Press any key...");
    Console.ReadKey(true);
}

static Guid LookupUser(string email) {
    // Fake looking up a user in the database
    var rnd = new Random(System.DateTime.Now.Millisecond);
    Thread.Sleep(rnd.Next(250, 2500));
    return Guid.NewGuid();
}

Memoization is the idea that a function can remember the results for given parameters. It’s like caching, but the term is specific to caching function results based on input. Here’s a simple Memoization combinator for C#:

public static Func<A, B> Memoize<A, B>(Func<A, B> fn) {
    var dict = new Dictionary<A, B>();
    return a => {
        B b = default(B);
        if (!dict.TryGetValue(a, out b)) {
            b = fn(a);
            dict.Add(a, b);
        }
        return b;
    };
}

It’s simple, but it demonstrates some very useful and interesting functional programming concepts. First, it’s takes and returns Func<A,B>. This is a delegate with a parameter of type A that returns a type B. This will work for effectively any method with that signature. Next point of interest, a dictionary is created, then the lambda is created and returned. The lambda refers to the dictionary defined outside the lambda. It is said that the dictionary is captured in a closure. It’s not important that you remember the terms, but look over the code and see if the concept is clicking for you. This function will return (effectively) a function, the dictionary is captured by that function. Even when the call to Memoize() goes out of scope, the dict variable will still exist in the returned function. Enough talk. We modify the main program just slightly:

Func<string, Guid> lookupUser = Combinators.Memoize(LookupUser);

The Memoize function will create a new function, one that caches results of the LookupUser function automatically. Nothing else has to change in the program to take advantage of this. Want to be sure that it’s actually working? Time it! The non-memoized LookupUser() has a built in Thread.Sleep(2500), and so takes 2.5sec * number of lookups to run. The memoized version will run almost instantly, so we can prove the memoizer is working by timing. The time combinator we created earlier was for timing Actions. I’ve created a overload of the time combinator that has the signature we need – Func<A,B>:

public static Func<A, B> Time<A, B>(Func<A, B> fn) {
    return a => {
        var sw = new System.Diagnostics.Stopwatch();
        sw.Start();
        try {
            return fn(a);
        } finally {
            Console.WriteLine("{0:#,##0}ms", sw.ElapsedMilliseconds);
        }
    };
}

and added it to the definition of lookUpUser in the main function


Func<string, Guid> lookupUser =
    Combinators.Time(
        Combinators.Memoize<string,Guid>(LookupUser));

Again without modifying the main code, we’ve augmented the method. Throw in some duplicate email addresses to test it out. The real power of these techniques becomes more evident the more you use them. Functional programming developers have been using these are similar techniques for years, and now with the added ability and flexibility of C#, we can employ these patters as well.

{ 0 comments }

Parsing Json using F#

by EFVincent on 04/27/2010

I was in one of those curious moods the other day, and decided to check out the API for the service behind my RSS reader, FeedDemon. I like this reader because it keeps my subscriptions and read / unread data synchronized across machines and my iPhone via an app called NetNewsWire.

Well it ends up that Google has gobbled up the service and it’s now Google Reader. Fair enough. It means you can also read your RSS feeds on the web at the Google reader site. Whatevs. It also turns out that Google has not publicized the API for that service yet. Yuck. But there are a couple of people out there (like here and here) who have picked it apart using fiddler or some such thing.

Well to put this meandering story to an end, I was playing with the API and it seems that some of the methods only return JSON. Others return XML, and others still are switchable. Messy. No wonder it’s not public yet. So that brings me to the point of this post… Parsing Json using F#.

Why F#?

I’ve avoided posting about F# to date, even though I love “playing” with it; I feel that posts about C# are more relevant. People are using F#, but I’ve never had the chance to use it on the job, and I don’t know anyone personally who has either. But since it’s mainstream now, what the heck. Plus parsing is one of those tasks that’s right up F#’s alley. If you haven’t worked with a functional language since college (or ever), give it a whirl. It’s refreshingly different from pure, straight, intensely object oriented thinking.

Json, Briefly

Most everyone knows what Json is by now. It’s just a tad more horrifying in real life as the supernatural killer from the Friday the 13th movies with whom it homophonetically shares its name. It’s a text format for representing data that can be evaluated in that wonderfully fast and lose language of the web, JavaScript, resulting in actual JavaScript objects. Read about it from the experts here. Here’s an example of some awesome Json:

{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}

The Tokenizer

I’m no parsing / compiler / language expert, but I do know that tokenizing is the first step. This is when we run through the string and create tokens; identifying the open and close braces, the name value pairs, etc.

First thing we want to look at is the Token type, which is a discriminated union. This type defines the logical things that are in the Json string. For example, an open quote, some characters, and a close quote is a string.

type Token =
  | OpenBracket | CloseBracket
  | OpenArray | CloseArray
  | Colon | Comma
  | String of string
  | Number of string

For this simple parser, the tokens are as above, open and close bracket and square bracket (aka array), colon, comma, numbers, and strings. This is a good reason to start with Json, it’s painfully simple. The tokenize function below turns the string into a list of tokens.

let tokenize source = 

  let rec parseString acc = function
    | '\\' :: '"' :: t -> // escaped quote
                          parseString (acc + "\"") t
    | '"' :: t -> // closing quote terminates
                  acc, t
    | c :: t -> // otherwise accumulate
                parseString (acc + (c.ToString())) t
    | _ -> failwith "Malformed string."

  let rec token acc = function
    | (')' :: _) as t -> acc, t // closing paren terminates
    | (':' :: _) as t -> acc, t // colon terminates
    | (',' :: _) as t -> acc, t // comma terminates
    | w :: t when Char.IsWhiteSpace(w) -> acc, t // whitespace terminates
    | [] -> acc, [] // end of list terminates
    | c :: t -> token (acc + (c.ToString())) t // otherwise accumulate chars 

  let rec tokenize' acc = function
    | w :: t when Char.IsWhiteSpace(w) -> tokenize' acc t   // skip whitespace
    | '{' :: t -> tokenize' (OpenBracket :: acc) t
    | '}' :: t -> tokenize' (CloseBracket :: acc) t
    | '[' :: t -> tokenize' (OpenArray :: acc) t
    | ']' :: t -> tokenize' (CloseArray :: acc) t
    | ':' :: t -> tokenize' (Colon :: acc) t
    | ',' :: t -> tokenize' (Comma :: acc) t
    | '"' :: t -> // start of string
      let s, t' = parseString "" t
      tokenize' (Token.String(s) :: acc) t'
    | '-' :: d :: t when Char.IsDigit(d) -> // start of negative number
        let n, t' = token ("-" + d.ToString()) t
        tokenize' (Token.Number(n) :: acc) t'
    | '+' :: d :: t | d :: t when Char.IsDigit(d) -> // start of positive number
        let n, t' = token (d.ToString()) t
        tokenize' (Token.Number(n) :: acc) t'
    | [] -> List.rev acc // end of list terminates
    | _ -> failwith "Tokinzation error"

  tokenize' [] source

We don’t have time for the full treatment of F#. If there’s any interest, I’ll post up some tutorials or at least links to some of the very many existing good tutorials out there already. For now, assume we have an understanding of the syntax, and the logic of tokenizing and parsing is what we’re after here.

One of the first things we see is that tokenize defines a few functions within the function. Great feature of F#, allowing definitions of inner functions; it allows the coder to partition logic without having function explosion.

The driving inner function is tokenize’, whose signature is (Token list –> char list –> Token list). This means it takes a Token list and a char list, and returns a Token list. The first token list (acc) is an accumulator. This list is built up as the procedure calls itself recursively. This is a common pattern in functional languages, to “thread” an accumulator through recursive calls. The second parameter is a char list, or a list of characters from the source string.

We don’t see the char list explicitly because of the use of the  function keyword at like 20, this keyword means that the last parameter (the char list) is implicitly used in a match statement, the cases of which follow after the function keyword. The syntax would be equivalent to:

  let rec tokenize' acc sourceChars =
    match sourceChars with
    | w :: t when Char.IsWhiteSpace(w) -> tokenize' acc t   // skip whitespace
    | '{' :: t -> tokenize' (OpenBracket :: acc) t
    | '}' :: t -> tokenize' (CloseBracket :: acc) t
...

The idea that the sourceChars variable being passed immediately to the match..with construct is so common in F# that the function keyword is used as a contraction, allowing the elimination of what is an unnecessary variable. In many cases, this construct allows for a one line function definition.

Most of the cases in the block matches a character which in turn maps to a token, which is appended to the accumulator (acc : Token list), and passed to a recursive call of tokenize’. This way the tokens are built up (in reverse order) as the string is traversed. This is seen in lines 22-27. Line 21 skips whitespace by calling tokenize’ without adding a token to the accumulator first.

Things are a bit more interesting at lines 28, 31, and 34. Line 28 detects the beginning of a string, and starts a new recursive thread with the parseString function, the signature of which is (string –> char list –> string * char list). So the accumulator is a string, the “work” is being done on a char list, to which is passed our source char list, and the return is a tuple of string * char list, which is our parsed string and the rest of the source char list. One (of I’m sure many) optimizations that could be made is to use a mutable StringBuilder as the accumulator in parseString.

The token function on line 12 does a similar job for non-string literals (numbers in this case, there are no bools in Json).

Lastly, line 37 handles the case where we run out of source characters. The Token list we’ve been accumulating is backwards; we’ve been “appending” to the front of the list all this time. This is a common pattern. At the end, we return the reverse of the list. When the Json at the top is put through the tokenizer, we get this:

> let tk = tokenize source;;

val tk : Token list =
  [OpenBracket; String "glossary"; Colon; OpenBracket; String "title"; Colon;
   String "example glossary"; Comma; String "GlossDiv"; Colon; OpenBracket;
   String "title"; Colon; String "S"; Comma; String "GlossList"; Colon;
   OpenBracket; String "GlossEntry"; Colon; OpenBracket; String "ID"; Colon;
   String "SGML"; Comma; String "SortAs"; Colon; String "SGML"; Comma;
   String "GlossTerm"; Colon; String "Standard Generalized Markup Language";
   Comma; String "Acronym"; Colon; String "SGML"; Comma; String "Abbrev";
   Colon; String "ISO 8879:1986"; Comma; String "GlossDef"; Colon; OpenBracket;
   String "para"; Colon;
   String
     "A meta-markup language, used to create markup languages such as DocBook.";
   Comma; String "GlossSeeAlso"; Colon; OpenArray; String "GML"; Comma;
   String "XML"; CloseArray; CloseBracket; Comma; String "GlossSee"; Colon;
   String "markup"; CloseBracket; CloseBracket; CloseBracket; CloseBracket;
   CloseBracket]

This list of tokens is an intermediate step. From here, we could go in several directions. For my purposes, I wanted to see it as XML. To get there, I created a parser that takes the token list and returns an XElement.

let parseToXml source =
  let map = function
    | Token.Number(n) -> n.ToString()
    | Token.String(x) -> x
    | v -> failwith "Syntax Error, unrecognized token in map()"

  let rec parseValue (acc:XElement) = function
    | OpenBracket :: t ->
      let newElement, t' = parseElement acc t
      newElement, t'
    | OpenArray :: t ->
      let name = acc.Name.LocalName
      if name.EndsWith("ies") && name.Length > 3 then
        let childName = name.Substring(0, name.Length - 3) + "y"
        let newListElement, t' = parseArray childName acc t
        newListElement, t'
      elif name.EndsWith("s") && name.Length > 1 then
        let childName = name.Substring(0, name.Length - 1)
        let newListElement, t' = parseArray childName acc t
        newListElement, t'
      else
        let childName = acc.Name.LocalName
        acc.Name <- XName.Get(childName + "s")
        let newListElement, t' = parseArray childName acc t
        newListElement, t'
    | h :: t ->
      acc.Value <- map(h)
      acc, t
    | _ -> failwith "bad value"

  and parseArray name acc = function
    | Comma :: t -> parseArray name acc t
    | CloseArray :: t ->  acc, t
    | t ->
      let newElement = XElement(XName.Get(name))
      let acc', t' = parseValue newElement t
      acc.Add(acc')
      parseArray name acc t'

  and parseElement (acc : XElement) = function
    | Comma :: t ->
      parseElement acc t
    | Token.String(n) :: Colon :: t ->
      let newElement = XElement(XName.Get(n))
      let v, t' = parseValue newElement t
      acc.Add(v)
      parseElement acc t'
    | CloseBracket :: t ->
      acc, t
    | _ -> failwith "Malformed JSON object"

  let root = XElement(XName.Get("root"))
  let tokens = tokenize source

  match tokens with
    | OpenBracket :: t ->
      let result, t' = parseElement root t
      result
    | _ -> failwith "Json did not begin with an object"

There is still syntax checking happening in the parse step. The tokenizer would have found illegal tokens, but the parse function will find illegal Json structure. It works in a similar way to the tokenizer, except instead of a raw stream of characters, we’ve got a stream of tokens, which is examined one by one and passed to a set of recursive functions which accumulates the XML, in this case as an XElement.

parseToXml defines inner functions map, parseValue, parseArray, and parseElement. It then gets to work by creating a root XElement, and getting the Token list. Json should start with an OpenBracket token. Anything else and we failwith an error (like C#’s throw). When we find that OpenBracket, the recursion begins with a call to parseElement. The root XElement serves as the accumulator, and the Token list is the “work” to be done.

You should be able to see what’s happening if you’re somewhat comfortable with F#. The parseValue function does some extra work to wrap arrays in nodes where the singularity / plurality of the node makes it read a bit better (in English, most of the time). The other point of interest is that the accumulator is a mutable object in parseToXml, unlike the Token list was in tokenize. At lines 37 and 46, the newly created elements are added to the accumulator using the Add(XElement) method, which mutates the accumulator and keeps using it. This is a departure from “purer” functional techniques, but that’s what’s cool about F#, you can make those departures where it makes sense. In this case, leveraging the XElement class of the .NET framework was worth it. Consuming this class from C# is that much easier and more intuitive.

After all is said and done, this is the XML that is emitted.

<root>
  <glossary>
    <title>example glossary</title>
    <GlossDiv>
      <title>S</title>
      <GlossList>
        <GlossEntry>
          <ID>SGML</ID>
          <SortAs>SGML</SortAs>
          <GlossTerm>Standard Generalized Markup Language</GlossTerm>
          <Acronym>SGML</Acronym>
          <Abbrev>ISO 8879:1986</Abbrev>
          <GlossDef>
            <para>A meta-markup language, used to create markup languages such as DocBook.</para>
            <GlossSeeAlsos>
              <GlossSeeAlso>GML</GlossSeeAlso>
              <GlossSeeAlso>XML</GlossSeeAlso>
            </GlossSeeAlsos>
          </GlossDef>
          <GlossSee>markup</GlossSee>
        </GlossEntry>
      </GlossList>
    </GlossDiv>
  </glossary>
</root>

It isn’t the most sophisticated, performant, or I’m sure accurate Json parser out there. But it does the trick for my immediate need, is a good starting point for more sophistication later if needed, and was an excellent exercise in parsing and F#. It was fun to write, and hopefully it stimulates some curiosities. Let me know if you’ve got any questions or comments.

{ 1 comment }

Roll Your Own Language Constructs

February 3, 2010

My last couple of blog entries were ostensibly about lambda functions (last one, one before). With these ideas, or a general comfort level with lambdas and the Action<> and Func<> delegate types in the back of your mind, consider this.
I was working on a project where the development team was instructed to implement a retry [...]

Read the full article →

Lambdas – Exposing Disposable Resources in your API

February 2, 2010

The last article described the absolute (and I mean really absolute) basics of lambdas in C#. Assuming you’re continuing from there, where to next? Let’s use some lambdas. There are two approaches we can use. First is using lambdas in places you traditionally used other approaches. Second is using them in new and interesting ways. [...]

Read the full article →

C# Nuts and Bolts: Lambdas

February 1, 2010

You don’t need to use Lambdas to make a living writing code in .NET. You can probably get away with not even knowing what Linq is. You can also still find work coding in .NET 2.0. But that’s painfully BORING. So to avoid the stinging monotony of working in a 5 year reverse time-shift, let’s [...]

Read the full article →

Applied Functional: Getting User Input

January 31, 2010

The thing with spending time on functional programming is that the typical .NET programmer will not likely be able to write production code in F#, and almost certainly not in Haskell. But what you are able to do is apply a new set of problem solving approaches to your every day work.
So here’s a simple [...]

Read the full article →

C# Nuts and Bolts: Delegates and Functions, Part I

August 24, 2009

Two years ago I took up the OCaml programming language, purely out of intellectual curiosity. For me, OCaml was a gateway drug to functional programming in general. I’ve since spent a good bit of time with F# and Haskell. With the consulting work I do I don’t have the opportunity to use these languages in [...]

Read the full article →

Adding Concurrency Optimization in Silverlight 3

August 18, 2009

About a week ago Joa Ebert posted Flirting with Silverlight, in which he discussed his experiences with a cool little Lorenz attractor application. The strange attractor app (which you can check at his blog) was his way of playing around with the newly released Silverlight 3 (Props to Joa for some very cool work BTW). [...]

Read the full article →