I’ve been thinking a lot lately about the way we work with and represent programs: almost exclusively as text files, while they actually have much more structure than that. Python has seemed like a good language to experiment with, so I’ve been looking at the formal grammar of python expressed here as an EBNF (Extended Backus Nauer Form) grammar. This lead me to think, “I should write an EBNF parser of some sort.” The linked page suggests there is a piece of the python build process that consumes the given document and generates a language parser. Parsing a document to generate a new parser. Maybe even its own parser. Seems cool. I also thought, “Know what else is cool? Haskell. I never gave Haskell enough love. I should learn more Haskell (for great good).” So I installed Haskell for Windows (because I work on a Windows laptop these days), fired up powershell, and remembered just how daunting it can be to look at that Haskell prompt:

HaskellPrelude

Also, I wasn’t exactly sure what it should do. So it consumes a grammar, but what does it produce. Another program for consuming grammars? Depending on how we define consuming a grammar, I could do that pretty easily with a program that ignores its arguments and returns itself (think about it). Clearly that’s not what’s intended. Regardless, best to start getting dirty and figure it out as we go. But, still the matter of that pesky intimidating prompt. What to do? Well, maybe I’ll want to split strings by some character (that may be sufficient to parse an EBNF. The parser it generates won’t be able to rely on that). So I try a few commands:

Prelude> split ' ' "Hello World"

<interactive>:1:1: error:
 Variable not in scope: split :: Char -> [Char] -> t

Okay. No split function. So let’s write it. Probably want something recursive (I always jump to recursion. But not before I jump to recursion), but that means we’re probably not working in the prompt. So I opened a file, and eventually wound up with this:

split :: (Eq a) => a -> [a] -> [[a]]
split x [] = []
split x (y:[]) 
 | x == y = split x []
 | otherwise = [[y]]
split x (y1:y2:ys)
 | x == y1 = split x $ y2:ys
 | x == y2 = [y1]:split x ys
 | otherwise = (((:) y1 $ head $ split x $ y2:ys):) $ tail $ split x $ y2:ys

Not the most elegant Haskell one has ever seen. Regardless, let’s look at what this means (this is as much for me as it is for you).

split :: (Eq a) => a -> [a] -> [[a]]

This line is the type signature of the function. Without going into detail about the specifics of the symbols, it states that for any type “a” that is a member of the typeclass “Eq” (which means we can compare things of that type with each other for equality e.g. ints, characters, most things) we define this function to take an “a”, a list of “a” and return a list of lists of “a”. The use of the same symbol to delimit the operands and to indicate the return type is intentional. We can also read this as, “Take an ‘a’, return a function that takes a list of ‘a’ and returns a list of lists of ‘a’.” Neat.

split x [] = []

Now we use pattern matching to specify some cases. If the first operand is anything (x) and the second is an empty list, we want to return an empty list (which was not initially obvious to me. Why not a list containing a single empty list? I actually went back and filled this in once I was convinced).

split x (y:[])

If we get a list with a single element ‘y’ (y:[] means a list beginning with y and ending with an empty list), let’s consider some more specific cases:

 | x == y = split x []

If x== y, get rid of y (its our delimiter) and continue processing the rest of the list.

 | otherwise = [[y]]

Otherwise, just return a list of a list containing y.

split x (y1:y2:ys)

If we have a list with at least 2 things, then we can handle the meat of the problem:

 | x == y1 = split x $ y2:ys

If the first element is our delimiter, ignore it and proceed. The $ means, evaluate the right fully instead of taking the first thing you see (Haskell favors left association). Without the $, Haskell would attempt to evaluate (split x y2): ys which is nonsense and would disagree with our type declaration.

 | x == y2 = [y1]:split x ys

If the first element is not our delimiter and the second element is, we want to return the first element as its own list, and then the result of the rest of the list. Here left association makes things simpler.

 | otherwise = (((:) y1 $ head $ split x $ y2:ys):) $ tail $ split x $ y2:ys

Lastly, if neither of them are the delimiters, then (split x $ y2:ys) will represent how we would split everything ignoring the first character. We know it won’t be an empty list, so we can take the head and tail of it. We want to attach y1 to the head of the list, and reattach that to the tail of the list. And what do you know?

HaskellSuccess

It worked.

Now to the parsing…

Advertisements