Nurina – The Elixir URI parser


I had some free time this weekend, so I decided to pick up an old piece of code I wrote back when I started learning Elixir. It's a URI parser I called Nurina (the word nurina is Finnish and means grumbling or complaining -- it sounded funny and it contains the word URI). It's not really a well-put-together piece of code, more of a learning exercise. As an additional challenge, I decided to avoid regular expressions entirely and parse the whole URI with pattern matching instead.

The first version of Nurina ran on Elixir 0.13, so I had to fix it somewhat to get it to compile on 1.0. I managed to rewrite it, replacing records with structs, and even make it pass all the tests. I consider it good enough to publish, so I've uploaded it onto my BitBucket account here.

The code showcases some of the things I love about Elixir: the lovely syntax -- though my code is not really a good testament to it -- the functional programming style and most of all the power of pattern matching. The string matching functionality is very useful in parsing and splitting text, but you can match all kinds of stuff, which makes it an incredibly powerful tool. Take for example the following code:

case parsed do
  %{valid: true, port: nil} -> %{parsed | port: URI.default_port parsed.scheme}
  _ -> parsed
end

In this example, I'm running code conditionally based on the value of parsed. The left side of the arrow specifies the match conditions. In this case, the first line will only match if parsed is a map (denoted by %{}) which contains at least the key-value pairs valid: true and port: nil; any other keys in the map are ignored. The second match -- _ -- will accept anything, so the case will always match something.
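To try this match outside Nurina, here is a minimal sketch that uses plain maps in place of the Nurina.Info struct. The module name PortDefaults and the function fill_port/1 are made up for illustration; URI.default_port/1 comes from Elixir's standard library.

```elixir
defmodule PortDefaults do
  # Hypothetical helper, not part of Nurina: fills in the scheme's default
  # port when the parse result is valid but carries no explicit port.
  def fill_port(parsed) do
    case parsed do
      %{valid: true, port: nil} -> %{parsed | port: URI.default_port(parsed.scheme)}
      _ -> parsed
    end
  end
end

# First clause matches: port becomes 80, the registered default for "http".
PortDefaults.fill_port(%{valid: true, scheme: "http", port: nil})

# Falls through to _ because port: nil does not match 8080.
PortDefaults.fill_port(%{valid: true, scheme: "http", port: 8080})

# Falls through to _ because valid: true does not match false.
PortDefaults.fill_port(%{valid: false, scheme: nil, port: nil})
```

The extra scheme key in these maps does not get in the way, because a map pattern only has to be a subset of the value it is matched against.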

The example above was really simple, but it shows the basic idea well. One thing you can notice right off the bat is that you do not need to write an ugly series of conditions like you would in other languages, where you would easily end up writing something like is_map(parsed) and 'valid' in parsed and parsed.valid and 'port' in parsed and parsed.port == nil. You just describe what you need it to be and the language will take care of testing if it matches. If there were many other conditions, you would quickly notice how pattern matching makes it easier to write and grok than a series of if statements.
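Spelled out in Elixir itself, the comparison might look roughly like the sketch below. The module Compare and both function names are hypothetical and exist only to put the two styles side by side.

```elixir
defmodule Compare do
  # Explicit checks, written out one condition at a time.
  def needs_port_explicit?(parsed) do
    is_map(parsed) and
      Map.has_key?(parsed, :valid) and parsed.valid == true and
      Map.has_key?(parsed, :port) and parsed.port == nil
  end

  # The same test expressed as a single pattern, with a catch-all clause.
  def needs_port_pattern?(%{valid: true, port: nil}), do: true
  def needs_port_pattern?(_), do: false
end

Compare.needs_port_explicit?(%{valid: true, port: nil})  # true
Compare.needs_port_pattern?(%{valid: true, port: nil})   # true
Compare.needs_port_pattern?("not even a map")            # false
```

Both versions give the same answers for these inputs, but the pattern version states the expected shape once instead of checking it piece by piece.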

Of course, conditions are not the only place you can use pattern matching. You can also write a match as a statement on its own, like I do in Nurina's tests:

test :parse_bad_uris do
  %Nurina.Info{valid: false} = Nurina.parse("https??@?F?@#>F//23/")
  %Nurina.Info{valid: false} = Nurina.parse("")
  %Nurina.Info{valid: false} = Nurina.parse(":https")
  %Nurina.Info{valid: false} = Nurina.parse("https")
  %Nurina.Info{valid: false} = Nurina.parse("http://example.com:what/")
end

In the test I don't care what values the Info struct gets, as long as valid is false. If the value on the right-hand side does not match the pattern on the left, a MatchError is raised and the test fails. This makes it really easy to write simple assertions.
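To see what a failed match looks like outside a test, here is a small sketch with plain maps standing in for Nurina's struct. MatchDemo and check/1 are hypothetical names; MatchError and its term field are part of Elixir itself.

```elixir
defmodule MatchDemo do
  # Hypothetical helper: returns :ok when the match succeeds and
  # {:error, value} when the match operator raises MatchError.
  def check(result) do
    try do
      %{valid: false} = result
      :ok
    rescue
      e in MatchError -> {:error, e.term}
    end
  end
end

# The pattern matches, so nothing is raised.
MatchDemo.check(%{valid: false, host: nil})
# => :ok

# valid is true here, so the match raises and the helper reports the value.
MatchDemo.check(%{valid: true, host: "example.com"})
# => {:error, %{valid: true, host: "example.com"}}
```

In a test you would normally leave the rescue out and let the MatchError fail the test, which is exactly what the bare matches above rely on.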

Another very powerful use of pattern matching is in function definitions. You can define a function in several clauses, all with the same name but with different expectations of what they want to receive in their arguments. The language will then take care of calling the correct clause based on matching the given arguments. This may sound complicated, but an example will hopefully clear it up:

def parse(<< "//", rest :: binary >>, parsed, :hier_parse), do: parse(rest, parsed, :hier_auth)
def parse(hier,                       parsed, :hier_parse), do: parse(hier, parsed, :hier_no_auth)

Here I have defined two clauses of the same function, parse/3, meaning it is called parse and takes three arguments. The difference is that the first clause will only accept a string as its first argument, and furthermore, only a string that begins with "//", with the rest of it bound to the variable rest. The second clause, on the other hand, will accept anything as the first argument. Both require the atom :hier_parse as the third argument. You don't have to specify which clause you want to call: the language tries them in the order they are defined and runs the first one that matches the arguments you have given.
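The prefix match can be tried on its own with a stripped-down sketch. HierSketch and classify/1 are made-up names and not part of Nurina; only the binary pattern is taken from the code above.

```elixir
defmodule HierSketch do
  # Matches only binaries that start with "//"; everything after the
  # two slashes is bound to rest.
  def classify(<<"//", rest::binary>>), do: {:authority, rest}

  # Any other binary ends up here, because clauses are tried in order.
  def classify(other) when is_binary(other), do: {:no_authority, other}
end

HierSketch.classify("//example.com/path")
# => {:authority, "example.com/path"}

HierSketch.classify("path/only")
# => {:no_authority, "path/only"}
```

Swapping the two clauses would make the first one unreachable, since the catch-all would match every binary before the "//" pattern got a chance.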

Here's another example of pattern matching in function definitions (the last one in this post, I swear!):

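defmodule Dropper do

  # Empty list: nothing left to inspect, return what has been collected.
  def drop5([],             acc),       do: acc
  # Head is the integer 5: skip it and continue with the rest.
  def drop5([5 | rest],     acc),       do: drop5(rest, acc)
  # Any other head: append it to the accumulator and continue.
  def drop5([item | rest],  acc),       do: drop5(rest, acc ++ [item])

  # Entry point: start with an empty accumulator.
  def drop5(list) when is_list(list),   do: drop5(list, [])

end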

This is a pretty typical use of pattern matching when dealing with lists. Here the drop5/1 function takes a list as input, returning a new list with all number 5s removed:

iex(1)> Dropper.drop5 [1,4,5,6,2,6,4,5,6,4,5,6,5,5,5]
[1, 4, 6, 2, 6, 4, 6, 4, 6]

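It should be pretty obvious how it works, but basically the | operator splits the first element off the given list. Clauses are tried in the order they are defined, so when the first item of the list is the integer 5, the second clause is called and the item is dropped. Otherwise the third clause is called, because the second didn't match the given input, and the item is appended to the accumulator. Once the list is empty, the first clause returns the accumulated result.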
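As a variation, the same idea can be written so that items are prepended to the accumulator and the result is reversed once at the end, which avoids copying the accumulator on every append. This is only a sketch under a made-up module name, DropperSketch, not a replacement for the code above.

```elixir
defmodule DropperSketch do
  # Public entry point; the two-argument helper is private here.
  def drop5(list) when is_list(list), do: drop5(list, [])

  # The accumulator is built back to front, so reverse it once at the end.
  defp drop5([], acc), do: Enum.reverse(acc)
  # Head is the integer 5: skip it.
  defp drop5([5 | rest], acc), do: drop5(rest, acc)
  # Any other head: prepend it, which is cheap for linked lists.
  defp drop5([item | rest], acc), do: drop5(rest, [item | acc])
end

DropperSketch.drop5([1, 4, 5, 6, 2, 6, 4, 5, 6, 4, 5, 6, 5, 5, 5])
# => [1, 4, 6, 2, 6, 4, 6, 4, 6]

# The standard library offers the same filtering in one call; === keeps
# the comparison strict, like the pattern, so the float 5.0 is not dropped.
Enum.reject([1, 5, 5.0, 2], &(&1 === 5))
# => [1, 5.0, 2]
```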

This post started out talking about my little URI parser thing and ended up lecturing about pattern matching. It's not a good tutorial by any means, mostly just a stream of consciousness, but in case it got you interested, I really recommend taking a look at Elixir's Getting Started guide. It shows you all the cool things Elixir can do and covers pattern matching specifically in chapter 4.