Nurina – The Elixir URI parser
I had some free time this weekend, so I decided to pick up on an old piece of code I wrote back when I started learning Elixir. It's a URI parser I called Nurina (the word nurina is Finnish and means grumbling or complaining -- it sounded funny and it contains the word URI). It's not really a well put together piece of code but more of a learning exercise. I also decided to avoid using regular expressions entirely and instead used pattern matching to parse the whole URI -- an additional challenge.
The first version of Nurina ran on Elixir 0.13, so I had to fix it somewhat to get it to compile on 1.0. I managed to rewrite it, replacing records with structs, and even make it pass all the tests. I consider it good enough to publish, so I've uploaded it onto my BitBucket account here.
The code showcases some of the things I love about Elixir: the lovely syntax -- though my code is not really a good testament to it -- the functional programming style and most of all the power of pattern matching. The string matching functionality is very useful in parsing and splitting text, but you can match all kinds of stuff, which makes it an incredibly powerful tool. Take for example the following code:
case parsed do
  %{valid: true, port: nil} -> %{parsed | port: URI.default_port(parsed.scheme)}
  _ -> parsed
end
In this example, I'm running code conditionally based on the value of parsed. The left side of the arrow specifies the match conditions. In this case, the first line will only match if parsed is a map (denoted by %{}) which contains the key-value pairs valid: true and port: nil. The second match -- _ -- will accept anything, so the case will always match something.
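To try the same case expression on its own, here is a minimal sketch that wraps it in a function. The module name PortDefaults and the function fill_port/1 are made up for illustration, and parsed is a plain map here rather than Nurina's actual struct; URI.default_port/1 is the standard library function used in the snippet above.

```elixir
defmodule PortDefaults do
  # Hypothetical helper, not part of Nurina: fills in the scheme's
  # default port when the URI is valid and no port was given.
  def fill_port(parsed) do
    case parsed do
      # Matches only maps that have valid: true AND port: nil.
      %{valid: true, port: nil} ->
        %{parsed | port: URI.default_port(parsed.scheme)}

      # Anything else is returned untouched.
      _ ->
        parsed
    end
  end
end
```

Calling PortDefaults.fill_port(%{valid: true, scheme: "http", port: nil}) should give back the same map with port set to 80, while a map that already has a port, or one with valid: false, falls through to the second clause unchanged.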
The example above was really simple, but it shows the basic idea well. One thing you can notice right off the bat is that you do not need to write an ugly series of conditions like you would in other languages, where you would easily end up writing something like is_map(parsed) and 'valid' in parsed and parsed.valid and 'port' in parsed and parsed.port == nil. You just describe what you need it to be and the language will take care of testing if it matches. If there were many other conditions, you would quickly notice how pattern matching makes it easier to write and grok than a series of if statements.
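For comparison, this is roughly what the two styles look like side by side in Elixir itself. It is only a sketch: the module PortCheck and both function names are invented for this example, and the explicit version is just one way such a check could be spelled out.

```elixir
defmodule PortCheck do
  # Hypothetical module for illustration only.

  # The "series of conditions" style, written out by hand.
  def needs_port_explicit?(parsed) do
    is_map(parsed) and
      Map.get(parsed, :valid) == true and
      Map.has_key?(parsed, :port) and
      Map.get(parsed, :port) == nil
  end

  # The same check expressed as a pattern in the function head.
  def needs_port_match?(%{valid: true, port: nil}), do: true
  def needs_port_match?(_), do: false
end
```

Both functions should agree on every input; the second one simply states the shape of the data instead of testing it piece by piece.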
Of course, conditions are not the only place you can use pattern matching. You can match values as a statement of its own, like I do in Nurina's tests:
test :parse_bad_uris do
  %Nurina.Info{valid: false} = Nurina.parse("https??@?F?@#>F//23/")
  %Nurina.Info{valid: false} = Nurina.parse("")
  %Nurina.Info{valid: false} = Nurina.parse(":https")
  %Nurina.Info{valid: false} = Nurina.parse("https")
  %Nurina.Info{valid: false} = Nurina.parse("http://example.com:what/")
end
In the test I don't care what values the Info struct gets, as long as valid is false. If the right hand side does not match, a MatchError is raised and the test fails. This makes it really easy to write simple assertions.
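To see what a failed match looks like outside a test, here is a small sketch that catches the error instead of letting it crash. MatchDemo and check/1 are hypothetical names, and the argument is a plain map standing in for a parser result, not Nurina's Info struct.

```elixir
defmodule MatchDemo do
  # Hypothetical module for illustration only.
  def check(result) do
    try do
      # Succeeds only if result is a map containing valid: false.
      %{valid: false} = result
      :matched
    rescue
      # A failed match with = raises MatchError.
      MatchError -> :no_match
    end
  end
end
```

With this, MatchDemo.check(%{valid: false, host: nil}) should return :matched, since extra keys in the map are ignored by the pattern, and MatchDemo.check(%{valid: true}) should return :no_match.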
Another very powerful use of pattern matching is in function definitions. You can define many functions with the same name but different expectations of what they want to receive in their arguments. The language will then take care of calling the correct function based on matching the given arguments. This may sound complicated, but an example will hopefully clear it up:
def parse(<< "//", rest :: binary >>, parsed, :hier_parse), do: parse(rest, parsed, :hier_auth)
def parse(hier, parsed, :hier_parse), do: parse(hier, parsed, :hier_no_auth)
Here I have defined two functions (strictly speaking, two clauses of the same function), both having the signature parse/3, meaning they are called parse and take three arguments. The difference is that the first one will only accept a string as its first argument, and furthermore, only a string that begins with "//", the rest of it being bound to the variable rest. The second one, on the other hand, will accept anything as the first argument. You don't have to specify which one you want to call; the language tries the clauses in the order they are defined and picks the first one that matches the arguments you have given.
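Here is a stripped-down sketch of the same dispatch that can be pasted into iex. PrefixDemo and classify/1 are invented for this example and are not part of Nurina; the only thing carried over is the binary pattern with the "//" prefix.

```elixir
defmodule PrefixDemo do
  # Hypothetical module for illustration only.

  # Matches only binaries that start with "//"; the remainder is bound to rest.
  def classify(<<"//", rest::binary>>), do: {:authority, rest}

  # Tried second, so it only sees binaries the first clause rejected.
  def classify(other) when is_binary(other), do: {:no_authority, other}
end
```

PrefixDemo.classify("//example.com/path") should return {:authority, "example.com/path"}, while PrefixDemo.classify("/just/a/path") ends up in the second clause. Swapping the order of the two clauses would make the first one unreachable, because the catch-all would match every string.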
Here's another example of pattern matching in function definitions (the last one in this post, I swear!):
defmodule Dropper do
  def drop5([], acc), do: acc
  def drop5([5 | rest], acc), do: drop5(rest, acc)
  def drop5([item | rest], acc), do: drop5(rest, acc ++ [item])
  def drop5(list) when is_list(list), do: drop5(list, [])
end
This is a pretty typical use of pattern matching when dealing with lists. Here the drop5/1 function takes a list as input, returning a new list with all number 5s removed:
iex(1)> Dropper.drop5 [1,4,5,6,2,6,4,5,6,4,5,6,5,5,5]
[1, 4, 6, 2, 6, 4, 6, 4, 6]
It should be pretty obvious to see how it works, but basically the | operator splits the first element off the given list. Hence, when given a list where the first item is an integer 5, the second function will be called. Otherwise the third function will be called, because the second didn't match the given input.
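As a side note, a common variation on this pattern prepends to the accumulator with | and reverses it once at the end, instead of appending with ++ on every step. The sketch below is an alternative I'm assuming for illustration, not the code from above; the module name ReverseDropper is made up.

```elixir
defmodule ReverseDropper do
  # Hypothetical alternative to Dropper; same result, different accumulation.
  def drop5(list) when is_list(list), do: drop5(list, [])

  # End of input: the accumulator was built back to front, so reverse it.
  defp drop5([], acc), do: Enum.reverse(acc)

  # Head is the integer 5: skip it.
  defp drop5([5 | rest], acc), do: drop5(rest, acc)

  # Any other head: prepend it to the accumulator.
  defp drop5([item | rest], acc), do: drop5(rest, [item | acc])
end
```

The matching works exactly the same way; only the way the result list is built differs, which avoids copying the accumulator on every appended element.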
This post started out talking about my little URI parser thing and ended up lecturing about pattern matching. It's not a good tutorial by any means, mostly just a stream of consciousness, but in case it got you interested, I really recommend taking a look at Elixir's Getting Started guide. It shows you all the cool things Elixir can do and covers pattern matching specifically in chapter 4.