Hidden jewels of Ruby stdlib
by Paweł Świątkowski
Ruby comes distributed with a vast standard library. We usually use only a fraction of it. Everyone knows json, maybe even csv. But there are many more things hidden there.
Some time ago a discussion somewhere (Reddit perhaps?) prompted me to take a deep plunge into Ruby stdlib, and in this post I describe what I found. Not all things were new to me; some of them were simply forgotten. I chose the ones I found most entertaining, interesting or standing out in any other way.
While reading through those and asking yourself “why would I use it in a web app?”, please bear in mind that Ruby was not designed to be a language powering one of the most important web frameworks in history. Things listed here are more suitable for system scripting etc.
Parsing command-line options
These days when we write a Ruby script it usually comes as a Rake task. But even if it’s a standalone file, it is usually steered in a similar way: via environment variables or just by positional parameters accessed via the ARGV array. However, in stdlib we can find two libraries for handling more complex input.
One of them is GetoptLong. Let’s see it in action:
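A minimal sketch of such a script, assuming it is saved as getoptlong.rb. The constant name OPTION_SPECS and the short aliases -u, -c and -v are illustrative choices, not necessarily what the original listing used:

```ruby
require 'getoptlong'

# Each spec: long name, short alias, and whether the option takes a value.
OPTION_SPECS = [
  ['--url',     '-u', GetoptLong::REQUIRED_ARGUMENT],
  ['--count',   '-c', GetoptLong::OPTIONAL_ARGUMENT],
  ['--verbose', '-v', GetoptLong::NO_ARGUMENT]
].freeze

opts = GetoptLong.new(*OPTION_SPECS)

# GetoptLong reads ARGV and yields the long option name with its value.
# Flags without an argument yield an empty string.
opts.each do |name, value|
  puts "#{name}: #{value.inspect}"
end
```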
As you see, I defined three options:
- url - it takes a required argument
- count - whose argument is optional
- verbose - which serves as a flag
After that there is code that prints the name and value of each option. So when I test it with ruby getoptlong.rb -c 5 --verbose --url http://github.com I get:
There are a few interesting quirks with that. For example, if I omit url totally, nothing happens, because “required” only means that the option needs a value when it is used, not that the option itself is mandatory. Only if I use it as a flag (ruby getoptlong.rb --url) do I get an exception. Also, if I use some option that is not defined, it throws an error as well.
You can find docs for GetoptLong here.
The other one is OptionParser. This solution is much more robust and advanced. Let’s see it in action with a similar example:
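A minimal sketch of an OptionParser equivalent, assuming the file is called optparse.rb. The method name parse_cli and the help texts are illustrative, and here --count always takes a value when it is given, which keeps the Integer coercion simple:

```ruby
require 'optparse'

# Parses the given arguments and returns a hash of recognised options.
def parse_cli(argv)
  options = { verbose: false }

  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: optparse.rb [options]'

    opts.on('-u', '--url URL', 'URL to work with') do |url|
      options[:url] = url
    end

    # Passing Integer makes OptionParser coerce (and validate) the value.
    opts.on('-c', '--count N', Integer, 'How many times to repeat') do |count|
      options[:count] = count
    end

    opts.on('-v', '--verbose', 'Print more output') do
      options[:verbose] = true
    end
  end

  # parse (without the bang) leaves the argv array it was given untouched.
  parser.parse(argv)
  options
end

p parse_cli(ARGV)
```

An undefined option raises OptionParser::InvalidOption, and --url without a value raises OptionParser::MissingArgument.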
The code is much more idiomatic here. The result is as expected. Behaviour regarding extra options etc. is the same as with GetoptLong. One thing we get for (almost) free here is a help message. Try it with ruby optparse.rb -h:
But there’s much more to OptionParser than that - coercing types, something called conventions etc. Read more in the docs.
Simple persistent key-value store
When we, Ruby developers, think about a key-value store, we usually have some kind of server-based solution in mind, such as Redis or Riak. However, when writing a simple application it’s usually more reasonable to use an embedded store. Lately, RocksDB from Facebook became famous as one such solution. But with Ruby, we are lucky to have an embedded key-value store right in the standard library.
And, there’s more… It’s not one KV store. It’s three of them: DBM, GDBM and SDBM. They are really similar to one another, so I will only quickly outline differences:
- DBM relies on what’s installed on your system. It can use many things under the hood, and most of the time the files will be incompatible between different machines (or even on the same machine when the system configuration changes). Therefore it’s not well-suited for persistent storage but is good for temporary data.
- GDBM is based on one particular implementation of a KV store called, not surprisingly, GDBM. The aforementioned DBM may, in some cases, choose to use GDBM as its underlying storage. It should be compatible between different systems.
- SDBM’s code, contrary to the previous ones, is shipped with Ruby, so it should be the same on all machines.
How do we use it? For example with SDBM (because we don’t need to install anything extra to have it):
This creates two files in the current directory. fruits.dir is empty (I really don’t know why), but the real data is in fruits.pag. You can peek into it with hexdump -C fruits.pag:
The data is actually there.
Usefulness of this solution is probably quite limited. You can use it when you want to persist some state between script runs. Or when you really care about memory. Having some big hashes loaded in RAM all the time can slow down your program. With (S/G)DBM you can dump data which is unused for a while to disk and pick it up later when you need it.
Persisting whole objects hierarchy with PStore
Speaking of persisting… In examples above we could only use strings. That’s ok in many cases, but not always. What if you want to save part of your application state - with objects, their states, and relations?
Ruby stdlib has you covered! PStore is exactly what you are looking for. In this example we are going to create some very simple Finite-State-Machine-like structure with states connected via named edges to each other:
The traverse method simply displays connections from the start node to the end (watch out, we don’t handle loops!). So now let’s create some structure and traverse it:
What we got is:
Now let’s save it using PStore to a file on disk:
And then, in a different script, we load and traverse:
And output is exactly the same! If you’re curious, like me, you can peek into
fsm.pstore file using
Useful? Perhaps not, but maybe? I can see the potential to save a state of some simple game this way, for example.
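The whole round trip can be sketched roughly as below. The State class, its connect method, the edge labels and the :start key are hypothetical stand-ins for the structure described above, and the file goes into a temporary directory rather than the current one:

```ruby
require 'pstore'
require 'tmpdir'

# Hypothetical node type: a named state with labelled edges to other states.
class State
  attr_reader :name, :edges

  def initialize(name)
    @name = name
    @edges = {}
  end

  def connect(label, target)
    @edges[label] = target
  end
end

# Collects "from --label--> to" lines, depth first. Loops are not handled.
def traverse(state, lines = [])
  state.edges.each do |label, target|
    lines << "#{state.name} --#{label}--> #{target.name}"
    traverse(target, lines)
  end
  lines
end

start  = State.new('start')
middle = State.new('middle')
finish = State.new('finish')
start.connect('go', middle)
middle.connect('done', finish)

path = File.join(Dir.mktmpdir, 'fsm.pstore')

# All reads and writes have to happen inside a transaction.
store = PStore.new(path)
store.transaction { store[:start] = start }

# A second PStore instance stands in for "a different script";
# passing true opens a read-only transaction.
loaded = PStore.new(path).transaction(true) { |s| s[:start] }
puts traverse(loaded)
# start --go--> middle
# middle --done--> finish
```

PStore serializes with Marshal, so a separate script that loads the file needs the State class defined as well.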
Usage of Ruby’s Observable was actually part of the first (?) book from which I learned Ruby back in 2008 (?). So it’s not new to me, but it’s worth reminding that we have such a thing built-in. It actually can make the code cleaner in some cases.
To illustrate how it works, I’m going to implement yet another FizzBuzz (it will be a bit incorrect, though, because it will print a number every time):
If you run this code, you’ll see it works. There are just two things to remember: call changed to indicate that the object has changed, and call notify_observers when you want to emit new values.
Why useful? You can abstract some things (such as logging) outside of your main class. Note, however, that abusing it will lead to callback hell, which would be hard to debug and understand. Just like ActiveRecord callbacks.
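A minimal sketch of such a FizzBuzz, with hypothetical class names (Counter, NumberPrinter, WordPrinter). Like the version described above, it prints the number every time, even next to Fizz or Buzz:

```ruby
require 'observer'

# Emits numbers and knows nothing about who is listening.
class Counter
  include Observable

  def count_to(limit)
    1.upto(limit) do |n|
      changed              # without this, notify_observers does nothing
      notify_observers(n)  # calls #update on every registered observer
    end
  end
end

# Prints every number it is notified about.
class NumberPrinter
  def initialize(out = $stdout)
    @out = out
  end

  def update(number)
    @out.puts number
  end
end

# Prints a word whenever the number is divisible by the given divisor.
class WordPrinter
  def initialize(word, divisor, out = $stdout)
    @word = word
    @divisor = divisor
    @out = out
  end

  def update(number)
    @out.puts @word if (number % @divisor).zero?
  end
end

counter = Counter.new
counter.add_observer(NumberPrinter.new)
counter.add_observer(WordPrinter.new('Fizz', 3))
counter.add_observer(WordPrinter.new('Buzz', 5))
counter.count_to(15)
```

Adding another rule means adding another observer; Counter itself stays untouched.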
DRb or dRuby is a real gem in the standard library. Described simply as “distributed object system for Ruby”, it can give you a lot of fun. To see it live, I decided to go with something really useful: a service that prints a random number from 0 to @max_num every @interval seconds. Here is the code, with DRb included:
The class itself is really straightforward and I’m not going into details about it. The only (hopefully) unfamiliar thing here is the call to DRb, where we wrap our service in the dRuby protocol. Basically, it exposes our interface on localhost on port 9394. With that in mind, I recommend starting the service and splitting your terminal in two (iTerm can do it on Mac, I recommend Tilix for Linux).
Now, when we have our little service running, fire up irb in second terminal and type:
When it’s done, you can start to play by calling methods on service. Decrease the interval, raise max_num to 1000, whatever you want. Finally, stop the show by running service.stop!. Everything you do is reflected immediately in a completely different process running in a different terminal! Needless to say, you can also do it over the network, if you wish.
You may think right now that this is just a nice toy. But I’ve actually seen things like that used in practice. Probably the most notable example was an IRC bot where from a Ruby console you could do many things, starting from temporarily adding admins to some array usually populated on start (so, no downtime for restart required!), and ending with defining completely new methods and commands to test them out before actually putting them in the code. I can also imagine exposing such an interface to, for example, manipulate the size of some worker pool etc. Actually, the sky is pretty much the limit here.
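The setup described above could be sketched as follows. The class name RandomService, the serve helper, the next_number method and the default values are assumptions; the accessors interval and max_num, the stop! method and the port 9394 come from the text:

```ruby
require 'drb/drb'

# Prints a random number from 0 to max_num every interval seconds.
class RandomService
  attr_accessor :interval, :max_num

  def initialize(interval: 5, max_num: 100)
    @interval = interval
    @max_num = max_num
    @running = false
  end

  def next_number
    rand(0..@max_num)
  end

  # Blocks until stop! is called, locally or from a remote client.
  def run
    @running = true
    while @running
      puts next_number
      sleep @interval
    end
  end

  def stop!
    @running = false
  end
end

# Publishes the service over dRuby, then starts the printing loop.
def serve(uri = 'druby://localhost:9394')
  service = RandomService.new
  DRb.start_service(uri, service)
  service.run
ensure
  DRb.stop_service
end

# Server terminal: load this file and call `serve`.
# Client terminal (irb):
#   require 'drb/drb'
#   service = DRbObject.new_with_uri('druby://localhost:9394')
#   service.interval = 1
#   service.max_num = 1000
#   service.stop!
```

The client holds only a proxy; every call on it is forwarded to the object living in the server process.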
There are many more things in stdlib. I’m going to mention a few of them, but without such detailed descriptions.
I had a bit of trouble understanding what tsort is really for. What it does is a topological sorting of directed acyclic graphs. If this sounds pretty specific, that’s because it is. This kind of sorting is mostly useful in dependency sorting, when you have a graph of dependencies (A depends on B and C, B depends on D, E depends on A) and you need to determine an order of installing those dependencies, so that every item has its dependencies already installed when being installed.
There is a great article by Lawson Kurtz explaining how it’s used in Bundler.
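The dependency example above can be sketched with TSort like this. The class name DependencyGraph and the hash layout are illustrative; TSort itself only requires the two tsort_each_* methods:

```ruby
require 'tsort'

# Wraps a hash of the form { item => [items it depends on] }.
class DependencyGraph
  include TSort

  def initialize(dependencies)
    @dependencies = dependencies
  end

  # TSort needs to know how to enumerate all nodes...
  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end

  # ...and how to enumerate the children (dependencies) of a node.
  def tsort_each_child(node, &block)
    @dependencies.fetch(node, []).each(&block)
  end
end

# A depends on B and C, B depends on D, E depends on A.
dependencies = {
  'A' => %w[B C],
  'B' => %w[D],
  'C' => [],
  'D' => [],
  'E' => %w[A]
}

install_order = DependencyGraph.new(dependencies).tsort
p install_order
# ["D", "B", "C", "A", "E"]
```

Dependencies always come before the items that need them. If the graph contained a cycle, tsort would raise TSort::Cyclic.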
Some math-related classes in the Ruby standard library:
- Matrix has methods for matrix operations, such as (but not limited to) inverse and many more (see the docs)
- Prime represents an infinite set of all prime numbers. You don’t need to implement this Eratosthenes sieve yourself!
- [sidenote] I was surprised that there is no Complex class in stdlib, especially after I learned that it used to be there, but was removed. It turns out that it actually made it to core (so it is automatically required). Check this out by firing up irb and evaluating (2 + 3i) * (-6i) (spoiler: it won’t be a NameError because of an undefined i)
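A short sketch of the three in action; the sample matrix and numbers are arbitrary:

```ruby
require 'matrix'
require 'prime'

m = Matrix[[2, 0], [0, 4]]
p m.determinant                           # 8
p m * m.inverse == Matrix.identity(2)     # true

# Prime is enumerable, so the usual Enumerable methods work on it.
p Prime.first(5)                          # [2, 3, 5, 7, 11]
p Prime.prime?(97)                        # true

# Complex literals need no require at all.
p (2 + 3i) * (-6i)                        # (18-12i)
```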
This is probably more of a toy than a really useful tool, but in case you need it, it’s there.
The Abbrev module has one method, abbrev, that takes a list of strings and returns possible abbreviations that are non-ambiguous. For example:
So, you know you can’t use ru as an abbreviation.
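A minimal sketch with two words that share the prefix ru; the word list is an assumption, chosen to match the remark above:

```ruby
require 'abbrev'

abbreviations = Abbrev.abbrev(%w[ruby rules])
p abbreviations
# {"ruby"=>"ruby", "rub"=>"ruby", "rules"=>"rules", "rule"=>"rules", "rul"=>"rules"}

# "r" and "ru" match both words, so they are left out.
p abbreviations.key?('ru')   # false
```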
Last but not least, there is zlib. To quote:
Zlib is designed to be a portable, free, general-purpose, legally unencumbered – that is, not covered by any patents – lossless data-compression library for use on virtually any computer hardware and operating system.
For me, it sounds quite good. Compared to gzip:
The zlib format was designed to be compact and fast for use in memory and on communications channels. The gzip format was designed for single-file compression on file systems, has a larger header than zlib to maintain directory information, and uses a different, slower check method than zlib.
So zlib could actually be a good choice to reduce overhead when you send something over the network. To check it, I took Pride and Prejudice from Gutenberg and checked how it can be compressed:
The result was:
I say it’s pretty impressive!
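A comparison along these lines can be sketched as below. Instead of the full book, it uses the opening sentence of Pride and Prejudice repeated many times, so the ratio it prints will be far better than for real prose; treat it as an illustration of the API, not a benchmark:

```ruby
require 'zlib'

# Stand-in input: one sentence repeated, which compresses unrealistically well.
sentence = 'It is a truth universally acknowledged, that a single man in ' \
           "possession of a good fortune, must be in want of a wife.\n"
text = sentence * 200

deflated = Zlib::Deflate.deflate(text)   # zlib format
gzipped  = Zlib.gzip(text)               # gzip format, larger header and trailer

puts "original: #{text.bytesize} bytes"
puts "zlib:     #{deflated.bytesize} bytes"
puts "gzip:     #{gzipped.bytesize} bytes"

# Both formats are lossless, so the round trip gives back the exact input.
restored = Zlib::Inflate.inflate(deflated)
puts restored == text
```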
Yes! There is more hidden in Ruby stdlib. Have I missed something? Do you think something is even more interesting? Let me know.