Hidden jewels of Ruby stdlib

by Paweł Świątkowski
06 Jun 2018

Ruby comes distributed with a vast standard library. We only use a fraction of it, usually. Everyone knows date, set, json, maybe even csv. But there are many more things hidden there.

Some time ago a discussion somewhere (Reddit perhaps?) prompted me to take a deep plunge into Ruby stdlib, and in this post I describe what I found. Not all things were new to me; some of them were simply forgotten. I chose the ones I found most entertaining, interesting or standing out in any other way.

While reading through those and asking yourself “why would I use it in a web app?”, please bear in mind that Ruby was not designed to be a language powering one of the most important web frameworks in history. Things listed here are more suitable for system scripting etc.

Parsing command-line options

These days when we write a Ruby script, it usually comes as a Rake task. But even if it’s a standalone file, it is usually steered in a similar way: via environment variables or just by positional parameters accessed via the ARGV array. However, in stdlib we can find two libraries for handling more complex input.

GetoptLong

One of them is GetoptLong. Let’s see it in action:

require 'getoptlong'

opts = GetoptLong.new(
    [ '--url', GetoptLong::REQUIRED_ARGUMENT ],
    [ '--count', '-c', GetoptLong::OPTIONAL_ARGUMENT ],
    [ '--verbose', GetoptLong::NO_ARGUMENT ]
)

opts.each do |option, value|
    p [option, value]
end

As you see, I defined three options:

url - which requires a value whenever it is given
count - whose value is optional
verbose - which serves as a flag

After that there is code that for each option prints its name and value. So when I test it with ruby getoptlong.rb -c 5 --verbose --url http://github.com I get:

["--count", "5"]
["--verbose", ""]
["--url", "http://github.com"]

There are a few interesting quirks with that. For example, if I omit url totally, nothing happens. Only if I use it as a flag (ruby getoptlong.rb --url) do I get an exception. Also, if I use some option that is not defined, it throws an error as well.
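Since GetoptLong does not complain about an option that is missing altogether, that check has to be written by hand. Here is a minimal sketch of how it could look. The parse_args helper and the ArgumentError it raises are my own choices for illustration, not part of the library.

```ruby
require 'getoptlong'

# Hypothetical helper: parses the given arguments into a Hash and
# raises ArgumentError when --url was not passed at all.
def parse_args(args)
  ARGV.replace(args) # GetoptLong reads ARGV, so this overwrites the script's arguments

  opts = GetoptLong.new(
    ['--url', GetoptLong::REQUIRED_ARGUMENT],
    ['--count', '-c', GetoptLong::OPTIONAL_ARGUMENT],
    ['--verbose', GetoptLong::NO_ARGUMENT]
  )
  opts.quiet = true # do not print parse errors to stderr, just raise them

  parsed = {}
  opts.each { |option, value| parsed[option] = value }

  # The check GetoptLong does not do for us.
  raise ArgumentError, 'missing --url' unless parsed.key?('--url')
  parsed
end

p parse_args(%w[-c 5 --verbose --url http://github.com])

# Three ways to get it wrong: no url, url without a value, unknown option.
[%w[--verbose], %w[--url], %w[--bogus]].each do |args|
  begin
    parse_args(args)
  rescue ArgumentError, GetoptLong::Error => e
    puts "#{args.inspect}: #{e.class}"
  end
end
```

The last loop should report an ArgumentError for the first case and GetoptLong's own error classes for the other two.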

You can find docs for GetoptLong here.

OptionParser

This solution is much more robust and advanced. Let’s see it in action with a similar example:

require 'optparse'

OptionParser.new do |opts|
    opts.banner = 'OptionParser example script'

    opts.on('--url URL') do |url|
        puts "url: #{url}"
    end

    opts.on('-c N', '--count N') do |n|
        puts "#{n} times"
    end

    opts.on('--verbose') do
        puts 'Verbose mode ON'
    end

    opts.on('-h', '--help') do
        puts opts
    end
end.parse!

The code is much more idiomatic here. The result is as expected. Behaviour regarding extra options etc. is the same as with GetoptLong. One thing we get for (almost) free here is a help message. Try it with ruby optparse.rb -h:

OptionParser example script
        --url URL
    -c, --count N
        --verbose
    -h, --help

But there’s much more to OptionParser than that - coercing types, something called conventions etc. Read more in the docs.
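To give a taste of type coercion, here is a small sketch. Passing Integer to opts.on makes OptionParser convert the value and reject anything that is not a number. The parse_options helper, the option descriptions and the file name are made up for this example.

```ruby
require 'optparse'

# Hypothetical helper: parses an array of arguments and returns
# the collected options plus whatever was left over.
def parse_options(argv)
  options = { verbose: false }

  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: example.rb [options] [files]'

    opts.on('--url URL', String, 'URL to fetch') { |url| options[:url] = url }
    # Passing Integer makes OptionParser convert (and validate) the value.
    opts.on('-c', '--count N', Integer, 'How many times') { |n| options[:count] = n }
    opts.on('--verbose', 'Print more output') { options[:verbose] = true }
  end

  # parse (without the bang) leaves the input untouched and returns
  # the arguments that are not options.
  rest = parser.parse(argv)
  [options, rest]
end

options, rest = parse_options(%w[--url http://github.com -c 5 --verbose notes.txt])
p options[:count] # 5, an Integer rather than the string "5"
p rest            # ["notes.txt"]

begin
  parse_options(%w[--count five])
rescue OptionParser::InvalidArgument => e
  puts e.message
end
```

Collecting values into a hash instead of printing them inside the blocks is just one possible design, but it keeps the parsing separate from whatever the script does next.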

Simple persistent key-value store

When we, Ruby developers, think about a key-value store, we usually have some kind of server-based solution in mind, such as Redis or Riak. However, when writing a simple application it’s usually more reasonable to use an embedded store. Lately, RocksDB from Facebook became famous as one such solution. But with Ruby, we are lucky to have an embedded key-value store right in the standard library.

And, there’s more… It’s not one KV store. It’s three of them: DBM, GDBM and SDBM. They are really similar to one another, so I will only quickly outline differences:

DBM relies on what’s installed on your system. It can use many things under the hood and most of the time it will be incompatible between different machines (or even on the same machine when system configuration changes). Therefore it’s not well-suited for persistent storage but is good for temporary applications.
GDBM is based on one particular implementation of a KV store called, not surprisingly, GDBM. The aforementioned DBM may, in some cases, choose to use GDBM as its underlying storage. It should be compatible between different systems.
SDBM’s code, contrary to the previous ones, is shipped with Ruby, so it should be the same on all machines.

How do we use it? For example with SDBM (because we don’t need to install anything extra to have it):

require 'sdbm'

SDBM.open 'fruits' do |db|
  db['apple'] = 'fruit'
  db['pear'] = 'fruit'
  db['carrot'] = 'vegetable'
  db['tomato'] = 'vegetable'

  db.update('peach' => 'fruit', 'tomato' => 'fruit')

  db.each do |key, value|
    puts "Key: #{key}, Value: #{value}"
  end
end

This creates two files in the current directory. fruits.dir is empty (I really don’t know why), but the real data is in fruits.pag. You can peek into it with hexdump -C fruits.pag:

00000000  08 00 fb 03 f6 03 f2 03  ed 03 e7 03 de 03 d8 03  |................|
00000010  cf 03 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000003c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 76  |...............v|
000003d0  65 67 65 74 61 62 6c 65  74 6f 6d 61 74 6f 76 65  |egetabletomatove|
000003e0  67 65 74 61 62 6c 65 63  61 72 72 6f 74 66 72 75  |getablecarrotfru|
000003f0  69 74 70 65 61 72 66 72  75 69 74 61 70 70 6c 65  |itpearfruitapple|
00000400

The data is actually there.

Usefulness of this solution is probably quite limited. You can use it when you want to persist some state between script runs. Or when you really care about memory. Having some big hashes loaded in RAM all the time can slow down your program. With (S/G)DBM you can dump data which is unused for a while to disk and pick it up later when you need it.

Persisting a whole object hierarchy with PStore

Speaking of persisting… In examples above we could only use strings. That’s ok in many cases, but not always. What if you want to save part of your application state - with objects, their states, and relations?

Ruby stdlib has you covered! PStore is exactly what you are looking for. In this example we are going to create some very simple Finite-State-Machine-like structure with states connected via named edges to each other:

class State
    def initialize(name)
        @name = name
        @edges = {}
    end

    def connect_to(word, state)
        @edges[word] = state
    end

    def traverse(indent = 0)
        tab = " " * indent
        puts "#{tab}State #{@name}:"
        @edges.each do |word, state|
            puts "#{tab}  '#{word}':"
            state.traverse(indent + 4)
        end
    end
end

The traverse method simply displays connections from the start node to the end (watch out, we don’t handle loops!). So now let’s create some structure and traverse it:

s0 = State.new('start')
s1 = State.new('first')
s2 = State.new('second')
s3 = State.new('third')
s4 = State.new('fourth')

s0.connect_to('aa', s1)
s0.connect_to('aaa', s2)
s1.connect_to('b', s3)
s3.connect_to('c', s4)
s2.connect_to('d', s4)

s0.traverse

What we got is:

State start:
  'aa':
    State first:
      'b':
        State third:
          'c':
            State fourth:
  'aaa':
    State second:
      'd':
        State fourth:

Now let’s save it using PStore to a file on disk:

require "pstore"

storage = PStore.new('fsm.pstore')
storage.transaction do
    storage['start'] = s0
end

And then, in a different script, we load and traverse:

class State
    # omitting, definition same as above
end

require "pstore"

storage = PStore.new('fsm.pstore')
storage.transaction do
    start = storage['start']
    start.traverse
end

And output is exactly the same! If you’re curious, like me, you can peek into fsm.pstore file using hexdump again:

00000000  04 08 7b 06 49 22 0a 73  74 61 72 74 06 3a 06 45  |..{.I".start.:.E|
00000010  54 6f 3a 0a 53 74 61 74  65 07 3a 0a 40 6e 61 6d  |To:.State.:.@nam|
00000020  65 49 22 0a 73 74 61 72  74 06 3b 00 54 3a 0b 40  |eI".start.;.T:.@|
00000030  65 64 67 65 73 7b 07 49  22 07 61 61 06 3b 00 54  |edges{.I".aa.;.T|
00000040  6f 3b 06 07 3b 07 49 22  0a 66 69 72 73 74 06 3b  |o;..;.I".first.;|
00000050  00 54 3b 08 7b 06 49 22  06 62 06 3b 00 54 6f 3b  |.T;.{.I".b.;.To;|
00000060  06 07 3b 07 49 22 0a 74  68 69 72 64 06 3b 00 54  |..;.I".third.;.T|
00000070  3b 08 7b 06 49 22 06 63  06 3b 00 54 6f 3b 06 07  |;.{.I".c.;.To;..|
00000080  3b 07 49 22 0b 66 6f 75  72 74 68 06 3b 00 54 3b  |;.I".fourth.;.T;|
00000090  08 7b 00 49 22 08 61 61  61 06 3b 00 54 6f 3b 06  |.{.I".aaa.;.To;.|
000000a0  07 3b 07 49 22 0b 73 65  63 6f 6e 64 06 3b 00 54  |.;.I".second.;.T|
000000b0  3b 08 7b 06 49 22 06 64  06 3b 00 54 40 13        |;.{.I".d.;.T@.|
000000be

Useful? Perhaps not, but maybe? I can see the potential to save a state of some simple game this way, for example.
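One detail worth checking is what happens to an object that is reachable by two different paths, like the fourth state above. The sketch below suggests that it comes back as one shared object, not two copies. Node is a simplified stand-in for the State class and save_and_load is a hypothetical helper; the file lands in a temporary directory so nothing is left behind.

```ruby
require 'pstore'
require 'tmpdir'

# Hypothetical stand-in for the State class: a name plus outgoing edges.
Node = Struct.new(:name, :edges)

# Saves a small graph in which two paths lead to the same node,
# then loads it back in a read-only transaction.
def save_and_load(path)
  shared = Node.new('fourth', {})
  third  = Node.new('third',  { 'c' => shared })
  second = Node.new('second', { 'd' => shared })
  start  = Node.new('start',  { 'aa' => third, 'aaa' => second })

  PStore.new(path).transaction { |store| store['start'] = start }

  loaded = nil
  # Passing true opens the transaction read-only.
  PStore.new(path).transaction(true) { |store| loaded = store['start'] }
  loaded
end

Dir.mktmpdir do |dir|
  loaded = save_and_load(File.join(dir, 'fsm.pstore'))
  via_third  = loaded.edges['aa'].edges['c']
  via_second = loaded.edges['aaa'].edges['d']

  puts via_third.name               # fourth
  puts via_third.equal?(via_second) # true: one object, not two copies
end
```

The read-only transaction is optional here, but it makes the intent clear in a script that only loads data.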

Observer pattern

Usage of Ruby’s Observable was actually part of the first (?) book from which I learned Ruby back in 2008 (?). So it’s not new to me, but it’s worth a reminder that we have such a thing built in. It can actually make the code cleaner in some cases.

To illustrate how it works, I’m going to implement yet another FizzBuzz (it will be a bit incorrect though, because it will print the number every time):

require 'observer'

class Incrementor
    include Observable

    def initialize
        @number = 0
    end

    def runto(num)
        loop do
            @number += 1
            changed # note this!
            print "#{@number} "
            notify_observers(@number)
            puts ""
            break if @number >= num
        end
    end
end

class FizzObserver
    def update(num)
        print "Fizz" if num % 3 == 0
    end
end

class BuzzObserver
    def update(num)
        print "Buzz" if num % 5 == 0
    end
end

inc = Incrementor.new
inc.add_observer(FizzObserver.new)
inc.add_observer(BuzzObserver.new)
inc.runto(30)

If you run this code, you’ll see it works. There are just two things to remember: call changed to indicate that the object has changed, and call notify_observers when you want to emit new values.

Why useful? You can abstract some things (such as logging) outside of your main class. Note, however, that abusing it will lead to callback hell, which would be hard to debug and understand. Just like ActiveRecord callbacks.
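Here is a rough sketch of the logging idea. It also shows two details that are easy to miss: add_observer accepts the name of the method to call (update is only the default), and notify_observers does nothing unless changed was called first. Counter and AuditLog are names I made up for this example.

```ruby
require 'observer'

# Hypothetical observable: counts ticks and optionally announces them.
class Counter
  include Observable

  def initialize
    @count = 0
  end

  def tick(announce: true)
    @count += 1
    changed if announce       # without this, notify_observers is a no-op
    notify_observers(@count)
  end
end

# Hypothetical observer: plays the role of a logger kept outside Counter.
class AuditLog
  attr_reader :seen

  def initialize
    @seen = []
  end

  def record(value)
    @seen << value
  end
end

log = AuditLog.new
counter = Counter.new
counter.add_observer(log, :record) # call #record instead of the default #update

counter.tick
counter.tick(announce: false)
counter.tick

p log.seen # [1, 3], the silent second tick never reached the observer
```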

DRb

DRb or dRuby is a real gem in the standard library. Described simply as “distributed object system for Ruby”, it can give you a lot of fun. To see it live, I decided to go with something really useful: a service that prints a random number from 0 to @max_num every @interval seconds. Here is the code, with DRb included:

require 'drb/drb'

class RandomService
    def initialize
        set_max_num(100)
        set_interval(1)
    end

    def run
        while @should_stop.nil?
            puts rand(@max_num)
            sleep(@interval)
        end
    end

    def set_max_num(num)
        @max_num = num
    end

    def set_interval(time)
        @interval = time
    end

    def stop!
        @should_stop = true
    end
end

service = RandomService.new
DRb.start_service('druby://localhost:9394', service)
service.run

The class itself is really straightforward and I’m not going into details about it. The only (hopefully) unfamiliar thing here is the call to DRb, where we wrap our service in the dRuby protocol. Basically, it exposes our interface on localhost on port 9394. With that in mind, I recommend starting the service and splitting your terminal in two (iTerm can do it on Mac, I recommend Tilix for Linux).

Now, when we have our little service running, fire up irb in second terminal and type:

irb(main):001:0> require 'drb/drb'
=> true
irb(main):002:0> service = DRbObject.new_with_uri('druby://localhost:9394')
=> #<DRb::DRbObject:0x007fd51a8072c0 @uri="druby://localhost:9394", @ref=nil>

When it’s done, you can start to play by calling methods on service. Decrease the interval to 0.1, set max_num to 1000, whatever you want. Finally, stop the show by running service.stop!. Everything you do in irb is reflected immediately in the service, which runs as a completely different process in a different terminal! Needless to say, you can also do it over the network, if you wish.

You may think right now that this is just a nice toy. But I’ve actually seen things like that used in practice. Probably the most notable example was an IRC bot where, from a Ruby console, you could do many things, from temporarily adding admins to an array usually populated on start (so no downtime for a restart was required!) to defining completely new methods and commands to test them out before actually putting them in the code. I can also imagine exposing such an interface to, for example, manipulate the size of some worker pool. Actually, the sky is pretty much the limit here.
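If you don’t feel like juggling two terminals, the round trip can be squeezed into one script. This is only a sketch: Settings and demo_remote_call are hypothetical names, port 0 asks the operating system for any free port, and because the server and the client live in one process here, it shows the API more than the networking.

```ruby
require 'drb/drb'

# Hypothetical front object: a tiny settings holder.
class Settings
  attr_reader :interval

  def initialize
    @interval = 1
  end

  def set_interval(value)
    @interval = value
  end
end

# Hypothetical helper: exposes a Settings object over dRuby, changes it
# through a DRbObject proxy and returns the value seen on the original.
def demo_remote_call
  settings = Settings.new
  DRb.start_service('druby://localhost:0', settings) # port 0: let the OS choose

  # DRb.uri holds the address the service actually got.
  remote = DRbObject.new_with_uri(DRb.uri)
  remote.set_interval(0.1)

  settings.interval
ensure
  DRb.stop_service
end

puts demo_remote_call # 0.1, the change made through the proxy
```

In real use the DRbObject part would of course sit in a separate process, just like the irb session above.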

Other

There are many more things in stdlib. I’m going to mention a few of them, but without such detailed descriptions.

tsort

I had a bit of trouble understanding what tsort is really for. What it does is a topological sorting of directed acyclic graphs. If this sounds pretty specific, that’s because it is. This kind of sorting is mostly useful in dependency sorting, when you have a graph of dependencies (A depends on B and C, B depends on D, E depends on A) and you need to determine an order of installing those dependencies, so that every item has its dependencies already installed when being installed.

There is a great article by Lawson Kurtz explaining how it’s used in Bundler.
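To make it less abstract, here is a sketch using the dependencies from the example above (A depends on B and C, B depends on D, E depends on A). The install_order helper and the hash layout are my own assumptions; TSort.tsort itself only needs two callables, one that yields every node and one that yields the children of a node.

```ruby
require 'tsort'

# Hypothetical helper: deps maps each item to the items it depends on
# and the result lists items in an order that is safe to install.
def install_order(deps)
  each_node  = ->(&block) { deps.each_key(&block) }
  each_child = ->(node, &block) { deps.fetch(node).each(&block) }
  TSort.tsort(each_node, each_child)
end

deps = { 'A' => %w[B C], 'B' => %w[D], 'C' => [], 'D' => [], 'E' => %w[A] }
p install_order(deps) # dependencies always come before their dependents

# A cycle cannot be sorted, so TSort raises instead of guessing.
begin
  install_order('A' => %w[B], 'B' => %w[A])
rescue TSort::Cyclic => e
  puts "cycle detected: #{e.message}"
end
```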

Math

Some math-related classes in Ruby standard library:

Matrix has methods for matrix operations, such as (but not limited to): conjugate, determinant, eigensystem, inverse and many more (see the docs)
Prime represents an infinite set of all prime numbers. You don’t need to implement this Eratosthenes sieve yourself!
[sidenote] I was surprised that there is no Complex class in stdlib, especially after I learned that it used to be there, but was removed. It turns out that it actually made it to core (so it is automatically required). Check this out by firing up your irb and writing: (2 + 3i) * (-6i) (spoiler: it won’t be a NameError because of undefined i)
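A quick tour of all three, as a sketch. The matrix and the numbers are arbitrary values I picked; depending on your Ruby version, matrix and prime may have to be installed as gems first.

```ruby
require 'matrix'
require 'prime'

m = Matrix[[2, 1], [1, 3]]
puts m.determinant # 5
p m.inverse        # exact fractions (Rationals), since the entries are Integers

p Prime.first(5)      # [2, 3, 5, 7, 11]
p Prime.prime?(97)    # true
p 12.prime_division   # [[2, 2], [3, 1]], that is 2**2 * 3**1

# Complex needs no require at all.
product = (2 + 3i) * (-6i)
puts product # 18-12i
```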

abbrev

This is probably more of a toy than a really useful tool, but in case you need it, it’s there. The Abbrev module has one method, abbrev, that takes a list of strings and returns possible abbreviations that are non-ambiguous. For example:

require 'abbrev'

Abbrev.abbrev(%w[ruby rubic russia])
#=> {"ruby"=>"ruby", "rubic"=>"rubic", "rubi"=>"rubic", "russia"=>"russia", "russi"=>"russia", "russ"=>"russia", "rus"=>"russia"}

So, you know you can’t use ru as an abbreviation.
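One place where this could come in handy is a script that accepts shortened subcommands. Below is a sketch of that idea; the command list and the resolve_command helper are invented for the example.

```ruby
require 'abbrev'

# Hypothetical list of subcommands a script might accept.
COMMANDS = %w[start status stop restart].freeze

# Hypothetical helper: returns the full command for an unambiguous
# abbreviation, or nil when the input is ambiguous or unknown.
def resolve_command(input, commands = COMMANDS)
  Abbrev.abbrev(commands)[input]
end

p resolve_command('r')    # "restart"
p resolve_command('star') # "start"
p resolve_command('sto')  # "stop"
p resolve_command('sta')  # nil, could be start or status
```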

zlib

Last but not least, there is zlib. To quote:

Zlib is designed to be a portable, free, general-purpose, legally unencumbered – that is, not covered by any patents – lossless data-compression library for use on virtually any computer hardware and operating system.

For me, it sounds quite good. Compared to gzip:

The zlib format was designed to be compact and fast for use in memory and on communications channels. The gzip format was designed for single-file compression on file systems, has a larger header than zlib to maintain directory information, and uses a different, slower check method than zlib.

So zlib could actually be a good choice to reduce overhead when you send something over the network. To check it, I took Pride and Prejudice from Gutenberg and checked how it can be compressed:

require 'zlib'
source = File.read('path/to/pride-and-prejudice.txt')
compressed = Zlib::Deflate.deflate(source)
decompressed = Zlib::Inflate.inflate(compressed)

puts "Source size:  #{source.bytesize}"
puts "Compressed:   #{compressed.bytesize}"
puts "Decompressed: #{decompressed.bytesize}"
puts "Compression:  #{(1 - (compressed.bytesize.to_f / source.bytesize)).round(4)}"

The result was:

Source size:  724725
Compressed:   260549
Decompressed: 724725
Compression:  0.6405

I say it’s pretty impressive!
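If you want to poke at the trade-offs yourself, here is a sketch that compares compression levels and the gzip wrapper on the same input. The compare_formats helper is hypothetical and the sample is just one sentence repeated many times, so the ratios will look far better than for a real book.

```ruby
require 'zlib'

# Hypothetical helper: compresses the same text in several ways
# and returns the resulting sizes in bytes.
def compare_formats(text)
  {
    source:   text.bytesize,
    fastest:  Zlib::Deflate.deflate(text, Zlib::BEST_SPEED).bytesize,
    default:  Zlib::Deflate.deflate(text).bytesize,
    smallest: Zlib::Deflate.deflate(text, Zlib::BEST_COMPRESSION).bytesize,
    gzip:     Zlib.gzip(text).bytesize # same algorithm, bigger header and trailer
  }
end

# Artificial, highly repetitive sample text.
sample = 'It is a truth universally acknowledged. ' * 500

compare_formats(sample).each do |label, size|
  puts format('%-9s %d', "#{label}:", size)
end
```

The gzip line should come out slightly larger than the default one, which matches the quote about the gzip format carrying a bigger header.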

More?

Yes! There is more hidden in Ruby stdlib. Have I missed something? Do you think something is even more interesting? Let me know.