The WegoWise Development Blog

Sort Like a Human Boss

| Comments

This post will discuss the ruby gem sort_authority and the problem it aims to solve, which is how to quickly sort mixed number and letter strings in Ruby in a "human" way.

The problem is easiest to explain with images.

Which would you rather see in an application?

This? Wagh!! So bad!
or this? Ah! Meaning is restored!

As Jeff Atwood noted in this blog post from way back in 2007, this is the difference between "ASCIIbetical" and alphabetical sorting. When creating an interface for humans with listings containing a mix of letters and numbers you almost always want the latter and not the former. Yet real alphabetization (also refered to as "natural" or "human" sorting) is typically not included in standard libraries, Ruby's included.

As the developer involved in implementing natural sorting for our app at WegoWise I first turned to the solution presented here, which we'll call sensible_sort. This algorithm works fine1, in that it does in fact sort strings alphabetically. There are also plenty of gems we could have used as well, including naturalsort, naturally2, and natcmp.

Any of these approaches would have gotten the job done, but my coworker Nathan took one look at the pull request and asked me how well the algorithm performed. I shrugged3; I hadn't worried about performance because the lists I was sorting were so small. Then I signed off for the day.

By the next morning Nathan had thrown together some benchmarks that showed that sensible sort could be fairly slow given a long enough list. The other Rubygems didn't perform very well either, compared to Ruby's native ASCIIbetical sort and a C algorithm by Martin Pool called natsort. In fact, Nathan had gone even further and made a wrapper for natsort using FFI, labeled strnatcmp.c below. Behold the damning numbers:

Sorting ['x1'] * 100_000:
user system total real
sort 0.000000 0.000000 0.000000 ( 0.001758)
strnatcmp.c 0.030000 0.000000 0.030000 ( 0.029083)
naturalsort gem 0.060000 0.000000 0.060000 ( 0.061879)
naturally gem 0.990000 0.020000 1.010000 ( 1.009634)
natcmp gem 1.140000 0.010000 1.150000 ( 1.141950)
sensible_sort 2.280000 0.000000 2.280000 ( 2.286274)

Considering that we're talking about 100,000 items maybe 2 seconds isn't that bad. We don't have user-facing lists that have 100,000 elements... yet. Also, I had developed a sentimental attachment to sensible_sort and decided to write a more realistic benchmark involving randomized strings (gist here), resulting in the following numbers:

Sorting 100,000 strings like this: "9 3q1q05_xsfw87i6wf7"
user system total real
sort 0.110000 0.000000 0.110000 (0.112114)
strnatcmp.c 0.560000 0.000000 0.560000 (0.560879)
naturally gem 7.940000 0.020000 7.960000 (7.965125)
sensible_sort 21.230000 0.160000 21.390000 (21.406591)
natcmp gem 51.150000 0.190000 51.340000 (51.364688)
naturalsort gem 161.590000 1.200000 162.790000 (162.883611)
(This benchmark was made a while ago on Ruby 1.9.2 but results hold for later
Rubies)

I was pleased to note that sensible_sort is not the worst approach in this benchmark. But it's also not very good. 21 seconds is pretty much unacceptable for an operation you might want to do inside a page request. Again, we don't often have to sort 100,000 items for humans these days, but this is an extension on the core Enumerable module that might get used in various contexts throughout the application. If we can avoid an inherently slow approach in favor of a painless alternative, why wouldn't we?

And that's how the sort_authority4 gem came about. sort_authority is just a simple wrapper around strnatcmp.c, plus an extension on Enumberable to add #natural_sort and #natural_sort_by methods on arrays and such. If you care about sorting lists in a sane order (and you should) and want it to be almost as fast as regular sort (why not?) then you should give it a try.


  1. Although some changes involving type coercion were required to update the code for Ruby 1.9+, available in this gist).

  2. Which has the distinction of being one of the least googleable Rubygems.

  3. Well, our team is mostly remote so this communication took place in the form of emoticons and animated gifs in our chat room, but same idea.

  4. Jed came up with the awesome name.

Comments