
The Three-Legged Stool – Testing, Refactoring and Design

Feb 2013 - Grand Rapids

“You’ve mentioned testing and refactoring, but there’s a third leg to the stool: that’s design.” - Zach Dennis

Much has been said in the Ruby community about the importance of testing, the power of refactoring to improve design, and the importance of understanding software design itself, but as Michael Feathers pointed out in The Deep Synergy Between Testability and Good Design, they are most powerful – indeed, only really understandable – when treated as a team. Or in Zach’s terms, the three legs of the stool.

I enjoy teaching Ruby, and most of my students are relatively new to the language. The center of our curriculum is 7 Degrees of FizzBuzz: we squeeze every last drop of knowledge out of it in order to introduce ourselves to the language, to pair-programming, and to thinking like programmers.

Let’s examine the testing section of that exercise through the lens of Zach’s “Three-Legged Stool”, but in the interest of the beginners among us we’re going to use only one example of each. For testing, the unit test. For refactoring, extract method. And for design, the principle of composition.

First, Make a Mess

Here’s the version of FizzBuzz we’re going to use, hacked together in a white heat and without a thought for tomorrow:

def fizzbuzz last_num
  (1..last_num).map do |n|
    if n % 3 && n % 5
      "FizzBuzz"
    elsif n % 5
      "Buzz"
    elsif n % 3
      "Fizz"
    end
  end
end

puts fizzbuzz(30).join(" ")

Beautiful, isn’t it? Clear, elegant – assuming you understand syntax details like ranges, mapping and how the join method turns an array into a string – and fairly concise. It’s even got a bit of the first leg of our stool, design, in the form of composition.

Composition means assembling something out of component parts. An engine is composed of gears, belts, pistons, etc.: these pieces work together to convert gasoline and a bit of electricity into motive power. The basic level of composition in computer programs is the method, and we have one here: fizzbuzz. It takes the upper bound of our range as an argument and returns an array of values for the caller to do things with.

Breaking a problem into small components allows us to think about each one individually, which is generally easier than thinking of the whole construction at once. We’re fine as long as the pieces plug together: piece A generates something that piece B needs in the correct format. Which means we can create, test and debug each part by itself and put them together when we’re confident they work in isolation.
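Here is a minimal sketch of two pieces plugging together. The method names parse_numbers and total are hypothetical and not part of the FizzBuzz exercise; the point is only that piece A returns an array of integers, which is exactly the format piece B expects.

```ruby
# Piece A: turns a string like "3 5 7" into an array of integers.
# (Hypothetical helper, for illustration only.)
def parse_numbers(text)
  text.split.map { |word| Integer(word) }
end

# Piece B: adds up an array of integers.
# (Hypothetical helper, for illustration only.)
def total(numbers)
  numbers.sum
end

# Each piece can be checked by itself...
parse_numbers("3 5 7")  # => [3, 5, 7]
total([3, 5, 7])        # => 15

# ...and then plugged together, because A's output is B's input.
total(parse_numbers("3 5 7"))  # => 15
```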

Now that we’ve written our neat little fizzbuzz method we can add it to any application that needs fizzbuzzing. There’s only one problem: the code is wrong.

Run it, and we get this:

FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz FizzBuzz

Which brings us to our second leg: testing.

Then Make It Right

Testing means exercising program code in order to prove it functions as expected in all the important ways it might be used or abused. To do this, we employ a test framework which offers a variety of ways to say “under these conditions, this method should behave this way”. The most basic sort of testing is unit testing, which typically exercises a single method.
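Stripped of any framework, a unit test is just a comparison between what a method returned and what we expected. Here is a rough sketch of that idea, assuming a hypothetical check helper and a hypothetical double method standing in for the code under test:

```ruby
# Hypothetical method under test.
def double(n)
  n * 2
end

# Hypothetical helper: compares an actual value to the expected one
# and returns a one-line report instead of raising an error.
def check(description, actual, expected)
  if actual == expected
    "PASS: #{description}"
  else
    "FAIL: #{description} (expected #{expected.inspect}, got #{actual.inspect})"
  end
end

puts check("double(4) is 8", double(4), 8)
puts check("double(0) is 0", double(0), 0)
```

A test framework layers friendlier syntax, grouping and reporting on top of this same comparison.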

So how might we write a test for this method? Using RSpec – one of the most popular Ruby test frameworks – we could do this:

describe 'fizzbuzz' do
  it 'works to 30' do
    fizzbuzz(30).should == %w{ 1 2 Fizz 4 Buzz Fizz 7 8 Fizz Buzz 11 Fizz 13 14 Buzz 16 17 Fizz 19 Buzz Fizz 22 23 Fizz Buzz 26 Fizz 28 29 FizzBuzz }
  end
end

And if we run it, we’ve proven that the code is misbehaving…

$ rspec stool_spec.rb 
F

Failures:

  1) fizzbuzz works to 30
     Failure/Error: fizzbuzz(30).should == %w{ 1 2 Fizz 4 Buzz Fizz 7 8 Fizz Buzz 11 Fizz 13 14 Buzz 16 17 Fizz 19 Buzz Fizz 22 23 Fizz Buzz 26 Fizz 28 29 FizzBuzz }
       expected: ["1", "2", "Fizz", "4", "Buzz", "Fizz", "7", "8", "Fizz", "Buzz", "11", "Fizz", "13", "14", "Buzz", "16", "17", "Fizz", "19", "Buzz", "Fizz", "22", "23", "Fizz", "Buzz", "26", "Fizz", "28", "29", "FizzBuzz"]
            got: ["FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz", "FizzBuzz"] (using ==)

So, yeah, we’re testing. But… yuck.

First off, it’s hideously long, as is the error output: we’re forced to visually match every single value in both arrays to find out what’s wrong. There’s also a bunch of repetition: we test whether the “Fizz” logic works 8 times!

But it’s a test. When it succeeds, we’ll know the code’s right. And that’s perfectly valid. But we can do better. And we need to do better, because writing tests like that all day long is one of the reasons people give up testing: it’s exhausting.

After a few minutes of looking, we realize we forgot the == 0 after each conditional. Notice that the test itself doesn’t help us at all in this search: it just sits there being broken. We fix our code:

def fizzbuzz last_num
  (1..last_num).map do |n|
    if n % 3 == 0 && n % 5 == 0
      "FizzBuzz"
    elsif n % 5 == 0
      "Buzz"
    elsif n % 3 == 0
      "Fizz"
    end
  end
end
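As an aside, the reason every value came out as "FizzBuzz": in Ruby only nil and false are falsy, so the integer returned by n % 3 counts as true even when it is 0. A quick sketch to confirm this (truthy? is a hypothetical helper, not something from the exercise):

```ruby
# Hypothetical helper: reports whether Ruby treats a value as true
# when it appears in a conditional.
def truthy?(value)
  value ? true : false
end

truthy?(9 % 3)        # => true, even though 9 % 3 is 0
truthy?(10 % 3)       # => true, 10 % 3 is 1
truthy?(9 % 3 == 0)   # => true, the comparison we actually wanted
truthy?(10 % 3 == 0)  # => false
truthy?(nil)          # => false
```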

…and run it again.

$ rspec stool_spec.rb 
F

Failures:

  1) fizzbuzz works to 30
     Failure/Error: fizzbuzz(30).should == %w{ 1 2 Fizz 4 Buzz Fizz 7 8 Fizz Buzz 11 Fizz 13 14 Buzz 16 17 Fizz 19 Buzz Fizz 22 23 Fizz Buzz 26 Fizz 28 29 FizzBuzz }
       expected: ["1", "2", "Fizz", "4", "Buzz", "Fizz", "7", "8", "Fizz", "Buzz", "11", "Fizz", "13", "14", "Buzz", "16", "17", "Fizz", "19", "Buzz", "Fizz", "22", "23", "Fizz", "Buzz", "26", "Fizz", "28", "29", "FizzBuzz"]
            got: [nil, nil, "Fizz", nil, "Buzz", "Fizz", nil, nil, "Fizz", "Buzz", nil, "Fizz", nil, nil, "FizzBuzz", nil, nil, "Fizz", nil, "Buzz", "Fizz", nil, nil, "Fizz", "Buzz", nil, "Fizz", nil, nil, "FizzBuzz"] (using ==)

Hmm… Clearly we fixed something, but not everything. Even through the noise, we can see all those nils poking out. Oops, forgot our else case!
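Those nils come from Ruby itself: an if expression with no else evaluates to nil when none of its branches match, and map collects that nil into the result. A small sketch, using a hypothetical label method:

```ruby
# Hypothetical method: an if/elsif chain with no else branch.
def label(n)
  if n % 3 == 0
    "Fizz"
  elsif n % 5 == 0
    "Buzz"
  end
end

label(3)  # => "Fizz"
label(7)  # => nil, because no branch matched

# map keeps whatever the block returns, nils included.
(1..5).map { |n| label(n) }  # => [nil, nil, "Fizz", nil, "Buzz"]
```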

We change the code again:

def fizzbuzz last_num
  (1..last_num).map do |n|
    if n % 3 == 0 && n % 5 == 0
      "FizzBuzz"
    elsif n % 5 == 0
      "Buzz"
    elsif n % 3 == 0
      "Fizz"
    else
      n.to_s
    end
  end
end

…and run it again.

$ rspec stool_spec.rb 
F

Failures:

  1) fizzbuzz works to 30
     Failure/Error: fizzbuzz(30).should == %w{ 1 2 Fizz 4 Buzz Fizz 7 8 Fizz Buzz 11 Fizz 13 14 Buzz 16 17 Fizz 19 Buzz Fizz 22 23 Fizz Buzz 26 Fizz 28 29 FizzBuzz }
       expected: ["1", "2", "Fizz", "4", "Buzz", "Fizz", "7", "8", "Fizz", "Buzz", "11", "Fizz", "13", "14", "Buzz", "16", "17", "Fizz", "19", "Buzz", "Fizz", "22", "23", "Fizz", "Buzz", "26", "Fizz", "28", "29", "FizzBuzz"]
            got: ["1", "2", "Fizz", "4", "Buzz", "Fizz", "7", "8", "Fizz", "Buzz", "11", "Fizz", "13", "14", "FizzBuzz", "16", "17", "Fizz", "19", "Buzz", "Fizz", "22", "23", "Fizz", "Buzz", "26", "Fizz", "28", "29", "FizzBuzz"] (using ==)

And this brings us to the second reason people give up on testing: we have a bug in our test. After monkey-walking through the outputs in parallel, we see that although 15 should appear as “FizzBuzz” I accidentally typed “Buzz” because it was divisible by 5. We fix that, and now our test passes.

There is nothing more infuriating than writing a piece of code in 5 minutes then spending 2 hours beating your head against a testing framework trying to prove you were right all along. This is why your tests should be as simple as possible, because you can’t wrap tests around your tests!

Here’s the test I would like to write to prove our fizzbuzz algorithm works:

describe 'fizzbuzz' do
  it 'a single number' do
    fizzbuzz_a_number(9).should == "Fizz"
    fizzbuzz_a_number(10).should == "Buzz"
    fizzbuzz_a_number(11).should == "11"
    fizzbuzz_a_number(30).should == "FizzBuzz"
  end
end

In fact, if I’d been doing Test-Driven Development I probably would have started here and saved myself a lot of hair-pulling. Live and learn…

The “dream test” proves the algorithm works correctly in each of its four states, without all the repetition and noise. Since I picked multiples of our base numbers, I even covered the bug where we say n == 3 instead of n % 3 == 0 without having to write more test cases. It was a lot easier to write correctly the first time, and the output is more readable. Only one problem: we can’t run this code, because the fizzbuzz_a_number method doesn’t exist.
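To see why multiples make better test inputs than the base numbers themselves, here is a sketch of that n == 3 bug. The method name buggy_fizzbuzz_a_number is hypothetical and the bug is deliberate:

```ruby
# Deliberately wrong: compares with == instead of checking divisibility.
def buggy_fizzbuzz_a_number(n)
  if n == 15
    "FizzBuzz"
  elsif n == 5
    "Buzz"
  elsif n == 3
    "Fizz"
  else
    n.to_s
  end
end

# Testing only the base numbers would let the bug slip through...
buggy_fizzbuzz_a_number(3)   # => "Fizz"
buggy_fizzbuzz_a_number(5)   # => "Buzz"

# ...while the multiples used in the dream test expose it.
buggy_fizzbuzz_a_number(9)   # => "9", but we wanted "Fizz"
buggy_fizzbuzz_a_number(10)  # => "10", but we wanted "Buzz"
buggy_fizzbuzz_a_number(30)  # => "30", but we wanted "FizzBuzz"
```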

The fizzbuzz_a_number method doesn’t exist yet because the fizzbuzz method actually doesn’t have enough composition. It’s doing two distinct things: it’s iterating over a range, and the conditionals are doing the actual fizzbuzzing.

So how do we improve the composition of our method and get to run our dream test? We refactor.

Then Make It Easy

Refactoring means improving the structure of code without changing its external behavior. We typically refactor for one of two reasons: to improve the clarity of the code and make it easier to think about, or to make it easier to add a feature. If it’s hard to add a feature to your codebase, try refactoring your existing code into something that makes the feature easier to implement. Now that we have a test covering fizzbuzz we can be confident our changes won’t break logic that was working when we started.

The feature we want to implement is being able to use our awesome new test, so we’ll refactor to make it possible, using one of the most common refactorings: Extract Method.

Martin Fowler’s book Refactoring – with examples in Java, though there’s now a Ruby version – introduced a disciplined, meticulous approach to changing code, so even this simple one has several steps.

First we copy the relevant bit of code into a new method with an “intention-revealing” name. Good variable and method names go a long way toward making code self-commenting:

def fizzbuzz_a_number
  if n % 3 == 0 && n % 5 == 0
    "FizzBuzz"
  elsif n % 5 == 0
    "Buzz"
  elsif n % 3 == 0
    "Fizz"
  else
    n.to_s
  end
end

Next we look for variables created outside our new method but used inside it. We have only one: n. We can get it into the method by adding it as an argument:

def fizzbuzz_a_number n
  if n % 3 == 0 && n % 5 == 0
    "FizzBuzz"
  elsif n % 5 == 0
    "Buzz"
  elsif n % 3 == 0
    "Fizz"
  else
    n.to_s
  end
end

We then look for any variables modified or created inside this method but used outside it. In this case, the only one is the implicit return value and that takes care of itself.
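The implicit return value takes care of itself because a Ruby method returns the value of the last expression it evaluates, and an if/else is itself an expression. A small sketch, with a hypothetical parity method:

```ruby
# Hypothetical method: note there is no explicit `return` anywhere.
def parity(n)
  if n.even?
    "even"
  else
    "odd"
  end
end

# The if/else evaluates to one of the two strings,
# and that string becomes the method's return value.
result = parity(4)
puts result
puts parity(7)
```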

Now we can run our dream test and it passes right away:

$ rspec stool_spec.rb
..

Finished in 0.00051 seconds
2 examples, 0 failures

Naturally the original test still passes because we haven’t touched fizzbuzz yet. If we decided at this point to abandon the effort, we could delete our new method and everything would be as it was: no harm, no foul.

But we like our new code, so we complete the refactoring by replacing the original instance of the code with a call to the new method, passing along the argument:

def fizzbuzz_a_number n
  if n % 3 == 0 && n % 5 == 0
    "FizzBuzz"
  elsif n % 5 == 0
    "Buzz"
  elsif n % 3 == 0
    "Fizz"
  else
    n.to_s
  end
end

def fizzbuzz last_num
  (1..last_num).map do |n|
    fizzbuzz_a_number(n)
  end
end

The tests still pass, and all is right with the world.
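Since refactoring must not change external behavior, one extra sanity check is to keep the pre-refactoring version around temporarily and compare the two over a range of inputs. A sketch, assuming the old code is kept under the hypothetical name fizzbuzz_inline:

```ruby
# The extracted method, as above.
def fizzbuzz_a_number(n)
  if n % 3 == 0 && n % 5 == 0
    "FizzBuzz"
  elsif n % 5 == 0
    "Buzz"
  elsif n % 3 == 0
    "Fizz"
  else
    n.to_s
  end
end

# The refactored method, which delegates to fizzbuzz_a_number.
def fizzbuzz(last_num)
  (1..last_num).map { |n| fizzbuzz_a_number(n) }
end

# Hypothetical name for the version from before the refactoring,
# kept only for this comparison.
def fizzbuzz_inline(last_num)
  (1..last_num).map do |n|
    if n % 3 == 0 && n % 5 == 0
      "FizzBuzz"
    elsif n % 5 == 0
      "Buzz"
    elsif n % 3 == 0
      "Fizz"
    else
      n.to_s
    end
  end
end

# Same input, same output: the refactoring preserved behavior.
puts fizzbuzz(100) == fizzbuzz_inline(100)
```

Once the comparison holds, fizzbuzz_inline can be deleted again; it exists only to build confidence during the change.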

So big deal: now we have two tests that prove the same thing… Until we re-introduce one of our bugs by accidentally removing one of the == 0:

if n % 3 == 0 && n % 5
  "FizzBuzz"

…and rerun our tests:

$ rspec stool_spec.rb 
FF

Failures:

  1) fizzbuzz a single number
     Failure/Error: fizzbuzz_a_number(9).should == "Fizz"
       expected: "Fizz"
            got: "FizzBuzz" (using ==)
     # ./stool_spec.rb:23:in `block (2 levels) in <top (required)>'

  2) fizzbuzz works to 30
     Failure/Error: fizzbuzz(30).should == %w{ 1 2 Fizz 4 Buzz Fizz 7 8 Fizz Buzz 11 Fizz 13 14 FizzBuzz 16 17 Fizz 19 Buzz Fizz 22 23 Fizz Buzz 26 Fizz 28 29 FizzBuzz }
       expected: ["1", "2", "Fizz", "4", "Buzz", "Fizz", "7", "8", "Fizz", "Buzz", "11", "Fizz", "13", "14", "FizzBuzz", "16", "17", "Fizz", "19", "Buzz", "Fizz", "22", "23", "Fizz", "Buzz", "26", "Fizz", "28", "29", "FizzBuzz"]
            got: ["1", "2", "FizzBuzz", "4", "Buzz", "FizzBuzz", "7", "8", "FizzBuzz", "Buzz", "11", "FizzBuzz", "13", "14", "FizzBuzz", "16", "17", "FizzBuzz", "19", "Buzz", "FizzBuzz", "22", "23", "FizzBuzz", "Buzz", "26", "FizzBuzz", "28", "29", "FizzBuzz"] (using ==)

Notice the difference? The more specific test shines a spotlight on the code that’s failing – we gave it a 9 and got FizzBuzz – making it a simple matter to fix the problem. The original one still gives me scroll-blindness and is not pulling its diagnostic weight, so I delete it with relief. There are still tests left to write – see the original post for examples – but we’ve made huge strides toward improving this code and its tests.

Deep Synergy

Do you see how the 3 legs collaborate to hold up the stool? We can’t prove our program works without tests, but we can’t write high-quality tests unless we design for testability. We can’t design for testability without understanding basic design concepts and knowing how to refactor to them. All this gives us the tools we need to turn the “first draft” version we hacked together to hit a deadline into code we can live with for the long haul.