Intention-Revealing Methods Can Save Your App

May 2014 - Grand Rapids

Read on if: your methods are huge, buggy, hard to read/fix/change or painful to test.

We've all written code like this:

def process_data
  raw_data = load_data()
  clean_data = raw_data.map do |row|
    %w{ name address phone email }.each do |field|
      if (row[field].nil? || row[field] == "")
        row[field] = "Unknown"
      end
    end
    row
  end
  persist(clean_data)
end

We've got some data, we know we need to clean it up before we can safely persist it, so we just clean it at the point we need it cleaned. It's the simplest thing to do.

The problem is that in order to realize that we're cleaning (the variable name does give a hint), or to understand what "cleaning" means (when we want to change it or fix a bug) we have to analyze the code to discover that "this is the bit that guarantees all required fields have a reasonable value in them."

Also, cleaning in-place obscures the overall behavior of process_data, which is: load, clean, persist.

What if we did this instead?

def process_data
  raw_data = load_data()
  clean_data = raw_data.map { |row| fill_in_required_fields(row) }
  persist(clean_data)
end

def fill_in_required_fields row
  %w{ name address phone email }.each do |field|
    if (row[field].nil? || row[field] == "")
      row[field] = "Unknown"
    end
  end
  row
end

Wrapping the ugly bit in a method whose name explains its purpose makes the code self-documenting and cleans-up the calling method. We call this pattern "Intention-Revealing Method".

Let's not kid ourselves: all the complexity of filling-in fields is still there, but it's been moved to a method that only fills-in fields for a single row, which is easier to think about because we can ignore everything else that process_data is doing.

It gets even better when there are multiple steps to get a "clean" row.

def process_data
  raw_data = load_data()
  clean_data = raw_data.
    select{ |row| not_in_system(row) }.
    select{ |row| is_active(row) }.
    map{ |row| fill_in_required_fields(row) }.
    map{ |row| only_relevant_fields(row) }
  persist(clean_data)
end

# ...all support methods

Imagine how large and difficult-to-read the last example would be if all the logic was implemented at the point of use! Now we can see the "big picture" view of the process_data behavior and if we're interested in any of the specific aspects, we can jump to the method that implements it.

We've also created new testing opportunities: we can pass a sample row object into any of those methods and ask "does this particular bit work?" without worrying about any of the other bits messing up our results.

Let's take the final step.

def process_data
  raw_data = load_data()
  clean_data = DataCleaner.clean(raw_data)
  persist(clean_data)
end

module DataCleaner
  def self.clean data
    data.
      select{ |row| not_in_system(row) }.
      select{ |row| is_active(row) }.
      map{ |row| fill_in_required_fields(row) }.
      map{ |row| only_relevant_fields(row) }
  end

  # ...all the supporting methods
end

Now process_data is distilled down to its essence (load, clean, persist) and the details of what it means to clean data is neatly gathered in a dedicated module called DataCleaner which can be extended and tested independently of process_data.

DataCleaner knows nothing about getting or persisting data and the only thing process_data knows about "cleaning" is that DataCleaner.clean knows how to do it. DataCleaner can be tested with any data we can fake up and doesn't require it to be loaded or persisted, greatly-simplifying our setup and results-checking.

Choosing good names

Of course, it's still possible to ruin all our good work by choosing names that are brittle, misleading or not helpful.

Good names describe what we need done, not how to do it. Keeping this in mind leads us to names that are "future-proofed": they won't have to change if the underlying code changes.

Good name: not_in_system
Bad name: cant_find_id_in_mysql

The bad name breaks if we change how we determine a row is missing, or if we switch from mysql to postgres or redis.

Good name: fill_in_required_fields
Bad name: fill_in_name_and_address

The bad name breaks if we add/remove required fields.

Takeaways

When you realize you need a bit of behavior at a certain spot, write something to do it for you (a method or even a class/module) with a name explaining the behavior, then call that thing from the spot you need it.
When choosing a name, explain the behavior in terms of "what" and not "how" in order to future-proof your code.