Data driven tests

I’m not sure whether anybody else uses the term “data driven test”, but if you explain what it is, experienced people will tell you that such tests are bad. A data driven test runs the same test code over many different pieces of data.

Let’s look at an example. For my startup project Keep on Posting, I have a method that turns a blog URL into a feed URL. That method is critical for my application and there are many things that can go wrong, so I test it by querying a sample of real blogs. The test would be something like this (this is in Ruby):

class BlogToolsTest
  BLOGS_AND_FEEDS = {
      "http://blog.sandrafernandez.eu" => "http://blog.sandrafernandez.eu/feed/",
      "http://www.lejanooriente.com" => "http://www.lejanooriente.com/feed/",
      "http://pupeno.com" => "http://pupeno.com/feed/",
      "http://www.avc.com/a_vc" => "http://feeds.feedburner.com/avc",
  }

  def test_blog_to_feed_url
    BLOGS_AND_FEEDS.each_pair do |blog_url, feed_url|
      assert_true feed_url == BlogTools.blog_to_feed(blog_url)
    end
  end
end

Note: I’m using assert_true instead of assert_equal to make a point; these kinds of tests tend to use assert_true.

The problem with that is that eventually it’ll fail and it’ll say something like:

false is not true

Oh! so useful. Let’s see at least where the error is happening… and obviously it’ll point to this line:

      assert_true feed_url == BlogTools.blog_to_feed(blog_url)

which is almost as useless as the failure message. That’s the problem with data driven tests. You might be tempted to do this and re-run the tests:

  def test_blog_to_feed_url
    BLOGS_AND_FEEDS.each_pair do |blog_url, feed_url|
      puts blog_url
      puts feed_url
      assert_true feed_url == BlogTools.blog_to_feed(blog_url)
    end
  end

but if your tests take hours to run, like the ones I often found while working at Google, then you are wasting time. Writing good error messages ahead of time helps:

  def test_blog_to_feed_url
    BLOGS_AND_FEEDS.each_pair do |blog_url, feed_url|
      assert_true feed_url == BlogTools.blog_to_feed(blog_url), "#{blog_url} should have returned the feed #{feed_url}"
    end
  end

but if half your cases fail, the whole suite takes an hour to run, and you have 1000 data sets, you’ll spend hours staring at your monitor, fixing one case every now and then, because as soon as one case fails the loop stops and the remaining cases never run. If you are coding in a language like Java, that’s as far as you can take it.
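If you have to stay within a single test method, one partial mitigation (not from the original approach above) is to collect every mismatch and assert once at the end, so a single run reports all the failing blogs. Here is a minimal sketch. It assumes the test-unit gem, and the BlogTools module is a hypothetical offline stand-in for the real one, which would query live blogs. The mismatches helper is also made up for illustration.

```ruby
require "test/unit"

# Hypothetical stand-in for the real BlogTools, which would query live blogs.
module BlogTools
  FEEDBURNER = { "http://www.avc.com/a_vc" => "http://feeds.feedburner.com/avc" }.freeze

  def self.blog_to_feed(blog_url)
    FEEDBURNER.fetch(blog_url) { "#{blog_url}/feed/" }
  end
end

class BlogToolsTest < Test::Unit::TestCase
  BLOGS_AND_FEEDS = {
    "http://pupeno.com" => "http://pupeno.com/feed/",
    "http://www.avc.com/a_vc" => "http://feeds.feedburner.com/avc",
  }.freeze

  # Hypothetical helper: returns one message per pair whose actual feed
  # differs from the expected one, without stopping at the first mismatch.
  def self.mismatches(blogs_and_feeds)
    blogs_and_feeds.each_with_object([]) do |(blog_url, feed_url), messages|
      actual = BlogTools.blog_to_feed(blog_url)
      next if actual == feed_url
      messages << "#{blog_url} should have returned #{feed_url} but returned #{actual}"
    end
  end

  def test_blog_to_feed_url
    messages = self.class.mismatches(BLOGS_AND_FEEDS)
    # Assert once, after every pair has been checked.
    assert messages.empty?, messages.join("\n")
  end
end
```

This still counts as one test that either passes or fails as a whole, so it only softens the problem: you see every bad blog in one run, but you cannot re-run or track a single case on its own.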

With Ruby you can push the boundaries and write it this way (thanks to executable class bodies):

class BlogToolsTest
  BLOGS_AND_FEEDS = {
      "http://blog.sandrafernandez.eu" => "http://blog.sandrafernandez.eu/feed/",
      "http://www.lejanooriente.com" => "http://www.lejanooriente.com/feed/",
      "http://pupeno.com" => "http://pupeno.com/feed/",
      "http://www.avc.com/a_vc" => "http://feeds.feedburner.com/avc",
  }

  BLOGS_AND_FEEDS.each_pair do |blog_url, feed_url|
    define_method "test_#{blog_url}_#{feed_url}" do
      assert_true feed_url == BlogTools.blog_to_feed(blog_url), "#{blog_url} should have returned the feed #{feed_url}"
    end
  end
end

That will generate one method per item of data. Even if one fails, the rest will still be executed, as they are separate, isolated tests. They may also be executed in a random order, so you don’t end up with tests depending on other tests, and even if you don’t get a nice error message, the method name tells you which piece of data is the problematic one.

Note: blog_url and feed_url contain characters that are not valid in regular method names. define_method will accept them anyway, but the resulting names are awkward to read, select or call, so those characters should be replaced. I left that out to keep the example concise.
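The replacement mentioned in the note could look something like the sketch below. It assumes the test-unit gem. BlogTools is again a hypothetical offline stand-in for the real module, and test_name_for is a made-up helper that squashes every run of non-word characters into a single underscore.

```ruby
require "test/unit"

# Hypothetical stand-in for the real BlogTools, which would query live blogs.
module BlogTools
  def self.blog_to_feed(blog_url)
    return "http://feeds.feedburner.com/avc" if blog_url == "http://www.avc.com/a_vc"
    "#{blog_url}/feed/"
  end
end

class BlogToolsTest < Test::Unit::TestCase
  BLOGS_AND_FEEDS = {
    "http://blog.sandrafernandez.eu" => "http://blog.sandrafernandez.eu/feed/",
    "http://www.lejanooriente.com" => "http://www.lejanooriente.com/feed/",
    "http://pupeno.com" => "http://pupeno.com/feed/",
    "http://www.avc.com/a_vc" => "http://feeds.feedburner.com/avc",
  }.freeze

  # Hypothetical helper: builds a plain identifier from a URL by replacing
  # each run of non-word characters ("://", ".", "/") with one underscore.
  def self.test_name_for(blog_url)
    "test_" + blog_url.gsub(/\W+/, "_")
  end

  # The class body is executable, so this loop defines one test per pair.
  BLOGS_AND_FEEDS.each_pair do |blog_url, feed_url|
    define_method(test_name_for(blog_url)) do
      assert_equal feed_url, BlogTools.blog_to_feed(blog_url),
                   "#{blog_url} should have returned the feed #{feed_url}"
    end
  end
end
```

Only the blog URL goes into the name here, since it already identifies the case. Two URLs that differ only in punctuation would collapse to the same name and silently overwrite each other, so a larger data set might need an index or a check for duplicates.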

Since I’m using shoulda, my resulting code looks like this:

class BlogToolsTest
  BLOGS_AND_FEEDS = {
      "http://blog.sandrafernandez.eu" => "http://blog.sandrafernandez.eu/feed/",
      "http://www.lejanooriente.com" => "http://www.lejanooriente.com/feed/",
      "http://pupeno.com" => "http://pupeno.com/feed/",
      "http://www.avc.com/a_vc" => "http://feeds.feedburner.com/avc",
  }

  BLOGS_AND_FEEDS.each_pair do |blog_url, feed_url|
    should "turn blog #{blog_url} into feed #{feed_url}" do
      assert_equal feed_url, BlogTools.blog_to_feed(blog_url), "#{blog_url} did not resolve to the feed #{feed_url}"
    end
  end
end

and running them in RubyMine looks like this:


3 responses to “Data driven tests”

  1. Rory

    “If you are coding in a language like Java, that’s as far as you can take it.”

    That’s not true. You can programmatically generate test cases in JUnit3, and I understand it’s made even easier in JUnit4. It’s similarly easy in TestNG.

    1. J. Pablo Fernández

      Oh, I’d like to see some sample code for that…. or is it that thing where you have an XML file associated with a test as test source? That would work although I find it a little bit too contrived.

      1. Rory

        In JUnit 3 a test case is just an instance of a subclass of TestCase. There are two ways to create them.

        You can define methods named like “testSomething” and have the JUnit infrastructure generate objects automatically, one for each test method. That’s the usual way to create tests that you’re familiar with, but there’s no easy way to loop through some data set and run the same test for each test datum.

        The other way is to define a TestSuite that you populate with TestCase instances yourself. You override the ‘void runTest()’ method with your own test logic. So you can define a subclass of TestCase with constructor parameters for, e.g., blog URL and feed URL, and have runTest contain the test logic. Then you construct a suite containing one instance of that test class for each set of test data.

        I hope that makes sense. I didn’t find an example of that kind of code online, and I’m wary of trying to type it from memory into a comment text box. ;)
