The sad truth about testing web applications

December 23, 2009

Pablo Fernandez

HtmlUnit, HTTP, HttpUnit, integration, rant, Selenium, test, testing, web, web applications, Web Driver

There are many ways to test a web application. At the lowest level we have unit tests; at the highest level we have HTTP tests, those that use the HTTP protocol to talk to a running instance of your application (maybe starting it on demand, maybe expecting it to be running on a testing server).

There are several ways to write HTTP tests. Two big families: with and without a web browser. Selenium is a popular way to write tests with a browser. A competing product is Web Driver, which I understand can use a browser or other methods. If you’ve never seen Selenium before, it is pretty impressive. You write a test that says something like:

go to http://…
click here
click there
fill field
fill field
submit form
assert response

and when you run it you actually see a Firefox window pop up and perform that sequence amazingly fast. Well, it’s amazingly fast the first three runs, while you still have two tests or fewer. After that it’s amazingly slow, tedious, flaky and intrusive.

For the other family of tests, without a web browser, aside from Web Driver we have HttpUnit, HtmlUnit and most of the Ruby on Rails testing frameworks. The headless solutions tend to be faster and more solid, but the scenarios are not as realistic (only one JavaScript engine, if you are lucky; no rendering issues such as slowdowns; etc.).

When you are testing, as soon as you touch the HTTP protocol everything becomes much harder and less useful. If you want to be totally confident a web application is working you need to test at the HTTP level, but the return on investment for those tests is very low: they are hard to write and not very useful.

Hard to write

They are hard to write because you are not calling methods with well-defined interfaces (a list of arguments) but essentially calling a single method, the HTTP request, and passing different parameters to get different results. You don’t have any code completion and you don’t have any formal way to know which arguments to pass. Anything can be valid.

In a unit test you may have something like:

add_user("john");

whereas in an HTTP test you’ll have something like:

http.send_request("/user/create", "username=john");

When you are writing a unit test, figuring out the name of the add_user function and its arguments is easy. Some IDEs will autocomplete the name and show you the argument list. And if the name of add_user changes, some refactoring tools will even fix your tests for you.

But “/user/create” and “username=john” are strings. To figure them out you’ll have to know how your application handles routing, and how the parameters are passed and parsed. If your application changes from “/user/create” to “/user/add” the test will just break and, most likely, with a not-very-useful error message. Which takes us to the next issue…
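To make the route problem concrete before moving on, here is a minimal Python sketch. It is not a real HTTP stack: send_request and ROUTES are hypothetical, a toy in-process dispatcher standing in for the client and the server, and the handler name is made up. It assumes the route has already been renamed to “/user/add” while the test still asks for “/user/create”.

```python
from urllib.parse import parse_qs

users = []

def add_user(name):
    """Direct call: the signature tells you what to pass."""
    users.append(name)
    return name

def create_user_handler(params):
    # Form parameters arrive as strings; a missing key only shows up at run time.
    return 200, add_user(params["username"][0])

# Hypothetical routing table. The route used to be "/user/create".
ROUTES = {"/user/add": create_user_handler}

def send_request(path, body):
    """Toy stand-in for an HTTP round trip: look up the path, parse the body."""
    handler = ROUTES.get(path)
    if handler is None:
        return 404, "Not Found"
    return handler(parse_qs(body))

# The stale test still uses the old route; all it learns is "404".
stale_status, stale_body = send_request("/user/create", "username=john")
# Only the updated string reaches the code under test.
fresh_status, fresh_body = send_request("/user/add", "username=john")
```

No editor or refactoring tool connects the string in the test to the key in the routing table, so the rename goes unnoticed until the test runs, and then the only clue is a status code.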

They are not very useful

They are not very useful because their failures are cryptic. When you write a test that calls method blah, which calls method bleh, which calls method blih, and then bloh and bluh, and bluh divides by zero, you get an exception and a stack trace. Something like:

bluh:123: Division by zero! I can't divide by zero (I'm not Haskell)
bloh:234: bluh(...)
blih:452: bloh(...)
bleh:34: blih(...)
blah:94: bleh(...)
blah_test:754: blah(...)

You know that the test blah_test failed on line 754 when calling blah, which called bleh on line 94, which called blih on line 34, which called bloh on line 452, which called bluh on line 234, which divided by zero on line 123. You jump to bluh, line 123, and you may find something like:

a = i / 0;

where you replace the zero with something else; or, more likely:

a = i / j;

where you have to track where j came from. Either it was calculated there or generated from another method and passed as an argument. The stack-trace gives you all the information you need to find where j was generated or where it came from. That’s a very useful test.
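The same chain is easy to reproduce. The following Python sketch reuses the made-up names from the trace above; failing_frames is a hypothetical helper added only to capture the traceback, and putting j = 0 inside blah is an assumption for illustration.

```python
import traceback

def bluh(i, j):
    return i / j              # the division that fails when j is 0

def bloh(i, j):
    return bluh(i, j)

def blih(i, j):
    return bloh(i, j)

def bleh(i, j):
    return blih(i, j)

def blah(i):
    j = 0                     # the bad value originates here
    return bleh(i, j)

def failing_frames():
    """Call blah and return the function names recorded in the traceback."""
    try:
        blah(1)
    except ZeroDivisionError as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []

frames = failing_frames()
print(" -> ".join(frames))
```

Every function between the test and the failing division shows up by name, so tracking j back to where it was set is a matter of reading the list from the bottom up.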

When you have HTTP in the middle, tests become much less useful. The stack trace of a failure would look something like:

http_request:123: Time out, server didn't respond.
blah_test:45: http_request(...)

That means that blah_test failed on line 45 making an HTTP request, which failed with a timeout. Did your application divide by zero and crash? Did it try to calculate pi and it’s still at it? Did it fail to connect to the database? Where did it actually fail? You don’t know. The only thing you know is that something went wrong. Time to open the log files and figure it out.

You open the log file and you find there’s not enough information there. You make the application log much, much more. So much that you’ll fill a terabyte in an hour. You run the test again and this time it just passes, no errors.

When you are at the HTTP level there are many, many things that are flaky and can go wrong. Let’s invent one example here: the web server you were using for the tests wants to DNS-resolve everything it can. Every host name is resolved to an IP, and every IP is reverse-resolved to a name. When you ran the test there was a glitch and your name servers were down. Now they are working correctly and they won’t fail again for another year. Good luck figuring that out from a time-out message.
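Flakiness aside, the basic opacity is easy to reproduce. The Python sketch below uses only the standard library; the Handler class, the /blah path and run_http_test are hypothetical, and it assumes a server that hides exceptions behind a generic 500 response, as many frameworks do. It produces a 500 rather than a time-out, but the effect on the test is the same.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def bluh(i, j):
    return i / j

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            body = str(bluh(1, 0)).encode()
            self.send_response(200)
        except Exception:
            # Assumption: the server hides the details from the client.
            body = b"Internal Server Error"
            self.send_response(500)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def run_http_test():
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Ignore any proxy configured in the environment for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        opener.open(f"http://127.0.0.1:{server.server_port}/blah", timeout=5)
        return "ok"
    except urllib.error.HTTPError as err:
        return f"{err.code} {err.reason}"
    finally:
        server.shutdown()
        server.server_close()

result = run_http_test()
print(result)
```

The division by zero happens in bluh, exactly as in the unit-test case, but nothing about bluh crosses the wire. The test only sees a status line.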

The other way in which HTTP tests fail is something like this:

blah_test:74: Index out of bound for this array

You go to line 74 and it’s something like:

assert_equal("username", data[0]);

If data[0] caused an out-of-bound error, then the array data is empty. How can it be empty? It contains the response from the server and you know the server is responding with something usable because you are using the app right now.

What happened was that the log-in box used to have the HTML id "login" and it is now "log-in". That means the HTML parsing methods in blah_test don’t find the log-in box and fail to properly fill the array data. Yet another case of tests exposing bugs, in the tests. And the real-life failures are much, much more complex than this.
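A minimal Python sketch of that failure, using the standard library’s html.parser. The page markup, the InputNamesInBox class and field_names are all hypothetical, invented to mimic a test that collects the input names found inside the log-in box.

```python
from html.parser import HTMLParser

VOID_TAGS = {"input", "br", "img"}  # elements that never get an end tag

class InputNamesInBox(HTMLParser):
    """Collect the names of <input> elements inside the element with a given id."""

    def __init__(self, box_id):
        super().__init__()
        self.box_id = box_id
        self.depth = 0   # greater than 0 while inside the target element
        self.data = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            if attrs.get("id") == self.box_id:
                self.depth = 1
            return
        if tag == "input" and "name" in attrs:
            self.data.append(attrs["name"])
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

def field_names(html, box_id):
    parser = InputNamesInBox(box_id)
    parser.feed(html)
    return parser.data

# The application now renders the box with id "log-in".
PAGE = '<div id="log-in"><input name="username"><input name="password"></div>'

stale = field_names(PAGE, "login")    # what the old test still looks for
fresh = field_names(PAGE, "log-in")   # what the page actually contains

try:
    stale[0]
    error = None
except IndexError as exc:
    error = str(exc)
```

The page is fine and the parser is fine; only the id the test searches for is out of date, and the symptom is an index error several steps away from the cause.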

My recommendation

All this makes the return on investment of writing HTTP tests quite low. They are very hard to write and they provide very little information when they fail. They do provide good information when they pass: if it works at the HTTP level, probably everything else works too.

I’d recommend any project not to write any HTTP test unless every other possible test, unit and integration, is already written.
