Thursday, September 9, 2010

Categorizing automated tests — Make it so!

Image © iStockphoto.com / Evgeny Rannev

I love Continuous Integration. Just as I can't imagine writing software without continually building the code, I can't imagine committing code without a full test suite automatically running. However, the automated test suite sometimes gets stuck on things we think it shouldn't get stuck on.

We run a Hudson instance as our continuous integration server. Hudson monitors Subversion for check-ins and starts a new build/test cycle within five minutes of a change. At present, a successful full build/test takes just over an hour with most of the time spent in tests.

Ideally we don't want any tests to fail, so any test failure should mean a necessary code change to fix the test. But in practice there are some tests that are okay to fail temporarily. The vast majority of the "okay to fail" tests are ones using external sandbox systems. External sandbox systems go down. As long as they go down only occasionally and temporarily, and not when we are actively developing code against them, it really doesn't bother us that much. Except that the corresponding tests that try to communicate with them fail.

The problem is that, in the Maven process, failing tests in one module prevent tests from running in dependent modules. In our project hierarchy these external systems are used in a project on which many other projects depend. We would like to continue testing those dependent projects, especially since most of their tests do not depend on the external systems.

There may be a way to restructure Maven projects to reduce the test dependencies, but I think that would lead either to the external tests moving to a different project from the classes they test or to an explosion of projects causing a source control nightmare. I don't like either of these scenarios.

I'm thinking more along the lines of test levels I learned at Pacific Bell many years ago, where the levels were arranged according to how much of the system a test covered. Applying these test levels to our current application (a sketch of how they might look in code follows the list):

  1. Unit: the individual method or class using test data in development environment.
  2. One-up/one-down: partner methods and classes using test data in development environment.
  3. End-to-end: full application (backend and UI) using test data in development environment.
  4. Test environment: full application (backend and UI) using test data in test environment.
  5. Production environment: full application (backend and UI) using production data in production environment.
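
If we go this route in our Java code, JUnit 4.8's Categories feature looks like a natural fit: each level becomes nothing more than a marker interface. A minimal sketch, with package and interface names that are purely my own invention (each interface would live in its own source file):

    package com.example.tests.levels; // hypothetical package name

    /** Level 1: individual method or class, test data, development environment. */
    public interface Level1 { }

    /** Level 2: one-up/one-down partners, test data, development environment. */
    public interface Level2 { }

    /** Level 3: end-to-end (backend and UI), test data, development environment. */
    public interface Level3 { }

    /** Level 4: full application (backend and UI), test data, test environment. */
    public interface Level4 { }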

I believe our current automated tests apply to the first four levels. If we categorize our existing tests, each will fit reasonably well into one of these levels. For example, a test that verifies a class in isolation is categorized as Level 1. At the other end of the spectrum, a test that verifies connectivity with a sandbox environment is categorized as Level 4: even if it only exercises a one-up/one-down slice of our code, the external system is better classified as part of the "test environment" than the "development environment", and it isn't available in the lower three levels. A similar test that used a mock in place of the external system could be categorized as Level 2, because the mock removes the dependency on the external system.
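
Tagging our existing tests would then mostly be one annotation per test class. The classes below are hypothetical stand-ins (we don't actually have an OrderParser or a PaymentSandboxClient; they're just here to show the shape), and in real code each public class would be in its own file:

    import org.junit.Test;
    import org.junit.experimental.categories.Category;
    import static org.junit.Assert.assertTrue;

    // Level 1: verifies a single class in isolation using local test data.
    @Category(Level1.class)
    public class OrderParserTest {
        @Test
        public void parsesAnEmptyOrder() {
            assertTrue(new OrderParser().parse("").isEmpty());
        }
    }

    // Level 4: talks to an external sandbox, so it may fail when the
    // sandbox is down even though our own code is fine.
    @Category(Level4.class)
    public class PaymentSandboxTest {
        @Test
        public void sandboxAcceptsAuthorization() throws Exception {
            PaymentSandboxClient client = new PaymentSandboxClient(TestConfig.sandboxUrl());
            assertTrue(client.authorize("4111-1111-1111-1111", 100).isApproved());
        }
    }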

I imagine configuring Maven to run four passes of tests across the projects, with each successive pass adding a new level of test. During the first pass only unit tests are run. During the second pass only one-up/one-down tests are run. The third and fourth passes run end-to-end and test environment tests respectively. At any point a failure prevents the later tests from running. However, the key is that a higher-level test failure will not prevent a lower-level test from running, no matter how the project dependencies are configured.
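
I haven't worked out the Maven side yet, but one rough way to get a pass per level is a small JUnit suite per level using the Categories runner, with the build invoking the suites in order. A sketch, reusing the hypothetical classes above:

    import org.junit.experimental.categories.Categories;
    import org.junit.experimental.categories.Categories.IncludeCategory;
    import org.junit.runner.RunWith;
    import org.junit.runners.Suite.SuiteClasses;

    // Pass 1: run only tests categorized as Level1. Classes whose tests
    // carry only higher-level categories are filtered out, so a sandbox
    // outage (Level 4) can never block this pass.
    @RunWith(Categories.class)
    @IncludeCategory(Level1.class)
    @SuiteClasses({ OrderParserTest.class, PaymentSandboxTest.class })
    public class Level1Pass {
    }

Level2Pass, Level3Pass, and Level4Pass suites would look the same with a different @IncludeCategory, and Hudson could run them as four successive build steps so that a failure in one pass stops the later passes but never an earlier one.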

This would give us more information about where problems lie before we dig into the console output. For example, if a release build fails because of tests in level 2, we know something is very wrong and the release must be stopped. On the other hand, a release build failing in level 4 tests might be okay because the lower three levels passed in all projects: maybe the problem is limited to a specific external sandbox system. We could decide to accept the risk and build the release packages.

And hey, we can start saying things like, "Please run a level 3 diagnostic."

Follow me on Twitter @jsjacobAtWork.
