Automated Testing in GNU Wget

This page is for discussing future directions for how automated testing should work in Wget.

1. Issues With Current Automated Testing

Well, the good news is that thanks to Mauro, we now have testing infrastructure in place, and the capability to actually have some semblance of confidence in new changes that we make.

I believe, though, that a lot of work remains to bring the automated testing to where it needs to be. Here are some of the problems, as I see them, with how testing currently works.

1.1. Not enough coverage

IMO there's not nearly enough coverage of the current code to be confident that new code hasn't introduced regressions, thus undermining one of the major benefits of having automated tests.

1.2. Manual inspection required

The current functionality (.px) test infrastructure is geared exclusively toward testing the immediately user-visible behavior of wget, and its after-effects on the file-system.

This completely leaves out testing of how wget interacts with servers: for instance, testing that wget issues a preliminary HEAD before the GET request only in the situations where we want it to, or that wget issues an Authorization header only after it has received a challenge from the server indicating what sort of authentication mechanism should be used.

Currently, the tests for the HEAD functionality assume a human operator is inspecting the log output of the test; this is not acceptable. Support for the Authorization testing was hacked into the test web server implementation by having it return an error code if authentication credentials are received before any challenge has been issued. This situation is not ideal either, as there is no way to distinguish error responses caused by invalid behavior on the part of wget from legitimate, expected error responses during testing.
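One way to remove the human operator would be for the test server to record each request it receives, and for the test to check that record afterwards. The following is only a sketch of that idea, not something the current .px infrastructure provides: `struct exchange`, `auth_only_after_challenge` and `methods_match` are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One request/response pair, as a test server might record it.
   Hypothetical type; not part of Wget or its test suite. */
struct exchange
{
  const char *method;        /* "HEAD", "GET", ... */
  bool sent_authorization;   /* request carried an Authorization header */
  int status;                /* status code the server replied with */
};

/* Return true if no request carried an Authorization header before
   the server had replied with a 401 challenge. */
bool
auth_only_after_challenge (const struct exchange *trace, size_t n)
{
  bool challenged = false;

  assert (trace != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      if (trace[i].sent_authorization && !challenged)
        return false;
      if (trace[i].status == 401)
        challenged = true;
    }
  return true;
}

/* Return true if the recorded request methods match EXPECTED exactly,
   in order; e.g. {"HEAD", "GET"} when a preliminary HEAD is wanted. */
bool
methods_match (const struct exchange *trace, size_t n,
               const char *const *expected, size_t n_expected)
{
  assert (trace != NULL || n == 0);
  if (n != n_expected)
    return false;
  for (size_t i = 0; i < n; i++)
    if (strcmp (trace[i].method, expected[i]) != 0)
      return false;
  return true;
}
```

Because the check runs on the recorded trace rather than on the server's responses, a protocol violation by wget is reported separately from the error responses a test legitimately expects.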

1.3. Unit tests should be separate from the tested source

Leaving the conditionally-compiled unit test code in the same file as the main source being tested is not desirable for the following reasons:

it will lead to overly large files (especially if we start getting the test coverage with the unit tests that we want);
I'm trying to move us away from using #ifdefs all over the place, whereas this format moves us in the direction of adding more of these;
I much prefer to avoid any differences, even very minor ones, between "testing" or "debug" code and "production" code. The more that "test" code differs from "production" code, the less confident we can be that we are testing what we're producing. The risk of introducing behavioral differences simply by adding more functions to a "testing" version may be quite low, but the consequences would be fairly high (it could take quite some time to track down an issue whose cause we've already mentally ruled out: the slight difference between the two object files).

OTOH, as Mauro has pointed out, moving the test code into separate compilation units means we can't test the internal-linkage (static) functions. As I see it, there are a few ways we could deal with this:

we could remove the static specifiers for any functions we desire to test, giving them external linkage;
the static functions could be moved to a separate file, which would be #included by both the main source file and the unit test source file;
rather than linking as separate compilation units, the test source file could #include the main source files.
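The third option can be sketched as follows. This is an illustration under assumptions, not a proposal for specific files: `count_path_components` is a hypothetical static function, and it is written inline here only so that the sketch compiles on its own. In a real layout, the test file would instead #include the main source file that defines it.

```c
#include <assert.h>

/* In a real layout this function would not be copied here; the test
   file would pull it in with something like
       #include "../src/example.c"     (hypothetical path)
   so that the test compiles the same source text as production. */
static int
count_path_components (const char *path)
{
  int n = 0;
  int in_component = 0;

  for (; *path; path++)
    {
      if (*path == '/')
        in_component = 0;
      else if (!in_component)
        {
          in_component = 1;
          n++;
        }
    }
  return n;
}

/* The test lives in the same translation unit, so the static function
   is reachable without changing its linkage. */
void
test_count_path_components (void)
{
  assert (count_path_components ("") == 0);
  assert (count_path_components ("/") == 0);
  assert (count_path_components ("a/b/c") == 3);
  assert (count_path_components ("/a//b/") == 2);
}
```

With this arrangement the production source needs no #ifdefs and keeps its static specifiers. The cost is that the test's object file is built from a different translation unit than the production object file, so the two are not byte-for-byte the same build.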

1.4. Unit tests preferred to high-level functionality tests

I'd prefer writing unit tests for each separate function and/or facility in the source code over writing high-level functionality tests. The main reason is that, given the right support for such tests, it would be possible to test a wide variety of behaviors in Wget's code via simulation, rather than requiring real connections to web servers, etc. That is, a connection to one or more HTTP servers could be simulated by overriding the socket library (or, probably better, by using an object-oriented interface to connections and replacing the standard version with a connection simulator), without having to set up a fake test server. And how would one easily test interactions involving multiple hosts? That is rather difficult to do with functionality tests, unless one can be assured of access to the internet and to a couple of test servers, but much easier to do with unit test simulations.
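As a rough sketch of the "object-oriented interface to connections" idea, the code below reads through a struct of function pointers, and the test substitutes a simulator that replays a canned response. Every name here (`struct connection`, `struct canned`, `connection_init_canned`, `read_status_code`) is hypothetical and invented for illustration; none of it is an existing Wget interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical connection interface.  A production implementation
   would wrap a socket; a test implementation can wrap anything. */
struct connection
{
  /* Read up to LEN bytes into BUF; return the number of bytes read,
     or 0 when no data is left. */
  size_t (*read) (struct connection *self, char *buf, size_t len);
  void *state;
};

/* State for a simulated connection that replays a canned response. */
struct canned
{
  const char *data;
  size_t len;
  size_t pos;
};

static size_t
canned_read (struct connection *self, char *buf, size_t len)
{
  struct canned *c = self->state;
  size_t left = c->len - c->pos;

  if (len > left)
    len = left;
  memcpy (buf, c->data + c->pos, len);
  c->pos += len;
  return len;
}

/* Set up CONN as a simulated connection that will return RESPONSE. */
void
connection_init_canned (struct connection *conn, struct canned *c,
                        const char *response)
{
  c->data = response;
  c->len = strlen (response);
  c->pos = 0;
  conn->read = canned_read;
  conn->state = c;
}

/* Example code under test: return the status code from a response
   beginning "HTTP/1.x NNN", or -1 if it is malformed.  It touches the
   connection only through the interface, so it cannot tell a
   simulator from a real socket. */
int
read_status_code (struct connection *conn)
{
  char line[64];
  size_t n;

  assert (conn != NULL && conn->read != NULL);
  n = conn->read (conn, line, sizeof line - 1);
  line[n] = '\0';

  if (n < 12 || strncmp (line, "HTTP/1.", 7) != 0 || line[8] != ' ')
    return -1;
  for (int i = 9; i <= 11; i++)
    if (line[i] < '0' || line[i] > '9')
      return -1;
  return (line[9] - '0') * 100 + (line[10] - '0') * 10 + (line[11] - '0');
}
```

Simulating multiple hosts then amounts to initializing one `struct connection` per host, each with its own canned response, with no network access or test servers involved.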