Unit testing is undoubtedly a key part of any programming experience. It is a well-known best practice to have high unit-test coverage for production-level code. It is often said that:
Code that is not covered by tests is code that we accept may break in production.
Since we are never happy with any code breaking in production, we should aim for 100% coverage, even though this is sometimes impractical.
But is it really impractical to aim for 100% coverage? And what do we really mean by the term "coverage"?
Coverage
In this article we will introduce a new definition of coverage: "everything that can happen, will happen and should be tested."
Let's explain this in detail: what it means, how we can measure it, and how to write our code so that it can be measured better. We will be using Java for all code examples throughout this article.
Measuring coverage
There are several ways to measure coverage. We will briefly mention the two approaches offered by the vast majority of tools: instruction usage and mutation coverage.
Instruction usage: Most coverage tools rely on this approach, measuring the number of implementation statements reached while running the tests versus the total number of statements. That is not sufficient, because a single statement can contain a branch. For example, a test that causes the implementation to execute expression ? ifTrue() : ifFalse() does not mean both outcomes of the statement are tested. For that reason, many tools also include branches in the measurement and trace the executed branches versus all possible branches.
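As a minimal sketch of that ternary case (the class and method names here are illustrative, not from any real codebase), a single test exercising one input executes the whole statement, so a statement-level tool reports it as covered even though one branch was never taken:

```java
public class TernaryBranch {
    // One statement, but it hides two branches.
    static String classify(int n) {
        return (n >= 0) ? "non-negative" : "negative";
    }

    public static void main(String[] args) {
        // A test suite calling only classify(5) executes this statement,
        // so instruction-based coverage marks it as covered,
        // yet the "negative" branch was never exercised.
        System.out.println(classify(5));
    }
}
```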
The number of possible execution paths resulting from all branches can be huge, as it grows exponentially with every branch found in the code. If we have a switch statement with 3 cases followed by an if/else statement, that is 5 branches but 6 possible execution paths. Adding another if statement doubles the number of possible paths to 12, even though we added only one branch, and so on. Most coverage tools therefore count branches rather than paths, to keep the number of cases manageable. It is also debatable whether covering all possible execution paths is really useful, and whether doing so is the responsibility of the tool rather than the developer.
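The combinatorics above can be seen in a small hypothetical method (the names and values are invented for illustration): 3 switch cases followed by an if/else give 3 × 2 = 6 distinct paths, and one more if/else after that would double the count to 12.

```java
public class PathExplosion {
    // 3 switch cases followed by an if/else: 3 * 2 = 6 execution paths,
    // even though there are only 5 branches.
    static int describe(int kind, boolean flag) {
        int result;
        switch (kind) {
            case 0:  result = 10; break;
            case 1:  result = 20; break;
            default: result = 30; break;
        }
        if (flag) {
            result += 1;
        } else {
            result -= 1;
        }
        // A further if/else here would double the path count again, to 12.
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describe(0, true)); // exercises one of the six paths
    }
}
```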
But even if we found a tool that calculated all possible execution paths, what about the different values that can affect execution? Consider if (a && b || c) return true; If we tested the case where a and b are both true but never the case where only c is true, almost all coverage tools that rely on instruction usage will not report any missing coverage for that statement.
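A small sketch of that gap (the wrapper class and test inputs are assumptions for illustration): the two tests below execute the statement and both of its outcomes, so instruction- and branch-based tools report it fully covered, yet the path where only c is true is never exercised.

```java
public class ShortCircuit {
    // Instruction-based coverage treats this whole condition as one unit.
    static boolean check(boolean a, boolean b, boolean c) {
        if (a && b || c) return true;
        return false;
    }

    public static void main(String[] args) {
        // These two calls cover both the true and false outcomes,
        // so the statement is reported as fully covered...
        System.out.println(check(true, true, false));   // true, via a && b
        System.out.println(check(false, false, false)); // false
        // ...yet check(false, false, true), where only c drives the
        // result, remains untested.
    }
}
```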
Mutation coverage: Not all coverage tools offer this, due to its complexity. In this approach the tool makes some random changes to the implementation and runs the tests against the changed implementation. If the tests still pass, either certain code is not tested, or that code makes no difference to the outcome. In the previous example, if (a && b || c) return true;, if the mutation removed the (|| c) part and the tests still pass, it means we were missing a test case.
Another example is if (a && b || true) return true; If the mutation removed the a && b term from the boolean expression, the tests will still pass (assuming they passed before), simply because the a && b term has no effect.
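The first mutation above can be sketched by placing the original and the mutated version side by side (the class and method names are hypothetical; a real mutation tool rewrites bytecode or source rather than keeping both versions):

```java
public class MutationDemo {
    // Original implementation.
    static boolean original(boolean a, boolean b, boolean c) {
        return a && b || c;
    }

    // A mutant a mutation tool might generate: the "|| c" part removed.
    static boolean mutant(boolean a, boolean b, boolean c) {
        return a && b;
    }

    public static void main(String[] args) {
        // A test suite that only checks these inputs passes against both
        // versions, so the mutant "survives" and reveals a missing test.
        System.out.println(original(true, true, false) == mutant(true, true, false));
        System.out.println(original(false, false, false) == mutant(false, false, false));
        // The killing test would be (false, false, true):
        // original returns true, while the mutant returns false.
    }
}
```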
But how does a mutation coverage tool know what to change? The answer is simple: it doesn't. Or at least, it cannot guarantee to make the change that would fail the tests, if such a change exists. A good mutation coverage tool relies on statistical patterns about what developers are most likely to forget, learned from experience. But it needs to try less likely mutations as well, so it applies all the mutations it knows, with different probabilities.
I think you see it now. Although mutation coverage can spot things that other tools cannot, it is not reliable. Every run tries different mutations, so a run might report that it could not detect any gaps even though subtle gaps remain in your tests or code.
From the discussion above we can conclude that code coverage tools can give you a heads-up about potential testing gaps in your code, but cannot guarantee that there are none. Pretty obvious!