Over the last 10 months, developers at edX have worked hard to increase automated test coverage. Our 2,491 Python unit tests currently cover 87% of the lines in the edx-platform repo. These tests, along with our Selenium acceptance tests and JavaScript unit tests, run on every pull request, allowing us to quickly validate proposed code changes.

How did we increase test coverage from less than 50% to 87% in just 10 months? Part of the answer is a tool called diff-cover.

In a typical workflow, a developer working in a large project might make a pull request that changes a few dozen lines of code. Before the change, the test coverage may have been 72%; afterwards, it could still be 72%. The size of the project makes it difficult to see the effect of a single pull request. Diff-cover lets you focus on the quality metrics of a single pull request instead of the project as a whole.

Diff-cover measures test coverage for the lines of code changed in a git diff. For a proposed change to the code, it will show you which of the changed lines are missing coverage. This is a simple idea, but it has powerful implications:

  • For developers, diff-cover provides a clear and achievable metric: if you touch a line of code, it should be covered.
  • For code reviewers, diff-cover makes it easier to verify that developers are writing tests for all code changes.
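
To make the idea concrete, here is a minimal sketch of the underlying calculation for a single file. It is not diff-cover's actual implementation: the function name and both inputs are hypothetical, and it assumes the changed line numbers (from the git diff) and the covered line numbers (from a coverage report) have already been extracted. A real tool would also skip changed lines that are not executable, such as comments and blank lines, which this sketch ignores.

```python
def diff_coverage(changed_lines, covered_lines):
    """Return (percent_covered, sorted_missing_lines) for one file.

    changed_lines: line numbers touched by the diff (hypothetical input,
        e.g. parsed from git diff hunk headers).
    covered_lines: line numbers executed by the test suite (hypothetical
        input, e.g. parsed from a coverage report).
    """
    changed = set(changed_lines)
    if not changed:
        # Nothing was changed, so nothing is missing coverage.
        return 100.0, []
    # A changed line is "missing" if no test executed it.
    missing = sorted(changed - set(covered_lines))
    percent = 100.0 * (len(changed) - len(missing)) / len(changed)
    return percent, missing


percent, missing = diff_coverage(
    changed_lines=[10, 11, 12, 40],
    covered_lines=[1, 2, 10, 11, 40, 41],
)
print(f"{percent:.0f}% diff coverage; missing lines: {missing}")
# prints "75% diff coverage; missing lines: [12]"
```

The list of missing lines is what makes the metric actionable: it tells the developer and the reviewer exactly which changed lines still need a test.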

By focusing on diff coverage, developers can make small, visible steps toward improving global coverage. A particular commit might increase global coverage by only a fraction of a percent but still have 95% diff coverage. Slowly but surely, as developers wrote tests for their code changes, global coverage increased as well. As a result, we were able to catch certain kinds of bugs sooner.
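
The arithmetic behind this is easy to check. The sketch below uses made-up numbers (a 100,000-line project at 72% coverage and a pull request that adds 40 lines, 38 of them covered) and assumes, for simplicity, that the change only adds lines. The function name and parameters are illustrative only.

```python
def coverage_after_change(total_lines, covered_lines, new_lines, new_covered):
    """Return (global_before, global_after, diff_coverage) as percentages.

    Simplifying assumption: the change only adds lines; it does not
    delete or modify existing ones.
    """
    before = 100.0 * covered_lines / total_lines
    after = 100.0 * (covered_lines + new_covered) / (total_lines + new_lines)
    diff = 100.0 * new_covered / new_lines
    return before, after, diff


# Hypothetical project: 100,000 lines, 72,000 of them covered.
# Hypothetical pull request: 40 new lines, 38 of them covered.
before, after, diff = coverage_after_change(100_000, 72_000, 40, 38)
print(f"global: {before:.3f}% -> {after:.3f}%, diff: {diff:.0f}%")
# prints "global: 72.000% -> 72.009%, diff: 95%"
```

Global coverage moves by less than a hundredth of a percentage point, which is invisible in a project-wide report, while the diff coverage figure shows clearly that the change was well tested.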

More importantly, other developers began contributing to diff-cover itself and taking ownership of the tool. For example, Cale generalized the tool to support additional “quality” checks, and Sarina extended it to report pep8 and Pylint violations in a diff. Many other developers provided feedback and suggestions during an initial beta test of diff-cover. The tool became a starting point for a re-examination of our code review and testing standards, which led to a real change in our testing culture.

Of course, coverage measurement still has some important limitations. In particular, high diff coverage does not guarantee bug-free code: in a tightly coupled system, a change to one component could have far-reaching and unintended consequences on other parts of the system — even if the changed code is 100% covered. In such cases, integration tests can catch bugs that unit tests might miss.

If you think diff-cover could be useful to you, check out the project — it’s open-source and available on GitHub. The code is designed to be extensible to other version control systems and quality checkers, so feel free to add features and make a pull request!

Will Daly is a test engineer at edX. When he’s not advocating test-driven development or optimizing a Jenkins cluster, he enjoys running along rivers and minimizing the number of things in his apartment.