# High Dimensionality Algorithm Tester

Innolitics Open Source

## The Problem

At Innolitics, we help our clients develop custom machine learning algorithms, as well as “traditional” image and signal processing algorithms.

Testing image processing algorithms is difficult because

• Their outputs are large (i.e., have high dimensionality) and difficult to compare.
• Small changes to the algorithm usually alter all outputs.
• They are computationally intensive.
• The input datasets are large, and take a lot of space to store.
• There are typically too few test input datasets to cover the space of possible inputs.

## The Solution

We developed a testing methodology and an associated Python framework that helps us address the first two difficulties in this list. This framework is called our “High Dimensionality Algorithm Tester”, or HDAT.

Without using a tool like HDAT, developers must choose between a few sub-optimal options when testing high-dimensionality algorithms:

1. Check the algorithm’s outputs manually each time a change is made.
2. Compare the algorithm’s outputs to expected outputs automatically.
3. Reduce the algorithm’s outputs to lower-dimensional numbers, and compare these to expected results automatically.

Each of these approaches has advantages and disadvantages. Approach (1) requires time and effort, and usually results in developers skipping the tests from time to time. Thus, unless rigorously enforced, approach (1) will have low sensitivity. On the other hand, manually verifying the algorithm’s outputs is usually the gold-standard for verification, and thus has high specificity.

Approach (2) has very high sensitivity, but low specificity. Most changes to the algorithm will cause these tests to fail, and the expected outputs will need to be adjusted. Note that this this is not a simple matter of accounting for rounding errors, or even performing weak comparisons between the input and output.

Approach (3) can be tailored to the particular algorithm in question, and can trade specificity for sensitivity. For example, if an algorithm’s output is a 512x512 image, we could reduce the output to its sum. The test could then check to see whether the sum has changed from its expected value by more than 10%. Clearly, depending on the changes to the algorithm, such an approach could result in false positives or false negatives.

HDAT combines approaches (1) and (3) in a clever way, such that we can nearly achieve the specificity of manual tests, with the sensitivity of automated tests.