Software Testing for Non-Testable Programs
About This Project
Conventional software testing checks whether the output is correct for each test input. For some software, however, there is no way to know what the correct output should be for some inputs, yet it is still important to detect coding errors in that software so they can be fixed. This dilemma arises frequently in machine learning, simulation, and optimization applications, often “Programs which were written in order to determine the answer in the first place. There would be no need to write such programs, if the correct answer were known.” [Weyuker, E.J. On Testing Non-Testable Programs. The Computer Journal, 25(4):465–470, November 1982.] Because these kinds of applications are frequently used in public infrastructure and biomedical research (the domains targeted in this research), it is critical to detect and fix errors before a calamity occurs.
In this project, we seek to develop testing techniques that successfully find defects in software without a test oracle (known as “non-testable programs,” although that phrase is not intended to be literal). The conventional way to test software is to devise pairs of input data and expected output results, run the program on the prepared inputs, and compare the actual results to those expected; alternatively, preparing expected outputs can be omitted if there is some easy way to check whether the actual results are correct. In either case, the mechanism for doing this is called a test oracle. However, there are large and important classes of software applications, in domains such as machine learning, simulation, and optimization, for which there simply is no test oracle: in the general case it is impossible, or at least impractical (e.g., it would take years at prohibitive cost, and/or would require unacceptable experimentation on human subjects), to determine whether the actual output is indeed correct for a given input. Yet we would still like to be able to devise test cases that enable us to find coding errors. We have investigated several approaches, of which the most successful (at finding both previously unknown bugs and seeded bugs in real-world applications) has been “metamorphic testing.” Fortunately, many such applications exhibit “metamorphic properties” that define a relationship between pairs of inputs and outputs: for any previous input i with its already-known output o, one can easily derive a follow-up test input i′ and predict its expected output o′. If the actual output for i′ differs from o′, then there must be an error in the code.
Metamorphic testing was introduced by Chen et al. as a general technique for creating follow-up test cases based on existing ones, particularly those that have not revealed any failure, in order to uncover flaws that remain undetected. Rather than an approach for test case selection, it is a methodology for reusing input test data to create additional test cases whose outputs can be predicted. In metamorphic testing, if input x produces an output f(x), the function’s (or application’s) metamorphic properties can be used to guide the creation of a transformation function t, which is applied to the input to produce t(x); this transformation then allows us to predict the output f(t(x)), based on the already-known value of f(x). If the actual output differs from the prediction, then a defect must exist. Of course, for software with no test oracle, this can only show the presence of defects and cannot demonstrate their absence, since the correct output cannot be known in advance (and even if the outputs are as expected, both could be incorrect). But metamorphic testing provides a powerful technique for revealing defects in such non-testable programs by means of a built-in pseudo-oracle.
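As a concrete illustration (our own hypothetical example, not one drawn from the papers), consider a standard-deviation routine with the multiplicative property stdev(k·x) = |k|·stdev(x). Any existing test input yields a follow-up test whose output can be predicted, even though the “correct” standard deviation of arbitrary data is not otherwise known:

    import java.util.Arrays;

    public class MetamorphicExample {
        // Function under test: sample standard deviation (population form).
        static double stdev(double[] a) {
            double mean = Arrays.stream(a).average().orElse(0);
            double ss = Arrays.stream(a).map(v -> (v - mean) * (v - mean)).sum();
            return Math.sqrt(ss / a.length);
        }

        public static void main(String[] args) {
            double[] x = {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0};
            double k = 3.0;
            // Transformation t: scale every element by k.
            double[] tx = Arrays.stream(x).map(v -> k * v).toArray();
            double expected = Math.abs(k) * stdev(x); // predicted f(t(x))
            double actual = stdev(tx);                // actual f(t(x))
            // Disagreement (beyond floating-point tolerance) implies a defect.
            if (Math.abs(actual - expected) > 1e-9) {
                System.out.println("Metamorphic property violated: defect detected");
            } else {
                System.out.println("Property holds for this test case");
            }
        }
    }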
Our initial work evaluated the feasibility of applying metamorphic testing to machine learning applications, resulting in a paper published at SEKE 2008 in which we categorized six types of metamorphic properties that such applications may exhibit: additive, multiplicative, permutative, invertive, inclusive, and exclusive.
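For example (a hypothetical sketch of ours, not an example from the paper), a toy nearest-neighbor classifier should satisfy both a permutative property (reordering the training data) and an additive property (shifting all feature values by a constant):

    public class PropertyCategories {
        // Toy 1-nearest-neighbor classifier standing in for a real ML application.
        static int predict(double[][] train, int[] labels, double[] query) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int i = 0; i < train.length; i++) {
                double d = 0;
                for (int j = 0; j < query.length; j++) {
                    double diff = train[i][j] - query[j];
                    d += diff * diff;
                }
                if (d < bestDist) { bestDist = d; best = i; }
            }
            return labels[best];
        }

        public static void main(String[] args) {
            double[][] train = {{1, 1}, {5, 5}, {9, 9}};
            int[] labels = {0, 1, 2};
            double[] query = {4, 4};
            int original = predict(train, labels, query);

            // Permutative: reordering the training data must not change the prediction.
            double[][] permTrain = {train[2], train[0], train[1]};
            int[] permLabels = {labels[2], labels[0], labels[1]};
            if (predict(permTrain, permLabels, query) != original)
                System.out.println("Permutative property violated: defect detected");

            // Additive: shifting every feature (including the query) by a constant
            // preserves all pairwise distances, so the prediction must not change.
            double c = 10.0;
            double[][] shifted = new double[train.length][];
            for (int i = 0; i < train.length; i++)
                shifted[i] = new double[]{train[i][0] + c, train[i][1] + c};
            double[] shiftedQuery = {query[0] + c, query[1] + c};
            if (predict(shifted, labels, shiftedQuery) != original)
                System.out.println("Additive property violated: defect detected");
        }
    }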
We then created a tool called Corduroy to automate the process, allowing developers to specify individual functions’ metamorphic properties in the specification language JML; these properties can then be checked with JML runtime assertion checking. This yielded a paper presented at ICST 2009. We further developed a technique called Metamorphic Runtime Checking, and an implementation framework called Columbus, described in a tech report.
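The sketch below suggests the flavor of such specifications using only standard JML features; Corduroy’s actual notation and generated checks may differ, and all names here are hypothetical. A metamorphic property is encoded as the postcondition of a checker method that the JML runtime assertion checker can enforce during test execution:

    public class Stats {
        //@ requires a != null && a.length > 0;
        public static /*@ pure @*/ double mean(double[] a) {
            double s = 0;
            for (int i = 0; i < a.length; i++) s += a[i];
            return s / a.length;
        }

        // The multiplicative property "scaling every element by k scales the
        // mean by k" is encoded as a postcondition, so runtime assertion
        // checking flags any execution that violates it.
        //@ requires a != null && a.length > 0;
        //@ ensures \result;
        public static boolean meanScalingHolds(double[] a, double k) {
            double[] scaled = new double[a.length];
            for (int i = 0; i < a.length; i++) scaled[i] = a[i] * k;
            return Math.abs(mean(scaled) - k * mean(a)) < 1e-9;
        }
    }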
For system-level testing, we developed a tool called Amsterdam and an approach called Automated Metamorphic System Testing. These check the application’s metamorphic properties at runtime, using real inputs from actual executions. The checks are conducted in parallel (for performance reasons), and any side effects are hidden from the user. Amsterdam was presented at ISSTA 2009 and is described in more detail in a tech report.
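A minimal sketch of the system-level idea (not Amsterdam’s actual API; the names here are hypothetical): the real output is returned to the user immediately, while a follow-up execution on transformed input runs in parallel and the property is checked in the background:

    import java.util.concurrent.*;
    import java.util.function.Function;

    public class SystemLevelCheck {
        static <I, O> O runWithCheck(Function<I, O> app, I input,
                                     Function<I, I> transform,
                                     Function<O, O> expectedChange,
                                     ExecutorService pool) {
            O output = app.apply(input);          // the real execution the user sees
            pool.submit(() -> {                   // the parallel follow-up execution
                O followUp = app.apply(transform.apply(input));
                if (!followUp.equals(expectedChange.apply(output)))
                    System.err.println("Metamorphic property violated at runtime");
            });
            return output;                        // user is unaffected by the check
        }

        public static void main(String[] args) {
            ExecutorService pool = Executors.newSingleThreadExecutor();
            Function<int[], Integer> sum = a -> { int s = 0; for (int v : a) s += v; return s; };
            int result = runWithCheck(sum, new int[]{1, 2, 3},
                    a -> { int[] r = a.clone(); for (int i = 0; i < r.length; i++) r[i] *= 2; return r; },
                    o -> o * 2, pool);
            System.out.println("Result delivered to user: " + result);
            pool.shutdown();
        }
    }

In the actual approach, side effects of the follow-up execution are also hidden from the user; this sketch shows only the parallel timing, not that isolation.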
We recently completed two studies in which we applied the metamorphic testing approach to healthcare simulation software with no test oracle: the JSim discrete event simulation system (developed at the University of Massachusetts Amherst), modeling the flow of patients through a hospital emergency room, and the GCS Glycemic Control Simulator (developed at the University of Pennsylvania), which simulates the behavior of different closed-loop insulin titration algorithms on a virtual patient. Both studies were conducted in collaboration with the developers of the software under test and involved bugs seeded using the standard mutation testing technique, which makes small syntactic changes (for example, replacing < with <=) to an otherwise-working program. Our metamorphic testing approach detected all of the seeded errors in JSim and 60% of those in GCS. Our experiments in these and other domains demonstrate that metamorphic testing is more effective at detecting defects in applications without test oracles than a “partial oracle” (simple inputs for which the correct output is known) or runtime assertion checking, although those techniques find some defects that metamorphic testing misses and thus should be used in combination.
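To illustrate how a mutation-style seeded bug can be caught without an oracle (a generic example of ours, not a mutant from these studies), consider an off-by-one fault in a mean computation, exposed by the permutative property mean(shuffle(a)) = mean(a):

    public class SeededBugDemo {
        static double buggyMean(double[] a) {
            double s = 0;
            for (int i = 0; i < a.length - 1; i++) s += a[i]; // seeded bug: skips last element
            return s / a.length;
        }

        public static void main(String[] args) {
            double[] x = {1.0, 2.0, 30.0};
            double[] permuted = {30.0, 2.0, 1.0};  // same multiset of values
            double out1 = buggyMean(x);            // (1 + 2) / 3  = 1.0
            double out2 = buggyMean(permuted);     // (30 + 2) / 3 = 10.67
            // No oracle knows the "correct" mean, but the two runs must agree.
            if (Math.abs(out1 - out2) > 1e-9)
                System.out.println("Permutative property violated: seeded defect detected");
        }
    }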
We now plan to investigate a methodology for determining the metamorphic properties of software and for devising good initial test cases from which the follow-up tests can be derived. The project will extend the inputs and outputs considered in our previous work on metamorphic testing to encompass application state before and after execution, rather than just function parameters and results. The research will also relax the exact pairwise relations implied by metamorphic properties to “semantic similarity” for nondeterministic applications, applied to profiles gathered over numerous executions, since an exact relation cannot be expected to hold for a single pair of test runs. These extensions will enable treatment of more sophisticated properties that, in preliminary experiments, revealed defects not detected otherwise.
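The sketch below illustrates the intended “semantic similarity” check under stated assumptions (the simulator, noise model, and tolerance are hypothetical): the property is evaluated over profiles of many executions, with a statistical tolerance, rather than on a single pair of runs:

    import java.util.Random;

    public class StatisticalCheck {
        // Hypothetical nondeterministic simulation: output scales with the
        // input workload but carries random noise from run to run.
        static double simulate(double workload, Random rng) {
            return workload * (1.0 + 0.05 * rng.nextGaussian());
        }

        static double meanOver(int runs, double workload, Random rng) {
            double s = 0;
            for (int i = 0; i < runs; i++) s += simulate(workload, rng);
            return s / runs;
        }

        public static void main(String[] args) {
            Random rng = new Random(42);
            int runs = 1000;
            double base = meanOver(runs, 10.0, rng);    // profile of f(x)
            double doubled = meanOver(runs, 20.0, rng); // profile of f(t(x))
            // The property "doubling the workload doubles the output" is
            // checked approximately, with a tolerance for the observed noise.
            double relError = Math.abs(doubled - 2 * base) / (2 * base);
            System.out.println(relError < 0.01
                ? "Profiles are semantically similar: property holds"
                : "Profiles diverge: possible defect");
        }
    }

In practice the number of runs and the tolerance would be calibrated to the application’s observed variance.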
Interested students should see the available student project positions below.
Contact:
Prof. Gail Kaiser, kaiser [at] cs.columbia.edu
Jonathan Bell, email@example.com

Publications:
Christian Murphy, M. S. Raunak, Andrew King, Sanjian Chen, Christopher Imbriano, Gail Kaiser, Insup Lee, Oleg Sokolsky, Lori Clarke, Leon Osterweil. On Effective Testing of Health Care Simulation Software. 3rd International Workshop on Software Engineering in Health Care, May 2011.
Xiaoyuan Xie, Joshua W. K. Ho, Christian Murphy, Gail Kaiser, Baowen Xu and Tsong Yueh Chen. Testing and Validating Machine Learning Classifiers by Metamorphic Testing. Journal of Systems and Software, Elsevier, 84(4):544-558, April 2011.
Christian Murphy and Gail Kaiser. Automatic Detection of Defects in Applications without Test Oracles. Technical Report CUCS-027-10, Dept. of Computer Science, Columbia University, October 2010. (Amsterdam)
Christian Murphy and Gail Kaiser. Metamorphic Runtime Checking of Non-Testable Programs. Technical Report CUCS-042-09, Dept. of Computer Science, Columbia University, October 2009. (Columbus)
Christian Murphy, Kuang Shen and Gail Kaiser. Automatic System Testing of Programs without Test Oracles. International Symposium on Software Testing and Analysis, July 2009.
Christian Murphy, Kuang Shen and Gail Kaiser. Using JML Runtime Assertion Checking to Perform Metamorphic Testing in Applications without Test Oracles. 2nd IEEE International Conference on Software Testing, Verification and Validation, April 2009.
Christian Murphy, Gail Kaiser, Lifeng Hu and Leon Wu. Properties of Machine Learning Applications for Use in Metamorphic Testing. 20th International Conference on Software Engineering and Knowledge Engineering, July 2008.
Christian Murphy, Gail Kaiser and Marta Arias. Parameterizing Random Test Data According to Equivalence Classes. 2nd ACM International Workshop on Random Testing, November 2007.
Christian Murphy, Gail Kaiser and Marta Arias. An Approach to Software Testing of Machine Learning Applications. 19th International Conference on Software Engineering and Knowledge Engineering, July 2007, pp. 167-172.
Available student project positions:
We are investigating how to detect bugs in so-called non-testable programs, where there is no test oracle to determine whether a given output is correct for an arbitrary input. This problem is common in machine learning, data mining, information retrieval, simulation, optimization, and scientific computing applications. Our current approach is based on metamorphic testing, described above: if input x produces output f(x), the application’s metamorphic properties guide the creation of a transformation function t such that the output f(t(x)) can be predicted from the already-known f(x); if the actual output differs from the prediction, a defect must exist. Although this can only show the presence of defects and not their absence, it provides a powerful built-in pseudo-oracle for otherwise non-testable programs. We have applied this approach to a wide range of such programs and indeed found many conventional bugs, such as off-by-one errors. We now plan to extend the technique to address application and system state, both in memory and in the file system, in combination with in vivo testing.
Contact: Jonathan Bell, firstname.lastname@example.org