International Journal of Web Services Research 12 (3): 1-24 (2015).

Connecting the Average and the Non-Average:
A Study of the Rates of Fault Detection in Testing WS-BPEL Services

Changjiang Jia 2, 3, Lijun Mei 4, W.K. Chan 2, Y.T. Yu 2, and T.H. Tse 5

[technical report TR-2015-01]


Many existing studies measure the effectiveness of test case prioritization techniques by their average performance over a set of test suites. However, in each regression test session, a real-world developer may be able to afford to apply only one prioritization technique to one test suite and test a service once, even if this application results in an adverse scenario in which the actual performance in this test session is far below the average result achievable by the same technique over the same test suite for the same application. This indicates that assessing the average performance of such a technique cannot give developers adequate confidence to apply it. We ask two questions: To what extent does the effectiveness of prioritization techniques in average scenarios correlate with that in adverse scenarios? Moreover, to what extent may a design factor of this class of techniques affect the effectiveness of prioritization in different types of scenarios?

To the best of our knowledge, this paper reports the first controlled experiment to study these two new research questions, using more than 300 million APFD and HMFD data points produced from 19 techniques, eight WS-BPEL benchmarks, and 1000 test cases prioritized by each technique 1000 times. A main result reveals a strong and linear correlation between the effectiveness in the average scenarios and that in the adverse scenarios. Another interesting result is that, for many pairs of levels of the same design factor, the relative effectiveness of the two levels within a pair changes significantly across the wide spectrum of prioritized test suites produced by the same techniques over the same test suite in testing the same benchmarks, and the results obtained from the average scenarios are more similar to those at the more effective end than otherwise. This work provides the first piece of strong evidence for the research community to re-assess how they develop and validate their techniques in the average scenarios and beyond.

Keywords: XML-based artifact, WS-BPEL, test case prioritization, average scenario, adverse scenario, correlation, testing, regression testing, empirical study, controlled experiment

1. This research is supported in part by the Early Career Scheme and the General Research Fund of the Research Grants Council of Hong Kong (project numbers 111313, 125113, 123512, 716612, and 717811).
2. Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Hong Kong.
3. National University of Defense Technology, Changsha, China.
4. (Corresponding author.) IBM Research – China, Beijing, China.
5. Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong.

