Journal of Systems and Software 84 (6): 885-905 (2011)

Non-Parametric Statistical Fault Localization 1

Zhenyu Zhang 2 , W.K. Chan 3 , T.H. Tse 4 , Y.Y. Yu 3 , and Peifeng Hu 5

[paper from ScienceDirect | technical report TR-2011-01]


Fault localization is a major activity in program debugging. To automate this time-consuming task, many existing fault-localization techniques compare passed executions and failed executions, and suggest suspicious program elements, such as predicates or statements, to facilitate the identification of faults. To do so, these techniques propose statistical models and use hypothesis testing methods to test the similarity or dissimilarity of proposed program features between passed and failed executions. Furthermore, when applying their models, these techniques presume that the feature spectra come from populations with specific distributions. The accuracy of using a model to describe feature spectra is related to, and may be affected by, the underlying distribution of the feature spectra, and the use of a (sound) model in inapplicable circumstances to describe real-life feature spectra may lower the effectiveness of these fault-localization techniques. In this paper, we make use of hypothesis testing methods as the core concept in developing a predicate-based fault-localization framework. We report a controlled experiment to compare, within our framework, the efficacy, scalability, and efficiency of applying three categories of hypothesis testing methods, namely, standard non-parametric hypothesis testing methods, standard parametric hypothesis testing methods, and debugging-specific parametric testing methods. We also conduct a case study to compare the effectiveness of the winner of these three categories with the effectiveness of 33 existing statement-level fault-localization techniques. The experimental results show that the use of non-parametric hypothesis testing methods in our proposed predicate-based fault-localization model is the most promising.

Keywords: Fault localization, hypothesis testing, parametric method, non-parametric method
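
To illustrate the idea summarized in the abstract, the following minimal sketch ranks predicates by how differently a per-execution feature is distributed in passed and failed executions, using a non-parametric test. Everything specific here is an assumption for illustration and is not taken from the abstract: the feature is assumed to be the fraction of times a predicate evaluates to true in one execution, the test is the Mann-Whitney U test with a normal approximation and no tie correction, and the names rank_values, mann_whitney_z, and rank_predicates are hypothetical.

```python
from math import sqrt


def rank_values(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_z(passed, failed):
    """Normal-approximation z score of the Mann-Whitney U statistic.

    Assumption: no tie correction is applied to the variance, so this is
    only a rough score, not a substitute for a statistics library.
    """
    n1, n2 = len(passed), len(failed)
    if n1 == 0 or n2 == 0:
        raise ValueError("need at least one passed and one failed execution")
    ranks = rank_values(list(passed) + list(failed))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u1 - mean) / sd


def rank_predicates(spectra):
    """Order predicate names from most to least suspicious.

    spectra maps a predicate name to (passed_values, failed_values),
    where each value is the assumed per-execution feature.
    """
    scores = {name: abs(mann_whitney_z(p, f)) for name, (p, f) in spectra.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical feature spectra for two predicates.
spectra = {
    "p1": ([0.1, 0.2, 0.15, 0.1], [0.9, 0.8, 0.95]),
    "p2": ([0.5, 0.4, 0.6, 0.5], [0.5, 0.45, 0.55]),
}
print(rank_predicates(spectra))
```

Because the test works on ranks rather than raw values, the sketch makes no assumption that the feature values follow a normal or any other specific distribution, which is the property the abstract attributes to non-parametric methods.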

1. This research is supported in part by grants from the National Natural Science Foundation of China (project nos. 61003027 and 61073006), the Research Grants Council of Hong Kong (project nos. 111410, 123206, 123207 and 716507), and City University of Hong Kong (project no. 7002464).
2. State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China.
3. (Corresponding author.) Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Hong Kong.
4. Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong.
5. China Merchants Bank, Central, Hong Kong.


