Bauhaus

A Quantitative Evaluation Framework for Clustering Techniques for Re-Modularization and Component Recovery


 

Information on the Bauhaus project

Overview

Background Information and Goals

Component recovery supports program understanding, architecture recovery, and re-use and is a prerequisite for re-modularization. Recovering static components, i.e., modules and subsystems, has proven useful in a number of ways and many methods to automatically or semi-automatically detect static components have been published in the literature. However, their relative strengths and weaknesses are unknown to a great extent.

Experimental quantitative evaluation of clustering techniques for component recovery is necessary in order to analyze their strengths and weaknesses in comparison to other techniques. However, there has not been any accepted evaluation scheme so far, which would allow to compare different techniques, since Lakhotia and Gravely have pointed out this  grievance in 1995. The goal of our initiative is to establish such an evaluation scheme. The basic idea is to compare components as detected by automatic techniques, so-called candidates, to components manually identified by human software engineers or programmers, so-called references and then to measure precision and recall of the set of proposed candidates with respect to the set of references. To this end, we need:

In our contribution "A Framework for Experimental Evaluation of Clustering Techniques" to the International Workshop on Program Comprehension, 2000, in Limerick, we described a process of setting up reference corpora and ways of measuring recall and precision of automatic clustering techniques in details (slides of the presentation are also available). Here, we will make available our framework for quantitative evaluation of fully automatic clustering techniques for software modules. The framework consists of a specification of a quantitative evaluation method, a tool that supports this method, and manually compiled sets of software modules of diverse systems that are used to benchmark the automatic techniques.

Limerick

In the spirit of Limerick, where our framework was first presented:
She comes up to Limerick with a new way to cluster,
but can we really trust'er?
If she picks up this theme,
if she uses our scheme,
she will be our master.

Quantitative Evaluation Scheme

The quantiative evaluation scheme is described in our paper "A Framework for Experimental Evaluation of Clustering Techniques", Proceedings of the International Workshop on Program Comprehension, 2000, IEEE Computer Society Press.

Freely available tool support

We offer our tool that implements the evaluation, given a set of references and a set of candidates. It can be freely downloaded. However, we expect feedback from users (and we don't give any warranties whatsoever of course). Executables are available for the following platforms (N.B.: Use the "Save Link as" facility of your browser for the following two links; these files are executables): The tool is called with two arguments: the file with the candidates (using -cand) and the file with the references (using -ref), e.g.,

   detection_quality -cand candidates.re -ref references.re

An option -detailed may be used to print the pairs of corresponding components. Use

  detection_quality -help

for a full description of all options.

The input in both files must be according to the following grammar rules (in EBNF; comments start with an % and end with the next new line):



File                         ::= <empty file> | Components .
Components           ::= Component { ',' Component }.
Component             ::= '{' [Component_Name] [Entities] '}' .
Entities                   ::= Entity { ',' Entity } .
Entity                      ::= '<' System_Name ',' File_Name ',' Object_Name '>' .
Component_Name ::= '%NAME' STRING .
System_Name       ::= STRING .
File_Name             ::= STRING .
Object_Name         ::= STRING .

STRING                  ::= '"' <any character except " and newline > '"' .



Here is an example with two components:

{%NAME "ConceptArray"
<"concepts", "concept.h", "ConceptArray">,               % a type
<"concepts", "concept.c", "ConceptSubconcepts">,    % a subprogram
<"concepts", "concept.c", "ConceptLattice">,             % another type
<"concepts", "concept.c", "ConceptArrayDelete">,
<"concepts", "concept.c", "ConceptArrayInit">
},
{%NAME "Concept"
<"concepts", "concept.h", "Concept">,
<"concepts", "concept.c", "ConceptPrint">,
<"concepts", "concept.c", "ConceptDelete">,
<"concepts", "concept.c", "ConceptInit">
}

Two larger examples are:

Benchmark Systems

Altogether, we have four benchmark systems, for which reference components were manually obtained by software engineers:
Aero
A more detailed description on how the reference components for the systems were obtained along with a profile of the human analysts can be found here. These reference components are freely available, but we do expect feedback. Please contact  Rainer Koschke.

Contact

For further information and suggestions, please contact Rainer Koschke.

Some References to Component Recovery Techniques

  1. Belady, L.A., Evangelisti, C.J., `System Partitioning and its Measure', Journal of Systems and Software, 2(1), pp. 23-29, February 1982.
  2. Canfora, G., Cimitile, A., Munro, M., `An Improved Algorithm for Identifying Objects in Code', Journal of Software Practice and Experience, 26(1), pp. 25-48, January 1996.
  3. Choi, S.C., Scacchi, W., `Extracting and Restructuring the Design of Large Systems', IEEE Software, 7(1), pp. 66-71, January 1990.
  4. Doval, D., Mancoridis, S., Mitchel, B.S, Chen, Y., Gansner, E.R., 'Automatic Clustering of Software Systems using a Genetic Algorithm', Proceedings of the International Conference on Software Tools and Engineering Practice, August 1999.
  5. Gall, H., Klösch, R., `Finding Objects in Procedural Programs: An Alternative Approach', Proceedings of the Second Working Conference on Reverse Engineering, pp. 208-216, Toronto, July 1995.
  6. Girard, J.F., Koschke, R., Schied, G., `A Metric-based Approach to Detect Abstract Data Types and Abstract State Encapsulation', Journal on Automated Software Engineering, 6, pp. 357-386, Kluwer 1999.
  7. Girard, J.F., Koschke, R., `A Comparison of Abstract Data Type and Objects Recovery Techniques', Journal Science of Computer Programming, Elsevier, to appear.
  8. Girard, J.F., Koschke, R., `Finding Components in a Hierarchy of Modules: a Step Towards Architectural Understanding', International Conference on Software Maintenance, pp. 58-65, Bari, October 1997.
  9. Hutchens, D.H., Basili, V.R., `System Structure Analysis: Clustering with Data Bindings', IEEE Transactions on Software Engineering, SE-11(8), pp. 749-757, August 1985.
  10. Kazman, R., Carrière, S.J., `Playing detective: reconstructing software architecture from available evidence', Technical Report CMU/SEI-97-TR-010, ESC-TR-97-010, Software Engineering Institute, Pittsburgh, USA, 1997.
  11. Koschke, R., `An Semi-Automatic Method for Component Recovery', Proceedings of the Sixth Working Conference on Reverse Engineering, pp.256-267, Atlanta, October 1999.
  12. Koschke, R. (2000), `Atomic Architectural Component Detection for Program Understanding and System Evolution', Ph.D. thesis. University of Stuttgart, to appear.
  13. Lakhotia, A., `A Unified Framework for Expressing Software Subsystems Classification Techniques', Journal Systems Software, Elsevier Science Publisher, 36, pp. 211-231, 1997
  14. Lakhotia, A., Gravely, J.M., `Toward Experimental Evaluation of Subsystem Classification Recovery Techniques', Proceedings of the Second Working Conference on Reverse Engineering, pp. 262-269, Toronto, July 1995.
  15. Lindig, C., Snelting, G., `Assessing Modular Structure of Legacy Code Based on Mathematical Concept Analysis', Proceedings of the Nineteenth International Conference on Software Engineering, Boston, 1997.
  16. Liu, S.S., Wilde, N., `Identifying Objects in a Conventional Procedural Language: An Example of Data Design Recovery', Proceedings of the International Conference on Software Maintenance, pp. 266-271, November 1990.
  17. Müller, H., Wong, K., Tilley, S., `A Reverse Engineering Environment Based on Spatial and Visual Software Interconnection Models', Proceedings of the Fifth ACM SIGSOFT Symposium on Software Development Environments, pp 88-98, Tyson's Corner, December 1992.
  18. Müller, H.A., Orgun, M.A., Tilley, S.R., Uhl, J.S., `A Reverse Engineering Approach to Subsystem Structure Identification'. Journal of Software Maintenance: Research and Practice, 5(4), pp. 181-204, December 1993.
  19. Ogando, R.M., Yau, S.S., Wilde, N., `An Object Finder for Program Structure Understanding in Software Maintenance', Journal of Software Maintenance, 6(5), pp. 261-83, September-October 1994.
  20. Patel, S., Chu, W., Baxter, R., `A Measure for Composite Module Cohesion', Proceedings of the Fourteenth International Conference on Software Engineering, pp. 38-48, Melbourne, May 1992.
  21. Sahraoui, H., Melo, W, Lounis, H., Dumont, F., `Applying Concept Formation Methods to Object Identfication in Procedural Code', Proceedings of the Twelfth Conference on Automated Software Engineering, pp. 210-218, Nevada, November 1997.
  22. Schwanke, R. W., `An Intelligent Tool for Re-engineering Software Modularity', Proceedings of the International Conference on Software Engineering, pp. 83-92, May 1991.
  23. Siff, M., Reps, T., `Identifying Modules via Concept Analysis', Proceedings of the International Conference on Software Maintenance, pp. 170-179, Bari, October 1997.
  24. Tzerpos, V., Holt, R., `MoJo: A Distance Metric for Software Clusterings', Proceedings of the Sixth Working Conference on Reverse Engineering, pp. 187-193, Atlanta, July 1999.
  25. Valasareddi, R.R., Carver, D.L., `A Graph-based Object Identification Process for Procedural Programs', Proceedings of the Fifth Working Conference on Reverse Engineering, pp. 50-58, Honolulu, October 1998.
  26. Yeh, A.S., Harris, D., Reubenstein, H., `Recovering Abstract Data Types and Object Instances From a Conventional Procedural Language', Proceedings of the Second Working Conference on Reverse Engineering, pp. 227-236, July 1995.

 
koschke@informatik.uni-stuttgart.de

Last modified: Wed Sep  6 12:08:29 MET DST 2000