A FRAMEWORK FOR CONSENSUS-BASED, STABLE CLUSTERING OF SOFTWARE FEATURE REQUESTS IN OPEN SOURCE FORUMS

Horatiu Dumitru, Chuan Duan, Jane Cleland-Huang*

DePaul University, Computing, Chicago, IL 60604

jhuang@cs.depaul.edu


Abstract

As software projects grow in size and complexity and involve stakeholders across geographical and organizational boundaries, project managers increasingly rely on open discussion forums to elicit software requirements and to communicate with their stakeholder base. Unfortunately, open forums generally fail to support several fundamental elements of the requirements elicitation process, including the timely identification of emerging topics and cross-cutting themes. This failure hinders the exchange of ideas between stakeholders with similar interests and concerns. The work described in this presentation explores the application of data mining and machine learning techniques to automate the detection and management of themes across feature requests. Clustering feature requests is made difficult by a significant amount of background noise, such as spelling errors, poor grammar, slang, long-winded comments, non-standard abbreviations, redundancies, inconsistent use of terms, and off-topic discussions. To address these problems, a new framework is introduced that augments a modified version of the SPK Means clustering algorithm with consensus clustering and user-provided feedback in order to deliver cohesive, loosely coupled, and stable discussion threads. The framework is evaluated through a series of experiments that first compare the quality of the automatically generated clusters against those created by human users in open source forums, and then compare the quality, stability, and performance of the proposed clustering techniques against standard SPK Means.
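The combination of SPK Means and consensus clustering mentioned above can be illustrated with a minimal sketch. It assumes that SPK Means refers to spherical k-means (k-means over unit-length term vectors with cosine similarity) and uses a simple co-association scheme for the consensus step: the base algorithm is run several times with different seeds, and requests that land in the same cluster in a majority of runs are grouped together. This is not the modified algorithm or the framework described in the abstract, and it leaves out user-provided feedback entirely. All function names, the TF-IDF weighting, the number of runs, and the 0.5 threshold are illustrative assumptions.

```python
import math
import random
from collections import Counter


def tfidf_unit_vectors(docs):
    """Convert raw text into L2-normalized TF-IDF vectors (dict: term -> weight)."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF so that every term keeps a positive weight.
        vec = {t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1.0)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Dot product of two sparse unit vectors, i.e. their cosine similarity."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def spherical_kmeans(vectors, k, seed, iterations=20):
    """Plain spherical k-means: assign by cosine, re-normalize centroids."""
    rng = random.Random(seed)
    centroids = [dict(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    for t, w in v.items():
                        total[t] += w
            norm = math.sqrt(sum(w * w for w in total.values()))
            if norm > 0:  # an empty cluster keeps its previous centroid
                centroids[c] = {t: w / norm for t, w in total.items()}
    return labels


def consensus_clusters(vectors, k, runs=15, threshold=0.5):
    """Group items that share a cluster in more than `threshold` of the runs."""
    n = len(vectors)
    together = [[0] * n for _ in range(n)]
    for seed in range(runs):
        labels = spherical_kmeans(vectors, k, seed)
        for i in range(n):
            for j in range(i + 1, n):
                if labels[i] == labels[j]:
                    together[i][j] += 1

    # Union-find over pairs that co-occur often enough (assumed consensus rule).
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if together[i][j] / runs > threshold:
                parent[find(i)] = find(j)

    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Hypothetical feature requests, used only to show how the pieces fit together.
requests = [
    "please add pdf export for reports",
    "export reports to pdf format",
    "dark theme for the editor",
    "editor needs a dark color theme",
]
print(consensus_clusters(tfidf_unit_vectors(requests), k=2))
```

Because the consensus step merges linked pairs transitively, the number of final groups is not fixed by k; a stricter threshold yields more, smaller groups. Stability in this sketch comes only from aggregating over random restarts, which is one common way to reduce the sensitivity of k-means-style algorithms to their initial centroids.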
