A Study on Applying Machine Learning approach to Forecast a Software Defect
Keywords:
software metrics, bug, classifier, cross validation, machine learning
Abstract
Defects are common in software systems and can cause a range of problems for users. Various methods have been developed to quickly predict the most likely locations of defects in large code bases. Most of them focus on designing features (e.g., complexity metrics) that correlate with potentially defective code. Those approaches, however, do not sufficiently capture the syntax and the different levels of semantics of source code, an important capability for building accurate prediction models. In our approach, three supervised machine learning classifiers, Logistic Regression, Naïve Bayes, and Decision Tree, are trained on historical data to predict the occurrence of software bugs. The predictions are then strengthened with a Random Forest ensemble classifier, and the models are validated with K-fold cross-validation, so that each model is evaluated on every partition of the data rather than on a single train/test split. The resulting models perform effectively across the scenarios considered.
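The validation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, picks Naïve Bayes (one of the three classifiers named above) in a simple Gaussian form, and runs on made-up module metrics. The function names, the two features (complexity and churn), the fold count k=5, and the synthetic data generator are all assumptions introduced for illustration; the abstract does not specify the dataset or the value of K.

```python
import math
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per feature and class (illustrative)."""

    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            prior = len(rows) / len(X)
            params = []
            for col in zip(*rows):
                mean = sum(col) / len(col)
                # Small constant keeps the variance strictly positive.
                var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
                params.append((mean, var))
            self.stats[label] = (prior, params)
        return self

    def predict(self, X):
        return [self._predict_one(x) for x in X]

    def _predict_one(self, x):
        best_label, best_score = None, -math.inf
        for label, (prior, params) in self.stats.items():
            # Log prior plus the sum of per-feature Gaussian log densities.
            score = math.log(prior)
            for value, (mean, var) in zip(x, params):
                score += -0.5 * math.log(2 * math.pi * var)
                score -= (value - mean) ** 2 / (2 * var)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cross_validate(model_factory, X, y, k=5, seed=0):
    """Return the accuracy obtained on each of the k held-out folds."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = model_factory().fit(
            [X[j] for j in train_idx], [y[j] for j in train_idx]
        )
        predictions = model.predict([X[j] for j in test_idx])
        correct = sum(p == y[j] for p, j in zip(predictions, test_idx))
        scores.append(correct / len(test_idx))
    return scores


def make_synthetic_modules(n=200, seed=1):
    """Generate made-up (complexity, churn) metrics; label 1 = defective."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        defective = rng.random() < 0.3
        center = (25.0, 120.0) if defective else (8.0, 30.0)
        X.append([rng.gauss(center[0], 4.0), rng.gauss(center[1], 20.0)])
        y.append(int(defective))
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_modules()
    fold_scores = cross_validate(GaussianNaiveBayes, X, y, k=5)
    print("accuracy per fold:", [round(s, 3) for s in fold_scores])
    print("mean accuracy:", round(sum(fold_scores) / len(fold_scores), 3))
```

Each sample is held out exactly once, so the mean of the per-fold accuracies uses every module for testing without ever scoring a model on data it was trained on. Any other classifier exposing the same `fit` and `predict` methods could be passed to `cross_validate` in place of the Naïve Bayes class.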