Context: Free and Open Source Software (FOSS) communities’ ability to stay viable and productive over time is pivotal for society as they maintain the building blocks that digital infrastructure, products, and services depend on. Sustainability may, however, be characterized from multiple aspects, and less is known how these aspects interplay and impact community outputs, and software quality specifically. Objective: This study, therefore, aims to empirically explore how the different aspects of FOSS sustainability impact software quality. Method: 16 sustainability metrics across four categories were sampled and applied to a set of 217 OSS projects sourced from the Apache Software Foundation Incubator program. The impact of a decline in the sustainability metrics was analyzed against eight software quality metrics using Bayesian data analysis, which incorporates probability distributions to represent the regression coefficients and intercepts. Results: Findings suggest that selected sustainability metrics do not significantly affect defect density or code coverage. However, a positive impact of community age was observed on specific code quality metrics, such as risk complexity, number of very large files, and code duplication percentage. Interestingly, findings show that even when communities are experiencing sustainability, certain code quality metrics are negatively impacted. Conclusion: Findings imply that code quality practices are not consistently linked to sustainability, and defect management and prevention may be prioritized over the former. Results suggest that growth, resulting in a more complex and large codebase, combined with a probable lack of understanding of code quality standards, may explain the degradation in certain aspects of code quality.
[Context] The continuous inflow of bug reports is a considerable challenge in large development projects. Inspired by contemporary work on mining software repositories, we designed a prototype bug assignment solution based on machine learning in 2011-2016. The prototype evolved into an internal Ericsson product, TRR, in 2017-2018. TRR’s first bug assignment without human intervention happened in April 2019. [Objective] Our study evaluates the adoption of TRR within its industrial context at Ericsson, i.e., we provide lessons learned related to the productization of a research prototype within a company. Moreover, we investigate 1) how TRR performs in the field, 2) what value TRR provides to Ericsson, and 3) how TRR has influenced the ways of working. [Method] We conduct a preregistered industrial case study combining interviews with TRR stakeholders, minutes from sprint planning meetings, and bug-tracking data. The data analysis includes thematic analysis, descriptive statistics, and Bayesian causal analysis. [Results] TRR is now an incorporated part of the bug assignment process. Considering the abstraction levels of the telecommunications stack, high-level modules are more positive while low-level modules experienced some drawbacks. Most importantly, some bug reports directly reach low-level modules without first having passed through fundamental root-cause analysis steps at higher levels. On average, TRR automatically assigns 30% of the incoming bug reports with an accuracy of 75%. Auto-routed TRs are resolved around 21% faster within Ericsson, and TRR has saved highly seasoned engineers many hours of work. Indirect effects of adopting TRR include process improvements, process awareness, increased communication, and higher job satisfaction. [Conclusions] TRR has saved time at Ericsson, but the adoption of automated bug assignment was more intricate compared to similar endeavors reported from other companies. We primarily attribute the difference to the very large size of the organization and the complex products. Key facilitators in the successful adoption include a gradual introduction, product champions, and careful stakeholder analysis.
Software engineering (SE) research should be relevant to industrial practice. There have been regular discussions in the SE community on this issue since the 1980’s, led by pioneers such as Robert Glass. As we recently passed the milestone of “50 years of software engineering”, some recent positive efforts have been made in this direction, e.g., establishing “industrial” tracks in several SE conferences. However, many researchers and practitioners believe that we, as a community, are still struggling with research relevance and utility. The goal of this paper is to synthesize the evidence and experience-based opinions shared on this topic so far in the SE community, and to encourage the community to further reflect and act on the research relevance. For this purpose, we have conducted a Multi-vocal Literature Review (MLR) of 54 systematically-selected sources (papers and non peer-reviewed articles). Instead of relying on and considering the individual opinions on research relevance, mentioned in each of the sources, the MLR aims to synthesize and provide the “holistic” view on the topic. The highlights of our MLR findings are as follows. The top three root causes of low relevance, discussed in the community, are: (1) Researchers having simplistic views (or wrong assumptions) about SE in practice; (2) Lack of connection with industry; and (3) Wrong identification of research problems. The top three suggestions for improving research relevance are: (1) Using appropriate research approaches such as action-research; (2) Choosing relevant (practical) research problems; and (3) Collaborating with industry. By synthesizing all the discussions on this important topic so far, this paper aims to encourage further discussions and actions in the community to increase our collective efforts to improve the research relevance. Furthermore, we raise the need for empirically-grounded and rigorous studies on the relevance problem in SE research, as carried out in other fields such as management science.
Bug report assignment is an important part of software maintenance. In particular, incorrect assignments of bug reports to development teams can be very expensive in large software development projects. Several studies propose automating bug assignment techniques using machine learning in open source software contexts, but no study exists for large-scale proprietary projects in industry. The goal of this study is to evaluate automated bug assignment techniques that are based on machine learning classification. In particular, we study the state-of-the-art ensemble learner Stacked Generalization (SG) that combines several classifiers. We collect more than 50,000 bug reports from five development projects from two companies in different domains. We implement automated bug assignment and evaluate the performance in a set of controlled experiments. We show that SG scales to large scale industrial application and that it outperforms the use of individual classifiers for bug assignment, reaching prediction accuracies from 50 % to 89 % when large training sets are used. In addition, we show how old training data can decrease the prediction accuracy of bug assignment. We advice industry to use SG for bug assignment in proprietary contexts, using at least 2,000 bug reports for training. Finally, we highlight the importance of not solely relying on results from cross-validation when evaluating automated bug assignment.
Context: The term technical debt (TD) describes the aggregation of sub-optimal solutions that serve to impede the evolution and maintenance of a system. Some claim that the broken windows theory (BWT), a concept borrowed from criminology, also applies to software development projects. The theory states that the presence of indications of previous crime (such as a broken window) will increase the likelihood of further criminal activity; TD could be considered the broken windows of software systems. Objective: To empirically investigate the causal relationship between the TD density of a system and the propensity of developers to introduce new TD during the extension of that system. Method: The study used a mixed-methods research strategy consisting of a controlled experiment with an accompanying survey and follow-up interviews. The experiment had a total of 29 developers of varying experience levels completing system extension tasks in already existing systems with high or low TD density. Results: The analysis revealed significant effects of TD level on the subjects’ tendency to re-implement (rather than reuse) functionality, choose non-descriptive variable names, and introduce other code smells identified by the software tool SonarQube, all with at least 95% credible intervals. Coclusions: Three separate significant results along with a validating qualitative result combine to form substantial evidence of the BWT’s existence in software engineering contexts. This study finds that existing TD can have a major impact on developers propensity to introduce new TD of various types during development.
Quality requirements are vital to developing successful software products. However, there exist evidence that quality requirements are managed mostly in an “ad hoc” manner and down-prioritized. This may result in insecure, unstable, slow products, and unhappy customers. We have developed a conceptual model for the scoping process of quality requirements – QREME – and an assessment model – Q-REPM – for companies to benchmark when evaluating and improving their quality requirements practices. Our model balances an upfront forward-loop with a data-driven feedback-loop. Furthermore, it addresses both strategic and operational decisions. We have evaluated the model in a multi-case study at two companies in Sweden and three companies in The Netherlands. We assessed the scoping process practices for quality requirements and provided improvement recommendations for which practices to improve. The study confirms the existence of the constructs underlying QREME. The companies perform, in the median, 24% of the suggested actions in Q-REPM. None of the companies work data-driven with their quality requirements, even though four out of five companies could technically do so. Furthermore, on the strategic level, quality requirements practices are not systematically performed by any of the companies. The conceptual model and assessment model capture a relevant view of the quality requirements practices and offer relevant improvement proposals. However, we believe there is a need for coupling quality requirements practices to internal and external success factors to motive companies to change their ways of working. We also see improvement potential in the area of business intelligence for QREME in selecting data sources and relevant stakeholders.