SEAA 2021 Accepted Papers with Abstracts
Won't Somebody Please Think of the Tests? A Grounded Theory Approach to Industry Challenges in Continuous Practices
Abstract: Continuous integration and delivery are well-established paradigms in the software development community. With these continuous practices come many challenges; while some of these challenges are immediate and well documented in literature, others may be revealed only after sustained application of these practices in large-scale and complex contexts. Based on researcher observations and interviews with 22 senior professionals from four companies, all with significant but varied experiences of continuous practices, we present a set of under-reported challenges with continuous practices observed in multiple industry settings. Through a grounded theory approach we construct the Tapco model (Test Automation Progression in Continuous Practices), capturing the patterns of how we observe these challenges emerge and how they may be prevented, derived from the interview responses. This model is then validated by presenting it to the four studied companies, operating in disparate industry segments, and to three additional industry cases, letting them evaluate its relevancy, accuracy and novelty. We find that the model provides industry professionals with essential guidance on how to avoid common pitfalls, as well as an understanding of their causes and possible remediation.
The MaLET Model – Maturity Levels for Exploratory Testing
Abstract: Based on multiple series of interviews and workshops, this paper presents the MaLET model – a representation of the typical evolution path for companies that successfully adopt exploratory testing. The model provides a step-by-step approach to systematically improve exploratory testing over time, and shows how and why capabilities may regress. The MaLET model was validated through a series of interviews with 20 interviewees from eight case study companies in separate industry segments. The interviews also revealed examples of improvement initiatives that had failed in the companies, showing that improvement initiatives tend to fail if they are not planned in an order corresponding to the maturity levels in the model. The MaLET model was well received during the validation by the interviewees, who described it as sound, relevant and useful in practice.
Event Oriented vs Object Oriented Analysis for Microservice Architecture: An Exploratory Case Study
Abstract: The rapidly developing internet infrastructure together with the advances in software technology has enabled the development of cloud-based modern web applications that are much more responsive, flexible, and reliable compared to traditional monolithic applications. Such modern applications require new software design paradigms and architectures. Microservice-based architecture (MSbA), which aims to create small, isolated, loosely-coupled applications that work in cohesion, is becoming widespread as one of these approaches. MSbA allows the developed applications to be deployed and maintained separately, as well as scaled on demand. However, there is no de facto method for the analysis and design of systems for these architectures. In this paper, we compared the usefulness of the object-oriented (OO) and event-oriented (EO) approaches for analyzing and designing MS-based systems. More specifically, we performed an exploratory case study to analyze, design, and implement a software application dealing with the ‘application and evaluation process of graduate students at IzTech’. This paper discusses the results of this case study. We observe that the EO approaches have significant advantages with respect to the OO approaches.
Combining CNN with DS^3 for Detecting Bug-prone Modules in Cross-version Projects
Abstract: The paper focuses on Cross-Version Defect Prediction (CVDP) where the classification model is trained on information of the prior version and then tested to predict defects in the components of the last release. To avoid the distribution differences which could negatively impact the performance of machine learning-based models, we consider the Dissimilarity-based Sparse Subset Selection (DS^3) technique for selecting meaningful representatives to be included in the training set. Furthermore, we employ a Convolutional Neural Network (CNN) to generate structural and semantic features to be merged with the traditional software measures to obtain a more comprehensive list of predictors.
To evaluate the usefulness of our proposal for the CVDP scenario, we perform an empirical study on a total of 20 cross-version pairs from 10 different software projects. To build prediction models we consider Logistic Regression (LR) and Random Forest (RF) and we adopt 3 evaluation criteria (i.e., G-mean, Balance, F-measure) to assess the prediction accuracy. Our results show that the use of CNN with both LR and RF models has a significant impact, with an improvement of about 20% for each evaluation criterion. In contrast, we find that DS^3 does not significantly improve prediction accuracy.
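The three evaluation criteria named in this abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in defect prediction (G-mean as the geometric mean of recall and specificity, Balance as one minus the normalized distance to the ideal point of full detection and no false alarms); the abstract does not spell out the formulas, and the function name and confusion-matrix counts below are hypothetical.

```python
import math


def defect_prediction_scores(tp, fp, tn, fn):
    """Return G-mean, Balance and F-measure for one confusion matrix.

    Assumed (commonly used) definitions; not taken from the paper itself.
    """
    # Probability of detection (recall) and probability of false alarm.
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # G-mean: geometric mean of recall and specificity (1 - pf).
    g_mean = math.sqrt(pd * (1.0 - pf))
    # Balance: 1 minus the normalized distance to the ideal point pd=1, pf=0.
    balance = 1.0 - math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2) / math.sqrt(2.0)
    # F-measure: harmonic mean of precision and recall.
    denom = precision + pd
    f_measure = (2.0 * precision * pd / denom) if denom else 0.0
    return {"g_mean": g_mean, "balance": balance, "f_measure": f_measure}


if __name__ == "__main__":
    # Hypothetical counts for one cross-version pair.
    print(defect_prediction_scores(tp=40, fp=10, tn=40, fn=10))
```

With the hypothetical counts above, recall and precision are both 0.8 and the false alarm rate is 0.2, so all three scores come out at roughly 0.8.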
Are 20% of Classes Responsible for 80% of Refactorings? (Short Paper)
Abstract: The 80-20 rule is well known in the real world. When applied to bugs, it suggests that 80% of bugs arise in just 20% of classes. One research question that has yet to be explored is whether the same rule applies to refactoring activity. In other words, do 20% of classes account for 80% of refactorings applied to a system? In this short paper, we explore this question using data from seven open-source systems drawn from two previous studies. In each case, we explore whether the 80-20 rule applies and suggest why. Results showed limited evidence of an 80-20 rule; in the two systems where it was evident, the refactoring profile implied, firstly, a large-scale movement of class fields and methods and, secondly, the deliberate aim of collapsing the class hierarchy using inheritance-based refactorings.
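The question posed in this abstract can be made concrete with a small sketch: given a refactoring count per class, compute the share of all refactorings that falls on the most-refactored 20% of classes. The function name and the sample counts are hypothetical, and the paper's own measurement procedure may differ.

```python
def top_share(refactorings_per_class, top_percent=20):
    """Share of all refactorings applied to the top `top_percent` of classes."""
    if not refactorings_per_class:
        raise ValueError("need at least one class")
    ordered = sorted(refactorings_per_class, reverse=True)
    # Number of classes in the top group, rounded up, at least one.
    k = max(1, (len(ordered) * top_percent + 99) // 100)
    total = sum(ordered)
    if total == 0:
        return 0.0
    return sum(ordered[:k]) / total


if __name__ == "__main__":
    # Hypothetical system with ten classes and 100 refactorings.
    counts = [40, 40, 5, 5, 2, 2, 2, 2, 1, 1]
    share = top_share(counts)
    print(f"top 20% of classes account for {share:.0%} of refactorings")
```

A system would show an 80-20 profile under this reading when the returned share is at or above 0.8, as it is for the hypothetical counts in the example.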
Using Natural Language Processing to Build Graphical Abstracts to be used in Studies Selection Activity in Secondary Studies
Abstract: Context: Secondary studies, such as Systematic Literature Reviews (SLRs) and Systematic Mappings (SMs), have been providing methodological and structured processes to identify and select research evidence in Computer Science, especially in Software Engineering (SE). One of the main activities of a secondary study process is to read the abstracts to decide on including or excluding studies. This activity is considered costly and time-consuming. In order to speed up the selection activity, some alternatives, such as structured abstracts and graphical abstracts (e.g., Concept Maps - CMs), have been proposed.
Objective: This study presents an approach to automatically build CMs using Natural Language Processing (NLP) to support the selection activity of secondary studies.
Method: First, we proposed an approach composed of two pipelines: (1) perform the triple extraction of concept-relation-concept based on NLP; and (2) attach the extracted triples to a structure used as a template for scientific studies. Second, we evaluated both pipelines by conducting experiments.
Results: The preliminary evaluation revealed that the extracted CMs are coherent when compared with their source text.
Conclusions: NLP can assist the automatic construction of CMs. In addition, the experiment results show that the approach can be useful to support researchers in the selection activity of secondary studies.
An Approach for Ranking Feature-based Clustering Methods and its Application in Multi-System Infrastructure Monitoring
Abstract: Companies need to collect and analyze time series data to continuously monitor the behavior of software systems during operation. However, gaining insights into common data patterns in time series is challenging, in particular, when analyzing data concerning different properties and from multiple systems. Surprisingly, clustering approaches have hardly been studied in the context of monitoring data, despite their possible benefits. In this paper, we present a feature-based approach to identify clusters in unlabeled infrastructure monitoring data collected from multiple independent software systems. We introduce time series properties which are grouped into feature sets and combine them with various unsupervised machine learning models to find the methods best suited for our clustering goal. We thoroughly evaluate our approach using two large-scale, industrial monitoring datasets. Finally, we apply one of the top-ranked methods to thousands of time series from hundreds of software systems, thereby showing the usefulness of our approach.
StalkCD: A Model-Driven Framework for Interoperability and Analysis of CI/CD Pipelines
Abstract: Today, most Continuous Integration and Delivery (CI/CD) solutions use infrastructure as code to describe the pipeline-based build and deployment process. Each solution uses its own format to describe the CI/CD pipeline, which hinders the interoperability and the analysis of CI/CD pipelines. In this paper, we propose a model-driven framework for tool-agnostic CI/CD pipeline definition and analysis. It comprises (i) the analysis of the meta-model of the Jenkins pipeline definition language, (ii) the StalkCD domain-specific language providing a base for interoperability and transformation between different formats, and (iii) an extensible set of bidirectional transformations between tool-specific CI/CD definitions and analysis tools. We demonstrate the specific support for Jenkins as a CI/CD tool and BPMN for exploiting analyses from the workflow domain and visualizing the results. We evaluate the DSL and the transformations empirically based on more than 1,000 publicly available Jenkinsfiles. The evaluation shows that our framework supports 70% of these files without information loss.
Parameter tuning for a Markov-based multi-sensor system
Abstract: Multi-sensor systems are the key components of automated driving functions. They enhance the quality of the driving experience and assist in preventing traffic accidents. Due to the rapid evolution of sensor technologies, sensor data collection errors occur rarely. Nonetheless, according to Safety Of The Intended Functionality (SOTIF), an erroneous interpretation of the sensor data can also cause safety hazards. For example, the front camera may not understand the meaning of a traffic sign. Due to safety concerns, it is essential to analyze the system reliability throughout the whole development process. In this work, we present an approach to quantitatively explore the sensors' features, such as the dependencies between successive sensor detection errors and the correlation between different sensors, on the KITTI dataset. In addition, we apply the learned parameters to a proven multi-sensor system model, which is based on Discrete-time Markov chains, to estimate the reliability of a hypothetical Stereo camera-LiDAR based sensor system.
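The dependency between successive sensor detection errors mentioned in this abstract can be illustrated with a two-state discrete-time Markov chain whose transition probabilities are estimated by counting. This is only a sketch under assumptions: the binary encoding (0 = correct detection, 1 = detection error), the function name and the sample sequence are hypothetical, and the paper's actual model and parameters are not described in the abstract.

```python
def estimate_transition_matrix(errors):
    """Estimate P[i][j] = P(next state = j | current state = i) by counting.

    `errors` is a sequence of 0 (correct detection) and 1 (detection error).
    """
    counts = [[0, 0], [0, 0]]
    # Count every pair of successive observations.
    for previous, current in zip(errors, errors[1:]):
        counts[previous][current] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # A state that was never left gets an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


if __name__ == "__main__":
    # Hypothetical per-frame error flags for one sensor.
    sequence = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]
    for state, row in enumerate(estimate_transition_matrix(sequence)):
        print(state, [round(p, 3) for p in row])
```

If the probability of an error after an error is clearly higher than after a correct detection, successive errors are dependent, which is the kind of parameter such a chain would need to capture.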
An investigation on the availability of contribution information in Open Source Projects
Abstract: Open Source projects commonly receive new feature requests from different types of users. However, the submission of new feature requests and the processes adopted for handling them are not always clear.
In this work, we investigate the availability of contribution information, in particular regarding new feature requests, in 66 of the 100 most-starred GitHub projects. We examined the contribution guidelines and other documentation from those 66 projects. We particularly searched for whether the projects openly welcomed new contributions, such as feature requests.
Our findings show that even the most-starred GitHub projects often do not report information on how to contribute and, in particular, on how new feature requests are managed.
Establishing a Search String to Detect Secondary Studies in Software Engineering
Abstract: Context: Before conducting a secondary study, a tertiary study can be performed to identify related reviews on the topic of interest and avoid rework. However, the elaboration of an appropriate and effective search string to detect secondary studies is challenging for SE researchers. Objective: The main goal of this study is to propose a suitable search string to detect secondary studies in Software Engineering (SE), addressing issues such as quantity of applied terms, relevance and recall. Method: We analyzed seven tertiary studies under two perspectives: (1) structure - the strings' terms used to detect secondary studies; and (2) field - where to search: titles alone, abstracts alone, or titles and abstracts together, among others. We validated our string in a two-step process. First, we evaluated its capability to retrieve secondary studies over a set of 1537 secondary studies included in 24 tertiary studies in SE. Second, we evaluated its general capacity to retrieve secondary studies in an automated search using the Scopus digital library. Results: Our string retrieved over 90% of the included secondary studies (recall), with a high general precision of almost 60%. Conclusion: The suitable search string for finding secondary studies in SE contains the terms "systematic review", "literature review", "systematic mapping", "mapping study" and "systematic map".
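As an illustration of how the five terms from this conclusion could be combined, the sketch below joins them with OR and wraps them in a field restriction. The choice of the Scopus TITLE-ABS-KEY field code and the helper names are assumptions; the abstract does not give the exact string or the field the authors recommend.

```python
# The five terms reported in the abstract.
TERMS = [
    "systematic review",
    "literature review",
    "systematic mapping",
    "mapping study",
    "systematic map",
]


def build_query(terms, field="TITLE-ABS-KEY"):
    """Join quoted terms with OR inside one field restriction (assumed field)."""
    disjunction = " OR ".join(f'"{term}"' for term in terms)
    return f"{field}({disjunction})"


def looks_like_secondary_study(text, terms=TERMS):
    """Local approximation of the search: case-insensitive phrase match."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


if __name__ == "__main__":
    print(build_query(TERMS))
    print(looks_like_secondary_study(
        "A Systematic Mapping Study on Edge Computing Approaches"))
```

The local matcher is only a rough stand-in for a digital library search, but it is enough to measure recall and precision of a candidate string against a labeled set of titles.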
NLP4IP: A Natural Language Processing-based Recommendation Approach for Issues Prioritization
Abstract: In this paper, we propose NLP4IP, a recommendation approach based on natural language processing for prioritizing issues (e.g., a story, a bug, or a task). The proposed semi-automatic approach takes into account the priority and story points attributes of existing issues defined by the project stakeholders and devises a recommendation model capable of dynamically predicting the rank of newly added or modified issues. NLP4IP was evaluated on 19 projects from 6 repositories employing the JIRA issue tracking software with a total of 29,698 issues. The results of the study showed an average top@3 accuracy of 81% and mean squared error of 2.2 when evaluated on the validation set. The applicability of the proposed approach is demonstrated in the form of a JIRA plug-in illustrating predictions made by the newly developed machine learning model. The dataset has also been made publicly available in order to support other researchers working in this domain.
Ontology-Based Software Graphs for Supporting Code Comprehension During Onboarding
Abstract: Software engineers in modern development settings often face the challenge of contributing to large existing projects. The comprehension of foreign software code presents a time consuming obstacle, especially in contexts like onboarding. New employees have little knowledge of the software project they are supposed to contribute to. Therefore, tools supporting developers with their code comprehension are desirable to help them contribute to the best of their ability as soon as possible. Such tools must be flexible enough to work with any software project, while offering means for adjustments to very specific tasks.
In this paper, we present an approach to visualize source code as node-link diagrams, using expert-designed ontologies to group various source code elements such as classes or methods. We evaluate our approach with nine advanced computer science students simulating an onboarding in a software project consisting of almost 100k lines of code. The results show that our approach supports code comprehension by utilizing expert knowledge of the visualized project, while also pointing to other use cases such as legacy code migration.
Technical Debt Impacting Lead-Times: An Exploratory Study
Abstract: Background: Technical Debt is a consolidated notion in software engineering research and practice. However, the estimation of its impact (interest of the debt) is still imprecise and requires heavy empirical and experimental inquiry.
Objective: We aim at developing a data-driven approach to calculate the interest of Technical Debt in terms of delays in resolving affected tasks.
Method: We conducted a case study to estimate the Technical Debt interest by analyzing its association with the lead time variation of resolving related Jira issues.
Results: Data-driven approaches could significantly change Technical Debt estimation and improve the prioritization of Technical Debt removal. Our case study shows that the presence of Code Technical Debt did not affect the lead time for resolving the issues.
Conclusion: Future work includes the further refinement of this approach and its application to a larger data set and to different types of issues.
A Systematic Mapping of Negative Effects of Gamification in Education/Learning Systems
Abstract: While most research shows positive effects of gamification, the focus on its adverse effects is considerably smaller. Having this in mind, we conducted a systematic mapping study of the negative effects of game design elements on education/learning systems. The study revealed 77 papers reporting undesired effects of game design elements. We found that badges, leaderboards, and points are the game design elements most often reported as causing negative effects. The most cited negative effects concern the lack of effect, lack of engagement, and worsened performance. Motivational and ethical issues, such as cheating, are also often reported. As part of our results, we map the relations between game design elements and the negative effects that they may cause. The results of our mapping study can help gamification designers to make more informed decisions when selecting game design elements to be included in education/learning systems, raising awareness on potential negative effects.
It takes a Flywheel to Fly: Kickstarting and Growing the A/B testing Momentum at Scale
Abstract: Companies run A/B tests to accelerate innovation and make informed data-driven decisions. At Microsoft alone, over twenty thousand A/B tests are run each year, helping decide which features maximize user value. Not all teams and companies succeed in establishing and growing their A/B testing programs. In this paper, we explore multiple case studies at Microsoft, Outreach, and Booking.com, together with collected empirical data, and share our learnings for iteratively adopting and growing A/B testing. The main contribution of this paper is the A/B Testing Flywheel. This conceptual model illustrates iteratively navigating the value-investment cycle with the goal to scale A/B testing. In every turn of the flywheel, teams need to invest in order to increase the A/B testing momentum. We describe the investments in software development processes that have been advantageous in getting the flywheel to turn. We also share example metrics that track the progress towards sustainable A/B testing momentum.
The Shift in Remote Working: A Comparative Study in Stack Overflow
Abstract: As the industry is moving towards digitalized solutions and practices, a shift from traditional to remote working has been observed, with companies embracing flexibility for their workforce. Global crises, such as the coronavirus pandemic, have also accelerated this process, transforming the labor market. This trend is reflected in job portals, which contain an increasing number of remote job advertisements. Recognizing this growing change, we perform a thorough study in Stack Overflow, to examine the main characteristics of remote working that distinguish it from its on-site counterpart. By collecting and analyzing 8514 job posts and leveraging text mining and graph theory methodologies, we attempt to pinpoint the primary elements that define each category, from dominant technologies to job positions and top seeking industries. The findings suggest that remote working is indeed steadily gaining ground, being mainly associated with the Software Development sector and with well-known software construction and data analytics technologies.
A Systematic Study as Foundation for a Variability Modeling Body of Knowledge
Abstract: In software product line engineering, engineers and researchers use variability models to explicitly represent commonalities and variability of software systems to foster systematic reuse. Variability modeling has been a field of extensive research for over three decades, including Systematic Literature Reviews (SLRs) and Systematic Mapping Studies (SMSs) to categorize and compare different approaches. Much effort goes into such (secondary) studies, partly because they are often done from scratch and searching for relevant studies for specific research questions is tedious. Systematic reuse of search results would benefit the community by improving the efficiency and quality of such studies. In this paper, we report on creating a curated data set of 78 key SLR/SMS publications and primary studies (e.g., surveys) on variability modeling by conducting a tertiary SMS on variability modeling. When using such a curated paper data set for a secondary study, we estimate researchers can save up to 50 percent of the effort in the search phase. We present the publicly available data set, which includes categorization of the studies and provides update mechanisms. We see our data set as a foundation for building a Variability Modeling Body of Knowledge (VMBoK). We illustrate the efficient use of the data set in two SLR examples. We argue that our process and the data set can be useful for various research communities to improve the efficiency and quality of secondary (and tertiary) studies.
Requirements Engineering for Machine Learning: A Systematic Mapping Study
Abstract: Machine learning (ML) has become a core feature for today’s real-world applications, making it a trending topic for the software engineering community. Requirements Engineering (RE) is no stranger to this and its main conferences have included workshops aiming at discussing RE in the context of ML. However, current research on the intersection between RE and ML mainly focuses on using ML techniques to support RE activities rather than on exploring how RE can improve the development of ML-based systems. This paper concerns a systematic mapping study aiming at characterizing the publication landscape of RE for ML-based systems, outlining research contributions and contemporary gaps for future research. In total, we identified 35 studies that met our inclusion criteria. We found several different types of contributions, in the form of analyses, approaches, checklists and guidelines, quality models, and taxonomies. We discuss gaps by mapping these contributions against the RE topics to which they were contributing and their type of empirical evaluation. We also identified quality characteristics that are particularly relevant for the ML context (e.g., data quality, explainability, fairness, safety, and transparency). Main reported challenges are related to the lack of validated RE techniques, the fragmented and incomplete understanding of NFRs for ML, and difficulties in handling customer expectations. There is a need for future research on the topic to reveal best practices and to propose and investigate approaches that are suitable to be used in practice.
ICARUS - Incremental Design and Verification of Software Updates in Safety-Critical Product Lines
Abstract: The lifecycles of software updates for Cyber Physical Systems are significantly decreasing. Especially for safety-critical functions, these must be carefully tested for compatibility to target configurations. In order to formalize the requirements of the system and to validate software changes in a modular way, contract-based design can be used for formal verification. A contract is defined as a pair of an assumption describing the required conditions for the working environment of a component, and a guarantee, which specifies its expected behavior including timing properties and value ranges of interfaces. In this work, we present a concept for efficient verification of a software update in a contract-based development environment with consideration of several system variants. The concept is based on an incremental refinement verification methodology which uses deltas, i.e. differences between variants, to automatically propagate changes and retest only the incrementally relevant contracts. By applying the methodology in a case study for a network representing a variable Adaptive Cruise Control system, we could demonstrate its applicability and its advantages in reducing the total verification effort for product line evolution.
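The assumption/guarantee pair described in this abstract can be sketched with contracts over simple value ranges, using the usual refinement rule that an updated component may assume no more and must guarantee no less than the contract it replaces. This is a deliberately simplified illustration: the class names, the single-interval encoding and the sample ranges are hypothetical, and the timing properties and delta propagation of the actual methodology are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    low: float
    high: float

    def contains(self, other: "Interval") -> bool:
        """True if `other` lies entirely within this interval."""
        return self.low <= other.low and other.high <= self.high


@dataclass(frozen=True)
class Contract:
    assumption: Interval  # input range the component requires from its environment
    guarantee: Interval   # output range the component promises in return


def refines(new: Contract, old: Contract) -> bool:
    """Check whether `new` can replace `old`.

    The new assumption must accept every input the old one accepted, and the
    new guarantee must stay within the old guarantee.
    """
    return (new.assumption.contains(old.assumption)
            and old.guarantee.contains(new.guarantee))


if __name__ == "__main__":
    # Hypothetical contracts for a component before and after an update.
    before = Contract(assumption=Interval(0, 100), guarantee=Interval(0, 10))
    after = Contract(assumption=Interval(0, 120), guarantee=Interval(2, 8))
    print(refines(after, before))
```

In an incremental setting, only contracts whose assumption or guarantee changed between variants would need to be rechecked with a test like `refines`.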
Assessing Coding Metrics for Parallel Programming of Stream Processing Programs on Multi-cores
Abstract: From the popularization of multi-core architectures, several parallel APIs (Application Programming Interfaces) have emerged, helping to abstract the programming complexity and increasing productivity in application development. Unfortunately, only a few research efforts in this direction managed to show the usability pay-back of the programming abstraction created, because conducting empirical software engineering is not easy and poses many challenges. We believe that coding metrics commonly used in software engineering code measurements can give useful indicators on the programming effort of parallel applications and APIs. These metrics were designed for general purposes, without considering the evaluation of applications from a specific domain. In this study, we aim to evaluate the feasibility of seven coding metrics to be used in the parallel programming domain. To do so, five stream processing applications implemented with different parallel APIs for multi-cores were considered. Our experiments have shown COCOMO II to be a promising model for evaluating the productivity of different parallel APIs for stream processing applications on multi-cores, while other metrics are restricted to the code size.
Reducing incidents in microservices by repaying Architectural Technical Debt
Abstract: Introduction: Architectural technical debt (ATD) may create a substantial extra effort in software development, which is called interest. There is little evidence about whether repaying ATD in microservices reduces such interest.
Objectives: We wanted to conduct a first study investigating the effect of removing ATD on the occurrence of incidents in a microservices architecture.
Method: We conducted a quantitative and qualitative case study of a project with approximately 1000 microservices in a large, international financing services company. We measured and compared the number of software incidents of different categories before and after repaying ATD.
Results: The total number of incidents was reduced by 84%, and the numbers of critical- and high-priority incidents were both reduced by approximately 90% after the architectural refactoring. The number of incidents in the architecture with the ATD was mainly constant over time, but we observed a slight increase of low priority incidents related to inaccessibility and the environment in the architecture without the ATD.
Conclusion: This study shows evidence that refactoring ATDs, such as lack of communication standards, poor management of dead-letter queues, and the use of inadequate technologies in microservices, reduces the number of critical- and high-priority incidents and, thus, part of its interest, although some low priority incidents may increase.
A Systematic Mapping Study on Edge Computing Approaches for Maritime Applications
Abstract: Background: The edge computing paradigm reduces latency and response time of applications by bringing computations and data storage closer to the locations where they are needed. Edge computing is used in different kinds of Internet of Things (IoT) applications. Maritime represents an important application domain for IoT applications and edge computing solutions. Modern vessels employ many different types of sensors, which produce a massive amount of data. Edge computing allows computations and data analyses to be performed on board a vessel or at the edge of the network.
Objective: To present a comprehensive, unbiased overview of the state-of-the-art on edge computing approaches for maritime applications.
Method: A Systematic Mapping Study (SMS) of the existing edge computing approaches for maritime applications.
Results: A taxonomy of 17 papers on edge computing approaches for maritime applications.
Conclusion: The results of the study show that there is a small number of existing edge computing approaches for maritime applications. Most of the existing approaches focus mainly on monitoring and communication functions in vessels. Moreover, several research gaps exist with respect to the types of edge computing approaches, the purposes of using edge computing on vessels, and the data analysis techniques used for edge computing on vessels.
Automated Support for Searching and Selecting Evidence in Software Engineering: A Cross-domain Systematic Mapping
Abstract: Context: Searching and selecting relevant primary evidence is crucial to answering secondary studies' research question(s). The activities of search and selection of studies are labor-intensive, time-consuming and demand automation support.
Objective: Our goal is to identify, classify and evaluate the state of the art on automation support for searching and selecting evidence for secondary studies in SE.
Method: We performed a systematic mapping on existing automation support for the activities of search and selection of evidence for secondary studies in SE, expanding our investigation in a cross-domain study addressing advancements from the medicine field.
Results: Our results show that the SE field has a variety of tools and text classification (TC) approaches to automate search and selection activities. However, medicine has more well-established tools with a larger adoption than SE. Cross-validation and case studies are the most adopted approaches to assess TC approaches. Furthermore, recall, precision and F-measure are the most adopted metrics.
Conclusion: Automated approaches for searching and selecting studies in SE have not been applied in practice by researchers. Integrated and easy-to-use automated approaches addressing consolidated TC techniques can bring relevant advantages on workload and time saving for SE researchers who conduct secondary studies.
Size matters? Or not: A/B testing with limited sample in automotive embedded software
Abstract: A/B testing is gaining attention in the automotive sector as a promising tool to measure causal effects from software changes. Unlike web-facing businesses, where A/B testing is well established, the automotive domain often suffers from a limited number of eligible users to participate in online experiments. To address this shortcoming, we present a method for designing balanced control and treatment groups so that sound conclusions can be drawn from experiments with considerably small sample sizes. While the Balance Match Weighted method has been used in other domains such as medicine, this is the first paper to apply and evaluate it in the context of software development. Furthermore, we describe the Balance Match Weighted method in detail and we conduct a case study together with an automotive manufacturer to apply the group design method in a fleet of vehicles. Finally, we present our case study in the automotive software engineering domain, as well as a discussion on the benefits and limitations of the A/B group design method.
A Method for Modeling Data Anomalies in Practice
Abstract: As technology has allowed us to collect large amounts of industrial data, it has become critical to analyze and understand the data collected. In particular, anomaly analysis makes it possible to detect, analyze and understand anomalous or unusual data patterns. This is an important activity to understand, for example, deviations in service which may indicate potential problems, or differing customer behavior which may reveal new business opportunities. Much previous work has focused on anomaly detection, in particular using machine learning. Such approaches allow clustering data patterns by common attributes, and although useful, clusters often do not correspond to the root causes of anomalies, meaning that more manual analysis is needed. In this paper, we report on a design science study with two different teams in a partner company which focuses on modeling and understanding the attributes and root causes of data anomalies. After iteration, for each team, we have created general and anomaly-specific UML class diagrams and goal models to capture anomaly details. We use our experiences to create an example taxonomy, classifying anomalies by their root causes, and to create a general method for modeling and understanding data anomalies. This work paves the way for a better understanding of anomalies and their root causes, leading towards creating a training set which may be used for machine
Self-adaptive K8S Cloud Controller for Time-sensitive Applications
Abstract: The paper presents a self-adaptive Kubernetes (K8S) cloud controller for scheduling time-sensitive applications.
The controller allows services to specify timing requirements (response time or throughput) and schedules services on shared cloud resources so as to meet the requirements.
The controller builds and continuously updates an internal performance model of each service and uses it to determine the kind of resources needed by a service, as well as predict potential contention on shared resources, and (re-)deploys services accordingly.
The controller is integrated with our highly-customizable data processing and visualization platform IVIS, which provides a web-based front-end for service deployment and visualization of results.
The controller implementation is open-source and is intended to provide an easy-to-use testbed for experiments focusing on various aspects of adaptive scheduling and deployment in the cloud.
AF-DNDF: Asynchronous Federated Learning of Deep Neural Decision Forests
Abstract: In recent years, with more edge devices being put into use, the amount of data that is created, transmitted and stored is increasing exponentially. Moreover, due to the development of machine learning algorithms, modern software-intensive systems are able to take advantage of the data to further improve their service quality. However, it is expensive and inefficient to transmit large amounts of data to a central location for the purpose of training and deploying machine learning models. Data transfer from edge devices across the globe to central locations may also raise privacy concerns and concerns related to local data regulations. As a distributed learning approach, Federated Learning has been introduced to tackle those challenges. During the training procedure, as Federated Learning will only exchange locally trained machine learning models instead of the whole data set, the technique can not only preserve user data privacy but also enhance model training efficiency. In this paper, we investigate an advanced machine learning algorithm, Deep Neural Decision Forests (DNDF), which unites classification trees with the representation learning functionality from deep convolutional neural networks. We propose a novel algorithm, AF-DNDF, which extends DNDF with an asynchronous federated aggregation protocol. Based on the local quality of each classification tree, our architecture can select and combine the optimal groups of decision trees from multiple local devices. The introduction of the asynchronous protocol enables the algorithm to be deployed in the industrial context with heterogeneous hardware settings. Our AF-DNDF architecture is validated in an automotive industrial use case focusing on road object recognition and demonstrated by an empirical experiment with two different data sets.
The experimental results show that our AF-DNDF algorithm significantly reduces the communication overhead and accelerates model training speed without sacrificing model classification performance. The algorithm can not only reach the same classification accuracy as the commonly used centralized machine learning methods but also greatly improve local edge model quality.
Reliability in Software-intensive Systems: Challenges, Solutions, and Future Perspectives
Abstract: Large software-intensive systems have emerged in several application domains, such as healthcare, transportation, smart environments, and Industry 4.0. Sometimes referred to as Systems-of-Systems (SoS) and resulting from the integration of various heterogeneous and independent software systems, such large systems exhibit high dynamism and often operate in critical and uncertain environments. Hence, the reliability of these systems must be a concern, but traditional reliability approaches fail to cope with such high dynamism and uncertainty. Thus, new solutions are required to ensure SoS reliability. Moreover, there is a lack of studies that investigate SoS reliability. This paper presents the state of the art on the way that the reliability of SoS has been addressed. After investigating the literature, we selected 27 studies to perform a detailed analysis regarding factors that affect SoS reliability and approaches and metrics to improve it. We found an area still gaining maturity with researchers working in isolation and mainly developing solutions for domain-specific problems. There are still various critical open issues, while SoS have been increasingly adopted as a suitable and integrated solution in critical domains.
Success Factors when Transitioning to Continuous Deployment in Software-Intensive Embedded Systems
Abstract: Continuous Deployment is the practice of deploying software more frequently to customers and learning from their usage. The aim is to introduce new functionality and features in an additive way to customers as soon as possible. While Continuous Deployment is becoming popular among web and cloud-based software development organizations, the adoption of continuous deployment within the software-intensive embedded systems industry is still limited.
In this paper, we conducted a case study at a multinational telecommunications company focusing on the Third Generation Radio Access Network (3G RAN) embedded software. The organization has transitioned to Continuous Deployment, where the software's deployment cycle has been reduced to 4 weeks from 24 weeks. The objective of this paper is to identify what success means when transitioning to continuous deployment and the success factors that companies need to attend to when transitioning to continuous deployment in large-scale embedded software.
On the impact of Performance Antipatterns in multi-objective software model refactoring optimization
Abstract: Software quality estimation is a challenging and time-consuming activity, and models are crucial to face the complexity of such activity on modern software applications.
One main challenge is that the improvement of distinct quality attributes may require contrasting refactoring actions on an application, such as the trade-off between performance and reliability. In such cases, multi-objective optimization can provide the designer with a wider view of these trade-offs and, consequently, can help identify suitable actions that take into account independent or even competing objectives.
In this paper, we present an approach that exploits the NSGA-II multi-objective evolutionary algorithm to search for optimal Pareto solution frontiers for software refactoring while considering as objectives: i) performance variation, ii) reliability, iii) amount of performance antipatterns, and iv) architectural distance. The algorithm combines randomly generated refactoring actions into solutions (i.e., sequences of actions) and compares them according to the objectives.
We have applied our approach to a train ticket booking service case study, and we have focused the analysis on the impact of performance antipatterns on the quality of solutions. Indeed, we observe that the approach finds better solutions when antipatterns enter the multi-objective optimization. In particular, the performance antipatterns objective leads to solutions improving the performance by up to 15% with respect to the case where antipatterns are not considered, without sacrificing the solution quality on other objectives.
Towards a Taxonomy of Bug Tracking Process Smells: A Quantitative Analysis
Abstract: Bug tracking is the process of monitoring and reporting malfunctions or issues found in software. While there is no consensus on a formally specified bug tracking process, certain rules and best practices for an optimal bug tracking process are accepted by many companies and open-source software (OSS) projects. The primary aim of all these rules and practices is to perform a more efficient bug tracking procedure, despite slight variations between different platforms. Practitioners' non-compliance with the best practices not only impedes the benefits of the bug tracking process but also negatively affects the other phases of the software development life cycle.
In this study, based on the results of a multivocal literature review, we analyzed 60 sources in academic and gray literature and propose a taxonomy of 12 bad practices in the bug tracking process, that is, bug tracking process smells. To quantitatively analyze these process smells, we inspect bug reports collected from six projects. Four of these projects are Jira-based (MongoDB Core Server, Evergreen, Confluence Server & Data Center, Jira Server & Data Center) and the other two are Bugzilla-based (GCC and Wireshark). We observed that a considerable number of bug tracking process smells exist in all projects with varying ratios.
Aspect-Oriented Adaptation of Access Control Rules
Abstract: Cyber-physical systems (CPS) and IoT systems are nowadays commonly designed as self-adaptive, endowing them with the ability to dynamically reconfigure to reflect their changing environment. This adaptation also concerns security, as one of the most important properties of these systems. Though the state of the art on adaptivity in terms of security related to these systems can often deal well with fully anticipated situations in the environment, it becomes a challenge to deal with situations that are not or only partially anticipated. This uncertainty is however omnipresent in these systems due to humans in the loop, open-endedness and only partial understanding of the processes happening in the environment.
In this paper, we partially address this challenge by featuring an approach for tackling access control in the face of partially unanticipated situations. We base our solution on a special kind of aspect that builds on an existing access control system and creates a second level of adaptation that addresses the partially unanticipated situations by modifying access control rules.
The approach is based on our previous work where we have analyzed and classified uncertainty in security and trust in such systems and have outlined the idea of access-control related situational patterns. The aspects that we present in this paper serve as a means for application-specific specialization of the situational patterns. We showcase our approach on a simplified but real-life example in the domain of Industry 4.0 that comes from one of our industrial projects.
From Setting Up Innovation in a Novel Context To Discovering Sustainable Business — A Framework for Short-Term Events
Abstract: Short-term events, like hackathons, are commonly applied to innovate on novel subject areas and on groundbreaking technologies. While such events have led to significant successful outcomes for many companies, the big questions facing organisers of these events are: "where do we start?, what should be done towards the hosting of a successful event?, and what should be done when the event is completed?" While several methods and supporting recommendations have been introduced for answering these questions, a common framework for the hosting and management of short-term events is still not available. This is a limitation, as having such a framework would allow organisers to carry out systematic analysis for the subject area, identify the most suitable pre-event analysis and support functions, combine these with the fitting and justified short-term event types, and carry the results of the event to fruitful, sustainable businesses with post-event operations. The opportunity to progress events into business ventures is particularly noteworthy given the rapid changes that are typical in the technology domain, and thus, the need to be both innovative and nimble in responding to such changes and ensuing opportunities. To fill this gap, we introduce an evidence-driven framework for hosting short-term events. Our framework presents several chronological modules coupled together to form a comprehensive, but agile set of practices. A review of the state-of-the-art and a partial empirical trial support the framework’s utility, notwithstanding the need for further work to enhance and trial the framework for organising other events.
Automated quality assessment of interrelated modeling artifacts
Abstract: Over the last decade, several repositories have been proposed by the MDE community to enable the reuse of modeling artifacts and foster empirical studies to analyze specifications and tools made available by MDE researchers and practitioners. In this respect, different approaches have been proposed to measure the quality of, e.g., models, metamodels, and transformations, with respect to characteristics defined by quality models. However, when a modeling ecosystem is available, measuring the constituent artifacts individually might not be enough. This paper proposes a quality assessment approach, which considers the relationships among the artifacts under analysis as part of the quality measurement process. For instance, to assess the quality of model transformations, beyond measuring their structural characteristics, users might be interested in quality aspects like coverage and information loss related to the dependent metamodels and the way models are consumed by transformations, respectively. The proposed approach is based on weaving models, which make it possible to link quality definitions of different kinds of artifacts, and it can generate EOL programs by means of a model-to-code transformation to perform the specified quality assessment process.
Understanding the Impact of Edge Cases from Occluded Pedestrians for ML Systems
Abstract: Machine Learning (ML)-enabled approaches are considered to substantially support the detection of obstacles and traffic participants around a self-driving vehicle. Major breakthroughs have been demonstrated -- even covering the complete end-to-end data processing chain from perception, planning, to the inputs for the control algorithms for steering, braking, and accelerating a vehicle. State-of-the-art networks like YOLO (you-only-look-once) provide bounding boxes around detected objects including a classification and a confidence level that can be used for trajectory planning algorithms for example. The typical training for such neural networks (NN) is based on high-quality annotations to provide the ground truth data and such annotations typically cover the complete perimeter of an object. However, if traffic participants like pedestrians or bicyclists are partially occluded because they are carrying objects for example, ML-enabled systems trained on full perimeter information may be challenged and the reported confidence levels might drop or relevant objects may be missed entirely in the worst case. In this paper, we investigate the impact of systematically challenging a NN by feeding edge cases caused by partial occlusions to pedestrians in video frames from the KITTI dataset, for example making pedestrians carry paper boxes. To systematically study the effects on the performance of the NN, we firstly trained three variants on the unmodified KITTI data: (a) using the full annotations of the complete body from a pedestrian, (b) using only its upper half annotation, and (c) using only its lower half annotation. Next, we manipulated the KITTI images by partially occluding a pedestrian's body with overlay images such as of paper boxes or walls. Then, we let the three NN predict what they still detect in such frames and compared their results with the original manual labels as well as their respective performance with each other. Our findings show that the two NN using only partial information perform similarly well to the NN for the full body when the full body NN's performance is 0.75 or better. Furthermore and as expected, the network which is only trained on the lower half body is least prone to disturbances from occlusions of the upper half and vice versa. Hence, we conclude that the training of such ML-enabled systems should firstly also be adjusted to be more robust to such disturbance effects in practice, secondly, an ML-enabled system may contain a cascade of multiple NN that are all trained differently to consult in cases of low confidence of the primary NN, and thirdly, separately trained NN also contribute to the explainability of NN to get indications why there is a drop in the detection
What Added Value Does a Scrum Master Bring to the Organisation? — A Case Study at Nordea
Abstract: With an increasing number of companies choosing to implement Scrum, the role of the Scrum Master seems to change from how it is described in the Scrum Guide. Companies have challenges in hiring skilled Scrum Masters, and seeing the value of this role after the adoption phase is over. This paper aspires to understand the value of this role by investigating the role in our case company, Nordea, while studying their experiment of connecting one Scrum Master to multiple teams.
We collected data through 11 semi-structured interviews and six observation sessions. Our findings show that the real value of a Scrum Master is based on their ability to understand people. They contribute to team dynamics and well-being, help the team to connect and coordinate, and challenge the team to higher performance. Finally, connecting an experienced Scrum Master to two teams can lead to improved knowledge-sharing, cooperation, and alignment, but increasing the number of teams beyond that may risk the value this role provides.
Predicting Software Defect Severity Level using Sentence Embedding and Ensemble Learning
Abstract: Bug tracking is one of the prominent activities during the maintenance phase of software development. The severity of a bug acts as a key indicator of its criticality and impact when planning the evolution and maintenance of various types of software products. This indicator measures how negatively the bug may affect the system functionality. This helps in determining how quickly the development team needs to address the bug for successful execution of the software system. Due to the large number of bugs reported every day, developers find it really difficult to assign the severity level to bugs accurately. Assigning an incorrect severity level delays the bug resolution process. Thus, automated systems have been developed that assign a severity level using various machine learning techniques. In this work, five different types of sentence embedding techniques have been applied to bug descriptions to convert the description comments to an n-dimensional vector. These computed vectors are used as input to the software defect severity level prediction models, and ensemble techniques like Bagging, Random Forest classifier, Extra Trees classifier, AdaBoost, and Gradient Boosting have been used to train these models. We have also considered different variants of the Synthetic Minority Oversampling Technique (SMOTE) to handle the class imbalance problem, as the considered datasets are not evenly distributed. The experimental results on six projects highlight that the usage of sentence embedding, ensemble techniques, and different variants of SMOTE helps in improving the predictive ability of defect severity level prediction models.
programming languages, used for a variety of purposes.
third-party components to acquire various functionalities. In this paper we isolate popular reused components and explore the type of functionality that is mostly being reused. Additionally, we examine whether the client applications adopt the most recent versions of the reused components, and further study the reuse intensity of pairs of components that coexist in client applications. For this purpose, we performed a case study on 9389 components reused by 430
that Compiler and Testing Units are the most common types of functionality being reused, while the majority of client applications tend to adopt the recent versions of the reused
Software Ecosystems Governance – An Analysis of SAP and GNOME Platforms
Abstract: Software ecosystems play an important strategic role in the IT industry. They involve the interaction of a set of actors on a common technological platform that results in a number of software solutions or services. Software ecosystems face intense competition and scrutiny from stakeholders and society as a whole. Therefore, they need to be aligned with principles of ethics, accountability and transparency. The governance of software ecosystems presents strategic procedures and processes to control, maintain, and evolve the platform. This article presents the application of a conceptual model for software ecosystem governance in the context of the GNOME and SAP platforms. The case study results suggest that governance of these platforms involves the need to manage software licenses, to define the type of collaboration in relation to governance (shared or closed), and to address social, technical and business risks in software ecosystems.
A Systematic Mapping Study on the Use of Software Engineering Practices to Develop MVPs
Abstract: [Background] Many startup environments and even traditional software companies have embraced the use of MVPs (Minimum Viable Products) to quickly experiment with solution options. The MVP concept has influenced the way in which development teams apply Software Engineering (SE) practices. However, the overall understanding of this influence of MVPs on SE practices is still poor.
[Objective] Our goal is to characterize the publication landscape on practices that have been used in the context of software MVPs.
[Method] We conducted a systematic mapping study using a hybrid search strategy that consists of a database search and parallel forward and backward snowballing.
[Results] We identified 33 papers, published between 2013 and 2020. We observed some trends related to MVP ideation and evaluation practices. For instance, regarding ideation, we found six different approaches (e.g., Design Thinking, Lean Inception) and mainly informal end-user involvement practices (e.g., workshops, interviews). For evaluation there is an emphasis on end-user validations based on practices such as usability tests, A/B testing, and usage data analysis. However, there is still limited research related to MVP technical feasibility assessment and effort estimation. We also observed a lack of scientific rigor in many of the identified studies.
[Conclusion] Our analysis suggests that there are opportunities for solution proposals to address gaps concerning technical feasibility assessment and effort estimation. Also, more effort needs to be invested into empirically evaluating the existing MVP-related practices.
Assessing the Suitability of Semi-Supervised Learning Datasets using Item Response Theory
Abstract: In practice, supervised learning algorithms require fully labeled datasets to achieve the high accuracy demanded by modern applications. However, in industrial settings supervised learning algorithms can perform poorly because of few labeled instances. Semi-supervised learning (SSL) is an automatic labeling approach that utilizes complete labels to infer missing labels in partially complete datasets. The high number of available SSL algorithms and the lack of systematic comparison between them leave practitioners without guidelines to select the appropriate one for their application. Moreover, each SSL algorithm is often validated and evaluated on a small number of common datasets. However, there is no research that examines what datasets are suitable for comparing different SSL algorithms. The purpose of this paper is to empirically evaluate the suitability of the datasets commonly used to evaluate and compare different SSL algorithms. We performed a simulation study using twelve datasets of three different datatypes (numerical, text, image) on thirteen different SSL algorithms.
The contribution of this paper is two-fold. First, we propose the use of a Bayesian congeneric item response theory model to assess the suitability of commonly used datasets. Second, we compare the different SSL algorithms using these datasets. The results show that, with the exception of three datasets, the others have very low discrimination factors and are easily solved by the current algorithms. Additionally, the SSL algorithms perform similarly under a 90% credible interval.
The paper concludes by suggesting that researchers and practitioners should better consider the choice of datasets used for comparing SSL algorithms.
Toward a Technical Debt Relationship with the Pivoting of Growth Phase Startups
Abstract: Context: Pivoting enables software startups to turn an idea into a product, measure its effect, and learn from the results. Pivoting is bound to direct or indirect feedback from external customers and industry practitioners. However, internally to the startup, we have yet to discover how technical debt affects pivoting in growth phase startups and what technical debt patterns can be observed in different pivoting scenarios. Aim: Our goal is to evaluate how technical debt influences pivoting in growth phase startups. Methodology: We conducted a pilot study guided by semi-structured interviews from multiple software startup cases. Results: We identified three ways in which technical debt influences pivoting: direct, indirect, and no influence. Managing and avoiding technical debt significantly reduces the likelihood of technology pivoting and restrains indirect effects on other pivoting types. Contribution: Our study will allow practitioners to address the influence of technical debt on pivoting in growth-phase software startups. Future researchers can benefit from our findings by conducting exploratory studies and providing educated recommendations.
A Structured Analysis of the Video Degradation Effects on the Performance of a Machine Learning-enabled Pedestrian Detector
Abstract: Machine Learning (ML)-enabled software systems have been incorporated in many public demonstrations for automated driving (AD) systems. Such solutions have also been considered as a crucial approach to aim at SAE Level 5 systems, where the passengers in such vehicles do not have to interact with the system at all anymore. Already in 2016, Nvidia demonstrated a complete end-to-end approach for training the complete software stack covering perception, planning and decision making, and the actual vehicle control. While such approaches show the great potential of such ML-enabled systems, there have also been demonstrations where already changes to single pixels in a video frame can potentially lead to completely different decisions with dangerous consequences in the worst case. In this paper, a structured analysis has been conducted to explore video degradation effects on the performance of an ML-enabled pedestrian detector. Firstly, a baseline of applying "You only look once" (YOLO) to 1,026 frames with pedestrian annotations in the KITTI Vision Benchmark Suite has been established. Next, video degradation candidates for each of these frames were generated using the leading video compression codecs libx264, libx265, Nvidia HEVC, and AV1: 52 frames for the various compression presets for color frames, and 52 frames for gray-scale frames resulting in 104 degradation candidates per original KITTI frame and in 426,816 images in total. YOLO was applied to each image to compute the intersection-over-union (IoU) metric to compare the performance with the original baseline. While aggressively lossy compression settings result in significant performance drops as expected, it was also observed that some configurations actually result in slightly better IoU results compared to the baseline. Hence, while related work in literature demonstrated the potentially negative consequences of even simple modifications to video data when using ML-enabled systems, the findings from this work show that carefully chosen lossy video configurations preserve a decent performance of particular ML-enabled systems while allowing for substantial savings when storing or transmitting data. Such aspects are of crucial importance when, for example, video data needs to be collected from multiple vehicles wirelessly, where lossy video codecs are required to cope with bandwidth limitations.
Towards MLOps: A Framework and a Maturity model
Abstract: The adoption of continuous Software Engineering practices like DevOps in business operations has contributed to significantly shorter software development and deployment cycles. Recently, MLOps has received increasing interest as a practice that brings together data scientist teams and operations. However, the adoption of MLOps in practice is still in its early stages and there are few common guidelines for how to effectively integrate these practices into existing software development practices. In this paper, we conduct a Systematic Literature Review and a Grey Literature Review to better understand MLOps. Based on our literature reviews, we derive a framework that identifies the activities involved when adopting MLOps and the stages through which companies evolve as they gain maturity and become more advanced. We validate this framework in three software-intensive embedded systems companies and highlight how they have managed to adopt and integrate MLOps into their software development organizations. The contribution of this paper is three-fold. First, we review contemporary literature to provide a state-of-the-art overview of MLOps. Based on this overview, we derive a framework in which we detail the activities involved in the continuous development of machine learning models (MLOps). Second, we present a maturity model in which we outline the different steps companies take when evolving their MLOps practices. Third, we validate our framework in three embedded systems companies and we map the case companies to stages in the maturity model.
Technical Debt Prioritization Methods: A Systematic Mapping Study
Abstract: Technical debt is the metaphor for shortcuts in software development that bring short-term benefits but whose long-term consequences hinder the process of maintaining and developing software. It is important to manage these technical debt items, as not all of them need to be paid. Having a list of prioritized debts is an essential step in decision-making in the management process. This work aims at finding technical debt prioritization methods, that is, methods to identify whether and when a technical debt should be paid off, and at providing a classification of them. We performed a systematic mapping review to find and analyze the main papers of the area, covering the main bases. We selected 112 studies, resulting in 51 unique papers. We classified the methods in a two-level taxonomy containing 10 categories according to their different possible outcomes. In addition, we have identified three types of method results: boolean, category and/or ordered list. Finally, we have also identified practical technical characteristics and requirements for a method to prioritize technical debt items in real projects. Although several methods have been found in literature, none of them is adaptive to the context and language-independent, nor do they cover several technical debt types. Moreover, there is a clear lack of tools to use them. So, in conclusion, the research on technical debt prioritization is still wide open. Based on this study, a combination of the techniques used in these methods can be tested and automated to assist in the decision-making process on which debts should be paid.
Probabilistic Program Performance Analysis
Abstract: We introduce a tool-supported method for the formal analysis of timing, resource use, cost and other quality aspects of computer programs. The new method synthesises a Markov-chain model of the analysed code, computes this quantitative model's transition probabilities using information from program logs, and employs probabilistic model checking to evaluate the performance properties of interest. Importantly, the probabilistic model can be reused to accurately predict how the program performance would change if the code ran on a different hardware platform, used a new function library, or had a different usage profile. We show the effectiveness of our method by using it to analyse the performance of Java code from the Apache Commons Math library, the Android messaging app Telegram, and an implementation of the knapsack algorithm.
Migrating Monoliths to Microservices-based Customizable Multi-tenant Cloud-native Apps
Abstract: Software vendors have commonly sold licenses to their clients to use software products, e.g., Enterprise Resource Planning systems, which are deployed on the clients’ premises. Moreover, many clients, especially big organizations, often require software products to be customized for their specific needs before deployment on premises. While software vendors are trying to migrate their (monolithic) software products to become Cloud-native Software-as-a-Service (SaaS), they face two big challenges that this paper aims to address: 1) How to migrate their monoliths to multi-tenant Cloud-native SaaS; and 2) How to enable tenant-specific customization for multi-tenant Cloud-native SaaS. This paper suggests an approach for migrating monoliths to microservices-based Cloud-native SaaS, providing customers with flexible customization, while taking advantage of the economies of scale that the Cloud and multi-tenancy provide. Our approach shows not only the migration to microservices but also how to introduce the necessary infrastructure to support the new services and enable tenant-specific customization. We have demonstrated the application of our approach by migrating a Microsoft reference application called SportStore.
A Preliminary Evaluation of CPDP Approaches on Just-in-Time Software Defect Prediction
Abstract: CONTEXT: Just-in-Time (JIT) defect prediction aims to identify suspicious code commits that might introduce defects into a product. Building JIT defect prediction models requires a commit history and the corresponding records of fixed defects.
The shortage of commits in new projects has motivated research on JIT cross-project defect prediction (CPDP). CPDP approaches proposed for component-level defect prediction have rarely been evaluated under JIT CPDP.
OBJECTIVE: To explore the effects of CPDP approaches proposed for component-level defect prediction when they are applied to JIT CPDP.
METHOD: A case study was conducted using two commit dataset suites provided in past studies on JIT defect prediction. JIT defect predictions with and without 21 CPDP approaches were compared in terms of classification performance, measured by AUC. The CPDP approaches were also compared with each other.
RESULTS: Most CPDP approaches changed the prediction performance of a baseline that simply combined all CP data. A few CPDP approaches improved the prediction performance significantly. Quite a few approaches worsened the performance significantly. The results based on the two suites identified two CPDP approaches that were safer than the baseline. The results were inconsistent with a previous study.
CONCLUSIONS: CPDP approaches for component-level defect prediction might be effective for JIT CPDP. Further evaluations are needed to reach a firm conclusion.
An Architecture to Integrate Experimentation into the Software Development Infrastructure
Abstract: Available platforms for online controlled experimentation primarily focus on the technical execution of experiments and are isolated from the remaining software development infrastructure. The platform-independent experimentation infrastructure separates the experiment definition from its execution and focuses on the experimentation process. However, it is still not integrated into the remaining infrastructure.
In this paper, we extend the platform-independent experimentation infrastructure with interfaces that ease its integration into the software development infrastructure. The proposed solution is evaluated using a mixed-method research design to assess its usefulness, ease of use, strengths, and weaknesses. The results indicate that the proposed solution represents an adaptable, platform-independent, and cross-domain experimentation infrastructure that is perceived to be easy to use and useful.