Publications
Here is the list of publications, grouped by year. You can filter them using the buttons below.
2024
- EDTConf: Engineering a Digital Twin for Diagnosis and Treatment of Multiple Sclerosis. Giordano D’Aloisio, Alessandro Di Matteo, Alessia Cipriani, Daniele Lozzi, Enrico Mattei, Gennaro Zanfardino, Antinisca Di Marco, and Giuseppe Placidi. In Proceedings of the ACM/IEEE 27th International Conference on Model Driven Engineering Languages and Systems, 2024.
Multiple sclerosis (MS) is a complex, chronic, and heterogeneous disease of the central nervous system that affects 3 million people globally. The multifactorial nature of MS necessitates an adaptive and personalized approach to diagnosis, monitoring, and treatment. This paper proposes a novel Digital Twin for Multiple Sclerosis (DTMS) designed to integrate diverse data sources, including Magnetic resonance imaging (MRI), clinical biomarkers, and digital health metrics, into a unified predictive model. The DTMS aims to enhance the precision of MS management by providing real-time, individualized insights into disease progression and treatment efficacy. Through a federated learning approach, the DTMS leverages explainable AI to offer reliable and personalized therapeutic recommendations, ultimately striving to delay disability and improve patient outcomes. This comprehensive digital framework represents a significant advancement in the application of AI and digital twins in the field of neurology, promising a more tailored and effective management strategy for MS.
- ESEM: Exploring LLM-Driven Explanations for Quantum Algorithms. Giordano d’Aloisio, Sophie Fortz, Carol Hanna, Daniel Fortunato, Avner Bensoussan, Eñaut Mendiluze Usandizaga, and Federica Sarro. In Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2024.
Background: Quantum computing is a rapidly growing new programming paradigm that brings significant changes to the design and implementation of algorithms. Understanding quantum algorithms requires knowledge of physics and mathematics, which can be challenging for software developers. Aims: In this work, we provide a first analysis of how LLMs can support developers’ understanding of quantum code. Method: We empirically analyse and compare the quality of explanations provided by three widely adopted LLMs (Gpt3.5, Llama2, and Tinyllama) using two different human-written prompt styles for seven state-of-the-art quantum algorithms. We also analyse how consistent LLM explanations are over multiple rounds and how LLMs can improve existing descriptions of quantum algorithms. Results: Llama2 provides the highest quality explanations from scratch, while Gpt3.5 emerged as the LLM best suited to improve existing explanations. In addition, we show that adding a small amount of context to the prompt significantly improves the quality of explanations. Finally, we observe how explanations are qualitatively and syntactically consistent over multiple rounds. Conclusions: This work highlights promising results and opens challenges for future research in the field of LLMs for quantum code explanation. Future work includes refining the methods through prompt optimisation and parsing of quantum code explanations, as well as carrying out a systematic assessment of the quality of explanations.
- JSS: Uncovering gender gap in academia: A comprehensive analysis within the software engineering community. Andrea D’Angelo, Giordano d’Aloisio, Francesca Marzi, Antinisca Di Marco, and Giovanni Stilo. Journal of Systems and Software, 2024.
The gender gap in education has gained considerable attention in recent years, as it carries profound implications for the academic community. However, while the problem has been tackled from a student perspective, research is still lacking from an academic point of view. In this work, our main objective is to address this unexplored area by shedding light on the intricate dynamics of the gender gap within the Software Engineering (SE) community. To this aim, we first review how the problem of the gender gap in the SE community and in academia has been addressed by the literature so far. Results show that men in SE build more tightly-knit clusters but fewer global co-authorship relations than women, and that the networks do not exhibit homophily. Concerning academic promotions, the Software Engineering community presents a higher bias in promotions to Associate Professor and a smaller bias in promotions to Full Professor than the overall Informatics community.
- SSBSE: GreenStableYolo: Optimizing Inference Time and Image Quality of Text-to-Image Generation. Jingzhi Gong, Sisi Li, Giordano d’Aloisio, Zishuo Ding, Yulong Ye, William B. Langdon, and Federica Sarro. In International Symposium on Search Based Software Engineering, 2024. Challenge Track Winner.
Tuning the parameters and prompts for improving AI-based text-to-image generation has remained a substantial yet unaddressed challenge. Hence we introduce GreenStableYolo, which improves the parameters and prompts for Stable Diffusion to both reduce GPU inference time and increase image generation quality using NSGA-II and Yolo. Our experiments show that despite a relatively slight trade-off (18%) in image quality compared to StableYolo (which only considers image quality), GreenStableYolo achieves a substantial reduction in inference time (266% less) and a 526% higher hypervolume, thereby advancing the state-of-the-art for text-to-image generation.
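The hypervolume indicator mentioned above can be made concrete with a small sketch. The function below computes the two-objective hypervolume (both objectives minimised) of a non-dominated front; the front and reference point are hypothetical illustration values, not the paper’s evaluation data or code.

```python
# Two-objective hypervolume (minimisation) of a non-dominated front:
# sort by the first objective (so the second descends along the front)
# and sum the rectangles dominated up to a reference point.

def hypervolume_2d(front, ref):
    """Area dominated by `front` with respect to reference point `ref`."""
    pts = sorted(front)              # ascending in objective 1
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv

# Hypothetical (inference time, 1 - image quality) trade-off points
front = [(0.2, 0.8), (0.5, 0.4), (0.9, 0.1)]
print(hypervolume_2d(front, ref=(1.0, 1.0)))  # ≈ 0.39
```

A larger hypervolume means the front covers more of the objective space below the reference point, which is how one front can dominate another overall despite a trade-off on a single objective.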
- ICPE: Grammar-Based Anomaly Detection of Microservice Systems Execution Traces. Andrea D’Angelo and Giordano d’Aloisio. In Companion of the 15th ACM/SPEC International Conference on Performance Engineering, 2024. Best Data Challenge Award.
Microservice architectures are a widely adopted architectural pattern for large-scale applications. Given the large adoption of these systems, several works have been proposed to detect performance anomalies starting from the analysis of execution traces. However, most of the proposed approaches rely on machine learning (ML) algorithms to detect anomalies. While ML methods may be effective in detecting anomalies, the training and deployment of these systems has been shown to be less efficient in terms of time, computational resources, and energy required. In this paper, we propose a novel approach based on context-free grammars for anomaly detection in microservice systems execution traces. We employ SAX encoding to transform execution traces into strings. Then, we select strings encoding anomalies and, for each possible anomaly, we build a context-free grammar using the Sequitur grammar induction algorithm. We test our approach on two real-world datasets and compare it with a Logistic Regression classifier. We show that our approach trains in 15 seconds with a loss in effectiveness of at most 5% compared to the Logistic Regression baseline.
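The trace-to-string step described above can be sketched with a generic SAX-style encoder: z-normalise the series, average it into equal-width segments (Piecewise Aggregate Approximation), and map each segment onto a small alphabet via standard-normal breakpoints. The segment count, alphabet, and input values here are illustrative assumptions, not the authors’ implementation.

```python
# Generic SAX-style encoding sketch: z-normalise a numeric series,
# average equal-width PAA segments, map each onto a small alphabet.
import statistics

def sax_encode(series, n_segments=4, alphabet="abcd"):
    """Discretise a numeric series into a short symbol string."""
    mean = statistics.mean(series)
    std = statistics.pstdev(series) or 1.0   # guard against zero variance
    z = [(x - mean) / std for x in series]
    # Piecewise Aggregate Approximation: average each segment
    seg_len = len(z) / n_segments
    paa = []
    for i in range(n_segments):
        lo, hi = round(i * seg_len), round((i + 1) * seg_len)
        paa.append(sum(z[lo:hi]) / max(1, hi - lo))
    # Standard-normal breakpoints for a 4-letter alphabet
    breakpoints = [-0.674, 0.0, 0.674]
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)

print(sax_encode([1, 2, 50, 49, 2, 1, 48, 51]))  # → adad
```

Once traces are strings like this, a grammar-induction algorithm such as Sequitur can compress recurring patterns into production rules, and a trace that fails to parse under a grammar flags a deviation.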
- Preprint: How fair are we? From conceptualization to automated assessment of fairness definitions. Giordano d’Aloisio, Claudio Di Sipio, Antinisca Di Marco, and Davide Di Ruscio. arXiv preprint arXiv:2404.09919, 2024.
Fairness is a critical concept in ethics and social domains, but it is also a challenging property to engineer in software systems. With the increasing use of machine learning in software systems, researchers have been developing techniques to automatically assess the fairness of software systems. Nonetheless, a significant proportion of these techniques rely upon pre-established fairness definitions, metrics, and criteria, which may fail to encompass the wide-ranging needs and preferences of users and stakeholders. To overcome this limitation, we propose a novel approach, called MODNESS, that enables users to customize and define their fairness concepts using a dedicated modeling environment. Our approach guides the user through the definition of new fairness concepts, also in emerging domains, and the specification and composition of metrics for their evaluation. Ultimately, MODNESS generates the source code to implement fairness assessment based on these custom definitions. In addition, we elucidate the process we followed to collect and analyze relevant literature on fairness assessment in software engineering (SE). We compare MODNESS with the selected approaches and evaluate how they support the distinguishing features identified by our study. Our findings reveal that i) most of the current approaches do not support user-defined fairness concepts; ii) our approach can cover two additional application domains not addressed by currently available tools, i.e., mitigating bias in recommender systems for software engineering and Arduino software component recommendations; iii) MODNESS demonstrates the capability to overcome the limitations of the only two other Model-Driven Engineering-based approaches for fairness assessment.
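As a flavour of the kind of metric a user might define and compose in such a modeling environment, the snippet below computes statistical parity difference, a standard group-fairness measure. The predictions and group labels are hypothetical, and this is a generic example, not code generated by MODNESS.

```python
# Statistical parity difference: the gap in positive-prediction rates
# between a privileged group and everyone else. A value of 0 means parity.

def statistical_parity_difference(y_pred, groups, privileged):
    """P(y_pred = 1 | privileged) - P(y_pred = 1 | unprivileged)."""
    priv = [y for y, g in zip(y_pred, groups) if g == privileged]
    unpriv = [y for y, g in zip(y_pred, groups) if g != privileged]
    rate = lambda ys: sum(ys) / len(ys)
    return rate(priv) - rate(unpriv)

# Hypothetical binary predictions over two groups
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
groups = ["m", "m", "m", "m", "f", "f", "f", "f"]
print(statistical_parity_difference(y_pred, groups, privileged="m"))  # → 0.5
```

A user-defined concept would typically compose several such metrics (e.g., parity differences per sensitive variable) and set acceptance thresholds on them.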
2023
- FASE: Democratizing Quality-Based Machine Learning Development through Extended Feature Models. Giordano d’Aloisio, Antinisca Di Marco, and Giovanni Stilo. In Fundamental Approaches to Software Engineering, 2023.
ML systems have become an essential tool for experts of many domains, data scientists and researchers, allowing them to find answers to many complex business questions starting from raw datasets. Nevertheless, the development of ML systems able to satisfy the stakeholders’ needs requires an appropriate amount of knowledge about the ML domain. Over the years, several solutions have been proposed to automate the development of ML systems. However, an approach taking into account the new quality concerns needed by ML systems (like fairness, interpretability, privacy, and others) is still missing.
- IP&M: Debiaser for Multiple Variables to enhance fairness in classification tasks. Giordano d’Aloisio, Andrea D’Angelo, Antinisca Di Marco, and Giovanni Stilo. Information Processing & Management, 2023.
Nowadays, assuring that search and recommendation systems are fair and do not discriminate against any kind of population has become of paramount importance. This is also highlighted by some of the sustainable development goals proposed by the United Nations. Those systems typically rely on machine learning algorithms that solve the classification task. Although the problem of fairness has been widely addressed in binary classification, the fairness of multi-class classification problems still needs to be further investigated and lacks well-established solutions. For the aforementioned reasons, in this paper we present the Debiaser for Multiple Variables (DEMV), an approach able to mitigate unbalanced-groups bias (i.e., bias caused by an unequal distribution of instances in the population) in both binary and multi-class classification problems with multiple sensitive variables. The proposed method is compared, under several conditions, with a set of well-established baselines using different categories of classifiers. At first, we conduct a specific study to understand which are the best generation strategies and what their impact is on DEMV’s ability to improve fairness. Then, we evaluate our method on a heterogeneous set of datasets and show how it overcomes the established algorithms of the literature in the multi-class classification setting, and in the binary classification setting when more than two sensitive variables are involved. Finally, based on the conducted experiments, we discuss the strengths and weaknesses of our method and of the other baselines.
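To give a flavour of the unbalanced-groups idea above, the sketch below resamples each (sensitive group, label) combination toward its expected size under independence, oversampling rare combinations and undersampling overrepresented ones. It is a simplified illustration of the underlying idea with hypothetical keys and data, not the authors’ DEMV implementation.

```python
# Illustrative group-balancing sketch: duplicate or drop rows so each
# (sensitive group, label) combination approaches the size it would have
# if group membership and label were statistically independent.
import random
from collections import Counter

def balance_groups(rows, group_key, label_key, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    group_sizes = Counter(r[group_key] for r in rows)
    label_sizes = Counter(r[label_key] for r in rows)
    buckets = {}
    for r in rows:
        buckets.setdefault((r[group_key], r[label_key]), []).append(r)
    out = []
    for (g, y), bucket in buckets.items():
        # Expected size under independence: |group| * |label| / n
        expected = round(group_sizes[g] * label_sizes[y] / n)
        if len(bucket) >= expected:
            out.extend(rng.sample(bucket, expected))                       # undersample
        else:
            out.extend(bucket + rng.choices(bucket, k=expected - len(bucket)))  # oversample
    return out
```

For example, with 4 rows of (A, 0), 1 of (A, 1), 1 of (B, 0), and 2 of (B, 1), the expected sizes under independence are 3, 2, 2, and 1, so the overrepresented combinations shrink and the rare ones grow.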
- IJDRR: The toolkit disaster preparedness for pre-disaster planning. Donato Di Ludovico, Chiara Capannolo, and Giordano d’Aloisio. International Journal of Disaster Risk Reduction, 2023.
The University of L’Aquila “Territori Aperti” (Open Territories) project deals with the topics of prevention and management of natural disasters and the reconstruction and development processes in the affected areas. One of its tasks is developing research on the Toolkit Disaster Preparedness (TDP) aimed at Pre-Disaster Planning. The TDP is structured in this study as a support for the construction of Recovery Strategies and Actions, and concerns the collection and analysis of good practices on post-disaster reconstruction management (Experience Sheets (ESs)), their elaboration into Disaster Preparedness Recommendation Sheets (DPRSs), and the transposition of these into the Recovery Plan. The methodology for the construction of the Recovery Plan was structured in two macro-activities. The first concerns structuring the Toolkit and the related set of sheets (ESs→DPRSs). The second concerns the transfer of the DPRSs to the Recovery Strategies, so that the recommendations and success measures of the former become the actions of the latter. The Toolkit methodology was applied to the case studies of the Abruzzo 2009 earthquake and the Central Italy 2016-17 earthquake. The next steps of the research will concern testing the methodology in the second macro-activity, i.e. the construction of the Recovery Plan, again in the territorial context of the two aforementioned areas.
- RRRR: A Decision Tree to Shepherd Scientists through Data Retrievability. Andrea Bianchi, Giordano d’Aloisio, Francesca Marzi, and Antinisca Di Marco. In Second Workshop on Reproducibility and Replication of Research Results, 2023.
Reproducibility is a crucial aspect of scientific research that involves the ability to independently replicate experimental results by analysing the same data or repeating the same experiment. Over the years, many works have been proposed to make the results of experiments actually reproducible. However, very few address the importance of data reproducibility, defined as the ability of independent researchers to retrieve the same dataset used as input for experimentation. Properly addressing the problem of data reproducibility is crucial because often just providing a link to the data is not enough to make the results reproducible. In fact, proper metadata (e.g., preprocessing instructions) must also be provided to make a dataset fully reproducible. In this work, our aim is to fill this gap by proposing a decision tree to shepherd researchers through the reproducibility of their datasets. In particular, this decision tree guides researchers through identifying whether the dataset is actually reproducible and whether additional metadata (i.e., additional resources needed to reproduce the data) must also be provided. This decision tree will be the foundation of a future application that will automate the data reproduction process by automatically providing the necessary metadata based on the particular context (e.g., data availability, data preprocessing, and so on). It is worth noting that, in this paper, we detail the steps to make a dataset retrievable, while we will detail other crucial aspects of reproducibility (e.g., dataset documentation) in future works.
2022
- BIAS@ECIR: Enhancing Fairness in Classification Tasks with Multiple Variables: A Data- and Model-Agnostic Approach. Giordano d’Aloisio, Giovanni Stilo, Antinisca Di Marco, and Andrea D’Angelo. In Advances in Bias and Fairness in Information Retrieval, 2022.
Nowadays, assuring that search and recommendation systems are fair and do not discriminate against any kind of population has become of paramount importance. Those systems typically rely on machine learning algorithms that solve the classification task. Although the problem of fairness has been widely addressed in binary classification, the fairness of multi-class classification problems still needs to be further investigated and lacks well-established solutions. For the aforementioned reasons, in this paper we present the Debiaser for Multiple Variables, a novel approach able to enhance fairness in both binary and multi-class classification problems. The proposed method is compared, under several conditions, with well-established baselines. We evaluate our method on a heterogeneous data set and show how it overcomes the established algorithms in the multi-class classification setting, while maintaining good performance in binary classification. Finally, we present some limitations and future improvements.
- ICSE-DS: Quality-Driven Machine Learning-based Data Science Pipeline Realization: a software engineering approach. Giordano d’Aloisio. In 2022 IEEE/ACM 44th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), 2022.
The recent wide adoption of data science approaches to decision making in several application domains (such as health, business, and even education) opens new challenges in the engineering and implementation of these systems. Considering the big picture of data science, machine learning is the most widely used technique and, due to its characteristics, we believe that better engineering methodologies and tools are needed to realize innovative data-driven systems able to satisfy the emerging quality attributes (such as debiasing and fairness, explainability, privacy and ethics, and sustainability). This research project will explore the following three pillars: i) identify key quality attributes, formalize them in the context of data science pipelines, and study their relationships; ii) define a new software engineering approach for data-science systems development that assures compliance with quality requirements; iii) implement tools that guide IT professionals and researchers in the realization of ML-based data science pipelines starting from requirements engineering. Moreover, in this paper we also present some details of the project, showing how feature models and model-driven engineering can be leveraged to realize it.
- Preprint: Modeling Quality and Machine Learning Pipelines through Extended Feature Models. Giordano d’Aloisio, Antinisca Di Marco, and Giovanni Stilo. arXiv:2207.07528 [cs], 2022.
The recently increased complexity of Machine Learning (ML) methods has led to the necessity to lighten both research and industry development processes. ML pipelines have become an essential tool for experts of many domains, data scientists, and researchers, allowing them to easily put together several ML models to cover the full analytic process starting from raw datasets. Over the years, several solutions have been proposed to automate the building of ML pipelines, most of them focused on semantic aspects and characteristics of the input dataset. However, an approach taking into account the new quality concerns needed by ML systems (like fairness, interpretability, privacy, etc.) is still missing. In this paper, we first identify, from the literature, key quality attributes of ML systems. Further, we propose a new engineering approach for quality ML pipelines by properly extending the Feature Models meta-model. The presented approach allows modeling ML pipelines, their quality requirements (on the whole pipeline and on single phases), and the quality characteristics of the algorithms used to implement each pipeline phase. Finally, we demonstrate the expressiveness of our model considering the classification problem.
2021
- ILOG: SismaDL: an ontology to represent post-disaster regulation. Francesca Caroccia, Damiano D’Agostino, Giordano d’Aloisio, Antinisca Di Marco, and Giovanni Stilo. In 12th Workshop on Information Logistics and Digital Transformation, 2021.
The emergency caused by a natural disaster must be tackled promptly by public institutions. In such situations, Governments enact specific laws (i.e., decrees) to handle the emergency and the reconstruction of destroyed areas. This happened in 2009 and 2016, when the Italian Government issued several, very different, decrees to face the earthquakes of L’Aquila and Centro Italia, respectively. In this work, we propose SismaDL, an LKIF-based ontology that models laws in the domain of natural disasters. SismaDL has been used to model the aforementioned laws to build a knowledge base useful to reason about why one regulation is less effective and efficient than the other. SismaDL is the first step of a wider project whose aims are to: i) compare laws in the domain of natural disasters; ii) integrate such laws in the Semantic Web; iii) evaluate the effectiveness of a post-disaster reconstruction law; iv) identify good practices to build a reference normative model of natural disaster regulation. This project is a founding step towards the development of accurate and timely IT systems for efficient and high-quality disaster management and reconstruction services to support Governments and local institutions in case of natural disasters.
2023
- RRRRA Decision Tree to Shepherd Scientists through Data RetrievabilityAndrea Bianchi, Giordano d’Aloisio, Francesca Marzi, and Antinisca Di MarcoIn Second Workshop on Reproducibility and Replication of Research Results, 2023
Reproducibility is a crucial aspect of scientific research that involves the ability to independently replicate experimental results by analysing the same data or repeating the same experiment. Over the years, many works have been proposed to make the results of the experiments actually reproducible. However, very few address the importance of data reproducibility, defined as the ability of independent researchers to retain the same dataset used as input for experimentation. Properly addressing the problem of data reproducibility is crucial because often just providing a link to the data is not enough to make the results reproducible. In fact, also proper metadata (e.g., preprocessing instruction) must be provided to make a dataset fully reproducible. In this work, our aim is to fill this gap by proposing a decision tree to sheperd researchers through the reproducibility of their datasets. In particular, this decision tree guides researchers through identifying if the dataset is actually reproducible and if additional metadata (i.e., additional resources needed to reproduce the data) must also be provided. This decision tree will be the foundation of a future application that will automate the data reproduction process by automatically providing the necessary metadata based on the particular context (e.g., data availability, data preprocessing, and so on). It is worth noting that, in this paper, we detail the steps to make a dataset retrievable, while we will detail other crucial aspects for reproducibility (e.g., dataset documentation) in future works.
2022
- BIAS@ECIREnhancing Fairness in Classification Tasks with Multiple Variables: A Data- and Model-Agnostic ApproachGiordano d’Aloisio, Giovanni Stilo, Antinisca Di Marco, and Andrea D’AngeloIn Advances in Bias and Fairness in Information Retrieval, 2022
Nowadays assuring that search and recommendation systems are fair and do not apply discrimination among any kind of population has become of paramount importance. Those systems typically rely on machine learning algorithms that solve the classification task. Although the problem of fairness has been widely addressed in binary classification, unfortunately, the fairness of multi-class classification problem needs to be further investigated lacking well-established solutions. For the aforementioned reasons, in this paper, we present the Debiaser for Multiple Variables, a novel approach able to enhance fairness in both binary and multi-class classification problems. The proposed method is compared, under several conditions, with the well-established baseline. We evaluate our method on a heterogeneous data set and prove how it overcomes the established algorithms in the multi-classification setting, while maintaining good performances in binary classification. Finally, we present some limitations and future improvements.
2021
- ILOGSismaDL: an ontology to represent post-disaster regulationFrancesca Caroccia, Damiano D’Agostino, Giordano d’Aloisio, Antinisca Di Marco, and Giovanni StiloIn 12th Workshop on Information Logistics and Digital Transformation, 2021
The emergency caused by a natural disaster must be tackled promptly by public institutions. In this situation, Governments enact specific laws (i.e., decrees) to handle the emergency and the reconstruction of destroyed areas. As it happened in 2009 and 2016 when the Italian Government issued several, very different, decrees to face respectively the earthquakes of L’Aquila and Centro Italia. In this work, we propose SismaDL, a LKIF based ontology, that models the laws in the domain of natural disasters. SismaDL has been used to model the aforementioned laws to build a knowledge base useful to reason about why one regulation is less effective and efficient than the other. SismaDL is the first step of a wider project whose aims are: i) compare laws in the domain of natural disaster; ii) integrate such laws in the Semantic Web; iii) evaluate the effectiveness of a post-disaster reconstruction law; iv) identify good practices to build a reference normative model of the natural disaster regulation. This project is a founding step towards the development of accurate and timely IT systems for efficient and high quality disaster management and reconstruction services to support Governments and local institutions in case of natural disasters.
2024
- EDTConfEngineering a Digital Twin for Diagnosis and Treatment of Multiple SclerosisGiordano D’Aloisio, Alessandro Di Matteo, Alessia Cipriani, Daniele Lozzi, Enrico Mattei, Gennaro Zanfardino, Antinisca Di Marco, and Giuseppe PlacidiIn Proceedings of the ACM/IEEE 27th International Conference on Model Driven Engineering Languages and Systems, 2024
Multiple sclerosis (MS) is a complex, chronic, and heterogeneous disease of the central nervous system that affects 3 million people globally. The multifactorial nature of MS necessitates an adaptive and personalized approach to diagnosis, monitoring, and treatment. This paper proposes a novel Digital Twin for Multiple Sclerosis (DTMS) designed to integrate diverse data sources, including Magnetic resonance imaging (MRI), clinical biomarkers, and digital health metrics, into a unified predictive model. The DTMS aims to enhance the precision of MS management by providing real-time, individualized insights into disease progression and treatment efficacy. Through a federated learning approach, the DTMS leverages explainable AI to offer reliable and personalized therapeutic recommendations, ultimately striving to delay disability and improve patient outcomes. This comprehensive digital framework represents a significant advancement in the application of AI and digital twins in the field of neurology, promising a more tailored and effective management strategy for MS.
- ESEMExploring LLM-Driven Explanations for Quantum AlgorithmsGiordano d’Aloisio, Sophie Fortz, Carol Hanna, Daniel Fortunato, Avner Bensoussan, Eñaut Mendiluze Usandizaga, and Federica SarroIn Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2024
Background Quantum computing is a rapidly growing new programming paradigm that brings significant changes to the design and implementation of algorithms. Understanding quantum algorithms requires knowledge of physics and mathematics, which can be challenging for software developers. Aims In this work, we provide a first analysis of how LLMs can support developers’ understanding of quantum code. Method We empirically analyse and compare the quality of explanations provided by three widely adopted LLMs (Gpt3.5, Llama2, and Tinyllama) using two different human-written prompt styles for seven state-of-the-art quantum algorithms. We also analyse how consistent LLM explanations are over multiple rounds and how LLMs can improve existing descriptions of quantum algorithms. Results Llama2 provides the highest quality explanations from scratch, while Gpt3.5 emerged as the LLM best suited to improve existing explanations. In addition, we show that adding a small amount of context to the prompt significantly improves the quality of explanations. Finally, we observe how explanations are qualitatively and syntactically consistent over multiple rounds. Conclusions This work highlights promising results, and opens challenges for future research in the field of LLMs for quantum code explanation. Future work includes refining the methods through prompt optimisation and parsing of quantum code explanations, as well as carrying out a systematic assessment of the quality of explanations.
- SSBSEGreenStableYolo: Optimizing Inference Time and Image Quality of Text-to-Image GenerationJingzhi Gong, Sisi Li, Giordano d’Aloisio, Zishuo Ding, Yulong Ye, William B Langdon, and Federica SarroIn International Symposium on Search Based Software Engineering, 2024Challenge Track Winner
Tuning the parameters and prompts for improving AI-based text-to-image generation has remained a substantial yet unaddressed challenge. Hence we introduce GreenStableYolo, which improves the parameters and prompts for Stable Diffusion to both reduce GPU inference time and increase image generation quality using NSGA-II and Yolo. Our experiments show that despite a relatively slight trade-off (18%) in image quality compared to StableYolo (which only considers image quality), GreenStableYolo achieves a substantial reduction in inference time (266% less) and a 526% higher hypervolume, thereby advancing the state-of-the-art for text-to-image generation.
- ICPEGrammar-Based Anomaly Detection of Microservice Systems Execution TracesAndrea D’Angelo, and Giordano d’AloisioIn Companion of the 15th ACM/SPEC International Conference on Performance Engineering, 2024Best Data Challenge Award
Microservice architectures are a widely adopted architectural pattern for large-scale applications. Given the large adoption of these systems, several works have been proposed to detect performance anomalies starting from analysing the execution traces. However, most of the proposed approaches rely on machine learning (ML) algorithms to detect anomalies. While ML methods may be effective in detecting anomalies, the training and deployment of these systems as been shown to be less efficient in terms of time, computational resources, and energy required.In this paper, we propose a novel approach based on Context-free grammar for anomaly detection of microservice systems execution traces. We employ the SAX encoding to transform execution traces into strings. Then, we select strings encoding anomalies, and for each possible anomaly, we build a Context-free grammar using the Sequitur grammar induction algorithm. We test our approach on two real-world datasets and compare it with a Logistic Regression classifier. We show how our approach is more effective in terms of training time of 15 seconds with a minimum loss in effectiveness of 5% compared to the Logistic Regression baseline.
2023
- FASEDemocratizing Quality-Based Machine Learning Development through Extended Feature ModelsGiordano d’Aloisio, Antinisca Di Marco, and Giovanni StiloIn Fundamental Approaches to Software Engineering, 2023
ML systems have become an essential tool for experts of many domains, data scientists and researchers, allowing them to find answers to many complex business questions starting from raw datasets. Nevertheless, the development of ML systems able to satisfy the stakeholders’ needs requires an appropriate amount of knowledge about the ML domain. Over the years, several solutions have been proposed to automate the development of ML systems. However, an approach taking into account the new quality concerns needed by ML systems (like fairness, interpretability, privacy, and others) is still missing.
2022
- ICSE-DSQuality-Driven Machine Learning-based Data Science Pipeline Realization: a software engineering approachGiordano d’AloisioIn 2022 IEEE/ACM 44th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), 2022
The recently wide adoption of data science approaches to decision making in several application domains (such as health, business and even education) open new challenges in engineering and implementation of this systems. Considering the big picture of data science, Machine learning is the wider used technique and due to its characteristics, we believe that a better engineering methodology and tools are needed to realize innovative data-driven systems able to satisfy the emerging quality attributes (such as, debias and fariness, explainability, privacy and ethics, sustainability). This research project will explore the following three pillars: i) identify key quality attributes, formalize them in the context of data science pipelines and study their relationships; ii) define a new software engineering approach for data-science systems development that assures compliance with quality requirements; iii) implement tools that guide IT professionals and researchers in the realization of ML-based data science pipelines since the requirement engineering. Moreover, in this paper we also presents some details of the project showing how the feature models and model-driven engineering can be leveraged to realize our project.
2024
- JSSUncovering gender gap in academia: A comprehensive analysis within the software engineering communityAndrea D’Angelo, Giordano d’Aloisio, Francesca Marzi, Antinisca Di Marco, and Giovanni StiloJournal of Systems and Software, 2024
The gender gap in education has gained considerable attention in recent years, as it carries profound implications for the academic community. However, while the problem has been tackled from a student perspective, research is still lacking from an academic point of view. In this work, our main objective is to address this unexplored area by shedding light on the intricate dynamics of the gender gap within the Software Engineering (SE) community. To this aim, we first review how the problem of the gender gap in the SE community and in academia has been addressed by the literature so far. Results show that men in SE build more tightly-knit clusters but fewer global co-authorship relations than women, although the networks do not exhibit homophily. Concerning academic promotions, the Software Engineering community presents a higher bias in promotions to Associate Professor and a smaller bias in promotions to Full Professor than the overall Informatics community.
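The homophily question in the abstract above can be illustrated with a small, stdlib-only sketch (not the paper’s actual pipeline): build co-authorship edges from per-paper author lists, then compare the observed fraction of same-gender edges with the fraction expected under random mixing. All names and data here are hypothetical.

```python
from itertools import combinations

def coauthor_edges(papers):
    """Undirected co-authorship edges from per-paper author lists."""
    edges = set()
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            edges.add((a, b))
    return edges

def homophily_index(edges, gender):
    """Observed same-gender edge fraction minus the fraction expected
    if endpoints were paired at random (positive => homophily)."""
    nodes = {n for e in edges for n in e}
    same = sum(1 for a, b in edges if gender[a] == gender[b]) / len(edges)
    # Expected under random mixing: sum over genders of p_g^2
    # (a simple with-replacement approximation).
    p = {}
    for n in nodes:
        p[gender[n]] = p.get(gender[n], 0) + 1 / len(nodes)
    expected = sum(v * v for v in p.values())
    return same - expected
```

An index near zero, as the paper reports for the SE networks, means same-gender collaborations occur about as often as group sizes alone would predict.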
2023
- IP&MDebiaser for Multiple Variables to enhance fairness in classification tasksGiordano d’Aloisio, Andrea D’Angelo, Antinisca Di Marco, and Giovanni StiloInformation Processing & Management, 2023
Nowadays, assuring that search and recommendation systems are fair and do not discriminate against any population group has become of paramount importance. This is also highlighted by some of the Sustainable Development Goals proposed by the United Nations. Those systems typically rely on machine learning algorithms that solve the classification task. Although the problem of fairness has been widely addressed in binary classification, the fairness of multi-class classification still needs further investigation and lacks well-established solutions. For these reasons, in this paper we present the Debiaser for Multiple Variables (DEMV), an approach able to mitigate unbalanced-group bias (i.e., bias caused by an unequal distribution of instances in the population) in both binary and multi-class classification problems with multiple sensitive variables. The proposed method is compared, under several conditions, with a set of well-established baselines using different categories of classifiers. First, we conduct a specific study to understand which are the best generation strategies and what their impact is on DEMV’s ability to improve fairness. Then, we evaluate our method on a heterogeneous set of datasets and show how it outperforms the established algorithms of the literature in the multi-class classification setting, as well as in the binary classification setting when more than two sensitive variables are involved. Finally, based on the conducted experiments, we discuss the strengths and weaknesses of our method and of the other baselines.
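The notion of unbalanced-group bias that DEMV targets can be sketched in a few lines (this is an illustrative simplification, not the paper’s actual algorithm): for each combination of sensitive values and label, compare the group’s expected size under independence with its observed size, and resample until the two match. Function and field names are assumptions for the example.

```python
import random

def balance_groups(rows, sensitive_keys, label_key, seed=0):
    """Illustrative resampling: make each (sensitive values, label) group's
    observed size match its expected size under independence."""
    rng = random.Random(seed)
    n = len(rows)
    groups = {}
    for r in rows:
        key = (tuple(r[k] for k in sensitive_keys), r[label_key])
        groups.setdefault(key, []).append(r)

    def freq(pred):
        return sum(1 for r in rows if pred(r)) / n

    out = []
    for (svals, label), members in groups.items():
        # Expected fraction under independence = P(sensitive values) * P(label)
        p_s = freq(lambda r: tuple(r[k] for k in sensitive_keys) == svals)
        p_y = freq(lambda r: r[label_key] == label)
        expected = round(p_s * p_y * n)
        observed = len(members)
        if observed >= expected:
            out.extend(rng.sample(members, expected))          # undersample
        else:
            out.extend(members)
            out.extend(rng.choices(members, k=expected - observed))  # oversample
    return out
```

After balancing, a classifier trained on the resampled data no longer sees, say, one group almost exclusively with negative labels, which is the distributional bias the abstract refers to.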
- IJDRRThe toolkit disaster preparedness for pre-disaster planningDonato Di Ludovico, Chiara Capannolo, and Giordano d’AloisioInternational Journal of Disaster Risk Reduction, 2023
The University of L’Aquila “Territori Aperti” (Open Territories) project deals with the topics of prevention and management of natural disasters and the reconstruction and development processes in the affected areas. One of its tasks is developing research on the Toolkit Disaster Preparedness (TDP) aimed at Pre-Disaster Planning. The TDP is structured in this study as a support for the construction of Recovery Strategies and Actions, and concerns the collection and analysis of good practices on post-disaster reconstruction management (Experience Sheets (ESs)), their elaboration into Disaster Preparedness Recommendation Sheets (DPRSs), and the transposition of these into the Recovery Plan. The methodology for the construction of the Recovery Plan was structured in two macro-activities. The first concerns structuring the Toolkit and the related set of sheets (ESs→DPRSs). The second concerns the transfer of the DPRSs to the Recovery Strategies, so that the recommendations and success measures of the former become the actions of the latter. The Toolkit methodology was applied to the case studies of the Abruzzo 2009 earthquake and the Central Italy 2016-17 earthquake. The next steps of the research will concern testing the methodology in the second macro-activity, i.e. the construction of the Recovery Plan, again in the territorial context of the two aforementioned areas.