Publications

Security bug reports classification using fasttext

Published in , 2023

Abstract: Software developers and maintainers must address security bug reports (SBRs) before they are publicly disclosed and their systems are left vulnerable to attack. Bug tracking systems may contain security-related reports that are not labeled as SBRs, which makes it hard for developers to identify them. Finding unlabeled SBRs is therefore essential to help security experts and developers identify these security issues quickly and accurately. The goal of this paper is to help software developers better classify bug reports that identify security vulnerabilities as SBRs by using a fastText classifier. Previous work has applied text analytics and machine learning to classify which bug reports are security related. We improve on that work, as shown by our analysis of five open-source projects. First, we collected a dataset of 45,940 bug reports from five software repositories (e.g., those used in the work of Peters et al. and Shu et al.). Second, we conducted an SBR classification experiment using machine learning; specifically, we built fastText classifiers. Third, we investigated the accuracy of these fastText classifiers in identifying SBRs. Our experimental results show that our fastText classifier achieves an average F1 score of 0.81 when used to identify SBRs. Furthermore, we examined how well SBR identification generalizes by applying cross-project validation; in this setting, the fastText classifier achieves an average F1 score of 0.65. Finally, we made our data and results available at Alqahtani (fasttext implementation, 2023. https://github.com/isultane/fasttext_classifications) to support replication of our work.
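
The two evaluation settings reported in this abstract (F1 on SBR identification and cross-project validation) can be illustrated with a minimal Python sketch. It is not the authors' implementation: `train_and_predict` is a hypothetical placeholder for the step where a fastText model would be trained and queried, and the `keyword_baseline` stand-in, project names, and reports are invented for illustration. F1 is computed for the SBR (positive) class, and cross-project validation holds out one project at a time while training on all the others.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# (bug report text, label) where label 1 = SBR and 0 = non-SBR
Report = Tuple[str, int]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (SBR) class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_project_f1(
    projects: Dict[str, List[Report]],
    train_and_predict: Callable[[List[Report], List[str]], List[int]],
) -> Dict[str, float]:
    """Hold out each project in turn, train on all others, and score the held-out one."""
    scores: Dict[str, float] = {}
    for target, test_set in projects.items():
        train_set = [
            report
            for name, reports in projects.items()
            if name != target
            for report in reports
        ]
        predictions = train_and_predict(train_set, [text for text, _ in test_set])
        scores[target] = f1_score([label for _, label in test_set], predictions)
    return scores


def keyword_baseline(train: List[Report], texts: List[str]) -> List[int]:
    """Hypothetical stand-in for a fastText model; it ignores `train` entirely."""
    return [int(any(k in t.lower() for k in ("overflow", "xss"))) for t in texts]


projects = {
    "project_a": [("Buffer overflow in parser", 1), ("Typo in docs", 0)],
    "project_b": [
        ("Stored XSS in comments", 1),
        ("Crash on empty input", 1),
        ("UI glitch", 0),
    ],
}
scores = cross_project_f1(projects, keyword_baseline)
average_f1 = sum(scores.values()) / len(scores)
print(scores, average_f1)
```

Replacing `keyword_baseline` with a function that trains a real fastText model on `train` and predicts labels for `texts` would keep the same evaluation loop; the sketch deliberately does not reproduce fastText's own API.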

VrT: A CWE-Based Vulnerability Report Tagger - Machine Learning Driven Cybersecurity Tool for Vulnerability Classification

Published in , 2023

Abstract: Vulnerability reports play an important role in software maintenance. Whether a disclosed vulnerability can be exploited by attackers depends on how quickly it is mitigated. The information in vulnerability reports produced by several security scanning tools supports vulnerability management, trend analysis, and the automation of secure software development. So far, vulnerability reports have been tagged with their vulnerability type manually, which introduces human errors and, given the shortage of security experts, does not scale. This paper introduces a tool called Vulnerability Report Tagger (VrT), which applies machine learning to vulnerability descriptions to automatically label NVD vulnerability reports. VrT predicts the cybersecurity labels to assign to a vulnerability's text, with the aim of encouraging the use of labeling mechanisms in vulnerability reporting systems and facilitating vulnerability management and prioritization. In addition to presenting the tool's architecture and usage, we evaluate its effectiveness at vulnerability classification (i.e., tagging). Link to the tool: https://rb.gy/cz7hwa.
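
The abstract does not state which learning algorithm VrT uses. As a hedged illustration of the general idea (mapping a vulnerability description to a CWE label), the sketch below trains a multinomial Naive Bayes classifier with add-one smoothing in plain Python. The algorithm choice, the hypothetical `NaiveBayesTagger` class, and the toy training descriptions are assumptions for illustration, not VrT's actual model or NVD data.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTagger:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, examples: List[Tuple[str, str]]) -> "NaiveBayesTagger":
        self.label_counts = Counter(label for _, label in examples)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in examples:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total = len(examples)
        return self

    def predict(self, text: str) -> str:
        best_label, best_score = "", -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / self.total)  # log prior of the label
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hand-written training descriptions (not real NVD entries).
training = [
    ("SQL injection in login form allows arbitrary SQL commands", "CWE-89"),
    ("Improper neutralization of SQL query parameters", "CWE-89"),
    ("Cross-site scripting via crafted script in comment field", "CWE-79"),
    ("Reflected XSS allows injection of arbitrary web script", "CWE-79"),
]
tagger = NaiveBayesTagger().fit(training)
print(tagger.predict("Attackers can inject SQL commands via the id parameter"))
print(tagger.predict("Stored cross-site scripting in the profile page"))
```

Naive Bayes is used here only because it is a compact, well-understood text classifier; any supervised model that maps description text to a CWE label would fit the same role.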

A Unified Framework for Automating Software Security Analysis in DevSecOps

Published in , 2022

Abstract: The Development and Operations (DevOps) methodology is a set of practices and cultural values. Its main objectives are to shorten the software development lifecycle, produce quality software, and eliminate software evolution barriers. The increased demand for secure software applications has led to a new variant of DevOps called Development, Security, and Operations (DevSecOps), which attempts to integrate security practices into the DevOps process. In this paper, we outline the current challenges in securing DevOps applications, such as the lack of automated software security testing tools, insufficient integration of security tools, a lack of security knowledge among developers, and false-positive results produced by many vulnerability scanner tools. To address these challenges, we introduce a unified framework for automating software security analysis in the DevSecOps paradigm that serves as an intermediate layer between the Continuous Integration (CI) and Continuous Delivery (CD) pipelines of software applications and application security services. We present the framework’s high-level architecture and a case study that illustrates the applicability of our proposed approach.

A study on the use of vulnerabilities databases in software engineering domain

Published in , 2022

Abstract: Over the last decade, several software vulnerability databases have been introduced to guide researchers and developers in developing more secure and reliable software. While the Software Engineering (SE) research community is increasingly aware of these vulnerability databases, no comprehensive literature survey exists that studies how they are used in software development. The objective of our survey is to provide insights into how the research landscape of software vulnerability databases (SVDBs) has evolved over the past 17 years and to outline some open challenges associated with their use in non-security domains. More specifically, we introduce a semi-automated methodology based on topic modeling to discover relevant topics in our dataset of 99 relevant SE research articles. We find 24 topics discussing the use of SVDBs in the SE domain. The results show that (i) topics describing the use of SVDBs range from empirical security (case) studies to tools for generating security test cases; (ii) the majority of the surveyed papers cover a limited number of software engineering contributions or activities (e.g., maintenance); and (iii) most of the surveyed articles rely on only one SVDB as their knowledge source. The dataset and results are available at https://github.com/isultane/svdbs_dataset

Automated Extraction of Security Concerns from Bug Reports

Published in , 2019

Abstract: Issue tracker repositories contain a wealth of textual information, including bug reports that capture information, often implicit, about Information Security (IS) concerns and vulnerabilities associated with certain issues. An approach that extracts such security concerns from bug reports can yield several benefits, for example in bug management (e.g., prioritization) or bug triage. Existing research on Information Extraction (IE) from bug reports has mainly focused on supervised learning, which requires significant human effort to prepare a training corpus. In this paper, we explore a fully automated approach that extracts security concepts (tags) from bug reports without the need for manual training data. This approach can automatically identify and classify bug reports based on their security concepts and textual similarities. In addition, we enrich these tags with meaningful and representative security names derived from the security domain.

Semantic modeling approach for software vulnerabilities data sources

Published in , 2019

Abstract: Data sources describing software security vulnerabilities are commonly used by software engineers not only to increase the security of software systems but also to enhance software productivity and reduce maintenance costs. However, with the constantly growing amount of available security vulnerability information, spread across heterogeneous resources, software developers struggle to take full advantage of these resources. The Semantic Web and its supporting technology stack have been widely promoted to support modeling, reuse, and interoperability among heterogeneous data sources. In our research, we present a Semantic Web enabled knowledge model that provides a formal, semi-automated approach for unifying vulnerability information resources. As part of this knowledge modeling approach, we also use Formal Concept Analysis (FCA) to identify vulnerability-related knowledge concepts and model them at various abstraction levels. We illustrate the applicability and flexibility of our approach through several usage examples that take advantage of our unified knowledge model and Semantic Web inference services to provide new types of vulnerability analysis.
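
Formal Concept Analysis groups objects by the attributes they share: a formal concept is a pair (extent, intent) in which the extent is exactly the set of objects having every attribute in the intent, and the intent is exactly the set of attributes shared by every object in the extent. The naive Python sketch below enumerates all concepts of a small context, relying on the fact that every concept intent is an intersection of object intents (the full attribute set being the empty intersection). The context, in which hypothetical vulnerabilities are objects and descriptive keywords are attributes, is invented for illustration; this is not the paper's modeling pipeline and is only practical for small contexts.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def formal_concepts(context: Dict[str, Set[str]]) -> List[Concept]:
    """Enumerate all formal concepts of a context mapping object -> attribute set."""
    attributes = frozenset().union(*context.values())
    intents = {attributes}  # intent of the concept with the smallest extent
    for obj_attrs in context.values():
        obj_intent = frozenset(obj_attrs)
        # Close the family of intents under intersection with this object's intent.
        intents |= {obj_intent & existing for existing in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(g for g, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Order from most specific (small extent) to most general (large extent).
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))


# Hypothetical vulnerabilities described by keyword attributes.
context = {
    "VULN-A": {"injection", "web"},
    "VULN-B": {"injection", "database"},
    "VULN-C": {"memory", "native"},
}
for extent, intent in formal_concepts(context):
    print(sorted(extent), sorted(intent))
```

In this toy context, VULN-A and VULN-B form a shared concept whose intent is `{"injection"}`, which is the kind of higher-level grouping that lets vulnerability knowledge be modeled at several abstraction levels.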

API trustworthiness: an ontological approach for software library adoption

Published in , 2019

Abstract: The globalization of the software industry has led to an emerging trend in which software systems increasingly depend on external open-source libraries and application programming interfaces (APIs). While a significant body of research exists on identifying and recommending potentially reusable libraries to end users, very little is known about the potential direct and indirect impact of these library recommendations on the quality and trustworthiness of a client’s project. In our research, we introduce a novel Ontological Trustworthiness Assessment Model (OntTAM), which (1) supports the automated analysis and assessment of quality attributes related to the trustworthiness of libraries and APIs in open-source systems and (2) provides developers with additional insights into the potential impact of reused libraries and APIs on the quality and trustworthiness of their project. We illustrate the applicability of our approach by assessing the trustworthiness of libraries in terms of their API breaking changes, security vulnerabilities, and license violations, and their potential impact on client projects.

An ontology-based approach to automate tagging of software artifacts

Published in , 2017

Abstract: Context: Software engineering repositories contain a wealth of textual information, such as source code comments, developers’ discussions, commit messages, and bug reports. These free-form text descriptions can contain both direct and implicit references to security concerns. Goal: Derive an approach to extract security concerns from textual information, which can yield several benefits, such as bug management (e.g., prioritization), bug triage, or capturing zero-day attacks. Method: Propose a fully automated classification and tagging approach that can extract security tags from these texts without the need for manual training data. Results: We introduce an ontology-based Software Security Tagger Framework that can automatically identify and classify cybersecurity-related entities and concepts in the text of software artifacts. Conclusion: Our preliminary results indicate that the framework can successfully extract and classify cybersecurity knowledge captured in unstructured text found in software artifacts.
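
An ontology-based tagger can work without labeled training data by matching text against the terms attached to ontology concepts and then adding each matched concept's ancestors. The Python sketch below illustrates that idea under stated assumptions: the `ONTOLOGY` dictionary is a hypothetical five-concept hierarchy, not the paper's security ontology, and matching is plain case-insensitive whole-word search rather than the framework's actual extraction pipeline.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical mini security ontology: concept -> (parent concept, surface terms).
ONTOLOGY: Dict[str, Tuple[Optional[str], List[str]]] = {
    "Attack": (None, []),
    "Injection": ("Attack", ["injection"]),
    "SQLInjection": ("Injection", ["sql injection", "sqli"]),
    "CrossSiteScripting": ("Injection", ["cross-site scripting", "xss"]),
    "DenialOfService": ("Attack", ["denial of service", "dos"]),
}


def ancestors(concept: str) -> List[str]:
    """Walk the parent links up to the root concept."""
    chain = []
    parent = ONTOLOGY[concept][0]
    while parent is not None:
        chain.append(parent)
        parent = ONTOLOGY[parent][0]
    return chain


def tag(text: str) -> Set[str]:
    """Return every concept whose terms occur as whole words, plus its ancestors."""
    lowered = text.lower()
    tags: Set[str] = set()
    for concept, (_, terms) in ONTOLOGY.items():
        if any(re.search(r"\b" + re.escape(t) + r"\b", lowered) for t in terms):
            tags.add(concept)
            tags.update(ancestors(concept))
    return tags


print(sorted(tag("Stored XSS in the comment form")))
print(sorted(tag("Login page vulnerable to SQL injection")))
print(sorted(tag("Typo in README")))
```

Propagating tags to ancestors is what lets a report that mentions only "XSS" also be grouped with other injection-related reports, without any training examples.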

Recovering semantic traceability links between APIs and security vulnerabilities: An ontological modeling approach

Published in , 2017

Abstract: Over the last decade, the software industry has become globalized, which has facilitated the sharing and reuse of code across existing project boundaries. At the same time, such global reuse also introduces new challenges to the software engineering community, as not only components but also their problems and vulnerabilities are now shared. For example, vulnerabilities found in APIs no longer affect only individual projects but might instead spread across projects and even across the borders of global software ecosystems. Tracing these vulnerabilities at a global scale is inherently difficult, since many of the existing resources required for such analysis still rely on proprietary knowledge representations. In this research, we introduce an ontology-based knowledge modeling approach that can eliminate such information silos. More specifically, we focus on linking security knowledge with other software knowledge to improve traceability and trust in software products (APIs). Our approach takes advantage of the Semantic Web and its reasoning services to trace and assess the impact of security vulnerabilities across project boundaries. We present a case study to illustrate the applicability and flexibility of our ontological modeling approach by tracing vulnerabilities across project and resource boundaries.

Enhancing Trust–Software Vulnerability Analysis Framework

Published in , 2017

Abstract: Open source projects and the globalization of the software industry have been a driving force in the reuse of system components across traditional system boundaries. As a result, vulnerabilities and security concerns no longer impact only individual systems but now also global software ecosystems. Known vulnerabilities and security concerns are reported in specialized vulnerability databases, which often remain information silos. In my PhD research, I introduce a modeling approach that eliminates these information silos by linking security knowledge with other software artifacts to improve traceability and trust in software products.

SV-AF—a security vulnerability analysis framework

Published in , 2016

Abstract: The globalization of the software industry has led to the widespread use of system components across traditional system boundaries. Due to this global reuse, vulnerabilities and security concerns are also no longer limited in scope to individual systems but can now affect global software ecosystems. While known vulnerabilities and security concerns are reported in specialized vulnerability databases, these repositories often remain information silos. In this research, we introduce a modeling approach that eliminates these silos by linking security knowledge with other software artifacts to improve traceability and trust in software products. As part of this approach, we introduce a Security Vulnerabilities Analysis Framework (SV-AF) to support evidence-based vulnerability detection. Two case studies are presented to illustrate the applicability of our approach. In these case studies, we link the NVD vulnerability database and the Maven build repository to trace vulnerabilities across repository and project boundaries. Our analysis shows that 750 Maven project releases are directly affected by known security vulnerabilities and that, when transitive dependencies are considered, an additional 415,604 Maven projects can be identified as potentially affected by these vulnerabilities.
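
The step from directly affected releases to potentially affected ones amounts to a reachability query over the reversed dependency graph. A minimal Python sketch, assuming a hypothetical in-memory map from each release to its declared dependencies (the artifact names below are invented), is a breadth-first search that starts from the releases known to be vulnerable. It ignores Maven specifics such as dependency scopes, exclusions, and version mediation, and it is not the SV-AF implementation.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def potentially_affected(
    depends_on: Dict[str, List[str]], known_vulnerable: Iterable[str]
) -> Set[str]:
    """Releases that reach a known-vulnerable release through one or more dependency edges."""
    vulnerable = set(known_vulnerable)
    # Reverse the edges: dependency -> releases that declare it.
    dependents: Dict[str, List[str]] = {}
    for release, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(release)

    affected: Set[str] = set()
    queue = deque(vulnerable)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            # Skip releases already counted or already known to be vulnerable.
            if dependent not in affected and dependent not in vulnerable:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical releases written as "artifact:version".
depends_on = {
    "app:1.0": ["web-framework:2.1"],
    "web-framework:2.1": ["xml-parser:1.3"],
    "reporting:3.2": ["app:1.0"],
    "cli-tool:0.9": ["json-lib:4.0"],
}
print(sorted(potentially_affected(depends_on, {"xml-parser:1.3"})))
```

Because every release reached by the search is recorded only once, cycles in the dependency graph cannot cause an infinite loop, and releases already known to be vulnerable are not double-counted as "additional" ones.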

Tracing known security vulnerabilities in software repositories–A Semantic Web enabled modeling approach

Published in , 2016

Abstract: The introduction of the Internet has revolutionized not only our society but also the software industry, with knowledge and information sharing becoming a central part of software development processes. The resulting globalization of the software industry has not only increased software reuse but also introduced new challenges. Among the challenges arising from this knowledge sharing is Information Security, which has become a major threat to the software development community, since not only source code but also its vulnerabilities are shared across project boundaries. Developers often remain unaware of such security vulnerabilities in their projects until a vulnerability is either exploited by attackers or made public by independent security advisory databases. In this research, we present a modeling approach that takes advantage of Semantic Web technologies to establish traceability links between security advisory repositories and other software repositories. More specifically, we establish a unified ontological representation that supports bi-directional traceability links between knowledge captured in software build repositories and specialized vulnerability databases. These repositories can be considered trusted information silos that are typically not directly linked to other resources, such as source code repositories containing the reported instances of these problems. The novelty of our approach is that it allows us to overcome some of these traditional information silos and transform them into information hubs, which promote the sharing of knowledge across repository boundaries. We conducted several experiments to illustrate the applicability of our approach by tracing existing vulnerabilities to projects that might be directly or indirectly affected by vulnerabilities inherited from other projects and libraries.
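
Bi-directional traceability can be pictured as RDF-style triples plus inverse properties: once a fact such as "vulnerability affects artifact" is stored, the inverse "artifact is affected by vulnerability" can be inferred, so the link can be navigated from either repository. The sketch below mimics this in plain Python with a set of (subject, predicate, object) tuples. The property names, the `INVERSE_OF` table, and the identifiers are hypothetical, and a real implementation would use an RDF store and an OWL reasoner (for example, with `owl:inverseOf` declarations) rather than this hand-rolled inference.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical inverse property pairs, in the spirit of owl:inverseOf declarations.
_PAIRS = [("affects", "isAffectedBy"), ("dependsOn", "isDependencyOf")]
INVERSE_OF: Dict[str, str] = {}
for forward, backward in _PAIRS:
    INVERSE_OF[forward] = backward
    INVERSE_OF[backward] = forward


def with_inverses(triples: Set[Triple]) -> Set[Triple]:
    """Add the inverse of every triple whose predicate has a declared inverse."""
    graph = set(triples)
    for s, p, o in triples:
        if p in INVERSE_OF:
            graph.add((o, INVERSE_OF[p], s))
    return graph


def objects(graph: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """All o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# Facts from two hypothetical sources: a vulnerability database and a build repository.
graph = with_inverses({
    ("vuln-001", "affects", "xml-parser:1.3"),
    ("web-framework:2.1", "dependsOn", "xml-parser:1.3"),
})
print(objects(graph, "xml-parser:1.3", "isAffectedBy"))
print(objects(graph, "xml-parser:1.3", "isDependencyOf"))
```

A single pass is enough here because the inverse of an inferred triple is the original triple, which is already in the graph.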

Sultan Alqahtani