Search Results
Using Machine Learning and Graph Mining Approaches to Improve Software Requirements Quality: An Empirical Investigation
(North Dakota State University, 2019) Singh, Maninder
Software development is prone to faults because multiple stakeholders are involved, especially during the fuzzy phases (requirements and design). Software inspections are commonly used in industry to detect and fix problems in requirements and design artifacts, thereby limiting fault propagation to later phases, where the same faults are harder to find and fix. The output of an inspection process is a list of faults present in the software requirements specification (SRS) document. The artifact author must manually read through the reviews and differentiate between true faults and false positives before fixing the faults. The first goal of this research is to automate the detection of useful vs. non-useful reviews. Next, post-inspection, the requirements author has to manually extract key problematic topics from useful reviews that can be mapped to individual requirements in an SRS to identify fault-prone requirements. The second goal of this research is to automate this mapping by employing keyphrase extraction (KPE) algorithms and semantic analysis (SA) approaches to identify fault-prone requirements. During fault fixation, the author has to manually verify the requirements that could have been impacted by a fix. The third goal of this research is to assist authors post-inspection with change impact analysis (CIA) during fault fixation, using NL processing with semantic analysis and mining solutions from graph theory. Selecting high-quality inspectors is essential to carrying out post-inspection tasks accurately. The fourth goal of this research is to identify skilled inspectors using various classification and feature selection approaches.
The dissertation has led to the development of an automated solution that can identify useful reviews, help identify skilled inspectors, extract the most prominent topics/keyphrases from fault logs, and help the RE author during post-inspection fault fixation.

Increasing the Predictive Potential of Machine Learning Models for Enhancing Cybersecurity
(North Dakota State University, 2021) Ahsan, Mostofa Kamrul
Networks have an increasing influence on modern life, making Cybersecurity an important field of research. Cybersecurity techniques mainly focus on antivirus software, firewalls, intrusion detection systems (IDSs), and similar tools. These techniques protect networks from both internal and external attacks. This research is composed of three essays. It highlights and improves the applications of machine learning techniques in the Cybersecurity domain. Because the feature size and the number of observations in cyber incident data are increasing with the growth of internet usage, conventional defense strategies against cyberattacks are often becoming ineffective. On the other hand, applications of machine learning are steadily improving at preventing cyber risks in a timely manner. Over the last decade, machine learning and Cybersecurity have converged to enhance risk elimination. Because cyber domain knowledge and the adoption of machine learning techniques are not well aligned when data-driven intelligent systems are deployed, there are inconsistencies, and this gap needs to be bridged. We have studied the most recent research in this field and documented the most common issues regarding the implementation of machine learning algorithms in Cybersecurity.
Based on these findings, we conducted research and experiments to improve the quality of service and security strength by discovering new approaches.

Detecting Insider and Masquerade Attacks by Identifying Malicious User Behavior and Evaluating Trust in Cloud Computing and IoT Devices
(North Dakota State University, 2019) Kambhampaty, Krishna Kanth
There are a variety of communication mediums and devices for interaction. Users hop from one medium to another frequently. Although the growing number of devices brings convenience, it also raises security concerns. Providing a platform to users is as important as securing it. In this dissertation, we propose a security approach that captures user behavior to identify malicious activities. System users exhibit certain behavioral patterns while using resources, such as device location, access to certain files on a server, and use of a designated or specific user account. If this behavior is captured and compared with normal users’ behavior, anomalies can be detected. In our model, we identify malicious users and assign a trust value to each user accessing the system. Users who access files on the servers that they have not previously accessed, access multiple accounts from the same device, and so on are considered suspicious. If this behavior continues, they are categorized as ingenuine. The trust value determines the trustworthiness of a user. Genuine users get a higher trust value and ingenuine users get a lower trust value. The trust value ranges from zero to one, with one being the highest trustworthiness and zero the lowest. In our model, sixteen different features track user behavior. These features evaluate users’ activities. From the time users log in to the system until they log out, they are monitored based on these sixteen features.
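As a rough illustration of a trust value between zero and one derived from sixteen behavioral features, here is a minimal Python sketch. The abstract does not give the scoring formula, so the equal weighting, the boolean encoding of each feature, and the names `NUM_FEATURES` and `trust_value` are all assumptions made for this example, not the dissertation's actual model.

```python
NUM_FEATURES = 16  # the abstract mentions sixteen behavioral features


def trust_value(suspicious_flags):
    """Return a trust value in [0, 1], where 1 is the most trustworthy.

    ASSUMPTION: each feature is a boolean flag (True = suspicious behavior
    observed) and every flag lowers trust by the same amount. The real
    model may weight features differently.
    """
    if len(suspicious_flags) != NUM_FEATURES:
        raise ValueError(f"expected {NUM_FEATURES} feature flags")
    triggered = sum(1 for flag in suspicious_flags if flag)
    return 1.0 - triggered / NUM_FEATURES


# Hypothetical user who triggered four of the sixteen features.
flags = [True] * 4 + [False] * 12
print(trust_value(flags))
```

Under this simple scheme, a user who triggers no features keeps a trust value of one, and each additional suspicious feature lowers the value by one sixteenth.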
These features determine whether the user is malicious. For instance, features such as accessing too many accounts, using proxy servers, and too many incorrect logins are attributed to suspicious activity. The more of these features a user exhibits, the more suspicious the user is, and each additional feature contributes to a lower trust value. Identifying malicious users could prevent and/or mitigate attacks. It enables timely action to stop these users from performing any unauthorized or illegal actions. This could prevent insider and masquerade attacks. This application could be utilized in mobile, cloud, and pervasive computing platforms.

Extracting Useful Information and Building Predictive Models from Medical and Health-Care Data Using Machine Learning Techniques
(North Dakota State University, 2020) Kabir, Md Faisal
In healthcare, a large amount of medical data has emerged. To use these data effectively to improve healthcare outcomes, clinicians need to identify the relevant measures and apply the correct analysis methods for the type of data at hand. In this dissertation, we present various machine learning (ML) and data mining (DM) methods that could be applied to the types of data sets available in the healthcare area. The first part of the dissertation investigates DM methods on healthcare or medical data to find significant information in the form of rules. Class association rule mining, a variant of association rule mining, was used to obtain rules with targeted items or class labels. These rules can be used to improve public awareness of different cancer symptoms and could also be useful for initiating prevention strategies. In the second part of the dissertation, ML techniques were applied to healthcare or medical data to build a predictive model. Three different classification techniques were investigated on a real-world breast cancer risk factor data set.
Because the data set is imbalanced, various resampling methods were used before applying the classifiers. Applying a resampling technique yielded a significant improvement in performance compared to applying none. Moreover, the super learning (SL) technique, which uses multiple base learners, was investigated to boost the performance of classification models. Two different forms of super learner were investigated: the first uses two base learners, while the second uses three. The models were then evaluated against well-known benchmark data sets related to the healthcare domain, and the results showed that the SL model performs better than the individual classifiers and the baseline ensemble. Finally, we assessed the cancer-relevant genes of prostate cancer that have the most significant correlations with the clinical outcomes of sample type and overall survival. Rules were discovered from the RNA-sequencing data of prostate cancer patients. Moreover, we built a regression model, and from that model we generated rules for predicting the survival time of patients.
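The resampling step mentioned in the abstract above can be illustrated with a minimal Python sketch of random oversampling, one common way to balance classes before training a classifier. The abstract does not say which resampling methods were used, so this technique, the function name `random_oversample`, and the toy data are assumptions for illustration only.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class.

    ASSUMPTION: plain random oversampling; the dissertation may have used
    other resampling methods.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        # Original rows of this class, used as the sampling pool.
        pool = [row for row, y in zip(rows, labels) if y == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical imbalanced data: eight negative rows and two positive rows.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

In practice, resampling is applied only to the training split so that evaluation data keeps its original class distribution.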