Computer Science Doctoral Work
Permanent URI for this collection: hdl:10365/32551
Recent Submissions
Item: AI approaches in personalized meal planning for a multi-criteria problem (North Dakota State University, 2024). Amiri, Maryam.

Food is one of the necessities of life. The food we consume every day provides us with the nutrition we need to have energy. However, food plays a more significant role in life: there is a relationship between food, culture, family, and society [1]. Since ancient civilization, people have realized the correlation between food and health. In earlier times, physicians treated patients by prescribing special recipes. Over the last century, numerous studies investigated the impact of meals on human nutritional intake and the diseases connected to it, and many others focused on the nutritional intake required to ensure sufficient energy for human well-being. A person who advises individuals on food and nutrition is known as a dietitian or nutritionist. Nowadays, nutritionists are experts in the use of food and nutrition to promote health and manage disease; they suggest diet rules and food recommendations to assist people in living a healthy life. Due to technological advancements, tasks that were once time-consuming and required human attention are now being solved by automated procedures and machines. Meal planning is one such domain that has recently received great attention from researchers applying machine learning techniques. In general, those studies used extracted nutrition knowledge and food information to design automated meal-planning systems; however, in the majority of published research, the user's preferences were ignored. In this research, my journey through developing automated meal-planning systems unfolds across distinct projects, each building upon the insights and advancements of its predecessors.
Starting with a focus on incorporating user preferences, the exploration evolved through successive iterations, seeking to mirror the complexities of real-world decision-making more accurately. This progression led to the integration of advanced methodologies spanning artificial intelligence, optimization, multi-criteria decision making, and fuzzy logic. The ultimate aim was to refine and enhance the systems to not only align with users' dietary restrictions and preferences but also to adapt to user feedback, thereby continually improving their efficacy and personalization. Through this comprehensive approach, the research endeavors to contribute novel solutions to the nuanced challenges of personalized meal planning.

Item: Computational Methods for Bulk and Single-cell Chromatin Interaction Data (North Dakota State University, 2024). Bulathsinghalage, Chanaka.

Chromatin interactions occur when physical regions of chromatin in close proximity interact with each other inside the nucleus. Analyzing chromatin interactions plays a crucial role in deciphering the spatial organization of the genome. Identifying significant interactions and their functionalities reveals great insights into gene expression, gene regulation, and genetic diseases such as cancer. In addition, single-cell chromatin interaction data is important for understanding chromatin structure changes, diversity among individual cells, and genomic differences between cell types. In recent years, Hi-C (chromosome conformation capture with high-throughput sequencing) has gained widespread popularity for its ability to map genome-wide chromatin interactions in a single experiment, and it is capable of producing both single-cell and bulk chromatin interaction data. With the evolution of experimental methods like Hi-C, computational tools are essential to efficiently and accurately process the vast amount of genomic data.
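As a toy illustration of one basic step such Hi-C processing tools perform, the sketch below bins mapped read-pair coordinates into a contact matrix at a fixed resolution. The coordinates and resolution are hypothetical, and this is not the dissertation's actual pipeline:

```python
from collections import Counter

def bin_contacts(read_pairs, resolution):
    """Count chromatin contacts between fixed-size genomic bins.

    read_pairs: iterable of (pos1, pos2) mapped coordinates on one chromosome.
    resolution: bin width in base pairs.
    Returns a Counter keyed by (bin_i, bin_j) with bin_i <= bin_j.
    """
    matrix = Counter()
    for pos1, pos2 in read_pairs:
        i, j = pos1 // resolution, pos2 // resolution
        # Store the upper triangle only, since contacts are symmetric.
        matrix[(min(i, j), max(i, j))] += 1
    return matrix

# Toy read pairs on a single chromosome, 10 kb resolution.
pairs = [(1200, 3400), (4900, 15000), (12000, 14500), (3000, 1000)]
contacts = bin_contacts(pairs, 10000)
print(contacts[(0, 0)])  # 2 intra-bin contacts in bin 0
print(contacts[(0, 1)])  # 1 contact between bins 0 and 1
```

Real pipelines operate on millions of such pairs per experiment, which is why efficiency and sparsity handling matter.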
Since experimental costs are notably high, optimized computational tools and methods are needed to extract as much information as possible from the data. Moreover, processing single-cell Hi-C data poses a number of challenges due to its sparsity and limited interaction counts. This work therefore focuses on developing computational methods and tools to process data from both single-cell and bulk Hi-C technologies, and these are shown to enhance the efficiency and accuracy of Hi-C data-processing pipelines. In this dissertation, each chapter presents an individual method or tool that enhances chromatin interaction processing pipelines, and the final chapter focuses on the interplay between epigenetic data and chromatin interaction data. The studies on building computational methods include increasing data read accuracy for bulk Hi-C, identifying statistically significant interactions in single-cell Hi-C data, and imputing single-cell Hi-C data to improve the quality and quantity of raw reads. It is anticipated that the tools and methods outlined in these studies will significantly enhance the workflows of future research on chromatin organization and its correlation with cellular functions and genetic diseases.

Item: Virtual-Experiment-Driven Process Model (VEDPM) (North Dakota State University, 2010). Lua, Chin Aik.

Computer simulations are the last resort for many complex problems such as swarm applications. However, to the best of the author's knowledge, there is no convincing work proving "What You Simulate Is What You See" (WYSIWYS). Many models are built on long, subjective code that is prone to abnormalities, which concern corrupted virtual scientific laws rather than software bugs. Thus, the task of validating scientific simulations is very difficult, if not impossible.
This dissertation provides a new process methodology for solving the problems above: the Virtual-Experiment-Driven Process Model (VEDPM). VEDPM employs simple yet sound virtual experiments for verifying simple, short virtual laws. The proven laws, in turn, are utilized for developing valid models that can achieve real goals. The resulting simulations (or data) from proven models are WYSIWYS. Two complex swarm applications have been developed rigorously and successfully via VEDPM, proving that VEDPM is workable. In addition, the author provides innovative constructs for developing autonomous unmanned vehicles, namely a swarm software architecture and a modified subsumption control scheme, along with their design philosophies. The constructs are used repeatedly to enable unmanned vehicles to switch behaviors autonomously via a simple control signal.

Item: Blockchain-Based Trust Model: Alleviating the Threat of Malicious Cyber-Attacks (North Dakota State University, 2020). Bugalwi, Ahmed Youssef.

Online communities provide a unique environment where interactions are performed among subscribers who share an interest. Members of these virtual communities are typically classified as trustworthy or untrustworthy. Trust and reputation have become indispensable properties due to the rapid growth of uncertainty and risk, a risk that results from cyber-attacks carried out by untrustworthy actors. A malicious attack may produce misleading information, making the community unreliable. A trust mechanism is a substantial instrument for enabling safe functioning within a community. Most virtual communities are centralized, which implies that they own, manage, and control trust information without permission from its legitimate owner. The problem of ownership arises because actors may lose their reputations if the community decides to shut down its business. Sharing information is another valuable feature that aids in lessening the impact of dishonest behavior.
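A common design principle in such trust mechanisms is that trust should be slow to build and quick to lose, which makes a good reputation expensive for a dishonest actor to fake. The sketch below is only an illustration of that principle with made-up gain/loss rates, not the model developed in this dissertation:

```python
def update_trust(score, outcome, gain=0.1, loss=0.5):
    """Update a trust score in [0, 1]: slow to build, quick to lose.

    outcome: True for a cooperative interaction, False for a malicious one.
    Illustrative toy rule only -- not the dissertation's trust model.
    """
    if outcome:
        return score + gain * (1.0 - score)   # small step toward 1
    return score * (1.0 - loss)               # large step toward 0

score = 0.5
for outcome in [True, True, True]:
    score = update_trust(score, outcome)
print(round(score, 2))  # 0.64 after three cooperative interactions
score = update_trust(score, False)
print(round(score, 2))  # 0.32 -- one malicious act undoes the gains
```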
A new trust model called "TrustMe" was developed in this research as a reliable mechanism that generates precise trust information for virtual communities. TrustMe consists of several factors that aim to confuse untrustworthy actors and to make the generated trust score hard to reverse. A blockchain-based trust model is also developed to address the problem of ownership and to offer a decentralized information-sharing mechanism through a distributed application called "DATTC." The efficiency of the proposed models was evaluated through various analytical experimental studies. An unsupervised machine learning method (density-based clustering) was applied using two different datasets. Also, graph analysis was conducted to study the evolution of communities and trust by finding connections between graph metrics and the trust scores generated by TrustMe. Finally, a set of simulations using stochastic models evaluated the accuracy and success rates of TrustMe, and another simulation set mimicked the blockchain model in alleviating the influence of Sybil attacks. The relationships among actors were modeled by dividing actors into trustworthy and untrustworthy groups performing cooperative and malicious actions. The results of the study show that TrustMe is promising and support the first hypothesis, as TrustMe outperformed other trust models. Additionally, the results confirm that the blockchain-based trust model efficiently mitigates malicious cyber-attacks by employing cross-community trust and preserves the ownership property.

Item: Contributing Factors Promoting Success for Females in Computing: A Comparative Study (North Dakota State University, 2022). Gronneberg, Bethlehem.

Despite the growing global demand for Computer Science (CS) professionals, their high earning potential, and diversified career paths (U.S.
BLS 2021, UNESCO 2017), a critical gap exists between enrollment and graduation rates among female students in computing fields across the world (Raigoza 2017, Hailu 2018, UNESCO 2017, Bennedsen and Caspersen 2007). The largest dropout point occurs during the first two years of CS studies (Giannakos et al., 2017). The purpose of this convergent parallel mixed-methods research was to comparatively investigate, describe, and analyze factors correlated with the experiences and perceptions of female undergraduates as they relate to persistence in CS/Software Engineering (SE) degrees; the study was conducted at two public universities, in the U.S. and Ethiopia. Anchored in Tinto's theory of retention, the quantitative part of the study examined three possible predictive factors of success for students enrolled in the first two CS/SE courses and evaluated differences between genders and institutions on those factors. Pearson's correlation coefficient tests were applied to test the hypothesis that perceptions of Degree Usefulness (DU), Previously Acquired Knowledge (PAK), and Cognitive Attitude (CA) correlate with the decision to persist for the research participants. The results showed a statistically significant positive correlation between perceptions of DU, the influence of PAK, and the decision to persist. Two-sample t-tests revealed gender and institutional differences in the influence of PAK and CA.
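The correlation test at the heart of the quantitative analysis can be sketched with a hand-rolled Pearson coefficient on invented Likert-scale responses. The data and scales below are hypothetical, not the study's actual survey results:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy data: perceived Degree Usefulness (1-5 Likert scale)
# vs. persistence intention (1-5) for eight students.
du      = [5, 4, 4, 3, 2, 5, 3, 1]
persist = [5, 5, 4, 3, 2, 4, 3, 2]
r = pearson_r(du, persist)
print(round(r, 2))  # 0.89 -- a strong positive correlation on this toy data
```

In practice one would also compute a p-value (e.g. via a t distribution with n-2 degrees of freedom) before claiming statistical significance, as the study does.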
The qualitative part of the study reported 12 contributing factors of success for a graduating class of females in CS/SE, using a unique approach of sentiment analysis and topic modeling from the domain of Natural Language Processing (NLP) applied to auto-transcribed interview responses.

Item: Pattern Recognition and Quantifying Associations Within Entities of Data Driven Systems for Improving Model Interpretability (North Dakota State University, 2022). Roy, Arighna.

Discovering associations among the entities of a system plays an important role in data science. Most data-science problems have become heavily dependent on machine learning (ML) since the rise of computational power. However, the majority of machine learning approaches improve performance by optimizing an objective function at the cost of compromising the interpretability of the models. A new branch of machine learning focuses on model interpretability by explaining models in various ways; its foundation is built on extracting patterns from the behavior of the models and the related entities. Gradually, machine learning has spread its wings to almost every industry. This dissertation focuses on data-science applications in three such domains. First, assisting environmental sustainability by identifying patterns within its components; machine learning techniques play an important role here in many ways, and discovering associations between environmental components and agriculture is one such topic. Second, improving the robustness of artificial intelligence applications on embedded systems. AI has reached our day-to-day life through embedded systems, and their technical advancement has made it possible to accommodate ML. However, embedded systems are susceptible to various types of errors, so there is large scope for recovery systems for ML models deployed on embedded systems.
Third, bringing together the user communities of entertainment systems across the globe. Online streaming of entertainment has already leveraged ML to provide educated recommendations to its users. However, entertainment content can sometimes be isolated due to demographic barriers, and ML can identify hidden aspects of this content that would not be discoverable otherwise. In the subsequent chapters, various challenges concerning these topics are introduced, followed by corresponding solutions that address them.

Item: Understanding the Patterns of Microservice Intercommunication From a Developer Perspective (North Dakota State University, 2022). Nadeem, Anas.

Microservices architecture is the modern paradigm for designing software. Based on the divide-and-conquer strategy, it organizes the application at a fine level of granularity: each microservice has a well-defined responsibility, and multiple microservices communicate with each other toward a common goal. A major decision in designing microservice applications is the choice between orchestration and choreography as the underlying intercommunication pattern. Choreography entails that microservices work autonomously, while orchestration entails that a central coordinator directs the interaction between services. We examine this decision from a developer's perspective by empirically evaluating the properties of a benchmark system mapped into both orchestrated and choreographed topologies. In this research, we document our experience implementing and debugging this system.
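The two intercommunication patterns just described can be contrasted in miniature. The service names and the order workflow below are hypothetical, and in-process function calls stand in for real network calls:

```python
log = []

# --- Orchestration: a central coordinator directs the services. ---
def inventory_service(order):
    log.append("inventory")
    return True

def payment_service(order):
    log.append("payment")
    return True

def shipping_service(order):
    log.append("shipping")

def orchestrator(order):
    """The whole workflow is explicit in one place, easing debugging."""
    if inventory_service(order) and payment_service(order):
        shipping_service(order)

orchestrator({"id": 1})
print(log)  # ['inventory', 'payment', 'shipping']

# --- Choreography: services react to events autonomously; the ---
# --- workflow emerges from event subscriptions, not a coordinator. ---
subscribers = {}

def subscribe(event, handler):
    subscribers.setdefault(event, []).append(handler)

def publish(event, order):
    for handler in subscribers.get(event, []):
        handler(order)

log2 = []
subscribe("order_placed",   lambda o: (log2.append("inventory"),
                                       publish("stock_reserved", o)))
subscribe("stock_reserved", lambda o: (log2.append("payment"),
                                       publish("payment_taken", o)))
subscribe("payment_taken",  lambda o: log2.append("shipping"))
publish("order_placed", {"id": 2})
print(log2)  # same sequence, but no single place states the workflow
```

Even in this toy form, the orchestrated variant localizes the control flow, while the choreographed one scatters it across subscriptions, which mirrors the debugging experience the research reports.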
Our studies demonstrate that microservices composed using orchestration exhibit desirable inherent characteristics that make microservice code easier to implement, debug, and scale.

Item: Software Engineering Methodologies in Developing a Railway Condition Monitoring System (North Dakota State University, 2022). Bhardwaj, Bhavana.

With the continuous growth of rail track geometry irregularities due to aging, environmental factors, and wheel loads, rail track requires frequent maintenance. Railroads rely on the precise and correct localization and identification of track irregularities, which can significantly damage infrastructure and create life-threatening conditions. Therefore, monitoring the condition of railroad tracks is essential for ensuring the safety, reliability, and cost-efficiency of operations. Consequently, agencies inspect all tracks twice a week per federal track safety regulations. However, existing methods of track inspection are expensive, slow, require track closure, and pose a high risk to workers, and their technical constraints impede network-wide scaling to all railroads. More frequent, continuous, and network-wide monitoring to detect and fix irregularities can help reduce the risk of harm, fatalities, property damage, and financial losses. This work introduces and develops a generalized, scalable, and affordable inspection and monitoring system called the Railway Autonomous Inspection Localization System (RAILS). In particular, the study aims to detect, locate, and characterize track-related issues. The research focuses on designing the RAILS architecture, implementing data collection, and building algorithms for inertial signal feature extraction, data processing, signal alignment, and signal filtering. Case studies validate and characterize system accuracy by estimating the position of detected irregularities based on a linear referencing system.
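One building block of such signal alignment is estimating the lag between two inertial traces by maximizing their cross-correlation. The sketch below uses invented sample values, not the system's real sensor data or algorithm:

```python
def best_lag(reference, signal, max_lag):
    """Return the number of samples by which `signal` lags `reference`,
    chosen by maximizing the sliding dot product (cross-correlation)."""
    def score(lag):
        return sum(reference[i] * signal[i + lag]
                   for i in range(len(reference))
                   if 0 <= i + lag < len(signal))
    return max(range(-max_lag, max_lag + 1), key=score)

# Toy inertial trace with a bump, and a copy delayed by 3 samples.
ref     = [0, 0, 1, 4, 1, 0, 0, 0, 0, 0]
delayed = [0, 0, 0, 0, 0, 1, 4, 1, 0, 0]
print(best_lag(ref, delayed, 5))  # 3 -- the bump occurs 3 samples later
```

Once the lag is known, a sample index can be converted to a track position via the linear referencing system (e.g., distance = sample index times travel speed over sampling rate).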
In one case study, the estimated position of the irregularity is compared with the actual position from ground truth data (GTA) observed by a railroad inspector. In another case study, a railroad inspector verifies the estimated position of the irregularity to demonstrate the system's effectiveness and affordability for practical applications. Railroad agencies employing the developed methods will therefore benefit from reliable track- and equipment-condition information, enabling informed decisions that lead to resource optimization. The conclusion of this research outlines the significant potential of the proposed system, including limitations and future work toward practical, real-time, and autonomous implementation.

Item: A New Structural Feature for Lysine Post-Translation Modification Prediction Using Machine Learning (North Dakota State University, 2021). Liu, Yuan.

Lysine post-translational modification (PTM) plays a vital role in modulating multiple biological processes and functions. Lab-based lysine PTM identification is laborious and time-consuming, which impedes large-scale screening. Many computational tools have been proposed to facilitate PTM identification in silico using sequence-based protein features. Protein structure is another crucial aspect that should not be neglected, yet to the best of our knowledge there is no structural feature dedicated to PTM identification. We proposed a novel spatial feature that captures rich structural information in a succinct form. The dimension of this feature is much lower than that of other sequence and structural features used in previous studies. When the proposed feature was used to predict lysine malonylation sites, it achieved performance comparable to other state-of-the-art methods that used much higher-dimensional features. The low dimensionality of the proposed feature would be very helpful for building interpretable predictors for various applications involving protein structures.
We further attempted to develop a reliable benchmark dataset and evaluate the performance of multiple sequence- and structure-based features in prediction. The results indicated that our proposed spatial feature achieved competitive performance and that other structural features can also contribute to PTM prediction. Even though utilizing protein structure in lysine PTM prediction is still at an early stage, we can expect structure-based features to play a more crucial role in PTM site prediction.

Item: Decision-Making for Self-Replicating 3D Printed Robot Systems (North Dakota State University, 2021). Jones, Andrew Burkhard.

This work addresses decision-making for robot systems that can self-replicate. With the advent of 3D printing technology, the development of self-replicating robot systems is more feasible than it was previously, which opens the door to various opportunities in this area of robotics. A major benefit of having robots that are able to make more robots is that the survivability of the multi-robot system increases dramatically: a single surviving robot with the necessary capabilities to self-replicate could prospectively repopulate an entire 'colony' of robots, given sufficient resources and time. This allows robots to take more risks in trying to accomplish an objective in missions where robots must be used instead of humans due to distance, environmental, safety, and other concerns. Autonomy is key to maximizing the efficacy of this functionality (or to allowing it in a communication-limited or communication-denied environment) for this type of robotic system. A challenge in analyzing self-replicating robot systems, and the decision-making algorithms for those systems, is that there is currently no standard means to simulate them; thus, for the purposes of this work, a simulation system was developed to do just this. Experiments were conducted using this simulation system and the results are presented.
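The "repopulate a colony given sufficient resources and time" idea can be made concrete with a deliberately minimal simulation. The replication rule, costs, and step counts below are invented for illustration and are far simpler than the dissertation's simulation system:

```python
def simulate_colony(robots, resources, cost_per_robot, steps):
    """Each robot builds at most one replica per time step while enough
    raw material remains. Returns (robots, resources_left)."""
    for _ in range(steps):
        # Number of robots that can actually build this step is capped
        # by both the robot count and the remaining material.
        builders = min(robots, resources // cost_per_robot)
        robots += builders
        resources -= builders * cost_per_robot
    return robots, resources

# One surviving robot, 10 units of material, 3 units per new robot.
print(simulate_colony(1, 10, 3, 5))  # (4, 1): growth stalls when material runs out
```

Even this toy model shows the qualitative behavior that motivates the work: population grows roughly geometrically until the resource constraint binds.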
In this dissertation, the configuration and decision-making of self-replicating 3D-printed robot systems are analyzed. First, an introduction to the concepts and topics is provided. Second, relevant background information is reviewed. Third, the simulation used to model self-replicating robot systems for the experiments in later chapters is detailed. Then, experiments are conducted utilizing this simulation model, including analysis of the impact of replication categories on system efficacy, comparative performance of multiple decision-making algorithms, and cybersecurity threats to self-replicating robot systems. For each, data is presented and analyzed, and conclusions are drawn. Finally, this dissertation concludes with a summary of the results presented throughout the document and a discussion of the broader findings from the experiments.

Item: Addressing Challenges in Data Privacy and Security: Various Approaches to Secure Data (North Dakota State University, 2021). Pattanayak, Sayantica.

Emerging neural-network-based machine learning techniques such as deep learning and its variants have shown tremendous potential in many application domains. However, neural network models raise serious privacy concerns due to the risk of leaking highly privacy-sensitive data. In this dissertation, we propose various techniques to hide sensitive information and evaluate the performance and efficacy of the proposed models. In our first research work, we propose a model that can both encrypt plaintext and decrypt the resulting ciphertext, based on symmetric-key encryption and a backpropagation neural network; it takes decimal values, converts them to ciphertext, and then converts the ciphertext back to decimal values. In our second research work, we propose a remote password authentication scheme using a neural network, showing how a user can communicate securely with more than one server.
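For orientation, the trusted-authority/server roles in such a multi-server password scheme can be sketched with a conventional salted-hash credential record. This stand-in is deliberately not the neural-network scheme proposed in the dissertation; it only illustrates the registration and validation flow:

```python
import hashlib
import hmac
import os

def register(user_id, password):
    """Trusted authority: derive a per-user credential record that any
    cooperating server can store and later check."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return {"user_id": user_id, "salt": salt, "digest": digest}

def validate(record, user_id, password):
    """Server side: check a login attempt against the stored record."""
    if user_id != record["user_id"]:
        return False
    attempt = hashlib.pbkdf2_hmac("sha256", password.encode(),
                                  record["salt"], 100_000)
    # Constant-time comparison avoids leaking digest prefixes via timing.
    return hmac.compare_digest(attempt, record["digest"])

record = register("alice", "s3cret")        # issued once by the authority
print(validate(record, "alice", "s3cret"))  # True on any server holding it
print(validate(record, "alice", "wrong"))   # False
```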
A user registers with a trusted authority and receives a user ID and a password, then uses them to log in to one or more servers, and the servers can validate the legitimacy of the user. Our experiments use different classifiers to evaluate the accuracy and efficiency of the proposed model. In our third research work, we develop a technique to securely send patient information to different organizations, using different fuzzy membership functions to hide sensitive information about patients. In our fourth research work, we introduce an approach that substitutes sensitive attributes with non-sensitive attributes. We divide the data set into three subsets: desired, sensitive, and non-sensitive. The output of the denoising autoencoder consists only of the desired and non-sensitive subsets; the sensitive subsets are hidden by the non-sensitive subsets. We evaluate the efficacy of our predictive model using three different flavors of autoencoder and measure the F1-score of our model against each of them. As our predictive model is based on privacy, we also use a Generative Adversarial Network (GAN) to show to what extent our model is secure.

Item: Assessment of Engineering Methodologies for Increasing CubeSat Mission Success Rates (North Dakota State University, 2021). Alanazi, Abdulaziz.

In the last twenty years, CubeSat systems have gained popularity in educational institutions and commercial industries. CubeSats have attracted educators and manufacturers due to their low cost, small size and mass, and ability to be produced quickly. However, while developers can swiftly design and build their CubeSats with a team of students from different disciplines using COTS parts, this does not guarantee that the CubeSat mission will be successful. Statistics show that mission failure is frequent.
For example, out of 270 "university-class" CubeSats, 139 failed in their mission between 2002 and 2016 [1]. Statistics also show that the average failure rate of CubeSat missions is higher in academic and research institutions than in commercial or government organizations. Reasons for failure include power, mechanical, communications, and system-design issues. Some researchers have suggested that the problem lies within the design and development process itself, in that CubeSat developers mainly focus on system- and component-level design while neglecting requirements elicitation and other key systems engineering activities [2]. To increase the success rate of CubeSat missions, systems engineering steps and processes need to be implemented in the development cycle; using these processes can also help CubeSat designs and systems become more secure, reusable, and modular. This research identifies multiple independent variables and measures their effectiveness in driving CubeSat mission success. It seeks to increase the CubeSat mission success rate by developing systems engineering methodologies and tools, and it evaluates the benefits of applying systems engineering methodologies and practices at different stages of the CubeSat project lifecycle and across different CubeSat missions.

Item: Trust and Anti-Autonomy Modelling of Autonomous Systems (North Dakota State University, 2020). Rastogi, Aakanksha.

Human trust in autonomous vehicles is built upon their safe and secure operation in the most ethical, law-abiding manner possible. Despite the technological advancements that autonomous vehicles are equipped with, their perplexing operation on roads often gives away telltale signs of underlying vulnerabilities to threats and attack strategies, which can flag their anti-autonomous traits.
Anti-autonomy refers to any conduct of autonomous vehicles that goes against the principles of autonomy, subsequently resulting in immobilized operation during unexpected roadway situations. The concept of trust is fluid; it is complicated by the anti-autonomous behavior of autonomous vehicles and affects the dimensions of intentionality, human interaction, and adoption of autonomous vehicles. Trust is impacted by intentionality, by the safety and risks associated with autonomous vehicles, and by their overall perception among the human drivers, pedestrians, and bicyclists sharing the roads with them. Collision data involving human drivers of other cars, pedestrians, and bicyclists, with resulting injuries and damage, has a significant negative impact on trust in autonomous-vehicle technology. This dissertation presents and evaluates a new anti-autonomy NoTrust artificial neural network model, built from collision reports involving autonomous vehicles provided by the California DMV from October 2014 to March 2020, the latest reported data at the time. These data were augmented, labelled, classified, pre-processed, and then used to create the NoTrust ANN model with the linear sequential model API in Keras over TensorFlow. The model was used to predict trust in autonomous vehicles. The trained model achieved 100% accuracy, evident in the results of model compilation and training and in plots of validation and training accuracies and losses. Model evaluations and predictions were used to characterize trust, intentionality, and anti-autonomy, and helped establish relationships and interdependencies among trust, intentionality, anti-autonomy, risk, and safety.
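To make the idea of learning a trust/no-trust boundary from labeled collision features concrete, here is a minimal perceptron on invented data. The features, labels, and learning rule are all hypothetical stand-ins; the dissertation's NoTrust model is a Keras/TensorFlow ANN, not this sketch:

```python
def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Classic perceptron rule on 2-feature samples with 0/1 labels."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(samples, labels):
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = y - pred            # -1, 0, or +1
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def predict(model, x):
    w, b = model
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Hypothetical features: (damage severity, autonomous-mode flag);
# label 1 = trust eroded by the collision report, 0 = trust retained.
X = [(0.9, 1), (0.8, 1), (0.7, 1), (0.1, 0), (0.2, 0), (0.1, 1)]
y = [1, 1, 1, 0, 0, 0]
model = train_perceptron(X, y)
print([predict(model, x) for x in X])  # [1, 1, 1, 0, 0, 0] on this separable toy set
```

A real model would of course be validated on held-out data; perfect training accuracy on six points says nothing about generalization.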
Additional analyses of the collision-report data were performed, illustrating the impact of several contributing factors such as vehicle driving mode, damage sustained by the vehicle, pedestrian and bicyclist involvement, weather conditions, roadway surface, lighting conditions, vehicle movement preceding the collision, and collision type.

Item: Increasing the Predictive Potential of Machine Learning Models for Enhancing Cybersecurity (North Dakota State University, 2021). Ahsan, Mostofa Kamrul.

Networks have an increasing influence on our modern life, making cybersecurity an important field of research. Cybersecurity techniques mainly include antivirus software, firewalls, intrusion detection systems (IDSs), and the like; these techniques protect networks from both internal and external attacks. This research is composed of three essays that highlight and improve the applications of machine learning techniques in the cybersecurity domain. Since the feature size and number of observations in cyber-incident data are increasing with the growth of internet usage, conventional defense strategies against cyberattacks are frequently becoming ineffective, while machine learning applications are steadily improving at preventing cyber risks in a timely manner. For the last decade, machine learning and cybersecurity have converged to enhance risk elimination. However, cyber-domain knowledge and machine learning techniques are often not aligned when data-driven intelligent systems are deployed, and these gaps need to be bridged. We have studied the most recent research in this field and documented the most common issues regarding the implementation of machine learning algorithms in cybersecurity.
Based on these findings, we conducted research and experiments to improve quality of service and security strength by discovering new approaches.

Item: Extracting Useful Information and Building Predictive Models from Medical and Health-Care Data Using Machine Learning Techniques (North Dakota State University, 2020). Kabir, Md Faisal.

In healthcare, a large amount of medical data has emerged. To effectively use these data to improve healthcare outcomes, clinicians need to identify the relevant measures and apply the correct analysis methods for the type of data at hand. In this dissertation, we present various machine learning (ML) and data mining (DM) methods that can be applied to the types of data sets available in the healthcare area. The first part of the dissertation investigates DM methods for finding significant information, in the form of rules, in healthcare or medical data. Class association rule mining, a variant of association rule mining, was used to obtain rules with targeted items or class labels; these rules can be used to improve public awareness of different cancer symptoms and could also be useful for initiating prevention strategies. In the second part of the thesis, ML techniques are applied to healthcare or medical data to build predictive models. Three different classification techniques were investigated on a real-world breast-cancer risk-factor data set. Due to the imbalanced nature of the data set, various resampling methods were applied before the classifiers, and a significant improvement in performance was observed when a resampling technique was applied compared to none. Moreover, the super learning technique, which combines multiple base learners, was investigated to boost the performance of classification models. Two forms of super learner were investigated: the first uses two base learners, while the second uses three.
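The resampling step mentioned above can be illustrated with the simplest such method, random oversampling of the minority class. The rows and labels below are toy values, and this is only one of several resampling techniques a study like this might compare:

```python
import random

def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes reach
    the majority-class size (a simple stand-in for the resampling step)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        padded = members + [rng.choice(members)
                            for _ in range(target - len(members))]
        out_rows += padded
        out_labels += [label] * target
    return out_rows, out_labels

# Imbalanced toy set: 5 negatives, 2 positives.
rows = [[1], [2], [3], [4], [5], [6], [7]]
labels = [0, 0, 0, 0, 0, 1, 1]
new_rows, new_labels = random_oversample(rows, labels)
print(new_labels.count(0), new_labels.count(1))  # 5 5
```

Crucially, resampling should be applied only to the training split, never before the train/test split, or the evaluation leaks duplicated rows into the test set.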
The models were then evaluated against well-known benchmark data sets related to the healthcare domain, and the results showed that the super learner model performs better than the individual classifiers and the baseline ensemble. Finally, we assessed prostate-cancer-relevant genes with the most significant correlations to the clinical outcomes of sample type and overall survival. Rules were discovered from the RNA sequencing of prostate-cancer patients. Moreover, we built a regression model and, from it, generated rules for predicting the survival time of patients.

Item: Developing and Validating Active Learning Engagement Strategies to Improve Students' Understanding of Programming and Software Engineering Concepts (North Dakota State University, 2020). Brown, Tamaike Mariane.

The introductory computer programming course is one of the fundamental courses in computer science. Students enrolled in computer science courses at college or university have been reported to lack motivation and engagement when learning introductory programming (CS1). Traditional classrooms with lecture-based delivery of content do not meet the needs of students being exposed to programming for the first time. Students enrolled in first-year programming courses are better served by a platform that provides a self-paced learning environment, quicker feedback, easier access to information, and different levels of learning content and assessment that can keep them motivated and engaged. Introductory programming courses (hereafter referred to as CS1 and CS2 courses) also include students from non-STEM majors who struggle to learn basic programming concepts. Studies report that CS1 courses nationally have high dropout rates, ranging between 30-40% on average.
Some of the reasons researchers cite for the high dropout rate are lack of resource support, engagement, motivation, practice, feedback, and confidence. Although interest in addressing these issues in computing is expanding, the dropout rate for CS1/CS2 courses remains high. The software engineering industry often believes that the academic community is missing the mark in the education of computer science students. Employers recognize that students entering the workforce directly from university training often do not have the complete set of software development skills they will need to be productive, especially in large software development companies.

Item Development and Validation of Feedback-Based Testing Tutor Tool to Support Software Testing Pedagogy (North Dakota State University, 2020) Cordova, Lucas Pascual

Current testing education tools provide coverage-deficiency feedback that either mimics industry code coverage tools or enumerates the instructor tests that were absent from the student's test suite. While useful, these types of feedback are akin to revealing the solution and can inadvertently lead a student down a trial-and-error path rather than toward a systematic approach. In addition to an inferior learning experience, a student may become dependent on the presence of this feedback in the future. Considering these drawbacks, there is an opportunity to develop and investigate alternative feedback mechanisms that promote positive reinforcement of testing concepts. We believe that an inquiry-based learning approach is a better alternative to simply providing the answers, one in which students construct and reconstruct their knowledge through discovery and guided learning techniques. To facilitate this, we present Testing Tutor, a web-based assignment submission platform that supports different levels of testing pedagogy via a customizable feedback engine.
This dissertation is based on the experiences of using Testing Tutor at different levels of the curriculum. The results indicate that the groups receiving conceptual feedback produced higher-quality test suites (higher average code coverage, fewer redundant tests, and higher rates of improvement) than the groups receiving traditional code coverage feedback. Furthermore, students produced higher-quality test suites when the conceptual feedback was tailored to the task level for lower-division student groups and to the self-regulating level for upper-division student groups. We plan to perform additional studies with the following objectives: 1) improve the feedback mechanisms; 2) understand the effectiveness of Testing Tutor's feedback mechanisms at different levels of the curriculum; and 3) understand how Testing Tutor can be used as a tool for instructors to gauge learning and determine whether intervention is necessary to improve students' learning.

Item Scalable Particle Swarm Optimization and Differential Evolution Approaches Applied to Classification (North Dakota State University, 2019) Al-Sawwa, Jamil

Applying nature-inspired methods in the data mining area has been gaining attention from researchers. Classification is a data mining task that aims to analyze historical data by discovering hidden relationships between the input and the output, which help predict an accurate outcome for an unseen input. Classification algorithms based on nature-inspired methods have been used successfully in numerous applications such as medicine and agriculture. However, the amount of data collected or generated in these areas has been increasing exponentially, so extracting useful information from large data sets demands substantial computation time and memory. In addition, many algorithms are unable to handle imbalanced data.
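One standard way to make a classifier sensitive to imbalance, of the kind a cost-sensitive evolutionary classifier can minimize as its fitness, is to charge more for minority-class errors. The sketch below is a hedged illustration under my own assumptions (the cost ratio, labels, and normalization are illustrative, not the dissertation's algorithm); it also shows why plain accuracy is misleading on skewed data.

```python
# Illustrative cost-sensitive error: misclassifying a minority example
# costs `cost_ratio` times as much as misclassifying a majority one.

def cost_sensitive_error(y_true, y_pred, minority_label=1, cost_ratio=10.0):
    """Total misclassification cost, normalized to [0, 1] by the worst case."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t != p:
            total += cost_ratio if t == minority_label else 1.0
    max_cost = sum(cost_ratio if t == minority_label else 1.0 for t in y_true)
    return total / max_cost

# A classifier that ignores the minority class looks fine on plain accuracy:
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # predicts the majority class for everything
plain_error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_error)                            # → 0.05 (misleadingly small)
print(cost_sensitive_error(y_true, y_pred))   # ≈ 0.345 (minority errors exposed)
```

Using such a weighted cost as the fitness function steers an optimizer away from degenerate majority-only solutions without resampling the data.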
Apache Spark is an in-memory big data computing framework that runs on a cluster of nodes. It handles iterative and interactive jobs efficiently and can run up to 100 times faster than Hadoop MapReduce for various applications. The challenge, however, is to find a scalable Apache Spark solution for optimization-based classification algorithms that scales well with large data. In this dissertation, we first introduce new variants of a centroid-based particle swarm optimization (CPSO) classification algorithm to improve its misclassification rate. Furthermore, a scalable particle swarm optimization classification algorithm (SCPSO) is designed and implemented using Apache Spark. Two variants of SCPSO, namely SCPSO-F1 and SCPSO-F2, are proposed based on different fitness functions. The experiments revealed that SCPSO-F1 and SCPSO-F2 utilize the cluster of nodes efficiently and achieve good scalability. Moreover, we propose a cost-sensitive differential evolution classification algorithm to improve the performance of differential evolution classification on imbalanced data sets. The experimental results demonstrate that the proposed algorithm handles highly imbalanced binary data sets efficiently compared to current variants of differential evolution classification algorithms. Finally, we designed and implemented a parallel version of a cost-sensitive differential evolution classifier using the Spark framework. The experiments revealed that the proposed algorithm achieved good speedup and scaleup results and obtained good performance.

Item User-Behavior Trust Modeling in Cloud Security (North Dakota State University, 2019) Alruwaythi, Maryam

As cloud computing increases in popularity by providing a massive number of services, such as computing resources and data centers, the number of attacks is increasing.
Security is a basic concern in cloud computing, and threats can occur both internally and externally. Users can access the cloud infrastructure for software, operating systems, and network infrastructure provided by the cloud service providers (CSPs). Evaluating user behavior in the cloud computing infrastructure is becoming more important both for cloud users and for the CSPs, which must ensure the safety of users accessing the cloud. Because user authentication alone is not enough to ensure users' safety, and because of the rise of insider threats, user behavior must be monitored. User-behavior trust plays a critical role in ensuring users' authenticity as well as their safety. To address this research problem, we propose two models that monitor user behavior in the cloud and then calculate the users' trust values. The proposed models improve on current trust models in several ways. They address trust fraud with the principle of "slow increase." They deal with malicious conduct by progressively aggravating the penalty (the principle of "fast decline"). They reflect a user's latest credibility by excluding expired trust policies from the trust calculation. Finally, they evaluate users on a large body of evidence, which keeps the users' trust values stable. We generated a data set simulating audit logs that contain the designed user-behavior patterns, and we used this data set to evaluate the proposed models.

Item Using Learning Styles to Improve Software Requirements Quality: An Empirical Investigation (North Dakota State University, 2017) Goswami, Anurag

The success of a software organization depends on its ability to deliver a quality software product within time and budget constraints.
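The "slow increase" and "fast decline" principles from the user-behavior trust abstract above can be sketched as a simple update rule: good behavior raises trust gradually, while each violation cuts it sharply and compounds the penalty. The gain and penalty parameters and the exact update formulas here are my own illustrative assumptions, not the dissertation's models.

```python
# Hedged sketch of a "slow increase / fast decline" trust update.

def update_trust(trust, behaved_well, violations_so_far,
                 gain=0.02, penalty=0.5):
    """Return the new trust value in [0, 1] after one observed behavior."""
    if behaved_well:
        # Slow increase: small gain that shrinks as trust approaches 1.
        trust += gain * (1.0 - trust)
    else:
        # Fast decline: repeated misconduct aggravates the penalty.
        trust *= penalty ** (violations_so_far + 1)
    return max(0.0, min(1.0, trust))

# Three good actions, a violation, one good action, a second violation:
trust, violations = 0.5, 0
for behaved_well in [True, True, True, False, True, False]:
    trust = update_trust(trust, behaved_well, violations)
    if not behaved_well:
        violations += 1
    print(round(trust, 4))
```

Note the asymmetry: the trust lost to a single violation takes many good actions to rebuild, which is what deters trust fraud via brief good behavior.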
To ensure the delivery of quality software, software inspections have proven to be an effective method that helps developers detect and remove problems from artifacts during the early stages of the software lifecycle. In spite of the reported benefits of inspection, the effectiveness of the inspection process depends strongly on the varying ability of individual inspectors. Software engineering research aimed at understanding the factors (e.g., education level, experience) that can positively impact individual and team inspection effectiveness has met with limited success. This dissertation leverages psychology research on learning styles (LS), a measure of an individual's preference for perceiving and processing information, to help understand and improve individual and team inspection performance. To gain quantitative and qualitative insights into the LSs of software inspectors, this dissertation reports results from a series of empirical studies in university and industry settings that evaluate the impact of LSs on individual and team inspection performance. It aims to help software managers create effective and efficient inspection teams based on the LSs and reading patterns of individual inspectors, thereby improving software quality.