Search Results
Now showing 1 - 10 of 67
Item: Adaptive Regression Testing Strategies for Cost-Effective Regression Testing (North Dakota State University, 2013). Schwartz, Amanda Jo.

Regression testing is an important but expensive part of the software development life-cycle. Many different techniques have been proposed for reducing the cost of regression testing. To date, much research has compared regression testing techniques, but very little has been done to aid practitioners and researchers in choosing the most cost-effective technique for a particular regression testing session. One recent study investigated this problem and proposed Adaptive Regression Testing (ART) strategies to aid practitioners in choosing the most cost-effective technique for a specific version of a software system. The results of that study showed that the techniques chosen by the ART strategy were more cost-effective than techniques that did not consider system lifetime and testing processes. This work has several limitations, however. First, it considers only one ART strategy; many other strategies could be developed and studied that might be more cost-effective. Second, the ART strategy used the Analytical Hierarchy Process (AHP). The AHP method is sensitive to the subjective weights assigned by the decision maker, and it is very time-consuming because it requires many pairwise comparisons. Pairwise comparisons also limit the scalability of the approach and are often found to be inconsistent. This work proposes three new ART strategies to address these limitations. One strategy utilizing the fuzzy AHP method is proposed to address imprecision in the judgments made by the decision maker. A second strategy utilizing a fuzzy expert system is proposed to reduce the time required of the decision maker, eliminate inconsistencies due to pairwise comparisons, and increase scalability.
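The AHP step criticized above derives criterion weights from a pairwise comparison matrix and checks those comparisons for consistency. A minimal sketch of the standard geometric-mean approximation and Saaty's consistency ratio follows; the criteria and judgment values are invented for illustration and are not taken from the dissertation.

```python
import math

def ahp_weights(A):
    """Approximate AHP priority weights from a pairwise comparison
    matrix A using the geometric mean of each row."""
    n = len(A)
    geo = [math.prod(row) ** (1.0 / n) for row in A]
    total = sum(geo)
    return [g / total for g in geo]

def consistency_ratio(A, w):
    """Saaty consistency ratio CR = CI / RI with
    CI = (lambda_max - n) / (n - 1)."""
    n = len(A)
    # lambda_max estimated by averaging (A w)_i / w_i over the rows
    Aw = [sum(a * wj for a, wj in zip(row, w)) for row in A]
    lam = sum(x / wi for x, wi in zip(Aw, w)) / n
    ci = (lam - n) / (n - 1)
    ri = {3: 0.58, 4: 0.90, 5: 1.12}[n]  # Saaty's random indices
    return ci / ri

# Hypothetical comparison of three criteria for choosing a regression
# testing technique: cost vs. fault detection vs. setup time.
A = [[1.0, 3.0, 5.0],
     [1 / 3, 1.0, 3.0],
     [1 / 5, 1 / 3, 1.0]]
w = ahp_weights(A)
cr = consistency_ratio(A, w)
```

A CR below 0.1 is conventionally taken to mean the judgments are acceptably consistent; the quadratic growth in the number of comparisons as criteria are added is exactly the scalability problem the abstract mentions.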
A third strategy utilizing the Weighted Sum Model is proposed to study the performance of a simple, low-cost strategy. A series of empirical studies is then performed to evaluate the new strategies. The results of the studies show that the strategies proposed in this work are more cost-effective than the strategy presented in the previous study.

Item: Mining for Significant Information from Unstructured and Structured Biological Data and Its Applications (North Dakota State University, 2012). Al-Azzam, Omar Ghazi.

Massive amounts of biological data are being accumulated in science. Searching for significant, meaningful information and patterns in the different types of data is necessary for gaining knowledge from the large amounts of data available to users. However, data mining techniques do not normally deal with significance. Integrating data mining techniques with standard statistical procedures provides a way of mining statistically significant, interesting information from both structured and unstructured data. In this dissertation, different algorithms for mining significant biological information from both unstructured and structured data are proposed. A weighted-density-based approach is presented for mining item data from unstructured textual representations. Different algorithms in the area of radiation hybrid mapping are developed for mining significant information from structured binary data. The proposed algorithms have different applications to the ordering problem in radiation hybrid mapping, including identifying unreliable markers and building solid framework maps. The effectiveness of the proposed algorithms in improving map stability is demonstrated; map stability is determined based on resampling analysis. The proposed algorithms deal effectively and efficiently with multidimensional data and also reduce computational cost dramatically.
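The Weighted Sum Model mentioned in the first abstract above is the simplest multi-criteria strategy: each alternative receives a score that is the weighted sum of its criterion ratings, and the highest score wins. A minimal sketch, with invented weights, techniques, and ratings purely for illustration:

```python
def wsm_score(ratings, weights):
    """Weighted Sum Model: the score of one alternative is the
    weighted sum of its per-criterion ratings (weights sum to 1)."""
    return sum(r * w for r, w in zip(ratings, weights))

# Hypothetical criteria (all rated so higher is better):
# cost-effectiveness, fault-detection ability, setup speed.
weights = [0.5, 0.3, 0.2]
techniques = {
    "retest-all":     [0.2, 0.9, 0.8],
    "selection":      [0.7, 0.7, 0.6],
    "prioritization": [0.9, 0.6, 0.5],
}
scores = {name: wsm_score(r, weights) for name, r in techniques.items()}
best = max(scores, key=scores.get)
```

The appeal of this strategy, as the abstract notes, is its low cost: no pairwise comparisons and no fuzzy machinery, just one dot product per alternative.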
Evaluation shows that the proposed algorithms outperform comparative methods in terms of both accuracy and computation cost.

Item: Heuristic Clustering with Secured Routing in Two Tier Sensor Networks (North Dakota State University, 2013). Gagneja, Kanwalinderjit Kaur.

This study addresses the management of Heterogeneous Sensor Networks (HSNs) in an area of interest. The use of sensors in our day-to-day life has increased dramatically; in ten to fifteen years sensor nodes may cover the whole world and could be accessed through the Internet. Currently, sensors are in use for such things as vehicular movement tracking, nuclear power plant monitoring, fire incident reporting, traffic control, and environmental monitoring. There is vast potential for further applications, such as entertainment, drug trafficking, border surveillance, crisis management, underwater environment monitoring, and smart spaces, so this research area holds considerable promise. Sensors have limited resources, and researchers have invented methods to deal with the related issues, but past research has handled security, routing, and the clustering of sensors separately. Since route selection depends directly on the positions of the nodes, and sets of resources may change dynamically, cumulative and coordinated activities are essential to maintain the organizational structure of the deployed sensors. To conserve sensor-network energy, it is therefore better to follow a holistic approach that takes care of both clustering and secure routing. In this research, we have developed an efficient key management approach with an improved tree routing algorithm for clustered heterogeneous sensor networks. The simulation results show that this scheme offers good security and uses less computation, with substantial savings in memory requirements, when compared with some other key management, clustering, and routing techniques.
The low-end nodes are simple and low cost, while the high-end nodes are costly but provide significantly more processing power. In this type of sensor network, the low-end nodes are clustered and report to a high-end node, which in turn uses a network backbone to route data to a base station. Initially, we partition the given area into Voronoi clusters; Voronoi diagrams generate polygonal clusters using Euclidean distance. Since sensor-network routing is multi-hop, we apply a tabu search to adjust some of the nodes in the Voronoi clusters, after which the clusters work with hop counts instead of distance. When an event occurs in the network, low-end nodes gather and forward data to cluster heads using the Secure Improved Tree Routing approach. The routing among the low-end nodes, the high-end nodes, and the base station is made secure and efficient by applying a 2-way handshaking secure Improved Tree Routing (ITR) technique. The secure ITR data routing procedure improves the energy efficiency of the network by reducing the number of hops needed to reach the base station. We gain robustness and energy efficiency by reducing the vulnerability points in the network through alternatives to shortest-path tree routing. In this way, a complete solution is provided for data traveling in a two-tier heterogeneous sensor network by reducing the hop count and making the routing secure and energy efficient. Empirical evaluations show how the described algorithm performs with respect to delivery ratio, end-to-end delays, and energy usage.

Item: Towards Better Engineering of Enterprise Resource Planning Systems (North Dakota State University, 2016). Asgar, Talukdar Sabbir.

In spite of a high implementation failure rate, Enterprise Resource Planning (ERP) software remains a popular choice for most businesses. When it succeeds, ERP software provides effective integration of formerly isolated systems, and this integration yields significant business efficiencies.
Replacing legacy systems with ERP software requires a great many trade-offs. We found that using a bipartite graph can facilitate requirements elicitation in ERP procurement, and it can have a great impact on activities in subsequent phases, including product line engineering, domain analysis, test-driven development, and knowledge reuse in the system development life cycle (SDLC). A bigraph representation of legacy and ERP requirements also helps determine enhancement needs and gives stakeholders better control in deciding the scope of development and testing. ERP streamlines operations and the flow of information across an organization. Business functions in an ERP system are always master-data driven; migration of data from the legacy system to ERP is therefore a critical success factor for ERP procurement projects. Correct data conversion is essential for integration and acceptance testing. It is a complex procedure and requires additional effort because of the large volume of data, and the architecture and design of the data conversion process must ensure referential integrity among the different business modules. Our process for data conversion starts with a test-first approach and then executes the conversion programs in parallel, replacing the old sequential style of execution. Through this process we could correct data mapping errors and data anomalies before conversion. Parallelized execution drastically reduces the time needed to run the conversion programs and provides more time for testing. We verified the feasibility of our approach in multiple industrial projects.

Item: Trust and Anti-Autonomy Modelling of Autonomous Systems (North Dakota State University, 2020). Rastogi, Aakanksha.

Human trust in autonomous vehicles is built upon their safe and secure operability in the most ethical, law-abiding manner possible.
Despite the technological advancements that autonomous vehicles are equipped with, their perplexing operation on roads often gives away telltale signs of underlying vulnerabilities to threats and attack strategies, which can flag their anti-autonomous traits. Anti-autonomy refers to any conduct of autonomous vehicles that goes against the principles of autonomy, subsequently resulting in immobilized operation during unexpected roadway situations. The concept of trust is fluid; it is complicated by the anti-autonomous behavior of autonomous vehicles and affects the dimensions of intentionality, human interaction, and adoption of autonomous vehicles. Trust is impacted by intentionality, by the safety and risks associated with autonomous vehicles, and by their overall perception among the human drivers, pedestrians, and bicyclists sharing the roads with them. Collision data involving human drivers of other cars, pedestrians, and bicyclists, with resulting injuries and damages, has a significant negative impact on trust in autonomous vehicle technology. This dissertation presents and evaluates a new anti-autonomy NoTrust Artificial Neural Network model built on collision reports involving autonomous vehicles provided by the California DMV from October 2014 to March 2020, the latest data reported at the time. The data was augmented, labelled, classified, and pre-processed, and then used to create the NoTrust ANN model using the linear sequential model libraries in Keras over TensorFlow. The model was used to predict trust in autonomous vehicles. The trained model achieved 100% accuracy, as evident in the results of model compilation and training and in the plots of training and validation accuracy and loss.
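The setup described above is, at its core, a sequential network predicting a binary trust label from collision-report features. The same idea can be illustrated without Keras by a single logistic neuron trained with gradient descent; this is only a stand-in sketch on invented toy features, not the author's NoTrust model, architecture, or data.

```python
import math
import random

def train_logistic(X, y, lr=0.5, epochs=2000, seed=0):
    """A single logistic neuron trained by per-sample gradient descent;
    a minimal stand-in for a sequential binary classifier."""
    rnd = random.Random(seed)
    w = [rnd.uniform(-0.1, 0.1) for _ in X[0]]
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi                      # gradient of log-loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, xi):
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1 if z >= 0 else 0

# Invented toy features: [autonomous_mode, damage_severity];
# label 1 = "trustworthy" (purely illustrative, not DMV data).
X = [[1, 0.9], [1, 0.7], [0, 0.2], [0, 0.1], [1, 0.8], [0, 0.3]]
y = [0, 0, 1, 1, 0, 1]
w, b = train_logistic(X, y)
acc = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
```

On a tiny, linearly separable toy set like this, training accuracy reaches 100%; the abstract's perfect accuracy on real collision reports is a much stronger claim that depends on the dataset and labeling.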
Model evaluations and predictions were used to characterize trust, intentionality, and anti-autonomy, and helped establish relationships and inter-dependencies among trust, intentionality, anti-autonomy, risk, and safety. Additional analyses of the collision report data were performed, illustrating the impact of several contributing factors such as vehicle driving mode, damage sustained by the vehicle, pedestrian and bicyclist involvement, weather conditions, roadway surface, lighting conditions, vehicle movement preceding the collision, and collision type.

Item: Scalable Particle Swarm Optimization and Differential Evolution Approaches Applied to Classification (North Dakota State University, 2019). Al-Sawwa, Jamil.

Applying nature-inspired methods in the data mining area has been gaining attention from researchers. Classification is a data mining task that aims to analyze historical data by discovering hidden relationships between the input and the output, which helps predict an accurate outcome for an unseen input. Classification algorithms based on nature-inspired methods have been used successfully in numerous applications such as medicine and agriculture. However, the amount of data collected or generated in these areas has been increasing exponentially, so extracting useful information from large data sets requires substantial computation time and memory. In addition, many algorithms cannot handle imbalanced data. Apache Spark is an in-memory big data computing framework that runs on a cluster of nodes; it handles iterative and interactive jobs efficiently and runs up to 100 times faster than Hadoop MapReduce for various applications. The challenge, however, is to find scalable solutions using Apache Spark for optimization-based classification algorithms that scale well with large data.
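Particle swarm optimization, the optimizer at the core of the classifiers discussed above, maintains a swarm of candidate solutions and pulls each one toward its own best position and the swarm-wide best. A generic minimal sketch on a toy objective (the canonical PSO update, not the centroid-based classifier itself):

```python
import random

def pso_minimize(f, dim, n_particles=20, iters=200, seed=1):
    """Canonical PSO: v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)."""
    rnd = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5
    xs = [[rnd.uniform(-5, 5) for _ in range(dim)]
          for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]              # each particle's best position
    pbest_val = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rnd.random(), rnd.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            val = f(xs[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = xs[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = xs[i][:], val
    return gbest, gbest_val

# Toy objective: the sphere function, whose minimum is 0 at the origin.
best, best_val = pso_minimize(lambda x: sum(xi * xi for xi in x), dim=3)
```

In a PSO classifier, the position vector encodes the classifier's parameters (e.g., class centroids) and f is a fitness function such as the misclassification rate; scaling the fitness evaluation over a Spark cluster is the hard part the abstract addresses.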
In this dissertation, we first introduce new variants of a centroid-based particle swarm optimization (CPSO) classification algorithm in order to improve its performance in terms of misclassification rate. Furthermore, a scalable particle swarm optimization classification algorithm (SCPSO) is designed and implemented using Apache Spark. Two variants of SCPSO, namely SCPSO-F1 and SCPSO-F2, are proposed based on different fitness functions. The experiments revealed that SCPSO-F1 and SCPSO-F2 utilize the cluster of nodes efficiently and achieve good scalability. Moreover, we propose a cost-sensitive differential evolution classification algorithm to improve the performance of the differential evolution classification algorithm when applied to imbalanced data sets. The experimental results demonstrate that the proposed algorithm handles highly imbalanced binary data sets efficiently compared with current variants of differential evolution classification algorithms. Finally, we designed and implemented a parallel version of a cost-sensitive differential evolution classifier using the Spark framework. The experiments revealed that the proposed algorithm achieved good speedup, scaleup, and overall performance.

Item: Software Engineering Methodologies in Developing a Railway Condition Monitoring System (North Dakota State University, 2022). Bhardwaj, Bhavana.

With the continuous growth of rail track geometry irregularities due to aging, environmental factors, and wheel loads, rail track requires frequent maintenance. Railroads rely on the precise and correct localization and identification of track irregularities, which can significantly damage infrastructure and create life-threatening conditions. Monitoring the condition of railroad tracks is therefore vital for ensuring the safety, reliability, and cost-efficiency of operations. Consequently, agencies inspect all tracks twice a week per federal track safety regulations.
However, existing methods of track inspection are expensive, slow, require track closure, and pose a high risk to workers. The technical constraints of these methods impede network-wide scaling to all railroads. More frequent, continuous, and network-wide monitoring to detect and fix irregularities can help reduce the risk of harm, fatalities, property damage, and financial losses. This work introduces and develops a generalized, scalable, affordable inspection and monitoring system called the Railway Autonomous Inspection Localization System (RAILS). In particular, the study aims to detect, locate, and characterize track-related issues. The research focuses on designing the RAILS architecture, implementing data collection, and building algorithms for inertial signal feature extraction, data processing, signal alignment, and signal filtering. Case studies validate and characterize system accuracy by estimating the position of detected irregularities based on a linear referencing system. In one case study, the estimated position of an irregularity is compared with the actual position from ground truth data (GTA) observed by a railroad inspector. In another, a railroad inspector verifies the estimated position of the irregularity to demonstrate the system's effectiveness and affordability for practical applications. Railroad agencies employing the developed methods will therefore benefit from reliable track and equipment condition information for making informed decisions that lead to resource optimization.
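One common way to implement the signal alignment step mentioned above is cross-correlation: slide one signal over the other and pick the lag that maximizes their correlation. The sketch below is only an illustration of that general technique, not the dissertation's actual alignment algorithm, and the toy traces are invented.

```python
def best_lag(a, b, max_lag):
    """Return the sample shift of b that maximizes the cross-correlation
    with a, searched over lags in [-max_lag, max_lag]."""
    def corr_at(lag):
        s = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                s += x * b[j]
        return s
    return max(range(-max_lag, max_lag + 1), key=corr_at)

# Toy inertial-style trace containing one bump, and a delayed copy:
a = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 1, 3, 1, 0, 0]   # the same bump, 3 samples later
lag = best_lag(a, b, max_lag=5)
```

Once the lag is known, the delayed recording can be shifted into the reference frame of the first, which is what makes positions from repeated passes comparable under a linear referencing system.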
The conclusion of this research outlines the significant potential of the proposed system, along with its limitations and the future work needed for practical, real-time, autonomous deployment.

Item: User-Behavior Trust Modeling in Cloud Security (North Dakota State University, 2019). Alruwaythi, Maryam.

With cloud computing increasing in popularity by providing a massive number of services, such as resources and data centers, the number of attacks is increasing. Security is a basic concern in cloud computing, and threats can occur both internally and externally. Users can access the cloud infrastructure for software, operating systems, and network infrastructure provided by the cloud service providers (CSPs). Evaluating users' behavior in the cloud-computing infrastructure is becoming more important for both cloud users and the CSPs, which must ensure safety for users accessing the cloud. Because user authentication alone is not enough to ensure the users' safety, and because of the rise of insider threats, the users' behavior must be monitored. User-behavior trust plays a critical role in ensuring the users' authenticity as well as their safety. To address this research problem, we propose two models that monitor users' behavior in the cloud and then calculate a trust value for each user. The proposed models improve on current trust models: they address the issue of trust fraud with the concept of "slow increase"; they deal with malicious conduct by progressively aggravating the penalty (the principle of "fast decline"); they reflect the users' latest credibility by excluding expired trust policies from the trust calculation; and they evaluate users based on a large amount of evidence, which ensures that the users' trust values are stable. We generate a dataset to simulate audit logs containing the designed user-behavior patterns.
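The "slow increase, fast decline" principle named above can be made concrete with a toy update rule: normal behavior raises the trust value by a small, bounded step, while a violation cuts it sharply. The constants and the update form here are invented for illustration and are not the dissertation's actual models.

```python
def update_trust(trust, violation, gain=0.02, penalty=0.5):
    """'Slow increase': a normal event nudges trust up a little,
    saturating toward 1.0. 'Fast decline': a violation halves it."""
    if violation:
        trust *= penalty                       # harsh penalty on misconduct
    else:
        trust = min(1.0, trust + gain * (1.0 - trust))
    return trust

# A user behaves normally ten times, then commits two violations.
t = 0.5
for _ in range(10):                            # slow climb
    t = update_trust(t, violation=False)
after_good = t
for _ in range(2):                             # fast decline
    t = update_trust(t, violation=True)
after_bad = t
```

The asymmetry is the point: ten good actions raise trust only modestly, while two violations drop it well below where it started, which makes trust fraud (building up credit cheaply before attacking) unprofitable.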
We then use the dataset to evaluate our proposed models.

Item: A New Structural Feature for Lysine Post-Translation Modification Prediction Using Machine Learning (North Dakota State University, 2021). Liu, Yuan.

Lysine post-translational modification (PTM) plays a vital role in modulating multiple biological processes and functions. Lab-based lysine PTM identification is laborious and time-consuming, which impedes large-scale screening. Many computational tools have been proposed to facilitate PTM identification in silico using sequence-based protein features. Protein structure is another crucial aspect of proteins that should not be neglected; to the best of our knowledge, however, there is no structural feature dedicated to PTM identification. We proposed a novel spatial feature that captures rich structure information in a succinct form. The dimension of this feature is much lower than that of the sequence and structural features used in previous studies. When the proposed feature was used to predict lysine malonylation sites, it achieved performance comparable to other state-of-the-art methods that use much higher-dimensional features. The low dimensionality of the proposed feature would be very helpful for building interpretable predictors for various applications involving protein structures. We further attempted to develop a reliable benchmark dataset and to evaluate the predictive performance of multiple sequence- and structure-based features. The results indicate that our proposed spatial feature achieves competent performance and that other structural features can also contribute to PTM prediction.
Even though the use of protein structure in lysine PTM prediction is still at an early stage, we can expect structure-based features to play a more crucial role in PTM site prediction.

Item: Fuzzy Reasoning Based Evolutionary Algorithms Applied to Data Mining (North Dakota State University, 2015). Chen, Min.

Data mining and information retrieval are difficult tasks for various reasons. First, as the volume of data increases tremendously, much of the data is complex, large, imprecise, uncertain, or incomplete. Furthermore, information retrieval may be imprecise or subjective, so comprehensible and understandable results are required by users during data mining and knowledge discovery. Fuzzy logic has become an active research area because of its capability for handling perceptual uncertainties, such as ambiguity and vagueness, and its excellent ability to describe nonlinear systems. This dissertation focuses on two main paradigms. The first applies fuzzy inductive learning to classification problems; a fuzzy classifier based on discrete particle swarm optimization and a fuzzy decision tree classifier are implemented in this paradigm. The fuzzy classifier based on discrete particle swarm optimization comprises a discrete particle swarm optimization classifier and a fuzzy discrete particle swarm optimization classifier. The discrete particle swarm optimization classifier is devised for and applied to discrete data, while the fuzzy discrete particle swarm optimization classifier is an improved version that can handle both discrete and continuous data in order to manage uncertainty and imprecision. A fuzzy decision tree classifier with a feature selection method, based on the ideas of mutual information and genetic algorithms, is also proposed. The second paradigm is fuzzy cluster analysis, whose purpose is to provide efficient approaches for identifying similar or dissimilar descriptions of data instances.
The shapes of the clusters are either hyper-spherical or hyper-planar. A fuzzy c-means clustering approach based on particle swarm optimization, whose clustering prototype is hyper-spherical, is proposed to automatically determine the optimal number of clusters. In addition, the fuzzy c-regression model, which has hyper-planar clusters, has received much attention in the recent literature on nonlinear system identification and has been successfully employed in various areas. Thus, a fuzzy c-regression model clustering algorithm is applied to color image segmentation.
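The fuzzy c-means procedure underlying the hyper-spherical clustering above alternates two standard updates: cluster centers are membership-weighted means, and memberships are inverse relative distances. A bare-bones one-dimensional sketch follows, without the PSO layer or the automatic choice of cluster count described in the abstract; the data points are invented.

```python
def fcm_1d(xs, c=2, m=2.0, iters=50):
    """Fuzzy c-means on scalars; returns (centers, memberships), where
    u[i][j] is the degree to which point j belongs to cluster i.
    Assumes c >= 2; centers start evenly spread over the data range."""
    lo, hi = min(xs), max(xs)
    centers = [lo + k * (hi - lo) / (c - 1) for k in range(c)]
    u = [[0.0] * len(xs) for _ in range(c)]
    for _ in range(iters):
        # membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
        for j, x in enumerate(xs):
            d = [abs(x - ci) or 1e-12 for ci in centers]  # avoid /0
            for i in range(c):
                u[i][j] = 1.0 / sum((d[i] / dk) ** (2.0 / (m - 1.0))
                                    for dk in d)
        # center update: c_i = sum_j u_ij^m x_j / sum_j u_ij^m
        for i in range(c):
            den = sum(u[i][j] ** m for j in range(len(xs)))
            num = sum((u[i][j] ** m) * x for j, x in enumerate(xs))
            centers[i] = num / den
    return centers, u

xs = [0.0, 0.2, 0.4, 9.6, 9.8, 10.0]   # two obvious groups
centers, u = fcm_1d(xs)
```

Unlike hard k-means, every point keeps a graded membership in every cluster (each column of u sums to 1), which is what lets fuzzy clustering express the ambiguity and vagueness the abstract emphasizes.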