Computer Science Masters Theses
Permanent URI for this collection: hdl:10365/32553
Browsing Computer Science Masters Theses by Program "Computer Science"
Now showing 1 - 20 of 68
Item Adapting Web Page Tables on Mobile Web Browsers: Results from Two Controlled Empirical Studies (North Dakota State University, 2014) Annadi, Ramakanth Reddy
Displaying web page content on mobile screens is a challenging task, and users often have difficulty retrieving the relevant data. It can force them to adopt a time-consuming hunt-and-peck strategy. Applying design principles can improve the presentation of web page content and reduce the time needed to view it. This is especially true of HTML tabular data. This thesis discusses the background of gestalt design principles and their application to HTML tabular data. An empirical study was performed to investigate the usability of two types of adaptive styles, namely single-layout and multi-layout. This thesis also compares the adaptive styles that use gestalt principles with simple HTML tabular data on mobile screens. A controlled study involving university students showed that the adaptive layout styles improved the efficiency of finding information on the website by applying gestalt principles and eliminating horizontal scrolling.

Item Adaptive Regression Testing Strategy: An Empirical Study (North Dakota State University, 2012) Arafeen, Md. Junaid
When software systems evolve, different amounts of code modification can be involved in different versions. These factors can affect the costs and benefits of regression testing techniques, and thus there may be no single regression testing technique that is the most cost-effective for every version. To date, many regression testing techniques have been proposed, but no research has been done on the problem of helping practitioners systematically choose appropriate techniques for new versions as systems evolve.
To address this problem, we propose adaptive regression testing (ART) strategies that attempt to identify the regression testing techniques that will be the most cost-effective for each regression testing session, considering the organization’s situation and testing environment. To assess our approach, we conducted an experiment focusing on test case prioritization techniques. Our results show that prioritization techniques selected by our approach can be more cost-effective than those used by the control approaches.

Item Analysis of Java's Common Vulnerabilities and Exposures in GitHub's Open-Source Projects (North Dakota State University, 2022) Akanmu, Semiu
Java developers rely on code reusability because it saves time and effort. However, they are exposed to vulnerabilities in publicly available open-source software (OSS) projects. This study employed a multi-stage research approach to investigate the extent to which open-source Java projects are secure. The research process includes text analysis of Java’s Common Vulnerabilities and Exposures (CVE) descriptions and static code analysis using GitHub’s CodeQL. This study found (a) cross-site scripting, (b) buffer overflow (though analyzed as array index out of bounds), (c) data deserialization, (d) input non-validation for an untrusted object, and (e) validation method bypass to be the prevalent Java vulnerabilities in the MITRE CVEs. The static code analysis of the seven (7) compatible Java projects out of the top 100 projects cloned from GitHub revealed a 71.4% presence of the array index out-of-bounds vulnerability.

Item Analysis of SDR to Detect Long Range RFID Badge Cloners (North Dakota State University, 2022) Knecht, Brett
This thesis proposes a way of using software defined radio (SDR) to detect when radio frequency identification (RFID) badge credentials are being captured.
A method for using SDR to detect when badge cloning technologies are in use on the premises is presented, tested, and analyzed. This thesis presents an overview of the problem with badge systems and a background literature review. Next, the proposed method of detection and its workings are presented. Then, the strategy for evaluating the method’s performance is described. This is followed by a discussion and evaluation of the results. Finally, the thesis concludes with a discussion of the method’s potential benefits and proposed future work.

Item Anonymity and Hostile Node Identification in Wireless Sensor Networks (North Dakota State University, 2010) Reindl, Phillip Steven
In many secure wireless network attack scenarios, the source of a data packet is as sensitive as the data it contains. Existing approaches to providing source anonymity in wireless sensor networks (WSN) are not frugal in terms of transmission overhead. We present a set of schemes to provide secure source anonymity. As the state of the art in WSN advances, researchers increasingly look to heterogeneous network topologies. We leverage high-powered cluster head nodes to further reduce transmission overhead and provide excellent scalability. A significant threat to WSN is the insider attack, due to the ease of tampering with low-cost sensors. Should a node become compromised and start making malicious collisions, it is desirable to identify the corrupt node and revoke its keys. We present schemes to identify the source of an arbitrary transmission in a reliable and distributed fashion.

Item Application of Memory-Based Collaborative Filtering to Predict Fantasy Points of NFL Quarterbacks (North Dakota State University, 2019) Paramarta, Dienul Haq Ambeg
Subjective expert projections have traditionally been used to predict points in fantasy football, while machine prediction applications are limited. Memory-based collaborative filtering has been widely used in the recommender system domain to predict ratings and recommend items.
In this study, user-based and item-based collaborative filtering were explored and implemented to predict the weekly statistics and fantasy points of NFL quarterbacks. The predictions from three seasons were compared against expert projections. On both weekly statistics and total fantasy points, the implementations could not make significantly better predictions than experts. However, the predictions from the implementations improved the accuracy of other regression models when used as an additional feature.

Item An Architecture for the Implementation and Distribution of Multiuser Virtual Environments. (North Dakota State University, 2010) Dischinger, Benjamin James
JavaMOO is an architecture for creating multiuser virtual environments focusing on domain-specific design and rapid development. JavaMOO components use best practices and extensible design for system configuration, client-server communication, event handling, object persistence, content delivery, and agent control. Application dependencies such as database and web servers are embedded, promoting wide dissemination by decreasing management overhead. The focus of this thesis is the design and implementation of the JavaMOO architecture and how it helps improve the state of multiuser virtual environments.

Item An Artificial Immune System Heuristic in a Smart Electrical Grid (North Dakota State University, 2014) Chowdhury, Md. Minhaz
The immune system of the human body follows a process that is adaptive and learns via experience. Some algorithms are designed to take advantage of this process to determine solutions for complex problem domains. The collection of these algorithms is known as Artificial Immune Systems. Among this collection, one important algorithm is "The Danger Theory." In this thesis, an application of the algorithm has been implemented to solve an electrical grid problem. The problem of interest is the automatic detection of faulty and failure conditions in the electrical grid.
A novel application of the Artificial Immune System algorithm is presented to solve this problem (i.e., to find faults in electrical-grid data in an automated fashion). The methodology treats streams of electrical-grid data as artificial antigens, and uses artificial antibodies to identify and locate potentially harmful conditions in the grid. The results demonstrate that the approach is promising. I believe this approach makes a good contribution to the emerging field of Smart Grids.

Item Association Rule Mining of Biological Field Data Sets (North Dakota State University, 2017) Shrestha, Anuj
Association rule mining is an important data mining technique, yet its use in association analysis of biological data sets has been limited. This mining technique was applied to two biological data sets, a genome data set and a damselfly data set. The raw data sets were pre-processed, and then association analysis was performed with various configurations. The pre-processing task involves minimizing the number of association attributes in the genome data and creating the association attributes in the damselfly data. The configurations include generation of single/maximal rules and handling of single/multiple tier attributes. Both data sets have a binary class label, and association analysis is used to find the attributes of importance to each of these class labels. The results (rules) from association analysis are then visualized using graph networks that incorporate association attributes such as support and confidence, differential color schemes, and features from the pre-processed data.

Item An Automated Approach for Discovering Functional Risk-Inducing Flaws in Software Designs (North Dakota State University, 2015) Hassan, Amro Salem Salem
For safety critical applications, it is necessary to ensure that risk-inducing flaws do not exist in the final product. To date, many risk-based testing techniques have been proposed. The majority of these techniques address flaws in the implementation.
However, since the overhead of software flaws increases the later they are discovered in the development process, it is important to test for these flaws earlier in the development process. Few approaches have addressed the problem of testing for risk-inducing flaws in the design phase. These approaches are manual, which makes them hard to apply to large, complicated software designs. To address this problem, we propose an automated approach for testing designs for risk-inducing flaws. To evaluate our approach, we performed an experiment focusing on specifications of safety critical systems. Our results show that the proposed approach could be effective in discovering functional flaws in behavioral designs that expose a risk.

Item Blood Glucose Prediction Models for Personalized Diabetes Management (North Dakota State University, 2018) Fernando, Warnakulasuriya Chandima
Effective blood glucose (BG) control is essential for patients with diabetes. This calls for an immediate need to closely keep track of patients' BG level all the time. However, sometimes individual patients may not be able to monitor their BG level regularly due to all kinds of real-life interference. To address this issue, in this paper we propose machine-learning based prediction models that can automatically predict patients' BG level based on their historical data and known current status. We take two approaches: one predicts BG level using only an individual's data, and the other uses population data. Our experimental results illustrate the effectiveness of the proposed models.

Item Chemical Compound Classification Ensemble (North Dakota State University, 2013) Zhu, Ya
In health science research, scientists often need to screen numerous chemical compounds to find drugs that can treat a disease. The process of testing the functionality of these compounds in the laboratory is very time-consuming. Computational methods have been used to accelerate this process.
These computational methods are implemented based on the principle that chemical compounds with similar structure often have similar function. Thus, these methods maintain a database of chemical compounds whose function has been verified using laboratory experiments. The database contains the chemical structural formula of a compound, the 3D coordinates of every atom, and whether it has a certain function, e.g. whether it can kill a virus. Then, for a new compound, the programs compare its structure with those in the database and predict whether it has the function based on the structural similarity. Thus, predicting the function of a compound is a two-class classification problem. In this project, we try to address this two-class classification problem using global and local similarity between compounds. The global similarity measures the overall structural resemblance between two compounds. When a group of compounds have the same function, they usually share some common sub-structures. These common sub-structures may correspond to their functional sites. Local similarity is computed based on the occurrences of common sub-structures between compounds. We built several classification models based on global and local similarity. To improve the classification result, we used an ensemble of those models to predict the function of compounds in NCI cancer data sets. We predict whether a compound can inhibit cancer cell growth or not, obtaining AUC higher than 80% for five datasets. We compare our results with other state-of-the-art methods. Our classification result is the best in all five datasets. Our results show that local similarity is more useful than global similarity in predicting compound function.
An ensemble method integrating global and local similarity achieves much better performance than single prediction models.

Item Classification of LiDar Data Using Window-Based Techniques (North Dakota State University, 2016) Li, Shuhang
Given LiDAR maps, we focus on automatically identifying anthropologically relevant ditches on the map. Archeologists can identify these features visually at the site, but approaches based on remotely sensed data would be preferable. This paper proposes an algorithm that uses a window-based technique to read, regressively, the characteristics of each region from maps whose ditches are already identified, and then builds histograms to represent the different characteristics of each region. A classification model is then built based on the histograms and used to predict future data. The goal is to produce a large training data set using window-based technology and use it to classify future data. We demonstrate that our algorithm identifies target regions successfully and efficiently on real LiDAR maps.

Item Classifying Gene Coexpression Networks Using Discrimination Pattern Mining (North Dakota State University, 2016) Qormosh, Bassam M M
Several algorithms for graph classification have been proposed. Algorithms that map graphs into feature vectors encoding the presence/absence of specific subgraphs have shown excellent performance. Most of the existing algorithms mine for subgraphs that appear frequently in graphs belonging to one class label and not so frequently in the other graphs. Gene coexpression network classification has attracted a lot of attention in recent years from researchers in both biology and data mining because of its numerous useful applications. The advances in high-throughput technologies that provide easy access to large microarray datasets have necessitated the development of new techniques that can scale well with large datasets and produce very accurate results.
In this thesis, we propose a novel approach for mining discriminative patterns. We propose two algorithms for mining discriminative patterns and then we use these patterns for graph classification. Experiments on large coexpression graphs show that the proposed approach has excellent performance and scales to graphs with millions of edges. We compare our proposed algorithm to two baseline algorithms and we show that our algorithm outperforms the baseline techniques with highly accurate graph classification. Moreover, we perform topological and biological enrichment analysis on the discriminative patterns reported by our mining algorithm and we show that the reported patterns are significantly enriched.

Item A Closed Form Optimization Model for The Conflict Neutralization Problem (North Dakota State University, 2010) Wang, Yan
In this study, we proposed a novel closed form optimization model for the Conflict Neutralization Problem (CNP) and implemented an efficient algorithm for solving the problem. A novel tableau representation of the CNP model was presented and described in detail. We implemented a specially structured branch and bound algorithm to solve the problem. Key components of the implementation were described. To test the computational performance of our algorithm, we designed and conducted three sets of experiments. The experiment results were reported and analyzed in this report. The test results showed the efficiency of the algorithm for solving the Conflict Neutralization Problem.

Item Context Specific Module Mining from Multiple Co-Expression Graph (North Dakota State University, 2017) Hossain, Md Shakhawat
Gene co-expression networks can be used to associate genes of unknown function with biological processes or to find genes, in a specific context or environment, that are responsible for a disease.
We provide an overview of methods and tools used to identify such recurrent patterns across multiple networks, which can be used to discover biological modules in co-expression networks constructed from gene expression data, and we explain how this can be used to identify genes with a regulatory role in disease. However, existing algorithms are very costly in terms of time and space. As network size or number increases, mining such modules gets much more complex. We have developed an efficient approach to mine such recurrent context specific modules from 35 gene networks. This problem, which is computationally very difficult because of the exponential number of patterns, was solved non-exponentially.

Item A Data Mining Approach for Identifying Pavement Distress Signatures (North Dakota State University, 2015) Bouret, Megan Sue
This work introduces signature-based data mining of pavement distress data. The goal is to understand the factors that influence pavement distress. The presented approach maintains multiple types of flexible pavement distress scores throughout the analysis and considers them as signatures. The signatures are used to establish the relationship between distress score increases and overweight truck characteristics. Hierarchical clustering of pavement distress signatures provides insights into similarities among road segments. The use of signatures, rather than composite distress scores, is consistent with a data mining approach to the pavement distress problem. One set of experiments showed a relationship between the discovered signature groups and differences in overweight truck traffic. Group validation has been implemented with Fisher's exact test. Future work related to algorithm improvements has been identified and considered.

Item Data Replication Strategies in Cloud Computing (North Dakota State University, 2011) Liu, Yang
Data replication is a widely used technique in various systems.
For example, it can be employed in large-scale distributed file systems to increase data availability and system reliability, or it can be used in many network models (e.g. data grid, Amazon CloudFront) to reduce access latency and network bandwidth consumption. I study a series of problems related to the data replication method in the Hadoop Distributed File System (HDFS) and in the Amazon CloudFront service. Data failure, which is caused by hardware failure or malfunction, software error, or human error, is the greatest threat to the file storage system. I present a set of schemes to enhance the efficiency of the current data replication strategy in HDFS, thereby improving system reliability and performance. I also study the application replication placement problem based on an Original-Front server model, and I propose a novel strategy which intends to maximize the profit of the application providers.

Item Development and Validation of a Library for Iterative Window-Based Processing of Geospatial Data (North Dakota State University, 2021) Schwartz, David Michael
High-resolution spectral images and digital elevation models are widely available. With this quantity of data, it is imperative to develop fast algorithms to extract information. We present a Python library that implements a set of algorithms for aggregating data within sliding windows. The algorithms have O(log(n)) time complexity and maintain the original image resolution. They are vectorized and written with NumPy to create fast code with C-like performance. The library offers several analysis procedures, architected such that additional procedures utilizing sliding windows can easily be added. Slope, aspect, and curvature analyses exist for digital elevation models. Fractal dimension and correlation analyses are also available for use on a range of different images. The software architecture of the library is outlined and motivated. It includes visualized comparisons of analyses and unit testing.
Testing procedures are implemented using analytical results from Wolfram Mathematica combined with brute-force algorithms.

Item Distance-Aware Relay Placement and Scheduling in Wireless Networks. (North Dakota State University, 2011) Bai, Shi
WiMAX technology and cognitive radio have been active topics in wireless networks. A WiMAX mesh network is able to provide larger wireless coverage, higher network capacity, and Non-Line-Of-Sight (NLOS) communications. Cognitive radios enable dynamic spectrum access over a large frequency range. These characteristics make WiMAX mesh networks and cognitive radio networks able to provide users with low-cost, high-speed, and long-range wireless communications, as well as better Quality of Service. However, there are still several challenges and problems to be solved in this area, such as relay station placement problems and scheduling problems. In this thesis, I studied a distance-aware relay placement problem and a max-min fair scheduling problem in WiMAX mesh networks. To solve these problems, approximation algorithms and heuristic algorithms are proposed. Theoretical analysis and simulation results are provided to evaluate the solutions. I also studied a scheduling problem that adopts the idea of the cognitive radio technique in wireless networks over water. Two heuristics are presented to solve this unique problem. I provide numerical results to justify the performance and efficiency of the proposed scheduling algorithms.