William Perrizo - Thesis Committee
Permanent URI for this collection: hdl:10365/32089
Browsing William Perrizo - Thesis Committee by department "Computer Science"
Now showing 1 - 12 of 12
Item: Capacitated Transshipment Models for Predicting Signaling Pathways (North Dakota State University, 2012). Author: Sahni, Ritika.
Signal transduction is a process of transmitting signals for controlling biological responses. The protein-protein interaction (PPI) data, containing signal transduction proteins, can be considered as a bi-directional, weighted network with the proteins as nodes, the interactions between them as edges, and the confidence score of each interaction as the weight on its edge. If the edges of this network are given a capacity of one, and if the starting and ending proteins are the supply and demand nodes, then this problem can be modeled as a capacitated transshipment model with pathways as the solutions. Our application concerns finding the signaling pathways for yeast's mitogen-activated protein-kinase (MAPK) pheromone response and filamentation growth using the model created in the SAS OPTMODEL procedure. The results demonstrate that the proposed model is easier to understand and interpret, and is applicable to the PPI network to discover signaling pathways efficiently and accurately.

Item: Foundational Algorithms Underlying Horizontal Processing of Vertically Structured Big Data Using pTrees (North Dakota State University, 2016). Author: Hossain, Mohammad.
For Big Data, the time taken to process a data mining algorithm is a critical issue. Many reliable algorithms are unusable in the big data environment because processing takes an unacceptable amount of time, so increasing the speed of processing is very important. To address the speed issue, we use horizontal processing of vertically structured data rather than the ubiquitous vertical (scan) processing of horizontal (record) data. pTree technology represents and processes data differently from traditional horizontal data technologies. In pTree technology, the data is structured column-wise (into bit slices) and the columns are processed horizontally (typically across a few to a few hundred bit-level columns), while in horizontal technologies, data is structured row-wise and those rows are processed vertically. pTrees are lossless, compressed, and data-mining-ready data structures. They are lossless because the vertical bit-wise partitioning used in pTree technology guarantees that all information is retained completely; there is no loss of information in converting horizontal data to this vertical format. They are data-mining ready because the fast, horizontal data mining processes involved can be done without the need to reconstruct the original form of the data. This technique has been exploited in various domains and data mining algorithms, including classification, clustering, and association rule mining, as well as other data mining algorithms. In this research work, we evaluate and compare the speeds of various foundational algorithms required for using this pTree technology in many data mining tasks.
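
As a rough illustration of the horizontal-processing-of-vertical-data idea described in the pTrees abstract above, the following Python sketch decomposes one numeric column into bit slices and answers a sum query and a predicate-count query from the slices alone. It is a toy, uncompressed bitmap version with randomly generated data; the actual pTree structures are compressed, tree-shaped, and considerably more elaborate.

```python
import numpy as np

# Toy column of values in 0..255; in pTree terms the column is decomposed into bit slices.
rng = np.random.default_rng(1)
col = rng.integers(0, 256, size=10_000)

# Vertical (bit-sliced) representation: slice i holds bit i of every row.
slices = [((col >> i) & 1).astype(bool) for i in range(8)]

# Horizontal processing: the column sum is recovered from per-slice 1-counts alone.
total_from_slices = sum((1 << i) * int(s.sum()) for i, s in enumerate(slices))
assert total_from_slices == int(col.sum())

# A predicate pTree such as "value >= 128" is just the top bit slice; conjunctive
# queries become bitwise ANDs of slices, counted without ever reconstructing
# a horizontal record.
ge_128 = slices[7]
odd = slices[0]
print("sum:", total_from_slices, "count(value >= 128 and odd):", int((ge_128 & odd).sum()))
```

The point the sketch tries to capture is that queries are answered by bitwise operations across a handful of bit-level columns rather than by scanning reconstructed horizontal records.
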
Item: Improved Genetic Programming Techniques For Data Classification (North Dakota State University, 2014). Author: Al-Madi, Naila Shikri.
Evolutionary algorithms are one category of optimization techniques inspired by processes of biological evolution. Evolutionary computation is applied to many domains, and one of the most important is data mining. Data mining is a relatively broad field that deals with the automatic discovery of knowledge from databases, and it is one of the most developed fields in the area of artificial intelligence. Classification is a data mining method that assigns items in a collection to target classes, with the goal of accurately predicting the target class for each item in the data. Genetic programming (GP) is one of the effective evolutionary computation techniques for solving classification problems. GP solves classification problems as optimization tasks, searching for the solution with the highest accuracy. However, GP suffers from some weaknesses, such as long execution time and the need to tune many parameters for each problem. Furthermore, GP cannot obtain high accuracy for multiclass classification problems as opposed to binary problems. In this dissertation, we address these drawbacks and propose approaches to overcome them. Adaptive GP variants are proposed in order to automatically adapt the parameter settings and shorten the execution time. Moreover, two approaches are proposed to improve the accuracy of GP when applied to multiclass classification problems. In addition, a segment-based approach is proposed to accelerate GP execution time for the data classification problem. Furthermore, a parallelization of the GP process using the MapReduce methodology is proposed, which aims to shorten the GP execution time and to provide the ability to use large population sizes, leading to faster convergence. The proposed approaches are evaluated using different measures, such as accuracy, execution time, sensitivity, specificity, and statistical tests. Comparisons of the proposed approaches with standard GP and with other classification techniques were performed, and the results showed that these approaches overcome the drawbacks of standard GP by successfully improving accuracy and execution time.
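
To make the GP-for-classification setting above concrete, here is a deliberately minimal, hand-rolled Python sketch: expression trees over two features are evolved against a synthetic binary dataset, with classification accuracy as fitness. It uses mutation-only, truncation-selection evolution for brevity (no crossover, no adaptive parameters, no MapReduce parallelism), so it illustrates the problem formulation rather than the dissertation's actual techniques; all data and settings are invented.

```python
import operator, random
import numpy as np

# Synthetic two-feature binary dataset (a stand-in for a real classification problem).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

OPS = [(operator.add, 2), (operator.sub, 2), (operator.mul, 2)]

def random_tree(depth=3):
    """Grow a random expression tree over features x0, x1 and small constants."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(["x0", "x1", round(random.uniform(-1, 1), 2)])
    op, arity = random.choice(OPS)
    return (op, [random_tree(depth - 1) for _ in range(arity)])

def evaluate(tree, row):
    if isinstance(tree, tuple):
        op, children = tree
        return op(*(evaluate(c, row) for c in children))
    if tree == "x0":
        return row[0]
    if tree == "x1":
        return row[1]
    return tree  # numeric constant

def fitness(tree):
    """Classification accuracy: predict class 1 when the expression is positive."""
    preds = np.array([evaluate(tree, row) > 0 for row in X]).astype(int)
    return (preds == y).mean()

def mutate(tree, depth=2):
    """Replace a random subtree with a freshly grown one."""
    if not isinstance(tree, tuple) or random.random() < 0.3:
        return random_tree(depth)
    op, children = tree
    children = list(children)
    i = random.randrange(len(children))
    children[i] = mutate(children[i], depth)
    return (op, children)

# Simple mutation-only evolutionary loop with truncation selection.
population = [random_tree() for _ in range(50)]
for gen in range(20):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[:10]
    population = parents + [mutate(random.choice(parents)) for _ in range(40)]

best = max(population, key=fitness)
print("best training accuracy:", fitness(best))
```

A MapReduce-flavored variant of this loop would evaluate the fitness calls of a generation in a map step and aggregate the scores in a reduce step, which is the general shape of the parallelization mentioned in the abstract.
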
Item: Injecting Safety-Critical Certification Into Agile Software Methods (North Dakota State University, 2013). Author: Minot, Scott James.
Agility offers an adaptable and changeable environment for software development. The benefits that agile methods provide are becoming an increasingly realistic possibility for safety-critical software programs. These certified programs go through a rigorous process to ensure the safety of all people involved. As systems become more complex and the need for adaptability grows, the benefits of agile could save companies considerable time and money without sacrificing safety. In this paper, I provide ways of incorporating certification for highly critical systems by using a form of agility tailored to fit certification requirements. With reduced time and an increased ability to change safety-critical software, this tailored form of agility is shown to be a viable option to deploy.

Item: An Investigation of Integration and Performance Issues Related to the Use of Extended Page Sizes in Computationally Intensive Applications (North Dakota State University, 2012). Author: Piehl, Matthew James.
The combination of increasing fabrication density and the corresponding decrease in price has resulted in the ability of commodity platforms to support large memory capacities. Processor designers have introduced support for extended hardware page sizes to assist operating systems with efficiently scaling to these memory capacities. This paper explores the integration strategies the designers of the Linux operating system have used to access this hardware support and the practical performance impact of using it. The paper also reviews common strategies for adding support for this functionality at the application level. These strategies are applied to a sampling representative of common scientific applications to support a practical evaluation of the expected performance impact of extended page size support. An analysis of the results supports the finding that a 5% performance improvement can be expected by adding extended page size support to memory-intensive scientific applications.

Item: Metrics and Tools to Guide Design of Graphical User Interfaces (North Dakota State University, 2014). Author: Alemerien, Khalid Ali.
User interface design metrics help developers evaluate interface designs early, before delivering the software to end users. This dissertation presents a metric-based tool called GUIEvaluator for evaluating the complexity of a user interface based on its structure. The metrics model consists of five modified structural measures of interface complexity: alignment, grouping, size, density, and balance. The results of GUIEvaluator are discussed in comparison with subjective evaluations of interface layouts and with existing complexity metrics models. To extend this metrics model, the Screen-Layout Cohesion (SLC) metric has been proposed; it is used to evaluate the usability of user interfaces and has been developed based on aesthetic, structural, and semantic aspects of GUIs. To compute the SLC metric, a complementary tool called GUIExaminer has been developed. This dissertation demonstrates the potential of incorporating automated complexity and cohesion metrics into the user interface design process. The findings show a strong positive correlation between the subjective evaluations and both GUIEvaluator and GUIExaminer, at a significance level of 0.05. Moreover, the findings provide evidence of the effectiveness of GUIEvaluator and GUIExaminer in predicting the best user interface design among a set of alternatives, and show that the tools can measure some usability aspects of a given user interface. Finally, the metrics validation demonstrates the usefulness of GUIEvaluator and GUIExaminer for evaluating user interface designs.
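
As an illustration of how structural layout metrics like those behind GUIEvaluator can be computed automatically, the sketch below derives two toy measures, density and horizontal balance, from widget bounding boxes. The formulas and the example layout are hypothetical stand-ins, not the actual GUIEvaluator or SLC definitions.

```python
from dataclasses import dataclass

@dataclass
class Widget:
    x: int  # left edge, in pixels
    y: int  # top edge
    w: int  # width
    h: int  # height

def density(widgets, screen_w, screen_h):
    """Fraction of the screen area covered by widgets (overlap ignored)."""
    covered = sum(wd.w * wd.h for wd in widgets)
    return covered / float(screen_w * screen_h)

def balance(widgets, screen_w):
    """Crude horizontal balance: widget area left vs. right of the screen centre."""
    centre = screen_w / 2.0
    left = sum(wd.w * wd.h for wd in widgets if wd.x + wd.w / 2.0 < centre)
    right = sum(wd.w * wd.h for wd in widgets if wd.x + wd.w / 2.0 >= centre)
    total = left + right
    return 1.0 - abs(left - right) / total if total else 1.0  # 1.0 = perfectly balanced

layout = [Widget(10, 10, 200, 40), Widget(220, 10, 200, 40), Widget(10, 60, 410, 300)]
print("density:", density(layout, 800, 600), "balance:", balance(layout, 800))
```

A metrics tool of this kind would combine several such structural scores into one complexity value and compare it against subjective ratings, which is the evaluation strategy the abstract describes.
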
Item: Mining for Significant Information from Unstructured and Structured Biological Data and Its Applications (North Dakota State University, 2012). Author: Al-Azzam, Omar Ghazi.
Massive amounts of biological data are being accumulated in science. Searching for significant, meaningful information and patterns in different types of data is necessary for gaining knowledge from the large amounts of data available to users. However, data mining techniques do not normally deal with significance. Integrating data mining techniques with standard statistical procedures provides a way of mining statistically significant, interesting information from both structured and unstructured data. In this dissertation, different algorithms for mining significant biological information from both unstructured and structured data are proposed. A weighted-density-based approach is presented for mining item data from unstructured textual representations. Different algorithms in the area of radiation hybrid mapping are developed for mining significant information from structured binary data. The proposed algorithms have different applications in the ordering problem in radiation hybrid mapping, including identifying unreliable markers and building solid framework maps. The effectiveness of the proposed algorithms in improving map stability is demonstrated, with map stability determined through resampling analysis. The proposed algorithms deal effectively and efficiently with multidimensional data and also reduce computational cost dramatically. Evaluation shows that the proposed algorithms outperform comparative methods in terms of both accuracy and computation cost.

Item: Mining Significant Patterns by Integrating Biological Interaction Networks with Gene Profiles (North Dakota State University, 2015). Author: Alroobi, Rami Mohammed.
Nowadays, large amounts of high-throughput data are available alongside the classical cell biology techniques employed in analyzing cell functions, interactions, and how pathogens can exploit them in disease, thanks to the huge advancements in both genomics and proteomics technologies. Analyzing and studying these vast amounts of data will enable researchers to uncover, clarify, and explain some aspects of the behavior and characteristics of gene products under a very diverse set of conditions. Biological data come in different types, and integrating several types of data can help reduce the effect of the problems each individual data source has. The focus of our work, covering two very important tasks in the bioinformatics field, is functional module discovery and discriminative pattern mining. In functional module discovery, the goal is to find groups of genes that interact to perform different processes in the living organism. Discriminative pattern mining aims at discovering groups of proteins that can be classified as related to a specific phenotype. Understanding which genes, or proteins, are involved in biological phenomena can lead to advancements in related medical and pharmaceutical research, and much research has been done in this area. The two main sources of data used in our work are gene expression data and the protein-protein interaction network. The expression data shows how genes react under several conditions, while the interaction network represents real protein cooperation occurring in the living cell. Our methods show performance competitive with well-established methods, as illustrated in this document.

Item: Project Quality Tool: A Tool for Project Success (North Dakota State University, 2014). Author: Srichinta, Pallavi.
This paper proposes a solution to the problem of communicating changing requirements in an offshore/on-site software development model. The proposed model is a web-based tool in which a project team member can enter new requirements, map them to the design, create test cases from the design, execute them, and track failed ones by creating defects. With the existing tools available in the market, when requirements change, the changes are not communicated to the entire project team, leaving the quality assurance team verifying old (incomplete) requirements, which ultimately costs more time and money and delays project delivery. In this paper, a prototype tool is presented that is intended to automatically handle the above-mentioned communication problems whenever requirements are changed after the design is in place. The prototype manages the gap between on-site and offshore teams and adds value to project development by saving time and money and improving the quality of the final product.

Item: Smart Grid Optimization Using a Capacitated Transshipment Problem Solver (North Dakota State University, 2013). Author: Lampl, Damian.
A network flow model known as the capacitated transshipment problem, or CTP, can represent key aspects of a smart grid test network, with the goal of finding minimum-cost electric power flows using multiple different cost performance metrics. A custom CTP Solver was developed and implemented as an ASP.NET web application in an effort to study these various minimum-cost smart grid problems and provide their optimal solutions. The CTP Solver modifies traditional linear programming concepts by introducing object-oriented software development practices, as well as an insightful innovation for handling bidirectional arcs, which effectively halves the required disk and memory allocation of fully bidirectional networks. As an initial step toward smart grid optimization problem solutions, the CTP Solver provides a glimpse of how self-healing and possibly other key components of smart grid architecture might be handled in the future.
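
To ground the capacitated transshipment formulation that both this smart grid item and the signaling pathway item above rely on, here is a small Python sketch of a minimum-cost flow on a toy grid using networkx. The network, costs, and capacities are invented for illustration, and each bidirectional line is expanded here into two directed arcs; the thesis's CTP Solver instead stores such an arc once, which is the storage-halving insight mentioned in the abstract.

```python
import networkx as nx

# Toy smart-grid network: one generator (supply), two loads (demands), one substation.
# networkx convention for node "demand": negative = supply, positive = demand.
G = nx.DiGraph()
G.add_node("gen", demand=-12)
G.add_node("sub", demand=0)       # pure transshipment node
G.add_node("loadA", demand=7)
G.add_node("loadB", demand=5)

# Each physical (bidirectional) line becomes two directed arcs with the same
# capacity and per-unit transmission cost.
lines = [("gen", "sub", 15, 2), ("sub", "loadA", 10, 1),
         ("sub", "loadB", 10, 3), ("gen", "loadB", 4, 5)]
for u, v, cap, cost in lines:
    G.add_edge(u, v, capacity=cap, weight=cost)
    G.add_edge(v, u, capacity=cap, weight=cost)

flow = nx.min_cost_flow(G)        # dict of dicts: flow[u][v]
print("total cost:", nx.cost_of_flow(G, flow))
for u, targets in flow.items():
    for v, f in targets.items():
        if f:
            print(f"{u} -> {v}: {f}")
```

Swapping the cost metric on each arc (loss, distance, congestion, and so on) changes which minimum-cost routing the solver returns, which is the "multiple different cost performance metrics" idea in the abstract.
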
Item: Understanding Contextual Factors in Regression Testing Techniques (North Dakota State University, 2016). Author: Anderson, Jeffrey Ryan.
The software regression testing techniques of test case reduction, selection, and prioritization are widely used and well researched in software development. They allow for more efficient utilization of scarce testing resources in large projects, thereby increasing project quality at reduced cost. Many data sources and techniques have been researched, leaving software practitioners with no good way of choosing which data source or technique will be most appropriate for their project. This dissertation addresses this limitation. First, we introduce a conceptual framework for examining this area of research. Then, we perform a literature review to understand the current state of the art. Next, we perform a family of empirical studies to further investigate the thesis. Finally, we provide guidance to practitioners and researchers. In our first empirical study, we showed that advanced data mining techniques applied to an industrial product can improve the effectiveness of regression testing techniques. In our next study, we expanded on that research by learning a classification model; this research showed that attributes such as complexity and historical failures were the most effective metrics, owing to a high occurrence of random test failures in the product studied. Finally, we applied the learning from the initial research and the systematic literature survey to develop novel regression testing techniques based on the attributes of an industrial product and showed these new techniques to be effective. These novel approaches included predicting performance faults from test data and customizing regression testing techniques based on usage telemetry. Further, we provide guidance to practitioners and researchers based on the findings from our empirical studies and the literature survey. This guidance will help practitioners and researchers more effectively employ and study regression testing techniques.
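
As a small illustration of the attribute-based prioritization discussed above, the following Python sketch ranks test cases by a weighted mix of historical failure rate and the complexity of the code they cover. The attributes, weights, and test names are hypothetical; they are not the dissertation's actual models, which were learned from industrial data.

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    name: str
    historical_failures: int   # failures observed over recent runs
    runs: int                  # number of recent executions
    covered_complexity: float  # e.g. summed cyclomatic complexity of covered methods

def priority(tc, w_fail=0.7, w_cplx=0.3, max_cplx=100.0):
    """Hypothetical priority score: weighted mix of failure rate and normalized complexity."""
    fail_rate = tc.historical_failures / tc.runs if tc.runs else 0.0
    return w_fail * fail_rate + w_cplx * min(tc.covered_complexity / max_cplx, 1.0)

suite = [
    TestCase("checkout_flow", 4, 50, 80.0),
    TestCase("login_smoke", 0, 50, 12.0),
    TestCase("report_export", 9, 50, 35.0),
]
for tc in sorted(suite, key=priority, reverse=True):
    print(f"{tc.name:15s} {priority(tc):.3f}")
```

Running the highest-scoring tests first is the basic prioritization move; the dissertation's contribution lies in choosing and validating which attributes and models are effective for a given project context.
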
Item: Usability Construct for Mobile Applications: A Clustering based Approach (North Dakota State University, 2015). Author: Kotala, Pratap.
The growth of mobile applications that run on cell phones and other handheld devices has introduced a broad range of usability challenges not faced in web and standalone PC environments. The current usability models for mobile applications are mostly based on the experience of usability experts and users, collected through surveys and field studies. Many usability researchers and practitioners have developed conceptual usability frameworks that utilize either different or overlapping usability attributes. Moreover, the existing usability frameworks are limited in scope and do not consider all usability dimensions, and there is no consensus among usability researchers and standards organizations regarding what constitutes a usability model or framework. This research attempts to utilize a novel computational-linguistic approach to identify the semantic relatedness between different usability attributes. We use text-mining and information-extraction techniques to mine for usability attributes in a large collection of published literature about mobile usability. A hierarchical clustering analysis is then performed to cluster semantically related usability attributes, and the results are used to develop a usability taxonomy and a unified usability construct for mobile applications.
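
As a rough sketch of the clustering step described above, the Python snippet below vectorizes short descriptions of usability attributes with TF-IDF and groups them by agglomerative (hierarchical) clustering. The attribute list and descriptions are invented stand-ins for the mined literature corpus, and TF-IDF with cosine distance is only one plausible way to estimate semantic relatedness.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Hypothetical mini-corpus: one short description per usability attribute,
# standing in for sentences mined from the mobile-usability literature.
attributes = {
    "learnability": "ease of learning the interface for first time users",
    "efficiency": "speed and effort needed to complete tasks",
    "memorability": "how easily returning users remember the interface",
    "errors": "frequency and severity of user errors and recovery",
    "satisfaction": "subjective comfort and pleasantness of use",
    "effectiveness": "accuracy and completeness of task completion",
}

vectors = TfidfVectorizer().fit_transform(list(attributes.values())).toarray()
dist = pdist(vectors, metric="cosine")   # pairwise lexical/semantic distance
tree = linkage(dist, method="average")   # agglomerative (hierarchical) clustering
labels = fcluster(tree, t=2, criterion="maxclust")

for name, lab in zip(attributes, labels):
    print(lab, name)
```

Cutting the resulting dendrogram at different levels yields coarser or finer groupings of attributes, which is how a clustering of this kind can be turned into a usability taxonomy.
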