Search Results

  • Item
    Mining for Significant Information from Unstructured and Structured Biological Data and Its Applications
    (North Dakota State University, 2012) Al-Azzam, Omar Ghazi
    Massive amounts of biological data are being accumulated in science. Searching these data for significant, meaningful information and patterns is necessary for turning them into usable knowledge. However, data mining techniques do not normally deal with significance. Integrating data mining techniques with standard statistical procedures provides a way to mine statistically significant, interesting information from both structured and unstructured data. In this dissertation, different algorithms for mining significant biological information from both unstructured and structured data are proposed. A weighted-density-based approach is presented for mining item data from unstructured textual representations, and several algorithms in the area of radiation hybrid mapping are developed for mining significant information from structured binary data. The proposed algorithms address different aspects of the ordering problem in radiation hybrid mapping, including identifying unreliable markers and building solid framework maps. Their effectiveness in improving map stability, determined by resampling analysis, is demonstrated. The proposed algorithms deal effectively and efficiently with multidimensional data and also reduce computational cost dramatically. Evaluation shows that they outperform comparative methods in terms of both accuracy and computation cost.
  • Item
    Smart Grid Optimization Using a Capacitated Transshipment Problem Solver
    (North Dakota State University, 2013) Lampl, Damian
    A network flow model known as the capacitated transshipment problem, or CTP, can represent key aspects of a smart grid test network, with the goal of finding minimum-cost electric power flows under several different cost performance metrics. A custom CTP Solver was developed and implemented as an ASP.NET web application to study these minimum-cost smart grid problems and provide their optimal solutions. The CTP Solver adapts traditional linear programming concepts using object-oriented software development practices and introduces an innovation for handling bidirectional arcs that effectively halves the disk and memory required for fully bidirectional networks. As an initial step toward smart grid optimization, the CTP Solver provides a glimpse of how self-healing and possibly other key components of smart grid architecture might be handled in the future.
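    To make the model concrete, here is a minimal sketch of a capacitated transshipment instance solved as a minimum-cost flow, written in Python with networkx rather than the dissertation's ASP.NET solver. The node names, supplies, capacities, and per-unit costs are invented for illustration.

```python
import networkx as nx

# Hypothetical four-node test network: generators supply power, loads demand it.
G = nx.DiGraph()
G.add_node("gen1", demand=-60)    # negative demand = supply of 60 units
G.add_node("gen2", demand=-40)
G.add_node("load1", demand=70)
G.add_node("load2", demand=30)

# Each arc carries a capacity and a per-unit cost (the metric being minimized).
G.add_edge("gen1", "load1", capacity=50, weight=2)
G.add_edge("gen1", "load2", capacity=50, weight=4)
G.add_edge("gen2", "load1", capacity=40, weight=3)
G.add_edge("load1", "load2", capacity=30, weight=1)

flow = nx.min_cost_flow(G)        # flow[u][v] = units sent along arc (u, v)
print(nx.cost_of_flow(G, flow))   # total cost of the minimum-cost feasible flow
```

    The dissertation's storage innovation keeps each bidirectional arc once; in a generic library model like the one above, each direction would instead be a separate directed edge.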
  • Item
    Metrics and Tools to Guide Design of Graphical User Interfaces
    (North Dakota State University, 2014) Alemerien, Khalid Ali
    User interface design metrics help developers evaluate interface designs early, before delivering the software to end users. This dissertation presents a metric-based tool called GUIEvaluator for evaluating the complexity of a user interface based on its structure. The metrics model consists of five modified structural measures of interface complexity: alignment, grouping, size, density, and balance. The results of GUIEvaluator are discussed in comparison with subjective evaluations of interface layouts and with existing complexity metrics models. To extend this metrics model, the Screen-Layout Cohesion (SLC) metric is proposed for evaluating the usability of user interfaces. The SLC metric is based on aesthetic, structural, and semantic aspects of GUIs, and a complementary tool, GUIExaminer, was developed to compute it. This dissertation demonstrates the potential of incorporating automated complexity and cohesion metrics into the user interface design process. The findings show a strong positive correlation between the subjective evaluations and both GUIEvaluator and GUIExaminer at a significance level of 0.05. Moreover, the findings provide evidence that GUIEvaluator and GUIExaminer can predict the best user interface design among a set of alternatives and can measure some usability aspects of a given interface. The metrics validation thus supports the usefulness of GUIEvaluator and GUIExaminer for evaluating user interface designs.
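    As an illustration of structural measures of this kind, the following sketch computes two of the five (density and balance) from widget bounding boxes. The Widget layout and the formulas are assumptions made for illustration, not GUIEvaluator's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Widget:
    x: int   # left edge in pixels
    y: int   # top edge in pixels
    w: int   # width in pixels
    h: int   # height in pixels

def density(widgets, screen_w, screen_h):
    """Fraction of the screen area covered by widgets (illustrative formula)."""
    return sum(wd.w * wd.h for wd in widgets) / (screen_w * screen_h)

def balance(widgets, screen_w):
    """Left/right balance: 1.0 when widget area splits evenly about the vertical
    midline, approaching 0.0 when one side dominates (illustrative formula)."""
    mid = screen_w / 2
    left = sum(wd.w * wd.h for wd in widgets if wd.x + wd.w / 2 < mid)
    right = sum(wd.w * wd.h for wd in widgets if wd.x + wd.w / 2 >= mid)
    total = left + right
    return 1.0 - abs(left - right) / total if total else 1.0

layout = [Widget(10, 10, 100, 30), Widget(150, 10, 100, 30), Widget(10, 60, 240, 120)]
print(density(layout, 320, 240), balance(layout, 320))
```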
  • Item
    Usability Construct for Mobile Applications: A Clustering based Approach
    (North Dakota State University, 2015) Kotala, Pratap
    The growth of mobile applications that run on cell phones and other handheld devices has introduced a broad range of usability challenges not faced in web and standalone PC environments. Current usability models for mobile applications are mostly based on the experience of usability experts and users, collected through surveys and field studies. Many usability researchers and practitioners have developed conceptual usability frameworks that use either different or overlapping usability attributes. Moreover, the existing usability frameworks are limited in scope and do not consider all usability dimensions, and there is no consensus among usability researchers and standards organizations on what constitutes a usability model or framework. This research applies a novel computational-linguistic approach to identify the semantic relatedness between different usability attributes. We use text-mining and information-extraction techniques to mine for usability attributes in a large collection of published literature on mobile usability, and a hierarchical clustering analysis is performed to cluster semantically related usability attributes. The results are used to develop a usability taxonomy and a unified usability construct for mobile applications.
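    A minimal sketch of the clustering step, assuming pairwise semantic-relatedness scores between usability attributes have already been extracted from the literature: the attribute names and scores below are invented for illustration, and SciPy's average-linkage agglomerative clustering stands in for whatever configuration the dissertation uses.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Toy pairwise semantic-relatedness scores between usability attributes.
attrs = ["learnability", "efficiency", "memorability", "errors", "satisfaction"]
sim = np.array([
    [1.0, 0.7, 0.8, 0.3, 0.4],
    [0.7, 1.0, 0.6, 0.4, 0.5],
    [0.8, 0.6, 1.0, 0.2, 0.3],
    [0.3, 0.4, 0.2, 1.0, 0.5],
    [0.4, 0.5, 0.3, 0.5, 1.0],
])

dist = 1.0 - sim                                   # relatedness -> distance
Z = linkage(squareform(dist), method="average")    # agglomerative, average linkage
labels = fcluster(Z, t=2, criterion="maxclust")    # cut the dendrogram into 2 clusters
print(dict(zip(attrs, labels)))
```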
  • Item
    Understanding Contextual Factors in Regression Testing Techniques
    (North Dakota State University, 2016) Anderson, Jeffrey Ryan
    The software regression testing techniques of test case reduction, selection, and prioritization are widely used and well researched in software development. They allow more efficient use of scarce testing resources in large projects, thereby increasing project quality at reduced cost. Many data sources and techniques have been researched, leaving software practitioners with no good way of choosing which data source or technique will be most appropriate for their project. This dissertation addresses this limitation. First, we introduce a conceptual framework for examining this area of research. Then, we perform a literature review to understand the current state of the art. Next, we perform a family of empirical studies to further investigate the thesis. Finally, we provide guidance to practitioners and researchers. In our first empirical study, we showed that advanced data mining techniques applied to an industrial product can improve the effectiveness of regression testing techniques. In our next study, we expanded on that research by learning a classification model; this work showed that attributes such as complexity and historical failures were the most effective metrics, due to a high occurrence of random test failures in the product studied. Finally, we applied the lessons from the initial research and the systematic literature survey to develop novel regression testing techniques based on the attributes of an industrial product, and showed these new techniques to be effective. These novel approaches included predicting performance faults from test data and customizing regression testing techniques based on usage telemetry. Based on the findings from our empirical studies and the literature survey, we provide guidance that will help practitioners and researchers more effectively employ and study regression testing techniques.
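    As a toy illustration of prioritization driven by attributes such as historical failures and complexity, the following sketch ranks hypothetical test cases with a simple weighted score. The test names, fields, and weights are assumptions for illustration, not the classification model learned in the dissertation.

```python
# Rank tests by a weighted combination of historical failure rate and the
# complexity of the code they cover (both normalized to [0, 1]).
tests = [
    {"name": "test_login",    "failure_rate": 0.10, "covered_complexity": 25},
    {"name": "test_checkout", "failure_rate": 0.02, "covered_complexity": 60},
    {"name": "test_search",   "failure_rate": 0.30, "covered_complexity": 15},
]

def priority(t, w_fail=0.7, w_cplx=0.3, max_cplx=60):
    return w_fail * t["failure_rate"] + w_cplx * (t["covered_complexity"] / max_cplx)

for t in sorted(tests, key=priority, reverse=True):
    print(f'{t["name"]}: {priority(t):.2f}')
```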
  • Item
    Improved Genetic Programming Techniques For Data Classification
    (North Dakota State University, 2014) Al-Madi, Naila Shikri
    Evolutionary algorithms are a category of optimization techniques inspired by processes of biological evolution. Evolutionary computation is applied to many domains, and one of the most important is data mining. Data mining is a relatively broad field that deals with automatic knowledge discovery from databases and is one of the most developed areas of artificial intelligence. Classification is a data mining method that assigns items in a collection to target classes, with the goal of accurately predicting the target class for each item in the data. Genetic programming (GP) is one of the effective evolutionary computation techniques for solving classification problems. GP treats classification as an optimization task in which it searches for the solution with the highest accuracy. However, GP suffers from weaknesses such as long execution time and the need to tune many parameters for each problem, and it does not reach the same accuracy on multiclass classification problems as on binary problems. In this dissertation, we address these drawbacks and propose approaches to overcome them. Adaptive GP variants are proposed to automatically adapt the parameter settings and shorten the execution time, and two approaches are proposed to improve the accuracy of GP on multiclass classification problems. In addition, a segment-based approach is proposed to accelerate GP execution for data classification. Furthermore, a parallelization of the GP process using the MapReduce methodology is proposed, which aims to shorten execution time and to allow large population sizes, leading to faster convergence. The proposed approaches are evaluated using different measures, such as accuracy, execution time, sensitivity, specificity, and statistical tests. Comparisons of the proposed approaches with standard GP and with other classification techniques show that they overcome the drawbacks of standard GP by successfully improving both accuracy and execution time.
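    A minimal sketch of the map/reduce structure behind parallel fitness evaluation: each individual in a toy GP population (represented here as expression strings) is scored in a separate worker process (the map step), and the best individual is then selected from the scored results (the reduce step). The dataset, the encoding, and the use of Python's multiprocessing pool are illustrative assumptions, not the dissertation's MapReduce implementation.

```python
from multiprocessing import Pool

# Toy dataset: (x1, x2, label). An individual is an expression over x1 and x2;
# it predicts class 1 when the expression evaluates to a positive number.
DATA = [(1.0, 2.0, 1), (2.0, 0.5, 0), (0.2, 3.0, 1), (3.0, 1.0, 0)]
POPULATION = ["x2 - x1", "x1 - x2", "x1 * x2 - 2.0", "x2 - 2 * x1"]

def fitness(expr):
    """Map step: classification accuracy of one individual on the dataset."""
    correct = 0
    for x1, x2, label in DATA:
        pred = 1 if eval(expr, {"x1": x1, "x2": x2}) > 0 else 0
        correct += (pred == label)
    return expr, correct / len(DATA)

if __name__ == "__main__":
    with Pool() as pool:
        scored = pool.map(fitness, POPULATION)      # evaluate individuals in parallel
    best = max(scored, key=lambda pair: pair[1])    # reduce step: keep the best
    print(best)
```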
  • Item
    Capacitated Transshipment Models for Predicting Signaling Pathways
    (North Dakota State University, 2012) Sahni, Ritika
    Signal transduction is the process of transmitting signals that control biological responses. Protein-protein interaction (PPI) data containing signal transduction proteins can be considered a bidirectional, weighted network with the proteins as nodes, the interactions between them as edges, and the confidence score of each interaction as the edge weight. If the edges of this network are given a capacity of one, and the starting and ending proteins are treated as the supply and demand nodes, then the problem can be modeled as a capacitated transshipment problem whose solutions are pathways. Our application concerns finding the signaling pathways for yeast's mitogen-activated protein kinase (MAPK) pheromone response and filamentous growth using a model created in SAS OPTMODEL. The results demonstrate that the proposed model is easier to understand and interpret, and is applicable to the PPI network for discovering signaling pathways efficiently and accurately.
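    A minimal sketch of this formulation on a toy fragment of the yeast pheromone-response network: every interaction gets capacity one, its cost is derived from a confidence score, the receptor is the supply node and the transcription factor the demand node, and a minimum-cost flow traces a candidate pathway. This uses networkx in Python rather than SAS OPTMODEL, and the confidence values (and edge set) are invented for illustration.

```python
import networkx as nx

# Toy fragment of the yeast pheromone-response PPI network.
edges = [("Ste2", "Ste4", 0.9), ("Ste4", "Ste5", 0.8), ("Ste5", "Ste11", 0.85),
         ("Ste4", "Ste20", 0.7), ("Ste20", "Ste11", 0.75), ("Ste11", "Ste7", 0.9),
         ("Ste7", "Fus3", 0.95), ("Fus3", "Ste12", 0.9)]

G = nx.DiGraph()
for u, v, conf in edges:
    # Capacity one per arc; higher-confidence interactions get lower integer costs.
    G.add_edge(u, v, capacity=1, weight=round((1.0 - conf) * 100))

G.nodes["Ste2"]["demand"] = -1    # supply node: the membrane receptor
G.nodes["Ste12"]["demand"] = 1    # demand node: the transcription factor

flow = nx.min_cost_flow(G)
print([(u, v) for u in flow for v, f in flow[u].items() if f > 0])  # pathway arcs
```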
  • Item
    An Investigation of Integration and Performance Issues Related to the Use of Extended Page Sizes in Computationally Intensive Applications
    (North Dakota State University, 2012) Piehl, Matthew James
    The combination of increasing fabrication density and the corresponding decrease in price has made it possible for commodity platforms to support large memory capacities. Processor designers have introduced support for extended hardware page sizes to help operating systems scale efficiently to these capacities. This paper explores the integration strategies the designers of the Linux operating system have used to expose this hardware support and the practical performance impact of using it. The paper also reviews common strategies for adding support for extended page sizes at the application level. These strategies are applied to a representative sample of common scientific applications to support a practical evaluation of the expected performance impact. An analysis of the results supports the finding that a 5% performance improvement can be expected from adding extended page size support to memory-intensive scientific applications.
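    One common application-level strategy is simply to hint to the kernel that a large allocation should be backed by huge pages. The sketch below assumes Linux with transparent huge pages enabled and Python 3.8 or later; it illustrates the idea rather than the specific tooling evaluated in the paper.

```python
import mmap

# Allocate a large anonymous mapping and hint that it should be backed by
# transparent huge pages. MADV_HUGEPAGE is Linux-specific and exposed by the
# mmap module only on Python 3.8+, hence the feature check.
SIZE = 256 * 1024 * 1024          # 256 MiB working buffer

buf = mmap.mmap(-1, SIZE)         # anonymous mapping, not backed by a file
if hasattr(mmap, "MADV_HUGEPAGE"):
    buf.madvise(mmap.MADV_HUGEPAGE)

buf[:8] = b"\x01" * 8             # touch the buffer so pages are actually faulted in
buf.close()
```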
  • Item
    Foundational Algorithms Underlying Horizontal Processing of Vertically Structured Big Data Using pTrees
    (North Dakota State University, 2016) Hossain, Mohammad
    For Big Data, the time taken to run a data mining algorithm is a critical issue. Many reliable algorithms are unusable in the big data environment because processing takes an unacceptable amount of time, so increasing processing speed is very important. To address the speed issue, we use horizontal processing of vertically structured data rather than the ubiquitous vertical (scan) processing of horizontal (record) data. pTree technology represents and processes data differently from traditional horizontal data technologies: data is structured column-wise (into bit slices) and the columns are processed horizontally (typically across a few to a few hundred bit-level columns), whereas in horizontal technologies data is structured row-wise and the rows are processed vertically. pTrees are lossless, compressed, data-mining-ready data structures. They are lossless because the vertical bit-wise partitioning used in pTree technology retains all information completely; no information is lost in converting horizontal data to this vertical format. They are data-mining ready because the fast, horizontal data mining processes involved can be carried out without reconstructing the original form of the data. This technique has been exploited in various domains and data mining algorithms, including classification, clustering, and association rule mining. In this research, we evaluate and compare the speeds of various foundational algorithms required for using pTree technology in many data mining tasks.
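    A minimal sketch of the underlying idea (not the actual pTree implementation): store an integer column vertically as one bitmask per bit position, then compute an aggregate such as the column sum horizontally with bitwise operations and population counts.

```python
# An integer column stored vertically: one bitmask ("bit slice") per bit position,
# where bit j of slices[i] is bit i of column[j].
BITS = 4
column = [5, 3, 12, 7, 9]         # toy column of 4-bit values

slices = [0] * BITS
for row, value in enumerate(column):
    for i in range(BITS):
        if (value >> i) & 1:
            slices[i] |= 1 << row

# Horizontal processing: column sum = sum over bit positions of 2^i * popcount(slice_i),
# touching only BITS bitmasks instead of scanning every record.
total = sum((1 << i) * bin(s).count("1") for i, s in enumerate(slices))
print(total, total == sum(column))   # 36 True
```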
  • Item
    Injecting Safety-Critical Certification Into Agile Software Methods
    (North Dakota State University, 2013) Minot, Scott James
    Agility offers an adaptable and changeable environment for software development, and the benefits that agile methods provide are becoming an increasingly realistic possibility for safety-critical software programs. These certified programs go through a rigorous process to ensure the safety of everyone involved. As such systems become more complex and the need for adaptability grows, agile methods could save companies considerable time and money without sacrificing safety. In this paper, I present ways of incorporating certification for highly critical systems by using a form of agility tailored to fit certification requirements. With reduced development time and an increased ability to change safety-critical software, this tailored approach is shown to be a viable option to deploy.