University Distinguished Professors
Permanent URI for this community: hdl:10365/32082
University Distinguished Professor (UDP) is the highest honor that can be awarded to a faculty member at North Dakota State University. Research from these individuals can be found here. More information about University Distinguished Professors can be found at https://www.ndsu.edu/president/honors/distinguished_professors/
Browse
Browsing University Distinguished Professors by program "Computer Science"
Now showing 1 - 11 of 11
Item: An Application of Association Rule Mining to Unit Test Selection (North Dakota State University, 2013) Gunderson, Karl Nils
Appropriate selection of unit tests during the software development process is vital when many unit tests exist. The developer may be unfamiliar with some tests, and non-obvious relationships between application code and test code may exist. Poor test selection may lead to defects. This is especially true when the application is large and many developers are involved. By applying association rule mining to the unit test selection process and comparing it with extant selection techniques, we provide a quantitative analysis of the benefits of the heuristic and of its limits in development settings where process patterns are stable.

Item: Capacitated Transshipment Models for Predicting Signaling Pathways (North Dakota State University, 2012) Sahni, Ritika
Signal transduction is a process of transmitting signals for controlling biological responses. The protein-protein interaction (PPI) data, containing signal transduction proteins, can be considered as a bi-directional, weighted network with the proteins as nodes, the interactions between them as edges, and the confidence score of each interaction as the edge weight. If the edges of this network are given a capacity of one, and if the starting and ending proteins are the supply and demand nodes, then this problem can be modeled as a capacitated transshipment model with pathways as the solutions. Our application concerns finding the signaling pathways for yeast’s mitogen-activated protein-kinase (MAPK) pheromone response and filamentation growth using a model created in the SAS OPTMODEL procedure. The results demonstrate that the proposed model is easier to understand and interpret, and is applicable to the PPI network to discover signaling pathways efficiently and accurately.
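The transshipment formulation described in this abstract lends itself to off-the-shelf min-cost-flow solvers. Below is a minimal sketch of the same idea in Python using networkx rather than the SAS OPTMODEL model used in the thesis; the proteins, edges, and confidence scores are invented for illustration. Using cost = -log(confidence) means that minimizing total arc cost corresponds to maximizing the product of interaction confidences along the pathway.

```python
# A minimal sketch of the capacitated transshipment idea using networkx's
# min-cost-flow solver (not the SAS OPTMODEL formulation from the thesis).
# The proteins and confidence scores below are invented for illustration.
import math
import networkx as nx

# Hypothetical PPI edges: (protein_a, protein_b, confidence in (0, 1]).
ppi_edges = [
    ("STE2", "STE4", 0.9), ("STE4", "STE5", 0.8), ("STE5", "STE11", 0.85),
    ("STE11", "STE7", 0.7), ("STE7", "FUS3", 0.9), ("STE4", "STE20", 0.6),
    ("STE20", "STE11", 0.75),
]

G = nx.DiGraph()
for a, b, conf in ppi_edges:
    # Interactions are bidirectional, so add an arc in each direction.
    # Capacity 1 forces a simple path; cost = -log(confidence) makes
    # high-confidence interactions cheaper to use.
    cost = int(round(-100 * math.log(conf)))   # integer costs for the solver
    G.add_edge(a, b, capacity=1, weight=cost)
    G.add_edge(b, a, capacity=1, weight=cost)

# One unit of "signal" flows from the receptor (supply) to the MAPK (demand).
G.nodes["STE2"]["demand"] = -1   # source
G.nodes["FUS3"]["demand"] = 1    # sink

flow = nx.min_cost_flow(G)
path_edges = [(u, v) for u, nbrs in flow.items() for v, f in nbrs.items() if f > 0]
print("Predicted pathway edges:", path_edges)
```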
Item: Foundational Algorithms Underlying Horizontal Processing of Vertically Structured Big Data Using pTrees (North Dakota State University, 2016) Hossain, Mohammad
For Big Data, the time taken to process a data mining algorithm is a critical issue. Many reliable algorithms are unusable in the big data environment because processing takes an unacceptable amount of time, so increasing the speed of processing is very important. To address the speed issue, we use horizontal processing of vertically structured data rather than the ubiquitous vertical (scan) processing of horizontal (record) data. pTree technology represents and processes data differently from traditional horizontal data technologies. In pTree technology, the data is structured column-wise (into bit slices) and the columns are processed horizontally (typically across a few to a few hundred bit-level columns), while in horizontal technologies, data is structured row-wise and those rows are processed vertically. pTrees are lossless, compressed, and data-mining-ready data structures. pTrees are lossless because the vertical bit-wise partitioning used in pTree technology guarantees that all information is retained completely; there is no loss of information in converting horizontal data to this vertical format. pTrees are data-mining ready because the fast, horizontal data mining processes involved can be carried out without reconstructing the original form of the data. This technique has been exploited in various domains and data mining algorithms, including classification, clustering, and association rule mining. In this research work, we evaluate and compare the speeds of various foundational algorithms required for using pTree technology in many data mining tasks.
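To make the vertical bit-slice idea concrete, here is a minimal, uncompressed sketch in Python: each bit position of a column is stored as one bitmap, and a predicate count is answered with horizontal, bit-wise operations instead of a record scan. It illustrates only the general technique, not the actual compressed, tree-structured pTree implementation.

```python
# A minimal sketch of the vertical bit-slice ("pTree-like") idea: each bit
# position of a column is stored as one bitmap, and queries are answered with
# horizontal, bit-wise operations instead of scanning records.
values = [5, 3, 7, 5, 1, 5, 6]          # one attribute, 3-bit values
n_rows, n_bits = len(values), 3

# Build one bitmap per bit position: bit r of slices[b] is bit b of values[r].
slices = [0] * n_bits
for r, v in enumerate(values):
    for b in range(n_bits):
        if (v >> b) & 1:
            slices[b] |= 1 << r

all_rows = (1 << n_rows) - 1            # mask with one bit per row

def count_equal(query: int) -> int:
    """Count rows equal to `query` using only bitmap AND/NOT and a popcount."""
    mask = all_rows
    for b in range(n_bits):
        bit_set = slices[b] if (query >> b) & 1 else (~slices[b] & all_rows)
        mask &= bit_set
    return bin(mask).count("1")

print(count_equal(5))                   # -> 3, matching values.count(5)
```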
Item: Improved Genetic Programming Techniques For Data Classification (North Dakota State University, 2014) Al-Madi, Naila Shikri
Evolutionary algorithms are one category of optimization techniques inspired by processes of biological evolution. Evolutionary computation is applied to many domains, and one of the most important is data mining. Data mining is a relatively broad field that deals with automatic knowledge discovery from databases, and it is one of the most developed fields in the area of artificial intelligence. Classification is a data mining method that assigns items in a collection to target classes, with the goal of accurately predicting the target class for each item in the data. Genetic programming (GP) is one of the effective evolutionary computation techniques for solving classification problems. GP solves classification problems as optimization tasks, searching for the solution with the highest accuracy. However, GP suffers from some weaknesses, such as long execution time and the need to tune many parameters for each problem. Furthermore, GP cannot obtain as high accuracy for multiclass classification problems as it can for binary problems. In this dissertation, we address these drawbacks and propose approaches to overcome them. Adaptive GP variants are proposed in order to automatically adapt the parameter settings and shorten the execution time. Moreover, two approaches are proposed to improve the accuracy of GP when applied to multiclass classification problems. In addition, a segment-based approach is proposed to accelerate GP execution time for the data classification problem. Furthermore, a parallelization of the GP process using the MapReduce methodology is proposed, which aims to shorten the GP execution time and to provide the ability to use large population sizes, leading to faster convergence. The proposed approaches are evaluated using different measures, such as accuracy, execution time, sensitivity, specificity, and statistical tests. Comparisons of the proposed approaches with standard GP and with other classification techniques were performed, and the results show that these approaches overcome the drawbacks of standard GP by successfully improving accuracy and execution time.

Item: An Investigation of Integration and Performance Issues Related to the Use of Extended Page Sizes in Computationally Intensive Applications (North Dakota State University, 2012) Piehl, Matthew James
The combination of increasing fabrication density and a corresponding decrease in price has given commodity platforms the ability to support large memory capacities. Processor designers have introduced support for extended hardware page sizes to help operating systems scale efficiently to these memory capacities. This paper explores the integration strategies the designers of the Linux operating system have used to access this hardware support and the practical performance impact of using it. The paper also reviews common strategies for adding support for this functionality at the application level. These strategies are applied to a representative sample of common scientific applications to support a practical evaluation of the expected performance impact of extended page size support. An analysis of these results supports the finding that a 5% performance improvement can be expected by adding support for extended page sizes to memory-intensive scientific applications.
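One application-level strategy for obtaining larger pages on Linux can be sketched in a few lines: hinting the kernel to back a large anonymous buffer with transparent huge pages. The snippet below is an illustration of the general approach, not necessarily one of the specific strategies evaluated in the paper; the madvise call and MADV_HUGEPAGE flag are Linux-specific and require Python 3.8 or later.

```python
# A small sketch of one application-level strategy for requesting larger pages
# on Linux: back a big anonymous mapping with transparent huge pages via
# madvise(MADV_HUGEPAGE). Illustration only; hugetlbfs-backed mappings and
# other strategies are not shown.
import mmap

SIZE = 256 * 1024 * 1024                     # 256 MiB working buffer

buf = mmap.mmap(-1, SIZE)                    # anonymous private mapping
if hasattr(mmap, "MADV_HUGEPAGE"):           # present on Linux builds
    buf.madvise(mmap.MADV_HUGEPAGE)          # hint: use huge pages if possible

# Touch every page; with huge pages the kernel needs far fewer TLB entries,
# which is where the performance benefit for memory-intensive codes comes from.
page = 4096
for off in range(0, SIZE, page):
    buf[off] = 1

buf.close()
```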
Item: Metrics and Tools to Guide Design of Graphical User Interfaces (North Dakota State University, 2014) Alemerien, Khalid Ali
User interface design metrics help developers evaluate interface designs at an early phase, before delivering the software to end users. This dissertation presents a metric-based tool called GUIEvaluator for evaluating the complexity of a user interface based on its structure. The metrics model consists of five modified structural measures of interface complexity: alignment, grouping, size, density, and balance. The results of GUIEvaluator are discussed in comparison with subjective evaluations of interface layouts and with existing complexity metrics models. To extend this metrics model, the Screen-Layout Cohesion (SLC) metric is proposed. This metric is used to evaluate the usability of user interfaces. The SLC metric has been developed based on aesthetic, structural, and semantic aspects of GUIs. To provide the SLC calculation, a complementary tool called GUIExaminer has been developed. This dissertation demonstrates the potential of incorporating automated complexity and cohesion metrics into the user interface design process. The findings show a strong positive correlation between the subjective evaluations and both GUIEvaluator and GUIExaminer, at a significance level of 0.05. Moreover, the findings provide evidence of the effectiveness of GUIEvaluator and GUIExaminer in predicting the best user interface design among a set of alternative user interfaces. In addition, the findings show that GUIEvaluator and GUIExaminer can measure some usability aspects of a given user interface. Finally, the metrics validation demonstrates the usefulness of GUIEvaluator and GUIExaminer for evaluating user interface designs.
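As a rough illustration of how structural measures such as density and balance can be computed from a layout, the toy sketch below uses simplified stand-in formulas and an invented set of control bounding boxes; it does not reproduce the actual GUIEvaluator metrics.

```python
# A toy sketch of two of the structural measures named above (density and
# balance) computed from control bounding boxes. The formulas are simplified
# stand-ins, not the actual GUIEvaluator metrics, and the layout is invented.
from dataclasses import dataclass

@dataclass
class Control:
    x: int; y: int; w: int; h: int         # bounding box in pixels
    @property
    def area(self) -> int:
        return self.w * self.h
    @property
    def cx(self) -> float:
        return self.x + self.w / 2

SCREEN_W, SCREEN_H = 800, 600
controls = [Control(20, 20, 200, 40), Control(20, 80, 200, 40),
            Control(560, 20, 220, 100), Control(20, 500, 760, 60)]

# Density: fraction of the screen covered by controls (0..1).
density = sum(c.area for c in controls) / (SCREEN_W * SCREEN_H)

# Horizontal balance: 1 means the "visual weight" (area) on the left and right
# halves of the screen is equal; values near 0 mean a lopsided layout.
left = sum(c.area for c in controls if c.cx < SCREEN_W / 2)
right = sum(c.area for c in controls if c.cx >= SCREEN_W / 2)
balance = 1 - abs(left - right) / max(left + right, 1)

print(f"density={density:.3f} balance={balance:.3f}")
```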
Item: Mining for Significant Information from Unstructured and Structured Biological Data and Its Applications (North Dakota State University, 2012) Al-Azzam, Omar Ghazi
Massive amounts of biological data are being accumulated in science. Searching for significant, meaningful information and patterns in different types of data is necessary for gaining knowledge from the large amounts of data available to users. However, data mining techniques do not normally deal with significance. Integrating data mining techniques with standard statistical procedures provides a way of mining statistically significant, interesting information from both structured and unstructured data. In this dissertation, different algorithms for mining significant biological information from both unstructured and structured data are proposed. A weighted-density-based approach is presented for mining item data from unstructured textual representations. Different algorithms in the area of radiation hybrid mapping are developed for mining significant information from structured binary data. The proposed algorithms have several applications to the ordering problem in radiation hybrid mapping, including identifying unreliable markers and building solid framework maps. The effectiveness of the proposed algorithms in improving map stability is demonstrated; map stability is determined based on resampling analysis. The proposed algorithms deal effectively and efficiently with multidimensional data and also reduce computational cost dramatically. Evaluation shows that the proposed algorithms outperform comparative methods in terms of both accuracy and computational cost.

Item: Mining Significant Patterns by Integrating Biological Interaction Networks with Gene Profiles (North Dakota State University, 2015) Alroobi, Rami Mohammed
Nowadays, large amounts of high-throughput data are available. Because of the huge advancements in both genomics and proteomics technologies, such data increasingly complement the classical cell biology techniques employed in the analysis of cell functions, interactions, and how pathogens can exploit them in disease. Analyzing and studying these vast amounts of data will enable researchers to uncover, clarify, and explain some aspects of the behavior and characteristics of gene products under a very diverse set of conditions. Biological data belong to different types, and the integration of several types of data can help reduce the effect of the problems each data source has. The focus of our work, and among the most important tasks in the bioinformatics field, are functional module discovery and discriminative pattern mining. In functional module discovery, the goal is to find groups of genes that interact to perform different processes in the living organism. Discriminative pattern mining aims at discovering groups of proteins that can be classified as related to a specific phenotype. Understanding which genes or proteins are involved in biological phenomena can lead to advancements in related medical and pharmaceutical research, and much research has been done in this area. The two main sources of data used in our work are gene expression data and the protein-protein interaction network. The expression data show how genes react under several conditions, and the interaction network represents real protein cooperation occurring in the living cell. Our research shows competitive performance with well-established methods, as illustrated in this document.

Item: Project Quality Tool: A Tool for Project Success (North Dakota State University, 2014) Srichinta, Pallavi
This paper proposes a solution to the problem of communicating changing requirements in an offshore/on-site software development model. The proposed model is a web-based tool in which a user on a project team can enter new requirements, map them to the design, create test cases from the design, execute them, and track failed cases by creating defects. With the existing tools available in the market, when requirements change, the changes are not communicated to the entire project team, leaving the quality assurance team verifying old (incomplete) requirements, which ultimately costs more time and money and delays project delivery. This paper presents a prototype tool intended to automatically handle the above-mentioned communication problems whenever requirements are changed after the design is in place. The prototype manages the gap between on-site and offshore teams and adds value to project development by saving time and money and by improving the quality of the final product.

Item: Smart Grid Optimization Using a Capacitated Transshipment Problem Solver (North Dakota State University, 2013) Lampl, Damian
A network flow model known as the capacitated transshipment problem, or CTP, can represent key aspects of a smart grid test network, with the goal of finding minimum-cost electric power flows under multiple different cost performance metrics. A custom CTP Solver was developed and implemented as an ASP.NET web application in an effort to study these various minimum-cost smart grid problems and provide their optimal solutions. The CTP Solver modifies traditional linear programming concepts by introducing object-oriented software development practices, as well as an insightful innovation for handling bidirectional arcs, which effectively halves the required disk and memory allocation for fully bidirectional networks. As an initial step toward solutions to smart grid optimization problems, the CTP Solver provides a glimpse of how self-healing and possibly other key components of the smart grid architecture might be handled in the future.

Item: Usability Construct for Mobile Applications: A Clustering based Approach (North Dakota State University, 2015) Kotala, Pratap
The growth of mobile applications that run on cell phones and other handheld devices has introduced a broad range of usability challenges that were not faced in the web and standalone PC environments. Current usability models for mobile applications are mostly based on the experience of usability experts and users, collected through surveys and field studies. Many usability researchers and practitioners have developed conceptual usability frameworks that utilize either different or overlapping usability attributes. Moreover, the usability frameworks in existence are limited in scope and do not consider all usability dimensions. There is no consensus among usability researchers and standards organizations regarding what constitutes a usability model or framework. This research utilizes a novel computational linguistic approach to identify the semantic relatedness between different usability attributes. We use text-mining and information-extraction techniques to mine for usability attributes in a large collection of published literature about mobile usability. A hierarchical clustering analysis is performed to cluster semantically related usability attributes. The results are utilized to develop a usability taxonomy and a unified usability construct for mobile applications.
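The clustering step described in the last abstract can be sketched with SciPy: each usability attribute is represented by a vector of co-occurrence counts with a few context terms mined from the literature, and semantically related attributes are grouped by average-linkage hierarchical clustering. The attribute names, context terms, and counts below are invented; this illustrates the approach, not the construct reported in the thesis.

```python
# A minimal sketch of the clustering step: usability attributes represented by
# (invented) co-occurrence counts with context terms, grouped by hierarchical
# clustering. Illustration only, not the taxonomy from the thesis.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

attributes = ["efficiency", "learnability", "memorability",
              "satisfaction", "errors", "effectiveness"]
# Rows: attributes; columns: hypothetical context terms (e.g. "time", "task",
# "recall", "emotion", "mistake") with made-up co-occurrence counts.
counts = np.array([
    [40, 35,  2,  3,  4],   # efficiency
    [12, 30, 25,  5,  6],   # learnability
    [ 8, 10, 38,  4,  3],   # memorability
    [ 3,  6,  4, 42,  5],   # satisfaction
    [ 5, 12,  3,  6, 45],   # errors
    [30, 38,  4,  5,  8],   # effectiveness
], dtype=float)

# Cosine distance between attribute vectors, then average-linkage clustering.
dist = pdist(counts, metric="cosine")
tree = linkage(dist, method="average")
labels = fcluster(tree, t=3, criterion="maxclust")

for cluster_id in sorted(set(labels)):
    members = [a for a, c in zip(attributes, labels) if c == cluster_id]
    print(cluster_id, members)
```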