Computer Science Doctoral Work
Permanent URI for this collection: hdl:10365/32551
Browsing Computer Science Doctoral Work by Issue Date
Now showing 1 - 20 of 67
Item: Virtual-Experiment-Driven Process Model (VEDPM) (North Dakota State University, 2010) Lua, Chin Aik
Computer simulations are the last resort for many complex problems such as swarm applications. However, to the best of the author's knowledge, there is no convincing work proving "What You Simulate Is What You See" (WYSIWYS). Many models are built on long, subjective code that is prone to abnormalities, which stem from corrupted virtual scientific laws rather than from software bugs. Thus, the task of validating scientific simulations is very difficult, if not impossible. This dissertation provides a new process methodology for solving the problems above: the Virtual-Experiment-Driven Process Model (VEDPM). VEDPM employs simple yet sound virtual experiments to verify simple, short virtual laws. The proven laws, in turn, are used to develop valid models that can achieve real goals. The resulting simulations (or data) from proven models are WYSIWYS. Two complex swarm applications have been developed rigorously and successfully via VEDPM, demonstrating that VEDPM is workable. In addition, the author provides innovative constructs for developing autonomous unmanned vehicles, namely a swarm software architecture and a modified subsumption control scheme, together with their design philosophies. The constructs are used repeatedly to enable unmanned vehicles to switch behaviors autonomously via a simple control signal.

Item: Vector-Item Pattern Mining Algorithms and their Applications (North Dakota State University, 2011) Wu, Jianfei
Advances in storage technology have long been driving the need for new data mining techniques. Not only are typical data sets becoming larger, but the diversity of available attributes is increasing in many problem domains. In biological applications, for example, a single protein may have associated sequence, text, graph, continuous, and item data. Correspondingly, there is a growing need for techniques to find patterns in such complex data. Many techniques exist for mapping specific types of data to vector space representations, such as the bag-of-words model for text [58] or embeddings of graphs in vector spaces [94, 91]. However, there are few techniques that recognize the resulting vector space representations as units that may be combined and further processed. This research aims to mine important vector-item patterns hidden across multiple and diverse data sources. We consider sets of related continuous attributes as vector data and search for patterns that relate a vector attribute to one or more items. The presence of an item set defines a subset of vectors that may or may not show unexpected density fluctuations. Two types of vector-item pattern mining algorithms have been developed: histogram-based vector-item pattern mining algorithms and point-distribution vector-item pattern mining algorithms. In the histogram-based algorithms, a vector-item pattern is significant or important if its density histogram differs significantly from what is expected for a random subset of transactions, using the χ² goodness-of-fit test or effect size analysis. In the point-distribution algorithms, a vector-item pattern is significant if its probability density function (PDF) has a large Kullback-Leibler divergence from random subsamples. We have applied the vector-item pattern mining algorithms to several application areas, and by comparing with other state-of-the-art algorithms we demonstrate their effectiveness and efficiency.
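As a concrete illustration of the histogram-based significance test described above, the following sketch compares the value distribution of transactions containing an item against the distribution expected for a random subset, using a χ² goodness-of-fit test and a Kullback-Leibler divergence. The toy data, binning, and smoothing choices are assumptions for illustration, not the dissertation's implementation.

```python
# Hedged sketch: histogram-based significance of a vector-item pattern.
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(0)

# Toy "transactions": each has one continuous attribute and a set of items.
values = rng.normal(loc=0.0, scale=1.0, size=1000)
items = [{"A"} if v > 0.5 else {"B"} for v in values]   # item "A" correlates with high values

edges = np.linspace(values.min(), values.max(), 11)      # 10 equal-width bins

subset = np.array([v for v, s in zip(values, items) if "A" in s])
obs, _ = np.histogram(subset, bins=edges)                # observed histogram for the item subset

# Expected counts if the subset were just a random sample of all transactions.
overall, _ = np.histogram(values, bins=edges)
exp = overall / overall.sum() * obs.sum()

# Chi-square goodness-of-fit between observed subset histogram and expectation.
chi2, p = chisquare(obs, f_exp=exp)

# Kullback-Leibler divergence between the two (smoothed) bin distributions.
p_obs = (obs + 1) / (obs + 1).sum()
p_exp = (exp + 1) / (exp + 1).sum()
kl = float(np.sum(p_obs * np.log(p_obs / p_exp)))

print(f"chi2={chi2:.1f} p={p:.3g} KL={kl:.3f}")  # small p / large KL -> significant pattern
```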
Item: Mining for Significant Information from Unstructured and Structured Biological Data and Its Applications (North Dakota State University, 2012) Al-Azzam, Omar Ghazi
Massive amounts of biological data are being accumulated in science. Searching for significant, meaningful information and patterns in different types of data is necessary for gaining knowledge from the large amounts of data available to users. However, data mining techniques do not normally deal with significance. Integrating data mining techniques with standard statistical procedures provides a way to mine statistically significant, interesting information from both structured and unstructured data. In this dissertation, different algorithms for mining significant biological information from both unstructured and structured data are proposed. A weighted-density-based approach is presented for mining item data from unstructured textual representations. Different algorithms in the area of radiation hybrid mapping are developed for mining significant information from structured binary data. The proposed algorithms have different applications to the ordering problem in radiation hybrid mapping, including identifying unreliable markers and building solid framework maps. The effectiveness of the proposed algorithms in improving map stability is demonstrated. Map stability is determined based on resampling analysis. The proposed algorithms deal effectively and efficiently with multidimensional data and also reduce computational cost dramatically. Evaluation shows that the proposed algorithms outperform comparative methods in terms of both accuracy and computation cost.

Item: Using Information Retrieval to Improve Integration Testing (North Dakota State University, 2012) Alazzam, Iyad
Software testing is an important part of the software development process. Integration testing is an important and expensive level of the software testing process. Unfortunately, developers have limited time to perform integration testing and debugging, and integration testing becomes very hard as the combinations grow in size and the chains of calls from one module to another grow in number, length, and complexity. This research provides a new methodology for integration testing that reduces the number of test cases needed to a significant degree while retaining as much of its effectiveness as possible. The proposed approach shows the best order in which to integrate the classes currently available for integration, as well as the external method calls that should be tested and the order in which to test them for maximum effectiveness. Our approach limits the number of integration test cases. The number of integration test cases depends mainly on the dependency among modules and on the number of integrated classes in the application. The dependency among modules is determined using an information retrieval technique called Latent Semantic Indexing (LSI). In addition, this research extends mutation testing for use in integration testing as a method to evaluate the effectiveness of the integration testing process. We have developed a set of integration mutation operators to support the development of integration mutation testing. We have conducted experiments based on ten Java applications. To evaluate the proposed methodology, we have created mutants using new mutation operators that exercise the integration testing. Our experiments show that the test cases killed more than 60% of the created mutants.
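To illustrate the kind of dependency estimate LSI can provide, the sketch below projects per-class identifier "documents" into a low-dimensional latent space and ranks class pairs by cosine similarity; highly related pairs would be candidates for integration-test focus. The corpus, class names, and the use of scikit-learn are illustrative assumptions, not the dissertation's tooling.

```python
# Hedged sketch: LSI-style relatedness between classes as a proxy for dependency.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Each "document" stands for identifiers/comments extracted from one class (invented here).
class_docs = {
    "OrderService":   "order item price total checkout payment invoice",
    "PaymentGateway": "payment card charge invoice transaction total",
    "UserProfile":    "user name address email login password",
    "InventoryStore": "item stock warehouse quantity order restock",
}

names = list(class_docs)
tfidf = TfidfVectorizer().fit_transform(class_docs.values())

# Project into a low-dimensional latent semantic space.
lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
sim = cosine_similarity(lsi)

# Rank class pairs by semantic relatedness; strongly related pairs get extra test focus.
pairs = [(names[i], names[j], sim[i, j])
         for i in range(len(names)) for j in range(i + 1, len(names))]
for a, b, s in sorted(pairs, key=lambda t: -t[2]):
    print(f"{a:15s} <-> {b:15s} similarity={s:+.2f}")
```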
Item: A Secure and Reliable Interference-Aware Wireless Mesh Network Design (North Dakota State University, 2012) Kandah, Farah Issa
A wireless mesh network (WMN) is a multihop wireless network consisting of a large number of wireless nodes, some of which are called gateway nodes and are connected to a wired network. Wireless mesh networks have attracted much research attention recently due to their flexibility, low cost, and robustness, which facilitate their use in many potential applications, including last-mile broadband Internet access, neighborhood gaming, Video-on-Demand (VoD), distributed file backup, and video surveillance. The broadcast nature, the lack of infrastructure, and the flexible deployment of wireless mesh networks make them different from wired networks; therefore, more attention is needed in designing a wireless mesh network to maintain the good performance of this promising technology. In this study, we investigate wireless mesh network design taking three design factors into consideration, seeking to improve network performance by reducing the influence of interference in the network, improving network reliability to satisfy more requests, and securing the network against malicious eavesdropping attacks. Our design is presented as three sub-problems: sub-problem (1) seeks an interference-aware robust topology control scheme, sub-problem (2) seeks a multipath routing scheme, and sub-problem (3) seeks a secure key management scheme. Through simulations and comparisons with previous work, we show that our proposed solutions outperform previous schemes in providing better network performance in terms of reducing network interference, satisfying a larger number of requests, and increasing the network's resistance to malicious eavesdropping attacks.

Item: Towards Change Propagating Test Models In Autonomic and Adaptive Systems (North Dakota State University, 2012) Akour, Mohammed Abd Alwahab
The major motivation for self-adaptive computing systems is the self-adjustment of the software according to a changing environment. Adaptive computing systems can add, remove, and replace their own components in response to changes in the system itself and in the operating environment of a software system. Although these systems may provide a certain degree of confidence against new environments, their structural and behavioral changes should be validated after adaptation occurs at runtime. Testing dynamically adaptive systems is extremely challenging because both the structure and the behavior of the system may change during its execution. After self-adaptation occurs in autonomic software, new components may be integrated into the software system. When new components are incorporated, testing them becomes a vital phase for ensuring that they will interact and behave as expected. When self-adaptation removes existing components, a predefined test set may no longer be applicable due to changes in the program structure. Investigating techniques for dynamically updating regression tests after adaptation is therefore necessary to ensure such approaches can be applied in practice. We propose a model-driven approach based on change propagation for synchronizing a runtime test model for a software system with the model of its component structure after dynamic adaptation. A workflow and meta-model to support the approach are provided, referred to as Test Information Propagation (TIP). To demonstrate TIP, a prototype was developed that simulates a reductive and an additive change to an autonomic, service-oriented healthcare application. To demonstrate that the TIP approach generalizes to the domain of up-to-date runtime testing for self-adaptive software systems, it was also applied to the self-adaptive JPacman 3.0 system. To measure the accuracy of the TIP engine, we compare its results with the work of a developer who manually identified the changes that should be performed to update the test model after self-adaptation occurs. The experiments show that TIP is highly accurate for reductive change propagation across self-adaptive systems. Promising results have been achieved in simulating the additive changes as well.
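The following minimal sketch conveys the flavor of propagating a runtime component change into a test model, in the spirit of TIP: removing a component retires its tests, while adding one registers new tests. The data structures, component names, and helper function are hypothetical stand-ins, not the TIP meta-model.

```python
# Hedged sketch: keeping a test model in sync with runtime component changes.
component_model = {"Scheduler", "BillingService", "Notifier"}
test_model = {
    "Scheduler":      ["test_schedule_appointment", "test_cancel_appointment"],
    "BillingService": ["test_generate_invoice"],
    "Notifier":       ["test_send_reminder"],
}

def propagate(change_kind, component, new_tests=None):
    """Propagate one runtime change in the component model into the test model."""
    if change_kind == "remove":                      # reductive change
        component_model.discard(component)
        retired = test_model.pop(component, [])
        return f"retired tests: {retired}"
    if change_kind == "add":                         # additive change
        component_model.add(component)
        test_model[component] = list(new_tests or [])
        return f"registered tests: {test_model[component]}"
    raise ValueError(change_kind)

print(propagate("remove", "Notifier"))
print(propagate("add", "LabResultService", ["test_publish_lab_result"]))
print(sorted(test_model))
```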
Item: Adaptive Regression Testing Strategies for Cost-Effective Regression Testing (North Dakota State University, 2013) Schwartz, Amanda Jo
Regression testing is an important but expensive part of the software development life-cycle. Many different techniques have been proposed for reducing the cost of regression testing. To date, much research has been performed comparing regression testing techniques, but very little research has been performed to aid practitioners and researchers in choosing the most cost-effective technique for a particular regression testing session. One recent study investigated this problem and proposed Adaptive Regression Testing (ART) strategies to aid practitioners in choosing the most cost-effective technique for a specific version of a software system. The results of this study showed that the techniques chosen by the ART strategy were more cost-effective than techniques that did not consider system lifetime and testing processes. This work has several limitations, however. First, it only considers one ART strategy; many other strategies could be developed and studied that might be more cost-effective. Second, the ART strategy used the Analytical Hierarchy Process (AHP). The AHP method is sensitive to the subjective weights assigned by the decision maker. The AHP method is also very time consuming because it requires many pairwise comparisons. Pairwise comparisons also limit the scalability of the approach and are often found to be inconsistent. This work proposes three new ART strategies to address these limitations. One strategy utilizing the fuzzy AHP method is proposed to address imprecision in the judgments made by the decision maker. A second strategy utilizing a fuzzy expert system is proposed to reduce the time required of the decision maker, eliminate inconsistencies due to pairwise comparisons, and increase scalability. A third strategy utilizing the Weighted Sum Model is proposed to study the performance of a simple, low-cost strategy. Then, a series of empirical studies is performed to evaluate the new strategies. The results of the studies show that the strategies proposed in this work are more cost-effective than the strategy presented in the previous study.
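Of the three strategies, the Weighted Sum Model is the simplest to illustrate. The sketch below scores candidate regression testing techniques against weighted criteria and ranks them; the criteria, weights, technique names, and scores are invented for illustration and are not taken from the study.

```python
# Hedged sketch: Weighted Sum Model (WSM) for choosing a regression testing technique.
criteria = ["fault_detection", "execution_cost", "analysis_cost"]
weights  = {"fault_detection": 0.5, "execution_cost": 0.3, "analysis_cost": 0.2}

# Scores normalized to [0, 1], where higher is always better
# (costs are therefore entered as 1 - normalized cost).
techniques = {
    "retest_all":           {"fault_detection": 1.00, "execution_cost": 0.10, "analysis_cost": 0.90},
    "coverage_prioritized": {"fault_detection": 0.80, "execution_cost": 0.60, "analysis_cost": 0.50},
    "risk_based_selection": {"fault_detection": 0.70, "execution_cost": 0.80, "analysis_cost": 0.40},
}

def wsm_score(scores):
    return sum(weights[c] * scores[c] for c in criteria)

for t in sorted(techniques, key=lambda t: wsm_score(techniques[t]), reverse=True):
    print(f"{t:22s} WSM score = {wsm_score(techniques[t]):.2f}")
```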
Item: Heuristic Clustering with Secured Routing in Two Tier Sensor Networks (North Dakota State University, 2013) Gagneja, Kanwalinderjit Kaur
This study addresses the management of Heterogeneous Sensor Networks (HSNs) in an area of interest. The use of sensors in our day-to-day life has increased dramatically, and in ten to fifteen years sensor nodes may cover the whole world and could be accessed through the Internet. Currently, sensors are in use for such things as vehicular movement tracking, nuclear power plant monitoring, fire incident reporting, traffic control, and environmental monitoring. There is vast potential for further applications, such as entertainment, drug trafficking, border surveillance, crisis management, underwater environment monitoring, and smart spaces, so this research area has a lot of potential. Sensors have limited resources, and researchers have devised methods to deal with the related issues, but security, routing, and clustering of sensors have been handled separately by past researchers. Since route selection directly depends on the position of the nodes, and sets of resources may change dynamically, cumulative and coordinated activities are essential to maintain the organizational structure of the deployed sensors. To conserve sensor network energy, it is therefore better to follow a holistic approach that takes care of both clustering and secure routing. In this research, we have developed an efficient key management approach with an improved tree routing algorithm for clustered heterogeneous sensor networks. The simulation results show that this scheme offers good security and uses less computation, with substantial savings in memory requirements, when compared with some other key management, clustering, and routing techniques. The low-end nodes are simple and low cost, while the high-end nodes are costly but provide significantly more processing power. In this type of sensor network, the low-end nodes are clustered and report to a high-end node, which in turn uses a network backbone to route data to a base station. Initially, we partition the given area into Voronoi clusters. Voronoi diagrams generate polygonal clusters using Euclidean distance. Since routing in sensor networks is multi-hop, we apply a tabu search to adjust some of the nodes in the Voronoi clusters; the Voronoi clusters then work with hop counts instead of distance. When an event occurs in the network, low-end nodes gather and forward data to cluster heads using the Secure Improved Tree Routing approach. The routing among the low-end nodes, high-end nodes, and the base station is made secure and efficient by applying a two-way handshaking secure Improved Tree Routing (ITR) technique. The secure ITR data routing procedure improves the energy efficiency of the network by reducing the number of hops needed to reach the base station. We gain robustness and energy efficiency by reducing the vulnerability points in the network through alternatives to shortest-path tree routing. In this way, a complete solution is provided for data traveling in a two-tier heterogeneous sensor network by reducing the hop count and making the network secure and energy efficient. Empirical evaluations show how the described algorithm performs with respect to delivery ratio, end-to-end delays, and energy usage.
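A minimal sketch of the initial Voronoi-style clustering step, assuming planar node coordinates: each low-end node joins the nearest high-end cluster head by Euclidean distance. The subsequent tabu-search adjustment by hop count is not reproduced here, and all coordinates are invented.

```python
# Hedged sketch: Voronoi-style assignment of low-end nodes to high-end cluster heads.
import math

cluster_heads = {"H1": (2.0, 8.0), "H2": (7.0, 3.0), "H3": (9.0, 9.0)}
low_end_nodes = {"n1": (1.5, 7.0), "n2": (6.0, 2.5), "n3": (8.5, 8.0),
                 "n4": (5.0, 6.0), "n5": (3.0, 3.0)}

def nearest_head(pos):
    # A node belongs to the Voronoi cell of its closest cluster head.
    return min(cluster_heads, key=lambda h: math.dist(pos, cluster_heads[h]))

clusters = {h: [] for h in cluster_heads}
for node, pos in low_end_nodes.items():
    clusters[nearest_head(pos)].append(node)

for head, members in clusters.items():
    print(head, "->", sorted(members))
```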
Item: Simulating Multi-Agent Decision Making for a Self Healing Smart Grid (North Dakota State University, 2013) Bou ghosn, Steve Martin
Dynamic real-time power systems like the national power grid operate in continuously changing environments, facing adverse weather conditions, power line malfunctions, device failures, and other disruptions. These disruptions can lead to different fault conditions in the power system, ranging from a local outage to a cascading failure of global proportions. It is vital to guarantee that consumers with critical loads will not be seriously affected when these outages occur, and also to detect potential faults early on to prevent them from spreading and creating a generalized failure. To achieve this, the power grid must be able to behave intelligently to adapt to ever-changing conditions and to heal itself when a fault condition occurs. The Smart Grid must continuously monitor its own status, and if an abnormal state is detected, it must automatically perform corrective actions to restore the grid to a healthy state. Due to the large scale and complexity of the Smart Grid, anticipating all possible scenarios that lead to performance lapses is difficult [2]. There is a high degree of uncertainty in accurately estimating the impact of disruptions on the reliability, availability, and efficiency of the power delivery system. The use of simulation models can promote trust in Smart Grid solutions in safe and cost-effective ways. In this work, we first present an innovative framework that can be used as a design basis when implementing agent-based simulations of the smart grid. The framework is based on two primary concepts. First, the electrical grid system is separated into semi-autonomous units, or micro-grids, each with its own set of hierarchically organized agents. Second, models for automating decision-making in the grid during crisis situations are independently supported, allowing simulations that test how agents respond to the various scenarios that can occur in the smart grid using different decision models. Advantages of this framework are scalability, modularity, coordinated local and global decision making, and the ability to easily implement and test a large variety of decision models.

Item: A Two-phase Security Mechanism for Anomaly Detection in Wireless Sensor Networks (North Dakota State University, 2013) Zhao, Jingjun
Wireless Sensor Networks (WSNs) have been applied to a wide range of application areas, including battlefields, transportation systems, and hospitals. Security issues in WSNs are still hot research topics. The constrained capabilities of sensors and the environments in which sensors are deployed, such as hostile and unreachable areas, make security more complicated. This dissertation describes the development and testing of a novel two-phase security mechanism for hierarchical WSNs that is capable of defending against both outside and inside attacks. For outside attacks, the attackers are usually malicious intruders that have entered the network. The computation and communication capabilities of the sensors prevent them from directly defending against harmful intruders by performing traditional encryption, authentication, or other cryptographic operations. However, the sensors can assist the more powerful nodes in a hierarchically structured WSN to track down these intruders and thereby prevent further damage. To fundamentally improve the security of a WSN, a multi-target tracking algorithm is developed to track the intruders. For inside attacks, the attackers are compromised insiders. The intruders manipulate these insiders to indirectly attack other sensors. Therefore, detecting these malicious insiders in a timely manner is important for improving the security of a network. In this dissertation, we mainly focus on detecting malicious insiders that try to break the normal communication among sensors, which creates holes in the WSN. When the malicious insiders attempt to break communication by actively using a HELLO flooding attack, we apply an immune-inspired algorithm called the Dendritic Cell Algorithm (DCA) to detect this type of attack. When the malicious insiders adopt a subtler way of breaking communication by dropping received packets, we implement another proposed technique, a short-and-safe routing (SSR) protocol, to prevent this type of attack. The designed security mechanism can be applied to both static and dynamic WSNs of different sizes. We adopt a popular simulation tool, ns-2, and a numerical computing environment, MATLAB, to analyze and compare the computational complexities of the proposed security mechanism. Simulation results demonstrate the effective performance of the developed corrective and preventive security mechanisms in detecting malicious nodes and tracking intruders.
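The following heavily simplified sketch conveys the danger/safe-signal idea behind a DCA-style detector for HELLO flooding: nodes whose accumulated danger evidence outweighs their safe evidence are flagged. The signal definitions, threshold, and sample data are assumptions, not the dissertation's parameters.

```python
# Hedged, heavily simplified sketch in the spirit of the Dendritic Cell Algorithm.
observations = {
    # node_id: list of (observed_hello_rate, expected_hello_rate) samples
    "s1": [(5, 5), (6, 5), (5, 5)],
    "s2": [(40, 5), (55, 5), (60, 5)],   # flooding-like behaviour
    "s3": [(5, 5), (7, 5), (4, 5)],
}

def classify(samples, threshold=1.0):
    # "Danger" evidence: excess HELLOs above expectation; "safe" evidence: expected traffic.
    danger = sum(max(0.0, rate - expected) for rate, expected in samples)
    safe = sum(expected for _, expected in samples)
    mature_context = danger / max(safe, 1e-9)    # crude danger-to-safe ratio
    return ("anomalous" if mature_context > threshold else "normal", mature_context)

for node, samples in observations.items():
    label, score = classify(samples)
    print(f"{node}: {label} (context score {score:.2f})")
```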
Item: Mining Semantic Relationships Between Concepts Across Documents Using Wikipedia Knowledge (North Dakota State University, 2013) Yan, Peng
The ongoing, astounding growth of text data has created an enormous need for fast and efficient text mining algorithms. However, the sparsity and high dimensionality of text data present great challenges for representing the semantics of natural language text. Traditional approaches to document representation are mostly based on the Vector Space Model (VSM), which takes a document as an unordered collection of words and records only document-level statistical information (e.g., document frequency, inverse document frequency). Because it does not capture the semantics of text, VSM shows inherent limitations for certain tasks, especially fine-grained information discovery applications such as mining relationships between concepts, since it computes relatedness between words based only on statistical information collected from the documents themselves. In this dissertation, we present a new framework that attempts to address these problems by utilizing background knowledge to provide a better semantic representation of any text. This is accomplished by leveraging Wikipedia, currently the world's largest human-built encyclopedia. This integration also complements the information contained in the text corpus and facilitates the construction of a more comprehensive representation and retrieval framework. Specifically, we present 1) Semantic Path Chaining (SPC), a new text mining model that automatically discovers semantic relationships between concepts across multiple documents (something the traditional search paradigm, such as search engines, cannot help with much) and effectively integrates various evidence sources from Wikipedia; 2) kernel methods that provide a more appropriate estimation of semantic relatedness between concepts and better utilize Wikipedia background knowledge in our defined query contexts; and 3) the Concept Association Graph (CAG), a graph-based mining prototype system interfaced directly to Wikipedia that enables fast and customizable concept relationship search using Wikipedia resources. The effectiveness of the proposed techniques has been evaluated on different data sets. The experimental results demonstrate that search performance is significantly enhanced in terms of accuracy and coverage compared with several baseline models. In particular, existing state-of-the-art related work, such as Srinivasan's closed text mining algorithm, Explicit Semantic Analysis (ESA) [19], and the RelFinder system [26, 27, 41], has been used as comparison models.
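A minimal sketch of Wikipedia-based relatedness in the spirit of concept-vector approaches such as ESA: each concept is a weighted vector over Wikipedia articles, and relatedness is their cosine similarity. The tiny article space and weights are invented; SPC's evidence integration and the kernel methods are not shown.

```python
# Hedged sketch: concept relatedness as cosine similarity over Wikipedia-article vectors.
import math

# concept -> {wikipedia_article: association weight}  (invented toy values)
concept_vectors = {
    "insulin": {"Diabetes": 0.9, "Pancreas": 0.8, "Hormone": 0.6},
    "glucose": {"Diabetes": 0.7, "Metabolism": 0.8, "Hormone": 0.2},
    "jazz":    {"Music": 0.9, "Saxophone": 0.7},
}

def cosine(u, v):
    dot = sum(u[a] * v[a] for a in set(u) & set(v))
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

print("insulin ~ glucose:", round(cosine(concept_vectors["insulin"], concept_vectors["glucose"]), 3))
print("insulin ~ jazz:   ", round(cosine(concept_vectors["insulin"], concept_vectors["jazz"]), 3))
```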
Item: Spatially Aware Computing for Natural Interaction (North Dakota State University, 2013) Roudaki, Amin
Spatial information refers to the location of an object in a physical or digital world. In addition, it includes the relative position of an object with respect to other objects around it. In this dissertation, three systems are designed and developed, all of which apply spatial information in different fields. The ultimate goal is to increase user friendliness and efficiency in those applications by utilizing spatial information. The first system is a novel Web page data extraction application, which takes advantage of 2D spatial information to discover structured records in a Web page. The extracted information is useful for re-organizing the layout of a Web page to fit mobile browsing. The second application utilizes the 3D spatial information of a mobile device within a large paper-based workspace to implement interactive paper that combines the merits of paper documents and mobile devices. This application can overlay digital information on top of a paper document based on the location of a mobile device within a workspace. The third application further integrates 3D spatial information with sound detection to realize an automatic camera management system. This application automatically controls multiple cameras in a conference room and creates an engaging video by intelligently switching camera shots among meeting participants based on their activities. Evaluations have been made of all three applications, and the results are promising. In summary, this dissertation comprehensively explores the use of spatial information in various applications to improve usability.

Item: A Distributed Linear Programming Model in a Smart Grid (North Dakota State University, 2013) Ranganathan, Prakash
Advances in computing and communication have resulted in large-scale distributed environments in recent years. They are capable of storing large volumes of data and often have multiple compute nodes. However, the inherent heterogeneity of data components, the dynamic nature of distributed systems, the need for information synchronization and data fusion over a network, and security and access-control issues make resource management and monitoring a tremendous challenge in the context of a Smart Grid. Unfortunately, the concept of cloud computing and the deployment of distributed algorithms have been overlooked in the electric grid sector. In particular, centralized methods for managing resources and data may not be sufficient to monitor a complex electric grid. Most electric grid management, which includes generation, transmission, and distribution, is, by and large, handled by centralized control. In this dissertation, I present a distributed algorithm for resource management which builds on the traditional simplex algorithm used for solving large-scale linear optimization problems. The distributed algorithm is exact, meaning its results are identical to those obtained in a centralized setting. More specifically, I discuss a distributed decision model in which a large-scale electric grid is decomposed into many sub-models that can support the resource assignment, communication, computation, and control functions necessary to provide robustness and to prevent incidents such as cascading blackouts. The key contribution of this dissertation is the design, development, and testing of a resource-allocation process based on a decomposition principle in a Smart Grid. I have implemented and tested the Dantzig-Wolfe decomposition process on standard IEEE 14-bus and 30-bus systems. The dissertation details how to formulate, implement, and test such an LP-based design to study the dynamic behavior and impact of an electrical network while considering its failure and repair rates. The computational benefits of the Dantzig-Wolfe approach for finding an optimal solution and its applicability to IEEE bus systems are presented.
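For a sense of the building block each decomposed sub-model solves, the sketch below sets up a toy generation-dispatch linear program with SciPy and minimizes cost subject to a demand constraint and generator capacities. The data are invented, and the Dantzig-Wolfe master/sub-problem coordination itself is not reproduced here.

```python
# Hedged sketch: a small resource-allocation LP of the kind a sub-model might solve.
from scipy.optimize import linprog

cost = [20.0, 35.0, 50.0]             # $/MWh for three generators (invented)
demand = 150.0                         # MW that must be served
capacity = [80.0, 60.0, 100.0]         # MW upper bound per generator

# minimize cost . x  subject to  x1 + x2 + x3 >= demand,  0 <= xi <= capacity_i
res = linprog(
    c=cost,
    A_ub=[[-1.0, -1.0, -1.0]],         # -(x1 + x2 + x3) <= -demand
    b_ub=[-demand],
    bounds=list(zip([0.0] * 3, capacity)),
    method="highs",
)
print("dispatch (MW):", [round(x, 1) for x in res.x], "cost:", round(res.fun, 1))
```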
Item: Towards Test Focus Selection for Integration Testing Using Software Metrics (North Dakota State University, 2013) Bani Ta’an, Shadi Elaiyan
Object-oriented software systems contain a large number of modules, which makes unit testing, integration testing, and system testing very difficult and challenging. While the aim of unit testing is to show that individual modules work properly, and the aim of system testing is to determine whether the whole system meets its specifications, the aim of integration testing is to uncover errors in the interactions between system modules. Correct functioning of object-oriented software depends upon the successful integration of classes. While individual classes may function correctly, several faults can arise when these classes are integrated together. However, it is generally impossible to test all the connections between modules because of time and cost constraints. Thus, it is important to focus testing on the connections presumed to be more error-prone. The general goal of this research is to let testers know where in a software system to focus when they perform integration testing, to save time and resources. In this work, we propose a new approach to predict and rank error-prone connections in object-oriented systems. We define method-level metrics that can be used for test focus selection in integration testing. In addition, we build a tool which calculates the metrics automatically. We performed experiments on several Java applications taken from different domains. Both an error seeding technique and mutation testing were used for evaluation. The experimental results showed that our approach is very effective for selecting the test focus in integration testing.
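The sketch below illustrates the general idea of ranking inter-class connections for test focus with a method-level score, here a crude product of parameter count and call-site count. The call graph and the scoring rule are hypothetical stand-ins for the dissertation's metrics.

```python
# Hedged sketch: ranking inter-class method connections for integration-test focus.
call_graph = [
    # (caller_class, callee_class, callee_method, n_params, n_call_sites)  -- invented
    ("Cart",    "Pricing",   "computeTotal", 3, 4),
    ("Cart",    "Inventory", "reserveItem",  2, 2),
    ("Billing", "Pricing",   "computeTotal", 3, 1),
    ("Billing", "Ledger",    "postEntry",    1, 1),
]

def risk(edge):
    _, _, _, n_params, n_sites = edge
    return n_params * n_sites           # more data flowing, more often -> riskier connection

for caller, callee, method, n_params, n_sites in sorted(call_graph, key=risk, reverse=True):
    print(f"focus: {caller} -> {callee}.{method}   score = {n_params * n_sites}")
```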
Item: A Data Mining Approach to Radiation Hybrid Mapping (North Dakota State University, 2014) Seetan, Raed
The task of mapping markers from Radiation Hybrid (RH) mapping experiments is typically viewed as equivalent to the traveling-salesman problem, which has combinatorial complexity. As an additional problem, experiments commonly result in some unreliable markers that reduce overall map quality. Due to the large numbers of markers in current radiation hybrid populations, the use of data mining techniques becomes increasingly important for reducing both the computational complexity and the impact of noise in the original data. In this dissertation, a clustering-based approach is proposed for addressing both the problem of filtering unreliable markers (framework maps) and the problem of mapping large numbers of markers (comprehensive maps) efficiently. Traditional approaches for eliminating unreliable markers use resampling of the full data set, which has an even higher computational complexity than the original mapping problem. In contrast, the proposed algorithms use a divide-and-conquer strategy to construct framework maps based on clusters that exclude unreliable markers. The clusters of markers are ordered using parallel processing and are then combined to form the complete map. Three algorithms are presented that explore the trade-off between the number of markers included in the framework map and placement accuracy. Since the mapping problem is susceptible to noise, it is often beneficial to remove markers that are not trustworthy. Traditional mapping techniques for building comprehensive maps process all markers together, including unreliable ones, in a single-iteration approach, which may reduce the accuracy of the constructed maps. In this research, two-stage algorithms are proposed that map most markers by first creating a framework map of the reliable markers and then incrementally adding the remaining markers to construct high-quality comprehensive maps. All proposed algorithms have been evaluated on several human chromosomes using radiation hybrid datasets of varying sizes, and their performance is compared with state-of-the-art RH mapping software. Overall, the proposed algorithms are not only much faster than the comparative approaches, but the quality of the resulting maps is also much higher.

Item: Metrics and Tools to Guide Design of Graphical User Interfaces (North Dakota State University, 2014) Alemerien, Khalid Ali
User interface design metrics assist developers in evaluating interface designs in early phases, before delivering the software to end users. This dissertation presents a metric-based tool called GUIEvaluator for evaluating the complexity of a user interface based on its structure. The metrics model consists of five modified structural measures of interface complexity: alignment, grouping, size, density, and balance. The results of GUIEvaluator are discussed in comparison with subjective evaluations of interface layouts and with existing complexity metrics models. To extend this metrics model, the Screen-Layout Cohesion (SLC) metric has been proposed. This metric is used to evaluate the usability of user interfaces. The SLC metric has been developed based on aesthetic, structural, and semantic aspects of GUIs. To provide the SLC calculation, a complementary tool has been developed, called GUIExaminer. This dissertation demonstrates the potential of incorporating automated complexity and cohesion metrics into the user interface design process. The findings show a strong positive correlation between the subjective evaluations and both GUIEvaluator and GUIExaminer, at a significance level of 0.05. Moreover, the findings provide evidence of the effectiveness of GUIEvaluator and GUIExaminer in predicting the best user interface design among a set of alternative user interfaces. In addition, the findings show that GUIEvaluator and GUIExaminer can measure some usability aspects of a given user interface. The metrics validation demonstrates the usefulness of GUIEvaluator and GUIExaminer for evaluating user interface designs.
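As an illustration of structural layout measures of the kind GUIEvaluator computes, the sketch below derives a density value and a horizontal balance value from widget bounding boxes. The coordinates and the exact formulas are assumptions; the tool's real definitions may differ.

```python
# Hedged sketch: two simple structural layout measures (density, horizontal balance).
SCREEN_W, SCREEN_H = 800, 600

# (x, y, width, height) for each widget on the screen (invented layout)
widgets = [(20, 20, 200, 40), (20, 80, 200, 40), (500, 20, 260, 300), (20, 400, 740, 150)]

def density(ws):
    covered = sum(w * h for _, _, w, h in ws)
    return covered / (SCREEN_W * SCREEN_H)          # fraction of screen area occupied

def horizontal_balance(ws):
    left = sum(w * h for x, _, w, h in ws if x + w / 2 <= SCREEN_W / 2)
    right = sum(w * h for x, _, w, h in ws if x + w / 2 > SCREEN_W / 2)
    total = left + right
    return 1.0 - abs(left - right) / total if total else 1.0   # 1.0 = perfectly balanced

print(f"density = {density(widgets):.2f}, balance = {horizontal_balance(widgets):.2f}")
```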
Item: A New Coupling Metric: Combining Structural and Semantic Relationships (North Dakota State University, 2014) Alenezi, Mamdouh Khalaf
Maintaining object-oriented software is problematic and expensive. Earlier research has revealed that the complex relationships among object-oriented software entities are a key reason maintenance is costly. Therefore, measuring the strength of these relationships has become a requirement for developing proficient techniques for software maintenance. Coupling, a measure of the interdependence among software entities, is an important property for which many software metrics have been defined. It is widely agreed that the level of coupling in a software product has consequences for its maintenance. In order to understand which aspects of coupling affect quality or other external attributes of software, this dissertation introduces a new coupling metric for object-oriented software that combines structural and semantic relationships among methods and classes. The dissertation studies the use of the new coupling metric for change impact analysis and for predicting fault-prone and maintainable classes. Three empirical studies were performed to evaluate the new coupling metric and established three results. First, the new coupling metric can be effectively used to identify other classes that might potentially be affected by a change to a given class. Second, a significant correlation between the new coupling metric and faults has been found. Finally, the new metric shows good promise in predicting maintainable classes. We expect that this new software metric will contribute to improving the design of incremental change of software and thus lead to increased software quality and reduced software maintenance costs.

Item: Improved Genetic Programming Techniques For Data Classification (North Dakota State University, 2014) Al-Madi, Naila Shikri
Evolutionary algorithms are one category of optimization techniques that are inspired by processes of biological evolution. Evolutionary computation is applied to many domains, and one of the most important is data mining. Data mining is a relatively broad field that deals with the automatic discovery of knowledge from databases, and it is one of the most developed fields in the area of artificial intelligence. Classification is a data mining method that assigns items in a collection to target classes, with the goal of accurately predicting the target class for each item in the data. Genetic programming (GP) is one of the effective evolutionary computation techniques for solving classification problems. GP solves classification problems as optimization tasks, searching for the solution with the highest accuracy. However, GP suffers from some weaknesses, such as long execution time and the need to tune many parameters for each problem. Furthermore, GP cannot obtain as high accuracy for multiclass classification problems as for binary problems. In this dissertation, we address these drawbacks and propose approaches to overcome them. Adaptive GP variants are proposed in order to automatically adapt the parameter settings and shorten the execution time. Moreover, two approaches are proposed to improve the accuracy of GP when applied to multiclass classification problems. In addition, a segment-based approach is proposed to accelerate GP execution time for the data classification problem. Furthermore, a parallelization of the GP process using the MapReduce methodology is proposed, which aims to shorten the GP execution time and to provide the ability to use large population sizes, leading to faster convergence. The proposed approaches are evaluated using different measures, such as accuracy, execution time, sensitivity, specificity, and statistical tests. Comparisons between the proposed approaches and standard GP, as well as other classification techniques, were performed, and the results show that these approaches overcome the drawbacks of standard GP by successfully improving the accuracy and execution time.
Item: Measurement of Non-Technical Skills of Software Development Teams (North Dakota State University, 2014) Bender, Lisa Louise
Software development managers recognize that project team dynamics is a key component of the success of any project. Managers can have a project with well-defined goals, an adequate schedule, technically skilled people, and all the necessary tools, but if the project team members cannot communicate and collaborate effectively with each other and with end users, then project success is at risk. Common problems with non-technical skills include dysfunctional communication, negative attitudes, uncooperativeness, mistrust, avoidance, and ineffective negotiation between team members and users. Such problems must be identified and addressed to improve individual and team performance. Tools are available for measuring the effectiveness of the technical skills and processes that teams use to execute projects, but there are no proven tools to effectively measure the non-technical skills of software developers. Other industries (e.g., airline and medical) are also finding that teamwork issues are related to non-technical skills as well as to lack of technical expertise. These industries are beginning to use behavioral marker systems to structure individual and team assessments. Behavioral markers are observable behaviors that impact individual or team performance. This dissertation explores and develops a behavioral marker system tool, adapted from models in other industries, to assist managers in assessing the non-technical skills of project team individuals within groups. An empirical study was also conducted to assess the validity of the tool, and its report is included in this work. We also developed and report upon empirical work that assesses how social sensitivity (a non-technical skill) impacts team performance. There are four components to this work: develop a useful non-technical skills taxonomy; develop a behavioral marker system for software developers and the non-technical skills taxonomy; validate the software developer behavioral marker system; and investigate the effect of social sensitivity on team performance. The evaluation is based on data collected from experiments. The overall goal of this work is to provide software development team managers with a methodology to evaluate and provide feedback on the non-technical skills of software developers, and to investigate whether a particular non-technical skill can positively affect team performance.
Item: Mapreduce-Enabled Scalable Nature-Inspired Approaches for Clustering (North Dakota State University, 2014) Aljarah, Ibrahim Mithgal
The increasing volume of data to be analyzed imposes new challenges on data mining methodologies. Traditional data mining methods such as clustering do not scale well with larger data sizes and are computationally expensive in terms of memory and time. Clustering large data sets has received attention in the last few years in several application areas, such as document categorization, which is in urgent need of scalable approaches. Swarm intelligence algorithms have self-organizing features, which are used to share knowledge among swarm members to locate the best solution. These algorithms have been successfully applied to clustering; however, they suffer from scalability issues when large data is involved. In order to satisfy these needs, new parallel, scalable clustering methods need to be developed. The MapReduce framework has become a popular model for parallelizing data-intensive applications due to features such as fault-tolerance, scalability, and usability. However, the challenge is to formulate the tasks with map and reduce functions. This dissertation first presents a scalable particle swarm optimization clustering algorithm (MR-CPSO) that is based on the MapReduce framework. Experimental results reveal that the proposed algorithm scales very well with increasing data set sizes while maintaining good clustering quality. Moreover, a parallel intrusion detection system using MR-CPSO is introduced. This system has been tested on a real large-scale intrusion data set to confirm its scalability and detection quality. In addition, the MapReduce framework is utilized to implement a parallel glowworm swarm optimization (MR-GSO) algorithm to optimize difficult multimodal functions. The experiments demonstrate that MR-GSO can achieve high function peak capture rates. Moreover, this dissertation presents a new clustering algorithm based on GSO (CGSO). CGSO takes into account the multimodal search capability to locate optimal centroids in order to enhance clustering quality without the need to provide the number of clusters in advance. The experimental results demonstrate that CGSO outperforms other well-known clustering algorithms. In addition, a MapReduce GSO clustering (MRCGSO) algorithm version is introduced to evaluate the algorithm's scalability with large-scale data sets. MRCGSO achieves good speedup and utilization when more computing nodes are used.
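To illustrate the map/reduce formulation challenge mentioned above, the sketch below casts one clustering step as a map function (assign each point to its nearest centroid) and a reduce function (average the points assigned to each centroid), driven by a single-process shuffle. It is a generic illustration of the formulation, not MR-CPSO or MRCGSO themselves; a real job would run on Hadoop-style infrastructure with many mappers and reducers.

```python
# Hedged sketch: one clustering step expressed as map and reduce functions.
from collections import defaultdict
import math

points = [(1.0, 1.2), (0.8, 0.9), (5.1, 5.0), (4.9, 5.3), (5.2, 4.8)]
centroids = {0: (0.0, 0.0), 1: (6.0, 6.0)}

def map_phase(point):
    """Emit (nearest_centroid_id, point) for one input record."""
    cid = min(centroids, key=lambda c: math.dist(point, centroids[c]))
    return cid, point

def reduce_phase(cid, assigned):
    """Aggregate all points keyed by one centroid into its updated position."""
    n = len(assigned)
    return cid, (sum(x for x, _ in assigned) / n, sum(y for _, y in assigned) / n)

grouped = defaultdict(list)                  # shuffle step: group map output by key
for cid, p in map(map_phase, points):
    grouped[cid].append(p)

new_centroids = dict(reduce_phase(cid, pts) for cid, pts in grouped.items())
print(new_centroids)
```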