Search Results

  • Item
    Ex-Ante Temporal Optimization in Soybean Origination: An Overdetermined Approach Through Deep Learning
    (North Dakota State University, 2021) Carlson, Noah Joseph
    Digitization is influencing commodity trading and agricultural markets, and as they transition towards extreme liquidity, agribusiness risk exposures increase and traditional competitive advantages diminish. In commodity origination, logistics and destination basis comprise the most volatile and determinant influences of margin. To capture a consistently higher margin and represent narrowed interior basis, agribusiness firms must manage these risks by optimizing transformations. To accomplish this, ex-ante decision-making is often necessary, as forward price clairvoyance is not always prevalent, is risky, or contains premium. Modeling this spatial equilibrium is difficult through traditional reductionist and essentialist application, as the overdetermined and convoluted system presents bidirectional and simultaneous price discoveries. Developments in neurobiology, technology, and Artificial Intelligence expand capabilities to represent brain behavior and unconscious inference in computational modeling. The use of Recurrent Deep Machine Learning could improve ex-ante decision accuracy within commodity trading through its nonlinear, nonlocal, nonstationary, and sequential capabilities.
  • Item
    Soil Moisture Prediction Using Meteorological Data, Satellite Imagery, and Machine Learning in the Red River Valley of the North
    (North Dakota State University, 2021) Acharya, Umesh
    Weather stations provide key information related to soil moisture and have been used by farmers to decide various field operations. We first used regression analysis to evaluate the discrepancies in soil moisture between a weather station and a nearby field due to soil texture, crop residue cover, crop type, growth stage, and duration of temporal dependency on recent rainfall and evaporation rates. The regression analysis showed a strong relationship between soil moisture at the weather station and the nearby field at the late vegetative and early reproductive stages. The correlation thereafter declined at later growth stages for corn and wheat. We can adduce that the regression coefficient of soil moisture with four-day cumulative rainfall slightly increased with an increase in the crop residue, resulting in a low root mean square error (RMSE) value. We then investigated the effectiveness of machine learning techniques such as random forest regression (RFR), boosted regression trees (BRT), support vector regression, and artificial neural network to predict soil moisture in nearby fields based on RMSE of a 30% validation dataset and to determine the relative importance of predictor variables. The RFR and BRT performed best among the machine learning algorithms, with the lowest RMSE values of 0.045 and 0.048 m3 m-3, respectively. The Classification and Regression Trees (CART), RFR, and BRT models showed that soil moisture at nearby weather stations had the highest relative influence for moisture prediction, followed by the four-day cumulative rainfall and Potential Evapotranspiration (PET), and subsequently by bulk density and Saturated Hydraulic Conductivity (Ksat). We then evaluated the integration of weather station data, RFR machine learning, and remotely sensed satellite imagery to predict soil moisture in nearby fields. 
Soil moisture predicted with an RFR algorithm using OPtical TRApezoidal Model (OPTRAM) moisture values, rainfall, standardized precipitation index (SPI) and percent clay showed high goodness of fit (r2=0.69) and low RMSE (0.053 m3 m-3). This research shows that the integration of weather station data, machine learning, and remote sensing tools can be used to effectively predict soil moisture in the Red River Valley of the North among a large diversity of cropping systems.
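The abstract above ranks models by RMSE on a 30% validation dataset. As a minimal sketch of that evaluation step only (not the author's code), the following shows a hold-out split and the RMSE calculation in plain Python. The function names, the fixed random seed, and the sample values are assumptions made for illustration.

```python
import math
import random


def train_validation_split(rows, validation_fraction=0.3, seed=0):
    """Shuffle a copy of rows and hold out a fraction for validation.

    Returns (training_rows, validation_rows). The seed is an illustrative
    assumption so that the split is repeatable.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def rmse(observed, predicted):
    """Root mean square error, in the units of the inputs (e.g. m3 m-3)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented volumetric soil moisture values, for illustration only.
observed = [0.31, 0.28, 0.35, 0.22]
predicted = [0.29, 0.30, 0.33, 0.25]
print(f"RMSE = {rmse(observed, predicted):.3f} m3 m-3")
```

A lower RMSE on the held-out rows indicates a closer match between predicted and measured soil moisture, which is the basis on which the abstract compares RFR and BRT with the other algorithms.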
  • Item
    Increasing the Predictive Potential of Machine Learning Models for Enhancing Cybersecurity
    (North Dakota State University, 2021) Ahsan, Mostofa Kamrul
    Networks have an increasing influence on our modern life, making Cybersecurity an important field of research. Cybersecurity techniques mainly focus on antivirus software, firewalls, and intrusion detection systems (IDSs). These techniques protect networks from both internal and external attacks. This research is composed of three different essays. It highlights and improves the applications of machine learning techniques in the Cybersecurity domain. Since the feature size and number of observations in cyber incident data are increasing with the growth of internet usage, conventional defense strategies against cyberattacks are becoming ineffective most of the time. On the other hand, the applications of machine learning are consistently getting better at preventing cyber risks in a timely manner. For the last decade, machine learning and Cybersecurity have converged to enhance risk elimination. Since cyber domain knowledge and the adoption of machine learning techniques do not align in the deployment of data-driven intelligent systems, there are inconsistencies, and the gap needs to be bridged. We have studied the most recent research works in this field and documented the most common issues regarding the implementation of machine learning algorithms in Cybersecurity. According to these findings, we have conducted research and experiments to improve the quality of service and security strength by discovering new approaches.
  • Item
    Machine Vision Methods for Evaluating Plant Stand Count and Weed Classification Using Open-Source Platforms
    (North Dakota State University, 2021) Pathak, Harsh
    Evaluating plant stand count or classifying weeds by manual scouting is time-consuming, laborious, and subject to human error. Proximal remotely sensed imagery used in conjunction with machine vision algorithms can be used for these purposes. Despite their great potential, the adoption of these technologies is still slow due to their subscription cost and data privacy issues. Therefore, in this research, open-source image processing software (ImageJ and Python) that supports in-house processing was used to develop algorithms to evaluate stand count, develop spatial distribution maps, and classify the four common weeds of North Dakota. A novel sliding and shifting region of interest method was developed for plant stand count. Handcrafted simple image processing and machine learning approaches with shape features were successfully employed for weed species classification. Such tools and methodologies using open-source platforms can be extended to other scenarios and are expected to be impactful and helpful to stakeholders.
  • Item
    Using Machine Learning and Text Mining Algorithms to Facilitate Research Discovery of Plant Food Metabolomics and Its Application for Human Health Benefit Targets
    (North Dakota State University, 2020) Mathew, Jithin Jose
    With the increase in scholarly articles published every day, the need for an automated systematic exploratory literature review tool is rising. With the advance in Text Mining and Machine Learning methods, such data exploratory tools are researched and developed in every scientific domain. This research aims at finding the best keyphrase extraction algorithm and topic modeling algorithm to serve as the foundation and main component of a tool that will aid in Systematic Literature Review. Based on experimentation on a set of highly relevant scholarly articles published in the domain of food science, two graph-based keyphrase extraction algorithms, TopicalPageRank and PositionRank, were picked as the best two among 9 keyphrase extraction algorithms for picking domain-specific keywords. Of the two topic modeling algorithms, Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), the documents chosen in this research were best classified into suitable topics by the NMF method, as validated by a domain expert. This research lays the framework for faster tool development for Systematic Literature Review.
  • Item
    Rangeland Forage Growth Prediction, Logistics, Energy, and Economics Analysis and Tool Development Using Open-Source Software
    (North Dakota State University, 2022) Navaneetha Srinivasagan, Subhashree
    Forage availability is crucial for livestock production across the United States. Rangelands occupy vast areas (31 % of land) and are the primary source of forage for livestock. However, extreme climatic conditions such as drought affect rangeland forage production and pose a serious threat to the rangeland enterprise. This increases the need to monitor forage in vast rangelands and adopt other measures, such as cultivating or buying forage, to balance demand and supply. Despite this need, resources (studies and tools) on rangeland forage monitoring and existing forage production, handling, and economics were scattered and scarce. Therefore, a comprehensive systematic literature review was performed to gather the current understanding of the technology and resources used for monitoring and economics of forage production. Remote sensing technologies were widely used in recent research for their ability to scout vast areas frequently, and machine learning (ML) for its success in comprehending complex relationships. Forage production economics was predominantly available for the alfalfa forage crop, but other crops and bale collection logistics during production were ignored. Bale collection using a conventional tractor carrying 1 and 2 bales/trip (BPT) and an automatic bale picker (ABP; 8-23 BPT) was simulated mathematically and analyzed with open-source R software using realistic equipment turning scenarios. Fuel consumption based on aggregation distance for ABP decreased on average by 72 % and 53 % compared to the tractor with 1 and 2 BPT, respectively. A web-based calculator tool was developed using open-source HTML, CSS, and JavaScript software for forage economic analysis, including more than 10 varieties of forage crops and the economics of bale collection (tractor and ABP). Pasture biomass yield prediction was performed with R software using vegetation index (VI) and climate data through ML approaches. 
Recursive feature elimination (RFE) combined with a random forest (RF) model for forage yield emerged as the best methodology based on accuracy. A web-based interactive tool was developed using the Shiny package in R to accommodate “field-specific,” pasture-scale inputs for predicting biomass yield. In conclusion, these successful results demonstrate the possibility of using open-source software for simulating logistics, developing models, and building tools for forage monitoring and analyzing the economics of forage production.
  • Item
    On the Feasibility of Machine Learning Algorithms Towards Low-Cost Flow Cytometry
    (North Dakota State University, 2023-08-01) Vandal, Noah
    Utilizing low-cost, scalable architectures to detect specific cells, for both mass flow and minute incidence analysis, is attractive to the clinical researcher as a way to expand access to otherwise costly devices. We demonstrate the use of a low-cost microfluidics device that performs detection of beads and cells, both for cell counting and for discrete cell type identification. This was accomplished using polymer technology via implementation of polydimethylsiloxane microfluidics, which were created using a 3-D printed mold, and machine learning technologies with algorithms that can infer and track analyte particles within the microfluidic of interest. Our demonstration of this microfluidics device is proof that creating low-cost instruments for analyte detection using current machine learning models and hardware is possible. We foresee the scalability of this design to be immense in terms of throughput rate, inexpensiveness of product, and the multiple different parameters and classes that can be searched for.
  • Item
    Soybean Leaf Chlorophyll Estimation and Iron Deficiency Field Rating Determination at Plot and Field Scales Through Image Processing and Machine Learning
    (North Dakota State University, 2020) Hassanijalilian, Oveis
    Iron deficiency chlorosis (IDC) is the most common reason for chlorosis in soybean (Glycine max (L.) Merrill) and causes an average yield loss of 120 million dollars per year across 1.8 million ha in the North Central US alone. As the most effective way to avoid IDC is the use of tolerant cultivars, they are visually rated for IDC by experts; however, this method is subjective and not feasible at a larger scale. An alternative, more objective image processing method can be implemented on various platforms and fields. This approach relies on a color vegetation index (CVI) that can quantify chlorophyll, as chlorophyll content is a good IDC indicator. Therefore, this research is aimed at developing image processing methods at leaf, plot, and field scales with machine learning methods for chlorophyll and IDC measurement. This study also reviewed and synthesized the IDC measurement and management methods. Smartphone digital images with machine learning models successfully estimated the chlorophyll content of soybean leaves in the field. Dark green color index (DGCI) was the CVI best correlated with chlorophyll. The pixel count of four different ranges of DGCI (RPC) was used as input features for different models, and the support vector machine produced the highest performance. DGCI, which mimicked visual rating, and canopy size were extracted from handheld camera images of soybean plots and used as inputs to decision-tree-based models for IDC classification. The AdaBoost model was the best model in classifying IDC severity. DGCI, canopy size, and their product (CDP) were extracted from unmanned aerial vehicle images of soybean IDC cultivar trial fields for IDC severity monitoring and yield prediction. The area under the curve (AUC) was employed to aggregate the data into a single value through time, and the correlation between all the features and yield was good. 
Although CDP at the latest growth stage had the highest correlation with yield, AUC of CDP was the most consistent index for soybean yield prediction. This research demonstrated that digital image processing along with machine learning methods can be successfully applied to soybean IDC measurement, and that various soybean-related stakeholders can benefit from this research.
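The abstract above uses the dark green color index (DGCI) and the pixel count in four DGCI ranges (RPC) as model inputs. The sketch below assumes the commonly cited DGCI definition from the turfgrass literature, DGCI = [(H - 60)/60 + (1 - S) + (1 - B)] / 3, with hue H in degrees and saturation S and brightness B scaled to 0-1. Whether the thesis used exactly this form is an assumption, and the three bin edges are hypothetical because the abstract does not state the ranges.

```python
import bisect
import colorsys


def dgci(r, g, b):
    """Dark green color index of one 8-bit RGB pixel.

    Assumes the commonly cited definition: hue in degrees, saturation and
    brightness in 0-1. Darker, greener pixels give higher values.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue_degrees = h * 360.0
    return ((hue_degrees - 60.0) / 60.0 + (1.0 - s) + (1.0 - v)) / 3.0


def range_pixel_counts(pixels, edges=(0.25, 0.5, 0.75)):
    """Count pixels falling into four DGCI ranges.

    `edges` are hypothetical bin boundaries chosen for illustration; they
    split the DGCI axis into len(edges) + 1 ranges.
    """
    counts = [0] * (len(edges) + 1)
    for r, g, b in pixels:
        counts[bisect.bisect_right(edges, dgci(r, g, b))] += 1
    return counts


# Invented leaf pixels: yellow (chlorotic), bright green, dark green.
leaf_pixels = [(255, 255, 0), (0, 255, 0), (0, 128, 0)]
for pixel in leaf_pixels:
    print(pixel, round(dgci(*pixel), 3))
print(range_pixel_counts(leaf_pixels))
```

Under this definition a pure yellow pixel scores near 0 and a darker green pixel scores higher than a bright green one, which is why the index can stand in for chlorophyll content. The four counts would then form one feature vector per image for a classifier such as the support vector machine named in the abstract.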
  • Item
    Extracting Useful Information and Building Predictive Models from Medical and Health-Care Data Using Machine Learning Techniques
    (North Dakota State University, 2020) Kabir, Md Faisal
    In healthcare, a large amount of medical data has emerged. To effectively use these data to improve healthcare outcomes, clinicians need to identify the relevant measures and apply the correct analysis methods for the type of data at hand. In this dissertation, we present various machine learning (ML) and data mining (DM) methods that could be applied to the types of data sets that are available in the healthcare area. The first part of the dissertation investigates DM methods on healthcare or medical data to find significant information in the form of rules. Class association rule mining, a variant of association rule mining, was used to obtain rules with targeted items or class labels. These rules can be used to improve public awareness of different cancer symptoms and could also be useful to initiate prevention strategies. In the second part of the dissertation, ML techniques have been applied to healthcare or medical data to build a predictive model. Three different classification techniques on a real-world breast cancer risk factor data set have been investigated. Due to the imbalanced characteristics of the data set, various resampling methods were used before applying the classifiers. It is shown that there was a significant improvement in performance when applying a resampling technique as compared to applying no resampling technique. Moreover, super learning (SL) techniques, which use multiple base learners, have been investigated to boost the performance of classification models. Two different forms of super learner have been investigated - the first one uses two base learners while the second one uses three base learners. The models were then evaluated against well-known benchmark data sets related to the healthcare domain, and the results showed that the SL model performs better than the individual classifiers and the baseline ensemble. 
Finally, we assessed cancer-relevant genes of prostate cancer with the most significant correlations with the clinical outcome of the sample type and the overall survival. Rules from the RNA-sequencing of prostate cancer patients were discovered. Moreover, we built a regression model, from which rules for predicting the survival time of patients were generated.
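The abstract above reports that resampling an imbalanced data set before classification improved performance, without naming the resampling methods. As one simple illustration of the idea, the sketch below implements random oversampling, which duplicates minority-class samples until every class matches the largest one. The choice of method, the function name, and the sample data are assumptions, not details taken from the dissertation.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Balance classes by re-drawing minority samples with replacement.

    Returns new (samples, labels) lists in which every class has as many
    items as the largest class in the input.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        extras = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extras)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Invented two-feature samples with a 4:1 class imbalance.
X = [[0.1, 1.0], [0.2, 0.9], [0.3, 1.1], [0.2, 1.2], [0.9, 0.1]]
y = [0, 0, 0, 0, 1]
X_balanced, y_balanced = random_oversample(X, y)
print(Counter(y_balanced))
```

Resampling of this kind is normally applied to the training portion only, so that duplicated samples do not leak into the data used for evaluation.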
  • Item
    Intrusion Detection With an Autoencoder and ANOVA Feature Selector
    (North Dakota State University, 2021) Satyal, Rashmi
    Intrusion detection systems aim at identifying malicious activities or policy violations in a network. The problem of high dimensionality in intrusion detection systems is a barrier to processing data and analyzing network traffic. This work aims at tackling problems associated with high data dimensionality using a feature selection technique based on the one-way ANOVA F-test before the classification process. It also involves a study of the autoencoder as a classification technique for network data, as opposed to the traditional use of autoencoders on image data. Experiments have been conducted using the popular NSL-KDD dataset, and the results of those experiments are compared with existing literature.
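The abstract above selects features with a one-way ANOVA F-test before classification. As a minimal sketch of that scoring step (not the author's implementation), the code below computes the F statistic of each feature against the class labels, that is, the between-class mean square divided by the within-class mean square, and keeps the highest-scoring features. The function names, the top-k selection rule, and the sample data are assumptions for illustration.

```python
from collections import defaultdict


def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature grouped by class label."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")

    grand_mean = sum(values) / n
    ss_between = 0.0
    ss_within = 0.0
    for group in groups.values():
        mean = sum(group) / len(group)
        ss_between += len(group) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in group)

    if ss_within == 0.0:
        # No spread inside any class: perfectly separating or constant feature.
        return float("inf") if ss_between > 0.0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(rows, labels, k):
    """Return the column indices of the k features with the largest F."""
    n_features = len(rows[0])
    scores = [anova_f([row[j] for row in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Invented data: feature 0 separates the classes, feature 1 does not.
rows = [[1, 5], [2, 3], [3, 4], [4, 4], [5, 5], [6, 3]]
labels = [0, 0, 0, 1, 1, 1]
print(select_top_k(rows, labels, k=1))
```

A large F means the class means of a feature differ by much more than the spread within each class, so the feature is likely to be useful to the downstream classifier.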