NDSU Repository

Search Results

Stock Price Prediction Using Recurrent Neural Networks
(North Dakota State University, 2018) Jahan, Israt
The stock market is generally very unpredictable in nature. Many factors may determine the price of a particular stock, such as the market trend, supply and demand ratio, global economy, public sentiment, sensitive financial information, earnings declarations, historical prices, and many more. These factors explain the challenge of accurate prediction. However, with the help of new technologies like data mining and machine learning, we can analyze big data and develop an accurate prediction model that avoids some human errors. In this work, the closing prices of specific stocks are predicted from sample data using a supervised machine learning algorithm. In particular, a Recurrent Neural Network (RNN) algorithm is used on time-series data of the stocks. The predicted closing prices are cross-checked against the true closing prices. Finally, it is suggested that this model can be used to make predictions for other volatile financial instruments.
Health Risk Prediction Using Big Medical Data - a Collaborative Filtering-Enhanced Deep Learning Approach
(North Dakota State University, 2018) Li, Xin
Deep learning has yielded immense success in many different scenarios. Following its success in other real-world applications, it has been applied to big medical data. However, discovering knowledge from these data can be very challenging because they normally contain large amounts of unstructured data, they may have many missing values, and they can be highly complex and heterogeneous. In these cases a deep neural network alone is not sufficient. To solve these problems we propose a Collaborative Filtering-Enhanced Deep Learning Approach. In particular, we first estimate missing values based on information mined from the structured and unstructured data. Second, a deep neural network-based method is applied, which helps us handle complex and multi-modality data. The proposed algorithm is applied to analyze big medical data and make personalized health risk predictions. Extensive experiments on real-world datasets show improvements of our proposed algorithm over the state-of-the-art methods.
Object Classification Using Stacked Autoencoder and Convolutional Neural Network
(North Dakota State University, 2016) Gottimukkula, Vijaya Chander Rao
In recent years, deep learning has been shown to have a formidable impact on object classification and has bolstered the advances in machine learning research. Many image datasets such as MNIST, CIFAR-10, SVHN, ImageNet, Caltech, etc. are available, which contain a broad spectrum of image data for training and testing purposes. Numerous deep learning architectures have been developed in the last few years, and significant results were obtained upon testing against these datasets. However, state-of-the-art results have been achieved through Convolutional Neural Networks (CNN). This paper investigates different deep learning models based on the standard Convolutional Neural Network and Stacked Autoencoder architectures for object classification on given image datasets. Accuracy values were computed and presented for these models on three image classification datasets.
Comparison of RNN, LSTM and GRU on Speech Recognition Data
(North Dakota State University, 2018) Shewalkar, Apeksha Nagesh
Deep Learning [DL] provides an efficient way to train Deep Neural Networks [DNN]. DNNs, when used for end-to-end Automatic Speech Recognition [ASR] tasks, can produce more accurate results than traditional ASR. Normal feedforward neural networks are not suitable for speech data because they cannot persist past information, whereas Recurrent Neural Networks [RNNs] can persist past information and handle temporal dependencies. For this project, three recurrent networks, standard RNN, Long Short-Term Memory [LSTM] networks and Gated Recurrent Unit [GRU] networks, are evaluated in order to compare their performance on speech data. The data set used for the experiments is a reduced version of the TED-LIUM speech data. According to the experiments and their evaluation, LSTM performed best among the networks with a good word error rate, while GRU achieved results close to those of LSTM in less time.
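The abstract compares the networks by word error rate. As a rough illustration of that metric only (not the evaluation code used in the project), the sketch below computes WER as the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 / 6.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.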
Performance Comparison of Apache Spark MLlib
(North Dakota State University, 2018) Sharma, Pallavi
This study attempts to understand the performance of Apache Spark and the MLlib platform. To this end, the cluster computing system of Apache Spark was set up and five supervised machine learning algorithms (Naïve-Bayes, Decision Tree, Random Forest, Support Vector Machine and Logistic Regression) were investigated. These algorithms were implemented on two of the available cluster modes, Local and GPU Cluster mode. Performance metrics such as classification accuracy, area under ROC and area under PR were investigated for the algorithms on three datasets. It is concluded that the algorithms are computed in parallel in both modes, with GPU Cluster mode performing better than Local mode for all algorithms in terms of time taken for completion. However, the mentioned performance metrics were not affected by the choice of mode, suggesting that parallel computation does not play a major role in determining these metrics.
Prediction Accuracy of Financial Data - Applying Several Resampling Techniques
(North Dakota State University, 2020) Ali, Mohammad Reza
With the help of Data Mining and Machine Learning, prediction has become a very popular and in-demand instrument for planning and accomplishing future goals. The financial sector is one of the crucial sectors of present human society, and predicting the correct outcome is a pivotal matter in this sector. In this work, prediction efficiency was assessed by applying several Machine Learning classification algorithms and resampling methods. These techniques were applied to financial data, more specifically to Bank Marketing data, in order to predict the tendency of clients to subscribe to a bank term deposit. Imbalance in the data set greatly affects the results, and consequently the prediction becomes inaccurate. Researchers are working on this issue, and many investigators are using different methods. This research paper uses some sampling techniques together with several conventional Machine Learning algorithms to improve the prediction precision.
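The abstract does not name the resampling techniques it applies. As a hedged illustration of the general idea, the sketch below shows random oversampling, one common way to balance classes, which may or may not be among the methods used in the paper. The function name `random_oversample` and its parameters are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    counts = Counter(labels)
    target = max(counts.values())

    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        # All original rows that belong to this class.
        pool = [row for row, y in zip(rows, labels) if y == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


if __name__ == "__main__":
    # Toy imbalanced data: 8 "no" rows and 2 "yes" rows.
    rows = [[i] for i in range(10)]
    labels = ["no"] * 8 + ["yes"] * 2
    _, balanced_labels = random_oversample(rows, labels)
    print(Counter(balanced_labels))
```

Resampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.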
Brain Cancer Detection Using MRI Scans
(North Dakota State University, 2020) Thotapally, Shanthanreddy
An estimated 700,000 Americans live with a brain tumor today. Of these tumors, 70% are benign and 30% are malignant. The average survival rate of all malignant brain tumor patients is 35%. Diagnosing these tumors early gives the best chance of survival. Doctors use MRI scans to identify the presence of a tumor and its characteristics, such as type and size. In this paper, I implemented a deep learning convolutional neural network model that classifies brain tumors using MRI scans. The VGG-16 deep learning architecture is used to implement the model. The proposed system can be divided into three parts: data input and preprocessing, building the VGG-16 model, and image classification using the built model. Using this approach, I achieved 80% accuracy. The accuracy of the model depends on how correctly images showing a brain tumor can be distinguished from those that do not.
Image Classification Using Transfer Learning and Convolution Neural Networks
(North Dakota State University, 2020) Burugupalli, Mohan
In recent years, deep learning has been shown to have a formidable impact on image classification and has bolstered the advances in machine learning research. Image recognition is expected to bring big changes to the Information Technology domain. This paper aims to classify medical images by leveraging the advantages of Transfer Learning over conventional methods. Three approaches are used: a pre-trained CNN as a Feature Extractor, a Feature Extractor with Image Augmentation, and Fine-tuning with Image Augmentation. Several of the best pre-trained network architectures, such as VGG16, VGG19, ResNet50, Inception, Xception and DenseNet, are used for classification, with each being applied to all three approaches. The results are captured to find the combination of pre-trained network and approach that classifies the medical datasets with the highest accuracy.
Feature Engineering on the Cybersecurity Dataset for Deployment on Software Defined Network
(North Dakota State University, 2020) Rifat, Nafiz Imtiaz
As the world grows more dependent on modern technology, the increasing use of smart devices and the internet affects network traffic. Many intrusion detection studies concentrate on feature selection or reduction because some features are not correlated with the target variable and some are redundant, which results in a tedious detection process and decreases the performance of an intrusion detection system (IDS). Our purpose is not to use all the available features but to take only the essential ones, so that the process can be effective and efficient. In this paper, we apply feature reduction algorithms to the NSL-KDD dataset to reduce the dimension of the feature set, choosing different combinations of features based on importance, similarity, and correlation. These combinations are used as input to five classification algorithms, which are evaluated to determine the best-performing model to deploy on a Software Defined Network (SDN).
Naïve Bayes Classifier: A MapReduce Approach
(North Dakota State University, 2014) Zheng, Songtao
Machine learning algorithms have the advantage of making use of the powerful Hadoop distributed computing platform and the MapReduce programming model to process data in parallel. Many machine learning algorithms have been investigated for adaptation to the MapReduce paradigm in order to make use of the Hadoop Distributed File System (HDFS). The Naïve Bayes classifier is one of the supervised learning classification algorithms that can be programmed in the form of MapReduce. In our study, we build a Naïve Bayes MapReduce model and evaluate the classifier on five datasets based on prediction accuracy. A scalability analysis is also conducted to measure the speedup of the data processing time as the number of nodes in the cluster increases. Results show that running the Naïve Bayes MapReduce model across multiple nodes can save a considerable amount of time compared with running the model on a single node, without sacrificing classification accuracy.
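To make the idea of Naïve Bayes in MapReduce form concrete, the sketch below simulates the two phases in plain Python on categorical toy data. It is an illustration under stated assumptions, not the thesis implementation and not Hadoop code: the names `map_record`, `reduce_counts`, `train`, and `predict`, the key layout, and the Laplace smoothing parameter `n_values` are all hypothetical.

```python
import math
from collections import defaultdict
from itertools import chain


def map_record(record):
    """Map step: emit (key, 1) pairs for one labelled record."""
    features, label = record
    yield ("class", label), 1
    for index, value in enumerate(features):
        yield ("feature", label, index, value), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def train(records):
    """Run the map step over every record, then reduce the output."""
    return reduce_counts(chain.from_iterable(map_record(r) for r in records))


def predict(counts, features, labels, n_values=2):
    """Pick the label with the highest log-posterior.

    n_values is the assumed number of distinct values per feature,
    used here for Laplace (add-one) smoothing.
    """
    total = sum(counts[("class", label)] for label in labels)
    best_label, best_score = None, -math.inf
    for label in labels:
        class_count = counts[("class", label)]
        score = math.log(class_count / total)
        for index, value in enumerate(features):
            hits = counts.get(("feature", label, index, value), 0)
            score += math.log((hits + 1) / (class_count + n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    toy = [
        (("sunny", "hot"), "no"),
        (("sunny", "mild"), "no"),
        (("rainy", "mild"), "yes"),
        (("rainy", "hot"), "yes"),
    ]
    model = train(toy)
    print(predict(model, ("rainy", "mild"), ["yes", "no"]))
```

Training reduces to counting, and counts for different records can be summed in any order, which is what lets the map step run independently on each data split across nodes.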
