NDSU Repository

Search Results

Now showing 1 - 10 of 28

Stock Price Prediction Using Recurrent Neural Networks
(North Dakota State University, 2018) Jahan, Israt
The stock market is generally very unpredictable. Many factors may determine the price of a particular stock, such as the market trend, supply and demand ratio, global economy, public sentiment, sensitive financial information, earnings declarations, historical prices, and many more. These factors explain the challenge of accurate prediction. However, with the help of new technologies like data mining and machine learning, we can analyze big data and develop an accurate prediction model that avoids some human errors. In this work, the closing prices of specific stocks are predicted from sample data using a supervised machine learning algorithm. In particular, a Recurrent Neural Network (RNN) algorithm is used on time-series data of the stocks. The predicted closing prices are cross-checked against the true closing prices. Finally, it is suggested that this model can be used to make predictions of other volatile financial instruments.
Credit Card Fraud Detection Predictive Modeling
(North Dakota State University, 2020) Sharma, Nishant
Financial fraud is a growing problem with consequences for the financial industry, and data mining has been successfully applied to huge volumes of complex financial data to automate the analysis of credit card fraud in online transactions. Data mining is a challenging process for two major reasons: first, the profiles of normal and fraudulent behavior change frequently, and second, card fraud data sets are highly skewed. This paper investigates the performance of the Random Forest, AdaBoost, XGBoost, and LightGBM classifiers on highly skewed credit card fraud data. The dataset of credit card transactions is sourced from European cardholders and contains 284,786 transactions. These techniques are applied to the raw and preprocessed data. Their performance is evaluated based on accuracy, sensitivity, specificity, and precision. The results indicate that the optimal accuracies for the Random Forest, AdaBoost, XGBoost, and LightGBM classifiers are 85%, 83%, 97.4%, and 93%, respectively.
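The four evaluation measures named in this abstract can be sketched from a confusion matrix. The snippet below is only an illustration, not the thesis's code: the function name `confusion_metrics` and the 990/10 sample are hypothetical, chosen to show why accuracy alone can mislead on highly skewed fraud data.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and precision for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed

    def ratio(num, den):
        # Return 0.0 when a measure is undefined (empty denominator).
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Hypothetical skewed sample: 990 legitimate and 10 fraudulent transactions.
y_true = [0] * 990 + [1] * 10
# A trivial "classifier" that labels every transaction as legitimate.
always_legit = [0] * 1000
baseline = confusion_metrics(y_true, always_legit)
print(baseline)
```

On this made-up sample the trivial baseline scores high accuracy while detecting no fraud at all, which is why sensitivity and precision are reported alongside accuracy.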
Mining Connected Frequent Boolean Expressions
(North Dakota State University, 2017) Kolte, Deepak
In this paper, we find Connected Frequent Boolean Expressions in a cancer dataset [14] and a protein-protein interaction network [14] to discover groups of dysregulated genes. Frequent Itemset Mining is the process of finding sets of items that frequently occur together in a set of transactions. These itemsets are called Frequent Itemsets (FBE). Connected FBE (CFBE) are groups of items that not only qualify as FBE but are also connected in a graph/network. The nodes in this graph are the items, and the edges between them indicate relationships. This can be particularly helpful in cases where the items are not independent of each other and the presence of one item together with another specific item can decide whether the group of items will be frequent or not.
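The two conditions described in this abstract, frequency in the transactions and connectivity in the network, can be illustrated with a brute-force sketch. This is not the thesis's algorithm: it handles plain itemsets (conjunctions) only, and the function names, the `max_size` cap, and the toy data are assumptions made for illustration.

```python
from itertools import combinations


def is_connected(items, edges):
    """True if the given items form one connected component using only edges among them."""
    items = set(items)
    adjacency = {item: set() for item in items}
    for a, b in edges:
        if a in items and b in items:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(items))
    seen, stack = {start}, [start]
    while stack:  # depth-first traversal of the induced subgraph
        node = stack.pop()
        for neighbor in adjacency[node] - seen:
            seen.add(neighbor)
            stack.append(neighbor)
    return seen == items


def connected_frequent_itemsets(transactions, edges, min_support, max_size=3):
    """Enumerate itemsets that are both frequent and connected (exhaustive search)."""
    universe = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(universe, size):
            # Support = number of transactions containing every item in the candidate.
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support and is_connected(candidate, edges):
                result[candidate] = support
    return result


# Toy data: items A, B, C; the network links A-B and B-C but not A-C.
transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}]
edges = [("A", "B"), ("B", "C")]
result = connected_frequent_itemsets(transactions, edges, min_support=2)
print(result)
```

In the toy data, the pair A and C occurs in two transactions and so is frequent, but it is excluded because no edge joins A and C in the network.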
An Interactive Visualization System for a Multi-Level View of Opinion Mining Results
(North Dakota State University, 2014) Chitkara, Karan
Products are purchased and sold on a daily basis, and people tend to critique the products they purchase. Those who want to buy a product will read reviews of that product written by others before buying; likewise, those who have already bought a product will write a review of it. This paper presents a technique for visualizing data that comes from online reviews of different products. My contribution to this project is to create a tool and process the tagged files generated with the help of machine learning. This project also focuses on the implementation of semantic matching, which reduces redundancy by grouping similar data together. Semantic matching helps put all the synonyms of the data together. The implementation of semantic matching is supported by an error correction technique. Error correction improves data quality by correcting spelling mistakes made by people while writing reviews.
Perceptions of Genetically Modified Foods by Gender
(North Dakota State University, 2016) Lu, Yang
Twitter is one of the most popular social networking services worldwide, with more than 320 million monthly active users. It is therefore a very good way to discover what’s happening in the world, and we can even learn people’s opinions on topics through their posts. Genetically modified food is one of the hottest topics all over the world. In this paper, our aim is to determine people’s opinions concerning GMOs, and we are also interested in whether there are differences by gender. To achieve this goal, the idea is to capture a large set of Twitter feeds that all include a reference to GMOs, then carry out analytics on the tweets to classify them by gender, and then carry out statistical tests aimed at identifying differences in perceptions by gender.
A Comparative Study on Different Big Data Tools
(North Dakota State University, 2020) Ibtisum, Sifat
Big data has long been a topic of fascination for computer science enthusiasts around the world, and it has gained even more prominence in recent times with the continuous explosion of data from social media and the quest of tech giants for deeper analysis. This paper discusses various tools in big data technology and compares them. The tools covered include Sqoop, Apache Flume, Apache Kafka, Hive, Spark, and many more. Various datasets are used for the experiment, and a comparative study is made to determine which tool works faster and more efficiently than the others and to explain the reasons behind this.
Disease Similarity Using Biological Module Dysregulation Profile
(North Dakota State University, 2016) Zaman, Eshita
Diseases can be grouped according to phenotypic and genotypic similarities. Gene expression and micro-RNA data have paved the way to look inside the genetic code and classify diseases accurately. Modern systems biology seeks to understand the underlying protein complexes in a cell and how they are altered in disease conditions. In this research, we aimed to mine cohesive biological modules from a large micro-RNA dataset and show that the genes in these modules are dysregulated in a number of diseases. We used 13 different types of cancer and the DME algorithm to extract dense modules satisfying a user-defined density. Binary attribute profiles of genes are also provided. We have shown that disease similarity based on the average module dysregulation yields disease pairs that share common disease genes. Collectively, we have concluded that the recurrence of these modules in different cancer types increases the therapeutic opportunity to treat more diseases with existing drugs.
Critical Information Retrieval from Emails
(North Dakota State University, 2014) Sen, Souvik
With efficiency being a driving force in today’s ecommerce, emails have become a major form of communication. However, deciphering information from these emails has been a belabored task. With every email containing proprietary information, handling this information has become an arduous task that requires money, time, and effort. To tackle this ecommerce problem, a computerized solution is important to expedite the extraction of information. In this paper, we have applied Named Entity Recognition along with different rules and algorithms to extract important information from emails. The proposed solution tackles challenges revolving around tabular and natural language formats, which are the most widely used formats for email communication. This solution makes it easier for businesses to navigate a variety of attachments: PDF, Word, and Excel. The proposed application has been applied to a dataset supplied by a transportation company, from which the results have been captured.
Semantics-Based Calorie Calculator
(North Dakota State University, 2017) Narra, Sravan Raghu Kumar
In recent years, people have been adopting healthier diet habits, and many of them try to track and maintain their daily diet and consumption. To assist them, many applications are available online that can record calories for the ingredients consumed, but users must check individual calories and calculate total calories manually. In this paper, we propose a new technique to calculate calories for a given recipe in multiple formats. The new technique uses tokenization, hashing techniques, and fuzzy matching for entity extraction, and finally performs unit conversion to calculate calories. We compared the results of the proposed technique with the outcomes of existing applications. These results showed that the new technique can produce results similar to those of the existing applications and is able to calculate calories for recipes in the different formats available on the internet.
Prediction Accuracy of Financial Data - Applying Several Resampling Techniques
(North Dakota State University, 2020) Ali, Mohammad Reza
With the help of data mining and machine learning, prediction has become a very popular and in-demand instrument for planning and accomplishing future goals. The financial sector is one of the crucial sectors of present human society, and predicting the correct outcome is a pivotal matter in this sector. In this work, prediction efficiency was assessed by applying several machine learning classification algorithms and resampling methods. These techniques were applied to financial data, more specifically to Bank Marketing data, in order to predict the tendency of clients to subscribe to a bank term deposit. Imbalance in the data set greatly affects the results, and consequently the prediction becomes inaccurate. Researchers are working on this issue, and many investigators are using different methods. This research paper uses some sampling techniques together with several conventional machine learning algorithms to improve the prediction precision.
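The abstract does not name the resampling methods used. As one common example of the idea, random oversampling duplicates minority-class rows until the classes are balanced. The sketch below is an assumption-laden illustration: `random_oversample` and the 8-to-2 toy labels are hypothetical and not taken from the thesis.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class matches the largest one."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [row for row, lab in zip(rows, labels) if lab == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(label)
    return out_rows, out_labels


# Toy imbalanced data: 8 clients who did not subscribe, 2 who did.
rows = list(range(10))
labels = ["no"] * 8 + ["yes"] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Resampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class distribution.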
