Browsing by Author "Chitraranjan, Charith Devinda"


Frequent Substring-Based Sequence Classification Using Reduced Alphabets
(North Dakota State University, 2011) Chitraranjan, Charith Devinda
In recent years, various disciplines have generated large quantities of sequence data, which has necessitated automated techniques for classifying these sequences into different categories of interest. Especially with the rapid rate at which biological sequence data has been emerging out of high-throughput sequencing efforts, the need to interpret these large quantities of raw sequence data and gain deeper insights into them has become an essential part of modern biological research. Understanding the functions, localization and structure of newly identified protein sequences in particular has become a major challenge, one that calls for computational techniques to keep up with the pace. In this thesis, we evaluate frequent pattern-based algorithms for predicting the aforementioned attributes of proteins from their primary structure (amino acid sequence). We also apply our algorithms to datasets containing wheat Expressed Sequence Tags (ESTs) in an attempt to predict ESTs that are likely to be located near the centromere of their respective chromosomes. We use frequent substrings mined from the training sequences as features to train a classifier. Our evaluation includes SVM and association rule-based classifiers. Some amino acids have similar properties and may substitute for one another without altering the topology or function of a protein. Therefore, we use a combination of reduced amino acid alphabets in an attempt to capture patterns that may contain such substitutions. Frequent substrings mined from different alphabets are treated as features resulting from multiple sources, and we evaluate both feature fusion and classifier fusion approaches to multiple-source prediction. We compare the performance of the different approaches using protein sub-cellular location, protein function and EST chromosomal location datasets.
Pair-wise sequence-alignment-based Nearest Neighbor and basic SVM k-gram classifiers are also included as baseline algorithms in the comparison. Results show that frequent pattern-based SVM classifiers demonstrate better performance compared to other classifiers on the sub-cellular location datasets and they perform competitively with the nearest neighbor classifier on the protein function datasets. Our results also show that the use of reduced alphabets provides statistically significant performance improvements for the SVM-based classifier fusion algorithm, for half of the classes studied.
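The feature-construction idea in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the reduced alphabet below (the `GROUPS` mapping), the function names, and the length and support parameters are all assumptions chosen for illustration. The sketch maps amino acids to a smaller alphabet, collects substrings that occur in at least `min_support` training sequences, and turns a sequence into a binary feature vector that could be passed to a classifier such as an SVM.

```python
from collections import Counter

# Hypothetical reduced alphabet: amino acids grouped by rough physicochemical
# similarity. Illustrative only; not the grouping used in the thesis.
GROUPS = {
    "AVLIMC": "h",  # hydrophobic
    "FWYH": "a",    # aromatic
    "STNQ": "p",    # polar
    "KR": "b",      # basic
    "DE": "c",      # acidic
    "GP": "s",      # small / special
}
REDUCE = {aa: sym for group, sym in GROUPS.items() for aa in group}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet ('x' = unknown)."""
    return "".join(REDUCE.get(aa, "x") for aa in seq)


def frequent_substrings(seqs, min_support, min_len=2, max_len=4):
    """Return substrings found in at least `min_support` of the sequences."""
    counts = Counter()
    for s in seqs:
        # A set, so each substring is counted once per sequence.
        seen = {
            s[i:i + k]
            for k in range(min_len, max_len + 1)
            for i in range(len(s) - k + 1)
        }
        counts.update(seen)
    return sorted(sub for sub, c in counts.items() if c >= min_support)


def featurize(seq, substrings):
    """Binary presence vector of `seq` over the mined substrings."""
    return [1 if sub in seq else 0 for sub in substrings]
```

Under this assumed grouping, `reduce_sequence("LKD")` and `reduce_sequence("IRE")` give the same string, so a substring mined in the reduced alphabet can match both, which is the kind of substitution the abstract aims to capture. Feature fusion could then be approximated by concatenating the vectors obtained from the original and reduced alphabets.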
Tracking Vehicles from Mobile Phone Received Signal Strength Sequences
(North Dakota State University, 2015) Chitraranjan, Charith Devinda
We address the problem of tracking vehicles from received signal strength (RSS) sequences generated by mobile phones carried in them. Our main objectives are to provide travel-time estimates for selected roads and provide personal navigation assistance when GPS is unavailable or undesirable. A mobile phone periodically measures the RSS levels from the associated cell tower and several (six for GSM) strongest neighbor cell towers. Each such measurement is known as an RSS fingerprint. In Chapter 3, we propose local alignment of mobile phone RSS measurements to track vehicles. We use local alignment instead of the traditionally used global alignment to allow for vehicles changing roads. More specifically, we use local dynamic time warping to align the RSS sequence of a phone to a reference sequence that we had collected for the relevant road. Due to fluctuations in RSS levels and other effects, even at the same location, the set of cell towers reported in a fingerprint and their reported RSS levels vary over time. To model these variations, in Chapter 4.1, we propose a complete observation model for RSS fingerprints that specifies, for each grid location in the area of interest, the distribution of the probability of observing any fingerprint at that location. We then use it with a Dynamic Bayesian Network to track vehicles. Unlike traditional observation models, which model only the variation of the RSS levels, we model the variation of the set of cells reported in fingerprints as well. Accurate estimation of the parameters of either traditional or our complete observation model requires recording fingerprints by driving on the roads of interest, which is tedious and expensive. Therefore, to avoid such driving, we propose unsupervised learning in Chapter 5 to estimate model parameters using RSS sequences of phone calls made by road-users.
Experiments with RSS data collected on five roads demonstrate that our proposed algorithms produce lower errors than relevant existing methods. Furthermore, application of our algorithms to real subscriber call traces produced travel-time estimates for a given road segment that were, on average, within 13% - 14% of travel-times computed through license plate recognition.
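The alignment step described in this abstract can be sketched with a simplified subsequence form of dynamic time warping. This is an assumption-laden illustration, not the thesis's algorithm: each fingerprint is reduced to a single RSS value in dBm, the whole phone sequence is matched against any contiguous part of the reference (only the reference side is "local"), and the function name `subsequence_dtw` is hypothetical.

```python
def subsequence_dtw(query, reference):
    """Align all of `query` to the best-matching contiguous part of `reference`.

    Returns (cost, start, end), where start and end are inclusive indices
    into `reference`. Simplification: one scalar RSS value per time step.
    """
    n, m = len(query), len(reference)
    inf = float("inf")
    # cost[i][j]: cheapest alignment of query[0..i] that ends at reference[j].
    cost = [[inf] * m for _ in range(n)]
    # start[i][j]: reference index where that cheapest alignment began.
    start = [[0] * m for _ in range(n)]

    # The first query sample may match any reference position at no penalty
    # beyond its own distance; this is what makes the alignment local.
    for j in range(m):
        cost[0][j] = abs(query[0] - reference[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            candidates = [(cost[i - 1][j], start[i - 1][j])]
            if j > 0:
                candidates.append((cost[i][j - 1], start[i][j - 1]))
                candidates.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
            best_cost, best_start = min(candidates)
            cost[i][j] = best_cost + abs(query[i] - reference[j])
            start[i][j] = best_start

    # The alignment may also end anywhere along the reference.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end
```

For example, `subsequence_dtw([-70, -65, -60], [-90, -85, -70, -65, -60, -80])` locates the phone's samples at reference positions 2 through 4. If each reference position carried a known road location, the matched span would give a position estimate, and timestamps at two matched positions would give a travel time.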