A Data Mining Approach to Radiation Hybrid Mapping
Abstract
The task of mapping markers from Radiation Hybrid (RH) mapping experiments is typically viewed as equivalent to the traveling-salesman problem, which has combinatorial complexity. As an additional problem, experiments commonly result in some unreliable markers that reduce the overall map quality. Due to the large numbers of markers in current radiation hybrid populations, the use of the data mining techniques becomes increasingly important for reducing both the computational complexity and the impact of noise of the original data. In this dissertation, a clustering-based approach is proposed for addressing both the problem of filtering unreliable markers (framework maps) and the problem of mapping large numbers of markers (comprehensive maps) efficiently. Traditional approaches for eliminating unreliable markers use resampling of the full data set, which has an even higher computational complexity than the original mapping problem. In contrast, the proposed algorithms use a divide-and-conquer strategy to construct framework maps based on clusters that exclude unreliable markers. The clusters of markers are ordered using parallel processing and are then combined to form the complete map. Three algorithms are presented that explore the trade-off between the number of markers included in the framework map and placement accuracy. Since the mapping problem is susceptible to noise, it is often beneficial to remove markers that are not trustworthy. Traditional mapping techniques for building comprehensive maps process all markers together, including unreliable markers, in a single-iteration approach. The accuracy of the constructed maps may be reduced. In this research work, two-stage algorithms are proposed to mapping most markers by first creating a framework map of the reliable markers, and then incrementally adding the remaining markers to construct high quality comprehensive maps. All proposed algorithms have been evaluated on several human chromosomes using radiation hybrid datasets with varying sizes, and also the performance of our proposed algorithms is compared with state-of-the-art RH mapping softwares. Overall, the proposed algorithms are not only much faster than the comparative approaches, but that the quality of the resulting maps is also much higher.