Computational Methods for Bulk and Single-cell Chromatin Interaction Data
Abstract
Chromatin interactions occur when regions of chromatin that lie in close physical proximity interact with each other inside the nucleus. Analyzing these interactions is crucial to deciphering the spatial organization of the genome. Identifying significant interactions and their functions yields insight into gene expression, gene regulation, and genetic diseases such as cancer. In addition, single-cell chromatin interaction data are important for understanding changes in chromatin structure, variability among individual cells, and genomic differences between cell types. In recent years, Hi-C, chromosome conformation capture with high-throughput sequencing, has gained widespread popularity for its ability to map genome-wide chromatin interactions in a single experiment, and it can produce both single-cell and bulk chromatin interaction data.
As experimental methods like Hi-C evolve, computational tools are essential for processing the vast amount of genomic data efficiently and accurately. Because these experiments are costly, optimized computational tools and methods are needed to extract as much information as possible from the data. Moreover, processing single-cell Hi-C data poses a number of challenges due to its sparseness and limited interaction counts. This work therefore focuses on developing computational methods and tools to process data from both single-cell and bulk Hi-C technologies, and these methods are shown to improve the efficiency and accuracy of Hi-C data processing pipelines.
In this dissertation, each chapter presents a single method or tool for enhancing chromatin interaction processing pipelines, and the final chapter focuses on the interplay between epigenetic data and chromatin interaction data. The studies focused on building computational methods address three problems: increasing read accuracy for bulk Hi-C data, identifying statistically significant interactions in single-cell Hi-C data, and imputing single-cell Hi-C data to improve the quality and quantity of raw reads. The tools and methods outlined in these studies are expected to significantly enhance the workflows of future research on chromatin organization and its relationship to cellular functions and genetic diseases.