Exploring the non-coding genome at single-cell resolution

A substantial part of the human genome consists of non-coding sequences, previously termed ‘junk DNA’. This view has lost ground in recent years as our understanding of the genome has improved. Of the 6.2 billion base pairs in the diploid genome, less than 2 % code for proteins, but the remaining sequence may still have important functions in the cell. These functions include, but are not limited to, regulatory regions that control local or regionally adjacent genes, non-coding RNAs and 3D genome folding. The activity of the non-coding genome is dynamic: it varies throughout human development and is frequently disrupted in cancer and neurodegenerative diseases.


Recent technological advances in next-generation sequencing allow the analysis of RNA expression at single-cell resolution. Unlike bulk RNA sequencing, which reports the average expression across the many cells in a sample, single-cell RNA sequencing (scRNA-seq) can be used to identify cell types, reconstruct differentiation trajectories and infer expression markers specific to cell populations.
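The difference between a bulk average and a per-cell view can be illustrated with a minimal Python sketch. The gene names and counts below are invented toy values, not data from the project, and labelling each cell by its most highly expressed gene is only a stand-in for the clustering that real scRNA-seq workflows perform.

```python
from statistics import mean

# Toy expression counts (invented for illustration): one row per cell,
# one column per gene.
genes = ["GENE_A", "GENE_B"]
cells = [
    [10, 0],   # cells 0-2 mostly express GENE_A
    [12, 1],
    [11, 0],
    [0, 9],    # cells 3-5 mostly express GENE_B
    [1, 10],
    [0, 11],
]

# Bulk-like view: a single average per gene across all cells.
bulk_profile = [mean(cell[g] for cell in cells) for g in range(len(genes))]

# Single-cell view: label each cell by its most highly expressed gene.
labels = [
    genes[max(range(len(genes)), key=lambda g: cell[g])]
    for cell in cells
]

print("bulk average per gene:", dict(zip(genes, bulk_profile)))
print("per-cell labels:", labels)
```

In the averaged profile both genes appear moderately expressed, whereas the per-cell labels separate the sample into two distinct populations.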

Open regions of DNA are associated with transcriptional activity. The assay for transposase-accessible chromatin using sequencing (ATAC-seq) maps open regions in the genome based on the ability of the Tn5 transposase to access and cleave DNA in nucleosome-depleted regions, and it can be applied at single-cell resolution (scATAC-seq; Buenrostro et al., 2015; Cusanovich et al., 2015). We use software packages to analyze in-house scATAC-seq data and integrate them with scRNA-seq data using Seurat (Satija et al., 2015; Stuart et al., 2019) to gain insight into the regulation of non-coding sequences and its significance for cellular heterogeneity.
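A common intermediate in scATAC-seq analysis is a cell-by-peak count matrix. The sketch below builds one in plain Python under simplifying assumptions: the peak coordinates, fragments and cell barcodes are hypothetical, intervals are treated as half-open, and a fragment is counted whenever it overlaps a peak. Dedicated packages handle this step at scale and may count Tn5 insertion sites rather than whole fragments, so this is an illustration of the idea, not the pipeline used in the project.

```python
from collections import defaultdict

# Hypothetical accessible regions (peaks): (chromosome, start, end), half-open.
peaks = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 50, 120)]

# Hypothetical sequenced fragments: (chromosome, start, end, cell barcode).
fragments = [
    ("chr1", 90, 150, "cell_A"),
    ("chr1", 510, 560, "cell_A"),
    ("chr1", 120, 180, "cell_B"),
    ("chr2", 60, 100, "cell_B"),
    ("chr2", 300, 350, "cell_B"),  # overlaps no peak, so it is not counted
]


def overlaps(start_a, end_a, start_b, end_b):
    """Return True if two half-open intervals share at least one base."""
    return start_a < end_b and start_b < end_a


def count_fragments_in_peaks(fragments, peaks):
    """Count, for every cell barcode, the fragments overlapping each peak."""
    counts = defaultdict(lambda: [0] * len(peaks))
    for chrom, start, end, barcode in fragments:
        for i, (p_chrom, p_start, p_end) in enumerate(peaks):
            if chrom == p_chrom and overlaps(start, end, p_start, p_end):
                counts[barcode][i] += 1
    return dict(counts)


matrix = count_fragments_in_peaks(fragments, peaks)
for barcode, row in sorted(matrix.items()):
    print(barcode, row)
```

Each row of the resulting matrix is an accessibility profile for one cell, which is the kind of representation that can then be compared with scRNA-seq expression profiles.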

 

Our group has produced several single-cell sequencing data sets from embryonic stem cells and cancer cells. The master's student will be introduced to single-cell sequencing analysis using these data. Additionally, the student will receive the theoretical background required to understand the biological aspects of the data sets. Once the necessary methods for single-cell sequencing analysis have been acquired, the student will develop the skills to analyze the non-coding parts of the genome.

Methods:

This project in bioinformatics will expose the student to a variety of state-of-the-art sequencing analysis methods and computational skills, including:

- RNA-seq and ATAC-seq analysis

- High performance computing

- Software development

- Advanced scripting/programming in Python, R and Bash

 

We are looking for a student with programming skills and interest in bioinformatics.

Our lab has recently developed a GUI-based tool for analysis of single-cell RNA-seq and single-cell ATAC-seq data (ShinyArchR.UiO, Bioinformatics DOI: 10.1093/bioinformatics/btab680, https://github.com/EskelandLab/ShinyArchRUiO). However, this tool focuses on protein-coding genes and is therefore not suited to the analysis of non-coding sequences. The aim of the master's project is to develop a new tool that integrates single-cell RNA-seq and single-cell ATAC-seq analysis and focuses on the non-coding genome. Such a tool will be of high value to a broad field of research on epigenetic regulation in disease and development.
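One building block such a tool could need is a way to separate accessible regions that overlap protein-coding sequence from those that do not. The following Python sketch shows the idea under stated assumptions: the exon annotation and peak coordinates are hypothetical, intervals are half-open, and the function name `overlaps_any` is introduced here for illustration only. It is not part of ShinyArchR.UiO or of the planned tool.

```python
# Hypothetical protein-coding exon annotation: (chromosome, start, end), half-open.
coding_exons = [("chr1", 150, 300), ("chr2", 1000, 1200)]

# Hypothetical accessible regions (peaks) called from scATAC-seq data.
peaks = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 50, 120)]


def overlaps_any(peak, intervals):
    """Return True if the peak overlaps any interval on the same chromosome."""
    chrom, start, end = peak
    return any(
        chrom == i_chrom and start < i_end and i_start < end
        for i_chrom, i_start, i_end in intervals
    )


# Keep only the peaks that do not touch annotated coding sequence.
non_coding_peaks = [p for p in peaks if not overlaps_any(p, coding_exons)]

print(non_coding_peaks)
```

A real implementation would read the annotation from a file and use an indexed interval structure instead of this linear scan, which is only practical for small inputs.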

 

References

Baek,S. and Lee,I. (2020) Single-cell ATAC sequencing analysis: From data preprocessing to hypothesis generation. Computational and Structural Biotechnology Journal, 18, 1429–1439.

Buenrostro,J.D. et al. (2015) Single-cell chromatin accessibility reveals principles of regulatory variation. Nature, 523, 486–490.

Cusanovich,D.A. et al. (2015) Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing. Science, 348, 910–914.

Satija,R. et al. (2015) Spatial reconstruction of single-cell gene expression data. Nature Biotechnology, 33, 495–502.

Stuart,T. et al. (2019) Comprehensive integration of single-cell data. Cell, 177, 1888–1902.e21.

Published Nov. 9, 2021 14:05 - Last modified Nov. 9, 2021 14:06

Supervisor(s)

Scope (credits)

60