Hi-C is currently the most widely used assay to investigate the 3D organization of the genome and to study its role in DNA replication, gene expression, and disease. In this topic, we will discuss the basic steps of the Hi-C protocol and some interaction patterns of Hi-C maps.
Why is chromatin organization important?
Eukaryotic DNA is intricately packed in the nucleus. The DNA molecule is never naked: it is always associated with many proteins. The complex of DNA and proteins is called chromatin. You can recall how chromatin is packed into the nucleus in this video.
Proper DNA folding ensures flexible connections between various chromatin components. Although the DNA molecule must be tightly packed, many genome regions remain highly accessible to factors regulating transcription, replication, and other processes. In addition, gene expression may be tuned by physical contact between two or more interacting genomic loci. Contact participants are usually close to each other in 3D space, but may be separated by many nucleotides in the linear genome or even located on distinct chromosomes. Discovering chromatin organization in mammals and other species is an interesting task on its own. More importantly, chromatin organization provides insight into mechanisms of gene regulation.
Chromosome conformation capture
Chromosome conformation capture (3C) techniques are a set of molecular biology methods used to study chromatin structure. 3C and 3C-based experiments use the same strategy to reveal the spatial organization of chromatin, that is, the ligation of DNA pieces that are physically close to each other in the nucleus. Let's look at the principles of the proximity ligation assay:
1. First, cells are fixed with chemical reagents, most often formaldehyde. The purpose of this step is to "freeze" cells in a particular state; otherwise, dynamic changes in chromatin would inevitably occur. Formaldehyde readily permeates cell membranes and cross-links macromolecules, creating protein-protein and DNA-protein covalent bonds.
2. The fixed chromatin is then cut with a restriction enzyme. The resulting sticky ends, or 5'-overhangs, are filled in with biotin-labeled nucleotides.
3. The resulting blunt-end fragments are ligated. At this step, we expect ligation to occur preferentially between DNA fragments that are close in 3D space.
4. Biotin pull-down is performed to select only the biotinylated junctions. Then, interactions are detected by polymerase chain reaction (PCR) or by next-generation sequencing (NGS), depending on the method.
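The cutting step can be mimicked in silico. The following is a minimal sketch, assuming the enzyme is HindIII (recognition site AAGCTT, cut after the first A); the protocol above does not name a specific enzyme, and the function name `digest` and the toy sequence are hypothetical. It only illustrates how restriction sites define the fragments that are later ligated.

```python
def digest(sequence, site="AAGCTT", cut_offset=1):
    """Return (start, end) coordinates of restriction fragments.

    `site` and `cut_offset` are assumptions: HindIII cuts A^AGCTT,
    that is, one base into its 6-bp recognition site.
    """
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    # Fragment boundaries: sequence start, every cut position, sequence end.
    bounds = [0] + cuts + [len(sequence)]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


toy_sequence = "GGGAAGCTTCCCCCAAGCTTTT"  # hypothetical sequence with two sites
fragments = digest(toy_sequence)
print(fragments)
```

Each fragment is reported as a half-open interval, so consecutive fragments share a boundary at the cut position.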
There are several "generations" of 3C methods that differ in scale. In 2002, Dekker et al. developed chromosome conformation capture (3C), a technology that allowed the analysis of spatial relationships between two pre-selected genomic sites. This approach is also called "one-versus-one" as it is limited to two known DNA sequences and requires specific primers to amplify the regions of interest. To overcome these limitations, several 3C-derived methods have been developed that generate chromatin interaction data at higher throughput. Among variants of the 3C technique, Hi-C was the first method to capture chromosome conformation on an "all-versus-all" basis, that is, it can profile contacts between virtually any pair of genomic loci. This technique combines the 3C procedure with next-generation sequencing, which gives a holistic view of the 3D genome structure in one assay.
Hi-C data analysis
Hi-C interactions are captured as chimeric ligation products, each formed of two distinct genomic fragments joined in the middle. Ligation products are traditionally sequenced using paired-end technology to cover both parts of the chimeric molecule. In this scenario, the forward read represents the sequence from the first DNA site, while the reverse read represents the sequence from the second DNA site.
Our next goal is to find a unique alignment for each read on the genome of interest. Note that the forward and reverse reads from one pair may align to genomic regions that are many nucleotides away from each other. After read mapping, we get the genomic coordinates of each read on the reference genome.
Read pairs can be represented in the form of a contact map, as shown in the picture below. In typical contact maps, chromosomes or their parts are plotted against one another. Each dot represents one ligation event or one contact of two DNA loci.
In theory, we can analyze raw contacts and count interaction frequencies between the exact positions to which reads were mapped. However, the space of all possible pairwise interactions surveyed in Hi-C is very large. As a result, Hi-C data are often very sparse: many genomic loci have very few or zero contacts. Data sparsity is problematic for many statistical models.
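To get a feeling for how large this space is, here is a small back-of-the-envelope calculation. The genome size (about 3 Gb) and the sequencing depth (one billion read pairs) are assumptions chosen for illustration, not values from a particular experiment.

```python
def n_pairs(n_loci):
    """Number of unordered locus pairs, counting a locus paired with itself."""
    return n_loci * (n_loci + 1) // 2


genome_size = 3_000_000_000   # assumed genome length in base pairs
n_read_pairs = 1_000_000_000  # assumed number of sequenced read pairs

possible = n_pairs(genome_size)
# Upper bound: every read pair would have to hit a different pair of positions.
max_fraction_observed = n_read_pairs / possible

print(f"possible position pairs: {possible:.2e}")
print(f"largest fraction that can be observed: {max_fraction_observed:.1e}")
```

Under these assumptions, the number of possible position pairs is on the order of 10^18, so even a deep experiment can observe only a vanishing fraction of them.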
To overcome this issue, the reference genome is usually partitioned into genomic intervals (bins). Each read is assigned to a genomic bin. Then, the number of contacts (interaction frequency) between two genomic bins is calculated as shown in the picture above. Genome binning gives us more counts per matrix cell in sparse data, but at the expense of resolution: a large bin size does not allow for analyzing interactions on a fine scale.
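Binning can be sketched in a few lines. The example below assumes a single toy chromosome and fixed-size bins; the function name `bin_contacts` and the coordinates are hypothetical, and real tools work on whole genomes with far more efficient data structures.

```python
def bin_contacts(read_pairs, chrom_length, bin_size):
    """Count contacts between fixed-size bins of one chromosome.

    read_pairs: iterable of (pos1, pos2), 0-based coordinates of the two
    mapped reads of each pair. Returns a symmetric list-of-lists matrix.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in read_pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror the count to keep the matrix symmetric
    return matrix


# Hypothetical mapped read pairs on a 3,000-bp toy chromosome.
read_pairs = [(120, 180), (150, 2_300), (900, 2_950), (2_100, 2_700), (40, 1_500)]
matrix = bin_contacts(read_pairs, chrom_length=3_000, bin_size=1_000)
for row in matrix:
    print(row)
```

With a larger `bin_size`, the same five read pairs fall into fewer cells, which raises the counts per cell but hides where exactly within a bin the contacts occurred.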
Common Hi-C map patterns
Contacts can be visualized as maps, where a more intense color indicates a higher contact frequency between two genomic regions. Let's take a look at a real Hi-C map of Bellardia pandia (blowfly). Here we can see more intense brown boxes, which mark frequently interacting genomic sites. In contrast, white spaces indicate that very few interactions occur between two loci. You may notice that Hi-C map patterns are not random: the diagonal looks more intense and the whole map resembles a plaid pattern. Let's discuss why some of these patterns occur in many Hi-C maps.
Chromosome territories. When considering organisms with multiple chromosomes, we encounter two types of contacts. Loci within one chromosome form cis interactions. In contrast, loci from different chromosomes form trans interactions. It is now well established that contacts within chromosomes (cis-contacts) are much more frequent than contacts between chromosomes (trans-contacts). This is visible as more intense squares on the diagonal when chromosomes are plotted against one another. This observation agrees with the chromosome territory concept, according to which interphase chromosomes occupy distinct regions of a cell's nucleus.
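The balance between the two contact types can be summarized with a simple count. The sketch below uses a hypothetical list of mapped read pairs in which only the chromosome of each read is kept; the numbers are made up and do not reflect a real experiment.

```python
from collections import Counter

# Hypothetical read pairs: (chromosome of read 1, chromosome of read 2).
pair_chroms = [
    ("chr1", "chr1"), ("chr1", "chr1"), ("chr2", "chr2"),
    ("chr1", "chr2"), ("chr2", "chr2"), ("chr3", "chr3"),
    ("chr2", "chr3"), ("chr1", "chr1"),
]


def cis_trans_counts(pairs):
    """Count pairs within one chromosome (cis) and between chromosomes (trans)."""
    counts = Counter("cis" if a == b else "trans" for a, b in pairs)
    return counts["cis"], counts["trans"]


cis, trans = cis_trans_counts(pair_chroms)
cis_fraction = cis / (cis + trans)
print(f"cis: {cis}, trans: {trans}, cis fraction: {cis_fraction:.2f}")
```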
A/B compartments. Although distinct chromosomes tend to occupy their own limited space, neighboring chromosomes can "invade" each other's territories. Chromosomal regions cluster to form separate compartments within the nucleus: euchromatin (active, A) and heterochromatin (inactive, B). Regions in compartment A interact preferentially with other regions in compartment A rather than with regions in compartment B, and vice versa. A and B compartments are visible in Hi-C matrices as a characteristic "checkerboard" or "plaid" pattern, as you can see in the following schematic picture.
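One common way to recover the two groups from a checkerboard pattern is to correlate the contact profiles of all bins and take the leading eigenvector of the correlation matrix. The sketch below applies this idea to an idealized toy matrix using NumPy. The labels and contact values are hypothetical, and the distance normalization that real analyses perform before this step is skipped as a simplifying assumption.

```python
import numpy as np

# Toy contact map: 8 bins with hypothetical compartment labels.
true_labels = ["A", "A", "B", "B", "A", "A", "B", "B"]
n = len(true_labels)
contacts = np.array([[10.0 if true_labels[i] == true_labels[j] else 2.0
                      for j in range(n)] for i in range(n)])

# Correlate the contact profiles (rows) of all bins with each other.
correlation = np.corrcoef(contacts)

# eigh returns eigenvalues in ascending order, so the last column is the
# eigenvector that belongs to the largest eigenvalue.
eigenvalues, eigenvectors = np.linalg.eigh(correlation)
first_component = eigenvectors[:, -1]

# The sign of an eigenvector is arbitrary; here it is fixed so that bin 0
# is positive. Real analyses orient it with extra data such as gene density.
if first_component[0] < 0:
    first_component = -first_component
groups = ["+" if value > 0 else "-" for value in first_component]
print(groups)
```

The sign of the component splits the bins into two groups that match the hypothetical labels; which group is called A and which is called B cannot be decided from the contact map alone.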
Local patterns. So far, we have discussed global Hi-C map patterns. Let's now look in detail at intrachromosomal contacts. When analyzing local patterns within a single chromosome, we observe dense and sparse contact regions. Some dense regions correspond to structures known as topologically associating domains (TADs). These are regions characterized by a high frequency of contacts within the domain and reduced contacts with regions outside it. On an even finer scale, Hi-C data has been used to identify specific point contacts between distant chromatin regions. Cis point contacts are called chromatin loops, and they appear as intense point-to-point interactions in Hi-C maps. This level of analysis is especially challenging because it pushes the resolution limit of Hi-C data.
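The idea of reduced contacts across a domain border can be turned into a simple score. The sketch below is a simplified, insulation-style calculation, not a specific published algorithm: for every position between two neighboring bins, it sums the contacts that link a few bins on the left with a few bins on the right. The function name, the window size, and the toy matrix are all hypothetical.

```python
def cross_boundary_contacts(matrix, window):
    """For each position i (the border between bins i-1 and i), sum contacts
    linking the `window` bins on the left with the `window` bins on the right."""
    n = len(matrix)
    scores = {}
    for i in range(window, n - window + 1):
        left = range(i - window, i)
        right = range(i, i + window)
        scores[i] = sum(matrix[a][b] for a in left for b in right)
    return scores


# Toy map with two domains of 4 bins each (bins 0-3 and bins 4-7):
# 10 contacts inside a domain, 1 contact between domains.
n_bins, domain_size = 8, 4
toy_map = [[10 if a // domain_size == b // domain_size else 1
            for b in range(n_bins)] for a in range(n_bins)]

scores = cross_boundary_contacts(toy_map, window=2)
boundary = min(scores, key=scores.get)
print(scores)
print("lowest score at the border before bin", boundary)
```

The score drops to its minimum at the border between the two toy domains, which is how a candidate TAD boundary would show up in this kind of profile.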
Hi-C applications
The study of the spatial structure of chromosomes in the nucleus helps researchers understand many biological processes such as gene transcription, replication, repair, and regulation. For example, two ends of chromatin loops may represent regulatory elements (e.g. enhancer and promoter) that control various genes. Another observation is that changes in A/B compartments occur in diseased cell lines and affect transcription.
Hi-C can reveal changes in chromatin conformation during dynamic processes, such as progression through the cell cycle. Moreover, Hi-C is useful for describing chromatin architecture in different cancers and its impact on disease pathogenesis. For instance, modification of TAD boundaries is associated with disease, including cancer. Hi-C can also catch chromosome translocations, which are considered the primary cause of many cancers. These chromosome abnormalities can be detected as abnormal Hi-C map patterns.
In addition, 3D genome structures are used to track evolutionary changes. For example, we can describe whether TADs and other structures are conserved across species. Perhaps surprisingly, Hi-C data is also used in genome assembly. Hi-C provides interaction data over very large genomic distances, which is useful for ordering sets of contigs (for details, see the Scaffolding topic).
Conclusion
Hi-C is a technique that allows profiling interactions between all pairs of loci in an entire genome. The most basic way to represent Hi-C data is as a matrix, visualized as a contact map. To construct the matrix, the number of interactions between pairs of genomic bins is counted. At different levels of mammalian Hi-C maps, well-defined structures can be traced: chromosome territories, A/B compartments, TADs, and chromatin loops. These features of chromatin organization may shed light on the regulation of gene expression or reflect some pathological conditions.