In this topic, we introduce Illumina/Solexa, one of the most popular methods of nucleic acid sequencing. We will cover the basic principles of the technology and discuss its general protocol.
Basic principles of Illumina sequencing
Illumina technology is based on the principle of sequencing by synthesis. This means that the signal is detected while a new DNA strand, complementary to a single-stranded template, is being synthesized. Templates are usually short (400-600 bp) DNA molecules attached by specific adapters to a solid surface called a flow cell. Each of the four deoxynucleotide triphosphates (dNTPs), the building blocks of DNA, carries a different fluorophore label. Each dNTP is also modified to act as a reversible terminator: once it is incorporated into the growing strand, no further nucleotide can be added until the modification is removed. This lets the researcher record the signal of the incorporated dNTP after every single step of the reaction.
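The cycle described above can be sketched as a toy simulation. This is a simplified model, not Illumina's software: it assumes one dye color per base, the color names in `DYE` are hypothetical placeholders, and the template string is assumed to be written 3'->5' so that it can be read left to right. In each cycle, the base complementary to the next template position is added, its color is recorded, and the terminator is removed before the next cycle.

```python
# Watson-Crick pairing: the base incorporated opposite each template base
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical dye colors, one per base (illustrative only)
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}


def sequence_by_synthesis(template: str, cycles: int) -> list[str]:
    """Return the color recorded in each cycle while the complementary strand grows.

    Assumes the template string is written 3'->5', so it is read left to right.
    """
    colors = []
    for cycle in range(min(cycles, len(template))):
        incorporated = COMPLEMENT[template[cycle]]  # exactly one blocked base per cycle
        colors.append(DYE[incorporated])            # image the flow cell, record the color
        # Cleaving the dye and the terminator is implicit: the next cycle can extend again.
    return colors


def decode(colors: list[str]) -> str:
    """Map the recorded colors back to the synthesized bases."""
    color_to_base = {color: base for base, color in DYE.items()}
    return "".join(color_to_base[c] for c in colors)


colors = sequence_by_synthesis("TACGGA", cycles=6)
print(colors)          # ['green', 'red', 'yellow', 'blue', 'blue', 'red']
print(decode(colors))  # ATGCCT
```

Because the terminator allows only one incorporation per cycle, the list of colors has exactly one entry per cycle, which is what makes the signal interpretable base by base.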
Illumina read length is typically 50-150 bp. Illumina also supports paired-end sequencing, in which each fragment is read from both ends. As a result, you get a pair of reads (for example, 2 × 150 bp) that come from nearby positions in the original DNA and are usually separated by an unread stretch of the fragment.
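The relationship between a fragment and its two reads can be sketched in a few lines. This is a toy model that assumes error-free reads and a fragment given as its forward strand (5'->3'); the function names are hypothetical. Read 1 is taken from the start of the forward strand, and read 2 from the start of the reverse complement, that is, from the other end of the fragment.

```python
# Translation table for complementing DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment: str, read_length: int) -> tuple[str, str, int]:
    """Return (read 1, read 2, number of unread bases between the two reads)."""
    read1 = fragment[:read_length]                      # read from one end
    read2 = reverse_complement(fragment)[:read_length]  # read from the other end
    gap = max(0, len(fragment) - 2 * read_length)       # 0 means the reads meet or overlap
    return read1, read2, gap


fragment = "A" * 250 + "C" * 250  # a made-up 500 bp fragment
r1, r2, gap = paired_end_reads(fragment, read_length=150)
print(len(r1), len(r2), gap)  # 150 150 200
```

With a 500 bp fragment and 150 cycles per end, 200 bases in the middle remain unread; for a fragment shorter than 300 bp, the two reads would overlap and `gap` would be 0.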
Overall, Illumina technology can read millions of templates in parallel with high per-base accuracy. Compared with Sanger sequencing, the reads are shorter, but the amount of DNA processed in a single run is much higher.
Protocol of Illumina sequencing
The Illumina sequencing protocol consists of several steps:
- DNA or RNA extraction. The nucleic acid is isolated from a biological sample, such as a cell population or a tissue sample. For RNA, extraction is followed by reverse transcription, which produces complementary DNA (cDNA) from the isolated RNA. This is needed because RNA is much less stable than DNA and the sequencing chemistry works on DNA templates, so the RNA is converted into stable double-stranded cDNA. The result of this stage is a DNA solution used in the next steps.
- DNA fragmentation. Illumina sequencing is an example of "shotgun sequencing." The term comes from the fact that the original DNA molecules are broken into many random pieces, here approximately 400-600 bp long, which makes cluster formation possible and increases the efficiency of sequencing. Fragmentation is commonly achieved by sonication (acoustic shearing) or by enzymatic digestion.
- Addition of adapters. In the adapter ligation step, short synthetic DNA sequences called adapters are enzymatically joined (ligated) to both ends of each DNA fragment. Each adapter consists of three main parts. The first part (P5 or P7, depending on the end of the fragment) binds the strand to the flow cell surface; it also serves as a primer site for the initial PCR amplification of the library. The second part is a sample index (i5 or i7), a short barcode that allows several samples to be pooled and sequenced in the same run. The third part contains the primer binding site that starts synthesis during sequencing. Because the adapters at the two ends of a fragment differ, paired-end sequencing can distinguish forward reads from reverse reads.
- Binding to the flow cell and cluster formation (bridge PCR). The adapters added in the previous step are complementary to short oligonucleotides attached to the flow cell surface, which allows the template strands to bind to it. Because the signal from a single DNA molecule is too weak to detect reliably, the templates are amplified by PCR directly on the flow cell, producing a cluster of identical copies at each position and thus amplifying the fluorescence. This process is called "bridge PCR" because each strand bends over and its free end attaches to a neighboring oligonucleotide on the surface, forming a bridge-shaped structure.
- Sequencing. The DNA molecules are attached at one end to the solid surface and are therefore fixed at one point in space. Every nucleotide added is modified so that chain synthesis pauses after each incorporation by the DNA polymerase. The nucleotides are fluorescently labeled, and at every pause the color of the fluorophore is recorded, indicating which nucleotide has joined the chain. The fluorophore and the blocking modification are then cleaved from the last attached nucleotide, and the cycle repeats. Because each of the four bases gives a different fluorescent signal and each cluster stays at a fixed position on the flow cell, the series of colors recorded at each position reconstructs millions of the original nucleic acid sequences in parallel.
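The reverse transcription step from the extraction stage of the protocol can be modeled as a string transformation. This is a minimal sketch assuming an ideal, error-free reverse transcriptase and ignoring primers; the function names are hypothetical. Because the cDNA strand is complementary and antiparallel to the RNA, each base is complemented and the order is reversed.

```python
# Base pairing used by reverse transcriptase: RNA base -> complementary DNA base
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
# Ordinary DNA pairing, used to synthesize the second cDNA strand
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return first-strand cDNA (5'->3') for an RNA written 5'->3'."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna.upper()))


def second_strand(cdna: str) -> str:
    """Return the second strand (5'->3'); it matches the RNA with U replaced by T."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(cdna))


first = reverse_transcribe("AUGGCUUAA")
print(first)                 # TTAAGCCAT
print(second_strand(first))  # ATGGCTTAA
```

The two strands together form the stable double-stranded cDNA that enters library preparation.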
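The fragmentation step can be illustrated with a toy model of random shearing. This sketch assumes that breaks occur independently at random positions, which only roughly approximates sonication, and it adds a size-selection filter for the 400-600 bp window mentioned in the text; the function names, the random genome, and the mean fragment length are all illustrative assumptions.

```python
import random


def shear(genome: str, mean_fragment: int, rng: random.Random) -> list[str]:
    """Break a sequence at random positions, about every `mean_fragment` bases on average."""
    # Each internal position becomes a breakpoint with probability 1 / mean_fragment
    breaks = [i for i in range(1, len(genome)) if rng.random() < 1 / mean_fragment]
    bounds = [0] + breaks + [len(genome)]
    return [genome[start:end] for start, end in zip(bounds, bounds[1:])]


def size_select(fragments: list[str], low: int = 400, high: int = 600) -> list[str]:
    """Keep only fragments whose length falls in the target window."""
    return [f for f in fragments if low <= len(f) <= high]


rng = random.Random(42)  # fixed seed so the run is reproducible
genome = "".join(rng.choice("ACGT") for _ in range(50_000))
fragments = shear(genome, mean_fragment=500, rng=rng)
kept = size_select(fragments)
print(len(fragments), "fragments,", len(kept), "kept in the 400-600 bp window")
```

Random breakpoints produce a broad spread of lengths, so only part of the fragments fall inside the target window.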
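Because every read carries the sample index from its adapter, reads from pooled samples can be separated after sequencing, a step usually called demultiplexing. Below is a minimal sketch assuming a single index per read and a tolerance of one mismatch; the sample names and index sequences are hypothetical, and real demultiplexing software also handles dual indexes (i7 and i5) and many more edge cases.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_indexes, max_mismatches=1):
    """Assign each (index, read) pair to the sample whose index it matches.

    Reads that match no sample, or more than one, go to "undetermined".
    """
    bins = defaultdict(list)
    for index, read in reads:
        hits = [name for name, expected in sample_indexes.items()
                if len(index) == len(expected) and hamming(index, expected) <= max_mismatches]
        bins[hits[0] if len(hits) == 1 else "undetermined"].append(read)
    return dict(bins)


samples = {"sample_a": "ACGTACGT", "sample_b": "TGCATGCA"}  # hypothetical indexes
reads = [
    ("ACGTACGT", "TTGACC"),  # exact match to sample_a
    ("ACGTACGA", "GGCATA"),  # one mismatch, still assigned to sample_a
    ("TGCATGCA", "CCATGG"),  # exact match to sample_b
    ("NNNNNNNN", "ATATAT"),  # unreadable index
]
print(demultiplex(reads, samples))
```

Requiring exactly one matching sample keeps a read with an ambiguous index out of every sample bin, which is why the indexes of pooled samples are chosen to differ at several positions.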
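The need for on-surface amplification can be made concrete with idealized PCR arithmetic. If each round of bridge amplification copies every strand in a cluster, the number of copies doubles per round. This sketch also allows a lower, hypothetical per-round `efficiency`; the target of about 1000 copies is an illustrative assumption, not an Illumina specification, and the model ignores the limited space available to each cluster on the flow cell.

```python
def copies_after(cycles: int, efficiency: float = 1.0) -> float:
    """Copies of one template after `cycles` rounds, each multiplying by (1 + efficiency)."""
    return (1 + efficiency) ** cycles


def cycles_needed(target_copies: float, efficiency: float = 1.0) -> int:
    """Smallest number of rounds that reaches at least `target_copies` copies."""
    cycles, copies = 0, 1.0
    while copies < target_copies:
        copies *= 1 + efficiency
        cycles += 1
    return cycles


print(copies_after(10))          # 1024.0 with perfect doubling
print(cycles_needed(1000))       # 10
print(cycles_needed(1000, 0.8))  # 12
```

Counting rounds in a loop, rather than taking a logarithm, avoids floating-point rounding at exact powers of the growth factor.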
During sequencing, each cycle determines exactly one base, so the read length equals the number of cycles and is the same for every read in the library. Typical settings are 150 cycles from each end for DNA-seq data and 75 cycles for RNA-seq data. For small RNAs, single-end sequencing with 50 cycles is sufficient because the molecules themselves are short.
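Since one image is recorded per cycle, base calling can be sketched as choosing, for every cycle, the channel with the strongest signal in a cluster. This toy version assumes four channels (one dye per base) and made-up intensity values; the confidence ratio is a simplification, and real base-calling software is considerably more sophisticated. It also shows why the read length equals the number of cycles: one base is called per cycle.

```python
BASES = "ACGT"


def call_bases(intensities):
    """Call one base per cycle by picking the brightest of the four channels.

    intensities: one (A, C, G, T) tuple per cycle for a single cluster.
    Returns the read and a per-cycle confidence: brightest / sum of all channels.
    """
    read, confidence = [], []
    for channels in intensities:
        best = max(range(4), key=lambda i: channels[i])
        read.append(BASES[best])
        confidence.append(channels[best] / sum(channels))
    return "".join(read), confidence


cluster = [
    (0.90, 0.05, 0.03, 0.02),  # cycle 1: A channel brightest
    (0.10, 0.80, 0.05, 0.05),  # cycle 2: C
    (0.05, 0.05, 0.10, 0.80),  # cycle 3: T
    (0.30, 0.05, 0.60, 0.05),  # cycle 4: G, less certain
]
read, conf = call_bases(cluster)
print(read)       # ACTG
print(len(read))  # 4: one base per cycle
```

A lower confidence, as in cycle 4, flags a base call that downstream analysis may want to treat with caution.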
Following the sequencing procedure, there is always a computational data analysis step.
Conclusion
In this topic, we introduced the general idea of Illumina sequencing and the basic steps of its protocol. This technique collects millions of reads in parallel, detecting a fluorescent signal after each nucleotide incorporation. Library preparation requires fragmentation, adapter ligation, and initial PCR amplification; cluster amplification via bridge PCR then increases the signal on the flow cell. Thanks to its high accuracy and its ability to process enormous amounts of data, Illumina has become the most widely used sequencing technology.