The central dogma of molecular biology says that DNA serves as the template for RNA, and RNA serves as the template for protein. But nucleic acids and proteins are very different molecules. Nucleic acids are made up of 4 types of nucleotides, while proteins are made up of 20 different amino acids. So how is information translated from the language of nucleic acids into the language of amino acids?
Genetic code
As you already know from previous topics (see the topic Central dogma), proteins are assembled from amino acids during translation.
The set of rules by which genetic information recorded in a sequence of nucleotides is converted into the sequence of amino acids that makes up a functional protein is called the genetic code.
The ribosome, a molecular machine found in all cells, carries out translation: it moves along the messenger RNA (mRNA) chain, reads each group of three nucleotides, and adds the corresponding amino acid to the growing polypeptide. These groups of 3 nucleotides are called codons or triplets, and each one codes for 1 amino acid.
Nucleic acids are built from 4 types of nucleotides (adenine, cytosine, guanine, and thymine, which is replaced by uracil in RNA). A triplet has 3 positions, and each position can hold any of the 4 nucleotides, repeats included, so the number of possible order-specific triplets is 4 × 4 × 4 = 64. But only 20 standard amino acids are used in protein synthesis. What does that mean? One amino acid can be encoded by several different triplets, but each triplet encodes only one specific amino acid. Scientists have long known which triplet encodes which amino acid and have summarized the whole genetic code in a table.
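The count of 64 can be checked by listing every ordered triplet over the four RNA bases. The following is a minimal Python sketch; the names BASES and codons are chosen here purely for illustration.

```python
from itertools import product

# The four RNA bases; in DNA, T (thymine) would take the place of U (uracil).
BASES = "UCAG"

# Every ordered triplet, repeats allowed: 4 * 4 * 4 = 64 codons.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))   # 64
print(codons[:4])    # ['UUU', 'UUC', 'UUA', 'UUG']
```

Triplets such as AAA or UUU are part of the list, which is why repeated nucleotides have to be counted.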
The genetic code table
Biologists will often need to manually decipher DNA sequences. To do this, they use a genetic code table.
It is quite simple to use. The center holds the first nucleotide of the triplet, the next circle holds the second, and the circle after that holds the third. The outermost circle contains the abbreviated names of the amino acids. For example, CAU is histidine (His), GUC is valine (Val), AAA is lysine (Lys), and so on.
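The same lookup can be done programmatically. The sketch below builds the standard genetic code as a Python dictionary from a compact 64-letter string of one-letter amino acid codes listed in UCAG order; the names CODON_TABLE and AMINO_ACIDS are illustrative, and '*' is used here as a marker for a stop codon.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for the 64 codons in UCAG order
# (UUU, UUC, UUA, UUG, UCU, ...); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each codon to its amino acid (standard genetic code).
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

for codon in ("CAU", "GUC", "AAA"):
    print(codon, CODON_TABLE[codon])
# CAU H   (histidine)
# GUC V   (valine)
# AAA K   (lysine)
```

The one-letter codes H, V, and K correspond to the three-letter abbreviations His, Val, and Lys.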
As you may have noticed, the table contains three STOP entries, corresponding to UAA, UAG, and UGA. This is not a mistake: these are stop codons (termination codons). They do not code for an amino acid; their purpose is to tell the ribosome that protein synthesis ends at this point. When the ribosome reaches one of them, translation terminates and the finished polypeptide is released. Besides marking the end of a correct protein, stop codons also limit the damage from some errors. If the ribosome slips into the wrong reading frame, for example, it usually runs into a stop codon before long, so the "wrong" protein is cut short instead of being synthesized in full.
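The AUG codon not only encodes the amino acid methionine but also serves as the start codon that initiates translation. Bacteria sometimes use GUG as a start codon instead. Inside a gene GUG encodes valine, but in the start position it is read as methionine.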
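Start and stop codons together mark the part of an mRNA that is translated. The sketch below is a simplified Python model of this, under loud assumptions: it starts at the first AUG it finds (real initiation also depends on the surrounding sequence), ignores alternative start codons such as GUG, and stops at the first in-frame stop codon. The function name translate and the example sequence are made up for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon: nothing is translated in this model
    protein = []
    # Step through the sequence three nucleotides at a time.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: the polypeptide is complete
        protein.append(amino_acid)
    return "".join(protein)


# GG | AUG CAU GUC AAA UAA | GG
print(translate("GGAUGCAUGUCAAAUAAGG"))  # MHVK
```

The two leading nucleotides are skipped, AUG contributes methionine (M), and UAA ends the chain without adding an amino acid.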
Properties of the genetic code
The genetic code has several properties. Some authors highlight more, but we will talk about the main ones, because the rest are one way or another a consequence of them.
Triplicity means that one amino acid is always encoded by 3 nucleotides. There are 61 sense codons and 3 stop codons.
Degeneracy, or redundancy. The same information can be written in different ways, that is, the same amino acid can be coded for by several codons. Thus, the amino acid leucine can be encoded by six triplets: UUA, UUG, CUU, CUC, CUA, CUG. Valine is encoded by four triplets, phenylalanine by two, and only tryptophan and methionine are encoded by a single codon each. This redundancy softens the effect of mutations and reading errors. Codons for the same amino acid are usually similar, often differing only in the third nucleotide, so if UUG appears in place of UUA, the protein still receives leucine and no error results.
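These counts can be verified by grouping the codons of the standard genetic code by the amino acid they encode. A minimal Python sketch follows; the names codons_for and sense_codons are illustrative, and '*' again marks a stop codon.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid (or stop signal) they stand for.
codons_for = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    codons_for[amino_acid].append(codon)

print(codons_for["L"])       # leucine: ['UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG']
print(len(codons_for["V"]))  # valine: 4
print(len(codons_for["F"]))  # phenylalanine: 2
print(codons_for["W"], codons_for["M"])  # tryptophan, methionine: ['UGG'] ['AUG']

# Sense codons are all codons that encode an amino acid.
sense_codons = sum(len(c) for aa, c in codons_for.items() if aa != "*")
print(sense_codons, len(codons_for["*"]))  # 61 3
```

Because CODON_TABLE is an ordinary dictionary, each codon has exactly one value, while several codons can share the same value.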
Unambiguity means that each triplet corresponds to only one amino acid. Otherwise, the same codon could be translated into different amino acids from one reading to the next, which would unpredictably change the primary structure and properties of the protein, and chaos would reign in the body.
The genetic code is non-overlapping. This means that, within a reading frame, one specific nucleotide cannot be part of 2 or more codons at the same time. That is, a sequence of 6 nucleotides is read as 2 consecutive codons and encodes 2 amino acids, rather than as a series of overlapping triplets.
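The difference between the two ways of reading can be shown on a 6-nucleotide sequence. In the Python sketch below, read_codons models the actual non-overlapping reading, while overlapping_triplets shows a hypothetical overlapping reading for contrast only; both function names are made up for this example.

```python
def read_codons(seq: str) -> list:
    """Non-overlapping reading: each nucleotide belongs to exactly one codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def overlapping_triplets(seq: str) -> list:
    """Hypothetical overlapping reading, shifting by one nucleotide each time."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


print(read_codons("CAUGUC"))           # ['CAU', 'GUC']
print(overlapping_triplets("CAUGUC"))  # ['CAU', 'AUG', 'UGU', 'GUC']
```

Only the first result matches how the ribosome reads mRNA: two codons, two amino acids.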
The universality of the genetic code means that, with few exceptions, codons code for the same amino acids in all living organisms. Thanks to this property, scientists can make genetically modified bacteria produce proteins that people need. For example, this is how insulin for patients with diabetes is made today.
Conclusion
Genes are responsible for the storage and transmission of biological information from generation to generation. There is a system of recording hereditary information called the genetic code. The genetic code has a number of properties: triplicity, degeneracy, unambiguity, non-overlapping, and universality.