The traditional Sanger method sequences one DNA fragment at a time, which makes it impractical for hundreds or thousands of targets, such as genes. Indeed, a few decades ago researchers mainly examined one or a few biological targets (genes, proteins, etc.) per experiment. In contrast, modern high-throughput approaches are designed to scan as many targets as possible in the hope of finding something interesting in the resulting data. This holistic way of looking at a complex system has given rise to a growing number of "omics" disciplines. In this topic, we will cover the main omics areas, including their applications and experimental challenges.
The omics cascade
The first omics term, genome, is thought to have been coined in the 1920s as a blend of the words "gene" and "chromosome". Later, the suffix "ome" came to mean the complete set of molecules of one type in a system. That is how terms such as transcriptome (the entire set of transcripts), proteome (the complete set of proteins), and metabolome (the complete set of small molecules) came about. Similarly, adding "omics" to a molecular term denotes a field that analyzes all molecules of the same type in an assay. For example, genomics studies the entire set of genetic material, transcriptomics measures nearly all transcripts, and so on. Researchers are quick to adopt the "ome" and "omics" suffixes to name very specific branches of science, for example, human breathomics or the mouse phenome. Some of these coinages are not meant seriously or are hardly ever used, so we will cover only the main biological omics.
We have already mentioned some key omics fields: genomics, transcriptomics, proteomics, and metabolomics. But how are the different omics disciplines linked to each other? Simply put, omics have a hierarchy that is based on the central dogma of molecular biology. Each level of regulation provides us with new information about a complex system: from the information encoded in DNA (genotype) to the observable characteristics of an organism (phenotype). Different omics form layers in the so-called omics cascade, as depicted in the picture below. The metabolome, the endpoint of the cascade, is a downstream product of the genome, transcriptome, and proteome, and is also influenced by environmental factors.
The genome remains nearly static for a given organism during its lifetime. In contrast, the transcriptome, proteome, and metabolome are dynamic: they vary from cell to cell and even within one cell under different conditions. For instance, they change widely during cell development and in response to various environmental stressors.
Let's now follow up on each discipline, from genomics to metabolomics.
Genomics
As mentioned above, genomics is the study of an organism's entire genome. The driving force behind genomics has been, without a doubt, the Human Genome Project (HGP), which produced the first reference sequence of the human genome. Based on the Sanger sequencing method, the HGP took around 13 years (1990 to 2003) and cost close to 3 billion US dollars. New high-throughput sequencing techniques, generally known as next-generation sequencing (NGS), have dramatically increased the speed at which an individual's genome can be sequenced. NGS offers highly efficient, rapid, low-cost DNA sequencing at a scale that is beyond the reach of traditional Sanger methods. Just look how affordable sequencing has become!
An alternative to whole-genome sequencing is the targeted sequencing of parts of a genome. Most often, this means sequencing only the protein-coding regions, which are called exons. Together they form the exome, which accounts for < 3% of the total human genome. Low-cost DNA sequencing makes it possible to sequence many more individual genomes or exomes in a reasonable time. As you may know, the genomes of individuals belonging to the same species are not identical. Differences between genomes are called genetic variations. Variations make us different from each other, but some of them may also predispose a person to disease or influence the response to drugs.
Application. With the methods of genomics, we can now analyze the complete set of genetic variations in individuals and look for associations with phenotype using statistical analysis. Thus, genome-scale sequencing helps answer many important questions about susceptibility to diseases, the discovery of effective therapies, the genetic basis of tumor cell proliferation, and much more.
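To make the idea of a statistical association more concrete, here is a minimal sketch that tests whether one variant is more common in affected individuals than in controls. It uses a Pearson chi-square test on a 2x2 table of allele counts. The counts, the variable names, and the `chi_square_2x2` helper are all made up for illustration; real association studies test millions of variants and therefore use far stricter significance thresholds.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one count")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical allele counts (illustrative only):
# rows are cases and controls, columns are variant and reference alleles.
cases_variant, cases_reference = 60, 140
controls_variant, controls_reference = 30, 170

chi2 = chi_square_2x2(cases_variant, cases_reference,
                      controls_variant, controls_reference)

# Odds ratio: how much more likely the variant allele is among cases.
odds_ratio = (cases_variant * controls_reference) / (cases_reference * controls_variant)

# For one degree of freedom, chi2 > 3.84 corresponds to p < 0.05.
significant = chi2 > 3.84

print(f"chi-square = {chi2:.2f}, odds ratio = {odds_ratio:.2f}, significant = {significant}")
```

In this toy table the variant allele is enriched among cases, so the statistic is well above the single-test threshold.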
Epigenomics
There is no doubt that DNA contains all of the information necessary to build and maintain an organism. But have you ever wondered why different types of cells from one organism look so different? Your muscle cells, leukocytes, and neurons share identical genetic information, yet each cell type switches off some sets of genes and switches on others. This is where epigenetics comes into play. Epigenetics refers to mechanisms that alter gene expression without changing the DNA sequence itself. One such mechanism is to mark an individual DNA base with a chemical group, such as a methyl group. These epigenetic modifications are, to varying degrees, reversible.
Application. As you might guess, in the field of epigenomics, researchers try to map the genome-wide locations and understand the functions of all the chemical tags that mark the chromatin. For example, we can compare genome-scale methylation sites between normal and tumor tissues, or between young and aged organisms. Until recently, irreversible DNA changes, such as mutations, were thought to be the main drivers of human diseases. Now it is clear that changes in the epigenome can also cause, or result from, pathology. Thus, epigenomics has become an essential part of efforts to better understand the pathophysiology of disease, tumor formation, and aging.
Transcriptomics
Transcriptomics captures a snapshot in time of all the transcripts present in an organism, tissue, or cell. While the genome sequence stays relatively constant over time, the transcriptome changes dynamically in response to various factors. For example, in response to heat stress, cells activate a transcriptional program known as the heat shock response. You can detect this stress response as a transient rise in the mRNAs that encode heat shock proteins (Hsps) and, ultimately, in the proteins themselves, which are part of the cell's protection mechanism.
Because cellular responses are so heterogeneous, there are two main approaches to studying them. The first, bulk sequencing, is a transcriptomics analysis of pooled cell populations and tissues. It measures the average expression level of genes across thousands or even millions of cells in a sample. When many cells in a sample activate a transcriptional program in sync, we will probably detect it. In contrast, the more recent single-cell sequencing analyzes cells at the individual level. This approach can catch slight differences between cell populations and describe complex cell compositions.
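The difference between the two approaches can be shown with a toy example. The sketch below uses made-up expression values for a single gene in ten cells. The `bulk_signal` and `responding_cells` helpers and the threshold are hypothetical and chosen only for illustration. Two very different samples give the same bulk average, while the per-cell view reveals a small responding subpopulation.

```python
# Hypothetical expression of one gene in ten cells (arbitrary units, made up).
# Mixed sample: eight cells barely express the gene, two express it strongly.
mixed_cells = [1, 1, 1, 1, 1, 1, 1, 1, 21, 21]
# Uniform sample: every cell expresses the gene at the same moderate level.
uniform_cells = [5] * 10


def bulk_signal(cells):
    """Average expression across all cells, as a bulk experiment would see it."""
    return sum(cells) / len(cells)


def responding_cells(cells, threshold=10):
    """Indices of cells whose expression exceeds an illustrative threshold."""
    return [i for i, value in enumerate(cells) if value > threshold]


print(bulk_signal(mixed_cells), bulk_signal(uniform_cells))  # 5.0 5.0
print(responding_cells(mixed_cells))    # [8, 9]
print(responding_cells(uniform_cells))  # []
```

The bulk measurement cannot tell these two samples apart, which is exactly the kind of heterogeneity that single-cell methods are designed to resolve.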
Application. Transcriptomics, both bulk and single-cell, can shed light on many theoretical and practical questions, such as:
- What is the exact structure of the gene? Annotation of gene boundaries, transcription start sites, and other gene features.
- What is the complete set of transcripts in a given sample? Detection of novel transcripts and analysis of regulatory non-coding RNAs.
- Which genes are expressed differently between two samples (e.g. healthy/diseased individual)?
Understanding the transcriptome is essential for revealing molecular mechanisms of pathologies and cellular responses to environmental stressors. Similar to DNA targets, transcripts associated with diseases may serve as biomarkers for diagnostic purposes and as predictors of treatment effects.
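The question of which genes are expressed differently between two samples is often summarized as a log2 fold change. The sketch below is a simplified illustration. The gene names, read counts, and the `log2_fold_change` helper are hypothetical, and the pseudocount of 1 is an assumption used to avoid division by zero. Real analyses also normalize for sequencing depth and rely on replicates and a statistical test.

```python
import math


def log2_fold_change(sample_count, reference_count, pseudocount=1.0):
    """log2 ratio of expression in a sample relative to a reference."""
    return math.log2((sample_count + pseudocount) / (reference_count + pseudocount))


# Hypothetical read counts: (diseased sample, healthy sample)
counts = {
    "GENE_A": (799, 99),   # higher in the diseased sample
    "GENE_B": (49, 49),    # unchanged
    "GENE_C": (24, 199),   # lower in the diseased sample
}

fold_changes = {
    gene: log2_fold_change(diseased, healthy)
    for gene, (diseased, healthy) in counts.items()
}

for gene, lfc in fold_changes.items():
    print(f"{gene}: log2 fold change = {lfc:+.1f}")
```

A value of +3 means roughly eight times more transcript in the diseased sample, 0 means no change, and -3 means roughly eight times less.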
Proteomics
Nucleic acids have one important advantage: they can be easily amplified, directly or indirectly (after conversion of RNA to DNA), in a polymerase chain reaction (PCR). Proteins and small molecules cannot be copied in this way, so we have to make do with whatever small amount of sample material is available for analysis. Moreover, proteomics and metabolomics require other technologies, for instance, mass spectrometry.
For years, gene expression has primarily been studied at the transcriptional level, simply because sequencing methods are cheaper. So, can we just measure the quantity of an mRNA transcript and assume that it reflects the quantity of the corresponding protein? Not quite: several studies have shown that quantities of mRNAs and proteins correlate only modestly (a correlation of roughly 40%). So it seems that measuring RNA levels does not give a quantitative measure of protein levels. Thus, we need a separate branch with its own methods: proteomics, which studies the complete set of proteins, their structures, and functions.
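As a rough illustration of what a modest correlation looks like, the sketch below computes the Pearson correlation coefficient r between made-up mRNA and protein abundances for six hypothetical genes. The data and the `pearson_r` helper are assumptions for illustration, not values from any study. Since the text above does not say whether its figure refers to r or to r squared (the share of variance explained), the sketch reports both.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical relative abundances for six genes (arbitrary units, made up).
mrna_levels = [1, 2, 3, 4, 5, 6]
protein_levels = [3, 1, 5, 2, 4, 6]

r = pearson_r(mrna_levels, protein_levels)
r_squared = r ** 2

print(f"r = {r:.2f}, r^2 = {r_squared:.2f}")
```

Here the protein levels follow the mRNA levels only loosely, so knowing the mRNA level explains well under half of the variation in protein level.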
Post-translational modifications. What makes protein-level analysis even more daunting is post-translational modifications (PTMs). The concept is in some ways similar to epigenetic marks on nucleic acids: protein chemistry is expanded with different tags (e.g., methyl, acetyl, and phosphate groups) that affect protein structure and function. Once unique PTMs are taken into account, the number of distinct protein forms grows exponentially with the number of modifiable sites.
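A quick back-of-the-envelope calculation shows why PTMs inflate the number of protein forms so quickly. The sketch below assumes, purely for illustration, that every modifiable site is either modified or unmodified independently of the others. The `max_protein_forms` helper is hypothetical, and the result is an upper bound rather than a count of forms actually observed in cells.

```python
def max_protein_forms(n_sites, states_per_site=2):
    """Upper bound on distinct forms of one protein with n independent PTM sites."""
    if n_sites < 0 or states_per_site < 1:
        raise ValueError("n_sites must be >= 0 and states_per_site >= 1")
    return states_per_site ** n_sites


for n_sites in (1, 5, 10, 20):
    print(f"{n_sites:>2} sites -> up to {max_protein_forms(n_sites):,} forms")
```

Under this assumption, a protein with only 20 modifiable sites could in principle exist in more than a million different forms.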
Application. The presence, absence, or changes in the activity of certain proteins and PTMs can be associated with pathological conditions. For example, the concentration of C-reactive protein rises when there is inflammation in the body. Proteins associated with pathologies may serve as biomarkers, improve medical diagnosis, and open new avenues for the prevention and treatment of various diseases.
Metabolomics
In metabolomics, researchers deal with the total set of small molecules (< 1 kDa) in a cell, organ, or organism. As the name implies, small molecules are much smaller than nucleic acids and proteins, and the number of distinct metabolites is relatively small. At the same time, these compounds are very diverse in their physical and chemical properties and occur over a wide concentration range. The metabolome is the endpoint of the "omics cascade" and can be considered the level closest to the phenotype. This suggests that a metabolome may reflect some disease characteristics better than changes in gene or protein expression do.
Application. Small changes in the proteome often show up as much more dramatic differences in the metabolome. Therefore, in clinical practice, we can detect even slight metabolic changes in response to disease or drug treatment. You have probably had a blood glucose test or a cholesterol test, which reflect susceptibility to diabetes and the risk of heart disease, respectively. Moreover, metabolites can also reflect lifestyle, diet, and the effects of other environmental factors much more readily than profiles obtained by other omics methods. For example, if we put a laboratory rat on a high-fat diet, elevated lipid metabolites will be detected in the animal's bloodstream within hours. Let's now highlight the advantages of metabolome analysis:
- Non-invasive or minimally invasive sample collection: metabolites can be measured in biological fluids (blood, urine).
- Dynamic changes can be monitored over time (for consecutive days, months, etc.).
- Metabolomics has the closest link to phenotype.
Conclusion
The addition of the "omics" suffix to a molecular term implies a comprehensive, or global, assessment of a set of molecules. Omics research is a rapidly evolving, multidisciplinary field that encompasses genomics, epigenomics, transcriptomics, proteomics, and metabolomics. Together, these fields can provide a global picture of a system as a whole, at multiple levels. The cost of omics-derived applications is still high, but future technological improvements are likely to bring it down.