
Introduction to bioinformatics


Interdisciplinary work is becoming increasingly relevant in the modern world. As technology advances, we are able to generate large amounts of data, particularly biological data, which can be difficult to analyze without computational methods due to its complexity and diversity. Bioinformatics is the interdisciplinary field that applies such computational approaches to biological data. Together, the tools of biology, genetics, math, programming and statistics shed light on the complex processes of life. In this topic, we briefly highlight the characteristics of biological data, common bioinformatics tasks and current challenges.

What is biological data?

Biological data is data that describes living systems and their products. As technologies advance, the amount of biological data keeps growing. Over the past twenty years, biological research has largely been dominated by the "genomic era". To describe genetic information, we use DNA sequencing data, which is one of the main types of data in bioinformatics. However, the genome is just an "instruction" for making other functional molecules. Thus, it is important to know how genes are actually expressed and whether they give rise to proteins. Gene expression and protein data are therefore further examples of biological data.

Regardless of the nature of the data, many experiments produce sequences stored in files (such as text files). Data can also be multidimensional, such as the 3D atom coordinates of biological molecules. For example, by resolving the 3D structures of proteins, one can predict interactions between a target protein and a potential drug. Shared data is stored in various biological databases in specified formats. Because different types of molecules in a cell interact in complex biological pathways, it is useful to combine diverse data types. Data integration provides a basis for so-called large-scale approaches.
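To make the idea of sequences stored in text files more concrete, here is a minimal Python sketch. It assumes the widely used FASTA layout (a header line starting with ">" followed by sequence lines), which this topic does not cover in detail. The function names `parse_fasta` and `gc_content` and the two toy records are made up for illustration only.

```python
from io import StringIO


def parse_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequence lines of one record may be split across several lines
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence) / len(sequence)


# Made-up example records standing in for a file on disk (not real genes)
example = StringIO(">seq1 toy record\nATGCGC\nGGAT\n>seq2 toy record\nATATAT\n")
records = list(parse_fasta(example))
for name, seq in records:
    print(name, len(seq), round(gc_content(seq), 2))
```

In practice, an established library would usually be preferred over a hand-written parser. The sketch only shows that a sequence file is ordinary text that a short script can read and summarize.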

What makes biological data so special?

Biological data has its own unique characteristics and challenges. First, biological datasets are often massive: a single genome can span millions to billions of bases (the human genome is roughly 3 billion base pairs long), and sequencing experiments typically produce even more raw data. Second, despite this apparent richness, the data is often incomplete and prone to errors.

It's also crucial to take into account the heterogeneity of living systems. Moreover, in a complex organism, many factors may affect the phenomenon in question. That is why it is critical to interpret results probabilistically rather than as absolute truth. Medical practice offers many examples: even patients with similar symptoms may respond differently to the same treatment. Given the complex interconnections between biological molecules, it remains challenging to distinguish a biological signal from a random effect, or "noise". Therefore, a clear understanding of what "noise" means in each particular case is vital. For example, even in genetically identical cells under the same conditions, researchers detect variation in cell development, biochemical processes and so on.

To extract useful information from a large dataset we need:

  • efficient algorithms for finding features in the data

  • data preparation, or pre-processing

  • knowledge of the algorithms' assumptions and the experiment's limitations

How to analyze biological data?

When datasets are large, computational methods help scientists process biological data in a reasonable time. Various bioinformatics software tools are available to retrieve and process different types of data. Moreover, many scientists create custom solutions by writing their own scripts in a programming language such as Python or R. To validate the results and distinguish a real pattern from a random event, scientists use statistical methods. Finally, to interpret the data correctly, it's also essential to consider the result from a biological point of view.
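As an illustration of how a statistical method can help separate a real pattern from a random event, here is a minimal Python sketch of a permutation test. The permutation test is only one of many possible methods and is chosen here as an assumption, not as the approach this topic prescribes. The function name `permutation_test` and the expression values for the two groups are made up for illustration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate how often a random relabeling of the samples produces a
    difference in group means at least as large as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign samples to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up expression levels of one gene in two groups of samples
healthy = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
diseased = [6.9, 7.2, 7.0, 6.8, 7.1, 7.3]

observed_diff, p_value = permutation_test(healthy, diseased)
print(f"observed difference: {observed_diff:.2f}, p-value: {p_value:.4f}")
```

A small p-value suggests that the observed difference would rarely arise from random group assignment alone. Whether the difference matters biologically still has to be judged separately.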

What questions can bioinformatics answer?

Combining computational approaches and theoretical knowledge, bioinformaticians can shed light on fundamental and practical questions or problems such as:

  • What is the cause of a disease at the molecular level?

  • How can a disorder be treated while avoiding the side effects of the treatment?

  • How can effective vaccines be created against pathogens like viruses?

  • What do organisms or species have in common? And how do they differ from each other?

  • How rapidly do bacteria acquire antibiotic resistance?

  • How can genomes be used to study the migration of ancient populations?

And much more!

Conclusion

Computational methods help scientists store and manipulate biological data, which is diverse and highly specific. DNA nucleotide sequences, protein amino acid sequences and gene expression profiles are prominent examples of biological datasets. Using various types of biological data, we can find out how genes work, predict the phenotypes (for example, healthy or diseased) of living organisms, trace evolutionary relationships and much more. Bioinformatics combines molecular biology, genetics, math, physics, and programming to immerse you in a wonderful world of data analysis with applications in medicine and fundamental science.
