Biodiversity Metrics
In this topic, we will discuss statistical metrics used to describe a community and to compare several communities, along with the approaches used to calculate them.
Introduction
Biodiversity can be defined as the variation among living organisms, encompassing diversity within species, between species, and across ecosystems. Biodiversity serves as a metric that describes the composition of a population, community, or set of samples. Various approaches, including metagenome analysis and population studies, are employed by researchers to measure biodiversity, resulting in the quantification of organisms across different samples, ecosystems, geographical locations, or time periods.
The level of diversity may be low when the number of organisms in a particular site is limited, while it can be high when the site thrives with the coexistence of multiple species. Desert and forest biomes are typical examples of low and high diversity, respectively.
Additionally, we can categorize states of biodiversity as 'endangered,' 'intact,' or 'restored.' We can assign a certain state when comparing taxa composition at different time points.
Researchers utilize alpha, beta, and gamma diversity metrics to describe diversity, which we will explore further in this topic.
Alpha-diversity: Diversity within sample
Alpha-diversity refers to diversity at a small scale, characterizing the richness of species within a sample or a specific functional community. It quantifies the variety of species observed within a single sample or defined area, such as a designated plot, a particular ecosystem like a pond, an agricultural field, or even a distinct section of a forest. Below we will look at three typical metrics used to assess alpha-diversity.
1. Species Richness (S): Species richness is a simple metric that counts the number of different species or operational taxonomic units (OTUs) present in a sample or community. It answers the question: "How many different species are in the sample?"
Species richness is the number of species in the sample:
$$S = \left|\{\, i : n_i > 0 \,\}\right|,$$
where $i$ indexes the species and $n_i$ is the number of individuals of species $i$ observed in the sample.
2. Shannon Diversity Index (H): The Shannon diversity index (also called the Shannon-Wiener index or Shannon entropy) assumes that the heterogeneity of a sample depends on both the number of species (species richness) and how evenly individuals are distributed among those species (species evenness). A higher Shannon diversity index value indicates greater species diversity, with a larger number of different species present in more even proportions. Conversely, a lower Shannon diversity index value suggests lower species diversity, with fewer species dominating the sample.
2.1 Species evenness is a measure of the relative abundance of each species:
$$p_i = \frac{n_i}{N},$$
where $N$ is the total number of individuals in the sample and $n_i$ is the number of individuals of one species or taxon $i$.
2.2 Shannon Diversity Index (H) is a measure of sample heterogeneity:
$$H = -\sum_{i=1}^{S} p_i \ln p_i,$$
where $S$ is the total number of species and $p_i$ represents the proportion or relative abundance of the $i$-th species in the community.
3. Simpson Diversity Index (D): The Simpson diversity index quantifies the dominance or concentration of species within a sample. It takes into account both the species richness and the proportional abundance of each species, emphasizing the effect of dominant species. Lower values of the Simpson diversity index indicate that a few species are much more abundant than the others in the sample. Higher values indicate a more even distribution of species abundances, with no single species dominating the sample.
$$D = 1 - \sum_{i=1}^{S} p_i^2,$$
where $S$ is the total number of species and $p_i$ represents the proportion or relative abundance of the $i$-th species in the community.
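The three alpha-diversity metrics above can be computed from a table of counts in a few lines. The following is a minimal sketch, not a reference implementation: the function name `alpha_diversity`, the OTU names, and the counts are made up for illustration. It assumes the natural logarithm for H and the "1 minus sum of squared proportions" form of D, so that a higher value means higher diversity.

```python
import math

def alpha_diversity(counts):
    """Return (S, H, D) for a dict mapping species/OTU name -> count.

    Assumptions: H uses the natural log; D = 1 - sum(p_i^2).
    """
    # Keep only species that were actually observed in the sample.
    observed = [n for n in counts.values() if n > 0]
    total = sum(observed)                          # N: all individuals
    proportions = [n / total for n in observed]    # p_i = n_i / N
    richness = len(observed)                       # S
    shannon = -sum(p * math.log(p) for p in proportions)
    simpson = 1.0 - sum(p * p for p in proportions)
    return richness, shannon, simpson

# Hypothetical samples with identical richness but different evenness.
even_sample = {"otu_a": 25, "otu_b": 25, "otu_c": 25, "otu_d": 25}
skewed_sample = {"otu_a": 97, "otu_b": 1, "otu_c": 1, "otu_d": 1}

for name, sample in [("even", even_sample), ("skewed", skewed_sample)]:
    s, h, d = alpha_diversity(sample)
    print(f"{name}: S={s}, H={h:.3f}, D={d:.3f}")
```

Both samples have S = 4, but the skewed sample gets much lower H and D values, which shows why richness alone does not describe a community.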
Beta-diversity: Between-sample differences
Beta-diversity is a measure that quantifies the differences or similarities in species composition between different samples or communities. It focuses on the turnover of species across spatial or temporal scales, providing insights into the degree of dissimilarity or similarity in species assemblages.
Bray-Curtis Dissimilarity index (BC): The Bray-Curtis dissimilarity index is a commonly used metric to measure the dissimilarity between two samples or communities based on their species abundances. It takes into account the relative abundances of species shared between the two samples. The index ranges from 0 to 1, where 0 represents complete similarity (identical species abundances) and 1 represents complete dissimilarity (no shared species).
Formula:
$$BC = \frac{\sum_{i} |x_i - y_i|}{\sum_{i} (x_i + y_i)}$$
$x_i$ and $y_i$ represent the abundances of OTU $i$ in the two samples. An OTU that is absent from a sample has an abundance of 0 there.
The upper part of the equation sums the absolute differences in abundance between the samples over all OTUs.
The lower part of the equation sums the total abundances in both samples.
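The Bray-Curtis calculation can be sketched as follows. The function name `bray_curtis` and the sample data are hypothetical. The sketch assumes the sum runs over every OTU seen in either sample, with a missing OTU counted as 0, which is what makes two samples with no OTUs in common score exactly 1.

```python
def bray_curtis(sample_x, sample_y):
    """Bray-Curtis dissimilarity between two dicts mapping OTU -> abundance."""
    # Consider every OTU seen in either sample; a missing OTU counts as 0.
    otus = set(sample_x) | set(sample_y)
    numerator = sum(abs(sample_x.get(o, 0) - sample_y.get(o, 0)) for o in otus)
    denominator = sum(sample_x.get(o, 0) + sample_y.get(o, 0) for o in otus)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator

# Hypothetical abundance tables for two samples.
pond = {"otu_a": 10, "otu_b": 5, "otu_c": 0}
lake = {"otu_a": 6, "otu_b": 5, "otu_c": 4}

print(round(bray_curtis(pond, lake), 3))           # (4 + 0 + 4) / 30
print(bray_curtis(pond, pond))                      # identical samples
print(bray_curtis({"otu_a": 3}, {"otu_b": 7}))      # nothing in common
```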
Jaccard distance: The Jaccard distance is a metric used to measure the dissimilarity between two samples or communities based on their species presence or absence. It focuses solely on the presence or absence of species, disregarding their abundances. The Jaccard distance ranges from 0 to 1, where 0 represents complete similarity (identical species presence/absence) and 1 represents complete dissimilarity (no shared species).
UniFrac Distance: The UniFrac distance, unlike the Jaccard distance, is a metric used to assess dissimilarity between microbial communities based on their shared evolutionary history. It takes into account the phylogenetic relationships among microbial taxa and (in the case of weighted UniFrac) taxa frequencies. To calculate the UniFrac distance, you need to build a phylogenetic tree from the taxa of all samples, mark each taxon on the tree according to the sample it comes from, and calculate the fraction of branch length that is shared between the two samples or unique to one or the other.
Unweighted and weighted UniFrac metrics:
unweighted UniFrac
purely based on sequence distances (does not include abundance information)
Calculated as the sum of the branch lengths that belong to one sample but not the other, divided by the sum of the branch lengths that belong to one or both samples (see formula and picture below)
weighted UniFrac
branch lengths are weighted by relative abundances (includes both sequence and abundance information)
Calculated as the sum of the branch lengths, where each branch length is weighted by the difference in proportional abundance between the two samples for the taxa descending from that branch, divided by the sum of the total branch lengths
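Formula explanation:
Unweighted UniFrac:
$$U(A, B) = \frac{\sum_{i \in A \triangle B} b_i}{\sum_{i \in A \cup B} b_i}$$
Weighted UniFrac (the sum before it is normalized by branch length):
$$W(A, B) = \sum_{i} b_i \left| \frac{A_i}{A_T} - \frac{B_i}{B_T} \right|$$
$b_i$ is the length of branch $i$, taken from the set of branch lengths in the phylogenetic tree
$A$ and $B$ represent the two samples being compared (in $U$, the sets of branches that lead to taxa found in each sample)
$A_i$, $B_i$ are the counts of individuals from sample A or B that descend from node $i$
$A_T$, $B_T$ are the total counts of individuals from samples A or B on the tree
△ is the symmetric difference between two sets
∪ is the union between two sets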
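The difference between a presence/absence metric such as the Jaccard distance and the branch-based UniFrac calculations can be illustrated on a toy tree. This is only a sketch under stated assumptions: the tree `((t1:1, t2:1):2, (t3:1, t4:1):2)` is invented, each branch is represented as a pair of its length and the set of tips below it, and all function names are hypothetical. The weighted version returns the raw sum of abundance-weighted branch lengths without normalization.

```python
def jaccard_distance(taxa_a, taxa_b):
    """1 - |intersection| / |union| on sets of observed taxa."""
    union = taxa_a | taxa_b
    if not union:
        raise ValueError("both samples are empty")
    return 1 - len(taxa_a & taxa_b) / len(union)

def unweighted_unifrac(branches, taxa_a, taxa_b):
    """Branch length unique to one sample / branch length in either sample."""
    unique = covered = 0.0
    for length, tips in branches:
        in_a = bool(tips & taxa_a)   # branch leads to a taxon found in A
        in_b = bool(tips & taxa_b)   # branch leads to a taxon found in B
        if in_a or in_b:
            covered += length
            if in_a != in_b:
                unique += length
    return unique / covered

def weighted_unifrac_raw(branches, counts_a, counts_b):
    """Sum of b_i * |A_i/A_T - B_i/B_T| (raw, not normalized)."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    result = 0.0
    for length, tips in branches:
        frac_a = sum(counts_a.get(t, 0) for t in tips) / total_a
        frac_b = sum(counts_b.get(t, 0) for t in tips) / total_b
        result += length * abs(frac_a - frac_b)
    return result

# Toy tree ((t1:1, t2:1):2, (t3:1, t4:1):2) as (length, tips below) pairs.
toy_branches = [
    (1.0, {"t1"}), (1.0, {"t2"}), (2.0, {"t1", "t2"}),
    (1.0, {"t3"}), (1.0, {"t4"}), (2.0, {"t3", "t4"}),
]

# Close relatives (t1 vs t2) and distant relatives (t1 vs t3).
for other in ("t2", "t3"):
    print(other,
          jaccard_distance({"t1"}, {other}),
          unweighted_unifrac(toy_branches, {"t1"}, {other}))
```

Both pairs share no taxa, so the Jaccard distance is 1 in each case, while unweighted UniFrac gives 0.5 for the sister taxa t1 and t2 and 1.0 for the distant pair t1 and t3.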
Gamma diversity: Total diversity of species within a larger geographic or ecological region
Gamma diversity is regional diversity: the total diversity measured for a group of places, such as all plots in the study, all streams in a watershed, or all Norwegian forest stands.
In the context of microbiome research, gamma diversity refers to the total number of unique microbial species or taxa present in a given environment, pooled over all samples taken from it.
There are many approaches to calculate gamma diversity.
To describe species diversity within a landscape, you may utilize the Simpson (D) or Shannon (H) diversity index calculated over the whole landscape. A species count at this scale can be called 'gamma richness'.
With OTU richness you can plot 'rarefaction curves'. Rarefaction curves display the number of samples (on the x-axis) and the observed OTU richness (on the y-axis).
The plot below demonstrates cumulative diversity levels across multiple samples distributed across the world.
We expect that the more samples we take from one habitat, the more organisms we will find in it. At some point, we will have sampled enough organisms to represent the whole community of the habitat. From this point on, the rarefaction curve for that habitat flattens out (like the blue line on the plot).
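A curve of this kind can be sketched by counting how many distinct OTUs have been seen after each added sample. This is a rough illustration, not the procedure behind the plot: the function name `accumulation_curve` and the sample contents are hypothetical. The sketch averages over random sample orders, because the shape of a single curve depends on which sample happens to come first.

```python
import random

def accumulation_curve(samples, permutations=100, seed=0):
    """Mean number of distinct OTUs seen after 1, 2, ..., n samples.

    `samples` is a list of sets of OTU names. The order of samples is
    shuffled `permutations` times and the resulting curves are averaged.
    """
    rng = random.Random(seed)        # fixed seed keeps the sketch repeatable
    totals = [0.0] * len(samples)
    for _ in range(permutations):
        order = samples[:]
        rng.shuffle(order)
        seen = set()
        for k, sample in enumerate(order):
            seen |= sample           # OTUs observed so far
            totals[k] += len(seen)
    return [t / permutations for t in totals]

# Hypothetical presence data for five samples from one habitat.
habitat_samples = [
    {"otu_a", "otu_b"},
    {"otu_b", "otu_c"},
    {"otu_a", "otu_c", "otu_d"},
    {"otu_a", "otu_b", "otu_c"},
    {"otu_b", "otu_d"},
]

curve = accumulation_curve(habitat_samples)
for n_samples, richness in enumerate(curve, start=1):
    print(n_samples, round(richness, 2))
```

The last point of the curve equals the total number of distinct OTUs in the pooled samples, which corresponds to the gamma richness of this small data set.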
Addressing the significance of differences between sample groups
Ok, so you have calculated the species richness or diversity index of each sample. How can you tell whether one group of samples has significantly higher species diversity? Or which species is significantly associated with a studied feature? Or whether the community has changed over a period of time?
Differences in microbial abundance at the level of phylum, genus, species, gene, and pathway, within and between groups, can be analyzed using the Wilcoxon test. To find associations between taxa, between groups, or with other features, it is possible to use Spearman's rank correlation.
However, there are many other statistical tests that can be applied.
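Both rank-based procedures can be sketched with the standard library alone. The sketch below makes several assumptions: the two groups are independent, so the Wilcoxon rank-sum (Mann-Whitney U) variant is used, the p-value comes from a normal approximation without tie correction, and the function names and the Shannon index values are invented. In practice, a statistics library such as SciPy (`scipy.stats.mannwhitneyu`, `scipy.stats.spearmanr`) is the safer choice.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1      # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

def rank_sum_test(group_a, group_b):
    """Wilcoxon rank-sum (Mann-Whitney U), two-sided normal approximation."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)   # no tie correction
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value

# Hypothetical Shannon indices for two groups of samples.
control = [3.1, 2.9, 3.4, 3.3, 3.0]
treated = [2.1, 2.5, 1.9, 2.8, 2.2]
u, p = rank_sum_test(control, treated)
print(f"U = {u}, approximate p = {p:.4f}")

# Hypothetical abundances of two taxa across six samples.
taxon_x = [1, 4, 6, 9, 15, 30]
taxon_y = [2, 3, 8, 7, 20, 41]
print(f"Spearman rho = {spearman(taxon_x, taxon_y):.3f}")
```

With only five samples per group the normal approximation is crude, so the printed p-value should be read as an illustration of the calculation rather than as a result to report.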
Conclusion
In this topic, we discussed the major descriptive and comparative strategies that are typically applied to assess biodiversity.