In the topic on phylogenetic tree algorithms, we covered five approaches: UPGMA, Neighbor-joining, minimum evolution, maximum parsimony, and maximum likelihood. We also discussed the "long-branch attraction" effect when sequences on long branches attract to each other even though they aren't similar enough. In this topic, you will learn about tree evaluation techniques and the structure of real programs like RAxML and FastTree.
Bootstrap
There are two main sets of evaluation programs: the first set tests the reliability of the final tree that is created, and the second set compares one tree with another. Bootstrapping is a method from the first category. Let's see how it works.
Given an initial alignment with sequences and columns, we can randomly select columns from an initial alignment generating new alignment. So, some columns can be chosen several times, other may not be chosen at all. Bootstrap replicate tree is constructed on a newly formed alignment. Such procedure is repeated 100 to 1000 times and the resulted merged tree (= consensus tree) includes weights on branches reflecting bootstrap support.
The higher the number on a branch, the higher the statistical confidence of the supported group. For example, we repeated bootstrap 100 times and observe a branch with bootstrap value 90. It means that this branch occurred in 90 out of 100 bootstrap replicate trees.
The bootstrap technique can be computationally expensive, so a variation called jackknifing can be used. In this approach, half of the sites in an alignment are removed randomly and a "replicate" tree is built. This approach is repeated 100 to 1000 times. A consensus tree is built based on the generated "replicate" trees. We use quotes around the word "replicate" because these trees are built based on a short variation of the original alignment. Strictly speaking, jackknifing and bootstrapping aren't comparable because they use alignments of different lengths. But, jackknifing can be used when the processing speed needs to be minimized.
Comparing trees
It is often important to test whether one tree is significantly better than the other. The Kishino-Hasegawa test (KH test) was first used to compare maximum likelihood (ML) trees. However, it was found that this test gives erroneous results for ML trees, but not for maximum parsimony trees (MP). A corrected version of the KH test, the Shimodaira-Hasegawa test (SH test), was introduced to compare ML tree topologies.
The KH test shows whether two given tree topologies are equally supported by an alignment. First, the likelihood ratio is calculated, which is twice the difference between the log-likelihoods of the tree topologies, . Then, bootstrapping is performed 100 to 1000 times on the alignment. Each round, the likelihood ratio for each bootstrap replicate, , is calculated. Using all calculated we can construct a distribution of these values. Given a significance level of 5%, we can test whether the initial falls into the range of 2.5 to 97.5% in the distribution of . If it falls in the specified range, we can say that both topologies are equally good explanations of the alignment.
The SH test is a corrected multi-tree test that fits ML trees better than the KH test. The SH test shows whether all specified tree topologies (including the ML tree) are equally good explanations of a given alignment. The first step is to calculate likelihood ratio between ML topology and other topology , . Then, as in the KH test, bootstrapping, distribution construction, and testing against distribution are performed. All steps are executed to all specified tree topologies. If , where is likelihood of topology, then the SH test concludes that some tree topologies are not equally good explanations of the alignment.
Application
In scientific projects, the number of sequences is continually and dramatically increasing. It's common to analyze hundreds and thousands of sequences with hundreds of columns in alignment. So, robust techniques are needed. For example, the last version of FastTree uses a thick sandwich of different approaches like heuristic NJ, a step reducing the length of the tree (we didn't discuss such techniques) and ML approaches. Another popular program RAxML applies ML approach. It is a convenient command-line program where you can turn on/off bootstrap support, select a substitution model like JC69 or K80, and choose a model of rate heterogeneity like distribution.
So, it's important to know basic phylogenetic tree algorithms and understand the overall procedure. But keep in mind that "serious" programs that can handle huge datasets in a reasonable time frames always include a lot of heuristics and preliminary steps to reduce the number of studied trees and the length of the trees. So you shouldn't be frightened by the comprehensive and long pipelines before the main tree algorithm.
Conclusion
The basic evaluation technique of reliability of the final tree is called bootstrapping. If you use MP or ML approaches, you should use specific statistical tests for comparing trees — KH test for MP trees and SH test for ML trees. All the described methods are rarely used solo in scientific projects. Programs working with huge datasets with thousands of sequences use a comprehensive pipeline of approaches to reduce computational load.