Suppose we trained gradient boosting with the learning rate of 0.1 for 800 rounds in the regression setting. We want to see if setting different maximum tree depths will help to improve the ensemble performance:
Based on the plot above, what is the optimal value for the maximum tree depth?