Imagine that you need to build a regression model to estimate the required dose of a vitamin D supplement from a patient dataset with the following five columns:
- name;
- gender;
- current vitamin D blood level;
- number of sunny days in the previous year;
- patient’s Zodiac sign.
If you do not filter the columns beforehand, which model is likely to perform best on these data?