Project
Automatic Polish Name Anglicization with Transformers
~ 31 hours
In this project, you will learn how to fine-tune a pre-trained Transformers model. You will gain practical skills in collecting a corpus and setting the parameters for fine-tuning.
Provided by
JetBrains Academy
About
You have a Facebook penfriend from Poland. His name is Alexandre, but in Poland it is written as Aleksandre. Your friend went abroad and posted a photo with Stany Zjednoczone as the geolocation. Because the geolocation is in Polish, you can't tell where he is. Frustrated, you decide to fine-tune a model so that all Polish names and toponyms are automatically anglicized.
Training project
This project allows you to practice and strengthen your coding skills, helping you get ready for more advanced tasks ahead.
What you'll learn
Once you choose a project, we'll provide you with a study plan that includes all the necessary topics from your course to get it built. Here’s what awaits you:
Create a dataset from two sources.
Filter the corpus, split it into train and test datasets, and convert it into a Hugging Face dataset.
Fine-tune the T5-base model on the created corpus. Use the ROUGE metric to evaluate the model at each step.
Check what your model generates. Transliterate the names in the test dataset.
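To give a feel for the data preparation and evaluation steps above, here is a minimal sketch in plain Python. It is not the project's reference solution: the sample pairs, the function names (build_corpus, train_test_split, rouge1_f1), and the 80/20 split are all assumptions made for illustration. The scoring function is a simplified unigram-overlap version of ROUGE-1; in the project itself you would presumably rely on library implementations for the dataset and the metric.

```python
import random
from collections import Counter

# Hypothetical sample data standing in for the two sources (Polish, English).
source_a = [("Aleksandre", "Alexandre"), ("Stany Zjednoczone", "United States")]
source_b = [("Warszawa", "Warsaw"), ("Stany Zjednoczone", "United States"), ("", "Nobody")]


def build_corpus(*sources):
    """Merge sources, dropping pairs with an empty side and exact duplicates."""
    seen = set()
    corpus = []
    for source in sources:
        for polish, english in source:
            pair = (polish.strip(), english.strip())
            if not all(pair) or pair in seen:
                continue
            seen.add(pair)
            corpus.append(pair)
    return corpus


def train_test_split(pairs, test_ratio=0.2, seed=42):
    """Shuffle deterministically and hold out at least one pair for testing."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def rouge1_f1(prediction, reference):
    """Simplified ROUGE-1: F1 over lowercased, whitespace-split unigrams."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


corpus = build_corpus(source_a, source_b)
train, test = train_test_split(corpus)

print(len(corpus), len(train), len(test))
print(rouge1_f1("United Kingdom", "United States"))
```

Fixing the shuffle seed keeps the train/test split reproducible between runs, which makes it easier to compare scores across fine-tuning experiments.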