Project
HyperSearch Engine
15 completions
~ 28 hours
4.4
Learn more about text preprocessing with spaCy, word embeddings, the TF-IDF algorithm, and HTML tags to make a simple search engine.
Provided by
JetBrains Academy
About
Search engines are our first assistants in the quest for knowledge and entertainment. How about building your own search engine with a database and a crawler? In this project, you will write a search engine based on the TF-IDF algorithm. Your program will also highlight the target word and show the context around it.
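To give a feel for the highlighting part, here is a minimal sketch in Python. The function name `highlight`, the `width` parameter, and the `<b>` tags are illustrative assumptions, not the project's required interface; the project itself may ask for different markup or context rules.

```python
import re


def highlight(text, word, width=30, open_tag="<b>", close_tag="</b>"):
    """Return one snippet per match of `word`, with the match wrapped in tags.

    `width` is the number of characters of context kept on each side
    (an assumed convention; a real engine might count tokens instead).
    """
    # Whole-word, case-insensitive match; re.escape guards special characters.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    snippets = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        snippets.append(
            text[start:match.start()]
            + open_tag + match.group(0) + close_tag
            + text[match.end():end]
        )
    return snippets
```

For example, `highlight("The cat sat on the mat.", "cat", width=4)` yields a single snippet with the word wrapped in `<b>` tags and four characters of context on each side.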
Training project
This project allows you to practice and strengthen your coding skills, helping you get ready for more advanced tasks ahead.
What you'll learn
Once you choose a project, we'll provide you with a study plan that includes all the necessary topics from your course to get it built. Here’s what awaits you:
Create a database from the files.
Find the most relevant documents using the TF-IDF algorithm and cosine similarity.
Make the search engine more advanced: set a limit and an offset to page through the results!
Convert a query into tokens to find it in the text.
Show the context around the target words to make the search engine more user-friendly.
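The ranking steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's reference solution: the regex `tokenize` function stands in for spaCy preprocessing, the IDF uses one common smoothed variant, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (stand-in for spaCy)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vector(tokens, idf):
    """Map each known term to raw term count times its IDF weight."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def build_index(docs):
    """Return (idf, vectors) for a list of document strings."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (assumed variant): ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    return idf, [tfidf_vector(tokens, idf) for tokens in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, limit=10, offset=0):
    """Return indices of matching documents, best first, paged by limit/offset."""
    idf, vectors = build_index(docs)
    query_vec = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(vectors)]
    # Keep only documents with a positive score; break ties by index.
    ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: (-p[0], p[1]))
    return [i for _, i in ranked][offset:offset + limit]
```

Here `limit` caps how many results come back and `offset` skips that many from the top of the ranking, so `search(query, docs, limit=10, offset=10)` would return the second page of ten. A real implementation would build the index once and store it in the database rather than rebuilding it per query.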
Reviews
4 days ago
Through this project, I learned how a vector-based search engine works end to end, including corpus processing, TF-IDF vectorization, cosine similarity, result ranking with limit and offset, and contextual highlighting of query terms. I deepened my understanding of how queries are embedded into a fi ...
4.4
Learners who completed this project within the course rated it as follows: