Project (Beta)
HyperSearch Engine
14 completions
~ 28 hours
4.3
This content is new. Please help us improve it by reporting any bugs you encounter.
Learn more about text preprocessing with spaCy, word embeddings, the TF-IDF algorithm, and HTML tags to build a simple search engine.
Provided by
JetBrains Academy
About
Search engines are our first assistants in the quest for knowledge and entertainment. How about building your own search engine with a database and a crawler? In this project, you will write a search engine based on the TF-IDF algorithm. Your program will also be able to highlight the target word and show the context around it.
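To give a feel for the highlighting feature, here is a minimal Python sketch. It is not the project's reference solution: the function name `highlight_with_context`, the `window` parameter, and the choice of `<b>` tags as the highlight marker are all assumptions made for illustration.

```python
import re


def highlight_with_context(text, target, window=3, open_tag="<b>", close_tag="</b>"):
    """Return one snippet per occurrence of `target`, with up to `window`
    words of context on each side and the matched word wrapped in tags.

    All names and defaults here are illustrative assumptions.
    """
    words = text.split()
    snippets = []
    for i, word in enumerate(words):
        # Compare on word characters only, ignoring case and punctuation.
        if re.sub(r"\W", "", word).lower() == target.lower():
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            parts = words[start:i] + [f"{open_tag}{word}{close_tag}"] + words[i + 1:end]
            snippets.append(" ".join(parts))
    return snippets


sample = "Search engines are our first assistants in the quest for knowledge."
snippets = highlight_with_context(sample, "quest", window=2)
```

Splitting on whitespace keeps the sketch short; a real solution would more likely reuse the tokens produced during preprocessing so that the match and the search index agree.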
Training project
This project allows you to practice and strengthen your coding skills, helping you get ready for more advanced tasks ahead.
What you'll learn
Once you choose a project, we'll provide you with a study plan that includes all the necessary topics from your course to get it built. Here’s what awaits you:
Create a database from the files.
Find the most relevant documents using the TF-IDF algorithm and cosine similarity.
Make the search engine more advanced — set a limit and an offset!
Convert a query into tokens to find it in the text.
Show the context around the target words to make the search engine more user-friendly.
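The ranking steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the project's required implementation: the regex tokenizer stands in for spaCy, the smoothed IDF formula is one common variant, and the names `tokenize`, `tf_idf_vectors`, `cosine`, and `search` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract alphanumeric tokens (a stand-in for spaCy)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_vectors(token_lists):
    """Return one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF (an assumed variant): terms found in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, limit=10, offset=0):
    """Rank documents against the query and return one page of (index, score) pairs."""
    vectors, idf = tf_idf_vectors([tokenize(d) for d in documents])
    # Query terms that never occur in the corpus cannot match, so drop them.
    q_tokens = [t for t in tokenize(query) if t in idf]
    q_counts = Counter(q_tokens)
    q_vec = {t: (c / len(q_tokens)) * idf[t] for t, c in q_counts.items()}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    # Keep only matching documents; sort by score, breaking ties by document index.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [(i, score) for score, i in ranked[offset:offset + limit]]


documents = [
    "The cat sat on the mat.",
    "Dogs chase cats in the park.",
    "Search engines rank documents by relevance.",
    "A search engine uses an index to search quickly.",
]
first_page = search("search engine", documents, limit=1, offset=0)
second_page = search("search engine", documents, limit=1, offset=1)
```

Here `limit` caps how many results are returned and `offset` skips that many top results first, so consecutive pages never overlap. In the project itself, a vectorizer from a library such as scikit-learn would likely replace the hand-written TF-IDF.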
Reviews
5 months ago
Like I said before, amazing project, but it suffers from small faults
I've worked with TfidfVectorizer, spaCy, cosine similarity, pathlib and some other Python stuff. The project is OK, but the requirements are not clear and it takes some additional searching to work through the stages. At the moment the last stage has an off-by-one error in tests. The project can be ...
Learners who completed this project within the course rated it 4.3.