Project (Beta)

HyperSearch Engine

14 completions
~ 28 hours
4.3

Learn more about text preprocessing with spaCy, word embeddings, the TF-IDF algorithm, and HTML tags to build a simple search engine.

Provided by

JetBrains Academy

About

Search engines are our first assistants in the quest for knowledge and entertainment. How about building your own search engine with a database and a crawler? In this project, you will write a search engine based on the TF-IDF algorithm. Your program will also highlight the target word and show the context around it.
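To make the highlighting idea concrete, here is a minimal sketch in Python. It is not the project's reference solution: the function name `highlight`, the `window` and `tag` parameters, and the choice of the `<b>` HTML tag are all assumptions made for illustration.

```python
import re


def highlight(text, word, window=3, tag="b"):
    """Return one snippet per match of `word` in `text`.

    Each snippet holds up to `window` words on either side of the match,
    with the matched word wrapped in an HTML tag. All names here are
    hypothetical; the project may define its own interface.
    """
    tokens = text.split()
    snippets = []
    for i, tok in enumerate(tokens):
        # Compare without punctuation and case, so "Fox," matches "fox".
        if re.sub(r"\W+", "", tok).lower() == word.lower():
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            # Attached punctuation stays inside the tag in this sketch.
            marked = f"<{tag}>{tok}</{tag}>"
            snippets.append(" ".join(left + [marked] + right))
    return snippets
```

Calling `highlight("The quick brown fox jumps over the lazy dog", "fox", window=2)` returns a single snippet with two words of context on each side of the tagged match.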


Training project

This project allows you to practice and strengthen your coding skills, helping you get ready for more advanced tasks ahead.

What you'll learn

Once you choose a project, we'll provide you with a study plan that includes all the necessary topics from your course to get it built. Here’s what awaits you:
Find the most relevant documents using the TF-IDF algorithm and cosine similarity.
Make the search engine more advanced — set a limit and an offset!
Convert a query into tokens to find it in the text.
Show the context around the target words to make the search engine more user-friendly.
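
The ranking steps listed above can be sketched in plain Python. This is a from-scratch illustration using only the standard library, not the project's required implementation (a real solution might use spaCy for tokenization and a library vectorizer instead). The function names, the regex tokenizer, and the smoothed IDF formula are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(tokens, idf):
    """Build a sparse TF-IDF vector (term -> weight), ignoring unknown terms."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, limit=10, offset=0):
    """Return (doc_index, score) pairs, best match first, paged by limit and offset."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant): stays positive even for very common terms.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    query_vec = vectorize(tokenize(query), idf)
    scored = []
    for i, tokens in enumerate(tokenized):
        score = cosine(query_vec, vectorize(tokens, idf))
        if score > 0:
            scored.append((i, score))
    # Highest score first; ties fall back to document order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[offset:offset + limit]


DOCS = [
    "The cat sat on the mat",
    "Dogs chase cats in the park",
    "A cat and another cat",
]
```

With these sample documents, `search("cat", DOCS)` ranks the third document ahead of the first, and `search("cat", DOCS, limit=1, offset=1)` returns only the second-best hit. The second document is skipped because this simple tokenizer does not reduce "cats" to "cat"; lemmatization would address that.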

Reviews

Marcin Borkowski
5 months ago
Like I said before, amazing project, but it suffers from small faults.
Jürgen Wißkirchen
1 year ago
Moderator
I was a bit skeptical due to the few solutions and critical reviews, but I was really positively surprised! I liked this project. It is fun and not hard. Go for it - my recommendation!
synth
2 years ago
Moderator
I've worked with TfidfVectorizer, spaCy, cosine similarity, pathlib and some other Python stuff. The project is OK, but the requirements are not clear and require some additional searching to work through the stages. At the moment the last stage has an off-by-one error in tests. The project can be ...

4.3

Learners who completed this project within the course rated it as follows:
Usefulness
4.7
Fun
4.4
Clarity
3.7