Project

Building the Naive RAG

Hard

29 completions

~ 13 hours

4.5

Before selecting this project, it is recommended that you first complete the Introduction to LangChain project.

Unleash the film writer in you as you build a complete Retrieval-Augmented Generation (RAG) system. You'll master the RAG pipeline, from efficient data ingestion and processing to context-aware generation.

Provided by

JetBrains Academy

About

Movies whisk us away to magical realms and ignite our imagination. So why settle for just watching your favourite films when you can re-imagine them? In this project, you'll feed a large language model a dataset of movie scripts to craft brand-new scenes and character dialogues, and even invent entirely new twists.

Graduate project

This project covers the core topics of the Introduction to AI Engineering with Python course, making it sufficiently challenging to be a proud addition to your portfolio.

At least one graduate project is required to complete the course.

What you'll learn

Once you choose a project, we'll provide you with a study plan that includes all the necessary topics from your course to get it built. Here’s what awaits you:

Divide the ingested script into smaller, context-preserving chunks.

Transform the text chunks into high-dimensional vector embeddings that capture their semantic meaning.

Process user queries, run similarity searches over the embeddings, and fetch the scenes most relevant to each query.

Provide the LLM with the retrieved context along with supplementary instructions.
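
The four steps above can be sketched end to end in a few dozen lines. This is only a minimal illustration, not the project's reference solution: it assumes a toy bag-of-words "embedding" and an in-memory list in place of a real embedding model and vector store, and every function name, the sample script, and the prompt wording are hypothetical. The final prompt is built but not sent to any LLM.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=12, overlap=4):
    """Step 1: split text into word windows that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Step 2 (toy stand-in): sparse word-count vector, not a learned embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Step 3: rank chunks by similarity to the query and keep the top k."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Step 4: combine instructions, retrieved context, and the user request."""
    context = "\n---\n".join(context_chunks)
    return (
        "You are a screenwriter. Use the script excerpts below as context.\n\n"
        f"Context:\n{context}\n\n"
        f"Task: {query}\n"
    )


# Hypothetical sample script, invented purely for illustration.
script = (
    "INT. SPACESHIP BRIDGE - NIGHT. The captain stares at the radar while "
    "alarms blare. EXT. DESERT PLANET - DAY. A lone robot drags a broken "
    "antenna across the dunes. INT. CANTINA - NIGHT. The smuggler bets his "
    "ship in a card game."
)

query = "robot antenna"
chunks = chunk_text(script)
top = retrieve(query, chunks)
prompt = build_prompt(query, top)
print(prompt)
```

In a real pipeline, each stand-in would be swapped for the corresponding component: a text splitter for `chunk_text`, an embedding model for `embed`, and a vector database query for `retrieve`. The overall flow of chunk, embed, retrieve, and prompt stays the same.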

Reviews

Marcin Borkowski

4 months ago

5.0

I have learned bases of langchain, how it works and how helpful it is

Joydeep Chatterjee

6 months ago

5.0

Although it was a challenge figuring out how to use Qdrant, it was valuable for learning how to build the critical infrastructure for NLP/LLM engineering projects.

Shashank Gupta

6 months ago

4.7

This was a great project to learn how to build a simple (naive) RAG pipeline from end to end. It covered everything from data loading, data splitting, creating embeddings and the vector store, to retrieving relevant data and finally generating responses using an LLM with retrieved context included. ...

4.5

Learners who completed this project within the Introduction to AI Engineering with Python course rated it as follows:

Usefulness: 4.7

Fun: 4.6

Clarity: 4.2