Project (Beta)

LLM Evals

Difficulty: Challenging
16 completions
~16 hours
Rating: 3.7

By the end of this project, you'll build a complete evaluation pipeline for an LLM application. You'll gain hands-on experience with evaluation techniques such as analytics, human-as-a-judge, and LLM-as-a-judge. You’ll also learn how to use tools like Langfuse and Ragas to supercharge LLM evaluation. This project will help you ensure that your AI offers accurate recommendations and consistently meets high performance and reliability standards.

Provided by

JetBrains Academy

About

LLM evaluation is at the core of building trustworthy AI. In this project, you’ll work on a chatbot for a smartphone sales site, but the real focus is on assessing its performance. You'll use tools such as Langfuse and Ragas and various strategies to see how well the model delivers recommendations and comparisons.

Graduate project

This project covers the core topics of the Introduction to AI Engineering with Python course, making it sufficiently challenging to be a proud addition to your portfolio.

At least one graduate project is required to complete the course.

What you'll learn

Once you choose a project, we'll provide you with a study plan that includes all the topics from your course needed to build it. Here’s what awaits you:
Log and analyze the application to understand how it is performing.
Collect user feedback to measure how your application performs in real conversations.
Use the Langfuse UI to annotate traces, which is useful for expert evaluation of your LLM application.
Use Ragas to run model-based evaluation (a minimal code sketch of these steps follows this list).
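To give a concrete picture of these steps, here is a minimal sketch, not the project's reference solution. It assumes the Langfuse Python SDK's v2-style low-level client and the classic Ragas evaluate() entry point; exact imports, method names, and dataset column names differ between library versions, and the chatbot answer, context string, and feedback value are placeholders.

    # Sketch 1: trace a chatbot call with Langfuse and attach user feedback.
    from langfuse import Langfuse

    langfuse = Langfuse()  # reads LANGFUSE_PUBLIC_KEY / SECRET_KEY / HOST from the environment

    question = "Which phone under $500 has the best camera?"
    trace = langfuse.trace(name="phone-recommendation", input=question)

    answer = "Based on your budget, the Pixel 8a is a strong pick."  # placeholder for the real LLM call
    trace.update(output=answer)

    # Log a thumbs-up from the user as a score on the same trace.
    trace.score(name="user_feedback", value=1)

    # Sketch 2: model-based evaluation of the same exchange with Ragas.
    from datasets import Dataset
    from ragas import evaluate
    from ragas.metrics import faithfulness, answer_relevancy
    from ragas.run_config import RunConfig

    data = Dataset.from_dict({
        "question": [question],
        "answer": [answer],
        "contexts": [["Pixel 8a: 64 MP main camera, priced at $499."]],  # hypothetical retrieved context
    })

    # Keep concurrency low to avoid provider rate limits.
    result = evaluate(data, metrics=[faithfulness, answer_relevancy],
                      run_config=RunConfig(max_workers=3))
    print(result)

Both sketches expect credentials in environment variables (Langfuse keys, plus an LLM provider key for the Ragas judge model); they are only meant to show the shape of the workflow, not the project's exact stages.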

Reviews

Shashank Gupta
1 week ago
It was a fantastic project to delve into the observability and evaluation of Agents. I learned how to integrate LangFuse's observability and monitoring framework into the agents, add user feedback per chat session, and manually annotate the agent's response after the chat. The best and most exciting ...
Marcin Borkowski
2 months ago
A very inconsistent project. I learned a lot. It was a good project, but some parts were way too easy (like part 4), while part 5 was truly annoying. Limits!! So many problems with limits in part 5. I changed some options, e.g. run_config=RunConfig(max_workers=3), and after that it was ...
Said Kentafi
3 months ago
The LLM Evals project provides a strong foundation for building end-to-end evaluation pipelines for LLM applications. The integration of Langfuse and Ragas adds valuable depth to monitoring and scoring workflows. I especially appreciated the focus on human-in-the-loop and LLM-as-a-judge techniques, ...

Overall rating: 3.7

Learners who completed this project within the Introduction to AI Engineering with Python course rated it as follows:
Usefulness: 4.6
Fun: 3.3
Clarity: 3.3