Project

Working with Scientific Papers

Challenging
30 completions
~ 17 hours
4.0

By completing this project, you will gain PostgreSQL skills focused on SELECT, WHERE, and GROUP BY. You will learn how to extract information and perform similarity searches using vector embeddings with pgvector. The project will equip you to analyze academic research papers, including working with machine learning embeddings.

Provided by

JetBrains Academy

About

Welcome to the "Working with Scientific Papers" project! In this project, you'll explore the rich collection of research articles on arXiv while gaining hands-on experience with SQL. You'll start by mastering basic SQL queries to extract and analyze article metadata, including authors, titles, and abstracts. As you progress, you'll learn to perform similarity searches using vector embeddings.

Graduate project

This project covers the core topics of the Introduction to AI Engineering with Python course, making it sufficiently challenging to be a proud addition to your portfolio.

At least one graduate project is required to complete the course.

What you'll learn

Once you choose a project, we'll provide you with a study plan that includes all the necessary topics from your course to get it built. Here’s what awaits you:
Find the number of papers published each year.
Identify pairs of research papers that share at least one author, and present them in a paired format.
Find the least similar papers to "Continuity in Information Algebras" based on their vector embeddings.
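
The three tasks above can be sketched in miniature. This is only an illustration under stated assumptions, not the project's actual solution: it uses Python's built-in sqlite3 in place of PostgreSQL, the tables `papers` and `paper_authors`, their columns, and all sample rows are hypothetical, and the embedding comparison is done in plain Python with Euclidean distance (the metric that pgvector's `<->` operator computes).

```python
import math
import sqlite3

# Hypothetical toy data; the real project runs on PostgreSQL with pgvector.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
CREATE TABLE paper_authors (paper_id INTEGER, author TEXT);
INSERT INTO papers VALUES
    (1, 'Paper A', 2020), (2, 'Paper B', 2020), (3, 'Paper C', 2021);
INSERT INTO paper_authors VALUES
    (1, 'Ada'), (1, 'Bob'), (2, 'Bob'), (3, 'Cy');
""")

# Task 1: number of papers published each year (GROUP BY).
papers_per_year = conn.execute(
    "SELECT year, COUNT(*) FROM papers GROUP BY year ORDER BY year"
).fetchall()

# Task 2: pairs of papers sharing at least one author (self-join).
# The a.paper_id < b.paper_id condition keeps each pair once and
# prevents a paper from being paired with itself.
shared_author_pairs = conn.execute("""
    SELECT DISTINCT a.paper_id, b.paper_id
    FROM paper_authors AS a
    JOIN paper_authors AS b
      ON a.author = b.author AND a.paper_id < b.paper_id
    ORDER BY a.paper_id, b.paper_id
""").fetchall()

# Task 3: least similar papers by embedding distance.
# Hypothetical 2-dimensional embeddings keyed by paper id.
embeddings = {1: [0.0, 0.0], 2: [1.0, 0.0], 3: [3.0, 4.0]}


def least_similar(target_id, k=1):
    """Return the ids of the k papers farthest from target_id."""
    target = embeddings[target_id]
    distances = [
        (math.dist(target, vector), paper_id)
        for paper_id, vector in embeddings.items()
        if paper_id != target_id
    ]
    # Largest distance first means least similar first.
    return [paper_id for _, paper_id in sorted(distances, reverse=True)[:k]]
```

In PostgreSQL with pgvector, the third task would typically be expressed as an ORDER BY on a distance operator in descending order rather than computed in application code.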

Reviews

John
4 months ago
It was very interesting to learn about Natural Language Processing and embedding. The most critical feedback I can provide is the instructions for the Local Machine Setup should be included in Stage 1, not Stage 4. And some topics should be moved before earlier stages as well.
Andrzej Dańków
4 months ago
I have difficulties with installing pgvector. I found out that WITH and creating new columns in the last step of the project wasn't needed. I solved it like this: SELECT id, title, embedding FROM arxiv_papers ORDER BY embedding <-> (SELECT embedding FROM arxiv_papers WHERE titl ...
Aurora Zarazaga
6 months ago
At this final stage I've invested more time on installing stuff and comprehend how docker works, how to get connected correctly to postgresql and how to solve all the solutions AI has guided me with, than applying the theory learnt. Could not be this stage more focussed on applying the theory easier ...

4.0

Learners who completed this project within the Introduction to AI Engineering with Python course rated it as follows:
Usefulness
4.3
Fun
3.9
Clarity
3.8