
How to interact with LLMs


Interacting with LLMs has evolved beyond basic text conversations. Various methods now exist to meet different user needs, from casual users to developers and researchers. In this topic, we will explore the main ways to interact with LLMs, their target users, and common examples of each method's use.

Conversational interfaces

Let's start with the most common method for interacting with LLMs: chat interfaces. These interfaces provide a user-friendly, dialogue-based experience where users can input queries (known as prompts) and receive responses in a conversational way. Here is ChatGPT's familiar interface:

ChatGPT UI

As you can see, you get a chat-like graphical user interface (GUI) where you can type or speak to an LLM. The interaction history appears on screen, allowing for follow-up questions and maintaining context within a session. This method works best for general users, content creators, researchers, and anyone who wants a simple way to use an LLM for various tasks. Such tasks include brainstorming ideas, summarizing text, translation, and answering questions.

Command-line and REPL interfaces

For developers, data scientists, and other technical users, working with LLMs often shifts from graphical interfaces to the command line. This approach includes both direct command-line interfaces (CLIs) and interactive REPLs (read-eval-print loops). While different, they share a text-based, programmatic nature that provides greater control, automation, and integration into development workflows.

This method works best for users who need to:

  • Automate repetitive tasks through scripting.

  • Experiment with and debug models in a controlled environment.

  • Manage and run models locally for more direct control over the underlying software and hardware.

Here is an example of a command-line interaction with an LLM:

$ ollama run llama3 "Ten fun names for a pet pelican"

In this example, the command ollama runs a specific model (llama3) and sends the prompt "Ten fun names for a pet pelican" to it. The response appears directly in the terminal.
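
To illustrate the scripting use case from the list above, here is a minimal Python sketch that wraps the same ollama command in a loop so that a whole batch of prompts can be processed automatically. The prompts are placeholders, and the sketch assumes ollama is installed and the llama3 model has already been pulled locally:

import subprocess

# Hypothetical batch of prompts to run through a local model.
prompts = [
    "Ten fun names for a pet pelican",
    "Summarize the plot of Moby-Dick in one sentence",
]

for prompt in prompts:
    # Invoke the same CLI command shown above; the response is
    # captured as text instead of printed straight to the terminal.
    result = subprocess.run(
        ["ollama", "run", "llama3", prompt],
        capture_output=True,
        text=True,
    )
    print(f"Prompt: {prompt}")
    print(f"Response: {result.stdout.strip()}\n")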

A REPL offers a more interactive, conversational session within the command line. It creates a simple, interactive environment where users can enter commands, and the system reads, evaluates, and prints the results before looping back for the next command. This setup works especially well for iterative development and learning. Here's an example from Claude Code:

Claude Code's interactive REPL

APIs and integrations

Developers building with LLMs need a programmatic interface to interact with them. This happens through application programming interfaces (APIs). An API works as a bridge, letting a developer's application send requests and receive responses from an LLM programmatically. Instead of interacting through a chat window, developers write code to "call" the LLM. This approach is the backbone of the AI-powered features we see in software today, from simple content generation tools to complex, multi-step AI agents.

This approach is essential for:

  • Developers and engineers who create custom applications and services that use LLM capabilities.

  • Businesses that add AI features to their products to improve user experience and create new value.

  • Researchers who build complex systems and experiments that need programmatic control over model interactions.

  • Data scientists who want to include LLMs in their data processing and analysis workflows.

For example, a developer could use an API to build a customer service bot that automatically categorizes support tickets. The application would send the text of each new ticket to the LLM through an API call. The model then returns a structured response with the ticket's category, priority, and a summary.

Here is a conceptual example of what an API call might look like using a command-line tool like cURL:

curl https://api.anthropic.com/v1/messages \
     -H "content-type: application/json" \
     -H "x-api-key: YOUR_API_KEY" \
     -d '{
           "model": "claude-sonnet-4-5-20250929",
           "max_tokens": 4096,
           "messages": [
             {"role": "user", "content": "Summarize this article for me: [article text]"}
           ]
         }'

This command sends a request to a model provider's API endpoint, including the specific model to use and the prompt. The API processes the request and sends the LLM's generated summary back to the application.
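
For developers working in code rather than in the shell, the same request might look like this in Python. This is a minimal sketch using the requests library; the endpoint, headers, and body simply mirror the cURL example above, and YOUR_API_KEY is a placeholder:

import requests

# Mirror of the cURL request above; the endpoint, headers, and
# body come straight from that example.
response = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "content-type": "application/json",
        "x-api-key": "YOUR_API_KEY",
    },
    json={
        "model": "claude-sonnet-4-5-20250929",
        "max_tokens": 4096,
        "messages": [
            {"role": "user", "content": "Summarize this article for me: [article text]"}
        ],
    },
)

# The generated summary comes back as JSON the application can parse.
print(response.json())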

One major use of model APIs is integrating AI into existing software. These integrations weave AI capabilities directly into the core functionality of an application, creating a seamless, AI-native experience. Here are some examples:

  • IDE integrations — AI pair programmers, such as GitHub Copilot, integrate directly into code editors and IDEs like IntelliJ IDEA, offering intelligent code completions and suggestions.

  • Websites across industries embed AI assistants directly into their user interface to help users perform tasks or find information using natural language.

  • Productivity tools like Google Workspace, Microsoft 365, Notion, and others embed AI assistants directly into their applications. These integrations help users create and refine content within their existing workflows.

  • Creative software and design tools like Adobe Photoshop and Canva incorporate generative AI features. With that, users can create or modify images, generate design elements, and automate editing tasks using text prompts without leaving the application.

Controlling LLM outputs

Modern LLM interactions need precise control over how the model delivers its responses. The output format is a crucial part of the interaction, affecting both user experience and the ability to build connected applications. Two essential techniques here are structured output and streaming. Let's explore them briefly.

Sometimes, the LLM needs to interact with other software, APIs, or databases. In these cases, LLMs must generate predictable and machine-readable responses. This is achieved through structured output. Developers can instruct the model to return its response in a specific format, typically JSON, that follows a predefined schema. Most model APIs today offer built-in features to force the output to be valid JSON. In some cases, you might need to implement output parsers with libraries like Pydantic to validate the LLM's output.
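
To make that last point concrete, here is a minimal sketch of validating an LLM's JSON output with Pydantic (v2 syntax). The TicketTriage schema and the raw response string are hypothetical, modeled on the support-ticket example from earlier:

from pydantic import BaseModel

# Hypothetical schema for the ticket-triage example described earlier.
class TicketTriage(BaseModel):
    category: str
    priority: str
    summary: str

# Pretend this JSON string came back from the model.
raw_response = '{"category": "billing", "priority": "high", "summary": "Customer was double-charged."}'

# Pydantic parses the JSON and raises a ValidationError if any
# field is missing or has the wrong type.
ticket = TicketTriage.model_validate_json(raw_response)
print(ticket.category, ticket.priority)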

This structured format helps programs reliably parse the LLM's response and execute a specific function (known as tool calling). For example, if a user asks, "What's the weather like in Paris?" the LLM returns a JSON object like this:

{
  "tool": "get_weather",
  "parameters": {
    "location": "Paris, FR"
  }
}

The application can then easily parse this object, call the get_weather function with the correct parameters, and use the result to answer the user's question. This technique forms the foundation for building AI agents that can perform actions.
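
As a rough sketch of that parsing-and-dispatch step, the code below maps tool names to functions and calls the one the model requested. The get_weather function and its return value are hypothetical stand-ins for a real weather lookup:

import json

# Hypothetical tool the application exposes to the model.
def get_weather(location: str) -> str:
    return f"Sunny and 22°C in {location}"  # stand-in for a real API call

# The structured output from the example above.
llm_output = '{"tool": "get_weather", "parameters": {"location": "Paris, FR"}}'

# Map tool names to the functions that implement them.
tools = {"get_weather": get_weather}

call = json.loads(llm_output)
result = tools[call["tool"]](**call["parameters"])
print(result)  # Sunny and 22°C in Paris, FR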

In conversational applications, waiting several seconds for a complete response can feel unnatural and slow. To create a more fluid and responsive experience, developers use streaming. Instead of waiting for the entire response, the output arrives piece by piece (token by token) as the model creates it.

The application can display these tokens to users incrementally, making it appear as if the LLM is typing in real-time. This approach significantly improves user experience by providing immediate feedback and reducing perceived wait times, especially for chatbots and virtual assistants.
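
Here is a minimal sketch of how an application might consume a stream. The stream_completion generator is a stand-in for a real streaming API; actual SDKs yield chunks over a network connection rather than from a hardcoded list:

import time

# Stand-in for a real streaming API: yields the response token by token.
def stream_completion(prompt: str):
    for token in ["Paris ", "is ", "the ", "capital ", "of ", "France."]:
        time.sleep(0.2)  # simulate network latency between tokens
        yield token

# Display each token as soon as it arrives, like a chat UI "typing" effect.
for token in stream_completion("What is the capital of France?"):
    print(token, end="", flush=True)
print()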

Conclusion

From user-friendly conversational interfaces to developer-focused APIs, the ways we interact with LLMs have evolved to meet various needs. We saw that there is a suitable approach for every type of user and use case:

  • Casual explorers seeking answers can use chat interfaces.

  • Developers creating complex workflows with interconnected systems can work through command-line tools and APIs.

  • Businesses can integrate intelligence directly into their products via APIs to enhance the user experience.

We also saw how developers can use structured output to receive predictable results for downstream processing. With streaming capabilities, conversations are more fluid and natural. These advances open the door for more sophisticated AI applications. As these interaction patterns improve, LLMs will become an essential part of our digital experiences.
