Nowadays, almost everyone is familiar with the official interfaces to LLMs, such as ChatGPT, but what about bringing LLMs straight into your applications? This topic will guide you through the OpenAI API, showing how you can enhance your projects with GPT's conversational abilities!
What is the OpenAI API?
The OpenAI API serves as a gateway for developers to access the capabilities of various models developed by OpenAI, such as text generation with GPT and image creation with DALL·E. With this API, you can integrate AI features into your applications without developing models from scratch.
The official API from OpenAI provides support for multiple programming languages, primarily JavaScript and Python. In addition to the official libraries, there are community-maintained libraries for some other programming languages.
In this topic, you will use the OpenAI API's Python library called openai to enrich your applications with GPT models. These models go beyond simple text generation; they can be integrated into your software platforms to automate tasks like drafting emails, creating code snippets, or providing AI chatbots within your product.
We mainly focus on the OpenAI API; however, OpenAI's interface is similar to the APIs of other LLM providers (such as Anthropic), so migration from one provider to another typically comes down to changing the API token and making minimal edits to the existing codebase. If you want to know more about other LLM-as-a-service providers and their pricing, you can use the LLM pricing calculator.
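For example, many providers expose OpenAI-compatible endpoints, so switching can be as simple as pointing the client at a different base URL. Below is a minimal sketch; the provider URL is hypothetical, and base_url is a parameter of the openai Python client:

from openai import OpenAI

# Hypothetical provider endpoint: many LLM providers offer
# OpenAI-compatible APIs, so only the key and base URL change.
client = OpenAI(
    api_key="<YOUR_PROVIDER_API_KEY>",
    base_url="https://api.example-provider.com/v1",
)

# The rest of your code (e.g., client.chat.completions.create) stays the same.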
Chat Completions API
Chat completions, a feature of the OpenAI API, allows developers to build applications capable of conversing with users.
Before working with the API, install the latest version of the openai library:
pip install --upgrade openai

Below is example code for sending a request to the Chat Completions API:
from openai import OpenAI

# Create a client; replace the placeholder with your actual API key
client = OpenAI(api_key="<ENTER_YOUR_OPENAI_API_KEY_HERE>")

def get_chat_completion(messages):
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        temperature=0,
    )

prompt = "Briefly explain the Python programming language."
messages = [{"role": "user", "content": prompt}]

chat_completion = get_chat_completion(messages)
gpt_response = chat_completion.choices[0].message.content
print(gpt_response)

After running the above code, you will receive a GPT response like this:
Python is a high-level, interpreted programming language known for its easy-to-read syntax. It's versatile and widely used for various types of programming, from web development to data science and artificial intelligence.

Here's what the code does:
- The get_chat_completion function takes a list of messages as input and calls the chat.completions.create method from the OpenAI client.
- The model parameter specifies the AI model used for generating the response.
- The messages parameter holds the conversation history, which here is just one message containing the prompt.
- The temperature parameter sets the randomness of the response; a value of 0 produces the most likely, deterministic response. A higher temperature would result in more varied responses.
- The get_chat_completion function calls the API and retrieves a chat completion object. The actual text of the AI's response is accessed through chat_completion.choices[0].message.content.
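Note that the messages list can hold an entire conversation, not just one prompt. Here is a minimal sketch of a multi-turn request, reusing the get_chat_completion function defined above (the message contents are invented for illustration):

# A system message sets the assistant's behavior; alternating user and
# assistant messages carry the conversation history in order.
messages = [
    {"role": "system", "content": "You are a concise Python tutor."},
    {"role": "user", "content": "What is a list comprehension?"},
    {"role": "assistant", "content": "A compact syntax for building a list from an iterable."},
    {"role": "user", "content": "Show me a one-line example."},
]

chat_completion = get_chat_completion(messages)
print(chat_completion.choices[0].message.content)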
Here is the full Chat Completions API response:
{'id': 'chatcmpl-8waC0yG49cNvIBm3PknyMXTDHgoWf', 'object': 'chat.completion', 'created': 1708972840, 'model': 'gpt-4-0125-preview', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': "Python is a high-level, interpreted programming language known for its clear syntax and readability, making it an excellent choice for beginners in programming. It was created by Guido van Rossum and released in 1991. Python supports multiple programming paradigms, including procedural, object-oriented, and functional programming. It comes with a comprehensive standard library and supports modules and packages, which encourages program modularity and code reuse.\n\nPython is widely used in various fields, including web development, data analysis, artificial intelligence, scientific and mathematical computing, automation, and software development. Its simplicity and versatility, along with the support of a large and active community, have contributed to its popularity and wide adoption in both academic and industrial settings. Python's design philosophy emphasizes code readability and the significance of programmer effort over computational efficiency, which is reflected in its use of significant indentation to define code blocks."}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 15, 'completion_tokens': 174, 'total_tokens': 189}, 'system_fingerprint': 'fp_89b1a570e1'}
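If you want to inspect this structure in your own code, recent versions of the openai library return Pydantic model objects that can be converted to plain dictionaries. A minimal sketch, assuming the chat_completion object from the earlier example:

# Convert the response object to a dictionary for inspection
response_dict = chat_completion.model_dump()

print(response_dict["choices"][0]["finish_reason"])  # e.g., 'stop'
print(response_dict["usage"])                        # token counts for this call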
Calculating token usage

Hosting LLMs incurs significant costs, and utilizing services such as the OpenAI API comes with associated fees. When using the Chat Completions API, it's straightforward to estimate your token usage and approximate costs through the usage field in the API response.
Here's a helper function to estimate the costs for chat completions:
MODEL_35_TURBO = "gpt-3.5-turbo"
MODEL_4_TURBO = "gpt-4-turbo-preview"

# Per-token prices in dollars, derived from per-1,000-token pricing
MODELS = {
    MODEL_35_TURBO: {"input_cost": 0.0005 / 1000, "output_cost": 0.0015 / 1000},
    MODEL_4_TURBO: {"input_cost": 0.01 / 1000, "output_cost": 0.03 / 1000},
}

def calculate_tokens_cost(model, chat_completion):
    if model not in MODELS:
        raise ValueError(f"Model {model} is not supported.")
    model_costs = MODELS[model]
    input_tokens_cost = chat_completion.usage.prompt_tokens * model_costs["input_cost"]
    output_tokens_cost = (
        chat_completion.usage.completion_tokens * model_costs["output_cost"]
    )
    return input_tokens_cost + output_tokens_cost

The input and output token costs in the code are based on OpenAI's official pricing docs at the time of writing. Prices are subject to change, so always confirm the latest pricing in OpenAI's official documentation to ensure accurate cost estimations.
To use the calculate_tokens_cost function in your program, simply follow the example below:
prompt = "Briefly explain the Python programming language."
messages = [{"role": "user", "content": prompt}]
chat_completion = get_chat_completion(messages)
gpt_response = chat_completion.choices[0].message.content
print(gpt_response)
total_usage_costs = calculate_tokens_cost(MODEL_35_TURBO, chat_completion)
print(f"Total usage costs: ${total_usage_costs:.8f}")Executing the code above will provide the AI's response and the total cost incurred for the tokens used:
Python is a high-level, interpreted programming language known for its simplicity and readability. It supports multiple programming paradigms, including procedural, object-oriented, and functional programming. Python is widely used for web development, data analysis, artificial intelligence, and scientific computing. It has a large standard library and a vibrant community of developers contributing to various open-source libraries and frameworks. Python code is typically written in a clear and concise manner, making it easy to learn and understand for beginners.
Total usage costs: $0.00015000

A great strategy for optimizing expenses is to adjust the length of your prompts. Longer prompts use more tokens and cost more, while shorter prompts are more cost-effective. However, don't compromise on the details. Ensure your prompts are detailed enough to get a useful response from the AI, and you'll hit the golden mean!
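One way to keep an eye on prompt length before sending a request is OpenAI's tiktoken tokenizer library (installed separately with pip install tiktoken). A minimal sketch; note that the chat format adds a few tokens of overhead per message beyond the raw text:

import tiktoken

# Count the tokens a prompt will consume before sending it to the API
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
prompt = "Briefly explain the Python programming language."
print(len(encoding.encode(prompt)))  # number of tokens in the raw prompt text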
The response_format parameter
When working with the OpenAI Chat Completions API, the response_format parameter determines the structure of the responses. By default, responses are returned as plain text. However, a structured format like JSON is more appropriate in some instances, particularly when you need to parse the response programmatically for further processing.
To request a JSON-formatted response, you must explicitly set the response_format parameter to {"type": "json_object"}. It is important to note that if you need the response in JSON format, your prompt must also specify this requirement. Without a clear directive for JSON output in the prompt, the API will return an error, prompting you to include such an instruction.
Here's an example of requesting a JSON-formatted response:
from openai import OpenAI
client = OpenAI(api_key="<ENTER_YOUR_OPENAI_API_KEY_HERE>")
def get_chat_completion_json(messages):
return client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages,
temperature=0,
response_format={"type": "json_object"},
)
prompt = "Your output MUST be in JSON. Briefly explain the Python programming language."
messages = [{"role": "user", "content": prompt}]
chat_completion_json = get_chat_completion_json(messages)
gpt_response_json = chat_completion_json.choices[0].message.content
print(gpt_response_json)

In the code above, the response_format is set to {"type": "json_object"} when calling the create method on chat.completions. The messages parameter includes the required directive for JSON output. Executing this code will prompt the API to return a response in JSON format, which can be parsed and used in applications that require structured data, as shown below.
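Since the model's reply is a JSON string, you can load it with Python's standard json module. A minimal sketch, assuming the gpt_response_json variable from the code above (the keys in the parsed dictionary depend on what the model generated):

import json

# Parse the JSON string returned by the model into a Python dictionary
parsed = json.loads(gpt_response_json)

print(type(parsed))   # <class 'dict'>
print(parsed.keys())  # keys chosen by the model in its JSON output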
The seed parameter
The seed parameter is useful when you need reproducible output for a specific prompt and set of parameters. Setting a seed makes the API attempt to return the same output each time for the same input and system parameters, such as the prompt and temperature (OpenAI describes this as best-effort determinism). This is extremely helpful for testing, debugging, or any other situation where reproducible results are necessary.
Below is a code snippet demonstrating the use of the seed parameter to generate reproducible outputs with the Chat Completions API:
from openai import OpenAI
import os
# Retrieve the API key from environment variables for security
api_key = os.getenv("OPENAI_API_KEY")
client = OpenAI(api_key=api_key)
def get_chat_completion_with_seed(messages, seed_value):
return client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages,
temperature=0.7, # A non-zero temperature for randomness
seed=seed_value,
)
prompt = "Can you tell me a joke?"
messages = [{"role": "user", "content": prompt}]
# Using a specific seed value for reproducibility
seed_value = 12345
chat_completion = get_chat_completion_with_seed(messages, seed_value)
gpt_response = chat_completion.choices[0].message.content
print(gpt_response)

Running the code snippet above with a fixed seed value and consistent system parameters should yield more predictable responses. OpenAI provides a system_fingerprint variable in the model's response. This fingerprint represents the backend configuration with which the model operates. If the system_fingerprint changes, it indicates that there may have been changes in the system that could affect the reproducibility of the results, even with a consistent seed.
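To detect such changes, you can record the fingerprint and compare it across calls. A minimal sketch, reusing the get_chat_completion_with_seed function from above:

# Record the fingerprint of the first response
fingerprint = chat_completion.system_fingerprint
print(f"System fingerprint: {fingerprint}")

# Repeat the request and compare fingerprints; a mismatch suggests a
# backend change that may affect reproducibility despite the fixed seed
second_completion = get_chat_completion_with_seed(messages, seed_value)
if second_completion.system_fingerprint != fingerprint:
    print("Backend configuration changed; outputs may differ even with the same seed.")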
Best practices for chat completions
- Secure API key management. Embedding your API key directly into your code is a significant security risk. It also complicates the process of updating or rotating the key, particularly in a team environment. Instead, use environment variables to store such sensitive data and retrieve the key securely, as shown by the os.getenv("OPENAI_API_KEY") call in the previous code snippet.
- Use the latest models. For the best results, we recommend using the most recent models provided by OpenAI. These models are regularly updated to deliver better performance and more accurate results. Refer to the official documentation on model upgrades to find the most current models and their capabilities.
- Token generation latencies. Generating many output tokens can increase the response time. If you request the AI to produce a detailed text, it will require more time to generate a complete response. For detailed guidance on managing latencies and optimizing performance, consult OpenAI's production best practices.
- Rate limiting and costs. OpenAI implements rate limits to ensure equitable access to its services and prevent overuse that could degrade performance for all users. The cost of using the OpenAI API is based on the number of tokens processed. Costs can accumulate quickly with extensive use, making it important to budget and manage your API usage. A simple retry-with-backoff pattern, sketched after this list, can help your application handle rate-limit errors gracefully.
- User privacy and data handling. Respect user privacy and comply with data protection regulations. Avoid sending sensitive or personal information in API requests and ensure that your application's data handling practices are transparent and secure.
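As mentioned in the rate limiting item above, retrying with exponential backoff is a common way to handle rate-limit errors. Here is a minimal sketch, reusing the get_chat_completion function from earlier; openai.RateLimitError is the exception recent versions of the library raise when a limit is hit:

import time
import openai

def get_chat_completion_with_retry(messages, max_retries=3):
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return get_chat_completion(messages)
        except openai.RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            time.sleep(delay)  # wait before retrying
            delay *= 2         # double the delay on each retry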
Conclusion
The OpenAI API, with its chat completions feature, is a tool that can significantly enhance AI-driven applications. By understanding how to install and use the API, along with its parameters and best practices, you can create engaging and intelligent solutions. Fancy some practice on the theory with our exercises? Let's go!