By: Kolena Editorial Team
Complete Guide to GPT-4 API [2024]
Oct 10, 2024

What Is the GPT-4 API? 

The GPT-4 API, offered by OpenAI, allows developers to integrate the advanced capabilities of the GPT-4 family of large language models (LLMs) into their applications. It provides a programmatic way to leverage OpenAI’s language models, including their ability to understand and generate human-like text, carry out natural conversation, translate between languages, analyze data, and generate code. 

The GPT-4 API provides access to several OpenAI model versions, including GPT-4o, GPT-4 Turbo, and the previous-generation GPT-3.5, so you can match the model to the use case, from simple queries to processing extensive documents. At the time of this writing, GPT-4o, GPT-4o mini, and GPT-4 Turbo each provide a context window of 128K tokens.

The API supports not just text but also multimodal inputs, including files, images, and audio, which enhances its versatility. This capability is crucial for applications that require understanding and generating content across different types of media.
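As a rough illustration of multimodal input, the sketch below builds (but does not send) a Chat Completions payload that pairs a text question with an image URL. The helper name build_image_message and the example URL are placeholders invented for this example; the content-part layout follows the format the Chat Completions API uses for image input.

```python
import json


def build_image_message(question: str, image_url: str) -> dict:
    """Return a user message that pairs a text question with an image URL."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


# Placeholder URL, for illustration only
message = build_image_message("What is in this picture?", "https://example.com/cat.png")
payload = {"model": "gpt-4o", "messages": [message]}

# Show the JSON body that would be sent to the chat endpoint
print(json.dumps(payload, indent=2))
```

Text-only requests can keep using a plain string for content; the list form is only needed when mixing media types.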

This is part of a series of articles about Large Language Models

What ChatGPT Models Are Available via the API?

As of the time of this writing, the OpenAI API provides access to the following LLMs, which also power the popular ChatGPT platform:

  • GPT-4o: This is the latest model. It supports a context length of up to 128k tokens, suitable for processing long documents such as novels. GPT-4o can handle text, image, and audio inputs, making it versatile for complex tasks requiring multimodal data.
  • GPT-4o mini: A lighter version of GPT-4o, this model maintains the same 128k token context length and multimodal capabilities but is optimized for faster performance with a smaller computational footprint.
  • GPT-4 Turbo: The predecessor to GPT-4o, also supporting a 128k token context length. It handles text and image inputs, making it suitable for a wide range of applications that require high intelligence.
  • GPT-3.5: Designed for simpler, routine tasks, this model supports a shorter context length of 16k tokens. It processes text inputs and outputs efficiently and is ideal for tasks that do not require the capabilities of GPT-4 models.
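One practical use of these context lengths is checking whether a document is likely to fit before choosing a model. The sketch below is a minimal illustration, not an exact method: the 4-characters-per-token figure is only a rough rule of thumb for English text, the function names are made up for this example, and the context sizes are the rounded values from the list above.

```python
# Context lengths as listed above, in tokens (rounded values)
CONTEXT_TOKENS = {
    "gpt-4o": 128_000,
    "gpt-4o-mini": 128_000,
    "gpt-4-turbo": 128_000,
    "gpt-3.5-turbo": 16_000,
}


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def fits_context(model: str, text: str, reserved_for_output: int = 1_000) -> bool:
    """Check whether the text plus room for the reply fits the model's context."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_TOKENS[model]


document = "word " * 20_000  # 100,000 characters, roughly 25,000 tokens

print(fits_context("gpt-3.5-turbo", document))  # False
print(fits_context("gpt-4o", document))         # True
```

For exact counts you would use a real tokenizer rather than a character-based estimate.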

OpenAI API Endpoints 

Here are the main OpenAI API endpoints relevant to GPT-4:

Endpoint Name | Endpoint URL | Functionality | Example Request
Completions (legacy) | /v1/completions | Generates text based on a given prompt. Useful for writing assistance, content creation, and general-purpose text generation. This endpoint does not accept chat models such as gpt-4o. | POST /v1/completions with body { "model": "gpt-3.5-turbo-instruct", "prompt": "Write a short story about a courageous cat.", "max_tokens": 100 }
Chat | /v1/chat/completions | Generates conversational responses, enabling applications to engage in dialogue with users. This is the endpoint to use with GPT-4 models. | POST /v1/chat/completions with body { "model": "gpt-4o", "messages": [{"role": "user", "content": "How do you bake a cake?"}], "max_tokens": 50 }
Edits (deprecated) | /v1/edits | Edits or refines provided text. Useful for improving grammar, tone, or content clarity. OpenAI has deprecated this endpoint in favor of Chat Completions. | POST /v1/edits with body { "model": "text-davinci-edit-001", "input": "She walk to the store.", "instruction": "Correct the sentence.", "temperature": 0.5 }
Images | /v1/images/generations | Generates images from textual descriptions, useful for visual content creation and artistic applications. | POST /v1/images/generations with body { "model": "dall-e-2", "prompt": "A fantasy castle on a hill at sunset", "n": 1, "size": "1024x1024" }
Embeddings | /v1/embeddings | Generates vector representations of text for tasks like semantic search and clustering. | POST /v1/embeddings with body { "model": "text-embedding-ada-002", "input": "Understanding machine learning" }
Files | /v1/files | Manages files for use with fine-tuning and other API services. Supports uploading, listing, and deleting files. | POST /v1/files with a multipart form-data body containing a file field (for example, a .jsonl file) and a purpose field such as "fine-tune"
Moderation | /v1/moderations | Analyzes text for inappropriate content, aiding in content moderation. | POST /v1/moderations with body { "input": "Text content to be checked", "model": "text-moderation-latest" }
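To make the request shape concrete, here is a minimal sketch that assembles the chat request using only the Python standard library. It builds the request without sending it. The helper name build_chat_request and the placeholder key are invented for this example; the base URL and the Bearer authorization header are the standard way to call the OpenAI REST API.

```python
import json
import urllib.request

API_BASE = "https://api.openai.com"


def build_chat_request(api_key: str, user_text: str) -> urllib.request.Request:
    """Assemble a POST request for the chat completions endpoint."""
    body = {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": 50,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# "sk-placeholder" is not a real key; substitute your own secret key
request = build_chat_request("sk-placeholder", "How do you bake a cake?")
print(request.get_method(), request.full_url)

# Sending it would look like: urllib.request.urlopen(request)
```

In practice the official client libraries wrap this for you, but the raw form shows that each endpoint is an ordinary HTTPS call with a JSON body.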

Accessing GPT-4 APIs

Here are the general steps involved in accessing the various GPT models through the OpenAI API:

  1. Sign up and payment: Create an OpenAI API account and complete a payment of $5 or more to unlock usage tier 1. This payment grants access to the GPT-4, GPT-4 Turbo, GPT-4o, and GPT-4o Mini models. When you start using the APIs, your usage tier might be increased, which influences rate limits (see below).
  2. Choosing the API: Depending on your needs, you can choose from multiple API endpoints, each providing different LLM functionality (as described above).
  3. Using the playground: For testing and experimentation, you can start with the OpenAI Playground. It allows you to interact with the models in a user-friendly interface, making it easier to understand their capabilities and adjust parameters before integrating them into your application.
  4. Integration: Once familiar with the models, integrate the chosen API into your application. The OpenAI API Reference provides detailed instructions and examples to help you get started with API calls, handling responses, and optimizing performance for your specific use case.

GPT API Rate Limits

Rate limits are restrictions imposed on the number of requests or tokens a user can process within a specified period. These limits ensure fair usage and prevent abuse of the API services. 

OpenAI measures rate limits in several ways:

  1. RPM (requests per minute): Limits the number of API requests that can be made in a minute.
  2. RPD (requests per day): Limits the total number of API requests that can be made in a day.
  3. TPM (tokens per minute): Limits the number of tokens processed per minute.
  4. TPD (tokens per day): Limits the total number of tokens processed per day.
  5. IPM (images per minute): Limits the number of images processed per minute.

Whichever limit you reach first applies. For example, with an RPM limit of 20 and a TPM limit of 150,000, sending 20 short requests in a minute exhausts the request limit even though few tokens were used. Conversely, 15 large requests totaling 150,000 tokens exhaust the token limit before the request limit is reached.
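The interaction between request and token limits can be worked through with a little arithmetic. The sketch below uses a hypothetical helper, max_requests_per_minute, and assumes every request consumes the same number of tokens, which real traffic rarely does.

```python
def max_requests_per_minute(rpm_limit: int, tpm_limit: int, tokens_per_request: int) -> int:
    """Return how many equal-sized requests fit in one minute under both limits."""
    allowed_by_tokens = tpm_limit // tokens_per_request
    return min(rpm_limit, allowed_by_tokens)


# Limits from the example: 20 requests per minute, 150,000 tokens per minute
print(max_requests_per_minute(20, 150_000, 100))     # 20 (the request limit binds)
print(max_requests_per_minute(20, 150_000, 10_000))  # 15 (the token limit binds)
```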

Rate and usage limits can be viewed in account settings under the Limits section. Usage tiers automatically adjust based on API usage and spending, generally increasing rate limits across models as you move up tiers:

Tier | Qualification | Usage Limits
Free | Allowed geography | $100 / month
Tier 1 | $5 paid | $100 / month
Tier 2 | $50 paid + 7 days | $500 / month
Tier 3 | $100 paid + 7 days | $1,000 / month
Tier 4 | $250 paid + 14 days | $5,000 / month
Tier 5 | $1,000 paid + 30 days | $50,000 / month

As of the time of this writing, these are the API rate limits for the primary OpenAI models:

Limit | gpt-3.5-turbo | dall-e-3 | gpt-4o | gpt-4o-mini | gpt-4-turbo
Free tier RPM | 3 | 1 img/min | - | - | -
Free tier RPD | 200 | - | - | - | -
Free tier TPM | 40,000 | - | - | - | -
Free tier batch queue limit | 200,000 | - | - | - | -
Tier 1 RPM | 3,500 | 5 img/min | 500 | 500 | 500
Tier 1 RPD | 10,000 | - | - | 10,000 | -
Tier 1 TPM | 200,000 | - | 30,000 | 200,000 | 30,000
Tier 1 batch queue limit | 2,000,000 | - | 90,000 | 2,000,000 | 90,000
Tier 2 RPM | 3,500 | 7 img/min | 5,000 | 5,000 | 5,000
Tier 2 TPM | 2,000,000 | - | 450,000 | 2,000,000 | 450,000
Tier 2 batch queue limit | 5,000,000 | - | 1,350,000 | 20,000,000 | 1,350,000
Tier 3 RPM | 3,500 | 7 img/min | 5,000 | 5,000 | 5,000
Tier 3 TPM | 4,000,000 | - | 800,000 | 4,000,000 | 600,000
Tier 3 batch queue limit | 100,000,000 | - | 50,000,000 | 40,000,000 | 40,000,000
Tier 4 RPM | 10,000 | 15 img/min | 10,000 | 10,000 | 10,000
Tier 4 TPM | 10,000,000 | - | 2,000,000 | 10,000,000 | 800,000
Tier 4 batch queue limit | 1,000,000,000 | - | 200,000,000 | 1,000,000,000 | 80,000,000
Tier 5 RPM | 10,000 | 200 img/min | 10,000 | 30,000 | 10,000
Tier 5 TPM | 50,000,000 | - | 30,000,000 | 150,000,000 | 2,000,000
Tier 5 batch queue limit | 10,000,000,000 | - | 5,000,000,000 | 15,000,000,000 | 300,000,000

A dash marks a limit that is not listed for that model.
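When a request exceeds one of these limits, the API rejects it with an HTTP 429 error, and a common way to cope is to retry with exponential backoff. The following is a minimal sketch under stated assumptions: RateLimited is a stand-in exception defined here for illustration (with the OpenAI Python library you would catch openai.RateLimitError instead), and call_with_backoff and flaky_call are hypothetical names.

```python
import random
import time


class RateLimited(Exception):
    """Stand-in for a rate-limit error (HTTP 429); hypothetical, for this sketch."""


def call_with_backoff(func, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func, retrying on RateLimited with exponentially growing waits."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimited:
            if attempt == max_retries:
                raise  # Give up after the final retry
            # Wait base_delay * 2^attempt seconds, plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))


# Demo: a fake call that is rate limited twice and then succeeds
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"


result = call_with_backoff(flaky_call, base_delay=0.01)
print(result, attempts["count"])  # ok 3
```

The jitter spreads retries out so that many clients hitting the limit at once do not all retry at the same moment.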

GPT-4 API Pricing 

GPT-4o Pricing

Standard pricing

  • Input tokens: $5.00 per 1M tokens
  • Output tokens: $15.00 per 1M tokens

Batch API pricing

  • Input tokens: $2.50 per 1M tokens
  • Output tokens: $7.50 per 1M tokens

GPT-4o Mini Pricing

GPT-4o Mini offers a more cost-efficient alternative with similar capabilities in a lighter package. 

Standard pricing

  • Input tokens: $0.150 per 1M tokens
  • Output tokens: $0.600 per 1M tokens

Batch API pricing

  • Input tokens: $0.075 per 1M tokens
  • Output tokens: $0.300 per 1M tokens
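To see how per-token pricing turns into a bill, the sketch below estimates the cost of a workload from the prices listed above. The estimate_cost function and the PRICES table are illustrative names for this example, and the figures should be checked against current pricing before relying on them.

```python
# USD per 1M tokens as (input, output), copied from the prices listed above
PRICES = {
    "gpt-4o": {"standard": (5.00, 15.00), "batch": (2.50, 7.50)},
    "gpt-4o-mini": {"standard": (0.150, 0.600), "batch": (0.075, 0.300)},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Return the estimated cost in USD for the given token counts."""
    input_price, output_price = PRICES[model][mode]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# 200,000 input tokens and 50,000 output tokens on GPT-4o
print(f"${estimate_cost('gpt-4o', 200_000, 50_000):.3f}")           # $1.750
print(f"${estimate_cost('gpt-4o', 200_000, 50_000, 'batch'):.3f}")  # $0.875
```

The same workload on the Batch API costs half as much, which is why batching is worth considering for jobs that do not need an immediate response.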

Pricing for Embedding Models

The embedding models are priced to support search, clustering, and classification functions.

text-embedding-3-small

  • Standard pricing: $0.020 per 1M tokens
  • Batch API pricing: $0.010 per 1M tokens

text-embedding-3-large

  • Standard pricing: $0.130 per 1M tokens
  • Batch API pricing: $0.065 per 1M tokens

ada v2

  • Standard pricing: $0.100 per 1M tokens
  • Batch API pricing: $0.050 per 1M tokens
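For context on what these embedding vectors are used for, here is a small sketch of semantic search by cosine similarity. The three-dimensional vectors are made up for illustration; real embeddings returned by the API have far more dimensions, and the document names are placeholders.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real embeddings
documents = {
    "intro to machine learning": [0.9, 0.1, 0.0],
    "baking a cake": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

# Rank documents from most to least similar to the query
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query, documents[name]),
    reverse=True,
)
print(ranked[0])  # intro to machine learning
```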

Pricing for Fine-tuning Models

Fine-tuning lets you customize a model to meet specific application needs, with pricing as follows. At the time of this writing, fine-tuning for GPT-4o and GPT-4 Turbo is available as early access; you can request access in the fine-tuning UI.

gpt-4o-mini-2024-07-18

Standard pricing

  • Input tokens: $0.30 per 1M tokens
  • Output tokens: $1.20 per 1M tokens
  • Training tokens: $3.00 per 1M tokens

Batch API pricing

  • Input tokens: $0.15 per 1M tokens
  • Output tokens: $0.60 per 1M tokens

gpt-3.5-turbo

Standard pricing

  • Input tokens: $3.00 per 1M tokens
  • Output tokens: $6.00 per 1M tokens
  • Training tokens: $8.00 per 1M tokens

Batch API pricing

  • Input tokens: $1.50 per 1M tokens
  • Output tokens: $3.00 per 1M tokens
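Fine-tuning jobs are trained on a .jsonl file in which each line is one example conversation. The sketch below shows one way to produce such a file's contents; the two example conversations are invented, and to_jsonl is a hypothetical helper, while the one-JSON-object-per-line layout with a messages list follows the chat fine-tuning format.

```python
import json

# Invented training examples; each record is one complete conversation
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a concise support assistant."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Open Settings, choose Security, then select Reset password."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are a concise support assistant."},
            {"role": "user", "content": "Can I change my email address?"},
            {"role": "assistant", "content": "Yes. Open Settings, choose Profile, then edit the email field."},
        ]
    },
]


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records) + "\n"


jsonl_text = to_jsonl(examples)
print(jsonl_text)
```

Writing jsonl_text to a file gives you something you could upload through the Files endpoint for fine-tuning.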

Tutorial: Getting Started with OpenAI API 

Account Setup

To get started with the OpenAI API, first create an OpenAI account or sign in. Navigate to the API key page and select Create new secret key, optionally naming the key. Make sure to save this somewhere safe and do not share it with anyone.
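A common way to keep the key out of source code is to store it in an environment variable and read it at runtime. The sketch below assumes the variable is named OPENAI_API_KEY, which is the name the OpenAI Python library looks for by default; load_api_key and mask are hypothetical helpers written for this example.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before calling the API.")
    return key


def mask(key: str) -> str:
    """Hide most of the key so it can be shown in logs safely."""
    return key[:3] + "..." + key[-4:] if len(key) > 8 else "***"


try:
    print("Key loaded:", mask(load_api_key()))
except RuntimeError as err:
    print(err)
```

Printing only a masked form avoids leaking the full secret into terminal history or log files.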

Setting Up Python

OpenAI provides a custom Python library which makes working with the OpenAI API in Python simple and efficient.

To use the OpenAI Python library, ensure you have Python installed. To download Python, visit the official Python website and download the latest version. You need Python 3.7.1 or newer.

Installing the OpenAI Python Library

Once you have Python 3.7.1 or newer installed and (optionally) set up a virtual environment, the OpenAI Python library can be installed. From the terminal / command line, run:

pip install --upgrade openai

Running pip list will show you the installed Python libraries, confirming that the OpenAI Python library was successfully installed.
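If you prefer to check the setup from Python itself, a short script can confirm both the interpreter version and the installed library. This is a sketch rather than an official check: python_is_supported and installed_version are names invented here, and importlib.metadata requires Python 3.8 or newer.

```python
import sys
from importlib import metadata


def python_is_supported(minimum=(3, 7, 1)) -> bool:
    """Return True if the running interpreter meets the minimum version."""
    return tuple(sys.version_info[:3]) >= minimum


def installed_version(package: str = "openai"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Python OK:", python_is_supported())
print("openai version:", installed_version() or "not installed")
```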

Sending Your First API Request

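After configuring Python and setting up your API key, the final step is to send a request to the OpenAI API using the Python library. The client in the example below reads the key from the OPENAI_API_KEY environment variable, so set that variable in your terminal session first. Then create a file named openai-test.py using the terminal or an IDE.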

Inside the file, copy and paste the example below:

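from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

# Send a system instruction and a user question to the chat endpoint
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a positive assistant, skilled in explaining technical concepts while making users feel better."},
        {"role": "user", "content": "Explain the concept of horizontal scalability."},
    ],
)

# Print the assistant's message from the first choice
print(completion.choices[0].message)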

To run the code, enter python openai-test.py into the terminal / command line.

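The script prints the assistant's message from the first choice. In the underlying API response, that message appears in a structure like this (abridged):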

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Horizontal scalability refers to the ability of a system to increase capacity by connecting multiple hardware or software entities so that they work as a single logical unit. Imagine you could scale your software as much as you need without any limitation, what a great feeling!"
      }
    }
  ]
}
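To pull just the reply text out of a response shaped like the one above, you index into the first choice. The sketch below works on a plain JSON string so it runs without an API call, and the reply text is shortened for brevity; with the OpenAI Python library, the equivalent attribute access is completion.choices[0].message.content.

```python
import json

# Abridged response in the same shape as the example above
raw_response = """
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Horizontal scalability refers to the ability of a system to increase capacity by connecting multiple hardware or software entities."
      }
    }
  ]
}
"""

response = json.loads(raw_response)

# The reply text lives at choices[0].message.content
reply = response["choices"][0]["message"]["content"]
print(reply)
```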

AI Testing & Validation with Kolena

Kolena is an AI/ML testing & validation platform that solves one of AI’s biggest problems: the lack of trust in model effectiveness. The use cases for AI are enormous, but AI lacks trust from both builders and the public. It is our responsibility to build that trust with full transparency and explainability of ML model performance, not just from a high-level aggregate ‘accuracy’ number, but from rigorous testing and evaluation at scenario levels.

With Kolena, machine learning engineers and data scientists can uncover hidden machine learning model behaviors, easily identify gaps in test data coverage, and learn where and why a model is underperforming, in minutes rather than weeks. Kolena's AI/ML model testing and validation solution helps developers build safe, reliable, and fair systems by allowing companies to quickly assemble precise test cases from their data sets and scrutinize AI/ML models in the exact scenarios those models will face in the real world. The Kolena platform transforms AI development from an experimental practice into an engineering discipline that can be trusted and automated.

Learn more about Kolena