What Is the GPT-4 API?
The GPT-4 API, offered by OpenAI, allows developers to integrate the advanced capabilities of the GPT-4 family of large language models (LLMs) into their applications. It provides a programmatic way to leverage OpenAI’s language models, including their ability to understand and generate human-like text, carry out natural conversation, translate between languages, analyze data, and generate code.
The GPT-4 API provides access to different OpenAI model versions, including GPT-4o, GPT-4 Turbo, and the previous-generation model GPT-3.5, enabling tailored performance for various use cases, from simple queries to processing extensive documents. At the time of this writing, all GPT-4 models provide a context window of 128K tokens.
The API supports not just text but also multimodal inputs, including files, images, and audio, which enhances its versatility. This capability is crucial for applications that require understanding and generating content across different types of media.
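As an illustration of the multimodal message format, the sketch below builds a Chat Completions payload that pairs a text question with an image URL. The helper function and URL are hypothetical; only the message structure follows the documented API format.

```python
# Sketch of a multimodal Chat Completions payload combining text and an image.
# Building the payload requires no network call.
def build_image_question(image_url, question):
    """Return a messages list pairing a text question with an image_url part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]

messages = build_image_question(
    "https://example.com/chart.png",  # hypothetical image URL
    "What trend does this chart show?",
)
# The list would then be passed as the `messages` argument to
# client.chat.completions.create(model="gpt-4o", messages=messages).
```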
This is part of a series of articles about large language models.
What ChatGPT Models Are Available via the API?
As of the time of this writing, the OpenAI API provides access to the following LLMs, which also power the popular ChatGPT platform:
- GPT-4o: This is the latest model. It supports a context length of up to 128k tokens, suitable for processing long documents such as novels. GPT-4o can handle text, image, and audio inputs, making it versatile for complex tasks requiring multimodal data.
- GPT-4o mini: A lighter version of GPT-4o, this model retains the 128k token context length and accepts text and image inputs, but is optimized for faster, cheaper responses with a smaller computational footprint.
- GPT-4 Turbo: An optimized version of GPT-4, also supporting a 128k token context length. It handles text and image inputs, making it suitable for a wide range of applications that require high intelligence and vision capabilities.
- GPT-3.5: Designed for simpler, routine tasks, this model supports a shorter context length of 16k tokens. It processes text inputs and outputs efficiently and is ideal for tasks that do not require the capabilities of GPT-4 models.
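Context windows are measured in tokens rather than characters, so it can help to estimate whether a document fits before sending it. The sketch below uses a rough four-characters-per-token heuristic for English text; the helper names are illustrative, and for exact counts you would use a tokenizer such as tiktoken.

```python
def rough_token_count(text):
    """Very rough heuristic: about 4 characters per token for English text.
    For exact counts, use a real tokenizer such as OpenAI's tiktoken library."""
    return max(1, len(text) // 4)

def fits_context(text, context_window=128_000):
    """Check whether a document plausibly fits a model's context window."""
    return rough_token_count(text) <= context_window
```

For example, a one-million-character document estimates to roughly 250,000 tokens, which would not fit in a 128k window.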
OpenAI API Endpoints
Here are the main API endpoints available for GPT-4 and related OpenAI models:
| Endpoint Name | Endpoint URL | Functionality | Example Request |
| --- | --- | --- | --- |
| Completions (legacy) | /v1/completions | Generates text from a single prompt. Useful for writing assistance, content creation, and general-purpose text generation. | POST /v1/completions with body { "model": "gpt-3.5-turbo-instruct", "prompt": "Write a short story about a courageous cat.", "max_tokens": 100 } |
| Chat | /v1/chat/completions | Generates conversational responses, enabling applications to engage in dialogue with users. This is the endpoint that serves the GPT-4 model family. | POST /v1/chat/completions with body { "model": "gpt-4o", "messages": [{"role": "user", "content": "How do you bake a cake?"}], "max_tokens": 50 } |
| Edits (deprecated) | /v1/edits | Edits or refines provided text. Useful for improving grammar, tone, or content clarity. | POST /v1/edits with body { "model": "text-davinci-edit-001", "input": "She walk to the store.", "instruction": "Correct the sentence.", "temperature": 0.5 } |
| Images | /v1/images/generations | Generates images from textual descriptions, useful for visual content creation and artistic applications. | POST /v1/images/generations with body { "model": "dall-e-2", "prompt": "A fantasy castle on a hill at sunset", "n": 1, "size": "1024x1024" } |
| Embeddings | /v1/embeddings | Generates vector representations of text for tasks like semantic search and clustering. | POST /v1/embeddings with body { "model": "text-embedding-ada-002", "input": "Understanding machine learning" } |
| Files | /v1/files | Manages files for use with fine-tuning and other API services. Supports uploading, listing, and deleting files. | POST /v1/files with multipart form data, e.g., a .jsonl file for fine-tuning |
| Moderation | /v1/moderations | Analyzes text for inappropriate content, aiding in content moderation. | POST /v1/moderations with body { "input": "Text content to be checked", "model": "text-moderation-latest" } |
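As an example of what the embeddings endpoint enables, a common downstream step for semantic search is computing cosine similarity between returned vectors. The sketch below uses stand-in vectors in place of real /v1/embeddings output.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stand-in vectors; real ones would come from /v1/embeddings responses.
query = [0.1, 0.2, 0.3]
doc_a = [0.1, 0.2, 0.3]
doc_b = [0.3, -0.2, 0.1]

# doc_a is a closer semantic match to the query than doc_b.
assert cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b)
```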
Accessing GPT-4 APIs
Here are the general steps involved in accessing the various GPT models through the OpenAI API:
- Sign up and payment: Create an OpenAI API account and complete a payment of $5 or more to unlock usage tier 1. This payment grants access to the GPT-4, GPT-4 Turbo, GPT-4o, and GPT-4o Mini models. When you start using the APIs, your usage tier might be increased, which influences rate limits (see below).
- Choosing the API: Depending on your needs, you can choose from multiple API endpoints, each providing different LLM functionality (as described above).
- Using the playground: For testing and experimentation, you can start with the OpenAI Playground. It allows you to interact with the models in a user-friendly interface, making it easier to understand their capabilities and adjust parameters before integrating them into your application.
- Integration: Once familiar with the models, integrate the chosen API into your application. The OpenAI API Reference provides detailed instructions and examples to help you get started with API calls, handling responses, and optimizing performance for your specific use case.
GPT API Rate Limits
Rate limits are restrictions imposed on the number of requests or tokens a user can process within a specified period. These limits ensure fair usage and prevent abuse of the API services.
OpenAI measures rate limits in several ways:
- RPM (requests per minute): Limits the number of API requests that can be made in a minute.
- RPD (requests per day): Limits the total number of API requests that can be made in a day.
- TPM (tokens per minute): Limits the number of tokens processed per minute.
- TPD (tokens per day): Limits the total number of tokens processed per day.
- IPM (images per minute): Limits the number of images processed per minute.
For example, if your RPM limit is 20, you can send at most 20 requests per minute, even if those requests consume relatively few tokens. Conversely, you might hit a token limit before the request limit, for example by processing 150,000 tokens across only 15 requests.
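When a request exceeds a rate limit, the API responds with an HTTP 429 error, and the usual remedy is to retry with exponential backoff. A minimal sketch of such a retry helper follows; with_backoff is a hypothetical utility, not part of the OpenAI library, and in real code you would pass retry_on=(openai.RateLimitError,).

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0, retry_on=(Exception,)):
    """Retry a zero-argument callable with exponential backoff plus jitter.

    With the OpenAI client you would narrow retry_on to rate-limit errors so
    that only HTTP 429 responses trigger a retry.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except retry_on:
            if attempt == max_retries - 1:
                raise
            # Wait base_delay * 2**attempt seconds, plus jitter, then retry.
            time.sleep(base_delay * 2 ** attempt + random.random() * base_delay)
```

A call site would look like `with_backoff(lambda: client.chat.completions.create(...))`.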
Rate and usage limits can be viewed in account settings under the Limits section. Usage tiers automatically adjust based on API usage and spending, generally increasing rate limits across models as you move up tiers:
| Tier | Qualification | Usage Limits |
| --- | --- | --- |
| Free | Allowed geography | $100 / month |
| Tier 1 | $5 paid | $100 / month |
| Tier 2 | $50 paid + 7 days | $500 / month |
| Tier 3 | $100 paid + 7 days | $1,000 / month |
| Tier 4 | $250 paid + 14 days | $5,000 / month |
| Tier 5 | $1,000 paid + 30 days | $50,000 / month |
As of the time of this writing, these are the API rate limits for the primary OpenAI models:
| | gpt-3.5-turbo | dall-e-3 | gpt-4o | gpt-4o-mini | gpt-4-turbo |
| --- | --- | --- | --- | --- | --- |
| Free Tier RPM | 3 | 1 img/min | – | – | – |
| Free Tier RPD | 200 | – | – | – | – |
| Free Tier TPM | 40,000 | – | – | – | – |
| Free Tier Batch Queue Limit | 200,000 | – | – | – | – |
| Tier 1 RPM | 3,500 | 5 img/min | 500 | 500 | 500 |
| Tier 1 RPD | 10,000 | – | – | 10,000 | – |
| Tier 1 TPM | 200,000 | – | 30,000 | 200,000 | 30,000 |
| Tier 1 Batch Queue Limit | 2,000,000 | – | 90,000 | 2,000,000 | 90,000 |
| Tier 2 RPM | 3,500 | 7 img/min | 5,000 | 5,000 | 5,000 |
| Tier 2 TPM | 2,000,000 | – | 450,000 | 2,000,000 | 450,000 |
| Tier 2 Batch Queue Limit | 5,000,000 | – | 1,350,000 | 20,000,000 | 1,350,000 |
| Tier 3 RPM | 3,500 | 7 img/min | 5,000 | 5,000 | 5,000 |
| Tier 3 TPM | 4,000,000 | – | 800,000 | 4,000,000 | 600,000 |
| Tier 3 Batch Queue Limit | 100,000,000 | – | 50,000,000 | 40,000,000 | 40,000,000 |
| Tier 4 RPM | 10,000 | 15 img/min | 10,000 | 10,000 | 10,000 |
| Tier 4 TPM | 10,000,000 | – | 2,000,000 | 10,000,000 | 800,000 |
| Tier 4 Batch Queue Limit | 1,000,000,000 | – | 200,000,000 | 1,000,000,000 | 80,000,000 |
| Tier 5 RPM | 10,000 | 200 img/min | 10,000 | 30,000 | 10,000 |
| Tier 5 TPM | 50,000,000 | – | 30,000,000 | 150,000,000 | 2,000,000 |
| Tier 5 Batch Queue Limit | 10,000,000,000 | – | 5,000,000,000 | 15,000,000,000 | 300,000,000 |
GPT-4 API Pricing
GPT-4o Pricing
Standard pricing
- Input tokens: $5.00 per 1M tokens
- Output tokens: $15.00 per 1M tokens
Batch API pricing
- Input tokens: $2.50 per 1M tokens
- Output tokens: $7.50 per 1M tokens
GPT-4o Mini Pricing
GPT-4o Mini offers a more cost-efficient alternative with similar capabilities in a lighter package.
Standard pricing
- Input tokens: $0.150 per 1M tokens
- Output tokens: $0.600 per 1M tokens
Batch API pricing
- Input tokens: $0.075 per 1M tokens
- Output tokens: $0.300 per 1M tokens
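The per-token prices above translate directly into per-request costs. Below is a small sketch of a cost estimator using the standard GPT-4o and GPT-4o Mini rates listed in this section; the helper function and price table are illustrative.

```python
# Standard per-1M-token prices from the section above (USD).
PRICES_PER_1M = {
    "gpt-4o":      {"input": 5.00,  "output": 15.00},
    "gpt-4o-mini": {"input": 0.150, "output": 0.600},
}

def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of a single request at standard pricing."""
    p = PRICES_PER_1M[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# 10,000 input tokens + 2,000 output tokens on GPT-4o:
# 10,000 * $5/1M + 2,000 * $15/1M = $0.05 + $0.03 = $0.08
```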
Pricing for Embedding Models
The embedding models are priced to support search, clustering, and classification functions.
text-embedding-3-small
- Standard pricing: $0.020 per 1M tokens
- Batch API pricing: $0.010 per 1M tokens
text-embedding-3-large
- Standard pricing: $0.130 per 1M tokens
- Batch API pricing: $0.065 per 1M tokens
ada v2
- Standard pricing: $0.100 per 1M tokens
- Batch API pricing: $0.050 per 1M tokens
Pricing for Fine-tuning Models
Fine-tuning allows customization of models to meet specific application needs, with pricing as follows. At the time of this writing, fine-tuning for GPT-4o and GPT-4 Turbo is available as early access; you can request access in the fine-tuning UI.
gpt-4o-mini-2024-07-18
Standard pricing
- Input tokens: $0.30 per 1M tokens
- Output tokens: $1.20 per 1M tokens
- Training tokens: $3.00 per 1M tokens
Batch API pricing
- Input tokens: $0.15 per 1M tokens
- Output tokens: $0.60 per 1M tokens
gpt-3.5-turbo
Standard pricing
- Input tokens: $3.00 per 1M tokens
- Output tokens: $6.00 per 1M tokens
- Training tokens: $8.00 per 1M tokens
Batch API pricing
- Input tokens: $1.50 per 1M tokens
- Output tokens: $3.00 per 1M tokens
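Fine-tuning jobs consume a .jsonl training file in which each line holds one chat-format example. A quick local sanity check before uploading might look like the sketch below; validate_chat_jsonl is a hypothetical helper, not part of the OpenAI library.

```python
import json

def validate_chat_jsonl(lines):
    """Check that each line is a JSON object with a 'messages' list of
    role/content dicts, the chat format expected for fine-tuning data."""
    for i, line in enumerate(lines):
        record = json.loads(line)
        messages = record.get("messages")
        if not isinstance(messages, list) or not messages:
            raise ValueError(f"line {i}: missing 'messages' list")
        for m in messages:
            if "role" not in m or "content" not in m:
                raise ValueError(f"line {i}: message missing role/content")
    return True

sample = [
    '{"messages": [{"role": "user", "content": "Hi"}, '
    '{"role": "assistant", "content": "Hello!"}]}'
]
assert validate_chat_jsonl(sample)
```

A validated file would then be uploaded via the files endpoint before creating the fine-tuning job.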
Tutorial: Getting Started with OpenAI API
Account Setup
To get started with the OpenAI API, first create an OpenAI account or sign in. Navigate to the API key page and select Create new secret key, optionally naming the key. Make sure to save the key somewhere safe and do not share it with anyone.
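The OpenAI Python client reads the key from the OPENAI_API_KEY environment variable by default, so a safe pattern is to export it once and fail loudly if it is missing. The sketch below shows such a check; get_api_key is a hypothetical convenience helper, not part of the library.

```python
import os

# The official OpenAI client reads OPENAI_API_KEY from the environment by
# default, so exporting the key once avoids hard-coding secrets in source files.
def get_api_key():
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set the OPENAI_API_KEY environment variable first.")
    return key
```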
Setting Up Python
OpenAI provides a custom Python library which makes working with the OpenAI API in Python simple and efficient.
To use the OpenAI Python library, ensure you have Python 3.7.1 or newer installed. If needed, download the latest version from the official Python website.
Installing the OpenAI Python Library
Once you have Python 3.7.1 or newer installed and (optionally) set up a virtual environment, the OpenAI Python library can be installed. From the terminal / command line, run:
pip install --upgrade openai
Running pip list will show the installed Python libraries, confirming that the OpenAI Python library was installed successfully.
Sending Your First API Request
After configuring Python and setting up your API key, the final step is to send a request to the OpenAI API using the Python library. Create a file named openai-test.py using the terminal or an IDE.
Inside the file, copy and paste the example below:
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a positive assistant, skilled in explaining technical concepts while making users feel better."},
        {"role": "user", "content": "Explain the concept of horizontal scalability."}
    ]
)

print(completion.choices[0].message)
To run the code, enter python openai-test.py into the terminal / command line.
The response should look something like this:
{
"choices": [
{
"message": {
"role": "assistant",
"content": "Horizontal scalability refers to the ability of a system to increase capacity by connecting multiple hardware or software entities so that they work as a single logical unit. Imagine you could scale your software as much as you need without any limitation, what a great feeling!"
}
}
]
}
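With the Python client, the reply text is available as completion.choices[0].message.content; against the raw JSON structure shown above, the equivalent dictionary lookup is:

```python
# Same shape as the example response above, abbreviated for clarity.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "Horizontal scalability refers to the ability of a system ...",
            }
        }
    ]
}

# Extract the assistant's reply text from the response structure.
reply = response["choices"][0]["message"]["content"]
```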
AI Testing & Validation with Kolena
Kolena is an AI/ML testing & validation platform that solves one of AI’s biggest problems: the lack of trust in model effectiveness. The use cases for AI are enormous, but AI lacks trust from both builders and the public. It is our responsibility to build that trust with full transparency and explainability of ML model performance, not just from a high-level aggregate ‘accuracy’ number, but from rigorous testing and evaluation at scenario levels.
With Kolena, machine learning engineers and data scientists can uncover hidden model behaviors, identify gaps in test data coverage, and learn where and why a model is underperforming, all in minutes rather than weeks. Kolena's AI/ML model testing and validation solution helps developers build safe, reliable, and fair systems by letting companies instantly assemble razor-sharp test cases from their datasets, so they can scrutinize AI/ML models in the precise scenarios those models will face in the real world. The Kolena platform transforms AI development from an experimental practice into an engineering discipline that can be trusted and automated.