What Is the Gemini API?
The Gemini API is a tool provided by Google to facilitate access to its latest AI models, specifically the Gemini model family. This API allows developers to turn ideas into scalable applications by leveraging AI capabilities.
As of the time of this writing, the most advanced model available via the Gemini API is Gemini 1.5 Pro, with a context window of approximately 2 million tokens. The Gemini API also offers Gemini 1.5 Flash, which has more limited capabilities but is cheaper and faster to run, with a context window of 1 million tokens.
Gemini currently supports the largest context window of any LLM system, enabling the analysis and understanding of large datasets. Additionally, the Gemini models are natively multimodal, meaning they can integrate and process different types of data, including text, images, video, and audio.
This is part of a series of articles about Large Language Models.
Gemini API Key Capabilities
Generate Completions Using Gemini Models
The Gemini API allows you to generate text completions based on a given prompt. This feature is highly versatile, supporting tasks from simple sentence completion to generating long-form content. To use this capability, you initialize the model and provide a text prompt. The API processes the prompt and generates coherent and contextually relevant text.
Gemini models are designed to handle complex, multi-turn conversations. This allows developers to build advanced chatbot applications that can maintain context over several interactions. By leveraging the generate_content method, you can engage users in meaningful and contextually appropriate dialogue.
Solve Tasks with Fine-Tuning
The Gemini API offers fine-tuning capabilities to enhance model performance on specific tasks. Fine-tuning involves providing the model with a training dataset containing examples of the desired task. This process helps the model learn and encode the necessary information to perform the task more effectively.
By structuring your training data with prompt inputs and expected outputs, you can teach the model to mimic desired behaviors. The output is a new model combining the original parameters with the newly learned ones, resulting in improved performance for niche applications.
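The training data described above is a set of input/output pairs. The sketch below shows one way such a dataset might be structured in Python; the text_input/output field names follow the format used by the google-generativeai tuning API at the time of writing, and the examples themselves are made up for illustration:

```python
# A minimal sketch of a fine-tuning dataset, assuming the
# text_input/output record format used by the google-generativeai
# tuning API at the time of writing. Check the current docs before
# submitting a real tuning job.

# Each example pairs a prompt input with the output the tuned
# model should learn to produce -- here, "increment the value".
training_data = [
    {"text_input": "1", "output": "2"},
    {"text_input": "3", "output": "4"},
    {"text_input": "seven", "output": "eight"},
]

# Basic sanity check before submitting the data to a tuning job.
assert all({"text_input", "output"} <= set(ex) for ex in training_data)
print(f"{len(training_data)} training examples prepared")
```

A dataset like this would then be passed to the SDK's tuned-model creation call, producing a new model that combines the base parameters with the learned behavior.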
Call Code With Function Calling
The function calling feature in the Gemini API enables integration with external APIs by generating structured data outputs. These outputs specify function names and suggested arguments, allowing applications to call the necessary APIs and incorporate the results into further model prompts.
This feature is useful for interacting with real-time information and various services, enhancing the ability to provide contextual and relevant answers. Function declarations in the model prompt describe the API functions, their purposes, and parameters.
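To make the flow concrete, the sketch below builds a function declaration in the JSON-schema style the Gemini API expects for tools, and shows the shape of the structured call the model returns. The function name, parameters, and suggested call here are illustrative, not tied to any real weather service:

```python
# A hand-written function declaration in the JSON-schema style used
# for Gemini function calling. The get_current_weather function and
# its parameters are hypothetical examples.
weather_declaration = {
    "name": "get_current_weather",
    "description": "Returns the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "City name, e.g. Paris",
            },
        },
        "required": ["city"],
    },
}

# The declaration is sent alongside the prompt. Instead of free text,
# the model can then respond with a structured call like this:
suggested_call = {"name": "get_current_weather", "args": {"city": "Paris"}}

# Your application executes the real API call with those arguments and
# feeds the result back into the next model prompt.
assert suggested_call["name"] == weather_declaration["name"]
```

The model never calls the external API itself; it only emits the structured request, leaving your application in control of what actually gets executed.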
Search and Answer With Embeddings
The embedding service in the Gemini API generates high-quality embeddings for words, phrases, and sentences, which are essential for various natural language processing tasks. Embeddings convert text into numerical coordinates, capturing semantic meaning and context. This allows for effective text comparison, semantic search, text classification, and clustering.
Texts with similar meanings have closer embeddings, enabling models to understand and relate different texts accurately. The ability to generate and utilize embeddings enhances the capability to analyze and interpret large volumes of text data efficiently.
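The "closer embeddings" idea is usually measured with cosine similarity. The sketch below uses tiny made-up 4-dimensional vectors to show the comparison; real Gemini embeddings have hundreds of dimensions, but the math is identical:

```python
import math

# Embeddings map text to vectors; similar texts end up close together.
# These 4-dimensional vectors are invented for illustration -- real
# embeddings from the Gemini embedding service are much larger.
embeddings = {
    "The cat sat on the mat.":      [0.90, 0.10, 0.20, 0.00],
    "A kitten rested on the rug.":  [0.85, 0.15, 0.25, 0.05],
    "Quarterly revenue rose 8%.":   [0.05, 0.90, 0.10, 0.40],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

cat, kitten, revenue = embeddings.values()
# Semantically related sentences score higher than unrelated ones.
assert cosine_similarity(cat, kitten) > cosine_similarity(cat, revenue)
```

Semantic search works the same way at scale: embed the query, then rank documents by their cosine similarity to it.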
Gemini API Pricing
The Gemini API provides two pricing options: free of charge and pay-as-you-go. The pricing details below are correct as of the time of this writing. For up-to-date pricing and more detail, see the official pricing page.
Free of Charge
Price (input): Free
Context caching: Free, up to 1 million tokens per hour
Price (output): Free
Prompts/responses used to improve products: Yes
Rate limits:
- 15 RPM (requests per minute)
- 1 million TPM (tokens per minute)
- 1,500 RPD (requests per day)
Pay-as-You-Go
Price (input):
- $0.35 per 1 million tokens (for prompts up to 128K tokens)
- $0.70 per 1 million tokens (for prompts longer than 128K tokens)
Context caching:
- $0.0875 per 1 million tokens (for prompts up to 128K tokens)
- $0.175 per 1 million tokens (for prompts longer than 128K tokens)
- $1.00 per 1 million tokens per hour (storage)
Price (output):
- $1.05 per 1 million tokens (for prompts up to 128K tokens)
- $2.10 per 1 million tokens (for prompts longer than 128K tokens)
Prompts/responses used to improve products: No
Rate limits:
- 1000 RPM (requests per minute)
- 4 million TPM (tokens per minute)
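The tiered prices above can be turned into a quick per-request cost estimate. The helper below hardcodes the pay-as-you-go numbers listed in this section; verify them against the official pricing page before relying on the result:

```python
# Cost estimate based on the pay-as-you-go prices listed above,
# in dollars per 1 million tokens. Prices change; check the official
# pricing page for current figures.
PRICES = {
    "short": (0.35, 1.05),  # (input, output), prompts up to 128K tokens
    "long":  (0.70, 2.10),  # (input, output), prompts over 128K tokens
}

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost of one request in dollars."""
    tier = "long" if input_tokens > 128_000 else "short"
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A 10K-token prompt with a 1K-token response costs under half a cent:
print(round(estimate_cost(10_000, 1_000), 5))  # 0.00455
```

Note that the input tier is decided by prompt length, so a single long-context request is billed at double the short-prompt rate for both input and output tokens.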
How to Get a Gemini API Key
To use the Gemini API, you need to obtain an API key, which can be created quickly through Google AI Studio. Follow these steps to get your key and verify its setup:
- Create an API key:
- Go to Google AI Studio and navigate to the API key section.
- Click on the option to create a new API key. This will generate a unique key that you’ll use to access the Gemini API.
- Verify your API key:
- Open a terminal or command prompt.
Use a curl command to ensure your API key is correctly set up. You can pass the API key in the URL or use the x-goog-api-key header.
Example: Passing the API key in the URL
export API_KEY="My-API-Key"
curl -H 'Content-Type: application/json' \
-d '{
"contents": [
{
"role": "user",
"parts": [
{"text": "List four genres of music."}
]
}
]
}' \
"https://generativelanguage.googleapis.com/v1/models/gemini-pro:generateContent?key=${API_KEY}"
Example: Using the x-goog-api-key header
API_KEY="My_API_Key"
curl -H 'Content-Type: application/json' \
-H "x-goog-api-key: ${API_KEY}" \
-d '{
"contents": [
{
"role": "user",
"parts": [
{"text": "List four genres of music."}
]
}
]
}' \
"https://generativelanguage.googleapis.com/v1/models/gemini-pro:generateContent"
Getting Started with Gemini API
This section provides a quick guide to help you start using the Gemini API using your preferred software development kit (SDK).
Prerequisites
Ensure your development environment meets the following requirements:
- Python 3.9 or later
- Jupyter installed to run the notebook
Install the Gemini API SDK
To interact with the Gemini API, you’ll need the google-generativeai package. Install it using pip:
pip3 install -q -U google-generativeai
Configure Your API Key
You need an API key to use the Gemini API. After obtaining your key (refer to the previous section for details), configure it by setting it as an environment variable to avoid accidentally exposing it in your codebase:
export API_KEY=<My_API_Key>
Initialize the Model
Before making any API calls, import the necessary modules and initialize the model. Gemini 1.5 models support both text-only and multimodal prompts.
import google.generativeai as genai
import os
genai.configure(api_key=os.environ["API_KEY"])
model = genai.GenerativeModel('gemini-1.5-flash')
Make Your First Request
You can now make your first request to generate text. Here’s an example of generating a story:
response = model.generate_content("Write a romantic poem about a robot in love.")
print(response.text)
This basic setup will get you started with the Gemini API, enabling you to leverage AI capabilities in your applications.
Working with the Gemini API
Generate Text From Text Inputs
To generate text using text inputs with the Gemini API, you need to use one of the models from the Gemini 1.5 series or the Gemini 1.0 Pro model. These models are designed to handle a wide range of tasks, from simple text generation to complex, multi-turn conversations.
Start by initializing the model. For this example, we’ll use the gemini-1.5-flash model:
import google.generativeai as genai
model = genai.GenerativeModel('gemini-1.5-flash')
To generate text, use the generate_content method, which takes a prompt string as input. Here’s a basic example:
response = model.generate_content("Why is the ocean blue?")
print(response.text)
The response.text attribute contains the generated text output. To render formatted output, especially if you want to display Markdown, you can use the Python markdown library:
import markdown
print(markdown.markdown(response.text))
Generate Text from Image and Text Inputs
The Gemini 1.5 models also support multimodal inputs, allowing you to provide both text and images as input to generate text output. This is particularly useful for creating rich descriptions or narratives based on visual content.
First, ensure you have an image to work with. For example, download an image and open it using the PIL.Image module from the Pillow library:
curl -o image.jpg https://t0.gstatic.com/licensed-image?q=tbn:ANd9GcQ_Kevbk21QBRy-PgB4kQpS79brbmmEG7m3VOTShAn4PecDU5H5UxrJxE3Dw1JiaG17V88QIol19-3TM2wCHw
from PIL import Image
img = Image.open('image.jpg')
Next, use the generate_content method to provide both text and image as input:
response = model.generate_content(["Write an enthusiastic news article based on this image. Include a description of the event in the photo and discuss the background leading up to it.", img])
print(response.text)
Chat Conversations
For interactive and multi-turn conversations, the Gemini API offers a chat session feature. This is managed through the ChatSession class, which keeps track of the conversation history and allows for a seamless chat experience.
Initialize the chat session with the chosen model:
chat = model.start_chat(history=[])
Send a message and get a response using the send_message method:
response = chat.send_message("In two sentences, explain how AI works to an eight-year-old.")
print(response.text)
The conversation history is automatically managed, making it easy to continue the chat:
response = chat.send_message("OK, please give a more in-depth explanation to a university student.", stream=True)
for chunk in response:
    print(chunk.text)
    print("_" * 80)
The stream=True argument streams the response, providing real-time updates as chunks are generated. The conversation history can be accessed and displayed using the chat session’s history attribute:
for message in chat.history:
    print(f'**{message.role}**: {message.parts[0].text}')
AI Testing & Validation with Kolena
Kolena is an AI/ML testing & validation platform that solves one of AI’s biggest problems: the lack of trust in model effectiveness. The use cases for AI are enormous, but AI lacks trust from both builders and the public. It is our responsibility to build that trust with full transparency and explainability of ML model performance, not just from a high-level aggregate ‘accuracy’ number, but from rigorous testing and evaluation at scenario levels.
With Kolena, machine learning engineers and data scientists can uncover hidden machine learning model behaviors, easily identify gaps in test data coverage, and truly learn where and why a model is underperforming, all in minutes, not weeks. Kolena’s AI/ML model testing and validation solution helps developers build safe, reliable, and fair systems by allowing companies to instantly stitch together razor-sharp test cases from their data sets, enabling them to scrutinize AI/ML models in the precise scenarios those models will be unleashed upon in the real world. The Kolena platform transforms the current nature of AI development from experimental into an engineering discipline that can be trusted and automated.