Guide Large Language Models
Working with Gemini API: Quick Start for Developers
Oct 10, 2024

What Is the Gemini API? 

The Gemini API is a tool provided by Google to facilitate access to its latest AI models, specifically the Gemini model family. This API allows developers to turn ideas into scalable applications by leveraging AI capabilities.

As of the time of this writing, the most advanced model available via the Gemini API is Gemini 1.5 Pro with a context length of approximately 2 million tokens. The Gemini API also offers Gemini Flash, which has more limited capabilities but is cheaper and faster to run, with a context window of 1 million tokens.

At the time of this writing, Gemini supports the largest context window of any widely available LLM, enabling the analysis and understanding of large datasets. Additionally, the Gemini models are natively multimodal, meaning they can integrate and process different types of data, including text, images, video, and audio.

This is part of a series of articles about Large Language Models. 

Gemini API Key Capabilities 

Generate Completions Using Gemini Models

The Gemini API allows you to generate text completions based on a given prompt. This feature is highly versatile, supporting tasks from simple sentence completion to generating long-form content. To use this capability, you initialize the model and provide a text prompt. The API processes the prompt and generates coherent and contextually relevant text.

Gemini models are designed to handle complex, multi-turn conversations. This allows developers to build advanced chatbot applications that can maintain context over several interactions. By leveraging the generate_content method, you can engage users in meaningful and contextually appropriate dialogue.

Solve Tasks with Fine-Tuning

The Gemini API offers fine-tuning capabilities to enhance model performance on specific tasks. Fine-tuning involves providing the model with a training dataset containing examples of the desired task. This process helps the model learn and encode the necessary information to perform the task more effectively. 

By structuring your training data with prompt inputs and expected outputs, you can teach the model to mimic desired behaviors. The output is a new model combining the original parameters with the newly learned ones, resulting in improved performance for niche applications.
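To make the structure concrete, here is a minimal sketch of a fine-tuning dataset. The field names "text_input" and "output" follow the format used by the google-generativeai SDK's tuning API, but treat them as an assumption and check the SDK documentation for your version; the example pairs themselves are invented for illustration.

```python
# Each fine-tuning example pairs a prompt input with the expected output.
training_data = [
    {"text_input": "1", "output": "2"},
    {"text_input": "3", "output": "4"},
    {"text_input": "seven", "output": "eight"},
]

# Sanity-check the structure before submitting a tuning job.
assert all({"text_input", "output"} <= set(example) for example in training_data)
```

With a dataset like this, you would start a tuning job through the SDK and then reference the resulting tuned model's name in place of the base model when generating content.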

Call Code With Function Calling

The function calling feature in the Gemini API enables integration with external APIs by generating structured data outputs. These outputs specify function names and suggested arguments, allowing applications to call the necessary APIs and incorporate the results into further model prompts. 

This feature is useful for interacting with real-time information and various services, enhancing the ability to provide contextual and relevant answers. Function declarations in the model prompt describe the API functions, their purposes, and parameters.
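As a hedged sketch of what such a declaration looks like, the dictionary below follows the OpenAPI-style schema shape (name, description, parameters) that the Gemini API uses for tools. The "get_weather" function and its parameters are hypothetical, included only to illustrate the structure.

```python
# A hypothetical function declaration describing an external weather API
# to the model. The model never calls this itself; it returns a structured
# function call (name plus suggested arguments) that your application executes.
get_weather_declaration = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "City name, e.g. 'Paris'",
            },
        },
        "required": ["city"],
    },
}
```

When the model decides the function is relevant to a prompt, its response contains the function name and suggested arguments as structured data; your application performs the real API call and feeds the result back into a follow-up prompt.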

Search and Answer With Embeddings

The embedding service in the Gemini API generates high-quality embeddings for words, phrases, and sentences, which are essential for various natural language processing tasks. Embeddings convert text into numerical coordinates, capturing semantic meaning and context. This allows for effective text comparison, semantic search, text classification, and clustering. 

Texts with similar meanings have closer embeddings, enabling models to understand and relate different texts accurately. The ability to generate and utilize embeddings enhances the capability to analyze and interpret large volumes of text data efficiently.
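The "closer embeddings" idea can be illustrated with cosine similarity. In practice you would obtain real embedding vectors from the API's embedding endpoint; the toy 4-dimensional vectors below are invented for illustration (real embeddings have hundreds of dimensions), but the comparison logic is the same.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings of three texts.
emb_cat = [0.9, 0.1, 0.0, 0.2]
emb_kitten = [0.8, 0.2, 0.1, 0.3]
emb_stock_market = [0.0, 0.9, 0.8, 0.1]

# Semantically related texts score higher than unrelated ones.
assert cosine_similarity(emb_cat, emb_kitten) > cosine_similarity(emb_cat, emb_stock_market)
```

Semantic search works the same way at scale: embed the query, embed the documents, and rank documents by similarity to the query vector.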

Gemini API Pricing 

The Gemini API provides two pricing options: free of charge and pay-as-you-go. The pricing details below are correct as of the time of this writing. For up-to-date pricing and more detail, see the official pricing page.

Free of Charge

Price (input): Free

Context caching: Free, up to 1 million tokens per hour

Price (output): Free

Prompts/responses used to improve products: Yes

Rate limits:

  • 15 RPM (requests per minute)
  • 1 million TPM (tokens per minute)
  • 1,500 RPD (requests per day)

Pay-as-You-Go

Price (input):

  • $0.35 per 1 million tokens (for prompts up to 128K tokens)
  • $0.70 per 1 million tokens (for prompts longer than 128K tokens)

Context caching:

  • $0.0875 per 1 million tokens (for prompts up to 128K tokens)
  • $0.175 per 1 million tokens (for prompts longer than 128K tokens)
  • $1.00 per 1 million tokens per hour (storage)

Price (output):

  • $1.05 per 1 million tokens (for prompts up to 128K tokens)
  • $2.10 per 1 million tokens (for prompts longer than 128K tokens)

Prompts/responses used to improve products: No

Rate limits:

  • 1000 RPM (requests per minute)
  • 4 million TPM (tokens per minute)
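To see how these rates combine, here is a small cost-estimation sketch using the pay-as-you-go figures listed above. The rates are hard-coded as of this writing and will drift; always confirm against the official pricing page before budgeting.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate pay-as-you-go cost in USD for one request.

    The higher rate tier applies when the prompt exceeds 128K tokens.
    Rates are per 1 million tokens, as listed at the time of writing.
    """
    long_prompt = input_tokens > 128_000
    input_rate = 0.70 if long_prompt else 0.35    # $ per 1M input tokens
    output_rate = 2.10 if long_prompt else 1.05   # $ per 1M output tokens
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 10,000-token prompt with a 2,000-token response:
cost = estimate_cost_usd(10_000, 2_000)  # -> 0.0056 dollars
```

Note that context caching and storage are billed separately and are not included in this sketch.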

How to Get a Gemini API Key 

To use the Gemini API, you need to obtain an API key, which can be created quickly through Google AI Studio. Follow these steps to get your key and verify its setup:

  1. Create an API key:
    • Go to Google AI Studio and navigate to the API key section.
    • Click on the option to create a new API key. This will generate a unique key that you’ll use to access the Gemini API.
  2. Verify your API key:
    • Open a terminal or command prompt.
    • Use a curl command to ensure your API key is correctly set up. You can pass the API key in the URL or use the x-goog-api-key header.

Example: Passing the API key in the URL

export API_KEY="My-API-Key"
curl -H 'Content-Type: application/json' \
-d '{
  "contents": [
    {
      "role": "user",
      "parts": [
        {"text": "List four genres of music."}
      ]
    }
  ]
}' \
"https://generativelanguage.googleapis.com/v1/models/gemini-pro:generateContent?key=${API_KEY}"

Example: Using the x-goog-api-key header

API_KEY="My_API_Key"

curl -H 'Content-Type: application/json' \
-H "x-goog-api-key: ${API_KEY}" \
-d '{
  "contents": [
    {
      "role": "user",
      "parts": [
        {"text": "List four genres of music."}
      ]
    }
  ]
}' \
"https://generativelanguage.googleapis.com/v1/models/gemini-pro:generateContent"

Getting Started with Gemini API

This section provides a quick guide to help you start using the Gemini API using your preferred software development kit (SDK).

Prerequisites

Ensure your development environment meets the following requirements:

  • Python 3.9 or later
  • Jupyter installed to run the notebook

Install the Gemini API SDK

To interact with the Gemini API, you’ll need the google-generativeai package. Install it using pip:

pip3 install -q -U google-generativeai

Configure Your API Key

You need an API key to use the Gemini API. After obtaining your key (refer to the previous section for details), configure it by setting it as an environment variable to avoid accidentally exposing it in your codebase:

export API_KEY=<My_API_Key>

Initialize the Model

Before making any API calls, import the necessary modules and initialize the model. Gemini 1.5 models support both text-only and multimodal prompts.

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["API_KEY"])

model = genai.GenerativeModel('gemini-1.5-flash')

Make Your First Request

You can now make your first request to generate text. Here’s an example of generating a story:

response = model.generate_content("Write a romantic poem about a robot in love.")
print(response.text)

This basic setup will get you started with the Gemini API, enabling you to leverage AI capabilities in your applications.

Working with the Gemini API 

Generate Text From Text Inputs

To generate text using text inputs with the Gemini API, you need to use one of the models from the Gemini 1.5 series or the Gemini 1.0 Pro model. These models are designed to handle a wide range of tasks, from simple text generation to complex, multi-turn conversations.

Start by initializing the model. For this example, we’ll use the gemini-1.5-flash model:

import google.generativeai as genai

model = genai.GenerativeModel('gemini-1.5-flash')

To generate text, use the generate_content method, which takes a prompt string as input. Here’s a basic example:

response = model.generate_content("Why is the ocean blue?")
print(response.text)

The response.text attribute contains the generated text output. For formatted output, especially if you want to render Markdown, you can use the Python markdown library:

import markdown

print(markdown.markdown(response.text))

Generate Text from Image and Text Inputs

The Gemini 1.5 models also support multimodal inputs, allowing you to provide both text and images as input to generate text output. This is particularly useful for creating rich descriptions or narratives based on visual content.

First, ensure you have an image to work with. For example, download an image and open it using the PIL.Image library:

curl -o image.jpg https://t0.gstatic.com/licensed-image?q=tbn:ANd9GcQ_Kevbk21QBRy-PgB4kQpS79brbmmEG7m3VOTShAn4PecDU5H5UxrJxE3Dw1JiaG17V88QIol19-3TM2wCHw

from PIL import Image

img = Image.open('image.jpg')

Next, use the generate_content method to provide both text and image as input:

response = model.generate_content(["Write an enthusiastic news article based on this image. Include a description of the event in the photo and discuss the background leading up to it.", img])
print(response.text)

Chat Conversations

For interactive and multi-turn conversations, the Gemini API offers a chat session feature. This is managed through the ChatSession class, which keeps track of the conversation history and allows for a seamless chat experience.

Initialize the chat session with the chosen model:

chat = model.start_chat(history=[])

Send a message and get a response using the send_message method:

response = chat.send_message("In two sentences, explain how AI works to an eight-year-old.")
print(response.text)

The conversation history is automatically managed, making it easy to continue the chat:

response = chat.send_message("OK, please give a more in-depth explanation to a university student.", stream=True)

for chunk in response:
	print(chunk.text)
print("_" * 80)

The stream=True argument streams the response, providing real-time updates. The conversation history can be accessed and displayed using the chat session’s history attribute:

for message in chat.history:
	print(f'**{message.role}**: {message.parts[0].text}')

AI Testing & Validation with Kolena

Kolena is an AI/ML testing & validation platform that solves one of AI’s biggest problems: the lack of trust in model effectiveness. The use cases for AI are enormous, but AI lacks trust from both builders and the public. It is our responsibility to build that trust with full transparency and explainability of ML model performance, not just from a high-level aggregate ‘accuracy’ number, but from rigorous testing and evaluation at scenario levels.

With Kolena, machine learning engineers and data scientists can uncover hidden machine learning model behaviors, easily identify gaps in test data coverage, and truly learn where and why a model is underperforming, all in minutes not weeks. Kolena’s AI/ML model testing and validation solution helps developers build safe, reliable, and fair systems by allowing companies to instantly stitch together razor-sharp test cases from their data sets, enabling them to scrutinize AI/ML models in the precise scenarios those models will be unleashed upon the real world. The Kolena platform transforms the current nature of AI development from experimental into an engineering discipline that can be trusted and automated.

Learn more about Kolena