Version: 2026 R2

Google Vertex AI

Google Vertex AI provides access to the Gemini family of models: multimodal models that can process text, images, audio, and video. Gemini models offer large context windows (up to 2 million tokens in some versions) along with high performance and competitive pricing.

When to choose Google Vertex AI

Large documents and long-context scenarios:

  • analysis of multi-page documents,
  • processing long conversations and extended histories,
  • working with context spanning entire knowledge bases.

Multimedia processing:

  • image analysis and object detection,
  • audio transcription and analysis,
  • video processing.

Cost optimization:

  • Gemini Flash models offer an excellent price-to-performance ratio,
  • lower costs for high query volumes.

GCP integration:

  • you are already using Google Cloud Platform,
  • you need RAG capabilities based on Vertex AI Search.

Requirements

  • a Google Cloud Platform (GCP) account,
  • a GCP project with the Vertex AI API enabled,
  • a Service Account Key in JSON format,
  • a Google Cloud Storage bucket for file processing.

Step 1: Prepare your Google Cloud environment

1. Create a Service Account

  1. Go to the Google Cloud Console.
  2. Select an existing project or create a new one.
  3. Go to IAM & Admin > Service Accounts.
  4. Click Create Service Account.
  5. Enter a name (e.g., aiproxy-vertex) and click Create and continue.
  6. Assign the following roles:
    • Vertex AI User,
    • Storage Object Admin (for the bucket).
  7. Click Done.
  8. Open the new service account, go to the Keys tab, select Add key > Create new key, choose JSON, and click Create. The key file is downloaded to your computer.
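Before moving on, it can help to sanity-check the downloaded key. The following is a minimal Python sketch (the script name and function are hypothetical, not part of AI Proxy) that looks for the fields a service account key file normally contains. The email-domain test is only a heuristic that fits user-managed service accounts such as the one created above.

```python
import json
import sys

# Fields a service account key file normally contains (besides "type").
REQUIRED_KEY_FIELDS = ("project_id", "private_key", "client_email", "token_uri")


def check_service_account_key(key: dict) -> list[str]:
    """Return problems found in a parsed service account key; empty list if none."""
    problems = []
    if key.get("type") != "service_account":
        problems.append('"type" should be "service_account"')
    for field in REQUIRED_KEY_FIELDS:
        if not key.get(field):
            problems.append(f"missing or empty field: {field}")
    # Heuristic: user-managed service accounts (like aiproxy-vertex) use this domain.
    email = key.get("client_email", "")
    if email and not email.endswith(".iam.gserviceaccount.com"):
        problems.append("client_email does not look like a user-managed service account")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_key.py aiproxy-vertex-key.json
    with open(sys.argv[1], encoding="utf-8") as f:
        issues = check_service_account_key(json.load(f))
    print("Key looks OK" if not issues else "\n".join(issues))
```

An empty problem list only means the basic fields are present; it does not prove the key is authorized for Vertex AI.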

2. Enable the required APIs

  1. Go to APIs & Services > Library.
  2. Enable the following APIs:
    • Vertex AI API,
    • Cloud Storage API.
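From the command line, gcloud services enable aiplatform.googleapis.com storage.googleapis.com enables both APIs. To confirm the result, the sketch below parses the output of gcloud services list --enabled --format=json. It assumes each entry has a name field ending in the service name (for example projects/123/services/aiplatform.googleapis.com), which is how the Service Usage API names services; check the raw output if your gcloud version differs. The script name is hypothetical.

```python
import json
import subprocess
import sys

# Service names behind "Vertex AI API" and "Cloud Storage API".
REQUIRED_SERVICES = {"aiplatform.googleapis.com", "storage.googleapis.com"}


def missing_services(services_json: str) -> set[str]:
    """Return required services absent from `gcloud services list --enabled --format=json` output."""
    enabled = set()
    for service in json.loads(services_json):
        # Assumed shape: "name" is "projects/<number>/services/<service>"; keep the last segment.
        enabled.add(service.get("name", "").rsplit("/", 1)[-1])
    return REQUIRED_SERVICES - enabled


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_apis.py your-gcp-project-id
    output = subprocess.run(
        ["gcloud", "services", "list", "--enabled", f"--project={sys.argv[1]}", "--format=json"],
        capture_output=True, text=True, check=True,
    ).stdout
    missing = missing_services(output)
    print("All required APIs are enabled" if not missing else "Not enabled: " + ", ".join(sorted(missing)))
```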

3. Create a Cloud Storage bucket

  1. Go to Cloud Storage > Buckets.
  2. Click Create bucket.
  3. Enter a name (e.g., aiproxy-files).
  4. Select a region (e.g., us-central1).
  5. Click Create.
info

A Cloud Storage bucket is required for processing files such as images, audio, and documents with Gemini models.
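Bucket names are global across all of Cloud Storage, so the example name aiproxy-files may already be taken, and every name must follow the Cloud Storage naming rules. The sketch below checks a candidate name against a subset of those rules (allowed characters, length, and the "goog"/"google" restriction); it does not cover every rule, for example the domain verification required for names that contain dots. From the command line, gcloud storage buckets create gs://aiproxy-files --location=us-central1 creates the bucket.

```python
import re

# Allowed characters; the first and last character must be a letter or digit.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]*[a-z0-9]")


def bucket_name_problems(name: str) -> list[str]:
    """Check a bucket name against a subset of the Cloud Storage naming rules."""
    problems = []
    if not _NAME_PATTERN.fullmatch(name):
        problems.append("use only a-z, 0-9, '-', '_' and '.', starting and ending with a letter or digit")
    # Names with dots may be longer, but each dot-separated part is still capped at 63.
    max_len = 222 if "." in name else 63
    if not 3 <= len(name) <= max_len:
        problems.append(f"length must be 3-{max_len} characters")
    if any(len(part) > 63 for part in name.split(".")):
        problems.append("each dot-separated part must be at most 63 characters")
    if name.startswith("goog") or "google" in name:
        problems.append("names cannot start with 'goog' or contain 'google'")
    return problems


if __name__ == "__main__":
    for candidate in ("aiproxy-files", "AIProxy_Files", "google-files"):
        print(candidate, bucket_name_problems(candidate) or "OK")
```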

Step 2: Configure AI Proxy

Example aiconfiguration.json

{
  "ProviderConnections": {
    "GoogleVertex": {
      "Description": "Google Vertex AI Connection",
      "Type": "Gemini",
      "ProviderConfiguration": {
        "ApiKey": "your-google-api-key-if-available",
        "ServiceAccount": "{\"type\":\"service_account\",\"project_id\":\"your-project\",\"private_key_id\":\"...\",\"private_key\":\"-----BEGIN PRIVATE KEY-----\\n...\\n-----END PRIVATE KEY-----\\n\",\"client_email\":\"aiproxy-vertex@your-project.iam.gserviceaccount.com\",\"client_id\":\"...\",\"auth_uri\":\"https://accounts.google.com/o/oauth2/auth\",\"token_uri\":\"https://oauth2.googleapis.com/token\"}",
        "ProjectId": "your-gcp-project-id",
        "Region": "us-central1",
        "BucketName": "aiproxy-files"
      }
    }
  },
  "ProviderModels": [
    {
      "ConnectionName": "GoogleVertex",
      "Priority": 100,
      "Name": "Gemini Flash",
      "Description": "",
      "TextModel": {
        "ModelName": "gemini-2.0-flash-exp"
      },
      "ImageModel": {
        "ModelName": "imagen-3.0-fast-generate-001"
      },
      "AudioModel": {
        "ModelName": "gemini-2.0-flash-exp"
      },
      "EmbeddingModel": {
        "ModelName": "text-embedding-004"
      }
    }
  ],
  "MethodTypesConfiguration": {
    "ConciergePrompt": [ "Gemini Flash" ],
    "ConciergeExecuteTool": [ "Gemini Flash" ]
  }
}

Important
  • ServiceAccount - paste the full contents of the downloaded JSON file as a single-line string. Escape every double quote as \" and every backslash as \\ (so each \n in private_key becomes \\n).
  • ProjectId - the project ID from Google Cloud.
  • Region - the Vertex AI region to send requests to (e.g., us-central1 or europe-west1). The models you configure must be available in that region.
  • BucketName - the name of the Cloud Storage bucket you created.
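Escaping the key by hand is error-prone, so a small local script can do it instead. In this sketch, json.dumps turns the key into a one-line string, and writing the configuration back with json.dump escapes the quotes and the \n sequences in the same way as the ServiceAccount example. It assumes the connection is named GoogleVertex, as in the example; the script name is hypothetical, and it overwrites the configuration file, so keep a backup.

```python
import json
import sys


def embed_service_account(config: dict, key: dict, connection: str = "GoogleVertex") -> dict:
    """Store the key as a one-line JSON string in ProviderConfiguration.ServiceAccount."""
    provider = config["ProviderConnections"][connection]["ProviderConfiguration"]
    # Compact, single-line JSON text; newlines inside private_key become the two characters \n.
    provider["ServiceAccount"] = json.dumps(key, separators=(",", ":"))
    return config


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python embed_key.py aiconfiguration.json aiproxy-vertex-key.json
    config_path, key_path = sys.argv[1], sys.argv[2]
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    with open(key_path, encoding="utf-8") as f:
        key = json.load(f)
    embed_service_account(config, key)
    # json.dump escapes the string again: quotes become \" and \n becomes \\n.
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
```

Because the key is re-parsed and re-serialized, the value that ends up in the file is guaranteed to decode back to the original key.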

Gemini models

The recommended models are those from the Gemini Flash and Gemini Flash-Lite families (for example, gemini-2.0-flash-exp and gemini-2.0-flash-lite). They are multimodal models that support text, images, and audio, and their performance is sufficient for most use cases.

Example docker-compose.yml

name: aiproxy_containers
services:
  ai-proxy:
    image: webconbps/aiproxy:1.0.0.235
    container_name: ai-proxy
    restart: unless-stopped
    ports:
      - "5298:8080"
      - "7033:8081"
    environment:
      - ASPNETCORE_ENVIRONMENT=Production
      - AppConfiguration__SelfHosted__Certificate__Path=/app/https/certificate.pem
      - Logging__LogLevel__Default=Information
      - Logging__LogLevel__Microsoft=Warning
    volumes:
      - ./certificates/certificate.pem:/app/https/certificate.pem:ro
      - ./aiconfiguration.json:/app/aiconfiguration.json:ro

Step 3: Startup

# Make sure the following files are in place:
# - ./certificates/certificate.pem
# - ./aiconfiguration.json (with the full service account JSON embedded)

# Start the container in the background
docker-compose up -d

# Follow the container logs
docker-compose logs -f ai-proxy
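Before starting (or restarting) the container, you may also want to confirm that the names in aiconfiguration.json refer to each other consistently. The sketch below assumes, based only on the example configuration, that ConnectionName must match a key under ProviderConnections and that each entry in MethodTypesConfiguration must match a ProviderModels Name exactly; these matching rules are an assumption, not documented AI Proxy behavior. The script name is hypothetical.

```python
import json
import sys


def config_reference_problems(config: dict) -> list[str]:
    """List names in aiconfiguration.json that do not resolve (assumed exact-match rules)."""
    problems = []
    connections = set(config.get("ProviderConnections", {}))
    model_names = set()
    for model in config.get("ProviderModels", []):
        model_names.add(model.get("Name"))
        # Assumption: ConnectionName must equal a key under ProviderConnections.
        if model.get("ConnectionName") not in connections:
            problems.append(f"model {model.get('Name')!r}: unknown connection {model.get('ConnectionName')!r}")
    # Assumption: MethodTypesConfiguration lists ProviderModels by their Name.
    for method, names in config.get("MethodTypesConfiguration", {}).items():
        for name in names:
            if name not in model_names:
                problems.append(f"{method}: unknown model {name!r}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_config.py aiconfiguration.json
    with open(sys.argv[1], encoding="utf-8") as f:
        issues = config_reference_problems(json.load(f))
    print("References OK" if not issues else "\n".join(issues))
```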

Troubleshooting

Error: Permission denied / 403 Forbidden

Possible causes:

  • the Service Account does not have the required permissions,
  • the required APIs are not enabled in the project.

Solution:

# Check that the service account has these roles:
# - Vertex AI User
# - Storage Object Admin

# Check that these APIs are enabled in the project:
# - Vertex AI API
# - Cloud Storage API

# Restart the container (IAM changes can take a few minutes to take effect)
docker-compose restart ai-proxy
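To check the roles without clicking through the console, you can read the project IAM policy with gcloud projects get-iam-policy <project-id> --format=json. The sketch below looks for the role IDs that correspond to Vertex AI User (roles/aiplatform.user) and Storage Object Admin (roles/storage.objectAdmin). It only inspects project-level bindings, so a role granted on the bucket itself is reported as missing even though it may be enough for bucket access. The script name is hypothetical.

```python
import json
import subprocess
import sys

# Vertex AI User and Storage Object Admin, as IAM role IDs.
REQUIRED_ROLES = {"roles/aiplatform.user", "roles/storage.objectAdmin"}


def missing_roles(policy_json: str, sa_email: str) -> set[str]:
    """Return required roles not bound to the service account in a project IAM policy."""
    member = f"serviceAccount:{sa_email}"
    granted = {
        binding["role"]
        for binding in json.loads(policy_json).get("bindings", [])
        if member in binding.get("members", [])
    }
    return REQUIRED_ROLES - granted


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python check_roles.py your-gcp-project-id aiproxy-vertex@your-project.iam.gserviceaccount.com
    project, email = sys.argv[1], sys.argv[2]
    policy = subprocess.run(
        ["gcloud", "projects", "get-iam-policy", project, "--format=json"],
        capture_output=True, text=True, check=True,
    ).stdout
    missing = missing_roles(policy, email)
    print("Roles OK" if not missing else "Missing roles: " + ", ".join(sorted(missing)))
```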

Error: Invalid Service Account JSON

Possible causes:

  • the JSON value provided in ServiceAccount is not in a valid format,
  • quotation marks in the JSON content are not properly escaped.

Solution:

# ServiceAccount must be a JSON string with escaped quotes
# Example of correct format:
# "ServiceAccount": "{\"type\":\"service_account\",\"project_id\":\"my-project\"...}"

# Escape the string locally; avoid pasting the private key into online tools
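A few lines of Python run locally can tell you which of the two causes above applies. This sketch first parses the whole file, then decodes the ServiceAccount string a second time. It assumes the connection name GoogleVertex from the example; the script name is hypothetical.

```python
import json
import sys
from typing import Optional


def service_account_problem(config_text: str, connection: str = "GoogleVertex") -> Optional[str]:
    """Describe what is wrong with the ServiceAccount value, or return None if it decodes."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError as e:
        # Typical cause: unescaped quotes inside the ServiceAccount string.
        return f"aiconfiguration.json is not valid JSON: {e}"
    provider = config.get("ProviderConnections", {}).get(connection, {}).get("ProviderConfiguration", {})
    value = provider.get("ServiceAccount")
    if not isinstance(value, str):
        # Typical cause: the key was pasted as a JSON object instead of a string.
        return "ServiceAccount is missing or is not a string"
    try:
        key = json.loads(value)
    except json.JSONDecodeError as e:
        return f"ServiceAccount does not contain valid JSON: {e}"
    if not isinstance(key, dict) or key.get("type") != "service_account":
        return 'decoded ServiceAccount lacks "type": "service_account"'
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_service_account.py aiconfiguration.json
    with open(sys.argv[1], encoding="utf-8") as f:
        print(service_account_problem(f.read()) or "ServiceAccount looks valid")
```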

Error: Bucket not found

Possible causes:

  • the bucket with the specified name does not exist,
  • the Service Account does not have permission to access the bucket.

Solution:

# Check that the bucket exists in Cloud Storage
# Make sure the service account has the Storage Object Admin role
# Check that BucketName in the configuration matches the bucket name exactly
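To tell the two causes apart, first confirm that the bucket exists using your own credentials, for example with gcloud storage ls --project=<project-id>, which prints one gs://<name>/ URL per bucket. The sketch below compares that output with BucketName from aiconfiguration.json. The warning about a gs:// prefix is an assumption based on the example configuration, which uses the bare bucket name; the script name is hypothetical.

```python
import json
import subprocess
import sys


def bucket_listed(ls_output: str, bucket_name: str) -> bool:
    """True if `gcloud storage ls` output (one gs://<name>/ per line) includes bucket_name."""
    listed = {
        line.strip().removeprefix("gs://").rstrip("/")
        for line in ls_output.splitlines()
        if line.strip().startswith("gs://")
    }
    return bucket_name in listed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_bucket.py aiconfiguration.json (assumes the GoogleVertex connection)
    with open(sys.argv[1], encoding="utf-8") as f:
        provider = json.load(f)["ProviderConnections"]["GoogleVertex"]["ProviderConfiguration"]
    name, project = provider["BucketName"], provider["ProjectId"]
    if name.startswith("gs://"):
        # Assumption: the example configuration uses a bare name such as "aiproxy-files".
        print("BucketName starts with gs://; the example configuration uses the bare name")
    output = subprocess.run(
        ["gcloud", "storage", "ls", f"--project={project}"],
        capture_output=True, text=True, check=True,
    ).stdout
    print("Bucket found" if bucket_listed(output, name) else f"Bucket {name!r} not listed in project {project}")
```

If the bucket is listed but AI Proxy still reports it as not found, the service account's permissions are the more likely cause.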

Recommended models

The following models are recommended for use with AI Proxy.

Text and multimodal models:

  • gemini-2.0-flash-exp - a fast, experimental multimodal model supporting text, images, and audio,
  • gemini-2.0-flash-lite - a lighter variant that still supports multimodal input,
  • gemini-1.5-flash - a proven multimodal model suitable for a wide range of use cases,
  • gemini-1.5-pro - a larger model that offers broader capabilities and stronger performance in more demanding scenarios.

Embedding models:

  • text-embedding-004 - a text embedding model (the one used in the example configuration),
  • text-multilingual-embedding-002 - an embedding model with multilingual support.

Image generation models:

  • imagen-3.0-fast-generate-001 - a model optimized for fast image generation,
  • imagen-3.0-generate-001 - a model focused on higher image quality.

Multimodal models

Models from the Gemini Flash family are multimodal, which means they can process text, images, and audio within a single model. This simplifies configuration and makes integration with different types of input more consistent.