Version: 2026 R2

Google Vertex AI

Google Vertex AI provides access to the Gemini family of models, which are multimodal and support text, images, audio, and video. Gemini models combine large context windows (up to 2 million tokens in some versions) with high performance and competitive pricing.

When to choose Google Vertex AI

Large documents and long-context scenarios:

analysis of multi-page documents,
processing long conversations and extended histories,
working with context spanning entire knowledge bases.

Multimedia processing:

image analysis and object detection,
audio transcription and analysis,
video processing.

Cost optimization:

Gemini Flash models offer an excellent price-to-performance ratio,
lower costs for high query volumes.

GCP integration:

you are already using Google Cloud Platform,
you need RAG capabilities based on Vertex AI Search.

Requirements

a Google Cloud Platform (GCP) account,
a GCP project with the Vertex AI API enabled,
a Service Account Key in JSON format,
a Google Cloud Storage bucket for file processing.

Step 1: Prepare your Google Cloud environment

1. Create a Service Account

Go to Google Cloud Console.
Select an existing project or create a new one.
Go to IAM & Admin > Service Accounts.
Click Create Service Account.
Enter a name (e.g., aiproxy-vertex).
Assign the following roles:
- Vertex AI User,
- Storage Object Admin (for the bucket).
After the account is created, open it, go to the Keys tab, click Add key > Create new key, select JSON, and download the key file.

2. Enable the required APIs

Go to APIs & Services > Library.
Enable the following APIs:
- Vertex AI API,
- Cloud Storage API.

3. Create a Cloud Storage bucket

Go to Cloud Storage > Buckets.
Click Create bucket.
Enter a name (e.g., aiproxy-files).
Select a region (e.g., us-central1).
Click Create.

Info

A Cloud Storage bucket is required for processing files such as images, audio, and documents with Gemini models.

Step 2: Configure AI Proxy

Example `aiconfiguration.json`

{
  "ProviderConnections": {
    "GoogleVertex": {
      "Description": "Google Vertex AI Connection",
      "Type": "Gemini",
      "ProviderConfiguration": {
        "ApiKey": "your-google-api-key-if-available",
        "ServiceAccount": "{\"type\":\"service_account\",\"project_id\":\"your-project\",\"private_key_id\":\"...\",\"private_key\":\"-----BEGIN PRIVATE KEY-----\\n...\\n-----END PRIVATE KEY-----\\n\",\"client_email\":\"aiproxy-vertex@your-project.iam.gserviceaccount.com\",\"client_id\":\"...\",\"auth_uri\":\"https://accounts.google.com/o/oauth2/auth\",\"token_uri\":\"https://oauth2.googleapis.com/token\"}",
        "ProjectId": "your-gcp-project-id",
        "Region": "us-central1",
        "BucketName": "aiproxy-files"
      }
    }
  },
  "ProviderModels": [
    {
      "ConnectionName": "GoogleVertex",
      "Priority": 100,
      "Name": "Gemini Flash",
      "Description": "",
      "TextModel": {
        "ModelName": "gemini-2.0-flash-exp"
      },
      "ImageModel": {
        "ModelName": "imagen-3.0-fast-generate-001"
      },
      "AudioModel": {
        "ModelName": "gemini-2.0-flash-exp"
      },
      "EmbeddingModel": {
        "ModelName": "text-embedding-004"
      }
    }
  ],
  "MethodTypesConfiguration": {
    "ConciergePrompt": [ "Gemini Flash" ],
    "ConciergeExecuteTool": [ "Gemini Flash" ]
  }
}
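
Judging from the example above, each ConnectionName in ProviderModels matches a key in ProviderConnections, and each name listed in MethodTypesConfiguration matches a model Name. The Python sketch below checks those cross-references before the file is mounted into the container. It assumes that this name matching is how AI Proxy resolves the entries, which the example suggests but this page does not state. `check_references` is a hypothetical helper and not part of AI Proxy.

```python
import json


def check_references(config: dict) -> list[str]:
    """Return a list of problems; an empty list means all names line up."""
    problems = []
    connections = set(config.get("ProviderConnections", {}))
    model_names = set()

    # Every model should point at a connection defined in ProviderConnections.
    for model in config.get("ProviderModels", []):
        model_names.add(model.get("Name"))
        if model.get("ConnectionName") not in connections:
            problems.append(
                f"model {model.get('Name')!r} uses unknown connection "
                f"{model.get('ConnectionName')!r}"
            )

    # Every method type should list only model names that exist.
    for method, names in config.get("MethodTypesConfiguration", {}).items():
        for name in names:
            if name not in model_names:
                problems.append(f"{method} refers to unknown model {name!r}")

    return problems


# Small inline config with a deliberate typo in ConciergePrompt.
demo = json.loads("""
{
  "ProviderConnections": {"GoogleVertex": {}},
  "ProviderModels": [{"ConnectionName": "GoogleVertex", "Name": "Gemini Flash"}],
  "MethodTypesConfiguration": {"ConciergePrompt": ["Gemini Flsh"]}
}
""")
for problem in check_references(demo):
    print(problem)
```

To check a real file, load it with `json.load` and pass the resulting dictionary to `check_references`.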

Important

ServiceAccount - paste the full contents of the downloaded JSON key file as a single string. Escape every double quote as \" and keep the line breaks inside private_key as \\n, as in the example above.
ProjectId - the project ID from Google Cloud.
Region - the region where you have Vertex AI enabled (e.g., us-central1 or europe-west1).
BucketName - the name of the Cloud Storage bucket you created.
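
Escaping the key by hand is error-prone, so the ServiceAccount value can be generated locally instead. The following is a minimal Python sketch that relies only on the standard library. `escape_service_account` is a hypothetical helper name, and the sample key is a placeholder rather than a real credential.

```python
import json


def escape_service_account(key_json_text: str) -> str:
    """Return the key file contents as one JSON string literal, quotes included."""
    # Parse first so that a malformed key file fails here, not inside AI Proxy.
    key = json.loads(key_json_text)
    # Serialize compactly, then serialize the result again: the second dumps()
    # escapes the inner quotes and turns the \n sequences in private_key into \\n.
    compact = json.dumps(key, separators=(",", ":"))
    return json.dumps(compact)


# Placeholder key contents; in practice, read the downloaded key file instead.
sample_key = (
    '{"type": "service_account", "project_id": "your-project", '
    '"private_key": "-----BEGIN PRIVATE KEY-----\\n...\\n-----END PRIVATE KEY-----\\n"}'
)
print('"ServiceAccount": ' + escape_service_account(sample_key))
```

The printed line can be pasted into the ProviderConfiguration section as is. Running this locally also keeps the private key off third-party websites.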

Gemini models

The recommended models come from the Gemini Flash and Gemini Flash-Lite families (for example, gemini-2.0-flash-exp and gemini-2.0-flash-lite). They are multimodal models that support text, images, and audio, and their performance is sufficient for most use cases.

Example `docker-compose.yml`

name: aiproxy_containers
services:
  ai-proxy:
    image: webconbps/aiproxy:1.0.0.235
    container_name: ai-proxy
    restart: unless-stopped
    ports:
      - "5298:8080"
      - "7033:8081"
    environment:
      - ASPNETCORE_ENVIRONMENT=Production
      - AppConfiguration__SelfHosted__Certificate__Path=/app/https/certificate.pem
      - Logging__LogLevel__Default=Information
      - Logging__LogLevel__Microsoft=Warning
    volumes:
      - ./certificates/certificate.pem:/app/https/certificate.pem:ro
      - ./aiconfiguration.json:/app/aiconfiguration.json:ro

Step 3: Startup

# Make sure the following files are in place:
# - ./certificates/certificate.pem
# - ./aiconfiguration.json (with full Service Account JSON contents)

# Start the container
docker-compose up -d

# Check logs
docker-compose logs -f ai-proxy
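
The file check mentioned in the comments above can be automated. The sketch below is a small Python pre-flight check that assumes it runs from the directory containing docker-compose.yml, with the two relative paths taken from the volumes section of the example. `missing_files` is a hypothetical helper. It tests for regular files specifically, because with the short volume syntax Docker typically creates a missing bind-mount source as an empty directory.

```python
from pathlib import Path

# Host-side paths mounted by the example docker-compose.yml.
REQUIRED_FILES = ("certificates/certificate.pem", "aiconfiguration.json")


def missing_files(base_dir) -> list[str]:
    """Return the required paths that are not regular files under base_dir."""
    base = Path(base_dir)
    return [rel for rel in REQUIRED_FILES if not (base / rel).is_file()]


missing = missing_files(".")
if missing:
    print("Missing before startup:", ", ".join(missing))
else:
    print("All required files are present.")
```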

Troubleshooting

Error: Permission denied / 403 Forbidden

Possible causes:

the Service Account does not have the required permissions,
the required APIs are not enabled in the project.

Solution:

# Check that the Service Account has these roles:
# - Vertex AI User
# - Storage Object Admin

# Check that these APIs are enabled:
# - Vertex AI API
# - Cloud Storage API

# Restart container
docker-compose restart ai-proxy

Error: Invalid Service Account JSON

Possible causes:

the JSON value provided in ServiceAccount is not in a valid format,
quotation marks in the JSON content are not properly escaped.

Solution:

# ServiceAccount must be a JSON string with escaped quotes
# Example of correct format:
# "ServiceAccount": "{\"type\":\"service_account\",\"project_id\":\"my-project\"...}"

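# Escape the JSON string locally (for example, with a short script or your editor).
# The key file contains a private key, so avoid pasting it into online tools.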
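
To narrow down which of the two causes applies, the ServiceAccount value can be inspected outside the container. The Python sketch below is one way to do that. The field list is an assumption based on the fields shown in the example configuration, and AI Proxy's own validation rules may differ. `diagnose_service_account` is a hypothetical helper.

```python
import json

# Fields taken from the example key in this guide; treat the list as an assumption.
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")


def diagnose_service_account(value) -> str:
    """Describe the first problem found in a ServiceAccount value, or return 'ok'."""
    if not isinstance(value, str):
        return "ServiceAccount must be a string, not a nested JSON object"
    try:
        key = json.loads(value)
    except json.JSONDecodeError as exc:
        return f"ServiceAccount is not valid JSON: {exc.msg} at position {exc.pos}"
    if not isinstance(key, dict):
        return "ServiceAccount does not decode to a JSON object"
    missing = [field for field in EXPECTED_FIELDS if field not in key]
    if missing:
        return "ServiceAccount is missing fields: " + ", ".join(missing)
    return "ok"


# Example: a key pasted as a nested object instead of an escaped string.
print(diagnose_service_account({"type": "service_account"}))
```

For a real configuration, load aiconfiguration.json with `json.load` and pass the ServiceAccount value from the connection's ProviderConfiguration to the function.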

Error: Bucket not found

Possible causes:

the bucket with the specified name does not exist,
the Service Account does not have permission to access the bucket.

Solution:

# Check that the bucket exists in Cloud Storage
# Make sure the Service Account has the Storage Object Admin role
# Check that BucketName in the configuration matches the bucket name exactly

Popular Gemini models

The following models are recommended for use with AI Proxy.

Text and multimodal models:

gemini-2.0-flash-exp - the latest fast multimodal model, supporting text, images, and audio,
gemini-2.0-flash-lite - a lighter variant that still supports multimodal input,
gemini-1.5-flash - a proven multimodal model suitable for a wide range of use cases,
gemini-1.5-pro - a larger model that offers broader capabilities and stronger performance in more demanding scenarios.

Embedding models:

text-embedding-004 - the current model for generating embeddings,
text-multilingual-embedding-002 - an embedding model with multilingual support.

Image generation models:

imagen-3.0-fast-generate-001 - a model optimized for fast image generation,
imagen-3.0-generate-001 - a model focused on higher image quality.

Multimodal models

Models from the Gemini Flash family are multimodal, which means they can process text, images, and audio within a single model. This simplifies configuration and makes integration with different types of input more consistent.
