Google Vertex AI
Google Vertex AI provides access to the Gemini family of models: multimodal models that support text, images, audio, and video. Some Gemini versions offer context windows of up to 2 million tokens, and the family combines high performance with competitive pricing.
When to choose Google Vertex AI
Large documents and long-context scenarios:
- analysis of multi-page documents,
- processing long conversations and extended histories,
- working with context spanning entire knowledge bases.
Multimedia processing:
- image analysis and object detection,
- audio transcription and analysis,
- video processing.
Cost optimization:
- Gemini Flash models offer an excellent price-to-performance ratio,
- lower costs for high query volumes.
GCP integration:
- you are already using Google Cloud Platform,
- you need RAG capabilities based on Vertex AI Search.
Requirements
- a Google Cloud Platform (GCP) account,
- a GCP project with the Vertex AI API enabled,
- a Service Account Key in JSON format,
- a Google Cloud Storage bucket for file processing.
Step 1: Prepare your Google Cloud environment
1. Create a Service Account
- Go to Google Cloud Console.
- Select an existing project or create a new one.
- Go to IAM & Admin > Service Accounts.
- Click Create Service Account.
- Enter a name (e.g., aiproxy-vertex).
- Assign the following roles: Vertex AI User, Storage Object Admin (for the bucket).
- Click Create key > JSON and download the key file.
2. Enable the required APIs
- Go to APIs & Services > Library.
- Enable the following APIs:
- Vertex AI API,
- Cloud Storage API.
3. Create a Cloud Storage bucket
- Go to Cloud Storage > Buckets.
- Click Create bucket.
- Enter a name (e.g., aiproxy-files).
- Select a region (e.g., us-central1).
- Click Create.
A Cloud Storage bucket is required for processing files such as images, audio, and documents with Gemini models.
Step 2: Configure AI Proxy
Example aiconfiguration.json
{
  "ProviderConnections": {
    "GoogleVertex": {
      "Description": "Google Vertex AI Connection",
      "Type": "Gemini",
      "ProviderConfiguration": {
        "ApiKey": "your-google-api-key-if-available",
        "ServiceAccount": "{\"type\":\"service_account\",\"project_id\":\"your-project\",\"private_key_id\":\"...\",\"private_key\":\"-----BEGIN PRIVATE KEY-----\\n...\\n-----END PRIVATE KEY-----\\n\",\"client_email\":\"aiproxy-vertex@your-project.iam.gserviceaccount.com\",\"client_id\":\"...\",\"auth_uri\":\"https://accounts.google.com/o/oauth2/auth\",\"token_uri\":\"https://oauth2.googleapis.com/token\"}",
        "ProjectId": "your-gcp-project-id",
        "Region": "us-central1",
        "BucketName": "aiproxy-files"
      }
    }
  },
  "ProviderModels": [
    {
      "ConnectionName": "GoogleVertex",
      "Priority": 100,
      "Name": "Gemini Flash",
      "Description": "",
      "TextModel": {
        "ModelName": "gemini-2.0-flash-exp"
      },
      "ImageModel": {
        "ModelName": "imagen-3.0-fast-generate-001"
      },
      "AudioModel": {
        "ModelName": "gemini-2.0-flash-exp"
      },
      "EmbeddingModel": {
        "ModelName": "text-embedding-004"
      }
    }
  ],
  "MethodTypesConfiguration": {
    "ConciergePrompt": [ "Gemini Flash" ],
    "ConciergeExecuteTool": [ "Gemini Flash" ]
  }
}
- ServiceAccount - paste the full contents of the downloaded JSON file as a single string, with quotes properly escaped.
- ProjectId - the project ID from Google Cloud.
- Region - the region where you have Vertex AI enabled (e.g., us-central1 or europe-west1).
- BucketName - the name of the Cloud Storage bucket you created.
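The ServiceAccount value can be produced locally instead of being escaped by hand. The following is a minimal Python sketch, assuming the downloaded key is valid JSON; the function name and the shortened sample key are hypothetical and serve only as an illustration.

```python
import json


def escape_service_account(key_json_text):
    """Return the key file contents as one escaped JSON string literal."""
    key = json.loads(key_json_text)  # fails early if the key is not valid JSON
    compact = json.dumps(key, separators=(",", ":"))  # single line, no extra whitespace
    return json.dumps(compact)  # second pass escapes quotes and backslashes


# Hypothetical, shortened key; a real key file contains more fields.
sample_key = (
    '{"type": "service_account", "project_id": "your-project", '
    '"private_key": "-----BEGIN PRIVATE KEY-----\\nABC\\n-----END PRIVATE KEY-----\\n"}'
)

# Prints a line that can be pasted into ProviderConfiguration.
print('"ServiceAccount": ' + escape_service_account(sample_key) + ",")
```

To process a real key, pass the contents of the downloaded file to escape_service_account instead of sample_key. Running the conversion locally also keeps the private key off third-party websites.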
The recommended models are those from the Gemini Flash and Gemini Flash-Lite families. They are multimodal models that support text, images, and audio, and their performance is sufficient for most use cases.
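In the example configuration, ConnectionName and the entries under MethodTypesConfiguration appear to refer to names defined elsewhere in the same file. The sketch below checks those cross-references before deployment. It assumes the linking works the way the example suggests (this is inferred from the example, not from an AI Proxy specification), and the function name is hypothetical.

```python
import json


def find_reference_problems(config):
    """Return a list of names that are used but never defined."""
    problems = []
    connections = config.get("ProviderConnections", {})
    models = config.get("ProviderModels", [])
    model_names = {model.get("Name") for model in models}

    # Every model should point at a connection defined in ProviderConnections.
    for model in models:
        connection = model.get("ConnectionName")
        if connection not in connections:
            problems.append(
                "model {!r} uses unknown connection {!r}".format(model.get("Name"), connection)
            )

    # Every method type should list only names defined in ProviderModels.
    for method, names in config.get("MethodTypesConfiguration", {}).items():
        for name in names:
            if name not in model_names:
                problems.append("{} refers to unknown model {!r}".format(method, name))
    return problems


# Trimmed copy of the example configuration, with one deliberate typo.
config = json.loads("""
{
  "ProviderConnections": {"GoogleVertex": {"Type": "Gemini"}},
  "ProviderModels": [{"ConnectionName": "GoogleVertex", "Name": "Gemini Flash"}],
  "MethodTypesConfiguration": {
    "ConciergePrompt": ["Gemini Flash"],
    "ConciergeExecuteTool": ["Gemini Flahs"]
  }
}
""")
for problem in find_reference_problems(config):
    print(problem)
```

An empty result means every referenced name is defined. It does not confirm that the model names themselves are available in your project or region.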
Example docker-compose.yml
name: aiproxy_containers
services:
  ai-proxy:
    image: webconbps/aiproxy:1.0.0.235
    container_name: ai-proxy
    restart: unless-stopped
    ports:
      - "5298:8080"
      - "7033:8081"
    environment:
      - ASPNETCORE_ENVIRONMENT=Production
      - AppConfiguration__SelfHosted__Certificate__Path=/app/https/certificate.pem
      - Logging__LogLevel__Default=Information
      - Logging__LogLevel__Microsoft=Warning
    volumes:
      - ./certificates/certificate.pem:/app/https/certificate.pem:ro
      - ./aiconfiguration.json:/app/aiconfiguration.json:ro
Step 3: Startup
# Make sure the following files are in place:
# - ./certificates/certificate.pem
# - ./aiconfiguration.json (with the full Service Account JSON contents)
# Start the container
docker-compose up -d
# Check logs
docker-compose logs -f ai-proxy
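The file check mentioned in the startup comments can be automated. With the short volume syntax used in the example docker-compose.yml, Docker typically creates a missing host path as an empty directory, which leads to less obvious errors later. The following is a minimal sketch; the function name is hypothetical, and the paths are taken from the volumes section of the example.

```python
from pathlib import Path

# Paths taken from the volumes section of the example docker-compose.yml.
REQUIRED_FILES = ("certificates/certificate.pem", "aiconfiguration.json")


def missing_startup_files(base_dir="."):
    """Return the required files that do not exist under base_dir."""
    base = Path(base_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


missing = missing_startup_files(".")
if missing:
    print("Missing before startup:", ", ".join(missing))
else:
    print("All required files are present.")
```

Run the script from the directory that contains docker-compose.yml, before starting the container.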
Troubleshooting
Error: Permission denied / 403 Forbidden
Possible causes:
- the Service Account does not have the required permissions,
- the required APIs are not enabled in the project.
Solution:
# Check that the Service Account has these roles:
# - Vertex AI User
# - Storage Object Admin
# Check that these APIs are enabled:
# - Vertex AI API
# - Cloud Storage API
# Restart the container
docker-compose restart ai-proxy
Error: Invalid Service Account JSON
Possible causes:
- the JSON value provided in ServiceAccount is not in a valid format,
- quotation marks in the JSON content are not properly escaped.
Solution:
# ServiceAccount must be a JSON string with escaped quotes
# Example of the correct format:
# "ServiceAccount": "{\"type\":\"service_account\",\"project_id\":\"my-project\"...}"
# You can use an online tool to escape the JSON string
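To narrow down which of the two causes applies, the ServiceAccount value can be checked after aiconfiguration.json has been loaded. The sketch below is an assumption-based helper, not part of AI Proxy. The list of expected fields is taken from the example key in this guide, and real key files may contain more.

```python
import json

# Fields that appear in the example key in this guide; real keys may have more.
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account(value):
    """Return a list of problems found in a ServiceAccount value."""
    if not isinstance(value, str):
        return ["ServiceAccount must be a string, not a nested JSON object"]
    try:
        key = json.loads(value)
    except json.JSONDecodeError as exc:
        return ["ServiceAccount is not valid JSON: {} at position {}".format(exc.msg, exc.pos)]
    if not isinstance(key, dict):
        return ["ServiceAccount must contain a JSON object"]
    problems = ["missing field: " + name for name in EXPECTED_FIELDS if name not in key]
    if key.get("type") != "service_account":
        problems.append('"type" should be "service_account"')
    return problems


# A truncated value, as might result from a partial copy and paste.
print(check_service_account('{"type": "service_account"'))
```

Here value is what a JSON parser returns for the ServiceAccount property, that is, the string after one level of unescaping. If loading aiconfiguration.json itself fails, the quotes inside the value are most likely not escaped.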
Error: Bucket not found
Possible causes:
- the bucket with the specified name does not exist,
- the Service Account does not have permission to access the bucket.
Solution:
# Check that the bucket exists in Cloud Storage
# Make sure the Service Account has the Storage Object Admin role
# Check that BucketName in the configuration is correct
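Some BucketName mistakes can be caught locally before contacting Cloud Storage. The sketch below checks only a subset of the Cloud Storage bucket naming rules (allowed characters, first and last character, length). It cannot tell whether the bucket exists or is accessible. The function name is hypothetical, and treating a gs:// prefix as an error is an assumption based on the bare name used in the example configuration.

```python
import re

# Subset of the Cloud Storage naming rules: lowercase letters, digits,
# dashes, underscores and dots, starting and ending with a letter or digit.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]*[a-z0-9]")


def bucket_name_problems(name):
    """Return a list of likely mistakes in a BucketName value."""
    problems = []
    if name != name.strip():
        problems.append("leading or trailing whitespace")
    if name.startswith("gs://"):
        problems.append("gs:// prefix (assumed unwanted; the example uses the bare name)")
    max_length = 222 if "." in name else 63  # dotted names may be longer
    if not 3 <= len(name) <= max_length:
        problems.append("length outside the allowed range")
    if not _NAME_PATTERN.fullmatch(name):
        problems.append("invalid characters, or does not start and end with a letter or digit")
    return problems


for candidate in ("aiproxy-files", "gs://aiproxy-files", "AIProxy-Files"):
    print(candidate, "->", bucket_name_problems(candidate) or "looks fine")
```

A name that passes this check can still be wrong, for example when it belongs to a different project or contains a typo that happens to be a valid name.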
Popular Gemini models
The following models are recommended for use with AI Proxy.
Text and multimodal models:
- gemini-2.0-flash-exp - the latest fast multimodal model, supporting text, images, and audio,
- gemini-2.0-flash-lite - a lighter variant that still supports multimodal input,
- gemini-1.5-flash - a proven multimodal model suitable for a wide range of use cases,
- gemini-1.5-pro - a larger model that offers broader capabilities and stronger performance in more demanding scenarios.
Embedding models:
- text-embedding-004 - the current model for generating embeddings,
- text-multilingual-embedding-002 - an embedding model with multilingual support.
Image generation models:
- imagen-3.0-fast-generate-001 - a model optimized for fast image generation,
- imagen-3.0-generate-001 - a model focused on higher image quality.
Models from the Gemini Flash family are multimodal, which means they can process text, images, and audio within a single model. This simplifies configuration and makes integration with different types of input more consistent.