Here are the main types of predictions in Google Vertex AI, based on how models are deployed and used:
1) Online Predictions (Real-time)
Used when you need an immediate response from the model.
Best for:
- Chatbots
- Recommendation systems
- Fraud detection
- Real-time personalization
How it works:
- You send a request to a deployed model endpoint → Vertex AI returns a prediction within milliseconds.
Example use case:
You enter text into a chatbot, and it responds immediately.
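As a rough sketch, an online prediction call with the Vertex AI Python SDK (google-cloud-aiplatform) can look like the snippet below. The project, region, endpoint ID, and feature names are placeholders you would replace with your own values.

```python
from google.cloud import aiplatform

# Assumed placeholders: use your own project ID, region, and endpoint ID.
aiplatform.init(project="my-project", location="us-central1")

endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)

# Each instance must match the input schema of the deployed model.
response = endpoint.predict(instances=[{"feature_a": 1.2, "feature_b": "blue"}])
print(response.predictions)
```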
2) Batch Predictions (Offline)
Used when you have a large dataset and don’t need instant results.
Best for:
- Scoring large datasets
- Monthly/weekly analytics
- Data processing jobs
How it works:
- You point Vertex AI at input data (e.g., a CSV file in Cloud Storage or a BigQuery table)
- Vertex AI processes it in bulk as an asynchronous job
- Results are written back to Cloud Storage or BigQuery when the job finishes
Example use case:
Predict churn probability for 1 million customers overnight.
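A batch job can be launched from the same SDK. This is a minimal sketch, assuming a model already registered in Vertex AI and illustrative Cloud Storage paths:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Assumed placeholders: your own registered model ID and bucket paths.
model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/1234567890"
)

batch_job = model.batch_predict(
    job_display_name="churn-scoring-nightly",
    gcs_source="gs://my-bucket/input/customers.csv",
    instances_format="csv",
    gcs_destination_prefix="gs://my-bucket/output/",
    machine_type="n1-standard-4",
    sync=True,  # block until the job completes
)
print(batch_job.output_info)
```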
3) Streaming Predictions
Used when predictions must be made continuously on incoming data.
Best for:
- IoT (Internet of Things)
- Real-time event processing
- Live data feeds
How it works:
- Data flows in continuously (e.g., from Pub/Sub)
- Each event is sent to the model and a prediction is returned as it arrives
Example use case:
Predict equipment failure from live sensor data.
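Vertex AI does not expose a separate streaming prediction API for most model types; a common pattern is to read events from a Pub/Sub subscription and send each one to an online endpoint. The sketch below assumes a hypothetical sensor-readings subscription and an already-deployed endpoint:

```python
import json

from google.cloud import aiplatform, pubsub_v1

# Assumed placeholders: project, subscription, and endpoint are illustrative.
aiplatform.init(project="my-project", location="us-central1")
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "sensor-readings")

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    # Each Pub/Sub message carries one sensor reading as JSON.
    reading = json.loads(message.data.decode("utf-8"))
    prediction = endpoint.predict(instances=[reading])
    print("Failure prediction:", prediction.predictions[0])
    message.ack()

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
streaming_pull.result()  # block and keep processing messages as they arrive
```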
4) AutoML Predictions
Predictions from models trained using Vertex AI AutoML (no coding required).
Types include:
- Tabular data (AutoML Tables)
- Image data (AutoML Vision)
- Text data (AutoML Text)
- Video data (AutoML Video)
Best for:
Business users or teams without deep ML expertise.
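At serving time, an AutoML-trained model behaves like any other registered model: deploy it to an endpoint, then request predictions. A minimal sketch, where the model ID, feature names, and machine type are placeholders (exact deploy parameters vary by data type):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Assumed placeholder: the ID of a model trained with Vertex AI AutoML.
model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/1234567890"
)

# Deploy once, then query it like any other online endpoint.
endpoint = model.deploy(machine_type="n1-standard-2")
response = endpoint.predict(
    instances=[{"age": 42, "plan": "premium", "tenure_months": 18}]
)
print(response.predictions)
```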
5) Custom Model Predictions
Predictions from models you train yourself (TensorFlow, PyTorch, scikit-learn, etc.).
Best for:
- Advanced ML teams
- Research or complex use cases
- Highly customized AI models
You deploy your model to Vertex AI and then call it for predictions.
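A minimal sketch of that flow with the Python SDK, assuming model artifacts already exported to Cloud Storage and one of Google's prebuilt serving containers (the exact container URI depends on your framework and version):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Assumed placeholders: artifact path and serving image are illustrative.
model = aiplatform.Model.upload(
    display_name="my-custom-model",
    artifact_uri="gs://my-bucket/models/fraud-detector/",
    # A prebuilt scikit-learn serving image; check the current list of
    # prebuilt prediction containers for the version you need.
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest"
    ),
)

endpoint = model.deploy(machine_type="n1-standard-2")
response = endpoint.predict(instances=[[0.3, 1.7, 0.0, 5.2]])
print(response.predictions)
```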
6) Foundation Model Predictions (Generative AI)
Using Google’s prebuilt models like:
- Gemini (text, chat, multimodal)
- Imagen (image generation)
- Codey (code generation)
Examples:
- Generate text
- Summarize documents
- Create images
- Answer questions
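With the Vertex AI SDK's generative AI module, calling Gemini can look like the sketch below; the project, region, and model name are examples, and model availability varies by region:

```python
import vertexai
from vertexai.generative_models import GenerativeModel

# Assumed placeholders: project, region, and model name may differ for you.
vertexai.init(project="my-project", location="us-central1")

model = GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Summarize this paragraph in one sentence: ...")
print(response.text)
```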
Simple Summary Table
| Prediction Type | Use Case | Response Time |
| --- | --- | --- |
| Online | Chatbots, real-time apps | Very fast |
| Batch | Large datasets | Slow |
| Streaming | Live data | Continuous |
| AutoML | No-code ML | Varies |
| Custom Model | Advanced ML | Varies |
| Foundation Model | Generative AI | Fast |