By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.
Azure AI Vision is a suite of pre-built computer vision APIs that extract text (OCR), analyze spatial relationships (people/object detection in video), and interpret visual content (image tagging, object detection). It’s critical in ML pipelines where unstructured image/video data must be converted into structured insights—e.g., automating invoice processing (OCR), monitoring retail foot traffic (Spatial Analysis), or validating IDs in banking. Unlike custom-trained models (e.g., Azure Custom Vision), these APIs require no training data and deploy in minutes, making them ideal for rapid prototyping or low-code solutions.
Azure AI Vision (Computer Vision API): Microsoft’s managed service for image analysis (tagging, object detection, OCR, facial analysis). Best for pre-built models with no training required. Supports batch and real-time processing.
Read API (OCR): A specialized OCR endpoint within Azure AI Vision optimized for printed and handwritten text (e.g., receipts, forms, PDFs). Handles multi-language and complex layouts (tables, mixed fonts).
Spatial Analysis: A video analytics feature that detects people/objects in space (e.g., crowd density, social distancing, queue length). Uses RTSP streams (security cameras) or pre-recorded videos. Outputs JSON events (e.g., "Person entered zone A at 10:02 AM").
Azure Video Analyzer (AVA): A hybrid service (cloud + edge) for real-time video processing (e.g., object tracking, anomaly detection). Often paired with Spatial Analysis for IoT scenarios (e.g., smart cities).
Azure Form Recognizer: A document-focused OCR service with pre-built models for invoices, receipts, IDs, and business cards. Better than Read API for structured forms (e.g., extracting line items from a receipt).
Azure Cognitive Search: A search engine that indexes OCR output (e.g., PDFs, images) for full-text search. Often used with AI Vision to enable searchable archives (e.g., legal documents).
Azure IoT Edge: Deploys AI Vision models to edge devices (e.g., cameras, drones) for low-latency processing without cloud dependency. Critical for offline scenarios (e.g., oil rigs, ships).
Bounding Box: A rectangle (x,y coordinates) marking the location of detected text/objects in an image. Returned by OCR and Spatial Analysis for downstream processing (e.g., cropping, redacting).
Confidence Score: A 0–1 probability indicating how certain the model is about a prediction (e.g., "92% confidence this is the word 'Invoice'"). Used to filter low-quality results in production.
RTSP (Real-Time Streaming Protocol): A video streaming protocol used by cameras to send live feeds to Spatial Analysis or Video Analyzer. Requires on-premises or edge deployment for real-time processing.
Batch vs. Real-Time Processing:
Scenario: Extract text from 10,000 scanned invoices stored in Azure Blob Storage.
Note the endpoint (e.g., https://<your-resource>.cognitiveservices.azure.com/) and API key.
https://<your-resource>.cognitiveservices.azure.com/
Upload images to Azure Blob Storage
invoices-input
Upload PDFs/images (supported formats: JPEG, PNG, PDF, TIFF).
Call the Read API (Batch or Real-Time)
POST /vision/v3.2/read/analyze
GET /vision/v3.2/read/operations/{operationId}
Real-Time (sync):
Process the OCR output
"Invoice #: 12345"
Store results in Azure SQL Database or Cosmos DB for downstream use.
Automate with Azure Functions
Scenario: Count customers entering a store and measure queue wait times using security cameras.
Configure an IoT Edge device (e.g., NVIDIA Jetson) to run the AVA module near the camera.
Connect the camera feed
rtsp://<camera-ip>/stream
Register the camera in AVA and assign a topology (e.g., "RetailAnalytics").
Define spatial zones and rules
Set rules (e.g., "Alert if >5 people in queue for >10 minutes").
Deploy the Spatial Analysis model
The model processes video locally (no cloud dependency) and sends JSON events to Azure Event Hubs.
Visualize insights in Power BI
Search: Azure Cognitive Search (index OCR output) vs. AI Vision (extract text).
Key Constraints
Spatial Analysis:
Tricky Scenarios
"Which service to search OCR’d documents?"-Azure Cognitive Search.
Cost Optimization
Explanation: Spatial Analysis requires edge deployment for offline/real-time processing. IoT Edge runs the model locally on the camera’s network.
A company needs to extract line items from 50,000 PDF invoices stored in Azure Blob Storage. Which service offers the highest accuracy for this task?
Explanation: Form Recognizer is optimized for structured documents (invoices, receipts) and extracts key-value pairs (e.g., "Item: Laptop, Price: $999").
A developer is building a mobile app that scans handwritten notes in real-time. Which Azure service should they call from the app?
Join 4M+ learners. Unlock unlimited quizzes, wrong-answer tracking, flashcards + reminders, study guides, and 1-on-1 challenges.