Fatskills
Practice. Master. Repeat.
Study Guide: Cloud ML - Azure AI Engineer Associate (Exam AI-102): Azure AI Vision (OCR, Spatial Analysis)
Source: https://www.fatskills.com/hesi/chapter/cloud-ml-cert-azure-ai-azure-ai-vision-ocr-spatial-analysis

Cloud ML - Azure AI Engineer Associate (Exam AI-102): Azure AI Vision (OCR, Spatial Analysis)

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

Azure AI Vision (OCR, Spatial Analysis) – AI-102 Exam-Ready Study Guide

What This Is

Azure AI Vision is a suite of pre-built computer vision APIs that extract text (OCR), analyze people's presence and movement in a physical space from video (Spatial Analysis), and interpret visual content (image tagging, object detection). It's critical in ML pipelines where unstructured image/video data must be converted into structured insights, e.g., digitizing the text of scanned invoices (OCR), monitoring retail foot traffic (Spatial Analysis), or reading printed details from ID images in banking. Unlike custom-trained models (e.g., Azure Custom Vision), these APIs require no training data and deploy in minutes, making them ideal for rapid prototyping or low-code solutions.

Key Terms & Services

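Azure AI Vision (formerly Computer Vision API): Microsoft's managed service for image analysis (tagging, object detection, OCR). Face detection and recognition belong to the separate Azure AI Face service. Best for pre-built models with no training required. Offers synchronous calls (Image Analysis) and asynchronous calls (Read API for large or multi-page files).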
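Read API (OCR): An asynchronous OCR operation within Azure AI Vision for printed and handwritten text in images and multi-page documents (e.g., scanned letters, receipts, PDFs). Handles multiple languages and dense, mixed-font pages, but returns only lines and words with their positions, not table or key-value structure.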
Spatial Analysis: A video analytics feature of Azure AI Vision that detects people's presence and movement in a physical space (e.g., people counting, social distancing, queue length). It runs as a container on an edge device and takes RTSP streams (security cameras) or pre-recorded video files as input. Outputs JSON events (e.g., "Person entered zone A at 10:02 AM").
Azure Video Analyzer (AVA): A hybrid (cloud + edge) video processing service that Microsoft retired in 2022. Older material pairs it with Spatial Analysis for IoT scenarios (e.g., smart cities); current Spatial Analysis deployments use IoT Edge and IoT Hub directly.
Azure Form Recognizer (renamed Azure AI Document Intelligence in 2023): A document-focused service with prebuilt models for invoices, receipts, IDs, and business cards, plus a layout model for tables. Better than the Read API for structured forms (e.g., extracting line items from a receipt).
Azure Cognitive Search (renamed Azure AI Search in 2023): A search engine that indexes OCR output (e.g., text extracted from PDFs and images) for full-text search. Often used with AI Vision to build searchable archives (e.g., legal documents).
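Azure IoT Edge: Deploys containerized AI Vision workloads (e.g., the Spatial Analysis or Read OCR containers) to on-premises devices near the cameras or data sources, for low-latency processing that keeps working during internet outages. Critical for intermittently connected sites (e.g., oil rigs, ships); the containers still need periodic connectivity to report usage for billing.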
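Bounding Box: Coordinates marking the location of detected text or objects in an image. The Read API returns a quadrilateral of four (x, y) corner points (eight numbers) so rotated text is captured; object detection returns an axis-aligned rectangle. Used downstream for cropping, redaction, and layout logic.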
Confidence Score: A 0–1 value indicating how certain the model is about a prediction (e.g., 0.92 confidence that a word reads "Invoice"); the Read API reports one per word. Used to filter low-quality results in production.
RTSP (Real-Time Streaming Protocol): The protocol IP cameras commonly use to send live feeds. Spatial Analysis consumes RTSP streams on an edge device close to the cameras.
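Batch vs. Real-Time Processing:
Batch: Process large volumes of images/documents on a schedule (e.g., nightly invoice processing). There is no separate batch endpoint; you make one asynchronous Read call per file stored in Azure Storage and collect the results.
Real-Time: Process individual images or live streams as they arrive (e.g., live retail analytics). Uses synchronous REST calls (Image Analysis) or edge containers on IoT Edge.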
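The Read API (v3.2) reports each `boundingBox` as eight numbers, the (x, y) corners of a quadrilateral, so rotated text is covered. Most cropping or redaction code wants an axis-aligned rectangle instead. A minimal sketch, assuming pixel coordinates; `quad_to_rect` and `pad_rect` are hypothetical helper names, not SDK functions:

```python
from typing import Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, width, height)


def quad_to_rect(bbox: Sequence[float]) -> Rect:
    """Smallest axis-aligned rectangle enclosing an 8-number quadrilateral
    [x1, y1, x2, y2, x3, y3, x4, y4]."""
    if len(bbox) != 8:
        raise ValueError("expected 8 numbers: four (x, y) corner points")
    xs = bbox[0::2]  # every x coordinate
    ys = bbox[1::2]  # every y coordinate
    left, top = min(xs), min(ys)
    return (left, top, max(xs) - left, max(ys) - top)


def pad_rect(rect: Rect, margin: float, image_width: int, image_height: int) -> Rect:
    """Grow a rectangle by `margin` pixels per side, clamped to the image,
    so a redaction box fully covers slightly tilted text."""
    left, top, width, height = rect
    new_left = max(0.0, left - margin)
    new_top = max(0.0, top - margin)
    right = min(float(image_width), left + width + margin)
    bottom = min(float(image_height), top + height + margin)
    return (new_left, new_top, right - new_left, bottom - new_top)


if __name__ == "__main__":
    tilted_word = [10, 20, 110, 25, 108, 45, 8, 40]
    print(quad_to_rect(tilted_word))  # (8, 20, 102, 25)
```

Taking the min/max over all four corners guarantees the rectangle covers rotated text, at the cost of some extra area; the padding adds a safety margin for redaction.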

Step-by-Step / Process Flow

1. OCR with Azure AI Vision (Read API)

Scenario: Extract text from 10,000 scanned invoices stored in Azure Blob Storage.

Create an Azure AI Vision resource
In the Azure portal, create a Computer Vision (Azure AI Vision) resource.
Note the endpoint (e.g., https://<your-resource>.cognitiveservices.azure.com/) and API key.
Upload images to Azure Blob Storage
Create a Blob Storage container (e.g., invoices-input).
Upload PDFs/images (formats supported by Read: JPEG, PNG, BMP, PDF, TIFF).
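Call the Read API (asynchronous)
Batch:
- For each blob, POST a JSON body {"url": "<Blob SAS URL>"} to /vision/v3.2/read/analyze. The service answers 202 Accepted with an Operation-Location header.
- Poll GET /vision/v3.2/read/analyzeResults/{operationId} (the Operation-Location URL) until status is "succeeded" or "failed".
Single image (near-real-time):
- Read v3.2 is asynchronous even for one image: send a URL as JSON or the raw bytes as application/octet-stream, then poll as above.
- For one synchronous call that returns text + bounding boxes directly, use the Read feature of Image Analysis 4.0.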
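The submit-then-poll protocol can be sketched in Python. This is an illustration under assumptions: it targets the v3.2 REST route with a JSON `url` body, `build_read_request` and `poll_read_result` are hypothetical names, and the HTTP GET is injected as a function so the logic runs without network access. The status values `notStarted`, `running`, `succeeded`, and `failed` are the ones Read v3.2 reports.

```python
import time
from typing import Callable, Dict, Tuple

READ_ANALYZE_PATH = "/vision/v3.2/read/analyze"


def build_read_request(endpoint: str, key: str, document_url: str
                       ) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Return (url, headers, json_body) for submitting one Read v3.2 job."""
    url = endpoint.rstrip("/") + READ_ANALYZE_PATH
    headers = {
        "Ocp-Apim-Subscription-Key": key,   # resource key from the portal
        "Content-Type": "application/json",
    }
    return url, headers, {"url": document_url}  # e.g. a Blob SAS URL


def poll_read_result(fetch_json: Callable[[str], dict], operation_url: str,
                     interval_s: float = 1.0, max_attempts: int = 60,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """GET the Operation-Location URL until the job succeeds or fails."""
    for _ in range(max_attempts):
        result = fetch_json(operation_url)
        status = result.get("status")
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError(f"Read operation failed: {operation_url}")
        sleep(interval_s)  # "notStarted" or "running": wait, then retry
    raise TimeoutError(f"Read operation not done after {max_attempts} polls")

# Real use (needs the third-party `requests` package and network access):
#   url, headers, body = build_read_request(ENDPOINT, KEY, sas_url)
#   resp = requests.post(url, headers=headers, json=body)
#   resp.raise_for_status()                       # expect 202 Accepted
#   op_url = resp.headers["Operation-Location"]
#   result = poll_read_result(
#       lambda u: requests.get(u, headers={"Ocp-Apim-Subscription-Key": KEY}).json(),
#       op_url)
```

Injecting `fetch_json` and `sleep` keeps the polling logic testable; a fixed one-second interval is a simple choice, and a production client might back off instead.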
Process the OCR output
Parse the JSON to extract:
- Detected text (e.g., "Invoice #: 12345").
- Bounding boxes (for cropping/redaction).
- Confidence scores (filter low-confidence results).
Store results in Azure SQL Database or Cosmos DB for downstream use.
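A sketch of the parsing and filtering step, assuming the Read v3.2 result shape (`analyzeResult.readResults[].lines[].words[]`, with a `confidence` per word). The 0.8 threshold and the accept/flag policy are illustrative choices, and `split_by_confidence` is a hypothetical name. The sample is hand-written and trimmed: real responses also carry per-word bounding boxes and more fields.

```python
from typing import Any, Dict, List, Tuple


def split_by_confidence(result: Dict[str, Any], min_confidence: float = 0.8
                        ) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Return (accepted_lines, flagged_words) from a Read v3.2 result.

    A line is accepted only if every word meets min_confidence; otherwise
    its low-confidence words are collected for human review."""
    accepted: List[Dict[str, Any]] = []
    flagged: List[Dict[str, Any]] = []
    for page in result["analyzeResult"]["readResults"]:
        for line in page.get("lines", []):
            low = [w for w in line.get("words", [])
                   if w.get("confidence", 0.0) < min_confidence]
            if low:
                flagged.extend({"page": page["page"], "text": w["text"],
                                "confidence": w.get("confidence", 0.0)} for w in low)
            else:
                accepted.append({"page": page["page"], "text": line["text"],
                                 "boundingBox": line.get("boundingBox")})
    return accepted, flagged


# Hand-written, trimmed example in the v3.2 shape (not a real response).
SAMPLE = {
    "status": "succeeded",
    "analyzeResult": {"readResults": [{"page": 1, "lines": [
        {"text": "Invoice #: 12345",
         "boundingBox": [10, 10, 200, 10, 200, 30, 10, 30],
         "words": [{"text": "Invoice", "confidence": 0.99},
                   {"text": "#:", "confidence": 0.95},
                   {"text": "12345", "confidence": 0.97}]},
        {"text": "Total: $1O0",
         "boundingBox": [10, 40, 150, 40, 150, 60, 10, 60],
         "words": [{"text": "Total:", "confidence": 0.98},
                   {"text": "$1O0", "confidence": 0.41}]},
    ]}]},
}

if __name__ == "__main__":
    ok, review = split_by_confidence(SAMPLE)
    print([line["text"] for line in ok])  # ['Invoice #: 12345']
    print(review)  # [{'page': 1, 'text': '$1O0', 'confidence': 0.41}]
```

Rejecting the whole line when one word is uncertain is deliberately conservative: a misread digit in "Total" matters more than losing a line to manual review.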
Automate with Azure Functions
Trigger a Function on new Blob uploads to call the Read API.
Use Durable Functions to orchestrate the submit-and-poll cycle across many files (fan-out/fan-in).
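Durable Functions' fan-out/fan-in pattern starts one activity per blob and gathers the results. The sketch below imitates that shape locally with a thread pool; it is not Durable Functions code. `process_blobs` is a hypothetical name, `analyze` stands for any function that submits one blob URL and polls for its result, and `max_workers` caps concurrent calls so the resource's rate limit is not exceeded (the right value depends on your pricing tier).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def process_blobs(blob_urls: Iterable[str], analyze: Callable[[str], dict],
                  max_workers: int = 4) -> Dict[str, dict]:
    """Fan out one OCR job per distinct blob URL, then fan in the results.

    A failure is recorded per blob instead of aborting the whole batch,
    so failed blobs can be retried or dead-lettered afterwards."""
    urls = list(dict.fromkeys(blob_urls))  # de-duplicate, keep order
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(analyze, url) for url in urls}  # fan-out
        for url, future in futures.items():                          # fan-in
            try:
                results[url] = {"ok": True, "result": future.result()}
            except Exception as exc:
                results[url] = {"ok": False, "error": str(exc)}
    return results


if __name__ == "__main__":
    def fake_analyze(url: str) -> dict:  # stand-in for submit-and-poll
        if url.endswith("corrupt.pdf"):
            raise ValueError("unsupported file")
        return {"status": "succeeded"}

    report = process_blobs(["inv1.pdf", "corrupt.pdf", "inv2.pdf"], fake_analyze)
    print({u: r["ok"] for u, r in report.items()})
    # {'inv1.pdf': True, 'corrupt.pdf': False, 'inv2.pdf': True}
```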

2. Spatial Analysis for Retail Foot Traffic

Scenario: Count customers entering a store and measure queue wait times using security cameras.

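Set up the edge host
Create a Computer Vision (Azure AI Vision) resource, which the container uses for billing, and an IoT Hub in the Azure portal.
Register an IoT Edge device: a host with a supported NVIDIA GPU (e.g., Azure Stack Edge) on the same network as the cameras.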
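Connect the camera feed
Configure the camera to stream via RTSP (e.g., rtsp://<camera-ip>/stream).
Put the RTSP URL into the Spatial Analysis module's deployment manifest and choose an operation per goal (e.g., person counting in a zone for the queue, line crossing for the entrance).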
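Define spatial zones and rules
In the operation's configuration, define in normalized image coordinates:
- Entry/exit lines (e.g., "Front Door").
- Queue zones as polygons (e.g., "Checkout Line").
Spatial Analysis only emits events for these zones; rules such as "Alert if >5 people in queue for >10 minutes" are evaluated downstream (e.g., in Stream Analytics or an Azure Function).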
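Zone membership boils down to a point-in-polygon test. This sketch shows the geometry only, not how Spatial Analysis implements it internally. It assumes normalized [0, 1] image coordinates and uses the bottom-center of each person's bounding box (roughly where the feet are) as the reference point; both are assumptions, and `point_in_polygon` and `count_in_zone` are hypothetical helpers.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom), normalized


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: a point is inside if a ray going right from it crosses
    the polygon's edges an odd number of times."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's height (so y2 != y1)
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_in_zone(boxes: Sequence[Box], zone: Sequence[Point]) -> int:
    """Count people whose bottom-center point (approximate foot position)
    lies inside the zone polygon."""
    return sum(1 for (left, _top, right, bottom) in boxes
               if point_in_polygon(((left + right) / 2, bottom), zone))


if __name__ == "__main__":
    checkout_line = [(0.2, 0.2), (0.6, 0.2), (0.6, 0.9), (0.2, 0.9)]
    people = [(0.3, 0.3, 0.5, 0.8), (0.65, 0.3, 0.8, 0.7), (0.0, 0.1, 0.1, 0.5)]
    print(count_in_zone(people, checkout_line))  # 1
```

Normalized coordinates keep the zone definition independent of the camera's resolution.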
Deploy the Spatial Analysis module
Apply the deployment manifest to the IoT Edge device.
The container processes video locally (no cloud dependency for inference) and sends JSON events, not video, through IoT Edge to Azure IoT Hub.
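A downstream rule such as "alert if more than 5 people are in the queue for more than 10 minutes" needs state across events. A minimal sketch in Python, assuming the person-count events have already been reduced to time-ordered `(timestamp_seconds, count)` pairs (a simplification; the real event JSON carries more fields). In production the same logic would typically live in a Stream Analytics query or an Azure Function; `first_sustained_breach` is a hypothetical name.

```python
from typing import Iterable, Optional, Tuple


def first_sustained_breach(samples: Iterable[Tuple[float, int]],
                           max_people: int = 5,
                           min_duration_s: float = 600.0) -> Optional[float]:
    """Return the timestamp of the first sample at which the queue count has
    stayed above max_people for more than min_duration_s, else None.

    samples: time-ordered (timestamp_seconds, person_count) pairs; each count
    is assumed to hold until the next sample arrives."""
    breach_start: Optional[float] = None
    for ts, count in samples:
        if count > max_people:
            if breach_start is None:
                breach_start = ts          # queue just went over the limit
            if ts - breach_start > min_duration_s:
                return ts                  # over the limit for too long: alert
        else:
            breach_start = None            # queue recovered: reset the timer
    return None


if __name__ == "__main__":
    counts = [(0, 3), (60, 6), (120, 7), (300, 6), (661, 8), (700, 2)]
    print(first_sustained_breach(counts))  # 661
```

Resetting the timer whenever the count drops back to the limit means brief dips restart the clock; a looser rule could tolerate short dips instead.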
Visualize insights in Power BI
Use IoT Hub as the input of an Azure Stream Analytics job to aggregate the events.
Build a Power BI dashboard showing:
- Peak foot traffic hours.
- Average queue wait time.
- Heatmaps of customer movement.
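Stream Analytics would compute these metrics with windowed queries; the arithmetic behind two of the dashboard tiles is sketched in Python below. It assumes each event has already been reduced to a hypothetical `(person_id, zone, "enter" | "exit", time)` tuple with an anonymous tracking ID per person; that shape is an illustration, not the Spatial Analysis event schema, and `hourly_entries` and `average_wait_s` are hypothetical names.

```python
from collections import Counter
from datetime import datetime
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical, simplified event: (person_id, zone, "enter" | "exit", time)
Event = Tuple[str, str, str, datetime]


def hourly_entries(events: Sequence[Event], zone: str) -> Counter:
    """Count 'enter' events per hour of day for one zone (foot traffic)."""
    return Counter(t.hour for _pid, z, kind, t in events
                   if z == zone and kind == "enter")


def average_wait_s(events: Sequence[Event], zone: str) -> Optional[float]:
    """Mean seconds between each person's enter and exit of a zone.
    Unmatched events (still waiting, or exit without enter) are skipped."""
    entered: Dict[str, datetime] = {}
    waits: List[float] = []
    for pid, z, kind, t in sorted(events, key=lambda e: e[3]):
        if z != zone:
            continue
        if kind == "enter":
            entered[pid] = t
        elif kind == "exit" and pid in entered:
            waits.append((t - entered.pop(pid)).total_seconds())
    return mean(waits) if waits else None


if __name__ == "__main__":
    def at(hour: int, minute: int) -> datetime:
        return datetime(2024, 5, 1, hour, minute)

    events = [
        ("p1", "Front Door", "enter", at(10, 0)),
        ("p2", "Front Door", "enter", at(10, 30)),
        ("p3", "Front Door", "enter", at(11, 5)),
        ("p1", "Checkout Line", "enter", at(10, 20)),
        ("p1", "Checkout Line", "exit", at(10, 26)),
        ("p2", "Checkout Line", "enter", at(10, 40)),
        ("p2", "Checkout Line", "exit", at(10, 44)),
        ("p3", "Checkout Line", "enter", at(11, 15)),  # still waiting
    ]
    print(hourly_entries(events, "Front Door").most_common(1))  # [(10, 2)]
    print(average_wait_s(events, "Checkout Line"))              # 300.0
```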

Common Mistakes

Mistake	Correction
Using Read API for structured forms (e.g., invoices).	Use Azure AI Document Intelligence (formerly Form Recognizer) instead: it's optimized for key-value pairs (e.g., "Total: $100") and tables. Read API is better for unstructured text (e.g., books, signs).
Assuming Spatial Analysis works with cloud-only video.	Spatial Analysis runs only as a container on an edge device (deployed with IoT Edge) and reads RTSP streams or pre-recorded video files available to that device. It cannot analyze videos stored in Blob Storage directly.
Ignoring confidence scores in OCR output.	Always filter low-confidence results (e.g., below 0.8; tune the threshold on your own data) to avoid errors in downstream systems (e.g., billing). Use Azure Functions to auto-reject or flag low-confidence extractions for review.
Assuming the cloud API works offline.	The cloud endpoints require internet access. For offline or low-latency scenarios, run Azure AI Vision containers on-premises (with Docker or IoT Edge); Spatial Analysis is deployed through IoT Edge.
Mixing up Azure AI Vision and Custom Vision.	- AI Vision: Pre-built models (no training needed). - Custom Vision: Train your own models (e.g., detect custom objects like "defective widgets").

Certification Exam Insights

Service Selection Traps
OCR: Know when to use the Read API (general text) vs. Document Intelligence, formerly Form Recognizer (structured documents).
Video Analytics: Spatial Analysis (people presence and movement in live video, at the edge) vs. Azure AI Video Indexer (insights from recorded video). Azure Video Analyzer was retired in 2022.
Search: Azure AI Search, formerly Cognitive Search (indexes OCR output) vs. AI Vision (extracts text).
Key Constraints
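Read API:
- Read v3.2 is always asynchronous (submit, then poll). Paid tiers accept files up to 500 MB and PDF/TIFF files up to 2,000 pages; the free F0 tier caps files at 4 MB and processes only the first 2 pages.
- Each call takes one file; "batch" means many calls. The 4 MB limit often quoted applies to the older synchronous Image Analysis v3.2 call; Image Analysis 4.0 accepts images under 20 MB.
Spatial Analysis:
- Live input must be RTSP (not HTTP video streams).
- Runs only at the edge, as a container deployed with IoT Edge on a GPU host; there is no cloud-hosted Spatial Analysis endpoint.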
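Size and format limits translate into a simple pre-flight routing check before any call is made. A sketch under stated assumptions: it assumes a 20 MB limit for synchronous Image Analysis 4.0 calls and 500 MB for asynchronous Read v3.2 on a paid tier (verify against current docs), and the route names and `choose_ocr_route` are hypothetical. Multi-page formats (PDF, TIFF) always go to asynchronous Read, since the synchronous call handles single images.

```python
import os

# Assumed limits (paid tiers); verify against current Azure AI Vision docs.
SYNC_MAX_BYTES = 20 * 1024 * 1024     # Image Analysis 4.0, single image
ASYNC_MAX_BYTES = 500 * 1024 * 1024   # Read v3.2, one file per call
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}
MULTIPAGE_EXTS = {".pdf", ".tif", ".tiff"}


def choose_ocr_route(filename: str, size_bytes: int,
                     sync_max: int = SYNC_MAX_BYTES,
                     async_max: int = ASYNC_MAX_BYTES) -> str:
    """Return "sync", "async", or a "reject: ..." reason for one file."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in IMAGE_EXTS | MULTIPAGE_EXTS:
        return "reject: unsupported format"
    if size_bytes >= async_max:
        return "reject: too large"
    if ext in IMAGE_EXTS and size_bytes < sync_max:
        return "sync"    # small single image: one synchronous call
    return "async"       # PDFs, TIFFs, and large images: submit, then poll


if __name__ == "__main__":
    print(choose_ocr_route("receipt.JPG", 2_000_000))  # sync
    print(choose_ocr_route("contract.pdf", 2_000_000))  # async
```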
Tricky Scenarios
"Which service for handwritten notes in a PDF?" → Read API (free-form handwriting with no fields to extract; Document Intelligence's prebuilt models target structured forms).
"Which service for real-time queue monitoring in a store?" → Spatial Analysis + IoT Edge.
"Which service to search OCR’d documents?" → Azure AI Search (formerly Cognitive Search).
Cost Optimization
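Read and Image Analysis calls are billed per 1,000 transactions whether they run in a nightly batch or on demand, so batching does not lower the unit price; multi-page PDF/TIFF files are billed per page, so submit only the pages you need.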
IoT Edge reduces cloud costs for video analytics (process locally, send only events to cloud).
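Per-1,000-transaction pricing is easy to estimate up front. A sketch, assuming each PDF/TIFF page is billed as one transaction; the price is a parameter because it varies by region and tier, and the 1.50 in the example is a made-up placeholder, not a real rate. `estimate_read_cost` is a hypothetical name.

```python
def estimate_read_cost(num_docs: int, pages_per_doc: int,
                       price_per_1000: float) -> float:
    """Estimated OCR cost, assuming one billed transaction per page."""
    if num_docs < 0 or pages_per_doc < 0 or price_per_1000 < 0:
        raise ValueError("inputs must be non-negative")
    transactions = num_docs * pages_per_doc
    return transactions / 1000 * price_per_1000


if __name__ == "__main__":
    # 10,000 two-page invoices at a placeholder 1.50 per 1,000 transactions
    print(estimate_read_cost(10_000, 2, 1.50))  # 30.0
```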

Quick Check Questions

A retail chain wants to analyze customer movement in stores using existing security cameras. The solution must work offline during internet outages. Which Azure service should they use?
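Answer: Spatial Analysis (Azure AI Vision) deployed on IoT Edge
Explanation: Spatial Analysis runs as a container on an edge device on the store's network, so video is analyzed locally during an outage; IoT Edge stores the resulting events and forwards them to IoT Hub once connectivity returns.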
A company needs to extract line items from 50,000 PDF invoices stored in Azure Blob Storage. Which service offers the highest accuracy for this task?
Answer: Azure AI Document Intelligence (formerly Form Recognizer), prebuilt invoice model
Explanation: Document Intelligence is optimized for structured documents (invoices, receipts) and extracts key-value pairs and line items (e.g., "Item: Laptop, Price: $999").
A developer is building a mobile app that scans handwritten notes in real-time. Which Azure service should they call from the app?
Answer: Azure AI Vision (Read API)
Explanation: Read OCR supports handwritten text. For a single photo, the Read feature of Image Analysis 4.0 returns text in one synchronous call, while Read v3.2 needs a submit-then-poll round trip.

Last-Minute Cram Sheet

Azure AI Vision = Pre-built image/video APIs (OCR, image tagging, object detection, Spatial Analysis); face analysis is the separate Azure AI Face service.
Read API = Best for general OCR (printed/handwritten text, PDFs, images).
Document Intelligence (formerly Form Recognizer) = Best for structured documents (invoices, receipts, IDs).
Spatial Analysis = Detects people's presence and movement in video (counts, queues, social distancing).
Spatial Analysis runs only at the edge: RTSP input + container on IoT Edge.
Read v3.2 is async: up to 500 MB per file on paid tiers (4 MB on free). The 4 MB sync limit belongs to Image Analysis v3.2; Image Analysis 4.0 takes images under 20 MB.
Confidence scores (0–1, per word in Read) filter low-quality OCR results. Always check!
Azure AI Search (formerly Cognitive Search) indexes OCR output for full-text search.
IoT Edge deploys AI Vision containers to on-premises edge devices near the cameras.
Custom Vision = Train your own models; AI Vision = Use pre-built models. Don’t mix them up!

❤ If you liked Fatskills, consider supporting us by checking out The Life Manuals You Never Got.

About | Explore | User Guide | Topics | Subjects | Doubt Solver | Career Aptitude Test | Answers | Free Tools | What Should We Know?
Privacy | Terms |

Without work one finishes nothing. - Ralph Waldo Emerson
© 2026 Fatskills.com

All trademarks, logos and brand names are the property of their respective owners. All company, product and service names used in this website are for identification purposes only. Use of these names, trademarks and brands does not imply endorsement.