Fatskills
Practice. Master. Repeat.
Study Guide: Cloud ML - Azure AI Engineer Associate (Exam AI-102): Video Indexer (Face, Transcription, Sentiment, Scene Detection)
Source: https://www.fatskills.com/hesi/chapter/cloud-ml-cert-azure-ai-video-indexer-face-transcription-sentiment-scene-detection

Cloud ML - Azure AI Engineer Associate (Exam AI-102): Video Indexer (Face, Transcription, Sentiment, Scene Detection)

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

What This Is

Azure Video Indexer (VI), now branded Azure AI Video Indexer, is a cloud-based AI service that extracts insights from video and audio content, including faces, speech-to-text transcription, sentiment, emotions, topics, and scene changes, without requiring ML expertise. It is commonly used in media analytics, compliance monitoring, content moderation, and accessibility pipelines (e.g., automatically generating subtitles, detecting inappropriate content, or analyzing customer sentiment in call center recordings). For example, a news agency could use VI to auto-tag videos with named entities (people, locations), detect emotional tone in interviews, and generate searchable transcripts for archival purposes.


Key Terms & Services

  • Azure Video Indexer (VI): Microsoft’s pre-built AI service for video/audio analysis. Extracts faces, transcripts, sentiment, keywords, scenes, and OCR from videos. Best for batch or real-time processing of media files (MP4, WAV, etc.).

  • Face Detection & Identification: VI detects faces in video frames and can match them against a custom face list (e.g., celebrities, employees). Uses Azure Face API under the hood but simplifies integration.

  • Speech-to-Text (Transcription): Converts spoken words into time-stamped text with speaker diarization (who spoke when). Supports multiple languages and custom vocabularies (e.g., medical/legal terms).

  • Sentiment & Emotion Analysis: Detects positive/negative/neutral sentiment and emotions (happy, sad, angry) from speech and facial expressions. Useful for customer experience analytics.

  • Scene & Shot Detection: Identifies scene changes (cuts, fades) and key frames in videos. Helps in video summarization (e.g., generating thumbnails or highlights).

  • Optical Character Recognition (OCR): Extracts text from video frames (e.g., signs, captions, subtitles). Useful for compliance monitoring (e.g., detecting logos or trademarks).

  • Custom Models (Custom Vision + Speech): VI can integrate with Azure Custom Vision (for custom object detection) and Custom Speech (for domain-specific transcription). Example: detecting company logos in ads.

  • Video Indexer API & Widget:
    • REST API: Programmatic access to upload, analyze, and retrieve insights.
    • Widget: Embeddable UI for searching and playing indexed videos (e.g., in a CMS).

  • Azure Blob Storage Integration: VI reads videos from Blob Storage (or uploads directly) and stores results in JSON format. Supports private/secure access via SAS tokens.

  • Azure Cognitive Services Dependencies: VI relies on Azure Speech, Face, and Text Analytics but abstracts complexity. Example: Sentiment analysis uses Text Analytics API.

  • Pricing Model:
    • Pay-as-you-go (per minute of video processed).
    • Free tier (limited minutes/month).
    • No upfront costs (unlike training custom models).

  • Compliance & Privacy:
    • GDPR/CCPA compliant (data processed in chosen Azure region).
    • No face data stored unless explicitly enabled for identification.

Step-by-Step / Process Flow

1. Set Up Azure Video Indexer

  • Create an Azure Video Indexer account in the Azure Portal.
    • Choose Free (trial) or Paid tier.
    • Select region (e.g., West US for low latency).
  • Link to Azure Blob Storage (optional but recommended for large-scale processing).
    • Create a Storage Account → Container → SAS token (for secure access).

2. Upload & Analyze a Video

  • Option 1: Portal UI
    • Go to Video Indexer Portal → Upload → Select video.
    • Choose analysis options (e.g., "Faces," "Transcription," "Sentiment").
    • Click Index and wait for processing (status: Processing → Processed).

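  • Option 2: API (Programmatic)

```bash
# Get an access token.
# Replace <location>, <accountId>, and <subscription-key> with your own values.
curl -X GET "https://api.videoindexer.ai/Auth/<location>/Accounts/<accountId>/AccessToken?allowEdit=true" \
  -H "Ocp-Apim-Subscription-Key: <subscription-key>"

# Upload a video. The name and privacy settings are passed as query parameters;
# <video-file> is the local path of the video to upload.
curl -X POST "https://api.videoindexer.ai/<location>/Accounts/<accountId>/Videos?name=MyVideo&privacy=Private" \
  -H "Authorization: Bearer <access-token>" \
  -F "file=@<video-file>"
```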
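
The same two calls can be prepared from Python. The sketch below is a minimal illustration rather than an official client: it only builds the requests with the standard library and does not send them. The `Auth/<location>/Accounts/<accountId>/AccessToken` path shape, the `allowEdit` flag, and passing `name` and `privacy` as query parameters are assumptions based on the public Video Indexer REST API, and the helper names are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

API_ROOT = "https://api.videoindexer.ai"


def build_token_request(location: str, account_id: str, subscription_key: str) -> Request:
    """Prepare (but do not send) the access-token request."""
    url = (
        f"{API_ROOT}/Auth/{quote(location)}/Accounts/{quote(account_id)}"
        "/AccessToken?allowEdit=true"  # assumed flag for a token that can upload
    )
    headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    return Request(url, headers=headers, method="GET")


def build_upload_url(location: str, account_id: str, name: str, privacy: str = "Private") -> str:
    """Build the upload URL; the video itself travels in the multipart body."""
    query = urlencode({"name": name, "privacy": privacy})
    return f"{API_ROOT}/{quote(location)}/Accounts/{quote(account_id)}/Videos?{query}"


if __name__ == "__main__":
    # "trial" and "acc-1" are placeholder values for illustration only.
    request = build_token_request("trial", "acc-1", "<subscription-key>")
    print(request.get_method(), request.full_url)
    print(build_upload_url("trial", "acc-1", "My Video"))
```

Keeping URL construction separate from the network call makes it easy to check the request shape before spending indexing minutes, and the subscription key stays a parameter so it can come from Key Vault rather than source code.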

3. Retrieve & Use Insights

  • View results in the Portal UI (timeline with faces, transcripts, sentiment).
  • Download JSON output (via API or Portal) for further processing:

    {
      "videos": [{
        "insights": {
          "transcript": [{ "text": "Hello world", "speakerId": 1 }],
          "faces": [{ "name": "John Doe", "appearances": [...] }],
          "sentiments": [{ "averageScore": 0.8, "sentimentType": "Positive" }]
        }
      }]
    }

  • Integrate with downstream apps:
    • Power BI (for sentiment dashboards).
    • Azure Cognitive Search (for video search).
    • Custom ML pipelines (e.g., feeding transcripts into a Language Understanding (LUIS) model).
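
Before the insights reach a dashboard or search index, they usually need flattening. The following sketch assumes the JSON shape shown in this step (a `videos` list whose items carry an `insights` object); real output contains many more fields, and the elided `appearances` array is replaced with an empty list here so the sample parses. The function name `summarize_insights` is hypothetical.

```python
import json

# Sample modeled on the JSON in this guide; "appearances" is left empty.
SAMPLE = """
{ "videos": [{ "insights": {
    "transcript": [{ "text": "Hello world", "speakerId": 1 }],
    "faces": [{ "name": "John Doe", "appearances": [] }],
    "sentiments": [{ "averageScore": 0.8, "sentimentType": "Positive" }]
} }] }
"""


def summarize_insights(index: dict) -> dict:
    """Collect transcript lines, face names, and sentiment entries across videos."""
    transcript, faces, sentiments = [], [], []
    for video in index.get("videos", []):
        insights = video.get("insights", {})
        for line in insights.get("transcript", []):
            transcript.append((line.get("speakerId"), line.get("text", "")))
        faces.extend(face.get("name", "Unknown") for face in insights.get("faces", []))
        sentiments.extend(
            (item.get("sentimentType"), item.get("averageScore"))
            for item in insights.get("sentiments", [])
        )
    return {"transcript": transcript, "faces": faces, "sentiments": sentiments}


if __name__ == "__main__":
    print(summarize_insights(json.loads(SAMPLE)))
```

Using `.get()` with defaults throughout means a video indexed without faces or sentiment simply yields empty lists instead of raising a `KeyError`.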

4. Customize & Enhance (Optional)

  • Add a custom face list (for face identification):
    • Upload images of known people (e.g., employees) via Azure Face API.
    • Link the face list to VI in the Portal → Settings → People.
  • Use a custom speech model (for domain-specific terms):
    • Train a Custom Speech model in Azure Speech Studio.
    • Reference it in VI via API parameters.
  • Enable OCR for text extraction:
    • Toggle OCR in analysis settings (e.g., for detecting subtitles).

5. Automate with Azure Functions (Advanced)

  • Trigger VI analysis when a new video is uploaded to Blob Storage:
    • Create an Azure Function with a Blob Storage trigger.
    • Call the VI API to start indexing.
    • Store results in Cosmos DB or Azure SQL for analytics.
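
The decision logic inside such a function can be sketched without any Azure SDK. In the example below, `plan_indexing_call` is a hypothetical helper: in a real Blob-triggered function the blob name and SAS URL would come from the trigger binding, and the returned URL would then be sent as a POST with an access token. Passing the blob location through a `videoUrl` query parameter is an assumption about the upload API, and the extension list is the one given in this guide.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlencode

API_ROOT = "https://api.videoindexer.ai"
# Formats this guide lists as supported.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".wmv", ".avi", ".wav"}


def plan_indexing_call(
    blob_name: str, blob_sas_url: str, location: str, account_id: str
) -> Optional[str]:
    """Return the upload URL for a new blob, or None if the blob should be skipped."""
    path = PurePosixPath(blob_name)
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        return None  # ignore non-media blobs such as thumbnails or logs
    query = urlencode(
        {"name": path.stem, "privacy": "Private", "videoUrl": blob_sas_url}
    )
    return f"{API_ROOT}/{location}/Accounts/{account_id}/Videos?{query}"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(plan_indexing_call(
        "incoming/demo.mp4",
        "https://example.blob.core.windows.net/videos/demo.mp4?sig=abc",
        "trial",
        "acc-1",
    ))
```

Filtering on the extension first keeps the function from spending indexing minutes on files the service would reject.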

Common Mistakes

| Mistake | Correction |
| --- | --- |
| Assuming VI can train custom models | VI uses pre-built models (Face, Speech, Text Analytics). For custom models, use Azure Custom Vision or Custom Speech and integrate via API. |
| Not enabling speaker diarization | By default, VI does not separate speakers. Enable speaker diarization in settings to distinguish "Speaker 1" vs. "Speaker 2". |
| Ignoring region-specific compliance | VI processes data in the selected Azure region. For GDPR, choose an EU region (e.g., West Europe). |
| Overlooking cost for long videos | VI charges per minute of video. A 2-hour movie (120 minutes) costs about $12 at $0.10/min. Use the free tier for testing. |
| Forgetting to secure API keys | VI API keys should not be hardcoded. Use Azure Key Vault or Managed Identity for secure access. |
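
The per-minute arithmetic behind the cost row can be checked with a few lines. This is a minimal sketch: the $0.10/min rate is this guide's example figure, not a quoted price, and `estimate_indexing_cost` is a hypothetical helper name.

```python
from decimal import Decimal


def estimate_indexing_cost(minutes: int, rate_per_minute: str = "0.10") -> Decimal:
    """Cost = minutes of video multiplied by the per-minute rate."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    # Decimal avoids binary floating-point rounding in currency math.
    return Decimal(minutes) * Decimal(rate_per_minute)


if __name__ == "__main__":
    # A 2-hour movie at the guide's example rate.
    print(estimate_indexing_cost(120))
```

Passing the rate as a string keeps the value exact, which matters once per-video estimates are summed over a large archive.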

Certification Exam Insights

What the AI-102 Exam Tests

  1. Service Selection Traps

    When to use Video Indexer vs. Azure Media Services (AMS) vs. Azure Cognitive Services:

    • Video Indexer: Best for AI-powered insights (faces, transcripts, sentiment).
    • Azure Media Services: Best for video encoding, streaming, and DRM (not AI).
    • Azure Cognitive Services (Face, Speech, Text Analytics): Best for custom ML pipelines (VI is a wrapper around these).

  2. Key Constraints

    • Video length limit: 4 hours max (longer videos must be split).
    • Supported formats: MP4, MOV, WMV, AVI, WAV (but not MKV or FLV).
    • Face identification limit: 1M faces per account (for custom face lists).

  3. Tricky Scenarios

    • "Which service extracts text from video frames?"
      Answer: Video Indexer (OCR feature), not Azure Form Recognizer, which is for documents.
    • "How do you analyze sentiment in a video call recording?"
      Answer: Use Video Indexer's sentiment analysis, not Text Analytics alone, since it needs audio.
    • "How do you detect a specific person in a video?"
      Answer: Upload a custom face list to Video Indexer, not just Face API, since VI handles the video context.

  4. Cost Optimization

    • Free tier: 600 minutes/month (enough for small projects).
    • Batch processing: Upload multiple videos at once to reduce API calls.
    • Storage costs: VI does not store videos by default (only metadata). Use Blob Storage lifecycle policies to delete old videos.
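
The constraints listed in this section lend themselves to a pre-upload check. The sketch below hard-codes the limits exactly as this guide states them (the format list and the 4-hour cap), so they are assumptions to verify against current service documentation; `check_video` is a hypothetical helper.

```python
import math
from pathlib import PurePosixPath

# Limits as stated in this guide; verify against current service documentation.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".wmv", ".avi", ".wav"}
MAX_DURATION_MINUTES = 4 * 60


def check_video(file_name: str, duration_minutes: float) -> dict:
    """Report whether a file fits the stated limits and how many parts it needs."""
    extension = PurePosixPath(file_name).suffix.lower()
    supported = extension in SUPPORTED_EXTENSIONS
    # A video longer than the cap must be split; round up to whole parts.
    parts = max(1, math.ceil(duration_minutes / MAX_DURATION_MINUTES))
    return {"supported_format": supported, "parts_needed": parts}


if __name__ == "__main__":
    print(check_video("lecture.mov", 500))
    print(check_video("archive.mkv", 30))
```

Running a check like this before upload avoids failed indexing jobs and makes the number of required segments explicit for long recordings.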

Quick Check Questions

Question 1

A media company wants to automatically generate subtitles, detect faces of actors, and analyze emotional tone in thousands of archived videos. They need a fully managed solution with minimal ML expertise. Which Azure service should they use?

Answer: Azure Video Indexer
Explanation: VI provides pre-built AI for transcription, face detection, and sentiment analysis without requiring custom model training.


Question 2

A call center wants to analyze customer sentiment in recorded calls and identify which agent spoke when. They need speaker separation and time-stamped transcripts. Which Video Indexer feature should they enable?

Answer: Speaker Diarization
Explanation: Speaker diarization distinguishes between speakers (e.g., "Agent" vs. "Customer") in the transcript.


Question 3

A security team needs to detect unauthorized personnel in surveillance footage by matching faces against a database of employees. Which combination of services should they use?

Answer: Azure Video Indexer + Azure Face API (Custom Face List)
Explanation: VI detects faces in video, while Face API's custom face list allows matching against known employees.


Last-Minute Cram Sheet

  1. Video Indexer = Pre-built AI for video/audio insights (faces, transcripts, sentiment, scenes).
  2. Supports MP4, MOV, WAV (not MKV/FLV).
  3. Max video length: 4 hours (split longer videos).
  4. Free tier: 600 minutes/month (enough for small projects).
  5. Speaker diarization must be enabled (not on by default).
  6. OCR extracts text from video frames (e.g., subtitles, signs).
  7. Custom face lists require Azure Face API (VI integrates with it).
  8. Sentiment analysis uses Azure Text Analytics under the hood.
  9. Not for custom model training (use Custom Vision/Speech instead).
  10. Region matters for compliance (GDPR: use EU regions).

Next Steps:

  • Try the Video Indexer Portal with a sample video.
  • Review the Video Indexer API docs.
  • Practice integrating VI with Azure Functions for automation.