Fatskills
Study Guide: Cloud ML - Azure AI Engineer Associate (Exam AI-102): Custom Vision (Classification vs. Object Detection, Training, Exporting)
Source: https://www.fatskills.com/hesi/chapter/cloud-ml-cert-azure-ai-custom-vision-classification-vs-object-detection-training-exporting

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

What This Is

Azure Custom Vision is a no-code/low-code computer vision service that lets you train, deploy, and export custom image classification and object detection models. It’s useful in ML pipelines where prebuilt models (like Azure Computer Vision) don’t fit domain-specific needs, e.g., identifying defective parts on a manufacturing line, classifying retail products by brand, or detecting safety gear on construction sites. Unlike training from scratch (which requires deep ML expertise), Custom Vision abstracts away infrastructure, hyperparameter tuning, and model optimization, so engineers can focus on data and business logic.


Key Terms & Services

  • Custom Vision Service (Azure): Azure’s managed service for training image classification (labeling entire images) and object detection (finding and labeling objects within images) models. Best for quick prototyping and edge deployment (via ONNX, TensorFlow, CoreML, or Docker exports). Only models trained on a compact domain can be exported.

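  • Image Classification: Assigns labels to an entire image (e.g., "cat" or "dog"). A project is either Multiclass (exactly one tag per image) or Multilabel (any number of tags per image). Used when the whole image represents the concept of interest (e.g., medical X-ray diagnosis, product categorization).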

  • Object Detection: Identifies multiple objects in an image, drawing bounding boxes around them and labeling each (e.g., detecting helmets, vests, and tools in a construction photo). Used when spatial location matters (e.g., autonomous vehicles, retail shelf audits).

  • Training vs. Prediction (Inference) Resources:
    • Training Resource: Used to train a model (billed per compute hour).
    • Prediction Resource: Used to host published models for inference (billed per prediction transaction). Exported models run on your own hardware, outside the Prediction Resource.

  • ONNX (Open Neural Network Exchange): An open format for model interoperability. Custom Vision exports models in ONNX for edge deployment (e.g., IoT devices, mobile apps).

  • TensorFlow Lite: A lightweight version of TensorFlow for mobile/embedded devices. Custom Vision can export both classification and object detection models (from compact domains) in this format, typically for Android apps. CoreML is the usual export for iOS.

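  • Docker Container Export: Custom Vision exports a Dockerfile bundled with the model and a small web service (Linux, Windows, or ARM flavors). You build the image yourself and run it on Azure Container Instances (ACI), Azure Kubernetes Service (AKS), Azure IoT Edge, or on-premises servers.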

  • Precision/Recall/F1 Score:
    • Precision: % of predicted positives that are correct (e.g., "How many detected helmets are actually helmets?").
    • Recall: % of actual positives correctly predicted (e.g., "Did we miss any helmets?").
    • F1 Score: Harmonic mean of precision and recall. It is a useful single summary when classes are imbalanced. The Custom Vision portal reports precision, recall, and AP/mAP, so you compute F1 yourself from precision and recall.

  • Active Learning: Custom Vision suggests images to label next based on model uncertainty, improving training efficiency.

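  • Domain-Specific Models: Pre-trained base models optimized for specific scenarios (e.g., "Retail," "Landmarks," "Food"). Picking a domain that matches your images can reduce training data needs. Choose a compact variant (e.g., "General (compact)") if you plan to export the model.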

  • Azure IoT Edge: Deploys Custom Vision models to edge devices (e.g., cameras, drones) for low-latency inference without cloud dependency.

  • Azure Machine Learning (Azure ML): While Custom Vision is no-code, Azure ML is used for advanced scenarios (e.g., custom training loops, hyperparameter tuning, or multi-modal models).
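Only compact domains can be exported, so it can help to select the domain in code when you create a project with the Python training SDK. The minimal sketch below is duck-typed. It assumes that the domain objects returned by `CustomVisionTrainingClient.get_domains()` (azure-cognitiveservices-vision-customvision 3.x) expose `name`, `type`, and `exportable` attributes. It uses `SimpleNamespace` stand-ins so it runs without the SDK. The helper name `pick_domain` is hypothetical.

```python
from types import SimpleNamespace
from typing import Iterable


def pick_domain(domains: Iterable, project_type: str, name: str, need_export: bool):
    """Return the first domain matching type and name; optionally require exportability.

    Assumes each domain has .type ("Classification" or "ObjectDetection"),
    .name, and .exportable attributes, as on the SDK's Domain model.
    """
    for domain in domains:
        if domain.type != project_type or domain.name != name:
            continue
        if need_export and not domain.exportable:
            raise ValueError(f"Domain {name!r} is not exportable; choose a compact variant.")
        return domain
    raise LookupError(f"No {project_type} domain named {name!r}.")


# Stand-ins for trainer.get_domains(); real IDs are GUIDs returned by the service.
domains = [
    SimpleNamespace(id="d1", name="General", type="Classification", exportable=False),
    SimpleNamespace(id="d2", name="General (compact)", type="Classification", exportable=True),
    SimpleNamespace(id="d3", name="General", type="ObjectDetection", exportable=False),
]

chosen = pick_domain(domains, "Classification", "General (compact)", need_export=True)
print(chosen.id)  # d2
```

With the real SDK, you would pass `trainer.get_domains()` as `domains` and then call `trainer.create_project("my-project", domain_id=chosen.id)`.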


Step-by-Step / Process Flow

1. Choose Between Classification vs. Object Detection

  • Use Classification if:
    • The entire image represents one concept (e.g., "is this a defective part?").
    • You don’t need the spatial location of objects.
  • Use Object Detection if:
    • You need to locate and label multiple objects (e.g., "where are the helmets and vests in this photo?").
    • You need bounding box coordinates for downstream tasks (e.g., counting objects, triggering actions).
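Here is a concrete example of a downstream task. Each object detection prediction carries a tag, a probability, and a bounding box. In Custom Vision responses, the box is normalized: left, top, width, and height are fractions of the image size (in the Python SDK, `prediction.bounding_box.left` and so on). You therefore scale the box to pixels before drawing or cropping. This sketch uses a hypothetical `Detection` dataclass in place of the SDK's prediction objects. The 0.5 threshold is an arbitrary example.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    # Hypothetical stand-in for one object detection prediction.
    tag_name: str
    probability: float
    left: float    # normalized 0-1, fraction of image width
    top: float     # normalized 0-1, fraction of image height
    width: float
    height: float


def to_pixels(d: Detection, img_w: int, img_h: int) -> tuple[int, int, int, int]:
    """Convert a normalized box to integer pixel corners (x1, y1, x2, y2)."""
    x1 = round(d.left * img_w)
    y1 = round(d.top * img_h)
    x2 = round((d.left + d.width) * img_w)
    y2 = round((d.top + d.height) * img_h)
    return x1, y1, x2, y2


def count_objects(detections: list[Detection], threshold: float = 0.5) -> Counter:
    """Count confident detections per tag; boxes below the threshold are ignored."""
    return Counter(d.tag_name for d in detections if d.probability >= threshold)


detections = [
    Detection("helmet", 0.92, 0.10, 0.05, 0.20, 0.15),
    Detection("helmet", 0.35, 0.60, 0.10, 0.10, 0.10),  # below threshold
    Detection("vest", 0.81, 0.12, 0.30, 0.25, 0.40),
]

counts = count_objects(detections)
print(dict(counts))                        # {'helmet': 1, 'vest': 1}
print(to_pixels(detections[0], 640, 480))  # (64, 24, 192, 96)
```

A rule such as "raise an alert when the vest count is lower than the person count" would then operate on `counts`. A classification project can't support that kind of rule, because it returns no per-object boxes.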

2. Set Up Resources in Azure Portal

  1. Create a Custom Vision Resource:
    • Go to Azure Portal → Create a resource → AI + Machine Learning → Custom Vision.
    • Under Create options, choose Both to get a Training resource (for model training) and a Prediction resource (for inference), or create them separately.
    • Select F0 (Free tier) for testing or S0 (Standard) for production.
  2. Note the Keys & Endpoints:
    • After creation, open Keys and Endpoint on each resource. The Training and Prediction resources have separate keys, and you’ll need both for API calls.
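A common slip is mixing up the two keys. The Training key authenticates authoring calls (upload, train, publish, export), and the Prediction key authenticates inference calls. In the REST API, they travel in the `Training-Key` and `Prediction-Key` headers. This minimal sketch loads both from configuration and fails fast if anything is missing. The environment variable names (`CV_TRAINING_ENDPOINT` and so on) are hypothetical choices, not names Azure defines.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CustomVisionSettings:
    training_endpoint: str
    training_key: str
    prediction_endpoint: str
    prediction_key: str

    def training_headers(self) -> dict:
        # Authoring calls (create project, upload, train, publish, export) use the Training key.
        return {"Training-Key": self.training_key}

    def prediction_headers(self) -> dict:
        # Inference calls against a published iteration use the Prediction key.
        return {"Prediction-Key": self.prediction_key}


# Hypothetical variable names; use whatever your deployment defines.
REQUIRED = ("CV_TRAINING_ENDPOINT", "CV_TRAINING_KEY",
            "CV_PREDICTION_ENDPOINT", "CV_PREDICTION_KEY")


def load_settings(env: Mapping[str, str] = os.environ) -> CustomVisionSettings:
    """Read both resources' endpoints and keys, failing fast if any are missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    return CustomVisionSettings(
        training_endpoint=env["CV_TRAINING_ENDPOINT"].rstrip("/"),
        training_key=env["CV_TRAINING_KEY"],
        prediction_endpoint=env["CV_PREDICTION_ENDPOINT"].rstrip("/"),
        prediction_key=env["CV_PREDICTION_KEY"],
    )


settings = load_settings({
    "CV_TRAINING_ENDPOINT": "https://example-train.cognitiveservices.azure.com/",
    "CV_TRAINING_KEY": "training-key-placeholder",
    "CV_PREDICTION_ENDPOINT": "https://example-predict.cognitiveservices.azure.com/",
    "CV_PREDICTION_KEY": "prediction-key-placeholder",
})
print(settings.prediction_headers())
```

Keeping the two keys in separate, explicitly named fields makes it hard to send a Training key to the prediction endpoint by accident.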

3. Prepare & Upload Training Data

  1. Collect Images:
    • Classification: 50+ images per tag (e.g., 50 "defective," 50 "non-defective"). Training requires at least 5 per tag, but more gives better results.
    • Object Detection: 15+ images per tag (the training minimum), each with bounding box annotations (use Custom Vision’s web UI or VoTT for labeling). 50+ per tag is recommended.
  2. Upload to Custom Vision Portal:
    • Go to the Custom Vision portal (customvision.ai) → New Project.
    • Select Classification (then Multiclass or Multilabel) or Object Detection → choose your domain (e.g., "Retail," "General").
    • Upload images and tag them (for classification), or draw bounding boxes and tag each region (for object detection).
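A quick per-tag count before uploading catches under-represented tags. This sketch applies the targets from this guide (50+ images per tag for classification, 15+ for object detection) and adds a balance check. The balance ratio of 0.5 (the smallest tag has at least half as many images as the largest) is an assumption you should tune. `check_dataset` is a hypothetical helper, and its input would come from your own labeling manifest.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Per-tag targets taken from this guide; adjust to your own quality bar.
MIN_IMAGES_PER_TAG = {"classification": 50, "object_detection": 15}


def check_dataset(labels: Iterable[str], project_type: str, min_balance: float = 0.5) -> list[str]:
    """Return human-readable problems with a labeled dataset (an empty list means ready).

    labels: one entry per (image, tag) pair, i.e. list a tag once for every
    image it appears in, read from your own manifest file.
    """
    counts = Counter(labels)
    minimum = MIN_IMAGES_PER_TAG[project_type]
    problems = [
        f"tag {tag!r} has {n} examples (target {minimum}+)"
        for tag, n in sorted(counts.items())
        if n < minimum
    ]
    if counts:
        smallest, largest = min(counts.values()), max(counts.values())
        if smallest / largest < min_balance:
            problems.append(f"imbalanced tags: smallest/largest = {smallest}/{largest}")
    return problems


labels = ["defective"] * 40 + ["non-defective"] * 120
for problem in check_dataset(labels, "classification"):
    print(problem)
# tag 'defective' has 40 examples (target 50+)
# imbalanced tags: smallest/largest = 40/120
```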

4. Train & Evaluate the Model

  1. Start Training:
    • Click Train → choose Quick Training (faster, usually less accurate) or Advanced Training (slower; you set a training-time budget, usually more accurate).
    • Wait for training to complete (minutes to hours, depending on dataset size).
  2. Review Performance Metrics:
    • On the Performance tab, check Precision, Recall, and AP per tag (plus mAP for object detection). Move the Probability Threshold slider to see the trade-off.
    • Goal: F1 > 0.8 for most use cases. Compute F1 from precision and recall, and add or fix training data if it’s too low.
  3. Test with New Images:
    • Use the Quick Test feature to upload a new image and verify predictions.
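The Performance tab shows precision and recall rather than F1, so checking the "F1 > 0.8" goal takes one extra step: F1 = 2PR / (P + R). This sketch computes F1 per tag from values copied out of the portal. The numbers are made-up examples, and `f1_score` and `tags_below_goal` are hypothetical helper names.

```python
from __future__ import annotations


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def tags_below_goal(metrics: dict[str, tuple[float, float]], goal: float = 0.8) -> dict[str, float]:
    """Map each tag whose F1 falls below the goal to its F1, rounded for display."""
    return {
        tag: round(f1_score(p, r), 3)
        for tag, (p, r) in metrics.items()
        if f1_score(p, r) < goal
    }


# Made-up precision/recall per tag, as read from the Performance tab.
metrics = {"helmet": (0.90, 0.80), "vest": (0.95, 0.60)}
print(tags_below_goal(metrics))  # {'vest': 0.735}
```

The harmonic mean punishes lopsided pairs. The vest tag's 0.95 precision can’t make up for its 0.60 recall, while the helmet tag (F1 ≈ 0.847) clears the goal.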

5. Deploy the Model for Inference

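  1. Publish the Model:
    • On the Performance tab, select the iteration → Publish → give it a name (e.g., "defect-detection-v1").
    • Select your Prediction Resource (created in Step 2).
  2. Get the Prediction Endpoint & Key:
    • After publishing, click Prediction URL and note the URL and the Prediction-Key value (used for API calls).
  3. Test the API:
    • Use Postman or the Python SDK (azure-cognitiveservices-vision-customvision) to send an image to the endpoint:

```python
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials

# SDK 3.x: the client takes the endpoint first, then credentials carrying the Prediction key
credentials = ApiKeyCredentials(in_headers={"Prediction-key": "<Prediction-Key>"})
predictor = CustomVisionPredictionClient("<Prediction-Endpoint>", credentials)

with open("test.jpg", "rb") as image:
    # classify_image is for classification projects; object detection uses detect_image
    results = predictor.classify_image("<Project-ID>", "<Published-Model-Name>", image.read())

for prediction in results.predictions:
    print(f"{prediction.tag_name}: {prediction.probability:.2%}")
```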
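When you use Postman or another HTTP client, the call is a single POST of raw image bytes. This sketch builds that request with only the standard library. The URL path assumes the v3.0 Prediction API layout that the portal’s Prediction URL dialog shows (`.../customvision/v3.0/Prediction/<project-id>/classify/iterations/<published-name>/image`, with `detect` in place of `classify` for object detection). Copy the exact URL from your portal if it differs. `build_prediction_request` is a hypothetical helper, and the IDs and key are placeholders.

```python
import json
import urllib.request


def build_prediction_request(endpoint: str, project_id: str, published_name: str,
                             prediction_key: str, image_bytes: bytes,
                             task: str = "classify") -> urllib.request.Request:
    """Build (but don't send) a POST to a published Custom Vision iteration.

    task: "classify" for classification projects, "detect" for object detection.
    """
    if task not in ("classify", "detect"):
        raise ValueError("task must be 'classify' or 'detect'")
    url = (f"{endpoint.rstrip('/')}/customvision/v3.0/Prediction/"
           f"{project_id}/{task}/iterations/{published_name}/image")
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        headers={
            "Prediction-Key": prediction_key,             # Prediction key, not the Training key
            "Content-Type": "application/octet-stream",  # raw image bytes in the body
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (needs network access and real values)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


req = build_prediction_request("https://example.cognitiveservices.azure.com/",
                               "PROJECT_ID", "defect-detection-v1", "PREDICTION_KEY",
                               b"\xff\xd8placeholder-jpeg-bytes")
print(req.get_method(), req.full_url)
```

The JSON response contains a `predictions` list whose entries carry `tagName` and `probability`, plus `boundingBox` for object detection.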

6. Export the Model for Edge Deployment

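  1. Export Options (available only for iterations trained on a compact domain):
    • ONNX: For Windows/Linux edge devices (e.g., Raspberry Pi, NVIDIA Jetson).
    • TensorFlow Lite: For mobile apps (mainly Android; CoreML is the iOS-native option).
    • Docker Container: For cloud/on-prem deployment (ACI, AKS, IoT Edge, or local servers).
  2. Download the Model:
    • Go to Performance → Export → choose format → Download.
  3. Deploy to Edge:
    • For ONNX/TensorFlow Lite, load the model in your app with the matching runtime (e.g., ONNX Runtime or the TensorFlow Lite interpreter), using a library such as OpenCV for image preprocessing.
    • For Docker, build and run the exported Dockerfile. The container bundles the model, so no Prediction key is needed:

```bash
docker build -t <custom-vision-image> .
docker run -p 127.0.0.1:80:80 -d <custom-vision-image>
```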
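The portal’s Export button can also be scripted. In the Python training SDK (azure-cognitiveservices-vision-customvision 3.x), `export_iteration(project_id, iteration_id, platform, flavor=...)` starts an export. `get_exports(project_id, iteration_id)` lists exports with `platform`, `flavor`, `status`, and `download_uri` fields. The helper below polls until the export is ready. It is duck-typed so it runs against the `FakeTrainer` stand-in shown here. Confirm the status strings ("Exporting", "Done", "Failed") and platform/flavor values (e.g., "TensorFlow" with flavor "TensorFlowLite", or "ONNX") against the SDK reference; treat them as assumptions. `export_and_wait` is a hypothetical helper.

```python
from __future__ import annotations

import time
from types import SimpleNamespace


def export_and_wait(trainer, project_id: str, iteration_id: str, platform: str,
                    flavor: str | None = None, poll_seconds: float = 5.0,
                    timeout_seconds: float = 600.0) -> str:
    """Start an export and return its download URI once the service reports it ready."""
    trainer.export_iteration(project_id, iteration_id, platform, flavor=flavor)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        for export in trainer.get_exports(project_id, iteration_id):
            # Skip exports for other platforms/flavors of the same iteration.
            if export.platform != platform:
                continue
            if flavor is not None and export.flavor != flavor:
                continue
            if export.status == "Done":
                return export.download_uri
            if export.status == "Failed":
                raise RuntimeError(f"{platform} export failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"{platform} export not ready after {timeout_seconds}s")


class FakeTrainer:
    """Hypothetical stand-in that reports 'Exporting' once, then 'Done'."""

    def __init__(self):
        self.calls = 0
        self.requested = None

    def export_iteration(self, project_id, iteration_id, platform, flavor=None):
        self.requested = (platform, flavor)

    def get_exports(self, project_id, iteration_id):
        self.calls += 1
        platform, flavor = self.requested
        status = "Exporting" if self.calls == 1 else "Done"
        return [SimpleNamespace(platform=platform, flavor=flavor, status=status,
                                download_uri="https://example.invalid/model.zip")]


uri = export_and_wait(FakeTrainer(), "proj", "iter", "TensorFlow",
                      flavor="TensorFlowLite", poll_seconds=0)
print(uri)
```

With a real `CustomVisionTrainingClient` in place of `FakeTrainer`, the request fails unless the iteration was trained on a compact domain.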

Common Mistakes

  • Mistake: Using classification for object detection (or vice versa).
    • Correction: Classification labels the whole image; object detection labels objects within the image. If you need bounding boxes, use object detection.
  • Mistake: Not using a domain-specific model.
    • Correction: Domains (e.g., "Retail," "Landmarks") reduce training data needs. Always check whether a domain fits your use case, and pick a compact domain if you will export the model.
  • Mistake: Ignoring precision/recall trade-offs.
    • Correction: If false positives are costly (e.g., false defect alarms that stop a production line), optimize for precision. If false negatives are costly (e.g., missed safety gear or missed diagnoses), optimize for recall. In Custom Vision, you shift this balance with the probability threshold.
  • Mistake: Calling a Prediction Resource without publishing.
    • Correction: You must publish a trained iteration to a Prediction Resource before its endpoint can serve predictions.
  • Mistake: Exporting a model without testing inference first.
    • Correction: Test the model (Quick Test or the API endpoint) before exporting to ensure it behaves as expected.
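The precision/recall trade-off listed above comes down to where you set the probability threshold. Suppose you have a held-out set of scored images: each one has the model’s probability for a tag and whether that tag was truly present. This sketch sweeps the thresholds and picks the highest one that still meets a recall target. That is the choice you would make when misses are costly, and the highest qualifying threshold usually keeps precision as high as possible. The data and the 0.8 target are illustrative. `pick_threshold` is a hypothetical helper, not a Custom Vision API.

```python
from __future__ import annotations


def precision_recall_at(scored: list[tuple[float, bool]], threshold: float) -> tuple[float, float]:
    """Precision and recall when probability >= threshold counts as a positive prediction."""
    tp = sum(1 for p, truth in scored if p >= threshold and truth)
    fp = sum(1 for p, truth in scored if p >= threshold and not truth)
    fn = sum(1 for p, truth in scored if p < threshold and truth)
    precision = tp / (tp + fp) if tp + fp else 0.0  # no predicted positives -> 0 by convention here
    recall = tp / (tp + fn) if tp + fn else 0.0     # no actual positives -> 0 by convention here
    return precision, recall


def pick_threshold(scored: list[tuple[float, bool]], min_recall: float) -> float | None:
    """Highest observed probability to use as a threshold that still reaches min_recall."""
    # Recall can only grow as the threshold drops, so the first hit from the top is the highest.
    for threshold in sorted({p for p, _ in scored}, reverse=True):
        if precision_recall_at(scored, threshold)[1] >= min_recall:
            return threshold
    return None


# (probability for "helmet", helmet actually present?)
scored = [(0.95, True), (0.90, True), (0.80, False), (0.70, True),
          (0.60, True), (0.40, False), (0.30, True)]
t = pick_threshold(scored, min_recall=0.8)
print(t, precision_recall_at(scored, t))  # 0.6 (0.8, 0.8)
```

The portal’s Probability Threshold slider shows the same trade-off. In client code, you apply the chosen threshold by ignoring predictions whose probability falls below it.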

Certification Exam Insights

  1. Service Selection Trap:
    • Custom Vision vs. Azure Computer Vision (now Azure AI Vision):
      • Use Custom Vision for custom labels (e.g., "defective parts").
      • Use Azure Computer Vision for pre-built models (e.g., OCR, image tagging, captioning).
    • Custom Vision vs. Azure ML:
      • Use Custom Vision for no-code/low-code image tasks.
      • Use Azure ML for custom training loops, multi-modal models, or non-vision tasks.

  2. Key Constraints:
    • Free Tier (F0): Limited to 2 projects, 5,000 training images per project, and 10,000 predictions/month.
    • Standard Tier (S0): Much higher limits (e.g., 100 projects, 100,000 training images per project, and no monthly prediction cap). Quotas change, so confirm them on the current Custom Vision limits page.
    • Export Limits: Only iterations trained on a compact domain can be exported.

  3. Tricky Scenario:
    • Question: "A retail company wants to detect misplaced products on shelves in real time. Which Azure service should they use?"
    • Answer: Custom Vision (Object Detection), because they need bounding boxes to locate products.
    • Distractor: Azure Computer Vision (its pre-built models don’t learn your custom labels).

  4. Edge Deployment Gotchas:
    • ONNX vs. TensorFlow Lite:
      • Use ONNX for Windows/Linux edge devices.
      • Use TensorFlow Lite for mobile apps (CoreML for iOS).
    • Docker Containers: Exported containers bundle the model and run without calling the cloud. Prediction Resource keys are needed only for the hosted prediction endpoint.

Quick Check Questions

  1. A manufacturing plant needs to classify images of parts as "defective" or "non-defective" with minimal ML expertise. Which Azure service should they use?
    • Answer: Custom Vision (Classification). It’s a no-code solution for custom image labeling.
    • Why? Azure Computer Vision doesn’t support custom labels, and Azure ML is overkill for this use case.

  2. A construction company wants to detect safety gear (helmets, vests) in real time from camera feeds. They need bounding box coordinates for each item. Which Custom Vision feature should they use?
    • Answer: Object Detection. It provides bounding boxes and labels for multiple objects in an image.
    • Why? Classification only labels the whole image, not individual objects.

  3. A mobile app team wants to deploy a Custom Vision model to an Android app. Which export format should they use?
    • Answer: TensorFlow Lite. It’s optimized for mobile devices.
    • Why? Custom Vision’s ONNX export targets Windows/Linux edge devices and CoreML targets iOS, which leaves TensorFlow Lite as the natural fit for Android.

Last-Minute Cram Sheet

  1. Custom Vision = No-code/low-code for custom image models.
  2. Classification = labels for the whole image (Multiclass: 1 tag; Multilabel: several); Object Detection = multiple labels + bounding boxes.
  3. Domains (e.g., "Retail," "Landmarks") reduce training data needs.
  4. F0 (Free) = 2 projects, 5K training images per project; S0 (Standard) = much higher limits.
  5. Train → publish the iteration to a Prediction Resource → get the Prediction URL and key.
  6. Export formats: ONNX (edge), TensorFlow Lite (mobile), CoreML (iOS), Docker (cloud/on-prem).
  7. Only iterations trained on a compact domain can be exported.
  8. Object Detection requires bounding box annotations (not just tags).
  9. Always test the API endpoint before exporting.
  10. Azure Computer Vision vs. Custom Vision = pre-built vs. custom labels.