Fatskills
Study Guide: AI Literacy: Embeddings and semantic search
Source: https://www.fatskills.com/ai-for-work/chapter/ai-ai-literacy-embeddings-and-semantic-search

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

Embeddings and Semantic Search: Study Guide

What This Is

Embeddings are numerical representations of data (text, images, etc.) that capture meaning in a dense vector space. Semantic search uses these embeddings to find relevant information based on meaning rather than exact keyword matches. This matters in everyday work because it powers smarter search, recommendations, and automation—like finding similar customer support tickets, retrieving relevant documents for legal research, or improving chatbot responses. Example: A law firm uses semantic search to instantly surface past case files about "wrongful termination" even if the exact phrase isn’t in the document.


Key Facts & Principles

  • Embedding: A vector (list of numbers) that represents data in a high-dimensional space, where similar items are closer together. Example: The word "king" might be embedded near "queen" and "monarch" but far from "apple."
  • Vector space: A mathematical space where embeddings live; similarity is measured by the distance or angle between vectors (e.g., cosine similarity). Example: In a 3D space, "cat" and "dog" might be closer than "cat" and "car."
  • Semantic search: Retrieves results based on meaning, not keywords. Example: Searching "best phone for photography" returns reviews of iPhones and Pixels, not just pages with the exact phrase.
  • Pre-trained embeddings: Off-the-shelf models (e.g., text-embedding-ada-002, sentence-transformers) that convert text to vectors with no training on your part. Example: Use OpenAI’s embedding API to encode a product description into a vector.
  • Chunking: Splitting text into smaller pieces (e.g., paragraphs, sentences) before embedding to improve accuracy. Example: A 10-page report is split into 1-paragraph chunks for embedding, not processed as one block.
  • Indexing: Storing embeddings in a database optimized for fast similarity search (e.g., FAISS, Pinecone, Weaviate). Example: A company indexes all customer emails to quickly find similar complaints.
  • Fine-tuning embeddings: Adjusting pre-trained embeddings with domain-specific data to improve performance. Example: A hospital fine-tunes embeddings on medical records to better match patient symptoms to diagnoses.
  • Dimensionality: The length of the embedding vector (e.g., 384, 768, 1536 dimensions). Higher dimensions can capture more nuance but require more storage and compute. Example: OpenAI’s text-embedding-3-small produces 1536-dimensional vectors by default.
  • Zero-shot retrieval: Finding relevant items without prior examples, using only the embedding space. Example: A startup uses semantic search to categorize support tickets into topics it’s never seen before.
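
To make "similar items are closer together" concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors are invented for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" (assumed values, not output of any real model).
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.9, 0.15]
apple = [0.1, 0.2, 0.95]

sim_king_queen = cosine_similarity(king, queen)
sim_king_apple = cosine_similarity(king, apple)

print(f"king vs queen: {sim_king_queen:.3f}")
print(f"king vs apple: {sim_king_apple:.3f}")
```

With these made-up values, "king" scores much higher against "queen" than against "apple", which is the behavior a good embedding model aims for.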

Step-by-Step Application

  1. Define your use case
    • Identify what you’re searching (documents, products, code) and why (e.g., "find similar customer complaints to auto-suggest solutions").
    • Example: A SaaS company wants to reduce support response time by retrieving past solutions for new tickets.

  2. Choose an embedding model
    • Pick a pre-trained model based on trade-offs (cost, speed, accuracy). For text:
      • Fast/cheap: sentence-transformers/all-MiniLM-L6-v2 (384 dimensions, free).
      • High accuracy: OpenAI’s text-embedding-3-large (3072 dimensions, paid).
    • Example: Use all-MiniLM-L6-v2 for an internal wiki search tool.

  3. Preprocess and chunk your data
    • Clean text (remove HTML, normalize case) and split into chunks (e.g., 256 tokens per chunk).
    • Example: Split a 5-page contract into 1-paragraph chunks to avoid embedding noise.

  4. Generate and store embeddings
    • Use the model to convert chunks into vectors and store them in a vector database (e.g., Pinecone, Weaviate, or FAISS for local testing).
    • Example: Encode 10,000 support tickets into vectors and index them in Pinecone.

  5. Implement semantic search
    • Convert the user’s query into an embedding, then search the vector database for the nearest neighbors.
    • Example: A user searches "how to reset password"; the system returns the top 3 most semantically similar help articles.

  6. Evaluate and refine
    • Test with real queries and measure precision/recall. Adjust chunking, model, or indexing as needed.
    • Example: If "refund policy" returns unrelated results, add metadata filters (e.g., "topic:billing").
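
The chunk, embed, index, and search steps above can be sketched end to end in plain Python. This is only an outline under stated assumptions: `toy_embed` is a hypothetical stand-in that counts words, so it matches shared vocabulary rather than meaning, and the in-memory list stands in for a vector database. In a real system you would swap in an embedding model and an index such as FAISS or Pinecone.

```python
import math
import re
from collections import Counter

def chunk_by_paragraph(text):
    """Split on blank lines and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def toy_embed(text):
    """Hypothetical stand-in for a real embedding model.
    Returns a sparse word-count vector; it is NOT semantic."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def build_index(chunks):
    """Store each chunk next to its vector (a list stands in for a vector DB)."""
    return [(chunk, toy_embed(chunk)) for chunk in chunks]

def search(index, query, k=3):
    """Embed the query and return the k highest-scoring (score, chunk) pairs."""
    q = toy_embed(query)
    scored = [(cosine(q, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

document = """To reset your password, open Settings and choose Reset Password.

Invoices are emailed on the first day of each month.

Contact support by chat if your account is locked."""

index = build_index(chunk_by_paragraph(document))
results = search(index, "how to reset password", k=1)
print(results[0][1])
```

The structure is the point here: only `toy_embed` and the storage layer would change when moving to a real model and vector database.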

Common Mistakes

  • Mistake: Embedding whole documents without chunking. Correction: Split documents into logical chunks (e.g., paragraphs) to avoid embedding noise. Why: A 10-page document embedded as one vector loses granularity, and text beyond the model’s input limit may be truncated or rejected.

  • Mistake: Ignoring metadata (e.g., dates, categories). Correction: Combine embeddings with metadata filters (e.g., "only search documents from 2023"). Why: Semantic search alone can’t filter by structured data.

  • Mistake: Assuming embeddings are "plug-and-play" for all domains. Correction: Fine-tune embeddings on domain-specific data if performance is poor. Why: A model adapted to legal text will often outperform a general one for contract analysis.

  • Mistake: Using cosine similarity for all tasks. Correction: For some use cases (e.g., recommendation systems), try other metrics like Euclidean distance or dot product. Why: Cosine similarity ignores vector magnitude, which can matter for ranking.

  • Mistake: Not updating the index when data changes. Correction: Set up a pipeline to re-embed and re-index new/updated data. Why: Stale embeddings lead to irrelevant results.
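
The point about cosine similarity ignoring magnitude is easy to check with a small worked example. The sketch below uses made-up two-dimensional vectors: two items point in the same direction as the query but differ in length, so cosine similarity cannot tell them apart while dot product and Euclidean distance can.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical vectors chosen for illustration.
query = [1.0, 1.0]
item_small = [1.0, 1.0]  # same direction, same length as the query
item_large = [3.0, 3.0]  # same direction, three times the length

for name, item in [("item_small", item_small), ("item_large", item_large)]:
    print(name,
          "cosine=%.3f" % cosine(query, item),
          "dot=%.1f" % dot(query, item),
          "euclidean=%.3f" % euclidean(query, item))
```

Both items get essentially the same cosine score, the dot product favors the longer vector, and Euclidean distance favors the one that coincides with the query. Which behavior is right depends on whether magnitude carries meaning in your data.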


Practical Tips

  • Start small: Test embeddings on a subset of data (e.g., 1,000 documents) before scaling. Use FAISS for local prototyping.
  • Combine with keywords: Hybrid search (semantic + keyword) often works better than either alone. Example: Use BM25 for exact matches + embeddings for meaning.
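  • Monitor drift: Retrieval quality can degrade as your data evolves (e.g., new slang, products), because a fixed model may represent new terms poorly. Re-evaluate periodically and, if needed, re-embed with a newer model or fine-tune.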
  • Optimize for latency: For real-time search, use smaller embeddings (e.g., 384 dimensions) or approximate nearest neighbor (ANN) indexes.
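
One possible way to combine keyword and semantic results is reciprocal rank fusion (RRF), which merges ranked lists using only each document's position. The guide does not prescribe a fusion method, so treat this as one assumed option; the document IDs and both input rankings are hypothetical, and `k=60` is a commonly used default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.
    Each appearance contributes 1 / (k + rank), with rank starting at 1."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical outputs of two separate retrievers for the same query.
keyword_ranking = ["doc_a", "doc_b", "doc_c"]   # e.g., from BM25
semantic_ranking = ["doc_b", "doc_d", "doc_a"]  # e.g., from embeddings

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that appear near the top of both lists rise to the top of the fused list, while documents found by only one retriever are still kept.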

Quick Practice Scenario

Scenario: Your e-commerce team wants to improve product search. A user types "lightweight running shoes for flat feet," but the current keyword search returns heavy hiking boots. How would you use embeddings to fix this?

Answer: Encode all product descriptions into embeddings, then convert the user’s query into an embedding and retrieve the nearest neighbors in the vector space. Explanation: Semantic search captures the intent ("lightweight," "flat feet") rather than matching keywords like "shoes."
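
The answer can be sketched with a tiny hypothetical catalog. The product names, categories, and vectors below are all invented; the three axes loosely stand for "lightweight", "arch support", and "rugged", and `query_vector` is an assumed stand-in for the embedding of the user's query. The optional category filter illustrates the metadata-filter idea from Common Mistakes.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a))
                  * math.sqrt(sum(y * y for y in b)))

# Hypothetical catalog with hand-made vectors (not real model output).
products = [
    {"name": "Featherlite Stability Runner", "category": "running",
     "vector": [0.9, 0.9, 0.1]},
    {"name": "Trail Titan Hiking Boot", "category": "hiking",
     "vector": [0.1, 0.4, 0.95]},
    {"name": "Sprint Racer Flat", "category": "running",
     "vector": [0.95, 0.2, 0.05]},
]

# Assumed embedding of "lightweight running shoes for flat feet".
query_vector = [0.8, 0.9, 0.0]

def search_products(query_vec, catalog, category=None, k=2):
    """Optionally filter by category, then rank by cosine similarity."""
    candidates = [p for p in catalog
                  if category is None or p["category"] == category]
    ranked = sorted(candidates,
                    key=lambda p: cosine(query_vec, p["vector"]),
                    reverse=True)
    return [p["name"] for p in ranked[:k]]

top = search_products(query_vector, products, category="running")
print(top)
```

With these assumed vectors the stability runner ranks first and the hiking boot ranks last even without the filter, because its vector points in a different direction from the query.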


Last-Minute Cram Sheet

  1. Embedding = numerical vector capturing meaning; similar items are closer in vector space.
  2. Semantic search = finds results by meaning, not keywords. Example: "cheap flights" matches "affordable airfare."
  3. Chunking = split text into smaller pieces before embedding (e.g., paragraphs, 256 tokens).
  4. Pre-trained embeddings = off-the-shelf models (e.g., OpenAI, sentence-transformers) for quick use.
  5. Vector database = stores embeddings for fast similarity search (e.g., Pinecone, FAISS).
  6. Cosine similarity = cosine of the angle between vectors (ranges from -1 to 1); higher = more similar.
  7. Fine-tuning = adjust embeddings with domain data for better performance. Overfitting risk.
  8. Hybrid search = combine semantic + keyword search for best results.
  9. Dimensionality trade-off = higher dimensions can capture more nuance but cost more storage and compute.
  10. Zero-shot retrieval = find relevant items without prior examples. May struggle with niche domains.