Study Guide: Forward Deployed Engineer 101: Working with Ambiguity and Incomplete Requirements
Source: https://www.fatskills.com/forward-deployed-engineer-fde/chapter/forward-deployed-engineer-working-with-ambiguity-and-incomplete-requirements

Forward Deployed Engineer 101: Working with Ambiguity and Incomplete Requirements

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.




What This Is

As a Forward Deployed Engineer (FDE), you’ll rarely get a perfect spec. Instead, you’ll deploy ML models in classified networks with no internet, build data pipelines for disaster response with shifting priorities, or debug a live customer escalation during a go-live week—all while requirements change hourly. This guide teaches you how to clarify, adapt, and deliver when the problem (and solution) are unclear. Example: You’re on-site at a military base, and the customer says, “We need real-time threat detection,” but their network can’t send data to the cloud. Your job is to figure out what they actually need (e.g., edge inference with offline model updates) and build it before the next missile test.


Key Terms & Concepts

  • Ask vs. Infer:
    The customer’s stated request (“We need a dashboard”) vs. the real problem (their ops team can’t correlate sensor data fast enough). FDEs must dig for the latter.

  • Spike & Stabilize:
    A rapid prototype (e.g., a Jupyter notebook or Flask app) to validate assumptions, followed by hardening (e.g., rewriting in Go, adding tests, containerizing with Docker).

  • Air-Gapped Deployment:
    Deploying software in a network with no internet access. Requires:
      • Pre-downloaded dependencies (e.g., pip download -r requirements.txt → transfer via USB).
      • Offline package managers (e.g., apt-offline, conda-pack).
      • Manual approval chains for every binary.

  • Shadow Requirements:
    Unspoken constraints (e.g., “The system must run on Windows 7 because the customer’s IT won’t upgrade”). Discover these by asking, “What’s the oldest OS/hardware we might encounter?”

  • Minimum Viable Deployment (MVD):
    The smallest working version of a system that can be deployed today (e.g., a single Python script that processes CSV files instead of a full Spark pipeline). Use tools like cron, systemd, or Kubernetes Jobs to schedule it.

  • Customer Proxy:
    A technical contact on the customer’s team who can unblock you (e.g., get firewall rules changed, explain undocumented APIs). Identify them early.

  • Data Contract:
    A written agreement (even a Slack message) defining:
      • Input/output formats (e.g., “CSV with columns X, Y, Z”).
      • Latency expectations (e.g., “95% of requests < 500ms”).
      • Failure modes (e.g., “If the sensor drops, log an error and retry 3x”).

  • Chaos Testing:
    Intentionally breaking things to find edge cases (e.g., kubectl delete pod --all to test recovery, or tc qdisc add dev eth0 root netem loss 30% to simulate packet loss).

  • Hotfix vs. Patch:
      • Hotfix: Immediate, temporary fix (e.g., a sed command to modify a config file).
      • Patch: Permanent, tested solution (e.g., a PR to the main branch with unit tests).

  • ATO (Authorization to Operate):
    A security approval required for government systems. If you’re missing it, your deployment is dead on arrival. Ask early: “What’s the ATO process here?”

  • ACO (Authority to Connect):
    Permission to plug into a network (in DoD connection-approval processes this is commonly abbreviated ATC). Without it, you can’t even ping the server. Get this before traveling on-site.

  • IAM (Identity and Access Management):
    Who can do what (e.g., “Only users in group analysts can read the S3 bucket”). Misconfigured IAM is a top cause of deployment failures.
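
When every binary crossing the air gap needs approval, a checksum manifest can make both the review and the post-transfer check easier. The sketch below is one possible approach using only the standard library. The function names `build_manifest` and `verify_manifest` are hypothetical, and it assumes the transferred files sit in a single directory tree.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the relative paths that are missing or whose digest differs."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

Build the manifest on the connected machine, carry it across with the files, and run `verify_manifest` on the air-gapped side; an empty list means every file arrived intact.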


Step-by-Step / Field Process


1. Discovery: Turn Ambiguity into a Data Contract

Goal: Write down what you think the requirements are, then validate them.
Actions:
- Ask the “5 Whys”: For every requirement, ask “Why?” 5 times to get to the root problem.
  - Customer: “We need a dashboard.”
  - You: “Why?” → “To track inventory.”
  - You: “Why track inventory?” → “Because supplies are running out during missions.”
  - You: “Why do supplies run out?” → “Because we don’t know what’s in stock.”
  - You: “Why don’t you know?” → “Because the data is in 3 different spreadsheets.”
  - Real problem: Need a unified inventory API, not a dashboard.
- Write a Data Contract: Draft a 1-pager (or Slack message) with:
  - Inputs (e.g., “CSV files from sensors A, B, C”).
  - Outputs (e.g., “JSON with fields threat_level, confidence”).
  - SLAs (e.g., “99% uptime, <1s latency”).
  - Failure modes (e.g., “If sensor A fails, use sensor B’s last known value”).
- Get Sign-Off: Have the customer proxy review and approve it. No sign-off? No code.
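
A Data Contract is easier to enforce once part of it is machine-checkable. The following is a minimal sketch, not a prescribed format: the `DataContract` class, its field names, and `check_csv_header` are hypothetical, and the column names and limits are placeholders borrowed from the examples in this guide.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DataContract:
    """Hypothetical, minimal record of what both sides agreed to."""
    required_columns: Tuple[str, ...]
    max_latency_ms: int
    max_retries: int


def check_csv_header(contract: DataContract, csv_text: str) -> List[str]:
    """Return the required columns that are missing from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip() for name in header}
    return [col for col in contract.required_columns if col not in present]


# Placeholder values: "CSV with columns X, Y, Z", < 500 ms, retry 3x.
contract = DataContract(required_columns=("X", "Y", "Z"),
                        max_latency_ms=500, max_retries=3)
```

Running `check_csv_header` on the first file the customer hands over is a cheap way to learn whether the signed-off contract matches reality.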

2. Build the MVD (Minimum Viable Deployment)

Goal: Deploy something that works in 24–48 hours.
Actions:
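- Start with the Hardest Part: If the customer’s network is air-gapped, test dependency downloads first:
```bash
# On a machine with internet, from the project directory.
# pip download fetches packages for the machine it runs on, so match the target OS and Python version.
mkdir -p /tmp/deps
cp requirements.txt /tmp/deps/
pip download -r requirements.txt --dest /tmp/deps
# Archive the deps/ directory itself, writing the tarball outside it.
tar -czvf deps.tar.gz -C /tmp deps
# Transfer deps.tar.gz to the air-gapped machine via USB.
```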
- Use the Simplest Stack:
  - Data pipeline? Start with a Python script + cron (not Spark).
  - API? Start with Flask (not Kubernetes).
  - UI? Start with a CLI (not React).
- Deploy Early: Even if it’s just an endpoint that returns {"status": "ok"} when you curl it.
- Document Assumptions: Add a README.md with:
```markdown
# Assumptions
- Runs on Ubuntu 20.04 (tested).
- Requires Python 3.8+.
- Input files must be <10MB.
- No internet access (all deps pre-downloaded).
```
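
Step 2 favors a plain Python script plus cron over Spark. As a rough sketch of what such a script might look like, the example below counts rows in each incoming CSV, writes a JSON report, and archives the processed files. The function name `process_inbox`, the directory layout, and the row-count "processing" are all assumptions for illustration; a real MVD would do whatever the Data Contract specifies.

```python
import csv
import json
import shutil
from pathlib import Path
from typing import Dict


def process_inbox(inbox: Path, done: Path, report: Path) -> Dict[str, int]:
    """Count data rows per CSV in `inbox`, write a JSON report, archive the files."""
    done.mkdir(parents=True, exist_ok=True)
    counts: Dict[str, int] = {}
    for csv_path in sorted(inbox.glob("*.csv")):
        with csv_path.open(newline="") as fh:
            reader = csv.reader(fh)
            next(reader, None)  # skip the header row, if there is one
            counts[csv_path.name] = sum(1 for _ in reader)
        # Move processed files so the next scheduled run does not pick them up again.
        shutil.move(str(csv_path), str(done / csv_path.name))
    report.write_text(json.dumps(counts, indent=2))
    return counts
```

It uses only the standard library, so nothing extra has to cross the air gap, and a single cron entry calling it every few minutes is enough scheduling for a first deployment.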

3. Validate with Chaos Testing

Goal: Break your MVD to find edge cases.
Actions:
- Simulate Failures:
```bash
# Kill the process, then restart it after 5 seconds:
pkill -f "python app.py" && sleep 5 && python app.py &
# Simulate network latency (needs root; undo with: tc qdisc del dev eth0 root):
tc qdisc add dev eth0 root netem delay 200ms 50ms
# Fill up disk (writes about 1 GB; delete /tmp/fill_disk afterwards):
dd if=/dev/zero of=/tmp/fill_disk bs=1M count=1000
```
- Check Logs: Tail logs (journalctl -u myapp -f or kubectl logs -f <pod>) and look for:
  - Timeouts.
  - Permission errors.
  - Silent failures (e.g., a try/except that swallows errors).
- Fix One Thing at a Time: If the app crashes under load, don’t rewrite it; add a retry loop first:
```python
from tenacity import retry, stop_after_attempt

@retry(stop=stop_after_attempt(3))
def fetch_data():
    # Your code here; raise an exception on failure so tenacity retries.
    ...
```
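
On an air-gapped host where tenacity was not among the pre-downloaded packages, a small standard-library decorator can stand in for it. This is a hand-rolled sketch and not tenacity's API; the name `retry` and its parameters (`attempts`, `delay_s`, `backoff`, `exceptions`) are hypothetical.

```python
import functools
import time


def retry(attempts=3, delay_s=1.0, backoff=2.0, exceptions=(Exception,)):
    """Retry the wrapped function, multiplying the wait by `backoff` after each failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay_s
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the real error
                    time.sleep(wait)
                    wait *= backoff
        return wrapper
    return decorator
```

Re-raising the original exception on the last attempt keeps the failure visible in the logs instead of turning it into a silent one.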

4. Harden the Deployment

Goal: Turn the MVD into a production-ready system.
Actions:
- Containerize (If Allowed):
```dockerfile
FROM python:3.8-slim
# Assumes deps.tar.gz unpacks to deps/ holding the downloaded packages and requirements.txt.
COPY deps.tar.gz /tmp/
RUN tar -xzf /tmp/deps.tar.gz -C / \
    && pip install --no-index --find-links=/deps -r /deps/requirements.txt
COPY app.py /app/
CMD ["python", "/app/app.py"]
```
- Air-gapped? Build the image on a machine with internet, then transfer it:
```bash
docker save myapp:latest > myapp.tar
# Transfer myapp.tar to the air-gapped machine.
docker load < myapp.tar
```
- Add Monitoring:
- Basic: curl -s http://localhost:8080/health | grep "ok" in a cron job.
- Advanced: Prometheus + Grafana (if allowed).
- Write Runbooks: A RUNBOOK.md with:
```markdown
# How to Restart
1. SSH into the server: `ssh <user>@<host> -i ~/.ssh/customer_key`.
2. Check logs: `journalctl -u myapp -n 50`.
3. Restart: `sudo systemctl restart myapp`.
```
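
The basic curl-and-grep health check from the monitoring step can also be written in Python, which helps when the cron job needs to do more than grep (log, alert, restart). The sketch below is an assumption-laden example: the function name `is_healthy` is hypothetical, and it treats "HTTP 200 with `ok` in the body" as healthy, mirroring the curl example.

```python
import urllib.error
import urllib.request

# Bypass any configured proxy: the health endpoint is expected to be local.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_healthy(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if the endpoint answers HTTP 200 with 'ok' in the body."""
    try:
        with _opener.open(url, timeout=timeout_s) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and "ok" in body
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

A wrapper script could call `is_healthy("http://localhost:8080/health")` and exit non-zero on failure so that cron or systemd records the outage.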

5. Handle Scope Creep (Without Dying)

Goal: Say “no” (or “not now”) without burning bridges.
Actions:
- The “Parking Lot” Technique: When the customer asks for a new feature:
  - Write it down in a shared doc (e.g., “Parking Lot” in Google Docs).
  - Say: “This is important. Let’s prioritize it after we stabilize the current deployment.”
- Trade-Offs: If they insist, ask:
  - “What’s the impact if we don’t do this now?”
  - “Can we deliver this in Phase 2?”
- Document Everything: If you do add scope, update the Data Contract and get sign-off.

6. Handoff (The Most Critical Step)

Goal: Ensure the customer can run the system without you.
Actions:
- Train the Customer Proxy:
  - Walk them through the RUNBOOK.md.
  - Have them restart the service while you watch.
- Leave a “Break Glass” Script: A one-liner to fix common issues:
```bash
# Example: Reset the database if it crashes.
curl -X POST http://localhost:8080/reset_db
```
- Schedule a Follow-Up: Set a calendar invite for 1 week later to check in.


Common Mistakes

| Mistake | Correction | Why |
|---|---|---|
| Assuming the customer knows what they want. | Use the “5 Whys” and write a Data Contract. | Customers often describe symptoms, not problems. |
| Building for your lab, not the customer’s environment. | Test in the exact customer environment (OS, network, hardware). | A model that works on your MacBook may fail on their Windows 7 machine. |
| Ignoring shadow requirements. | Ask: “What’s the oldest OS/hardware we might encounter?” | You’ll waste days debugging if you assume modern hardware. |
| Not documenting assumptions. | Add a README.md with assumptions (e.g., “Runs on Ubuntu 20.04”). | Future you (or the customer) will thank you. |
| Over-engineering the MVD. | Start with a Python script + cron, not Kubernetes. | You’ll throw away 80% of the MVD code anyway. |


FDE Interview / War Story Insights


Interview Questions They’ll Ask

  1. “You’re on-site, and the customer demands a feature that wasn’t in the original scope. How do you respond?”
     Answer: “I’d ask, ‘What’s the impact if we don’t do this now?’ If it can wait, I’d add it to the ‘Parking Lot’ and prioritize it after stabilizing the current deployment. If it’s critical or they insist, I’d negotiate trade-offs (e.g., ‘We can do this, but it’ll delay the security audit by 2 days’).”

  2. “You’re deploying to an air-gapped network, and your Docker image fails to run. What do you do?”
     Answer: “First, I’d check the logs (docker logs <container>) for missing dependencies. If it’s a network issue, I’d rebuild the image with all deps pre-downloaded (docker build --no-cache) and transfer it via USB. If that fails, I’d fall back to a Python script + cron.”

  3. “The customer’s data is in a format you’ve never seen before. How do you proceed?”
     Answer: “I’d write a quick Python script to validate the data (e.g., pandas.read_csv()), then ask the customer proxy to explain the schema. If it’s messy, I’d build a data cleaning script first.”
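
For the unfamiliar-data question, a first-look script does not need pandas, which may not be installed on the customer's machine. The sketch below uses the standard library's `csv.Sniffer` to guess the delimiter and summarize the first rows. The function name `profile_delimited` and the candidate delimiter set are assumptions, and `Sniffer` raises `csv.Error` when it cannot decide, which is itself a useful signal that the file is not simple delimited text.

```python
import csv
import io
from typing import Dict, List


def profile_delimited(text: str, sample_rows: int = 5) -> Dict[str, object]:
    """Guess the delimiter and summarize the first rows of a delimited file."""
    # Restrict the guess to common delimiters (an assumption for this sketch).
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t|")
    rows: List[List[str]] = list(csv.reader(io.StringIO(text), dialect))
    header = rows[0] if rows else []
    return {
        "delimiter": dialect.delimiter,
        "columns": header,
        "row_count": max(len(rows) - 1, 0),  # data rows, excluding the header
        "sample": rows[1:1 + sample_rows],
    }
```

The resulting summary is something concrete to put in front of the customer proxy when asking them to explain the schema.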

War Stories (How to Frame Your Experience)

  • The “We Need This Yesterday” Deployment:
    “During a disaster response mission, the customer needed a real-time data pipeline to track supply deliveries. The catch? Their network was air-gapped, and the data was in 5 different formats. I built a Python script that normalized the data, ran it on a Raspberry Pi, and handed it off in 48 hours. The key was focusing on the MVD—no Kubernetes, no fancy UI, just a script that worked.”

  • The Scope Creep Nightmare:
    “A customer kept adding ‘small’ features during a go-live week. I used the ‘Parking Lot’ technique to defer them, but when they insisted on a last-minute change, I negotiated: ‘We can do this, but it’ll delay the security audit. Is that acceptable?’ They backed down.”


Quick Check Questions

  1. You’re deploying to an environment where you can’t run standard Docker images due to security restrictions. What’s your first step?
     Answer: Check if they allow podman (a Docker alternative) or if you need to fall back to a Python script + systemd.
     Explanation: Always ask what’s allowed before assuming you can use your preferred tools.

  2. The customer says, “We need a dashboard,” but their data is in 3 different spreadsheets. What do you build first?
     Answer: A Python script to unify the spreadsheets into a single API (e.g., Flask).
     Explanation: The real problem is data silos, not the dashboard.

  3. You’re on-site, and the customer’s IT team says, “We can’t give you root access.” How do you proceed?
     Answer: Ask for a user with sudo privileges for specific commands (e.g., sudo systemctl restart myapp), or fall back to a non-root deployment (e.g., ~/.local/bin).
     Explanation: Never assume you’ll get full access; plan for restrictions.
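
For the three-spreadsheets question above, the unifying step can start smaller than an API. The sketch below merges several CSV exports into one set of totals. It assumes, purely for illustration, that each spreadsheet exports to CSV with `item` and `quantity` columns; the real column names would come from the Data Contract, and `merge_inventory` is a hypothetical name.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable


def merge_inventory(csv_texts: Iterable[str]) -> Dict[str, int]:
    """Sum `quantity` per `item` across several CSV exports."""
    totals: Dict[str, int] = defaultdict(int)
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            # Normalize item names so "Water" and "water" count as one item.
            totals[row["item"].strip().lower()] += int(row["quantity"])
    return dict(totals)
```

Once the merged totals look right to the customer, wrapping the function in a small Flask route is a natural next step.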

Last-Minute Cram Sheet

  1. Always ask: “What’s the oldest OS/hardware we might encounter?” (⚠️ Shadow requirements kill deployments.)
  2. MVD > Perfect: Start with a Python script + cron before Kubernetes.
  3. Air-gapped? pip download -r requirements.txt → transfer via USB.
  4. No Docker? Try podman or fall back to systemd.
  5. Data Contract: Inputs, outputs, SLAs, failure modes. Get sign-off.
  6. Chaos test: tc qdisc add dev eth0 root netem loss 30% (simulate packet loss).
  7. Hotfix vs. Patch: sed for hotfixes, PRs for patches.
  8. ATO/ACO: Ask early—no approval = no deployment.
  9. Parking Lot: Defer scope creep to a shared doc.
  10. ⚠️ Test in the exact customer environment—your lab is a lie.

