By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.
As a Forward Deployed Engineer (FDE), you’ll rarely get a perfect spec. Instead, you’ll deploy ML models in classified networks with no internet, build data pipelines for disaster response with shifting priorities, or debug a live customer escalation during a go-live week—all while requirements change hourly. This guide teaches you how to clarify, adapt, and deliver when the problem (and solution) are unclear. Example: You’re on-site at a military base, and the customer says, “We need real-time threat detection,” but their network can’t send data to the cloud. Your job is to figure out what they actually need (e.g., edge inference with offline model updates) and build it before the next missile test.
Ask vs. Infer: The customer’s stated request (“We need a dashboard”) vs. the real problem (their ops team can’t correlate sensor data fast enough). FDEs must dig for the latter.
Spike & Stabilize: A rapid prototype (e.g., a Jupyter notebook or Flask app) to validate assumptions, followed by hardening (e.g., rewriting in Go, adding tests, containerizing with Docker).
Air-Gapped Deployment: Deploying software in a network with no internet access. Requires:
- Pre-downloaded dependencies, using tools such as `pip download -r requirements.txt`, `apt-offline`, or `conda-pack`.
- Manual approval chains for every binary.
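Anything carried across the air gap is worth verifying on the far side. The following is a minimal sketch, assuming a SHA-256 digest is recorded on the internet-connected machine before transfer; the function names `sha256_of` and `verify_transfer` are hypothetical, and the customer's approval process may require a different check entirely.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(path: Path, expected_digest: str) -> bool:
    """Compare the transferred file with the digest recorded before transfer."""
    return sha256_of(path) == expected_digest.strip().lower()
```

Recording the digest alongside the approval paperwork gives the customer's reviewers something concrete to compare against.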
Shadow Requirements: Unspoken constraints (e.g., “The system must run on Windows 7 because the customer’s IT won’t upgrade”). Discover these by asking, “What’s the oldest OS/hardware we might encounter?”
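Asking is the primary tool here, but a quick probe run on the target host can confirm the answer. This is a rough sketch using only the standard library; `probe_environment` and `meets_python_floor` are hypothetical names, and the Python 3.8 floor is just an illustrative default.

```python
import platform
import shutil
import sys


def probe_environment(path: str = ".") -> dict:
    """Collect basic facts about the host to compare with documented assumptions."""
    disk = shutil.disk_usage(path)
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "disk_free_mb": disk.free // (1024 * 1024),
    }


def meets_python_floor(minimum: tuple = (3, 8)) -> bool:
    """Return True if the running interpreter is at least the given (major, minor)."""
    return sys.version_info[:2] >= minimum
```

Saving the resulting dictionary with the deployment notes makes a later "it worked on my machine" conversation much shorter.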
Minimum Viable Deployment (MVD): The smallest working version of a system that can be deployed today (e.g., a single Python script that processes CSV files instead of a full Spark pipeline). Use tools like `cron`, `systemd`, or Kubernetes Jobs to schedule it.
Customer Proxy: A technical contact on the customer’s team who can unblock you (e.g., get firewall rules changed, explain undocumented APIs). Identify them early.
Data Contract: A written agreement (even a Slack message) defining inputs, outputs, SLAs, and failure modes (e.g., “If the sensor drops, log an error and retry 3x”).
Chaos Testing: Intentionally breaking things to find edge cases (e.g., `kubectl delete pod --all` to test recovery, or `tc qdisc add dev eth0 root netem loss 30%` to simulate packet loss).
Hotfix vs. Patch:
sed
Patch: Permanent, tested solution (e.g., a PR to the main branch with unit tests).
ATO (Authorization to Operate): A security approval required for government systems. If you’re missing it, your deployment is dead on arrival. Ask early: “What’s the ATO process here?”
ATC (Authority to Connect): Permission to plug into a network. Without it, you can’t even `ping` the server. Get this before traveling on-site.
IAM (Identity and Access Management): Who can do what (e.g., “Only users in group `analysts` can read the S3 bucket”). Misconfigured IAM is a top cause of deployment failures.
Goal: Write down what you think the requirements are, then validate them.
Actions:
- Ask the “5 Whys”: For every requirement, ask “Why?” up to 5 times to get to the root problem.
  - Customer: “We need a dashboard.”
  - You: “Why?” → “To track inventory.”
  - You: “Why track inventory?” → “Because supplies are running out during missions.”
  - You: “Why do supplies run out?” → “Because we don’t know what’s in stock.”
  - You: “Why don’t you know?” → “Because the data is in 3 different spreadsheets.”
  - Real problem: They need a unified inventory API, not a dashboard.
- Write a Data Contract: Draft a 1-pager (or Slack message) with:
  - Inputs (e.g., “CSV files from sensors A, B, C”).
  - Outputs (e.g., “JSON with fields `threat_level`, `confidence`”).
  - SLAs (e.g., “99% uptime, <1s latency”).
  - Failure modes (e.g., “If sensor A fails, use sensor B’s last known value”).
- Get Sign-Off: Have the customer proxy review and approve it. No sign-off? No code.
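A data contract is easier to enforce when the output side is checked in code. Here is a minimal sketch of validating one output record with the `threat_level` and `confidence` fields mentioned above. The allowed threat levels, the 0 to 1 confidence range, and the names `ThreatRecord` and `parse_record` are assumptions for illustration; the real values belong in the signed-off contract.

```python
from dataclasses import dataclass

# Hypothetical allowed values; the real list comes from the agreed contract.
ALLOWED_THREAT_LEVELS = {"low", "medium", "high"}


@dataclass(frozen=True)
class ThreatRecord:
    threat_level: str
    confidence: float


def parse_record(raw: dict) -> ThreatRecord:
    """Validate one output record; raise ValueError on any contract violation."""
    missing = {"threat_level", "confidence"} - raw.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    level = str(raw["threat_level"]).lower()
    if level not in ALLOWED_THREAT_LEVELS:
        raise ValueError(f"unknown threat_level: {raw['threat_level']!r}")
    confidence = float(raw["confidence"])  # assumed to be a 0..1 score
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return ThreatRecord(level, confidence)
```

Failing loudly on a bad record keeps contract violations visible instead of letting them flow downstream.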
Goal: Deploy something that works in 24–48 hours.
Actions:
- Start with the Hardest Part: If the customer’s network is air-gapped, test dependency downloads first:
  ```bash
  # On a machine with internet (run from the project directory):
  mkdir -p /tmp/deps
  pip download -r requirements.txt --dest /tmp/deps
  cp requirements.txt /tmp/deps/
  # Write the archive outside the directory being archived:
  tar -czvf /tmp/deps.tar.gz -C /tmp deps
  # Transfer /tmp/deps.tar.gz to the air-gapped machine via USB.
  ```
- Use the Simplest Stack:
  - Data pipeline? Start with a Python script + `cron` (not Spark).
  - API? Start with Flask (not Kubernetes).
  - UI? Start with a CLI (not React).
- Deploy Early: Even if it’s just an endpoint you can hit with `curl` that returns `{"status": "ok"}`.
- Document Assumptions: Add a `README.md` with:
  ```markdown
  # Assumptions
  - Runs on Ubuntu 20.04 (tested).
  - Requires Python 3.8+.
  - Input files must be <10MB.
  - No internet access (all deps pre-downloaded).
  ```
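The "deploy early" endpoint can be as small as the sketch below, which uses only the standard library so there is nothing extra to carry across the air gap. The `/health` path, port 8080, and the names `HealthHandler` and `make_server` are assumptions for illustration; a Flask route would do the same job if Flask is already in the dependency bundle.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /health is served; everything else is a 404.
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence per-request logging for this sketch


def make_server(port: int = 8080, host: str = "127.0.0.1") -> HTTPServer:
    """Build the server without starting it; call serve_forever() to run."""
    return HTTPServer((host, port), HealthHandler)


# To run it: make_server().serve_forever()
```

Once this answers, the customer has something to point `curl` at while the real logic is still being written.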
Goal: Break your MVD to find edge cases.
Actions:
- Simulate Failures:
  ```bash
  # Kill the process, then restart it after 5 seconds:
  pkill -f "python app.py" && sleep 5 && python app.py &
  # Simulate network latency (needs root; undo with: tc qdisc del dev eth0 root):
  tc qdisc add dev eth0 root netem delay 200ms 50ms
  # Fill up disk (remove /tmp/fill_disk afterwards):
  dd if=/dev/zero of=/tmp/fill_disk bs=1M count=1000
  ```
- Check Logs: Tail logs (`journalctl -u myapp -f` or `kubectl logs -f <pod>`) and look for:
  - Timeouts.
  - Permission errors.
  - Silent failures (e.g., a `try/except` that swallows errors).
- Fix One Thing at a Time: If the app crashes under load, don’t rewrite it; add a retry loop first:
  ```python
  from tenacity import retry, stop_after_attempt, wait_fixed

  # Retry up to 3 times, waiting 2 seconds between attempts.
  @retry(stop=stop_after_attempt(3), wait=wait_fixed(2))
  def fetch_data():
      ...  # Your code here
  ```
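The silent failures mentioned above are worth a concrete picture. This sketch contrasts an exception handler that swallows the error with one that keeps the same fallback but leaves a log record. The logger name `myapp` and both function names are hypothetical.

```python
import logging

logger = logging.getLogger("myapp")


def parse_reading_silently(raw: str):
    """Anti-pattern: the error vanishes and the caller gets None with no trace."""
    try:
        return float(raw)
    except Exception:
        pass


def parse_reading(raw: str):
    """Same fallback, but only ValueError is caught and the failure is logged."""
    try:
        return float(raw)
    except ValueError:
        # logger.exception records the message at ERROR level with the traceback.
        logger.exception("could not parse sensor reading %r", raw)
        return None
```

With the second version, a bad sensor value shows up when you tail the logs instead of surfacing days later as a missing data point.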
Goal: Turn the MVD into a production-ready system.
Actions:
- Containerize (If Allowed):
  ```dockerfile
  FROM python:3.8-slim
  # Assumes deps.tar.gz unpacks to /deps and includes requirements.txt.
  COPY deps.tar.gz /tmp/
  RUN tar -xzvf /tmp/deps.tar.gz -C / && pip install --no-index --find-links=/deps -r /deps/requirements.txt
  COPY app.py /app/
  CMD ["python", "/app/app.py"]
  ```
  - Air-gapped? Build the image on a machine with internet, then transfer it:
    ```bash
    docker save myapp:latest > myapp.tar
    # Transfer myapp.tar to the air-gapped machine.
    docker load < myapp.tar
    ```
- Add Monitoring:
  - Basic: `curl -s http://localhost:8080/health | grep "ok"` in a cron job.
  - Advanced: Prometheus + Grafana (if allowed).
- Write Runbooks: A `RUNBOOK.md` with:
  ```markdown
  # How to Restart
  1. SSH into the server: `ssh <user>@<host> -i ~/.ssh/customer_key`.
  2. Check logs: `journalctl -u myapp -n 50`.
  3. Restart: `sudo systemctl restart myapp`.
  ```
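The basic monitoring check can also be written in Python when `curl` is not installed on the target host. This is a sketch under the assumption that the service exposes `http://localhost:8080/health` and returns `{"status": "ok"}`; `is_healthy` and `exit_code` are hypothetical names.

```python
import json
import urllib.error
import urllib.request


def is_healthy(url: str = "http://localhost:8080/health", timeout: float = 3.0) -> bool:
    """Return True only if the endpoint responds with JSON {"status": "ok"}."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, HTTP errors, and malformed bodies.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def exit_code(healthy: bool) -> int:
    """Map the result to a shell exit status so cron or systemd can react."""
    return 0 if healthy else 1


# In a cron-run script: raise SystemExit(exit_code(is_healthy()))
```

A non-zero exit status lets the scheduler's own alerting (cron mail, a systemd `OnFailure=` unit) report the failure.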
Goal: Say “no” (or “not now”) without burning bridges.
Actions:
- The “Parking Lot” Technique: When the customer asks for a new feature:
  - Write it down in a shared doc (e.g., “Parking Lot” in Google Docs).
  - Say: “This is important. Let’s prioritize it after we stabilize the current deployment.”
- Trade-Offs: If they insist, ask:
  - “What’s the impact if we don’t do this now?”
  - “Can we deliver this in Phase 2?”
- Document Everything: If you do add scope, update the Data Contract and get sign-off.
Goal: Ensure the customer can run the system without you.
Actions:
- Train the Customer Proxy:
  - Walk them through the `RUNBOOK.md`.
  - Have them restart the service while you watch.
- Leave a “Break Glass” Script: A one-liner to fix common issues:
  ```bash
  # Example: Reset the database if it crashes.
  curl -X POST http://localhost:8080/reset_db
  ```
- Schedule a Follow-Up: Set a calendar invite for 1 week later to check in.
Answer: “I’d ask, ‘What’s the impact if we don’t do this now?’ If it’s critical, I’d add it to the ‘Parking Lot’ and prioritize it after stabilizing the current deployment. If they insist, I’d negotiate trade-offs (e.g., ‘We can do this, but it’ll delay the security audit by 2 days’).”
“You’re deploying to an air-gapped network, and your Docker image fails to run. What do you do?”
Answer: “First, I’d check the logs (`docker logs <container>`) for missing dependencies. If it’s a network issue, I’d rebuild the image with all deps pre-downloaded (`docker build --no-cache`) and transfer it via USB. If that fails, I’d fall back to a Python script + cron.”
“The customer’s data is in a format you’ve never seen before. How do you proceed?”
pandas.read_csv()
The “We Need This Yesterday” Deployment: “During a disaster response mission, the customer needed a real-time data pipeline to track supply deliveries. The catch? Their network was air-gapped, and the data was in 5 different formats. I built a Python script that normalized the data, ran it on a Raspberry Pi, and handed it off in 48 hours. The key was focusing on the MVD—no Kubernetes, no fancy UI, just a script that worked.”
The Scope Creep Nightmare: “A customer kept adding ‘small’ features during a go-live week. I used the ‘Parking Lot’ technique to defer them, but when they insisted on a last-minute change, I negotiated: ‘We can do this, but it’ll delay the security audit. Is that acceptable?’ They backed down.”
Answer: Check if they allow `podman` (a Docker alternative) or if you need to fall back to a Python script + `systemd`. Explanation: Always ask what’s allowed before assuming you can use your preferred tools.
The customer says, “We need a dashboard,” but their data is in 3 different spreadsheets. What do you build first?
Answer: A Python script to unify the spreadsheets into a single API (e.g., Flask). Explanation: The real problem is data silos, not the dashboard.
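The unification step in this answer might look like the sketch below, which covers only the merge and leaves the API layer out. The column names in `COLUMN_ALIASES`, the shared `item`/`quantity` schema, and the sample rows are all hypothetical; real spreadsheets would need their own mapping, agreed with the customer.

```python
import csv
import io
from collections import defaultdict

# Hypothetical mapping from each spreadsheet's headers to a shared schema.
COLUMN_ALIASES = {
    "item": "item", "Item Name": "item", "sku": "item",
    "qty": "quantity", "Quantity": "quantity", "count": "quantity",
}


def read_inventory(handle) -> list:
    """Read one CSV export and rename its columns to the shared schema."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {COLUMN_ALIASES[k]: v for k, v in raw.items() if k in COLUMN_ALIASES}
        rows.append({"item": row["item"].strip().lower(), "quantity": int(row["quantity"])})
    return rows


def unify(*handles) -> dict:
    """Sum quantities per item across all exports."""
    totals = defaultdict(int)
    for handle in handles:
        for row in read_inventory(handle):
            totals[row["item"]] += row["quantity"]
    return dict(totals)


# Usage with three in-memory exports that use different headers:
sheet_a = io.StringIO("item,qty\nBandages,10\nwater,5\n")
sheet_b = io.StringIO("Item Name,Quantity\nbandages ,3\n")
sheet_c = io.StringIO("sku,count\nWater,2\nrations,7\n")
inventory = unify(sheet_a, sheet_b, sheet_c)
# inventory == {"bandages": 13, "water": 7, "rations": 7}
```

Serving `inventory` from a single endpoint is then a small step, and the dashboard can come later once the data is trustworthy.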
You’re on-site, and the customer’s IT team says, “We can’t give you root access.” How do you proceed?
sudo
sudo systemctl restart myapp
~/.local/bin