Study Guide: Forward Deployed Engineer 101: Discovery and Requirements Gathering (Running Workshops, Technical Deep‑Dives, Asking Why)
Source: https://www.fatskills.com/forward-deployed-engineer-fde/chapter/forward-deployed-engineer-discovery-and-requirements-gathering-running-workshops-technical-deepdives-asking-why

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.


Discovery & Requirements Gathering: Field-Ready Study Guide

For Forward Deployed Engineers (FDEs) in Defense, Intelligence, and Enterprise

What This Is

Discovery and requirements gathering is the process of uncovering what the customer actually needs—not just what they say they want—before writing a single line of code. As an FDE, you’ll often work in high-stakes, constrained environments (e.g., classified networks, disaster zones, or enterprise systems with strict compliance rules) where misaligned requirements can mean mission failure, security breaches, or wasted months of work.

Field Example:
You’re deployed to a military base to integrate a computer vision model into a drone surveillance system. The customer says, “We need real-time object detection.” But after digging deeper, you learn:
- The drone feeds are air-gapped (no cloud, no internet).
- The model must run on edge hardware with 4GB RAM and no GPU.
- The output must feed into a legacy C2 (Command & Control) system that only accepts XML over a serial port.
- The customer’s “real-time” definition is <500ms latency, but their network introduces 300ms of jitter.

If you’d built the “real-time object detection” they asked for without this context, you’d have delivered a useless system. Instead, you:
1. Ran a technical deep-dive workshop to map the data flow.
2. Asked “Why?” five times to uncover hidden constraints.
3. Prototyped a lightweight ONNX model that runs on CPU and outputs XML.
4. Deployed it via sneakernet (USB drives) because the network was air-gapped.

This is discovery in action.
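
The latency constraint in this example can be turned into a simple budget. The following is a minimal sketch, assuming the 300ms of jitter is treated as worst-case delay that counts against the 500ms target; the function name `remaining_budget_ms` is hypothetical.

```python
def remaining_budget_ms(target_ms: float, *overheads_ms: float) -> float:
    """Return the time left for the model once fixed overheads are subtracted."""
    return target_ms - sum(overheads_ms)


# Figures from the field example: 500 ms target, up to 300 ms of network jitter.
budget = remaining_budget_ms(500, 300)
print(f"Inference + XML serialization must fit in {budget:.0f} ms")
```

A budget like this makes the hidden constraint concrete: the model does not have 500ms to work with, only what is left after the network has taken its share.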


Key Terms & Concepts

  • Ask vs. Infer:
  • Ask: What the customer explicitly requests (e.g., “We need a dashboard”).
  • Infer: What the data/mission actually requires (e.g., “The dashboard is useless without role-based access control (RBAC) because not every analyst is cleared to see classified data”).
  • Tool: Use Miro or Excalidraw to whiteboard the gap between the two.

  • Technical Deep-Dive (TDD):
    A structured workshop where you dissect a system’s architecture, data flows, and constraints. Goal: Identify how the customer’s environment will break your solution.

  • Example Agenda:
    1. Current state (diagram their system).
    2. Pain points (what’s broken today?).
    3. Constraints (network, security, hardware).
    4. Success criteria (how will they measure “done”?).
  • Tool: Lucidchart (for diagrams), Jupyter Notebooks (for live data exploration).

  • Five Whys:
    A root-cause analysis technique. Keep asking “Why?” until you hit the real problem (not the symptom).

  • Example:
    • Customer: “We need a faster database.”
    • Why? “The dashboard times out.”
    • Why? “Its queries are slow.”
    • Why? “Each query joins 10 tables.”
    • Why? “The schema wasn’t designed for analytics.”
    • Real Problem: They need a data warehouse, not a faster database.

  • Constraints Mapping:
    Documenting hard vs. soft constraints (e.g., “Must run on RHEL 7” vs. “Would prefer Python 3.9”).

  • Tool: Markdown table or Notion for tracking:
    | Constraint | Type | Impact | Workaround |
    |---------------------|--------|---------------------------------|--------------------------|
    | No internet access | Hard | Can’t pull Docker images | Air-gapped registry |
    | FIPS 140-2 | Hard | No MD5 hashing | Use SHA-256 |
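
If the constraints table outgrows a document, it can also live as data. This is a rough sketch, assuming a small in-memory list is enough; the `Constraint` class and helper names are hypothetical, and the impact and workaround text on the soft-constraint row is illustrative, not from a real engagement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    name: str
    kind: str        # "Hard" or "Soft"
    impact: str
    workaround: str


CONSTRAINTS = [
    Constraint("No internet access", "Hard", "Can't pull Docker images", "Air-gapped registry"),
    Constraint("FIPS 140-2", "Hard", "No MD5 hashing", "Use SHA-256"),
    # Illustrative soft constraint; impact and workaround are assumptions.
    Constraint("Python 3.9 preferred", "Soft", "Older interpreter may be installed", "Pin to the available version"),
]


def hard_constraints(items):
    """Keep only the constraints that cannot be negotiated."""
    return [c for c in items if c.kind == "Hard"]


def to_markdown(items):
    """Render constraints in the same four-column layout as the table above."""
    lines = ["| Constraint | Type | Impact | Workaround |", "|---|---|---|---|"]
    lines += [f"| {c.name} | {c.kind} | {c.impact} | {c.workaround} |" for c in items]
    return "\n".join(lines)


print(to_markdown(hard_constraints(CONSTRAINTS)))
```

Filtering on hard constraints first gives a short list to validate with a spike before any design work starts.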

  • Stakeholder Matrix:
    A grid of who influences the project (e.g., end users, security teams, budget owners) and their priorities.

  • Example:
    | Stakeholder | Role | Priority | Pain Point |
    |-------------------|--------------------|------------------------|--------------------------|
    | SOC Analyst | End User | Faster alerts | False positives |
    | CISO | Approver | Zero CVEs | No unpatched software |
    | Program Manager | Budget Owner | On-time delivery | Scope creep |
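
One practical use of the matrix is checking a workshop invite list against it. The sketch below assumes that "Approver" and "Budget Owner" are the roles able to block a project; that role set and the function name `missing_blockers` are hypothetical.

```python
STAKEHOLDERS = [
    {"name": "SOC Analyst", "role": "End User", "priority": "Faster alerts"},
    {"name": "CISO", "role": "Approver", "priority": "Zero CVEs"},
    {"name": "Program Manager", "role": "Budget Owner", "priority": "On-time delivery"},
]

# Assumption: these roles can stop the project late if they are left out.
BLOCKER_ROLES = {"Approver", "Budget Owner"}


def missing_blockers(stakeholders, invited):
    """Return the names of blocker-role stakeholders who are not on the invite list."""
    invited = set(invited)
    return [
        s["name"]
        for s in stakeholders
        if s["role"] in BLOCKER_ROLES and s["name"] not in invited
    ]


print(missing_blockers(STAKEHOLDERS, ["SOC Analyst", "Program Manager"]))
```

Here the check reports that the CISO is missing from the invite list.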

  • Pre-Mortem:
    A workshop where the team imagines the project failed and brainstorms why. Forces proactive risk identification.

  • Prompt: “It’s 6 months from now, and this project is a disaster. What went wrong?”
  • Tool: Mural or Google Jamboard for collaborative brainstorming.

  • Data Contract:
    A written agreement between you and the customer on:

  • Data format (e.g., “CSV with columns X, Y, Z”).
  • Latency (e.g., “95% of requests <200ms”).
  • Ownership (e.g., “Customer provides labeled data by EOD Friday”).
  • Tool: Google Docs (for versioning) or Confluence (for enterprise).
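
A latency clause such as “95% of requests <200ms” is only useful if it can be checked against measurements. This is a minimal sketch, assuming latencies have already been collected as a list of milliseconds; both function names are hypothetical.

```python
def fraction_under(latencies_ms, threshold_ms):
    """Return the share of measurements strictly below the threshold."""
    if not latencies_ms:
        raise ValueError("need at least one latency measurement")
    return sum(1 for x in latencies_ms if x < threshold_ms) / len(latencies_ms)


def meets_latency_clause(latencies_ms, threshold_ms=200.0, required_fraction=0.95):
    """Check a clause of the form '95% of requests < 200 ms'."""
    return fraction_under(latencies_ms, threshold_ms) >= required_fraction


# Made-up sample: 96 fast requests and 4 slow ones.
samples = [120] * 96 + [450] * 4
print(meets_latency_clause(samples))
```

Agreeing on a check like this during discovery avoids later arguments about whether an average or a percentile was meant.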

  • Shadowing:
    Observing end users in their actual workflow to spot inefficiencies. Often reveals requirements the customer didn’t articulate.

  • Example: Watching a SOC analyst manually copy-paste IP addresses from emails into a SIEM tool → reveals they need email-to-SIEM automation.

  • Spike Solution:
    A throwaway prototype to validate a technical approach before committing to a full build.

  • Example: “Can we run TensorFlow Lite on this ARM device?” → Build a 1-hour spike to test.
  • Tool: Python + FastAPI (for quick APIs), Docker (for portability).

  • ATO (Authority to Operate):
    The formal approval to deploy software in a government/enterprise environment. Requires documentation (e.g., System Security Plan (SSP)) and often takes months.

  • Field Trap: Assume ATO is someone else’s problem → you’ll get blocked at deployment.
  • Tool: eMASS (DoD), NIST SP 800-53 (controls checklist).

  • ATC (Authority to Connect):
    Permission to connect your system to another (e.g., “Can we send data to the customer’s SIEM?”). Often requires MOUs (Memoranda of Understanding) or ISAs (Interconnection Security Agreements).

  • Example: “We need to send logs to Splunk.” → Customer’s Splunk team may require TLS 1.2+, specific log format, and IP whitelisting.

  • Sneakernet:
    Physically transporting data/media (e.g., USB drives, hard drives) because the network is air-gapped.

  • Tool: Rufus (for bootable USBs), VeraCrypt (for encrypted drives).


Step-by-Step / Field Process


1. Pre-Workshop: Build the Stakeholder Matrix & Agenda

  • Action:
  • Email the customer: “Who are the key stakeholders for this project? We’d like to invite [list roles] to the workshop.”
  • Draft a 1-page agenda (example below) and send it 48 hours in advance.
  • Tool: Google Docs (collaborative editing), Calendly (scheduling).
  • Example Agenda:
    ```
    - Introductions (10 min): Who’s in the room and what’s their role?
    - Current State (30 min): Walk us through your workflow today.
    - Pain Points (30 min): What’s broken or inefficient?
    - Constraints (20 min): Network, security, hardware, compliance.
    - Success Criteria (20 min): How will we know this is “done”?
    - Next Steps (10 min): Who owns what by when?
    ```

2. Workshop Day: Run the Technical Deep-Dive

  • Action:
  • Whiteboard the current state (even if it’s ugly). Ask: “Show us how data flows from A to B.”
  • Ask “Why?” five times for every requirement. Example:
    • Customer: “We need a Kafka cluster.”
    • You: “Why Kafka?”
    • Customer: “Because we need to process 10K events/sec.”
    • You: “Why 10K events/sec?”
    • Customer: “Because our current system drops 30% of events.”
    • Real Problem: They need reliable event processing, not necessarily Kafka.
  • Map constraints (e.g., “No cloud,” “FIPS 140-2,” “Must run on RHEL 7”).
  • Define success criteria in measurable terms (e.g., “Reduce false positives by 40%” vs. “Improve accuracy”).
  • Tool: Excalidraw (for diagrams), Miro (for sticky notes), Zoom (for remote workshops).

3. Post-Workshop: Document & Validate

  • Action:
  • Write a 1-page “Discovery Summary” (example below) and send it to the customer within 24 hours. Ask: “Does this capture everything correctly?”
  • Build a spike solution to validate the riskiest assumption (e.g., “Can we run this model on their hardware?”).
  • Tool: Markdown (for docs), Jupyter Notebook (for spikes), GitHub/GitLab (for versioning).
  • Example Discovery Summary:
```markdown
# Discovery Summary: Drone Surveillance Project

Stakeholders:
- SOC Team (end users)
- Base IT (network/security)
- Program Manager (budget)

Current State:
- Drone feeds → Local server → Manual review → C2 system (XML over serial)

Pain Points:
- Manual review is slow (20 min per feed).
- No object detection → high false negatives.

Constraints:
- Air-gapped network (no internet).
- Edge device: CPU-only embedded board (4GB RAM, no GPU).
- Output must be XML over serial (legacy C2 system).

Success Criteria:
- Latency <500ms (including network jitter).
- False positives <10%.
- Deployable via USB (sneakernet).

Next Steps:
- [You] Build ONNX model prototype (EOD Friday).
- [Customer] Provide 10 sample drone feeds (EOD Thursday).
```

4. Follow-Up: Align on the Data Contract

  • Action:
  • Draft a data contract (example below) and get sign-off from the customer.
  • Schedule a “Constraints Review” meeting to walk through the spike solution.
  • Tool: Google Docs (for contracts), Zoom (for reviews).
  • Example Data Contract:
```markdown
# Data Contract: Drone Surveillance Model

Input:
- Format: MP4 (H.264, 1080p, 30fps).
- Size: <500MB per feed.
- Delivery: USB drive (encrypted with VeraCrypt).

Output:
- Format: XML (schema attached).
- Latency: <500ms (P99).
- Fields: object_type, confidence, timestamp, coordinates.

Ownership:
- [Customer] Provides 50 labeled feeds by 2023-11-15.
- [You] Delivers model binary by 2023-12-01.
```
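
The Output clause of the contract above can be prototyped in a few lines. This is a sketch only: the real schema is the one attached to the contract, so the root element name `detection`, the attribute layout for `coordinates`, and the function name are all assumptions.

```python
import xml.etree.ElementTree as ET


def detection_to_xml(object_type, confidence, timestamp, coordinates):
    """Serialize one detection using the four fields named in the contract."""
    root = ET.Element("detection")  # hypothetical root element name
    ET.SubElement(root, "object_type").text = object_type
    ET.SubElement(root, "confidence").text = f"{confidence:.2f}"
    ET.SubElement(root, "timestamp").text = timestamp
    x, y = coordinates
    # Assumption: coordinates are pixel positions stored as attributes.
    ET.SubElement(root, "coordinates", {"x": str(x), "y": str(y)})
    return ET.tostring(root, encoding="unicode")


xml_text = detection_to_xml("vehicle", 0.91, "2023-12-01T10:15:00Z", (412, 230))
print(xml_text)
```

Showing the customer a sample message early is a cheap way to find out whether the legacy system expects a different layout.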

5. Escalation Plan: When Things Go Wrong

  • Action:
  • Identify the “blocker owner” (e.g., “Who can unblock us if the ATO is delayed?”).
  • Set up a weekly 15-minute sync with the customer to surface risks early.
  • Tool: Slack (for quick questions), Jira (for tracking blockers).


Common Mistakes


Mistake 1: Taking Requirements at Face Value

  • What Happens: You build what the customer asks for, not what they need.
  • Example: Customer says, “We need a dashboard.” You build a dashboard → they never use it because they needed automated alerts, not a UI.
  • Correction:
  • Ask “Why?” five times to uncover the real problem.
  • Shadow end users to see their actual workflow.
  • Field Tip: “Show me how you do this today.” (Often reveals manual workarounds.)

Mistake 2: Ignoring Constraints Until Deployment

  • What Happens: You assume the customer’s environment is like your lab → your solution breaks in production.
  • Example: You develop a Python app that works locally → it fails in the customer’s RHEL 7 environment because a compiled dependency needs glibc 2.28 and RHEL 7 ships glibc 2.17.
  • Correction:
  • Map constraints first (network, OS, security, hardware).
  • Build a spike solution to validate the riskiest assumption.
  • Field Tip: “Can I get a VM that matches your production environment?”
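
A quick environment probe is one form such a spike can take. The sketch below uses only the Python standard library and assumes Python 3 is available on the target host; the helper names and the choice of fields are hypothetical.

```python
import platform


def parse_version(text):
    """Turn '2.17' into (2, 17) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_ok(found, required):
    """Return True when the found glibc version is at least the required one."""
    return parse_version(found) >= parse_version(required)


def environment_report():
    """Collect basic facts about the host to compare against the lab setup."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "os": platform.platform(),
        "libc": f"{libc_name} {libc_version}".strip(),
    }


print(environment_report())
print(glibc_ok("2.17", "2.28"))  # RHEL 7's glibc against a 2.28 requirement
```

Running this on the customer’s host and in the lab, then comparing the two reports, surfaces mismatches before any code depends on them.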

Mistake 3: Skipping the Stakeholder Matrix

  • What Happens: You miss a key decision-maker → your project gets blocked at the last minute.
  • Example: You deliver a solution that works for end users → the security team rejects it because it doesn’t meet FIPS 140-2.
  • Correction:
  • Build a stakeholder matrix before the workshop.
  • Invite the “blockers” (security, compliance, budget owners) to the deep-dive.
  • Field Tip: “Who else needs to approve this?”

Mistake 4: Not Defining Success Criteria

  • What Happens: The customer says “It’s not working” → you have no way to prove it’s “done.”
  • Example: You deliver a model with 90% accuracy → the customer says “It’s not good enough.” → no baseline was set.
  • Correction:
  • Define success in measurable terms (e.g., “Reduce false positives by 40%” vs. “Improve accuracy”).
  • Get sign-off on the data contract before building.
  • Field Tip: “How will we know this is successful?”
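
A target such as “Reduce false positives by 40%” needs a baseline to be measured against. This is a minimal sketch, assuming false positives are counted over comparable periods before and after the change; the function names and sample counts are hypothetical.

```python
def relative_reduction(baseline, current):
    """Return the fractional drop from baseline to current (0.45 means 45%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline


def meets_target(baseline_fp, current_fp, target=0.40):
    """Check a 'reduce false positives by 40%' style criterion."""
    return relative_reduction(baseline_fp, current_fp) >= target


# Made-up counts: 200 false positives before, 110 after.
print(meets_target(200, 110))
```

Without the baseline count, the same 110 false positives could be read as either success or failure.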

Mistake 5: Assuming the Customer Knows Their Own System

  • What Happens: The customer gives you incorrect info → your solution fails.
  • Example: Customer says “Our network allows outbound HTTPS” → you build a cloud-dependent app → it fails because their firewall blocks TLS 1.3.
  • Correction:
  • Validate everything (e.g., “Can you show me the firewall rules?”).
  • Run a “network test” spike (e.g., “Can I curl https://google.com from your server?”).
  • Field Tip: “Can I see the logs from your last outage?”
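
The network test can also pin a specific TLS version, which is how a “firewall blocks TLS 1.3” problem would show up. This is a sketch using Python’s standard `ssl` and `socket` modules; the function names are hypothetical, and the host to probe is whatever endpoint the solution actually depends on.

```python
import socket
import ssl


def make_context(version):
    """Build a client context that will negotiate exactly one TLS version."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def probe_tls(host, version, port=443, timeout=5.0):
    """Try a handshake at the given TLS version; return (ok, detail)."""
    ctx = make_context(version)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version()
    except OSError as exc:  # covers refused connections, timeouts, and ssl.SSLError
        return False, str(exc)
```

Calling `probe_tls(host, ssl.TLSVersion.TLSv1_2)` and `probe_tls(host, ssl.TLSVersion.TLSv1_3)` from the customer’s server and comparing the results shows whether one version is being filtered along the path.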


FDE Interview / War Story Insights


1. “The Customer Demands a Feature That Violates Scope”

  • Interviewer’s Goal: Test your ability to push back while keeping the customer happy.
  • How to Answer:
  • Acknowledge the ask: “I understand why this is important to you.”
  • Clarify the “Why”: “Can you help me understand the problem this solves?”
  • Propose alternatives: “Instead of adding this feature now, could we [alternative]?”
  • Escalate if needed: “Let me check with my team on the trade-offs.”
  • Field Example:
  • Customer: “We need this model to run in the cloud.”
  • You: “I get why that’s easier, but our contract specifies air-gapped deployment. Can we explore [edge deployment] instead?”
  • Customer: “But the cloud is faster!”
  • You: “Let’s measure it. I’ll build a spike for the edge version by EOD tomorrow, and we can compare its performance against your target.”

2. “You’re On-Site and the Customer’s Environment is Nothing Like They Described”

  • Interviewer’s Goal: Test your adaptability and problem-solving under chaos.
  • How to Answer:
  • Stay calm: “This is different from what we expected—let’s figure it out.”
  • Diagnose quickly: “Can I get SSH access to one of your servers to check the OS/network?”
  • Propose a workaround: “We can [adjust the solution] to work with [constraint].”
  • Communicate early: “I’ll update my team on the new constraints so we can adjust the timeline.”
  • Field Example:
  • You: “You said this was RHEL 8, but it’s actually RHEL 7.”
  • Customer: “Oh, did we? Sorry about that.”
  • You: “No problem. Let me check if our dependencies support RHEL 7. If not, we can [cross-compile/static link/build a container].”

3. “The Customer Won’t Give You Access to Their Data”

  • Interviewer’s Goal: Test your ability to work around data restrictions.
  • How to Answer:
  • Ask for synthetic data: “Can you generate fake data that matches the schema?”
  • Use public datasets: “Here’s a similar dataset we can prototype with.”
  • Build a data generator: “I’ll write a script to simulate your data.”
  • Escalate: “Without data, we can’t validate the solution. Can we get a sample under an NDA?”
  • Field Example:
  • Customer: “We can’t share our logs—they’re classified.”
  • You: “No problem. Can you generate fake logs with the same format? Or I can write a script to simulate them.”
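
A simulation script of the kind offered in this exchange might look like the sketch below. The log format shown (`timestamp src=... action=...`) is entirely hypothetical; in practice the customer would describe the real format, and the generator would be adapted to match it.

```python
import random
from datetime import datetime, timedelta, timezone


def fake_log_lines(count, seed=0):
    """Generate deterministic fake log lines in a made-up format."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    actions = ["ALLOW", "DENY"]
    lines = []
    for i in range(count):
        ts = start + timedelta(seconds=i)
        # 203.0.113.0/24 is reserved for documentation, so no real host is named.
        src = f"203.0.113.{rng.randint(1, 254)}"
        lines.append(f"{ts.isoformat()} src={src} action={rng.choice(actions)}")
    return lines


for line in fake_log_lines(3):
    print(line)
```

Seeding the generator keeps test runs repeatable, which matters when the customer later compares results on their side of the air gap.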

4. “The Customer’s ‘Real-Time’ Definition is Different from Yours”

  • Interviewer’s Goal: Test your ability to align on ambiguous terms.
  • How to Answer:
  • Define terms explicitly: “When you say ‘real-time,’ do you mean <100ms, <1s, or <10s?”
  • Use examples: “For example, if a drone detects a threat, how quickly should the operator see it?”
  • Document the definition: “Let’s add this to the data contract.”
  • Field Example:
  • Customer: “We need real-time alerts.”
  • You: “Got it. What’s the maximum acceptable delay between an event and the alert?”
  • Customer: “Oh, under 5 minutes is fine.”
  • You: “Great—that’s not actually ‘real-time’ in engineering terms. Let’s call it ‘near-real-time’ to avoid confusion.”


Quick Check Questions


1. You’re deploying to an environment where you can’t run standard Docker images due to security restrictions. What’s your first step?

  • Answer: Ask the customer for their approved container runtime (e.g., Podman, Singularity) or base image (e.g., “Do you have a hardened RHEL image we can use?”).
  • Why: Never assume you can use Docker—many secure environments ban it.
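
Alongside asking, a quick look at what is installed on the host can inform the conversation. This is a small sketch, assuming shell access to the target machine; the candidate list is an assumption, and a binary being present on `PATH` does not mean it is approved for use.

```python
import shutil

# Assumption: runtimes worth looking for; adjust to the environment.
CANDIDATE_RUNTIMES = ("podman", "docker", "apptainer", "singularity")


def available_runtimes(candidates=CANDIDATE_RUNTIMES):
    """Return the candidate runtimes whose executables are found on PATH."""
    return [name for name in candidates if shutil.which(name)]


print(available_runtimes())
```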

2. The customer says, “We need a machine learning model to detect fraud.” What’s the first question you ask?

  • Answer: “What’s the current process for detecting fraud, and where does it fail?”
  • Why: You need to understand the baseline to define success (e.g., “Reduce false positives by 30%” vs. “Build a model”).

3. You’re in a workshop, and the customer’s security team says, “Your solution must be FIPS 140-2 compliant.” What do you do next?

  • Answer: Ask for their FIPS 140-2 compliance checklist and validate your dependencies (e.g., “Does OpenSSL 1.1.1 support FIPS mode?”).
  • Why: FIPS compliance is non-negotiable in government/enterprise—you need to know the exact requirements.
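
Part of validating dependencies is finding where weak hash functions are referenced. The sketch below flags lines that mention MD5 or SHA-1 for manual review; it is a plain text search under the assumption that the source is available as a string, and it does not establish compliance, which depends on using a validated cryptographic module.

```python
import re

# Matches the names as whole words, e.g. hashlib.md5(...) but not sha256.
WEAK_HASH_PATTERN = re.compile(r"\b(md5|sha1)\b", re.IGNORECASE)


def find_weak_hash_usage(source_text):
    """Return (line_number, line) pairs that mention MD5 or SHA-1."""
    hits = []
    for lineno, line in enumerate(source_text.splitlines(), start=1):
        if WEAK_HASH_PATTERN.search(line):
            hits.append((lineno, line.strip()))
    return hits


sample = "import hashlib\nh = hashlib.md5(b'x')\nok = hashlib.sha256(b'x')\n"
print(find_weak_hash_usage(sample))
```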


Last-Minute Cram Sheet

  1. Always ask “Why?” five times to uncover the real problem.
  2. Shadow end users—what they say ≠ what they do.
  3. Map constraints first: OS, network, security, hardware.
  4. Define success in measurable terms (e.g., “<500ms latency” vs. “fast”).
  5. Build a spike solution to validate the riskiest assumption.
  6. Get a VM that matches the customer’s environment before writing code.
  7. Document everything (discovery summary, data contract, constraints).
  8. ⚠️ Never assume the customer’s environment matches your lab.
  9. FIPS 140-2 = validated crypto modules and approved algorithms only: no MD5, and no SHA-1 for digital signatures.
  10. ATO (Authority to Operate) can take months—start early.

