Study Guide: Forward Deployed Engineer 101: Project Management in the Field (Agile in a Customer Context, Backlogs, Prioritization)
Source: https://www.fatskills.com/forward-deployed-engineer-fde/chapter/forward-deployed-engineer-project-management-in-the-field-agile-in-a-customer-context-backlogs-prioritization

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.


Project Management in the Field: Agile in a Customer Context

A Field-Ready Study Guide for Forward Deployed Engineers (FDEs)


What This Is

Project management in the field isn’t about Jira sprints or Scrum ceremonies. It’s about delivering working software in chaotic, high-stakes environments while keeping the customer mission alive. As an FDE, you’ll often work in classified networks, disaster zones, or enterprise environments with strict security constraints. Your "Agile" might mean:
- Deploying an ML model on-premise inside a classified DoD network where you can’t use cloud APIs, Docker is restricted, and every dependency must be carried in offline.
- Building a data pipeline for disaster response where the schema changes hourly, the customer’s "requirements" are vague, and the system must work offline.
- De-escalating a customer during go-live week when their CTO is screaming because the system crashed, and you have to triage, debug, and deploy a fix in 2 hours—while documenting everything for compliance.

This guide gives you field-tested patterns for managing backlogs, prioritizing work, and keeping customers (and your sanity) intact.


Key Terms & Concepts

  • Customer Backlog vs. Engineering Backlog
  • Customer Backlog: Features, bugs, and requests the customer explicitly asks for (e.g., "We need a dashboard showing real-time sensor data").
  • Engineering Backlog: Technical debt, security patches, and infrastructure work the customer doesn’t see but will break things if ignored (e.g., "Upgrade Python 3.7 → 3.11 before the EOL deadline").
  • Field Tip: Always keep these separate in your tracking tool (e.g., Jira epics labeled [CUSTOMER] vs. [ENG]). Customers prioritize their backlog; you prioritize yours.

  • Ask vs. Infer

  • Ask: What the customer says they need (e.g., "We need a report showing daily sales").
  • Infer: What the data/mission actually requires (e.g., "The sales team is gaming the system—we need anomaly detection to flag suspicious entries").
  • Field Tip: Use discovery scripts (Python/Pandas) to validate the "ask" before writing a line of production code.

  • Field-Ready Agile (Fragile)

  • Traditional Agile assumes stable environments, predictable sprints, and co-located teams. In the field:
    • Sprints are measured in hours/days, not weeks.
    • Standups happen ad-hoc (e.g., "We’re deploying at 0300—sync in 10 mins").
    • Definition of Done includes "works in customer’s environment" (not just "passes tests in CI").
  • Tools: Use GitLab/GitHub Issues (for lightweight tracking), Slack threads (for async updates), and Markdown docs (for runbooks).

  • Priority Matrix (Mission vs. Noise)

  • A 2x2 grid to triage requests (columns: mission impact; rows: urgency):
    |            | High Mission Impact | Low Mission Impact |
    |------------|---------------------|--------------------|
    | Urgent     | Do Now (e.g., "The pipeline is down: no data is flowing") | Delegate (e.g., "The CEO wants a new button color; assign to PM") |
    | Not urgent | Schedule (e.g., "Upgrade Kubernetes cluster next quarter") | Ignore (e.g., "Can we add a dark mode?") |
  • Field Tip: Use one emoji per quadrant in Slack/Jira to signal priority (e.g., 🔥 = Do Now, ⏳ = Schedule, 🗑️ = Ignore).
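
The grid reduces to a two-input lookup, which makes triage decisions easy to apply consistently across a team. A minimal sketch, assuming the grid's second axis is urgency; `Quadrant` and `triage` are illustrative names, not part of Jira or Slack.

```python
from enum import Enum


class Quadrant(Enum):
    DO_NOW = "Do Now"
    DELEGATE = "Delegate"
    SCHEDULE = "Schedule"
    IGNORE = "Ignore"


def triage(high_mission_impact: bool, urgent: bool) -> Quadrant:
    """Map a request onto the Mission vs. Noise grid."""
    if high_mission_impact:
        return Quadrant.DO_NOW if urgent else Quadrant.SCHEDULE
    return Quadrant.DELEGATE if urgent else Quadrant.IGNORE


# Two examples from the grid: a dead pipeline and a dark-mode request.
pipeline_down = triage(high_mission_impact=True, urgent=True)
dark_mode = triage(high_mission_impact=False, urgent=False)
print(pipeline_down.value, "/", dark_mode.value)
```

Forcing every request through the same two questions is the point: it turns "everything is urgent" into a conversation about mission impact.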

  • Air-Gapped Deployment

  • Installing software in an environment with no internet access. Requires:
    • Offline dependencies (e.g., pip download -r requirements.txt → burn to USB).
    • Manual approval chains (e.g., "This USB must be scanned by IT before use").
    • Local mirrors (e.g., apt-mirror for Debian packages, nexus for Maven).
  • Tools: dpkg, rpm, docker save/load, terraform init -backend-config=local.hcl.
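
Because the media passes through a manual scan-and-approve chain, it helps to be able to show that what arrived is what was sent. One way to do that (an addition here, not something the steps above require) is a SHA-256 manifest built before the transfer and checked after it. The function names and the demo file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large image tarballs are not read whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(bundle_dir: Path) -> dict[str, str]:
    """Return {relative_path: sha256} for every file under bundle_dir."""
    return {
        str(p.relative_to(bundle_dir)): sha256_of(p)
        for p in sorted(bundle_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(bundle_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose hash changed."""
    current = build_manifest(bundle_dir)
    return sorted(name for name, digest in manifest.items()
                  if current.get(name) != digest)


# Demo on a throwaway directory: a modified file is reported.
with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp)
    wheel = bundle / "example-1.0-py3-none-any.whl"
    wheel.write_bytes(b"wheel bytes")
    manifest = build_manifest(bundle)
    clean = verify_manifest(bundle, manifest)
    wheel.write_bytes(b"modified in transit")
    tampered = verify_manifest(bundle, manifest)
print(clean, tampered)
```

In practice the manifest would be written to a file that travels with the bundle and is re-checked on the far side of the air gap.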

  • Customer Escalation Playbook

  • A script for when things go wrong (e.g., "The system is down, and the customer is yelling"). Steps:
    1. Acknowledge ("I see the issue—we’re on it").
    2. Isolate ("Is this happening for all users or just one?").
    3. Triage ("Let’s check the logs—can you send me the last 100 lines?").
    4. Mitigate ("Here’s a workaround while we fix the root cause").
    5. Postmortem ("Here’s what happened, how we fixed it, and how we’ll prevent it").
  • Field Tip: Always record the call (with permission) for compliance.

  • Technical Debt in the Field

  • In the field, tech debt isn’t "we’ll fix it later"—it’s "this will break the mission." Examples:
    • Hardcoded credentials in a script (→ security breach).
    • No retry logic in a data pipeline (→ lost data during network blips).
    • No health checks (→ silent failures).
  • Field Tip: Use /health endpoints (e.g., FastAPI’s @app.get("/health")) and dead man’s switches (e.g., "If this job doesn’t run in 24h, alert the team").
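
The missing retry logic mentioned above is usually a few lines. A minimal sketch of retry with exponential backoff, assuming transient failures surface as `ConnectionError` or `TimeoutError`; `retry` and `flaky_send` are hypothetical names, and the delays are illustrative.

```python
import time


def retry(operation, attempts=4, base_delay=0.5,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(); on a transient error wait base_delay * 2**n, then retry."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * (2 ** attempt))


# Demo: a hypothetical sender that fails twice, then succeeds.
calls = {"count": 0}


def flaky_send():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("network blip")
    return "sent"


delays = []
result = retry(flaky_send, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)
```

Only retry errors you know to be transient; retrying a validation error just delays the failure.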

  • Deployment Constraints

  • Common field restrictions:
    • No root access → Use sudo sparingly where it is granted, or containerize with podman (a daemonless alternative to Docker that can run rootless).
    • No internet → Pre-download everything (e.g., docker pull → docker save → docker load).
    • No persistent storage → Use tmpfs or ephemeral volumes.
    • No outbound traffic → Whitelist IPs, use curl --resolve for testing.
  • Tools: strace, tcpdump, nslookup, dig.

  • Change Control Board (CCB)

  • A group (often IT/security) that must approve changes before deployment. In the field:
    • Submit changes early (e.g., "We’ll need to open port 443—here’s the ticket").
    • Have a fallback plan (e.g., "If CCB rejects this, we’ll use port 8443 instead").
    • Document everything (e.g., "CCB Ticket #12345 approved this change at 14:30").
  • Field Tip: Always carry a printed copy of approvals—digital records can disappear.

  • Field-Ready Documentation

  • Not Confluence or Notion—Markdown files in the repo (e.g., RUNBOOK.md, DEPLOYMENT.md).
  • Must include:
    • Prerequisites (e.g., "Python 3.9+, 4GB RAM").
    • Step-by-step deployment (e.g., "1. git clone, 2. pip install -r requirements.txt, 3. python main.py --config prod.yaml").
    • Troubleshooting (e.g., "If the app crashes, check /var/log/app.log").
  • Tools: mkdocs, pandoc, gitbook.

  • Customer Proxy

  • A single point of contact (POC) who translates between you and the customer. In the field:
    • Never go around them (e.g., don’t email the CEO directly).
    • Use them as a shield (e.g., "Can you ask the team if they’ve seen this error before?").
    • Keep them in the loop (e.g., "Here’s the status update—can you forward to the stakeholders?").

  • Field-Ready Testing

  • Time is short, so spend it on integration and end-to-end (E2E) tests first; add unit tests for the bugs you fix along the way.
  • Examples:
    • Smoke test: curl http://localhost:8080/health → 200 OK.
    • Data validation: python validate_data.py --input customer_data.csv.
    • Load test: ab -n 1000 -c 100 http://localhost:8080/api.
  • Tools: pytest, locust, curl, jq.
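
When curl is not installed on the target host, the same smoke test can be run from Python's standard library. A sketch only: `smoke_test` is a hypothetical helper, and the injectable `opener` parameter exists purely so the function can be exercised without a live server.

```python
import urllib.error
import urllib.request


def smoke_test(url, timeout=5.0, opener=urllib.request.urlopen):
    """GET a health endpoint and return (ok, detail)."""
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError) as exc:
        # Covers refused connections, timeouts, DNS failures, and HTTP errors.
        return False, f"request failed: {exc}"
    if status != 200:
        return False, f"unexpected status {status}"
    return True, body
```

Calling `smoke_test("http://localhost:8080/health")` after a deploy returns `(True, body)` when the service answers 200 and `(False, reason)` otherwise, which is easy to wire into a deploy script's exit code.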


Step-by-Step / Field Process


1. Discovery: Separate the Ask from the Infer

Goal: Understand what the customer thinks they need vs. what they actually need.
Steps:
1. Run a discovery script (Python/Pandas) to validate the "ask."
```python
# Example: check whether the customer's "daily sales report" is even possible
import pandas as pd

df = pd.read_csv("customer_data.csv")
# Per-column count of missing values
print(f"Missing values:\n{df.isnull().sum()}")
# Number of distinct days the data actually covers
print(f"Unique dates: {df['date'].nunique()}")
```
2. Interview stakeholders (use the 5 Whys technique):
- "Why do you need this report?" → "To track sales."
- "Why do you need to track sales?" → "To hit quarterly targets."
- "Why are you missing targets?" → "Because fraudulent transactions are inflating numbers."
- Infer: They don’t need a report—they need anomaly detection.
3. Write a one-pager (Markdown) summarizing:
- The ask (e.g., "Daily sales report").
- The infer (e.g., "Fraud detection model").
- The data sources (e.g., "CSV from ERP system").
- The constraints (e.g., "No cloud APIs, must run on-premise").
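
Before committing to a model, a rough outlier pass can show whether the inferred need holds up in the data. The sketch below uses a modified z-score (median and MAD) from the standard library as a stand-in for a real anomaly detection model; the sales figures are made up and the 3.5 cutoff is a common rule of thumb, not a tuned value.

```python
from statistics import median


def flag_outliers(values, cutoff=3.5):
    """Return the indexes whose modified z-score exceeds cutoff."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    # 0.6745 rescales MAD so the score is comparable to a standard z-score.
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > cutoff]


daily_sales = [1020, 980, 1010, 995, 1005, 4800, 990]  # made-up figures
suspicious = flag_outliers(daily_sales)
print(suspicious)
```

If a pass like this flags nothing on real data, that is worth raising with the customer before the "infer" goes into the one-pager.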

2. Build the Backlog (Customer + Engineering)

Goal: Create a prioritized list of work that balances customer needs and technical reality.
Steps:
1. List all asks (from discovery) in a Jira/GitHub Issues board.
- Label them [CUSTOMER] or [ENG].
- Example:
  - [CUSTOMER] Add fraud detection to sales dashboard (🔥 High priority).
  - [ENG] Upgrade Python 3.7 → 3.11 (⏳ Schedule).
2. Prioritize using the Mission vs. Noise matrix (see Key Terms).
- Do Now: Critical bugs, mission-critical features.
- Schedule: Tech debt, non-urgent features.
- Delegate: Low-impact asks (assign to PM).
- Ignore: Noise (e.g., "Can we add a dark mode?").
3. Break down the top 3 "Do Now" items into field-ready tasks:
- Example for "Add fraud detection":
  - [ENG] Write data validation script (Python).
  - [ENG] Train anomaly detection model (scikit-learn).
  - [CUSTOMER] Integrate model into dashboard (Streamlit).
  - [ENG] Deploy model to on-premise server (Docker + docker save).
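
The labeling and ordering in steps 1 and 2 can be sketched as a small data structure. This assumes each item has already been assigned a quadrant, and that Schedule items rank ahead of Delegate items (the guide does not specify that order); `BacklogItem` and `prioritize` are illustrative names.

```python
from dataclasses import dataclass

# Pick-up order for kept quadrants; "Ignore" items are dropped entirely.
QUADRANT_ORDER = {"Do Now": 0, "Schedule": 1, "Delegate": 2}


@dataclass(frozen=True)
class BacklogItem:
    label: str      # "CUSTOMER" or "ENG"
    title: str
    quadrant: str   # "Do Now", "Schedule", "Delegate", or "Ignore"


def prioritize(items):
    """Drop 'Ignore' items and sort the rest by quadrant.

    sorted() is stable, so items keep their input order within a quadrant.
    """
    kept = [item for item in items if item.quadrant in QUADRANT_ORDER]
    return sorted(kept, key=lambda item: QUADRANT_ORDER[item.quadrant])


backlog = [
    BacklogItem("ENG", "Upgrade Python 3.7 -> 3.11", "Schedule"),
    BacklogItem("CUSTOMER", "Add dark mode", "Ignore"),
    BacklogItem("CUSTOMER", "Add fraud detection to sales dashboard", "Do Now"),
]
ordered = prioritize(backlog)
for item in ordered:
    print(f"[{item.label}] {item.title} ({item.quadrant})")
```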

3. Deploy in the Customer’s Environment

Goal: Get the software running in the customer’s environment without breaking anything.
Steps:
1. Pre-deployment checklist:
- [ ] Dependencies: pip download -r requirements.txt (for air-gapped).
- [ ] Configuration: prod.yaml (not dev.yaml).
- [ ] Permissions: chmod 644 prod.yaml (use 600 if the file holds secrets).
- [ ] Health checks: curl http://localhost:8080/health.
2. Deploy in stages:
- Stage 1 (Dev): Deploy to a customer-provided dev environment (e.g., dev.customer.internal).
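```bash
# Copy the release bundle to the dev host, then unpack it and run the deploy script.
# "user" is a placeholder for the account the customer provides.
scp -i ~/.ssh/customer_key app.tar.gz user@dev.customer.internal:/tmp
ssh -i ~/.ssh/customer_key user@dev.customer.internal "tar -xzf /tmp/app.tar.gz && cd app && ./deploy.sh"
```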

- Stage 2 (Staging): Deploy to a mirror of prod (e.g., staging.customer.internal).
- Stage 3 (Prod): Deploy to prod (e.g., prod.customer.internal).
- Always have a rollback plan (e.g., git checkout v1.0 && ./deploy.sh).
3. Validate in prod:
- Smoke test: curl http://prod.customer.internal:8080/health.
- Data test: python validate_data.py --input prod_data.csv.
- User test: "Can you log in and click around?"
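
Parts of the pre-deployment checklist in step 1 can be automated so nothing is skipped at 0300. A sketch under stated assumptions: the Python floor, the free-disk threshold, and the `preflight` name are illustrative, not customer requirements.

```python
import shutil
import sys
from pathlib import Path


def preflight(config_path, min_python=(3, 9), min_free_gb=1.0, data_dir="."):
    """Return a list of human-readable problems; an empty list means go."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    if not Path(config_path).is_file():
        problems.append(f"config file not found: {config_path}")
    free_gb = shutil.disk_usage(data_dir).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free in {data_dir}")
    return problems
```

Run it at the top of `deploy.sh` (for example via `python -c`) and abort when the list is non-empty, so a missing `prod.yaml` is caught before anything is unpacked.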

4. Handle Escalations (When Things Go Wrong)

Goal: Fix the issue, keep the customer calm, and document everything.
Steps:
1. Acknowledge the issue (even if you don’t know the cause yet):
- "I see the issue; we’re investigating."
2. Isolate the problem:
- Is it user error? (e.g., "Did you enter the correct API key?")
- Is it environment-specific? (e.g., "Does this happen in staging?")
- Is it data-specific? (e.g., "Does it fail on every record or only on certain inputs?")
3. Triage:
- Check logs: tail -n 100 /var/log/app.log.
- Reproduce: curl -v http://localhost:8080/api.
- Mitigate: "Here’s a workaround while we fix the root cause."
4. Fix and deploy:
- Hotfix: git commit -m "fix: null pointer in API" && git push && ./deploy.sh.
- Rollback if needed: git checkout v1.0 && ./deploy.sh.
5. Postmortem:
- Write a Markdown doc (POSTMORTEM.md) with:
  - What happened.
  - Root cause.
  - How it was fixed.
  - How to prevent it in the future.
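
During triage, the last 100 log lines are easier to act on once they are counted rather than read. A rough sketch, assuming each line contains a standard level name (DEBUG through CRITICAL) followed by the message; the sample lines are invented, and the pattern will need adjusting to the customer's real log format.

```python
import re
from collections import Counter

# Assumed shape: "<timestamp> LEVEL message". Adjust for the real format.
LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b\s*:?\s*(.*)")


def summarize(lines, top=3):
    """Count log levels and the most frequent ERROR/CRITICAL messages."""
    levels, errors = Counter(), Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if not match:
            continue
        level, message = match.groups()
        levels[level] += 1
        if level in ("ERROR", "CRITICAL"):
            # Collapse digits so "user 17" and "user 42" group together.
            errors[re.sub(r"\d+", "N", message.strip())] += 1
    return levels, errors.most_common(top)


sample = [
    "2024-01-01 03:00:01 INFO request ok",
    "2024-01-01 03:00:02 ERROR timeout for user 17",
    "2024-01-01 03:00:03 ERROR timeout for user 42",
]
levels, top_errors = summarize(sample)
print(dict(levels), top_errors)
```

The grouped counts also drop straight into the "What happened" section of POSTMORTEM.md.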

5. Maintain the System (Tech Debt & Upgrades)

Goal: Keep the system running without accumulating mission-breaking tech debt.
Steps:
1. Schedule tech debt sprints (e.g., "Every 3rd sprint is for upgrades").
- Example tasks:
  - [ENG] Upgrade Python 3.7 → 3.11.
  - [ENG] Add retry logic to data pipeline.
  - [ENG] Rotate API keys.
2. Automate health checks:
- Add a /health endpoint (e.g., FastAPI):
```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health():
    # Report liveness and the deployed version
    return {"status": "ok", "version": "1.0.0"}
```

- Set up dead man’s switches (e.g., "If this job doesn’t run in 24h, alert the team").
3. Document everything:
- Update RUNBOOK.md with new troubleshooting steps.
- Add deployment notes to DEPLOYMENT.md.
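
A dead man’s switch like the one in step 2 can be as simple as a heartbeat file plus a separate check. A minimal sketch, assuming the job can write to local disk; `record_heartbeat`, `is_stale`, and the file name are hypothetical, and the alerting itself (email, pager, Slack) is left out.

```python
import tempfile
import time
from pathlib import Path


def record_heartbeat(path):
    """Call at the end of each successful job run."""
    Path(path).write_text(str(time.time()))


def is_stale(path, max_age_seconds=24 * 3600, now=None):
    """True if the job never reported, or last reported too long ago."""
    now = time.time() if now is None else now
    try:
        last = float(Path(path).read_text())
    except (FileNotFoundError, ValueError):
        return True  # no readable heartbeat is treated as a failure
    return now - last > max_age_seconds


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    beat = Path(tmp) / "nightly_job.heartbeat"
    never_ran = is_stale(beat)
    record_heartbeat(beat)
    fresh = is_stale(beat)
    overdue = is_stale(beat, now=time.time() + 25 * 3600)
print(never_ran, fresh, overdue)
```

Run the `is_stale` check from a scheduler that is independent of the job (cron, for example); a monitor living inside the job dies with it.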


Common Mistakes


Mistake 1: Treating the Customer’s Ask as the Final Requirement

  • What happens: You build exactly what the customer asked for—only to find out it doesn’t solve their problem.
  • Correction:
  • Always validate the ask with data (e.g., "Does this report even make sense with the current data?").
  • Use the 5 Whys to uncover the real need.
  • Why: The customer often doesn’t know what they need—they know what they think they need.

Mistake 2: Ignoring the Engineering Backlog

  • What happens: You focus only on customer asks and ignore tech debt—until the system crashes during a critical mission.
  • Correction:
  • Split your backlog into [CUSTOMER] and [ENG].
  • Schedule tech debt sprints (e.g., "Every 3rd sprint is for upgrades").
  • Why: In the field, tech debt isn’t "we’ll fix it later"—it’s "this will break the mission."

Mistake 3: Assuming Your Lab Environment Matches the Customer’s

  • What happens: Your code works in your lab but fails in the customer’s environment (e.g., different Python version, missing dependencies, firewall rules).
  • Correction:
  • Test in the customer’s dev environment first.
  • Use the same OS, dependencies, and network conditions.
  • Why: What works in your lab will break behind their firewall.
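
One way to catch mismatches early is to capture the same fingerprint on both hosts and diff them. A sketch using only the standard library; `fingerprint` and `diff_packages` are hypothetical helpers, and the versions in the demo are invented.

```python
import platform
from importlib import metadata


def fingerprint():
    """Collect the facts most likely to differ between lab and customer hosts."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            packages[name.lower()] = dist.version
    return {
        "python": platform.python_version(),
        "os": f"{platform.system()} {platform.release()}",
        "machine": platform.machine(),
        "packages": packages,
    }


def diff_packages(lab, customer):
    """Return {name: (lab_version, customer_version)} where the two disagree."""
    names = set(lab) | set(customer)
    return {n: (lab.get(n), customer.get(n))
            for n in sorted(names) if lab.get(n) != customer.get(n)}


# Demo with invented versions; None means the package is absent on that host.
mismatch = diff_packages(
    {"pandas": "2.1.0", "requests": "2.31.0"},
    {"pandas": "1.3.5"},
)
print(mismatch)
```

Dump `fingerprint()` to JSON on each side and compare the files; firewall and network differences still need a manual check.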

Mistake 4: Not Having a Rollback Plan

  • What happens: You deploy a "small fix" that breaks prod, and now you’re scrambling to revert.
  • Correction:
  • Always have a rollback plan (e.g., git checkout v1.0 && ./deploy.sh).
  • Test the rollback before deploying.
  • Why: In the field, there are no do-overs—you must be able to revert in minutes.

Mistake 5: Not Documenting Deployments

  • What happens: The system breaks, and no one knows how to fix it because the deployment process was "in someone’s head."
  • Correction:
  • Write a DEPLOYMENT.md with step-by-step instructions.
  • Update it after every deployment.
  • Why: In the field, you won’t always be there—someone else must be able to deploy.


FDE Interview / War Story Insights


1. "You’re on site and the customer demands a feature that violates the original scope. How do you respond?"

  • What they’re testing: Can you push back while keeping the customer happy?
  • How to answer:
  • Acknowledge the request: "I understand why this is important."
  • Clarify the impact: "This would require X weeks of work and delay the current timeline."
  • Offer alternatives: "Here’s a workaround that gives you 80% of the value in 20% of the time."
  • Escalate if needed: "Let me check with my team—can we schedule a call to discuss?"
  • Field example: A customer demanded a real-time dashboard, but their data pipeline only updated daily. Instead of building a real-time system (which would take months), we added a "last updated" timestamp to the existing dashboard and set up email alerts for delays.

2. "The system is down, and the customer is yelling. What’s your first step?"

  • What they’re testing: Can you triage under pressure?
  • How to answer:
  • Acknowledge: "I see the issue—we’re on it."
  • Isolate: "Is this happening for all users or just one?"
  • Triage: "Can you send me the last 100 lines of logs?"
  • Mitigate: "Here’s a workaround while we fix the root cause."
  • Field example: During a go-live, the customer’s API started returning 500 errors. We checked the logs, found a null pointer, deployed a hotfix in 30 mins, and added a unit test to prevent it in the future.

3. "How do you prioritize work when everything is ‘urgent’?"

  • What they’re testing: Can you separate signal from noise?
  • How to answer:
  • Use the Mission vs. Noise matrix (see Key Terms).
  • Ask for impact: "What happens if we don’t do this?"
  • Negotiate: "We can do X now or Y later—which is more critical?"
  • Field example: A customer had 10 "urgent" requests. We ranked them by mission impact, did the top 3, and delegated the rest to the PM.

4. "You’re deploying to an air-gapped environment. What’s your first step?"

  • What they’re testing: Do you plan for constraints?
  • How to answer:
  • Pre-download dependencies: pip download -r requirements.txt.
  • Use offline tools: docker save → docker load.
  • Test in a mirror environment: "Can we get a VM that matches prod?"
  • Field example: Deploying to a classified DoD network required burning dependencies to a USB, getting it scanned by IT, and manually installing everything.


Quick Check Questions


1. You’re deploying to an environment where you can’t run standard Docker images due to security restrictions. What’s your first step?

  • Answer: Use rootless containers (e.g., podman) or build a custom image that complies with the customer’s security policies (e.g., no sudo, minimal base image).
  • Why: The Docker daemon runs as root by default, which is often blocked in secure environments.

2. The customer’s "urgent" request will take 2 weeks, but they need it in 2 days. What do you do?

  • Answer: Negotiate scope—offer a minimum viable version (e.g., "We can give you 80% of the value in 2 days, and the rest in 2 weeks").
  • Why: In the field, perfect is the enemy of good—deliver something usable now and iterate later.

3. You’re debugging a production issue, but the customer’s logs are unreadable (e.g., ERROR: Something went wrong). What’s your next step?

  • Answer: Add logging that records what failed and why (e.g., logging.error("Failed to connect to DB: %s", e)), ideally in a structured format such as JSON, and redeploy with debug mode (e.g., DEBUG=True) until the cause is found.
  • Why: Unstructured logs are useless in the field—you need actionable data.
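
Structured logging here can mean one JSON object per line, which stays greppable on a locked-down host and parseable later. A minimal sketch built on the standard `logging` module; `JsonFormatter`, the `context` field, and the host in the demo are assumptions for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        # Keys passed via `extra=` become attributes on the record.
        context = getattr(record, "context", None)
        if context is not None:
            payload["context"] = context
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("pipeline")
log.addHandler(handler)
log.setLevel(logging.INFO)

try:
    raise ConnectionError("db.internal:5432 refused")  # hypothetical host
except ConnectionError:
    # log.exception attaches the traceback; extra carries structured context.
    log.exception("Failed to connect to DB", extra={"context": {"attempt": 1}})
```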


Last-Minute Cram Sheet

  1. Always validate the "ask" with data—use Python/Pandas to check if the customer’s request even makes sense.
  2. Split your backlog into [CUSTOMER] and [ENG]—ignore tech debt at your peril.
  3. Test in the customer’s environment first—what works in your lab will break behind their firewall. ⚠️
  4. Always have a rollback plan: git checkout v1.0 && ./deploy.sh.
  5. Document everything: RUNBOOK.md, DEPLOYMENT.md, POSTMORTEM.md.
  6. Use the Mission vs. Noise matrix to prioritize, with one emoji per quadrant (e.g., 🔥 = Do Now, ⏳ = Schedule, 🗑️ = Ignore).
  7. Air-gapped deployments? Pre-download everything: pip download -r requirements.txt, docker save, apt-mirror.
  8. No root access? Use podman in rootless mode, or sudo sparingly where it is granted.
  9. Escalation playbook: Acknowledge → Isolate → Triage → Mitigate → Postmortem.
  10. Field-ready testing: Smoke tests (curl /health), data validation (python validate_data.py), load tests (ab -n 1000).

Final Field Tip: In the field, your job isn’t to write perfect code; it’s to keep the mission alive. Prioritize ruthlessly, document everything, and always have a rollback plan.


