Fatskills
Practice. Master. Repeat.
Study Guide: Forward Deployed Engineer 101: Infrastructure as Code (Terraform, Ansible, CloudFormation)
Source: https://www.fatskills.com/forward-deployed-engineer-fde/chapter/forward-deployed-engineer-infrastructure-as-code-terraform-ansible-cloudformation


By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.


Infrastructure as Code (Terraform, Ansible, CloudFormation) – Field-Ready Study Guide


What This Is

Infrastructure as Code (IaC) is the practice of defining and managing compute, network, and storage resources through machine-readable definition files kept under version control, instead of manual console clicks or one-off scripts. For a Forward Deployed Engineer (FDE), IaC is non-negotiable. It is how you deploy repeatable, auditable, and secure systems in chaotic environments, for example a classified on-prem cluster for a defense customer, a disaster-response data pipeline in a war zone, or a last-minute hotfix during a go-live escalation.

Example: You’re on-site at a DoD base where the customer’s air-gapped Kubernetes cluster keeps failing. Instead of debugging by hand, you use Terraform to redeploy the entire stack with a single command, then Ansible to patch the nodes. Every change is recorded for the ATO (Authority to Operate) review.


Key Terms & Concepts

  • Terraform (TF): Declarative IaC tool for provisioning cloud/on-prem infrastructure (AWS, Azure, GCP, vSphere). Uses HCL (HashiCorp Configuration Language) and maintains state in a backend (S3, Consul, local file).
  • Ansible: Agentless configuration management tool for software installation, patching, and orchestration. Uses YAML playbooks and inventory files (static or dynamic). Targets need only SSH (or WinRM for Windows) and, on Linux, Python, which makes it a good fit for air-gapped environments. Example: ansible-playbook -i inventory.ini site.yml.
  • CloudFormation (CFN): AWS-native IaC tool using JSON/YAML templates. Stacks are the unit of deployment; Drift Detection catches manual changes (common in classified environments where admins "tweak" things).
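  • Idempotency: Running the same IaC code multiple times produces the same end state. Non-idempotent operations break redeployments. Example: calling useradd through a shell command with no existence check fails on the second run because the user already exists.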
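  • State File (Terraform): A JSON file that maps your configuration to the real-world resources it manages. Never edit it by hand. A corrupted or lost state file makes Terraform lose track of resources, which can lead to duplicates or unintended replacements. Use terraform state commands to modify it safely.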
  • Dynamic Inventory (Ansible): Automatically discovers hosts (e.g., AWS EC2 tags, Kubernetes nodes). Example: ansible -i aws_ec2.yml all -m ping.
  • Immutable Infrastructure: Servers are never modified after deployment; they are replaced. This supports security and compliance: for example, you bake DoD STIG hardening into the image once instead of re-applying it on every host. Use Packer to bake AMIs/images.
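  • GitOps: IaC plus Git workflows, where a controller (e.g., Argo CD, Flux) continuously syncs the environment to what is in the repo. Field use: carry the repo into a classified network via sneakernet, push it to a Git server inside the enclave, and let the controller sync from there.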
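  • Drift: When real infrastructure diverges from IaC definitions. Field fix: run terraform plan -detailed-exitcode (exit code 0 = no changes, 1 = error, 2 = changes pending, which includes drift). terraform plan -refresh-only shows the drift on its own. Review the plan, then either run terraform apply to revert the manual change or update the code to adopt it.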
  • Secrets Management: Never hardcode credentials. Use Vault, AWS Secrets Manager, or Ansible Vault (ansible-vault encrypt_string).
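  • Modules (Terraform): Reusable IaC components (e.g., a "secure VPC" module for DoD IL5 compliance). Field tip: vendor modules (copy them into your repo and pin the version) from trusted sources such as the Terraform Registry or the customer’s internal module registry. DoD’s Iron Bank hosts hardened container images, not Terraform modules.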
  • ATO (Authority to Operate): Government approval to deploy. IaC helps by generating audit trails (e.g., terraform show -json > ato_evidence.json). Treat that export as sensitive, because Terraform state can hold secrets in plain text.
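To make the idempotency definition concrete, here is a minimal Python sketch. The helpers `ensure_line` and `append_line` are hypothetical names; this is not Ansible code. It only mirrors the idea behind Ansible's lineinfile module: the function writes only when the desired line is missing and reports whether anything changed, so a second run is a no-op. The non-idempotent counterpart piles up duplicates instead.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Idempotently make sure `line` is present in the file at `path`.

    Returns True if the file was changed, False if it was already in the
    desired state (the "changed" vs "ok" distinction Ansible reports).
    """
    lines = path.read_text().splitlines() if path.exists() else []
    if line in lines:
        return False  # already converged: touch nothing
    lines.append(line)
    path.write_text("\n".join(lines) + "\n")
    return True


def append_line(path: Path, line: str) -> None:
    """Non-idempotent counterpart: appends again on every run."""
    with path.open("a") as fh:
        fh.write(line + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "sshd_config"
        print(ensure_line(cfg, "PermitRootLogin no"))  # True: line added
        print(ensure_line(cfg, "PermitRootLogin no"))  # False: nothing to do
        append_line(cfg, "PermitRootLogin no")
        append_line(cfg, "PermitRootLogin no")
        print(cfg.read_text().count("PermitRootLogin no"))  # 3: duplicates pile up
```

The boolean return matters as much as the write. It lets a tool report "ok" vs "changed", and an auditor can use that to tell a real change from a re-run.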


Step-by-Step / Field Process

1. Discovery & Requirements Gathering

  • Action: Meet the customer (e.g., a SOC team, disaster-response NGO, or classified program office). Ask:
    • What’s the ask (e.g., "We need a Kubernetes cluster") vs. the underlying need you infer (e.g., "They actually need a hardened, air-gapped cluster with FIPS 140-2 validated crypto")?
    • What’s the deployment target? (Cloud? On-prem? Classified? Air-gapped?)
    • What’s the ATO status? (If none, IaC must generate artifacts for the RMF process.)
  • Tools: Pen/paper, Lucidchart (for network diagrams), Jira (to track requirements).
  • Field example: Customer says, "We need a database." You infer: "They need a PostgreSQL RDS instance with encryption at rest, IAM auth, and a 30-day backup retention policy for HIPAA compliance."

2. Design & IaC Scaffolding

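  • Action:
    • Sketch the architecture (e.g., VPC → subnets → EKS → RDS → S3).
    • Choose tools:
      • Cloud? Terraform (multi-cloud) or CloudFormation (AWS-only).
      • On-prem? Ansible (for config) + Terraform (for VMs/storage).
      • Air-gapped? Terraform with a local provider mirror (terraform providers mirror) and a backend inside the enclave, or self-hosted Terraform Enterprise, which supports air-gapped installs; Ansible with offline package repos.
    • Write a minimal viable template:

```hcl
# main.tf (Terraform)
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # pin so the version matches what your offline mirror holds
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_vpc" "main" {
  cidr_block         = "10.0.0.0/16"
  enable_dns_support = true

  tags = {
    Name = "mission-critical-vpc"
  }
}
```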
  • Field tip: Start with a single resource (e.g., a VPC) and test it before adding complexity.
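Before you add subnets to the VPC template, it helps to plan the CIDR layout up front so tiers never overlap. Here is a minimal sketch using Python's standard ipaddress module. The /24 size, the tier names (public, private, data), and the `plan_subnets` helper are assumptions for illustration, not part of the original template.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, tiers: list[str], new_prefix: int = 24) -> dict[str, str]:
    """Assign one consecutive /new_prefix subnet of the VPC range to each tier."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = vpc.subnets(new_prefix=new_prefix)  # lazily yields child networks in order
    plan = {}
    for tier in tiers:
        try:
            plan[tier] = str(next(subnets))
        except StopIteration:
            raise ValueError(f"{vpc_cidr} has no room left for tier {tier!r}") from None
    return plan


if __name__ == "__main__":
    for tier, cidr in plan_subnets("10.0.0.0/16", ["public", "private", "data"]).items():
        print(f"{tier:8} {cidr}")
```

Inside Terraform itself, the built-in cidrsubnet() function does the same arithmetic. For example, cidrsubnet("10.0.0.0/16", 8, 1) yields 10.0.1.0/24, which keeps the layout in code rather than in a spreadsheet.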

3. Local Testing & Validation

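  • Action:
    • Run terraform init (downloads providers) → terraform plan (dry run).
    • For Ansible: ansible-playbook --check --diff site.yml (dry run with diffs).
    • Validate syntax:
      • Terraform: terraform validate (after terraform init)
      • CloudFormation: aws cloudformation validate-template --template-body file://template.yml
  • Field trap: Never test in production. Use a sandbox account or LocalStack (a local emulator of AWS APIs).
  • Example command:

```bash
# Test Terraform locally against LocalStack (assumes Docker and the LocalStack CLI are installed)
localstack start -d                  # run LocalStack in the background (edge port 4566)
export AWS_ACCESS_KEY_ID=test        # dummy credentials for LocalStack
export AWS_SECRET_ACCESS_KEY=test
# Plain `terraform` would still call the real AWS endpoints. The tflocal wrapper
# (pip install terraform-local) points the AWS provider at LocalStack for you.
tflocal init
tflocal apply -auto-approve
```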
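A dry run only protects you if someone reads it. Here is a minimal sketch of a pre-apply gate. It assumes you saved a plan with terraform plan -out=tfplan and converted it with terraform show -json tfplan > plan.json. The script reads the plan's resource_changes list and refuses to continue if any resource would be deleted, which includes replacements. The function names and the "block on any delete" policy are illustrative choices.

```python
import json
import sys
from collections import Counter


def summarize(plan: dict) -> Counter:
    """Count individual planned actions; a replacement counts once as delete and once as create."""
    counts: Counter = Counter()
    for rc in plan.get("resource_changes", []):
        counts.update(rc["change"]["actions"])
    return counts


def destructive(plan: dict) -> list[str]:
    """Addresses whose planned actions include "delete" (plain destroys and replacements)."""
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if "delete" in rc["change"]["actions"]
    ]


def gate(plan: dict) -> int:
    """Print a summary and return a shell-style status: 1 if anything would be deleted."""
    print("planned actions:", dict(summarize(plan)))
    doomed = destructive(plan)
    if doomed:
        print("refusing to continue; plan deletes:", ", ".join(doomed))
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        sys.exit(gate(json.load(fh)))
```

In a pipeline this could sit between the two Terraform steps: python plan_gate.py plan.json && terraform apply tfplan (the script name is hypothetical). Blocking on every delete is deliberately conservative; loosen it once the team expects certain replacements.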

4. Deployment & Iteration

  • Action:
    • First deploy: terraform apply (or ansible-playbook site.yml).
    • Monitor: terraform show (current state) or aws cloudformation describe-stack-events --stack-name my-stack.
    • Debug:
      • Terraform: terraform state list → terraform state show <resource>.
      • Ansible: ansible <host> -i inventory.ini -m debug -a "var=hostvars[inventory_hostname]".
  • Field example: Customer’s on-prem vSphere cluster fails to provision VMs. You SSH into the ESXi host, check /var/log/vmkernel.log, and find a misconfigured datastore. Update the Terraform vsphere_virtual_machine resource and redeploy.
  • Pro tip: Use Terraform workspaces and separate Ansible inventories (or inventory groups) to manage environments (dev/staging/prod).
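For the environment separation in the pro tip, one Ansible option is a small executable inventory script. Ansible runs such a script with --list and expects JSON groups plus a _meta.hostvars section; supplying hostvars there saves one --host call per machine. Here is a minimal sketch. The HOSTS table, host names, addresses, group name, and the IAC_ENV environment variable are all hypothetical stand-ins for whatever source of truth exists on site (CMDB export, cloud API, spreadsheet).

```python
#!/usr/bin/env python3
import json
import os
import sys

# Hypothetical source of truth; on site this might come from a CMDB export.
HOSTS = {
    "dev": {"web-dev-1": "10.0.1.10"},
    "prod": {"web-prod-1": "10.1.1.10", "web-prod-2": "10.1.1.11"},
}


def build_inventory(env: str) -> dict:
    """Return inventory JSON in the shape Ansible expects from `--list`."""
    hosts = HOSTS.get(env, {})
    return {
        "webservers": {"hosts": sorted(hosts), "vars": {"deploy_env": env}},
        # Host variables supplied here mean Ansible never needs to call --host.
        "_meta": {"hostvars": {name: {"ansible_host": ip} for name, ip in hosts.items()}},
    }


if __name__ == "__main__":
    env = os.environ.get("IAC_ENV", "dev")
    if "--list" in sys.argv:
        print(json.dumps(build_inventory(env), indent=2))
    elif "--host" in sys.argv:
        print(json.dumps({}))  # everything is already in _meta
```

Usage, assuming the file is saved as inventory.py and made executable: IAC_ENV=prod ansible-playbook -i inventory.py site.yml. The same playbook then targets a different environment without any edits.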

5. Documentation & Handoff

  • Action:
    • Generate docs:
      • Terraform: terraform-docs markdown . > README.md.
      • Ansible: keep a README per role; ansible-doc -t module <module_name> shows module documentation offline, which helps on air-gapped sites.
    • ATO artifacts: Export state (terraform show -json > evidence.json) and logs.
    • Runbook: Write a 1-pager with:
      • How to deploy (terraform apply).
      • How to debug (ansible all -i inventory.ini -m ping).
      • Who to call (customer’s SME, your team’s on-call).
  • Field example: For a classified deployment, write the runbook to approved removable media (e.g., a CD-R) following the site’s media-transfer procedure, and hand it to the ISSO (Information System Security Officer).
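Raw state JSON is large and may contain secrets, so reviewers are often better served by a short summary tied back to the raw export. Here is a minimal sketch that assumes the input came from terraform show -json > evidence.json. It walks values.root_module (recursing into child_modules), lists each managed resource's address and type, and records a SHA-256 digest of the export. The output fields are an illustrative choice, not a format mandated by RMF.

```python
import hashlib
import json
import sys


def iter_resources(module: dict):
    """Yield resources from a state module, recursing into child modules."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from iter_resources(child)


def evidence_summary(raw: bytes) -> dict:
    """Summarize a `terraform show -json` state export for an audit package."""
    state = json.loads(raw)
    root = state.get("values", {}).get("root_module", {})  # absent for empty state
    resources = [
        {"address": r["address"], "type": r["type"]}
        for r in iter_resources(root)
        if r.get("mode") == "managed"  # skip data sources
    ]
    return {
        "terraform_version": state.get("terraform_version"),
        "sha256_of_export": hashlib.sha256(raw).hexdigest(),
        "resource_count": len(resources),
        "resources": sorted(resources, key=lambda r: r["address"]),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        print(json.dumps(evidence_summary(fh.read()), indent=2))
```

The digest only ties the summary to one specific export. Keep the raw file, and protect it as sensitive, so the hash can be re-checked later.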


Common Mistakes

| Mistake | Correction | Why |
|---|---|---|
| Hardcoding secrets (e.g., password = "admin123" in Terraform). | Use HashiCorp Vault, AWS Secrets Manager, or Ansible Vault, and pass values in through variables or data sources. | Secrets in Git = security incident. Classified environments will fail ATO. |
| Ignoring state file backups (e.g., storing Terraform state locally). | Use a remote backend with locking and versioning (S3 with bucket versioning + DynamoDB lock, Consul, Terraform Cloud). | Local state = single point of failure. Lost state = manual rebuild. |
| Not testing in the customer’s environment (e.g., assuming AWS GovCloud = commercial AWS). | Always test in the exact target environment, and run terraform plan against it before applying. | GovCloud has different endpoints, IAM policies, ARN partition (aws-us-gov), service availability, and compliance rules. |
| Over-engineering (e.g., writing a 1,000-line Terraform module for a 1-week project). | Start small, iterate. Reuse modules from trusted sources (e.g., the Terraform Registry or the customer’s internal module registry). | Time is limited in the field. Perfect is the enemy of shipped. |
| Assuming idempotency (e.g., running yum install through the shell or command module). | Use idempotent modules with an explicit state (e.g., ansible.builtin.package with state: present). If you must shell out, guard the task with creates/removes or changed_when. | Non-idempotent tasks report false changes or fail on re-runs. That breaks redeploys and muddies the change record an ATO review relies on. |
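To catch the first mistake in the table before it reaches Git, a crude pre-commit check can flag string literals assigned to suspicious attribute names in .tf files. Here is a minimal sketch. The regex and the list of names are heuristic assumptions: they will miss many secrets and flag some false positives, so treat this as a tripwire, not a replacement for a dedicated secret scanner. References such as var.db_password and interpolations like "${var.token}" are deliberately skipped, since that is how secrets should arrive.

```python
import re
import sys
from pathlib import Path

# Heuristic list of attribute names that usually carry credentials (assumption; extend it).
SUSPICIOUS = re.compile(
    r'^[ \t]*(\w*(?:password|secret|token|access_key)\w*)[ \t]*=[ \t]*"([^"\n]*)"',
    re.IGNORECASE | re.MULTILINE,
)


def find_hardcoded_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, attribute) for each string literal assigned to a suspicious name."""
    hits = []
    for m in SUSPICIOUS.finditer(text):
        value = m.group(2)
        if not value or value.startswith("${"):
            continue  # empty, or an interpolation such as "${var.db_password}"
        line_no = text.count("\n", 0, m.start()) + 1
        hits.append((line_no, m.group(1)))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    failed = False
    for name in sys.argv[1:]:
        for line_no, attr in find_hardcoded_secrets(Path(name).read_text()):
            print(f"{name}:{line_no}: string literal assigned to {attr}")
            failed = True
    sys.exit(1 if failed else 0)
```

One way to wire it up, with a hypothetical script name: python check_secrets.py $(git diff --cached --name-only --diff-filter=ACM -- '*.tf'). A non-zero exit status then stops the commit from a pre-commit hook.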


FDE Interview / War Story Insights

1. The "We Need This Now" Escalation

  • Scenario: You’re on-site for a go-live. The customer’s CIO demands a new feature (e.g., a VPN gateway) that wasn’t in the original scope. The ATO is due in 24 hours.
  • How to respond:
    • Acknowledge: "I understand the urgency. Let’s assess the impact."
    • Scope: "This will require a new Terraform module and an ATO update. Here’s the timeline."
    • Mitigate: "If the ISSO signs off, we can deploy a temporary solution (e.g., OpenVPN on an EC2 instance) while we build the permanent fix."
    • Document: "I’ll update the runbook and notify the ISSO."
  • Interviewer’s probe: "How do you balance speed and compliance?"

2. The Air-Gapped Deployment

  • Scenario: You’re deploying to a classified network with no internet access. The customer’s admins insist on manual installs.
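  • Field approach:
    • Prep: Build a local mirror of dependencies (e.g., yum repo, Docker images, Terraform providers via terraform providers mirror).
    • IaC: Use Ansible with offline repos. Point Terraform at the local provider mirror (a filesystem_mirror in the CLI configuration) and keep its backend inside the enclave.
    • Transfer: Write to approved media (e.g., a DVD-R) or use a data diode (one-way transfer), following the site’s transfer procedure.
    • Validate: ansible -i inventory.ini all -m ping (confirms SSH login and a usable Python on each host; it is not an ICMP ping).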
  • Interviewer’s probe: "How do you handle dependency management in an air-gapped environment?"
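Whatever the transfer medium, verify that the mirrored bundle arrived intact before anyone installs from it. Here is a minimal sketch that writes a SHA-256 manifest on the connected side and checks it inside the enclave. The manifest layout (one "digest, two spaces, relative path" line per file, similar to sha256sum output), the SHA256SUMS file name, and the function names are illustrative assumptions. The site's transfer procedure may prescribe its own hashing or scanning steps.

```python
import hashlib
from pathlib import Path

MANIFEST = "SHA256SUMS"


def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 so large images never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # 1 MiB at a time
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path) -> Path:
    """Hash every file under root (except the manifest itself) into root/SHA256SUMS."""
    entries = [
        f"{sha256_file(p)}  {p.relative_to(root).as_posix()}"
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.name != MANIFEST
    ]
    out = root / MANIFEST
    out.write_text("".join(e + "\n" for e in entries))
    return out


def verify_manifest(root: Path) -> list[str]:
    """Return relative paths that are missing or whose contents changed since hashing."""
    problems = []
    for entry in (root / MANIFEST).read_text().splitlines():
        if not entry.strip():
            continue
        expected, rel = entry.split("  ", 1)
        target = root / rel
        if not target.is_file() or sha256_file(target) != expected:
            problems.append(rel)
    return problems
```

An empty list from verify_manifest means every listed file matched. The sketch does not flag extra files added after hashing. A manifest also proves integrity only if the manifest itself is protected or checked through a trusted channel.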

3. The Drift Disaster

  • Scenario: The customer’s team manually "fixed" a misconfigured security group. Now terraform plan wants to revert their change, and nobody is sure whether the code or the console holds the right rules.
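  • Field fix:
    • Detect drift: terraform plan -detailed-exitcode (exit code 2 = changes pending, which includes drift), or terraform plan -refresh-only to see only the out-of-band changes.
    • Import, only if needed: if the team created a new resource by hand, bring it under management with terraform import (e.g., terraform import aws_security_group.bad_sg sg-12345678). A resource Terraform already manages is already in state, so importing it again fails.
    • Reconcile: Decide which version is correct. Either update the Terraform config to match the real state, or keep the config and let the next apply revert the manual change.
    • Apply: terraform apply, after reviewing the plan.
  • Interviewer’s probe: "How do you handle manual changes in an IaC-managed environment?"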
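The -detailed-exitcode check from the drift steps is easy to wire into a scheduled job. Here is a minimal sketch that assumes terraform is on PATH and the working directory has already been initialized. It maps the documented exit codes (0 = no changes, 1 = error, 2 = changes present) to a status string. Remember that 2 only means "the plan is not empty": drift and unapplied code changes look the same here. The function names are hypothetical.

```python
import subprocess

PLAN_STATUS = {0: "in-sync", 1: "error", 2: "changes-pending"}


def interpret_exit_code(code: int) -> str:
    """Map `terraform plan -detailed-exitcode` return codes to a status string."""
    return PLAN_STATUS.get(code, "unknown")


def check_drift(workdir: str) -> str:
    """Run a non-interactive plan in workdir and classify the result."""
    cmd = ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"]
    result = subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)
    status = interpret_exit_code(result.returncode)
    if status == "error":
        print(result.stderr)  # surface the failure instead of treating it as drift
    return status
```

A nightly job could call check_drift on each environment directory and alert on "changes-pending". A human then reads the plan before anything is applied.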


Quick Check Questions

1. You’re deploying to an environment where you can’t run standard Docker images due to security restrictions. What’s your first step?

  • Answer: Check if the customer has an approved container registry (e.g., DoD’s Iron Bank, AWS ECR with image scanning). If not, build a custom image with their approved base OS (e.g., RHEL with STIGs applied) and push it to their local registry.
  • Why: Security restrictions often block public registries (Docker Hub). Always validate the customer’s container policy first.
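Checking the customer's container policy can start with a simple question: which registry does each image reference actually point at? Here is a minimal sketch that applies the Docker reference convention. The first path component is treated as a registry host only if it contains "." or ":" or is "localhost"; otherwise the image resolves to Docker Hub. The allow-list host names are hypothetical placeholders for whatever registries the customer approves.

```python
ALLOWED_REGISTRIES = {"registry.mission.example", "localhost:5000"}  # hypothetical


def registry_of(image_ref: str) -> str:
    """Return the registry host an image reference resolves to."""
    first, sep, _ = image_ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"  # bare names like "nginx:1.25" or "library/nginx" come from Docker Hub


def is_approved(image_ref: str, allowed: set[str] = ALLOWED_REGISTRIES) -> bool:
    """True only if the image would be pulled from an approved registry."""
    return registry_of(image_ref) in allowed


if __name__ == "__main__":
    for ref in ["nginx:latest", "registry.mission.example/base/rhel9:1"]:
        print(ref, "->", "approved" if is_approved(ref) else "BLOCKED")
```

Run this over every image in the manifests or Helm values before deployment. Any BLOCKED line tells you which images must be rebuilt on the approved base and pushed to the local registry.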

2. A customer’s Terraform deployment fails with Error: Provider configuration not present. What’s the most likely cause?

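  • Answer: Resources recorded in state still point at a provider configuration that has been removed from the code. Typically this is an aliased provider, or a provider block that lived inside a module you deleted. Restore that provider block (or pass it back into the module) until those resources are destroyed or moved. Alternatively, drop them from state with terraform state rm if something else now manages them.
  • Why: Terraform needs each resource’s original provider configuration to refresh or destroy it. terraform init does not fix this: the plugin is installed, but the configuration is gone. A wrong region or missing credentials (e.g., omitting region = "us-gov-west-1" for GovCloud) shows up as API or authentication errors instead.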

3. You’re using Ansible to patch 100 servers in a classified network, but the playbook fails on 5 hosts. What’s your next move?

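  • Answer: Retry only the failed hosts. Retry files are off by default in current Ansible, so enable them first (retry_files_enabled = True under [defaults] in ansible.cfg). The failed run then writes site.retry next to the playbook, and ansible-playbook --limit @site.retry site.yml re-runs just those hosts. Alternatively, pass the failed host names directly to --limit. Then debug each failed host with ansible <failed_host> -i inventory.ini -m ping (connectivity) and -m setup (gathered facts).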
  • Why: Classified networks often have firewall rules or missing dependencies on specific hosts. Always isolate failures before re-running.


Last-Minute Cram Sheet

  • Terraform:
    • terraform init → terraform plan -out=tfplan → terraform apply tfplan.
    • ⚠️ Never edit state by hand; use terraform state mv, terraform state rm, or terraform import.
    • Backend: a backend "s3" block inside terraform { } with bucket = "my-bucket", key = "path/to/state", and region (required, or set via AWS_REGION), one argument per line.
    • Workspaces: terraform workspace new prod (for multi-environment deployments).

  • Ansible:
    • ansible-playbook -i inventory.ini site.yml --limit "webservers" (run on subset).
    • Ansible Vault: ansible-vault encrypt_string "secret" --name "db_password".
    • Dynamic inventory: ansible -i aws_ec2.yml all -m ping.

  • CloudFormation:
    • aws cloudformation deploy --template-file template.yml --stack-name my-stack (add --capabilities CAPABILITY_IAM or CAPABILITY_NAMED_IAM if the template creates IAM resources).
    • Drift detection: aws cloudformation detect-stack-drift --stack-name my-stack starts the check. Then run describe-stack-drift-detection-status --stack-drift-detection-id <id> for the result and describe-stack-resource-drifts --stack-name my-stack for details.
    • Change sets: aws cloudformation create-change-set --stack-name my-stack --change-set-name my-changes --template-body file://template.yml. Review with describe-change-set, then run execute-change-set --stack-name my-stack --change-set-name my-changes.

  • Field Traps:
    • ⚠️ GovCloud ≠ Commercial AWS (different endpoints, IAM policies, ARN partition, compliance).
    • ⚠️ Air-gapped = no internet (mirror all dependencies locally).
    • ⚠️ ATO requires audit trails (export Terraform state as JSON, and protect it: state can contain secrets).
    • ⚠️ Idempotency is non-negotiable (use idempotent modules with an explicit state, e.g., state: present in Ansible).

  • Ports & Protocols:
    • SSH: 22 (Ansible, Terraform remote-exec provisioner).
    • HTTPS: 443 (Terraform registry and provider downloads, AWS APIs).
    • WinRM: 5986 over HTTPS, 5985 over HTTP (Ansible for Windows hosts).
    • Consul: 8500 (HTTP API; Terraform backend).

  • Acronyms:
    • ATO: Authority to Operate (government approval).
    • RMF: Risk Management Framework (the NIST-based security authorization process DoD uses).
    • STIG: Security Technical Implementation Guide (DoD hardening standards).
    • IAM: Identity and Access Management (AWS/GCP permissions).
    • AO: Authorizing Official (the senior official who grants the ATO).
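The CloudFormation detect-stack-drift command only starts an asynchronous check; you then poll for the result. Here is a minimal sketch of that loop. It is written against a boto3 CloudFormation client passed in by the caller (e.g., boto3.client("cloudformation")) and uses its detect_stack_drift and describe_stack_drift_detection_status calls. The poll interval and timeout defaults are arbitrary assumptions.

```python
import time


def stack_drift_status(cfn, stack_name: str, poll_seconds: float = 5.0, timeout: float = 300.0) -> str:
    """Start drift detection for a stack and wait for the result.

    `cfn` is a boto3 CloudFormation client, or anything exposing the same two methods.
    Returns the StackDriftStatus, e.g. "IN_SYNC" or "DRIFTED".
    """
    detection_id = cfn.detect_stack_drift(StackName=stack_name)["StackDriftDetectionId"]
    deadline = time.monotonic() + timeout
    while True:
        resp = cfn.describe_stack_drift_detection_status(StackDriftDetectionId=detection_id)
        status = resp["DetectionStatus"]
        if status == "DETECTION_COMPLETE":
            return resp["StackDriftStatus"]
        if status == "DETECTION_FAILED":
            raise RuntimeError(resp.get("DetectionStatusReason", "drift detection failed"))
        if time.monotonic() > deadline:
            raise TimeoutError(f"drift detection for {stack_name} is still running")
        time.sleep(poll_seconds)
```

When the result is "DRIFTED", describe_stack_resource_drifts(StackName=...) on the same client lists which resources changed and how. Passing the client in, rather than creating it inside the function, also makes the loop easy to test with a fake.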

