Fatskills
Study Guide: Terraform Refresh and Drift Detection - Zero-Fluff, Hands-On Guide
Source: https://www.fatskills.com/cloud-application-developer/chapter/tech-terraform-refresh-drift-detection-zero-fluff-hands-on-guide


By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.



For engineers who need to keep infrastructure in sync with reality—before it breaks in production.


1. What This Is & Why It Matters

Refresh in Terraform means syncing your state file (terraform.tfstate) with the actual cloud resources. Drift detection is the process of identifying when real-world infrastructure diverges from what Terraform expects: the values recorded in state and, ultimately, the configuration in your code.

Why This Matters in Production

  • Scenario 1: You inherit a legacy Terraform repo. Someone manually changed an EC2 instance type in the AWS Console. Your next terraform apply proposes changing it back to the type in your code, and if that plan is auto-approved or only skimmed, the manual change is silently overwritten.
  • Scenario 2: A security team updates a Security Group rule via AWS CLI. Your Terraform code still says "allow port 80," but the real rule now allows port 8080. Drift = security risk.
  • Scenario 3: You’re on-call. A terraform plan shows no changes, but the app is broken. Drift is the silent killer of IaC reliability.

Superpower: Refresh and drift detection let you:
  • Catch manual changes before they cause outages.
  • Automate compliance (e.g., "No public S3 buckets allowed").
  • Avoid "works on my machine" disasters when deploying to prod.


2. Core Concepts & Components

1. terraform.tfstate (State File)

  • Definition: A JSON file tracking the last known state of your infrastructure.
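  • Production Insight: If this file is lost, corrupted, or written by two runs at once, Terraform loses track of what it manages and can't reliably detect drift. Store it remotely with versioning enabled (e.g., an S3 bucket) so you can recover earlier versions, and enable locking (e.g., DynamoDB) to prevent concurrent writes.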

2. terraform refresh

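  • Definition: Updates the state file to match real-world resources without modifying infrastructure.
  • Production Insight: terraform plan and terraform apply already perform this refresh by default, so a separate refresh is rarely needed. The standalone command is deprecated in current Terraform versions because it writes to state without showing you what changed (it behaves like terraform apply -refresh-only -auto-approve). Prefer terraform apply -refresh-only, which displays the detected changes and asks for approval before updating state.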

3. terraform plan -refresh-only (Terraform 0.15.4+)

  • Definition: Shows what would change if you ran terraform refresh, without actually updating the state.
  • Production Insight: Use this in CI/CD to audit drift before applying changes.

4. Drift

  • Definition: When real-world infrastructure differs from Terraform’s state.
  • Production Insight: Drift is inevitable in shared environments. Automate detection (e.g., nightly terraform plan -refresh-only).

5. terraform state Commands

  • Definition: Subcommands to manually inspect/modify the state file (e.g., terraform state list, terraform state show).
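  • Production Insight: Use terraform state rm to stop tracking a resource without destroying it (e.g., when handing it over to another configuration). You don't need it after deleting a resource manually: the next refresh notices the object is gone and drops it from state.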
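On Terraform 1.7 or later, removing a resource from state without destroying it can also be written declaratively with a removed block, so the change goes through code review and plan like any other. A minimal sketch, where aws_instance.legacy is a hypothetical resource address whose resource block you have already deleted from the configuration:

```hcl
# Requires Terraform >= 1.7. Delete the original resource "aws_instance" "legacy"
# block, then add this block in its place.
removed {
  from = aws_instance.legacy # hypothetical address of the resource to forget

  lifecycle {
    destroy = false # remove from state only; the EC2 instance keeps running
  }
}
```

Once a terraform apply has recorded the removal, the block itself can be deleted.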

6. Remote State (S3, Terraform Cloud, etc.)

  • Definition: Storing state remotely (not locally) to enable team collaboration.
  • Production Insight: Always enable state locking (e.g., DynamoDB for S3) to prevent race conditions.
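A minimal sketch of an S3 backend with encryption and DynamoDB locking. The bucket name, key, and table name are hypothetical, and both the bucket and the table must already exist before terraform init, because the backend can't create them:

```hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"     # hypothetical, pre-existing S3 bucket
    key            = "prod/terraform.tfstate" # path of the state object inside the bucket
    region         = "us-east-1"
    encrypt        = true                     # server-side encryption for the state object
    dynamodb_table = "terraform-locks"        # hypothetical table with a "LockID" string key
  }
}
```

Run terraform init after adding or changing this block; Terraform offers to migrate any existing local state into the bucket.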

7. ignore_changes (Lifecycle Meta-Argument)

  • Definition: Tells Terraform not to propose updates for specific attributes when their real-world values differ from the configuration.
  • Production Insight: Use sparingly! Example: ignoring tags if they're managed externally:
      lifecycle {
        ignore_changes = [tags]
      }
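ignore_changes is easiest to justify when another system legitimately owns an attribute. A common case is an Auto Scaling group whose desired_capacity is adjusted at runtime by scaling policies. A sketch, in which var.private_subnet_ids and aws_launch_template.web are hypothetical and assumed to be defined elsewhere in the configuration:

```hcl
resource "aws_autoscaling_group" "web" {
  name                = "web-asg" # hypothetical name
  min_size            = 2
  max_size            = 10
  desired_capacity    = 2                      # initial value only
  vpc_zone_identifier = var.private_subnet_ids # hypothetical variable

  launch_template {
    id      = aws_launch_template.web.id # hypothetical launch template
    version = "$Latest"
  }

  lifecycle {
    # Scaling policies change desired_capacity at runtime; without this,
    # every apply would try to reset it to 2.
    ignore_changes = [desired_capacity]
  }
}
```

Only desired_capacity is ignored; drift on min_size, max_size, or the launch template still shows up in plans.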

8. terraform import

  • Definition: Brings existing resources under Terraform management.
  • Production Insight: Critical for brownfield migrations (e.g., "We have 100 EC2 instances not in Terraform").

3. Step-by-Step Hands-On: Detect & Fix Drift

Prerequisites

  • AWS account with admin IAM permissions.
  • Terraform installed (>= 1.0.0).
  • An existing EC2 instance (not managed by Terraform).
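To make the version prerequisites explicit, you can pin them in a terraform block alongside main.tf. A sketch; the AWS provider constraint is an assumption, so adjust it to the version you actually use:

```hcl
# versions.tf (hypothetical file name; any .tf file in the directory works)
terraform {
  required_version = ">= 1.0.0" # matches the prerequisite above; also covers -refresh-only (0.15.4+)

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0" # assumed; pin to the major version you test against
    }
  }
}
```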

Goal

Detect drift on an EC2 instance that was manually modified in the AWS Console.


Step 1: Create a Terraform Config for the EC2 Instance

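# main.tf
provider "aws" {
  region = "us-east-1" # Use the region your existing instance runs in
}

resource "aws_instance" "example" {
  # These values must match the instance you import in Step 2.
  # A different ami forces Terraform to replace the instance.
  ami           = "ami-0c55b159cbfafe1f0" # Placeholder: use your instance's actual AMI ID
  instance_type = "t2.micro"
  tags = {
    Name = "terraform-example"
  }
}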

Step 2: Import the Existing EC2 Instance

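  1. Find the instance ID in the AWS Console (e.g., i-1234567890abcdef0).
  2. Run:
       terraform import aws_instance.example i-1234567890abcdef0
  3. Verify the state:
       terraform state show aws_instance.example
  4. Run terraform plan and confirm it reports no changes. If it proposes changes, edit main.tf until it matches the imported instance; otherwise the drift you introduce next gets mixed up with config mismatches.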
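On Terraform 1.5 or later, the import in step 2 can also be declared in configuration, which lets you preview it with terraform plan before anything is written to state. A sketch using the example instance ID from above:

```hcl
# Requires Terraform >= 1.5. Add alongside main.tf, run terraform plan to
# preview the import, then terraform apply to record it in state.
import {
  to = aws_instance.example
  id = "i-1234567890abcdef0" # example ID; use your instance's real ID
}
```

If the resource block doesn't exist yet, terraform plan -generate-config-out=generated.tf can draft one for you (the option was marked experimental when introduced); review the generated file before relying on it.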

Step 3: Manually Introduce Drift

  1. Go to AWS Console > EC2 > Instances.
  2. Stop the instance (EC2 only lets you change the instance type while it is stopped), change the type from t2.micro to t2.small, then start it again.
  3. Add a tag: Environment = "staging".

Step 4: Detect Drift

Run a refresh-only plan:

terraform plan -refresh-only

Expected Output (abridged; exact wording varies by Terraform and AWS provider version):

aws_instance.example: Refreshing state... [id=i-1234567890abcdef0]

Terraform detected the following changes made outside of Terraform:

  # aws_instance.example has been changed
  ~ resource "aws_instance" "example" {
        id               = "i-1234567890abcdef0"
      ~ instance_type    = "t2.micro" -> "t2.small"
      ~ tags             = {
          + "Environment" = "staging"
            "Name"        = "terraform-example"
        }
        # (other unchanged attributes)
    }

This is a refresh-only plan, so Terraform will not take any actions to undo these. If you were expecting these changes then you can apply this plan to record the updated values in the Terraform state without changing any remote objects.

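Step 5: Fix Drift (Option 1: Revert to the Terraform Configuration)

terraform apply

Terraform plans against main.tf, so it will change the instance type back to t2.micro and remove the Environment tag. Review the plan before approving: the AWS provider stops and restarts the instance to change its type, so expect brief downtime.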

Step 6: Fix Drift (Option 2: Update Terraform Config)

If the manual change was intentional, update main.tf:

resource "aws_instance" "example" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.small" # Updated to match reality
  tags = {
    Name        = "terraform-example"
    Environment = "staging" # Added
  }
}

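Then record the new values in state without touching the instance:

terraform apply -refresh-only

A follow-up terraform plan should now report no changes.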

4. Production-Ready Best Practices

Security

  • State File Protection: Store state in S3 with:
      • Encryption at rest (SSE-S3 or SSE-KMS).
      • Versioning enabled (to recover from corruption).
      • DynamoDB locking (to prevent concurrent writes).
  • Least Privilege: Use IAM roles with minimal permissions for Terraform (e.g., no * in policies).
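The backend can't create its own bucket or lock table, so they are usually provisioned once from a separate, small configuration. A sketch for AWS provider v4 and later (where versioning and encryption are separate resources); the bucket and table names are hypothetical:

```hcl
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-terraform-state" # hypothetical; S3 bucket names are globally unique
}

# Versioning keeps old state versions so a corrupted state can be rolled back.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Encryption at rest; "aws:kms" without a key ID uses the AWS managed S3 key.
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms" # or "AES256" for SSE-S3
    }
  }
}

# Lock table: the S3 backend expects a string partition key named LockID.
resource "aws_dynamodb_table" "tf_locks" {
  name         = "terraform-locks" # hypothetical; must match the lock table named in your S3 backend
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```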

Drift Management

  • Drift Detection Automation: Run terraform plan -refresh-only in CI/CD (e.g., GitHub Actions) to catch unauthorized changes early.
  • Ignore Externally Managed Attributes: Use ignore_changes for attributes like tags if they're managed externally (e.g., by a tagging tool).
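For tags specifically, the AWS provider also offers a provider-wide alternative to per-resource ignore_changes: an ignore_tags block, which makes every AWS resource ignore the listed tag keys. A sketch of settings that belong in your existing provider block; the key and prefix are hypothetical examples of tags an external tool might write:

```hcl
provider "aws" {
  region = "us-east-1"

  ignore_tags {
    keys         = ["LastScannedBy"] # hypothetical tag set by an external scanner
    key_prefixes = ["tagtool:"]      # hypothetical prefix used by a tagging tool
  }
}
```

Unlike ignore_changes = [tags], this still lets Terraform manage the remaining tags on each resource.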

Reliability & Maintainability

  • Tagging: Always tag resources for cost tracking and ownership:
      tags = {
        Owner       = "team-infra"
        Environment = "prod"
        Terraform   = "true"
      }
  • State Backups: Enable S3 versioning and take periodic backups (e.g., aws s3 cp s3://my-bucket/terraform.tfstate ./backups/).
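To apply ownership tags everywhere without repeating them on each resource, the AWS provider (v3.38 and later) supports default_tags. A sketch of settings that belong in your existing provider block, using the same tag values as the Tagging bullet:

```hcl
provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      Owner       = "team-infra"
      Environment = "prod"
      Terraform   = "true"
    }
  }
}
```

Resource-level tags are merged with these, and each resource's tags_all attribute shows the combined set.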

Observability

  • Logging: Enable Terraform debug logs for drift investigations:
      export TF_LOG=DEBUG
      terraform plan -refresh-only
  • Alerting: Set up CloudWatch alarms for terraform plan failures in CI/CD.

5. Common Mistakes & Traps

| Mistake | Symptom | Fix/Prevention |
|---|---|---|
| Not using -refresh-only | terraform plan shows no changes, but infrastructure is broken. | Run terraform plan -refresh-only before apply. |
| Manual changes in prod | Drift causes outages during deployments. | Enforce "Terraform-only" changes via IAM policies (deny Console/CLI write access). |
| State file corruption | terraform plan fails because the state can't be read. | Enable S3 versioning + DynamoDB locking. Test state recovery in staging. |
| Ignoring ignore_changes | Terraform keeps reverting valid manual changes. | Use ignore_changes for attributes managed outside Terraform (e.g., tags). |
| No state backups | An accidental terraform state rm makes Terraform lose track of critical resources (they keep running, unmanaged). | Enable S3 versioning + automate backups (e.g., aws s3 sync). |

6. Exam/Certification Focus

Typical Question Patterns

  1. "What does terraform refresh do?"
      • Trap: "It updates infrastructure to match the state file." (Wrong! It updates the state file to match reality.)
      • Correct: "It syncs the state file with real-world resources."

  2. "How do you detect drift?"
      • Trap: "Run terraform apply." (Wrong! apply acts on drift by reverting it; it doesn't just detect it.)
      • Correct: "Run terraform plan -refresh-only."

  3. "What happens if you delete a resource manually?"
      • Trap: "You must run terraform state rm before Terraform can plan again." (Wrong! The refresh during plan notices the object is gone and drops it from state.)
      • Correct: "As long as the configuration still declares it, the next plan shows the resource as 'to be created' and apply recreates it. Remove the resource block from the configuration if it should stay deleted."

Key Distinctions

| Concept | What It Does | Exam Trap |
|---|---|---|
| terraform refresh | Updates state file to match reality. | Confused with terraform apply (which modifies infrastructure). |
| terraform plan -refresh-only | Shows drift without modifying state. | Confused with terraform plan (which shows both drift and proposed changes). |
| ignore_changes | Tells Terraform not to plan updates for specific attributes that drift from the configuration. | Overused (e.g., ignoring vpc_security_group_ids on an aws_instance can hide security risks). |

7. Hands-On Challenge

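Scenario: You have an S3 bucket (my-company-logs) that was manually configured to enable versioning. Your Terraform code doesn't have versioning enabled. Detect and fix the drift.

Solution (assumes AWS provider v4 or later, where bucket versioning is managed by the separate aws_s3_bucket_versioning resource and the inline versioning block is deprecated):
1. Import the bucket into state:
     terraform import aws_s3_bucket.logs my-company-logs
2. Confirm the real setting. Right after an import, state already matches reality, so terraform plan -refresh-only has nothing to report; the drift is between reality and your configuration. Check the bucket directly:
     aws s3api get-bucket-versioning --bucket my-company-logs
3. Update your Terraform config to match reality, then import the existing versioning configuration as well:
     resource "aws_s3_bucket" "logs" {
       bucket = "my-company-logs"
     }

     resource "aws_s3_bucket_versioning" "logs" {
       bucket = aws_s3_bucket.logs.id
       versioning_configuration {
         status = "Enabled" # Added to match the manual change
       }
     }

     terraform import aws_s3_bucket_versioning.logs my-company-logs
4. Run terraform plan and confirm it reports no changes; run terraform apply only if it shows changes you expect.

Why It Works: terraform import brings the bucket and its versioning configuration under Terraform management, and the follow-up plan confirms that the configuration now matches reality. Declaring versioning in the config ensures future apply commands won't revert the change or lose track of it.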


8. Rapid-Reference Crib Sheet

| Command/Snippet | Purpose | Exam Trap |
|---|---|---|
| terraform plan -refresh-only | Detect drift without modifying state. | Not the same as terraform plan (which shows proposed changes too). |
| terraform refresh | Update state file to match reality. | Doesn't modify infrastructure (unlike apply). |
| terraform state list | List all resources in state. | Doesn't show drift; it lists only what's in state. |
| terraform state show aws_instance.web | Inspect a specific resource's state. | Shows state, not real-world values (use plan -refresh-only for drift). |
| terraform import aws_s3_bucket.logs my-bucket | Bring an existing resource under Terraform management. | Doesn't generate config; you must write it manually. |
| lifecycle { ignore_changes = [tags] } | Ignore drift for specific attributes. | Overuse can hide critical drift (e.g., security groups). |
| terraform state rm aws_instance.old | Remove a resource from state (e.g., after manual deletion). | Doesn't delete the resource; it only removes it from Terraform's tracking. |

9. Where to Go Next

  1. Terraform State Documentation – Official guide to state management.
  2. Terraform Import Tutorial – Hands-on guide to importing resources.
  3. AWS Drift Detection with Terraform – AWS’s approach to drift detection.
  4. Terraform Best Practices (Gruntwork) – Production-grade state management.