By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.
Hyper-practical, zero-fluff guide for engineers, cloud ops, and certification prep
You’re a security engineer at a mid-sized e-commerce company. Last night, your primary database server crashed due to a failed disk. The outage lasted 6 hours, and your team scrambled to restore from backups. Customers couldn’t check out, orders were lost, and your CFO is now demanding answers:
If you can’t answer these before a disaster, you’re flying blind. A Business Impact Analysis (BIA) is your disaster recovery playbook: it forces you to quantify risks, set recovery targets, and justify budgets. Ignore it, and you’ll either:
- Over-spend (e.g., replicating every database to 3 regions when 1 would suffice).
- Under-prepare (e.g., assuming a 1-hour RTO when your restore process takes 8 hours).
This guide will teach you how to:
- Run a BIA (step-by-step, with templates).
- Set realistic RTO/RPO/MTTR (and avoid "aspirational" targets that fail in production).
- Map recovery strategies to business needs (e.g., "Do we need hot standby, or is a daily backup enough?").
- Pass Security+ questions (which love testing RTO vs. RPO traps).
Goal: List what the business can’t live without.
How:
1. Interview stakeholders:
   - "What processes generate revenue?" (e.g., checkout, inventory management).
   - "What’s required for compliance?" (e.g., PCI DSS, GDPR).
   - "What’s legally required?" (e.g., payroll, tax filings).
2. Template:
Pro Tip:
- Start with revenue-generating functions; they’re usually the most critical.
- Avoid "everything is critical": prioritize or you’ll drown in work.
Goal: Identify what breaks if a system fails.
How:
1. For each critical function, list:
   - Hardware (servers, switches, load balancers).
   - Software (databases, APIs, third-party services).
   - People (who can fix it? Are they on call?).
   - Data (where is it stored? Is it backed up?).
2. Example (Order Processing):
Pro Tip:
- Use a dependency map (e.g., Lucidchart) to visualize relationships.
- Flag SPOFs (single points of failure); these are your highest-risk items.
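The SPOF-flagging step above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function names, component names, and instance counts below are all hypothetical, and it assumes a component counts as a SPOF when only one instance backs it.

```python
from collections import defaultdict

# Hypothetical inventory: critical function -> components it depends on.
DEPENDENCIES = {
    "order_processing": ["orders-db", "payment-api", "load-balancer"],
    "inventory": ["orders-db", "warehouse-api"],
    "reporting": ["analytics-db"],
}

# Hypothetical redundancy counts: independent instances backing each component.
INSTANCES = {
    "orders-db": 1,
    "payment-api": 2,
    "load-balancer": 2,
    "warehouse-api": 1,
    "analytics-db": 1,
}


def find_spofs(dependencies, instances):
    """Return {component: [functions affected]} for single-instance components."""
    affected = defaultdict(list)
    for function, components in dependencies.items():
        for component in components:
            # Components missing from the inventory are treated as one instance.
            if instances.get(component, 1) <= 1:
                affected[component].append(function)
    return dict(affected)


def rank_spofs(spofs):
    """Sort SPOFs so those that break the most functions come first."""
    return sorted(spofs.items(), key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    for component, functions in rank_spofs(find_spofs(DEPENDENCIES, INSTANCES)):
        print(f"{component}: breaks {', '.join(functions)}")
```

Ranking by the number of affected functions is only one possible ordering; weighting by dollar impact per function would be a natural refinement.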
Goal: Assign dollar values to downtime.
How:
1. Financial Impact:
   - "How much revenue is lost per hour?" (e.g., $10K/hour for order processing).
   - "What’s the cost of overtime to recover?" (e.g., $200/hour for engineers).
2. Reputational Impact:
   - "Will customers leave?" (e.g., 5% churn if down > 4 hours).
   - "Will we lose contracts?" (e.g., SLA penalties of $50K/day).
3. Legal/Compliance Impact:
   - "Will we be fined?" (e.g., GDPR fines of up to 4% of global annual revenue).
   - "Will we face lawsuits?" (e.g., breach of contract).
4. Template:
Pro Tip:
- Use real numbers; stakeholders care about dollars, not "high/medium/low."
- Prioritize by total impact; this justifies budget for redundancy.
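As a rough worked example, the per-hour figures above can be combined into a single downtime cost. The sketch below is illustrative only: the `ImpactInputs` fields, the engineer headcount, and the choice to prorate the daily SLA penalty by the hour are assumptions, not part of any standard BIA formula.

```python
from dataclasses import dataclass


@dataclass
class ImpactInputs:
    revenue_loss_per_hour: float   # e.g., 10_000 for order processing
    engineers: int                 # hypothetical headcount working the incident
    overtime_rate_per_hour: float  # e.g., 200 per engineer
    sla_penalty_per_day: float     # e.g., 50_000


def downtime_cost(inputs: ImpactInputs, hours_down: float) -> dict:
    """Break an outage of `hours_down` hours into cost components."""
    revenue = inputs.revenue_loss_per_hour * hours_down
    overtime = inputs.engineers * inputs.overtime_rate_per_hour * hours_down
    # Assumption: the SLA penalty is prorated hourly, not charged per started day.
    sla = inputs.sla_penalty_per_day * hours_down / 24
    return {
        "revenue": revenue,
        "overtime": overtime,
        "sla": sla,
        "total": revenue + overtime + sla,
    }


if __name__ == "__main__":
    order_processing = ImpactInputs(10_000, 3, 200, 50_000)
    for item, dollars in downtime_cost(order_processing, hours_down=6).items():
        print(f"{item:>8}: ${dollars:,.0f}")
```

With these inputs, a 6-hour outage comes to $60,000 in lost revenue, $3,600 in overtime, and $12,500 in prorated SLA penalties, for a total of $76,100. Reputational costs such as churn are left out because they need customer-value data the example does not have.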
Goal: Define recovery objectives based on impact.
How:
1. RTO (Recovery Time Objective):
   - "How fast must this recover?"
   - Rule of thumb:
     - Critical systems (e.g., payment processing): 5–30 minutes.
     - Important but not urgent (e.g., reporting): 4–24 hours.
     - Non-critical (e.g., internal wiki): 1–7 days.
2. RPO (Recovery Point Objective):
   - "How much data can we lose?"
   - Rule of thumb:
     - Zero or near-zero RPO: Synchronous replication (e.g., transactional databases).
     - 15–60 min RPO: Asynchronous replication or frequent backups (e.g., hourly backups).
     - 24-hour RPO: Daily backups (e.g., logs, archives).
3. MTTR (Mean Time To Repair):
   - "How long does recovery actually take?"
   - Measure this in drills; don’t guess!
4. Template:
Pro Tip:
- Keep MTTR ≤ RTO: if MTTR > RTO, your plan is broken.
- Test RPO with backups: restore a database and check how much data is lost.
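The MTTR-versus-RTO check lends itself to a small script run after each drill. The following is a minimal sketch under stated assumptions: the `check_targets` helper and its sample inputs are hypothetical, and it treats the backup interval as the worst-case data loss, which only holds for periodic backups with no log shipping in between.

```python
from statistics import mean


def check_targets(name, rto_min, rpo_min, drill_recovery_min, backup_interval_min):
    """Compare measured drill results against RTO/RPO targets (all in minutes)."""
    mttr = mean(drill_recovery_min)
    findings = []
    if mttr > rto_min:
        findings.append(
            f"{name}: MTTR {mttr:.0f} min exceeds RTO {rto_min} min"
        )
    # Assumption: with periodic backups, worst-case loss equals the interval.
    if backup_interval_min > rpo_min:
        findings.append(
            f"{name}: backup interval {backup_interval_min} min exceeds RPO {rpo_min} min"
        )
    return mttr, findings


if __name__ == "__main__":
    # Hypothetical drill data: three manual restores, hourly backups.
    mttr, findings = check_targets(
        "order_processing",
        rto_min=30,
        rpo_min=5,
        drill_recovery_min=[110, 130, 120],
        backup_interval_min=60,
    )
    print(f"Measured MTTR: {mttr:.0f} min")
    for finding in findings or ["All targets met"]:
        print(finding)
```

Using the mean matches the definition of MTTR, but comparing the slowest drill against the RTO is a stricter and often more honest test.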
Goal: Match recovery methods to RTO/RPO.
Pro Tip:
- Start with the cheapest option that meets RTO/RPO.
- Document failover steps; don’t rely on tribal knowledge.
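The "cheapest option that meets RTO/RPO" rule can be expressed as a simple filter-then-minimize selection. The sketch below is only an illustration: the strategy names follow common industry terms, but the RTO, RPO, and cost figures in `CATALOG` are made-up placeholders that should be replaced with values measured in your own environment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    rto_minutes: int    # recovery time this option is assumed to achieve
    rpo_minutes: int    # worst-case data loss this option is assumed to allow
    relative_cost: int  # unitless; higher means more expensive


# Illustrative placeholder numbers only, not vendor or industry figures.
CATALOG = [
    Strategy("backup_and_restore", 24 * 60, 24 * 60, 1),
    Strategy("pilot_light", 4 * 60, 60, 3),
    Strategy("warm_standby", 30, 5, 6),
    Strategy("hot_standby", 5, 0, 10),
]


def cheapest_strategy(rto_minutes: int, rpo_minutes: int) -> Optional[Strategy]:
    """Return the lowest-cost strategy meeting both targets, or None."""
    candidates = [
        s for s in CATALOG
        if s.rto_minutes <= rto_minutes and s.rpo_minutes <= rpo_minutes
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.relative_cost)


if __name__ == "__main__":
    choice = cheapest_strategy(rto_minutes=30, rpo_minutes=5)
    print(choice.name if choice else "No strategy meets these targets")
```

A `None` result is useful in its own right: it signals that the targets are aspirational and either the budget or the RTO/RPO needs to change.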
Goal: Test your plan without breaking production.
How:
1. Scenario: "Your primary database fails at 2 AM. Walk through recovery."
2. Questions to ask:
   - Who gets paged? (On-call rotation?)
   - Where are backups stored? (Are they encrypted?)
   - How long does restore take? (Test it!)
   - What’s the fallback plan if restore fails? (Manual process?)
3. Template:
Pro Tip:
- Run these quarterly; people forget, and systems change.
- Record the exercise; use it to improve documentation.
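One lightweight way to record an exercise is to log a timestamp for each milestone and derive the phase durations afterwards. The sketch below assumes a hypothetical drill log; the event names and times are invented for illustration.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M"

# Hypothetical drill log: (timestamp, milestone) in chronological order.
DRILL_LOG = [
    ("2024-01-15 02:00", "failure_detected"),
    ("2024-01-15 02:12", "on_call_acknowledged"),
    ("2024-01-15 02:40", "restore_started"),
    ("2024-01-15 04:05", "service_restored"),
]


def phase_durations(log):
    """Return [(phase label, minutes)] for each pair of consecutive milestones."""
    parsed = [(datetime.strptime(ts, TIME_FORMAT), event) for ts, event in log]
    durations = []
    for (start, start_event), (end, end_event) in zip(parsed, parsed[1:]):
        minutes = (end - start).total_seconds() / 60
        durations.append((f"{start_event} -> {end_event}", minutes))
    return durations


def total_recovery_minutes(log):
    """Minutes from the first milestone to the last one."""
    first = datetime.strptime(log[0][0], TIME_FORMAT)
    last = datetime.strptime(log[-1][0], TIME_FORMAT)
    return (last - first).total_seconds() / 60


if __name__ == "__main__":
    for phase, minutes in phase_durations(DRILL_LOG):
        print(f"{phase}: {minutes:.0f} min")
    print(f"Total recovery time: {total_recovery_minutes(DRILL_LOG):.0f} min")
```

Breaking the total into phases shows where the time goes: in this invented log, the restore itself takes 85 of the 125 minutes, so that is where automation would pay off first.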
c5.4xlarge
Environment=DR
RTO=15min
Trap: "RTO is about data" (No—it’s about downtime).
"Which recovery strategy has the fastest RTO?"
Trap: "Pilot light" is not a hot site (it’s minimal standby).
"What’s the first step in a BIA?"
Trap: "Assess risks" (that’s risk assessment, not BIA).
"Your RPO is 1 hour. What’s the best backup strategy?"
Trap: "Real-time replication" (overkill for 1-hour RPO).
"What’s the relationship between MTD, RTO, and WRT?"
"Your company’s order processing system has: - RTO = 30 minutes - RPO = 5 minutes - Current MTTR = 2 hours (manual restore from backups).
You’re using AWS RDS PostgreSQL. How do you meet the RTO/RPO targets?"
Why it works:
- Multi-AZ failover typically completes in about 1–2 minutes, which meets the 30-minute RTO target.
- Read replicas replicate asynchronously, so the RPO depends on replica lag; monitor it to keep it under the 5-minute target.
- Testing ensures MTTR stays low.