By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.
As an FDE, you’ll design, deploy, and debug databases in high-stakes environments—think classified networks, disaster response ops, or enterprise migrations where downtime = mission failure. You’re not just writing SQL; you’re modeling data for real-time decision-making, optimizing queries under security constraints, and fixing schema drift during a go-live crisis. Example: You’re on-site at a defense contractor when their analytics dashboard fails during a live demo. The root cause? A NoSQL collection missing an index, and the customer’s air-gapped network blocks your usual monitoring tools. You need to diagnose, patch, and validate in 30 minutes—without internet access.
OLAP (Online Analytical Processing): Complex aggregations, reporting, and analytics (e.g., Redshift, BigQuery, Snowflake). FDEs often bridge both (e.g., syncing OLTP data to OLAP for dashboards).
ACID vs. BASE
BASE (Basically Available, Soft state, Eventual consistency): Trade-offs for scalability (e.g., DynamoDB, Cassandra). Critical for distributed systems in the field (e.g., edge deployments with intermittent connectivity).
Schema-on-Write vs. Schema-on-Read
Schema-on-Read (NoSQL): Flexible schema (e.g., MongoDB, Parquet files). Useful for rapid prototyping or messy data (e.g., log files from a disaster zone).
Indexing Strategies
Composite indexes: Multi-column indexes (e.g., CREATE INDEX idx_name ON users(last_name, first_name)). ⚠️ Over-indexing kills write performance—test in the customer’s environment!
CREATE INDEX idx_name ON users(last_name, first_name)
Partitioning vs. Sharding
PARTITION BY RANGE (YEAR(created_at))
Sharding: Split data across multiple machines (e.g., MongoDB shards). Critical for scaling in distributed systems (e.g., global disaster response teams).
Materialized Views
Pre-computed query results stored as tables (e.g., CREATE MATERIALIZED VIEW mv_daily_metrics AS SELECT ...). Useful for slow OLAP queries, but need refresh strategies (e.g., REFRESH MATERIALIZED VIEW mv_daily_metrics).
CREATE MATERIALIZED VIEW mv_daily_metrics AS SELECT ...
REFRESH MATERIALIZED VIEW mv_daily_metrics
Change Data Capture (CDC)
Track database changes (e.g., Debezium, AWS DMS). Critical for syncing data between systems (e.g., replicating a classified database to an unclassified analytics environment).
Data Modeling Patterns
Document Model: Nested JSON (e.g., MongoDB). Good for hierarchical data (e.g., user profiles with nested preferences).
Security Constraints in the Field
pgcrypto
CREATE POLICY user_policy ON users USING (user_id = current_user_id())
Air-Gapped Deployments: No internet = no cloud-managed databases. Use embedded databases (e.g., SQLite, DuckDB) or offline-first sync (e.g., CouchDB).
Query Optimization Tools
EXPLAIN ANALYZE SELECT * FROM users WHERE id = 1
MongoDB Profiler: db.setProfilingLevel(1, { slowms: 50 }) to log slow queries.
db.setProfilingLevel(1, { slowms: 50 })
Data Pipeline Resilience
INSERT ... ON CONFLICT DO NOTHING
Dead Letter Queues (DLQ): Capture failed records for reprocessing (e.g., AWS SQS DLQ, Kafka dead-letter topics).
Customer Data Gotchas
MM/DD/YYYY
DD-MM-YYYY
JSON Schema
ALTER TABLE ... ADD COLUMN IF NOT EXISTS
Actions:- Ask vs. Infer: Don’t just take the customer’s word for it. Example: - Ask: “We need real-time analytics on user logins.” - Infer: Their “real-time” means “sub-5-minute latency,” and their “user logins” table is actually a CSV dump from a legacy system.- Map the Data Flow: - Draw a diagram of where data comes from (e.g., IoT sensors, legacy mainframe, user uploads) and where it goes (e.g., dashboards, ML models, reports). - Identify bottlenecks (e.g., a single-threaded Python script processing 10M rows nightly).- Check Security Constraints: - Is the environment air-gapped? Are there data residency requirements (e.g., EU data can’t leave Germany)? - What’s the ATO (Authorization to Operate) status? Some databases (e.g., MongoDB) may not be approved for classified networks.
Tools:- Whiteboard + Excalidraw (for air-gapped environments, use a physical whiteboard).- SQLite CLI (sqlite3 :memory:) to prototype schemas quickly.- DBeaver (offline-capable database GUI for air-gapped environments).
sqlite3 :memory:
Actions:- Choose SQL if: - Data is structured, relationships matter (e.g., user orders → products → inventory). - ACID compliance is required (e.g., financial transactions). - Example: PostgreSQL with RLS for a healthcare app.- Choose NoSQL if: - Data is hierarchical (e.g., user profiles with nested preferences). - Schema flexibility is critical (e.g., disaster response logs with unpredictable fields). - High write throughput is needed (e.g., IoT sensor data). - Example: MongoDB for a logistics app tracking shipments with dynamic attributes.- Optimize for Queries: - Denormalize if read performance is critical (e.g., duplicate user_name in an orders table to avoid joins). - Normalize if write performance is critical (e.g., avoid duplicate data in a high-frequency trading system).- Plan for Scale: - Partitioning: Split large tables (e.g., by date for time-series data). - Sharding: Distribute data across machines (e.g., MongoDB shards by region). - Caching: Use Redis for frequent queries (e.g., user sessions).
user_name
orders
region
Example Schema (PostgreSQL):
-- Users table (OLTP) CREATE TABLE users ( id SERIAL PRIMARY KEY, email VARCHAR(255) UNIQUE NOT NULL, created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() ); -- Orders table (OLTP) with partitioning CREATE TABLE orders ( id SERIAL, user_id INTEGER REFERENCES users(id), amount DECIMAL(10, 2), status VARCHAR(20), created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() ) PARTITION BY RANGE (created_at); -- Create monthly partitions CREATE TABLE orders_y2023m01 PARTITION OF orders FOR VALUES FROM ('2023-01-01') TO ('2023-02-01'); -- Materialized view for analytics (OLAP) CREATE MATERIALIZED VIEW mv_daily_revenue AS SELECT DATE(created_at) AS day, SUM(amount) AS revenue, COUNT(*) AS orders FROM orders GROUP BY DATE(created_at);
Actions:- Air-Gapped Deployments: - Use offline installers (e.g., PostgreSQL .rpm files on a USB drive). - Pre-download dependencies (e.g., yum install --downloadonly postgresql). - Use embedded databases (e.g., SQLite, DuckDB) if the customer can’t manage a server.- Security Hardening: - Disable remote root login (e.g., ALTER USER postgres WITH NOLOGIN; in PostgreSQL). - Enable TLS (e.g., ssl = on in postgresql.conf). - Row-Level Security (RLS): Restrict data access (e.g., CREATE POLICY user_policy ON users USING (id = current_user_id())).- Backup & Recovery: - Automated backups (e.g., pg_dump + cron job). - Point-in-Time Recovery (PITR): Enable WAL archiving (e.g., wal_level = replica in PostgreSQL). - Test restores in a staging environment (customers often skip this—you must verify).
.rpm
yum install --downloadonly postgresql
ALTER USER postgres WITH NOLOGIN;
ssl = on
postgresql.conf
CREATE POLICY user_policy ON users USING (id = current_user_id())
pg_dump
wal_level = replica
Example (PostgreSQL Hardening):
# Disable remote root login sudo -u postgres psql -c "ALTER USER postgres WITH NOLOGIN;" # Enable TLS echo "ssl = on" >> /etc/postgresql/14/main/postgresql.conf echo "ssl_cert_file = '/etc/ssl/certs/postgresql.crt'" >> /etc/postgresql/14/main/postgresql.conf echo "ssl_key_file = '/etc/ssl/private/postgresql.key'" >> /etc/postgresql/14/main/postgresql.conf # Restart PostgreSQL sudo systemctl restart postgresql
Actions:- Profile Slow Queries: - Use EXPLAIN ANALYZE (e.g., EXPLAIN ANALYZE SELECT * FROM users WHERE email = '[email protected]'). - Check for full table scans (bad) vs. index usage (good).- Add Indexes (But Test First!): - Example: CREATE INDEX idx_users_email ON users(email); (but test in the customer’s environment—indexes slow down writes).- Optimize Joins: - Avoid SELECT *—fetch only needed columns. - Use covering indexes (indexes that include all columns in the query).- Cache Frequent Queries: - Use Redis for hot data (e.g., user sessions, dashboard metrics). - Example (Python + Redis): python import redis r = redis.Redis(host='localhost', port=6379, db=0) user_data = r.get(f"user:{user_id}") if not user_data: user_data = db.query("SELECT * FROM users WHERE id = %s", user_id) r.setex(f"user:{user_id}", 3600, user_data) # Cache for 1 hour
EXPLAIN ANALYZE
EXPLAIN ANALYZE SELECT * FROM users WHERE email = '[email protected]'
CREATE INDEX idx_users_email ON users(email);
SELECT *
python import redis r = redis.Redis(host='localhost', port=6379, db=0) user_data = r.get(f"user:{user_id}") if not user_data: user_data = db.query("SELECT * FROM users WHERE id = %s", user_id) r.setex(f"user:{user_id}", 3600, user_data) # Cache for 1 hour
Actions:- Idempotency: Ensure reprocessing doesn’t duplicate data. - Example (PostgreSQL): sql INSERT INTO processed_files (file_name, processed_at) VALUES ('data_2023.csv', NOW()) ON CONFLICT (file_name) DO NOTHING; - Dead Letter Queues (DLQ): Capture failed records for manual review. - Example (AWS SQS): ```python import boto3 sqs = boto3.client('sqs') dlq_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-dlq"
sql INSERT INTO processed_files (file_name, processed_at) VALUES ('data_2023.csv', NOW()) ON CONFLICT (file_name) DO NOTHING;
# Move failed messages to DLQ response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10) for message in response.get('Messages', []): try: process_message(message) except Exception as e: sqs.send_message(QueueUrl=dlq_url, MessageBody=message['Body']) sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message['ReceiptHandle']) ```
bash docker run -d --name postgres-exporter \ -e DATA_SOURCE_NAME="postgresql://user:pass@localhost:5432/?sslmode=disable" \ -p 9187:9187 \ prometheuscommunity/postgres-exporter
Actions:- Reproduce the Issue: - SSH into the bastion host → tail logs (tail -f /var/log/postgresql/postgresql-14-main.log). - Reproduce the customer’s query (e.g., psql -U user -d db -c "SELECT * FROM slow_query;").- Quick Fixes: - Add an index (if the query is slow). - Kill a runaway query (e.g., SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE query LIKE '%slow_query%';). - Roll back a schema change (e.g., ALTER TABLE users DROP COLUMN new_column;).- Validate the Fix: - Run the customer’s workflow end-to-end. - Check for side effects (e.g., did the index break another query?).- Document & Handoff: - Write a runbook for the customer (e.g., “How to monitor slow queries”). - Example (Markdown runbook): ```markdown # Monitoring Slow Queries
tail -f /var/log/postgresql/postgresql-14-main.log
psql -U user -d db -c "SELECT * FROM slow_query;"
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE query LIKE '%slow_query%';
ALTER TABLE users DROP COLUMN new_column;
## Steps 1. Log into the database: `psql -U admin -d production` 2. Check active queries: `SELECT pid, query, now() - query_start AS duration FROM pg_stat_activity WHERE state = 'active';` 3. Kill a query: `SELECT pg_terminate_backend(<pid>);` ## Common Fixes - Add an index: `CREATE INDEX idx_name ON table(column);` - Vacuum tables: `VACUUM ANALYZE table;` ```
SELECT COUNT(*) FROM users WHERE email IS NULL;
Scenario: You’re on-site for a go-live, and the customer’s CTO demands a last-minute feature that wasn’t in the original contract. How do you respond? Answer:- Acknowledge the ask: “I understand this is critical for your team.” - Clarify the impact: “Adding this now would delay the go-live by 2 weeks and require re-testing the entire system.” - Offer alternatives: “We can prioritize this in the next sprint, or we can deploy a minimal version today and iterate.” - Escalate if needed: If the customer insists, loop in your project manager to discuss timeline/trade-offs.
Why? FDEs must balance customer satisfaction with mission success—scope creep kills projects.
Scenario: A query runs in 100ms in staging but takes 10s in production. What do you check first? Answer:1. Data volume: SELECT COUNT(*) FROM table; (production may have 100x more data).2. Index differences: SHOW INDEXES FROM table; (staging may have extra indexes).3. Network latency: ping db-server (production may be cross-region).4. Resource contention: top or htop (other processes may be hogging CPU/memory).5. Configuration: SHOW ALL; (production may have different work_mem or shared_buffers).
SELECT COUNT(*) FROM table;
SHOW INDEXES FROM table;
ping db-server
top
htop
SHOW ALL;
work_mem
shared_buffers
Why? Staging ≠ production. Always test in the exact customer environment.
Scenario: Your usual monitoring stack (e.g., Datadog, New Relic) won’t work in the customer’s air-gapped environment. What do you deploy instead? Answer:- Prometheus + Grafana (can run offline, no cloud dependency).- pgBadger (PostgreSQL log analyzer, works offline).- Custom scripts (e.g., Python + cron to scrape metrics and email alerts).
Why? Air-gapped environments require offline-first tools.
Answer: Use offline installers (e.g., PostgreSQL .rpm files on a USB drive) and embedded databases (e.g., SQLite) if the customer can’t manage a server.Explanation: Air-gapped environments require physical media and minimal dependencies.
Answer: Data volume (production has 100x more data) or network latency (cross-region reads).Explanation: Staging ≠ production—always test in the exact customer environment.
Answer: NoSQL (e.g., MongoDB) because the schema is flexible and hierarchical.Explanation: NoSQL handles unpredictable data better than rigid SQL schemas.
5432
3306
27017
REFRESH MATERIALIZED VIEW mv_name;
CREATE POLICY policy_name ON table USING (condition);
Join 4M+ learners. Unlock unlimited quizzes, wrong-answer tracking, flashcards + reminders, study guides, and 1-on-1 challenges.