Study Guide: Forward Deployed Engineer 101: Finance and Anti‑Money Laundering (AML) (Transaction Monitoring, KYC, Risk Scoring)
Source: https://www.fatskills.com/forward-deployed-engineer-fde/chapter/forward-deployed-engineer-finance-and-antimoney-laundering-aml-transaction-monitoring-kyc-risk-scoring


By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.




Finance & Anti-Money Laundering (AML) – Field-Ready Study Guide for Forward Deployed Engineers (FDEs)


What This Is

AML systems detect and prevent financial crimes such as money laundering, fraud, and terrorist financing. As an FDE, you’ll deploy transaction monitoring, KYC (Know Your Customer), and risk-scoring systems in high-stakes environments: often on-premise, in air-gapped networks, or under strict regulatory constraints.

Field example: You’re at a bank’s SOC (Security Operations Center) during a live breach. The customer’s legacy AML system flags 10,000 false positives daily, drowning analysts in noise. You must:

1. Diagnose why the system is over-alerting (e.g., outdated thresholds, poor feature engineering).
2. Hotfix by writing a Python script to recalibrate risk scores in real time.
3. Deploy the fix in a locked-down environment where you can’t use cloud services or standard CI/CD pipelines.
4. Train the SOC team to validate the new alerts before the regulator audits them tomorrow.

This guide gives you the practical, field-tested playbook to execute under pressure.


Key Terms & Concepts

  • Transaction Monitoring: Automated systems that flag suspicious financial activity (e.g., structuring, layering, smurfing). Tools: Actimize, SAS AML, Python (Pandas, Scikit-learn), Spark.
  • KYC (Know Your Customer): Verifying customer identities to prevent fraud. Includes ID checks, watchlist screening (OFAC, PEP lists), and risk scoring. Tools: Trulioo, Jumio, Onfido, custom Python/Flask APIs.
  • Risk Scoring: Assigning a numerical risk level to customers/transactions based on behavior, geography, and other factors. Rule-based (e.g., "transactions >$10K = high risk") vs. ML-based (XGBoost, LightGBM).
  • Suspicious Activity Report (SAR): A mandatory report filed with regulators (e.g., FinCEN) when potential money laundering is detected. ⚠️ False positives waste time; false negatives get you fined.
  • Watchlist Screening: Checking customers/transactions against sanctions lists (OFAC, UN, EU). Tools: Refinitiv World-Check, Dow Jones Risk & Compliance, custom SQL queries.
  • Air-Gapped AML: Deploying AML systems in environments with no internet access (e.g., military banks, classified networks). Requires offline model training, manual dependency management, and sneaker-net updates.
  • Regulatory Sandbox: A controlled environment where regulators allow testing of new AML tech without full compliance penalties. Useful for prototyping but not for production.
  • False Positive Rate (FPR): Share of legitimate transactions flagged as suspicious, i.e. FP / (FP + TN). Goal: <5% FPR while still catching >90% of truly suspicious activity (recall).
  • Feature Store: Centralized repository for AML features (e.g., "transaction velocity," "geographic risk"). Tools: Feast, Tecton, custom PostgreSQL tables.
  • Model Drift: When AML models degrade over time due to changing criminal tactics or economic conditions. Monitor with Evidently AI, Arize, or custom Python scripts.
  • Explainability (XAI): Making AML model decisions interpretable for regulators. Tools: SHAP, LIME, custom rule-based overrides.
  • Deployment Constraints: Common in finance:
    • No cloud (must run on-premise).
    • No Docker (use VMs or bare metal).
    • No root access (deploy as a non-privileged user).
    • Data residency laws (e.g., EU GDPR, China’s PIPL).
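
The FPR and detection targets in the list above are easier to reason about with a worked calculation. The sketch below assumes every alert decision has been compared against an analyst-confirmed label; the function name `alert_metrics` and the sample counts are made up for illustration and do not come from any real deployment.

```python
def alert_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute false positive rate, recall, and precision from confusion counts."""
    fpr = fp / (fp + tn) if (fp + tn) else 0.0        # legitimate activity that was flagged
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # suspicious activity that was caught
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # flagged alerts that were real
    return {"fpr": fpr, "recall": recall, "precision": precision}


# Hypothetical day: 100,000 transactions, 100 of them truly suspicious
metrics = alert_metrics(tp=90, fp=4995, tn=94905, fn=10)
print(metrics)
```

With these illustrative counts the system meets both targets (5% FPR, 90% recall), yet fewer than 2% of the alerts are real hits. That is why it can be worth tracking alert precision alongside FPR when the SOC team complains about noise.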


Step-by-Step / Field Process


1. Discovery: Understand the Customer’s Real Problem (Not Just Their Ask)

Actions:
- Interview stakeholders (compliance officers, SOC analysts, IT admins) to map:
  - What’s the current pain point? (e.g., "We file 50 SARs/day, but 40 are false positives.")
  - What’s the regulatory pressure? (e.g., "FinCEN fined us $2M last quarter for missed SARs.")
  - What’s the tech stack? (e.g., "We use Actimize on-premise, no cloud, no Python 3.9+.")
- Audit the data:

```bash
# Quick data sanity check (run on a bastion host)
head -n 1000 transactions.csv | awk -F, '{print $5}' | sort | uniq -c  # Check transaction amounts (column 5 here)
grep -i "fraud" alerts.log | wc -l  # Count current alerts
```

- Infer the real need: The customer says, "We need a new ML model." You infer: "They actually need better feature engineering to reduce false positives."
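
When pandas is not installed on the bastion host, the same audit can be done with the standard library. This is a minimal sketch, not a full data-quality tool: the function name `audit_csv` and the column names in the sample are assumptions chosen for illustration.

```python
import csv
import io
from collections import Counter


def audit_csv(handle, key_column: str) -> dict:
    """Count rows, empty cells per column, and duplicate keys in a CSV stream."""
    reader = csv.DictReader(handle)
    missing = Counter()
    keys = Counter()
    rows = 0
    for row in reader:
        rows += 1
        for column, value in row.items():
            if column is None:
                continue  # extra cells beyond the header are ignored here
            if value is None or value.strip() == "":
                missing[column] += 1
        keys[row.get(key_column) or ""] += 1
    duplicates = sum(count - 1 for count in keys.values() if count > 1)
    return {"rows": rows, "missing": dict(missing), "duplicate_keys": duplicates}


# Tiny in-memory sample; a real run would pass open("transactions.csv", newline="")
sample = io.StringIO(
    "transaction_id,amount,country\n"
    "t1,120.50,US\n"
    "t2,,GB\n"
    "t2,9800,\n"
)
report = audit_csv(sample, key_column="transaction_id")
print(report)
```

Streaming row by row keeps memory use flat, which matters when the export is too large to load on a constrained host.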

2. Build a Minimal Viable Fix (MVF) – Not a Perfect Solution

Actions:
- Start with rules, not ML:

```python
# Example: Rule-based risk scoring (run in a Jupyter notebook on-site)
def calculate_risk_score(transaction):
    score = 0
    if transaction.amount > 10000:
        score += 50
    if transaction.country in ["IR", "KP", "SY"]:
        score += 100
    if transaction.frequency > 10:
        score += 30
    return score
```

- Validate with the SOC team:
  - Show them 10 flagged transactions and ask: "Would you investigate these?"
  - Adjust thresholds based on their feedback.
- Deploy the MVF:
  - If no CI/CD: scp the script to the server and run it as a cron job.
  - If no Python: Rewrite in SQL (PostgreSQL PL/pgSQL) or Java (for Actimize plugins).
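
To put flagged examples in front of the SOC team, the scoring rule needs a transaction type and an alert cut-off. The sketch below reuses the rule weights from this step; the `Transaction` fields, the meaning of `frequency`, and the cut-off of 80 are assumptions to be replaced with whatever the customer's data and analysts support.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    txn_id: str
    amount: float
    country: str    # ISO 3166-1 alpha-2 code
    frequency: int  # assumed: customer's transaction count in the lookback window


HIGH_RISK_COUNTRIES = {"IR", "KP", "SY"}
ALERT_THRESHOLD = 80  # hypothetical cut-off; tune it with SOC feedback


def calculate_risk_score(transaction: Transaction) -> int:
    score = 0
    if transaction.amount > 10000:
        score += 50
    if transaction.country in HIGH_RISK_COUNTRIES:
        score += 100
    if transaction.frequency > 10:
        score += 30
    return score


# Made-up sample batch for the review session
batch = [
    Transaction("t1", 12000.0, "US", 2),
    Transaction("t2", 500.0, "IR", 1),
    Transaction("t3", 15000.0, "US", 12),
    Transaction("t4", 40.0, "DE", 3),
]
flagged = [t.txn_id for t in batch if calculate_risk_score(t) >= ALERT_THRESHOLD]
print(flagged)
```

Keeping the threshold and country list as named constants makes it easy to show compliance exactly what changed when the SOC team asks for an adjustment.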

3. Deploy in a Constrained Environment

Actions:
- Check the environment:

```bash
# On the target server:
uname -a          # Check OS
df -h             # Check disk space
python --version  # Check Python version
```

- Handle dependencies offline:
  - Download .whl files for Python packages (e.g., pandas, scikit-learn) on a machine with internet, then transfer via USB.
  - Use pip install --no-index --find-links=/path/to/wheels pandas.
- Deploy the model:
  - If no Docker: Use a systemd service or Windows Task Scheduler.
  - If no root: Deploy to ~/app/ and run as the user.

```ini
# Example systemd service (save as /etc/systemd/system/aml-risk-scoring.service)
[Unit]
Description=AML Risk Scoring Service
After=network.target

[Service]
User=amluser
WorkingDirectory=/home/amluser/app
ExecStart=/usr/bin/python3 /home/amluser/app/risk_scoring.py
Restart=always

[Install]
WantedBy=multi-user.target
```

Then (this requires sudo; in a no-root environment, ask the bank’s IT team to install the unit or fall back to a cron job under your own user):

```bash
sudo systemctl daemon-reload
sudo systemctl start aml-risk-scoring
sudo systemctl enable aml-risk-scoring
```
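
For the offline dependency transfer in this step, it can help to confirm that the wheels arriving by USB are the same files that were downloaded. The sketch below is one possible approach using SHA-256 checksums; the helper names `build_manifest` and `verify_manifest` are hypothetical, and the demo uses a placeholder file instead of a real wheel.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large wheels do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(wheel_dir: Path) -> dict:
    """Run on the internet-connected machine: map wheel file name to SHA-256."""
    return {p.name: sha256_of(p) for p in sorted(wheel_dir.glob("*.whl"))}


def verify_manifest(wheel_dir: Path, manifest: dict) -> list:
    """Run on the air-gapped server: return names of missing or altered wheels."""
    problems = []
    for name, expected in manifest.items():
        candidate = wheel_dir / name
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(name)
    return problems


# Demo with a placeholder file standing in for a real wheel
with tempfile.TemporaryDirectory() as tmp:
    wheel_dir = Path(tmp)
    wheel = wheel_dir / "example-1.0-py3-none-any.whl"
    wheel.write_bytes(b"placeholder wheel bytes")
    manifest = build_manifest(wheel_dir)
    clean = verify_manifest(wheel_dir, manifest)
    wheel.write_bytes(b"corrupted in transit")
    damaged = verify_manifest(wheel_dir, manifest)
print(clean, damaged)
```

In practice the manifest would be saved next to the wheels (for example as JSON) and checked before running pip install, so a damaged transfer is caught before it turns into a confusing install error.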

4. Monitor and Iterate

Actions:
- Log everything:

```python
import logging

logging.basicConfig(filename='/var/log/aml/risk_scoring.log', level=logging.INFO)
# txn_id and score come from the scoring loop
logging.info(f"Transaction {txn_id} scored {score}")
```

- Set up alerts for model drift:

```python
# Example: Monitor average risk score over time
# 0.2 is an illustrative tolerance; choose one on the same scale as your scores
if abs(current_avg_score - baseline_avg_score) > 0.2:
    send_alert("Model drift detected!")
```

- Train the SOC team:
  - Walk them through the new alerts: "This is why we flagged this transaction."
  - Give them a runbook for false positives: "If the customer is a known charity, mark as 'false positive' and adjust the rule."
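
Comparing average scores can miss a shift in the shape of the score distribution. One commonly used alternative, swapped in here for illustration and not the check shown in this step, is the Population Stability Index (PSI). The sketch assumes non-empty score samples and hand-picked bin edges; the sample scores are made up.

```python
import math


def population_stability_index(baseline, current, bin_edges):
    """PSI between two score samples, using fixed ascending bin edges."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")

    def bin_fractions(scores):
        counts = [0] * (len(bin_edges) + 1)
        for score in scores:
            # Bin index = number of edges the score has reached or passed
            index = sum(score >= edge for edge in bin_edges)
            counts[index] += 1
        total = len(scores)
        # Small floor avoids log(0) when a bin is empty
        return [max(count / total, 1e-6) for count in counts]

    base = bin_fractions(baseline)
    curr = bin_fractions(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base, curr))


# Made-up scores on the 0/50/100 scale of the rule example
baseline_scores = [0] * 70 + [50] * 20 + [100] * 10
current_scores = [0] * 40 + [50] * 30 + [100] * 30
edges = [50, 100]  # bins: below 50, 50 to 99, 100 and above

psi_same = population_stability_index(baseline_scores, baseline_scores, edges)
psi_shifted = population_stability_index(baseline_scores, current_scores, edges)
print(round(psi_same, 3), round(psi_shifted, 3))
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the alerting cut-off is a judgment call to agree with compliance and the model owner.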

5. Prepare for the Regulator

Actions:
- Document everything:
- Why you chose certain thresholds.
- How you validated the model (e.g., "Tested on 3 months of historical data, FPR = 3%").
- Who approved the changes (get sign-off from compliance).
- Generate reports for auditors:

```sql
-- Example: SAR filing report (run in PostgreSQL)
SELECT
    customer_id,
    transaction_id,
    risk_score,
    reason_for_flag,
    investigator_notes
FROM alerts
WHERE is_sar_filed = TRUE
  AND date >= '2023-01-01';
```

- Mock audit: Have the compliance team grill you like a regulator would.
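
The documentation items in this step (threshold rationale, validation, sign-off) are easier to produce on demand if every change is recorded in a structured form as it happens. The sketch below shows one possible shape for such a record; the `ThresholdChange` fields and every value in the example are placeholders, not a regulatory format.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class ThresholdChange:
    rule_name: str
    old_value: float
    new_value: float
    rationale: str
    validation_summary: str
    approved_by: str
    recorded_at: str = ""


def to_audit_line(change: ThresholdChange) -> str:
    """Serialize one change as a single JSON line, stamping UTC time if unset."""
    record = asdict(change)
    if not record["recorded_at"]:
        record["recorded_at"] = datetime.now(timezone.utc).isoformat()
    return json.dumps(record, sort_keys=True)


# Placeholder values for illustration only
line = to_audit_line(ThresholdChange(
    rule_name="large_amount",
    old_value=10000,
    new_value=12000,
    rationale="SOC review of sampled alerts",
    validation_summary="Backtest on historical data",
    approved_by="compliance-officer-id",
))
print(line)
```

Appending one JSON line per change to a file works in air-gapped environments and can later be loaded into whatever reporting tool the auditors prefer.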


Common Mistakes

| Mistake | Correction | Why |
| --- | --- | --- |
| Assuming the customer’s data is clean. | Always run a data audit first (e.g., df.describe(), df.isnull().sum()). | Real-world data is messy: missing values, duplicates, incorrect formats. |
| Building a complex ML model when rules would suffice. | Start with simple rules, then add ML if needed. | Regulators hate "black box" models. Rules are easier to explain and debug. |
| Ignoring deployment constraints until the last minute. | Day 1: Ask, "Can we use Docker? Python 3.9? Cloud?" | You don’t want to rewrite your model in Java because the bank’s IT team won’t approve Python. |
| Not validating with the SOC team early. | Show them 10 flagged transactions on Day 2. | If they say, "These are all garbage," you need to pivot fast. |
| Forgetting to monitor for model drift. | Set up automated drift detection (e.g., Evidently AI). | Criminals change tactics; your model must adapt. |


FDE Interview / War Story Insights


1. The "We Need This Yesterday" Trap

Scenario: The CTO says, "We need real-time transaction monitoring by next week, or we’ll fail our audit."
How to respond:
- Clarify: "What’s the minimum viable solution? Can we start with batch processing and add real-time later?"
- Push back on scope: "If we rush, we’ll deploy a model with 50% FPR, which will make the problem worse."
- Propose a phased approach:
  1. Week 1: Deploy rule-based monitoring.
  2. Week 2: Add ML for high-risk segments.
  3. Week 3: Optimize for real-time.

2. The "Our Data Is Perfect" Lie

Scenario: The customer insists, "Our data is clean, no need to validate it."
How to respond:
- Show, don’t tell: Run a quick script to find nulls, duplicates, or outliers.

```python
print(df.isnull().sum())
print(df.duplicated().sum())
```

- Frame it as risk: "If the data is wrong, the model will be wrong, and the regulator will fine us."

3. The "We Can’t Change Our Legacy System" Excuse

Scenario: The bank uses Actimize 2010, and they won’t upgrade.
How to respond:
- Work within constraints: Write a Python script that exports data from Actimize, processes it, and re-imports it.
- Automate the workaround:

```bash
# Example: Export data from Actimize, process it, re-import
/opt/actimize/bin/export_transactions.sh > transactions.csv
python3 process_transactions.py transactions.csv > alerts.csv
/opt/actimize/bin/import_alerts.sh alerts.csv
```
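
The middle step of that pipeline, process_transactions.py, is not shown in the guide. A minimal sketch of what such a script might look like follows; the CSV column names (`transaction_id`, `amount`, `country`), the output layout, and the threshold of 80 are all assumptions, since the real export and import formats depend on the customer's Actimize setup.

```python
import csv
import io
import sys

HIGH_RISK_COUNTRIES = {"IR", "KP", "SY"}


def process(reader_handle, writer_handle, threshold: int = 80) -> int:
    """Read transactions as CSV and write rows at or above threshold as alerts."""
    reader = csv.DictReader(reader_handle)
    writer = csv.writer(writer_handle)
    writer.writerow(["transaction_id", "risk_score"])
    alerts = 0
    for row in reader:
        score = 0
        if float(row["amount"] or 0) > 10000:  # empty amounts are treated as 0
            score += 50
        if row["country"] in HIGH_RISK_COUNTRIES:
            score += 100
        if score >= threshold:
            writer.writerow([row["transaction_id"], score])
            alerts += 1
    return alerts


# In-memory check with made-up rows (safe to delete in a real script)
demo_in = io.StringIO(
    "transaction_id,amount,country\nt1,12000,US\nt2,300,KP\nt3,50,FR\n"
)
demo_out = io.StringIO()
alert_count = process(demo_in, demo_out)

# Command-line use: python3 process_transactions.py transactions.csv > alerts.csv
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], newline="") as source:
        process(source, sys.stdout)
```

Writing alerts to standard output keeps the script compatible with the shell redirection in the workaround and leaves the legacy system's own import tooling untouched.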

4. The Regulator Surprise Audit

Scenario: The regulator shows up unannounced and asks, "How did you determine this transaction was suspicious?"
How to respond:
- Have documentation ready:
  - Model training data.
  - Feature importance (SHAP values).
  - SOC team feedback.
- Show the audit trail:

```sql
-- Example: Query to show why a transaction was flagged
SELECT
    t.transaction_id,
    t.amount,
    t.country,
    r.rule_name,
    r.threshold,
    t.amount > r.threshold AS triggered
FROM transactions t
JOIN rules r ON t.country = r.country
WHERE t.transaction_id = '12345';
```
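
For rule-based scoring, the same "why was this flagged" answer can be produced in code at scoring time and stored with the alert. The sketch below mirrors the three rules from the scoring example in step 2; the rule names and the dictionary layout of a transaction are assumptions for illustration.

```python
RULES = [
    # (rule_name, human-readable condition, predicate)
    ("large_amount", "amount > 10000", lambda t: t["amount"] > 10000),
    ("high_risk_country", "country in IR/KP/SY",
     lambda t: t["country"] in {"IR", "KP", "SY"}),
    ("high_frequency", "frequency > 10", lambda t: t["frequency"] > 10),
]


def explain(transaction: dict) -> list:
    """Return the name and condition of every rule the transaction triggered."""
    return [
        f"{name}: {condition}"
        for name, condition, predicate in RULES
        if predicate(transaction)
    ]


# Made-up transaction for illustration
reasons = explain(
    {"transaction_id": "12345", "amount": 15000, "country": "IR", "frequency": 3}
)
print(reasons)
```

Storing these reason strings alongside each alert means the explanation shown to the regulator is the one recorded when the alert fired, not one reconstructed afterwards.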


Quick Check Questions


1. You’re deploying an AML model to a bank’s on-premise server. They say, "No Docker allowed." What’s your first step?

Answer: Check if they allow VMs, systemd services, or bare-metal Python scripts. If not, rewrite the model in Java or SQL (e.g., PostgreSQL PL/pgSQL).
Why: Docker is often banned for security reasons, but other deployment methods may be allowed.

2. The SOC team says, "Your model is flagging too many false positives." What do you do?

Answer: Recalibrate thresholds based on their feedback, then add a "false positive" feedback loop (e.g., let them mark alerts as "not suspicious").
Why: False positives waste time; the SOC team’s input is critical for tuning.
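
The feedback loop in that answer can drive the recalibration directly. The sketch below is one simple approach, assuming analysts have labelled past alerts as confirmed suspicious or not: it picks the highest score threshold that still keeps recall at or above a target. The function name `pick_threshold` and the sample feedback are hypothetical.

```python
def pick_threshold(scored_alerts, min_recall: float = 0.9):
    """scored_alerts: list of (risk_score, analyst_confirmed_suspicious) pairs.

    Returns (threshold, remaining_false_positives) for the highest threshold
    whose recall on confirmed cases is at least min_recall.
    """
    positives = sum(1 for _, suspicious in scored_alerts if suspicious)
    if positives == 0:
        raise ValueError("need at least one confirmed suspicious alert")
    # Try the strictest threshold first and relax until recall is acceptable
    for threshold in sorted({score for score, _ in scored_alerts}, reverse=True):
        flagged = [(s, y) for s, y in scored_alerts if s >= threshold]
        recall = sum(1 for _, y in flagged if y) / positives
        if recall >= min_recall:
            false_positives = sum(1 for _, y in flagged if not y)
            return threshold, false_positives
    return None  # only reached if min_recall is set above 1.0


# Made-up analyst feedback: (score, confirmed suspicious?)
feedback = [
    (180, True), (150, True), (130, False), (100, True),
    (80, False), (50, False), (50, False), (30, False),
]
result = pick_threshold(feedback)
print(result)
```

Because raising the threshold can only remove alerts, the highest threshold that meets the recall target is also the one with the fewest false positives on the labelled sample. It should still be validated on fresh data before going live.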

3. The regulator asks, "How did your model decide this transaction was high-risk?" What do you show them?

Answer: SHAP values, rule-based explanations, and SOC team notes (e.g., "This transaction was flagged because it was >$10K and sent to a high-risk country").
Why: Regulators require explainability; black-box models won’t fly.


Last-Minute Cram Sheet

  1. Key AML acronyms:
     • SAR = Suspicious Activity Report
     • KYC = Know Your Customer
     • PEP = Politically Exposed Person
     • OFAC = Office of Foreign Assets Control (administers US sanctions lists)
     • FPR = False Positive Rate (keep <5%)

  2. Common ports:
     • PostgreSQL: 5432
     • Actimize: 8080 (HTTP), 8443 (HTTPS); deployment-specific, so confirm with the bank’s IT team
     • SSH: 22 (⚠️ often blocked in finance; use a bastion host)

  3. Field commands:

```bash
# Check disk space (critical for on-premise)
df -h

# Quick data audit (run on a sample)
head -n 1000 transactions.csv | awk -F, '{print $5}' | sort | uniq -c

# Install Python packages offline
pip install --no-index --find-links=/path/to/wheels pandas

# Tail logs (debugging in production)
tail -f /var/log/aml/risk_scoring.log
```

  4. Deployment checklist:
     • [ ] Can I run Python/Java/SQL in this environment?
     • [ ] Do I have enough disk space? (⚠️ AML data is huge.)
     • [ ] Can I log to a file? (⚠️ No cloud logging in air-gapped environments.)
     • [ ] Does the SOC team understand the new alerts?

  5. Field traps:
     • ⚠️ Never assume the customer’s data is clean. Always audit first.
     • ⚠️ Regulators hate ML black boxes. Start with rules.
     • ⚠️ Air-gapped environments break everything. Test offline dependencies early.
     • ⚠️ False positives hurt as much as false negatives. SOC teams will ignore your model if it’s noisy.
     • ⚠️ Document everything. Regulators will ask for it.

Final Advice

AML is 80% data plumbing, 15% stakeholder management, and 5% ML. Focus on:
1. Making the SOC team’s life easier (reduce false positives).
2. Keeping the regulator happy (document everything).
3. Deploying something that works today (not a perfect model in 6 months).

Now go ship something that stops criminals.


