By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.
From raw data to actionable insights—descriptive, diagnostic, predictive, and prescriptive analytics, visualization, and data mining.
Data analytics extracts meaning from data to drive decisions. Businesses, scientists, and engineers use it to uncover trends, explain outcomes, forecast future events, and recommend actions.
Why use it today? Data is everywhere: sensors, logs, transactions, social media. Analytics turns noise into signals, reducing guesswork in marketing, operations, healthcare, and robotics.
Without analytics, data is just numbers. With it, data becomes a competitive edge.
Key idea: Start with descriptive analytics, then progress to diagnostic, predictive, and prescriptive. Each type builds on the last.
Example: Mining might find that customers who buy X also buy Y. Analytics would then test if promoting X increases Y sales.
A repeatable framework for analytics projects:

1. Business Understanding: Define the problem (e.g., "Why are customers churning?").
2. Data Understanding: Explore data quality, sources, and gaps.
3. Data Preparation: Clean, transform, and structure data (often cited as about 80% of the work).
4. Modeling: Apply statistical/ML techniques.
5. Evaluation: Check if the model solves the problem.
6. Deployment: Integrate insights into decisions (e.g., dashboards, APIs).
Pro tip: Iterate. Most projects loop between steps 2–5.
Example (SQL):

```sql
-- Total sales by product category
SELECT category, SUM(revenue) AS total_revenue
FROM sales
GROUP BY category;
```
Tools: SQL JOIN, Python’s pandas.crosstab(), or Power BI’s "Decomposition Tree."
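As a rough illustration of how `pandas.crosstab()` supports diagnostic questions, the sketch below breaks return counts down by region. The `orders` table, its column names, and its values are all hypothetical.

```python
import pandas as pd

# Hypothetical order records; names and values are invented for illustration.
orders = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South", "West", "West", "West"],
    "returned": ["no", "yes", "no", "no", "no", "yes", "yes", "no"],
})

# Count orders for every (region, returned) combination.
counts = pd.crosstab(orders["region"], orders["returned"])

# Share of returned vs. kept orders within each region (each row sums to 1).
rates = pd.crosstab(orders["region"], orders["returned"], normalize="index")

print(counts)
print(rates.round(2))
```

In this toy data the West region has the highest return rate, which is the kind of breakdown that points a "why" investigation at a specific segment.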
Example (Python):

```python
from sklearn.linear_model import LinearRegression

# Assumes X_train, y_train, and X_test already exist (X = features, y = target)
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
Tools: PuLP (Python), OptaPlanner (Java), or Gurobi (commercial).
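Prescriptive analytics recommends an action subject to constraints. As a minimal sketch, the hypothetical product-mix problem below (all profits, resource needs, and limits are invented) is small enough to solve by exhaustive search in plain Python. On realistic problems a solver such as PuLP or Gurobi would replace the loop.

```python
from itertools import product

# Hypothetical workshop data: profit and resource use per unit produced.
PROFIT = {"chair": 30, "table": 50}
WOOD = {"chair": 2, "table": 4}    # wood units per item
LABOR = {"chair": 3, "table": 2}   # labor hours per item
WOOD_AVAILABLE = 40
LABOR_AVAILABLE = 36

best_plan, best_profit = None, -1

# Try every integer production plan and keep the most profitable feasible one.
for chairs, tables in product(range(21), range(11)):
    wood_used = WOOD["chair"] * chairs + WOOD["table"] * tables
    labor_used = LABOR["chair"] * chairs + LABOR["table"] * tables
    if wood_used > WOOD_AVAILABLE or labor_used > LABOR_AVAILABLE:
        continue  # plan violates a resource constraint
    profit = PROFIT["chair"] * chairs + PROFIT["table"] * tables
    if profit > best_profit:
        best_plan, best_profit = (chairs, tables), profit

print(f"Best plan (chairs, tables): {best_plan}, profit: {best_profit}")
```

With these numbers the search selects 8 chairs and 6 tables for a profit of 540, the point where both the wood and labor limits are fully used.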
Example (Association Rules):

```python
from mlxtend.frequent_patterns import apriori

# df: one-hot encoded transactions (one row per basket, one boolean column per item)
frequent_itemsets = apriori(df, min_support=0.05, use_colnames=True)
```
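To make `min_support` more concrete, here is a rough sketch that computes support, confidence, and lift by hand for a toy set of baskets. The transactions are invented, and the code does not use mlxtend or reflect how `apriori` is implemented internally; it only shows what the metrics measure.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

# Rule under test: bread -> milk
support_both = support({"bread", "milk"})
confidence = support_both / support({"bread"})  # P(milk | bread)
lift = confidence / support({"milk"})           # > 1 means positive association

print(f"support={support_both:.2f} confidence={confidence:.2f} lift={lift:.4f}")
```

Here the pair appears in 3 of 5 baskets (support 0.6), so it would pass a `min_support=0.05` threshold, but the lift of 0.9375 is below 1, meaning bread buyers are slightly less likely than average to buy milk in this toy data.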
Core Python libraries: pandas, numpy, scikit-learn, matplotlib.
Goal: Analyze a dataset of video game sales.
Load data:

```python
import pandas as pd

df = pd.read_csv("vgsales.csv")  # Kaggle dataset
```
Explore:

```python
print(df.head())      # First 5 rows
print(df.describe())  # Summary stats
```
Aggregate:

```python
top_genres = df.groupby("Genre")["Global_Sales"].sum().sort_values(ascending=False)
print(top_genres)
```
Visualize:

```python
import matplotlib.pyplot as plt

top_genres.plot(kind="bar", title="Global Sales by Genre")
plt.show()
```

Expected outcome: A bar chart showing "Action" and "Sports" as top-selling genres.
Goal: Predict house prices using linear regression.
Load and prep data:

```python
from sklearn.datasets import fetch_california_housing

data = fetch_california_housing()
X, y = data.data, data.target
```
Split data:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```
Train and predict:

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
Evaluate:

```python
from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse}")
```

Expected outcome: A model with an MSE around 0.5–0.7 (lower = better).
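An MSE figure is easier to judge next to a baseline. The sketch below, using made-up numbers rather than the housing data, computes MSE and RMSE by hand and compares them with a naive model that always predicts the mean. For simplicity the baseline mean is taken from the same values being scored; in practice it would come from the training set.

```python
from math import sqrt

# Hypothetical true values and model predictions (not from the housing dataset).
y_true = [3.0, 1.5, 2.0, 4.5]
y_pred = [2.5, 1.5, 3.0, 4.0]

def mean_squared_error_manual(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

mse = mean_squared_error_manual(y_true, y_pred)
rmse = sqrt(mse)  # same units as the target, so easier to interpret

# Naive baseline: always predict the mean of the observed values.
mean_value = sum(y_true) / len(y_true)
baseline_mse = mean_squared_error_manual(y_true, [mean_value] * len(y_true))

print(f"MSE={mse:.4f} RMSE={rmse:.4f} baseline MSE={baseline_mse:.4f}")
```

Because the California housing target is expressed in units of $100,000, an RMSE of about 0.7 there corresponds to a typical error on the order of $70,000.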
Fix: Always run df.info() and df.isnull().sum() first.
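A minimal sketch of that first-pass check, assuming a small hypothetical DataFrame with a few gaps (the column names and values are invented):

```python
import numpy as np
import pandas as pd

# Hypothetical product table with deliberately missing values.
df = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "price": [9.99, np.nan, 4.50, 12.00],
    "units": [10, 3, np.nan, np.nan],
})

df.info()                    # column dtypes and non-null counts
missing = df.isnull().sum()  # number of missing values per column
print(missing)
print("Duplicate rows:", df.duplicated().sum())
```

Here `missing` reports one gap in `price` and two in `units`, which tells you where cleaning is needed before any modeling.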
Overfitting in predictive models
Fix: Use train-test splits, cross-validation, and regularization (e.g., Ridge regression).
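One way to combine cross-validation and regularization is sketched below. It assumes synthetic data from `make_regression` and an arbitrary `alpha=1.0`; both are illustrative choices, not recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression data (an assumption for illustration, not a real dataset).
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

models = {
    "LinearRegression": LinearRegression(),
    "Ridge(alpha=1.0)": Ridge(alpha=1.0),  # alpha sets the regularization strength
}

cv_scores = {}
for name, model in models.items():
    # 5-fold cross-validation: each fold is held out once for scoring.
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    cv_scores[name] = scores
    print(f"{name}: mean R^2 = {scores.mean():.3f} (std {scores.std():.3f})")
```

Comparing the held-out scores across folds gives a steadier estimate of generalization than a single train-test split, and trying several `alpha` values is the usual next step.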
Misleading visualizations
Fix: Start y-axis at 0, avoid pie charts for comparisons, and label clearly.
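The effect of a truncated y-axis can be seen by plotting the same hypothetical revenue figures twice. This is only a sketch: the numbers and the output filename are invented, and the `Agg` backend is chosen so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

# Hypothetical quarterly revenue with only small changes between quarters.
quarters = ["Q1", "Q2", "Q3", "Q4"]
revenue = [98, 100, 101, 103]

fig, (ax_truncated, ax_honest) = plt.subplots(1, 2, figsize=(8, 3))

# Left: axis starts near the data minimum, which exaggerates the growth.
ax_truncated.bar(quarters, revenue)
ax_truncated.set_ylim(97, 104)
ax_truncated.set_title("Truncated y-axis")

# Right: axis starts at 0, so bar heights are proportional to the values.
ax_honest.bar(quarters, revenue)
ax_honest.set_ylim(0, 110)
ax_honest.set_title("Y-axis starts at 0")

for ax in (ax_truncated, ax_honest):
    ax.set_ylabel("Revenue (hypothetical units)")

fig.tight_layout()
fig.savefig("axis_comparison.png")  # hypothetical output filename
```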
Jumping to predictive without descriptive
Fix: Start with EDA (Exploratory Data Analysis) and diagnostic analytics.
Confusing correlation with causation
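A small simulation can show how a hidden common cause produces correlation without causation. In this sketch the variable names and the data-generating process are invented: temperature drives both series, and neither affects the other.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hidden common cause and two outcomes that each depend only on it.
temperature = rng.normal(size=n)
ice_cream_sales = temperature + 0.5 * rng.normal(size=n)
sunburn_cases = temperature + 0.5 * rng.normal(size=n)

raw_corr = np.corrcoef(ice_cream_sales, sunburn_cases)[0, 1]

def residuals(values, control):
    """Remove the linear effect of `control` from `values`."""
    slope, intercept = np.polyfit(control, values, 1)
    return values - (slope * control + intercept)

# Correlation that remains after controlling for temperature.
partial_corr = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(sunburn_cases, temperature),
)[0, 1]

print(f"raw correlation: {raw_corr:.2f}, after controlling: {partial_corr:.2f}")
```

The raw correlation is strong, yet it largely disappears once temperature is accounted for, so promoting ice cream would not be expected to change sunburn cases.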
df.fillna()
StandardScaler
Comparison: Python vs. R

| Feature | Python | R |
|---------------|---------------------------------|--------------------------------|
| Strengths | General-purpose, ML, production | Statistics, visualization |
| Syntax | Readable, object-oriented | Functional, vectorized |
| Use Case | Deploying models, automation | Exploratory analysis, research |
A retail company wants to understand why sales dropped last quarter. Which type of analytics should they use first?
- A) Predictive
- B) Prescriptive
- C) Diagnostic
- D) Descriptive
Correct Answer: D) Descriptive

Explanation: Descriptive analytics answers "What happened?" (e.g., "Sales dropped 15% in Q3"). Diagnostic analytics (C) comes next to answer "Why?"

Why the Distractors Are Tempting:
- A) Predictive: Tempting because it's "advanced," but you can't predict without first describing.
- B) Prescriptive: Skips the foundational steps (describe → diagnose → predict → prescribe).
- C) Diagnostic: The next step, but you need descriptive first to identify the drop.
You’re building a model to predict house prices. After training, the model performs well on training data but poorly on test data. What’s the most likely issue?
- A) Underfitting
- B) Overfitting
- C) Incorrect data types
- D) Missing values
Correct Answer: B) Overfitting

Explanation: Overfitting occurs when a model memorizes training data noise, failing to generalize to new data.

Why the Distractors Are Tempting:
- A) Underfitting: Would perform poorly on both training and test data.
- C) Incorrect data types: Would cause errors during training, not poor test performance.
- D) Missing values: Would affect all data, not just test performance.
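The train-versus-test gap described in this answer can be reproduced with a toy experiment. The sketch below assumes synthetic data with a truly linear relationship and compares a straight-line fit with a degree-9 polynomial; the sample sizes, noise level, and degrees are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_data(n):
    """Noisy samples from a linear relationship y = 2x + noise."""
    x = rng.uniform(-1, 1, size=n)
    y = 2.0 * x + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = make_data(15)   # small training set invites overfitting
x_test, y_test = make_data(200)    # separate data the model never saw

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree}: train MSE={results[degree][0]:.3f}, "
          f"test MSE={results[degree][1]:.3f}")
```

The more flexible model always fits the training points at least as well as the straight line. What signals overfitting is a test error that is noticeably worse than the training error.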
Which visualization is best for comparing proportions of a whole (e.g., market share by company)?
- A) Line chart
- B) Bar chart
- C) Pie chart
- D) Scatter plot
Correct Answer: B) Bar chart

Explanation: Bar charts compare proportions more accurately than pie charts (especially for >3 categories).

Why the Distractors Are Tempting:
- A) Line chart: Best for trends over time, not proportions.
- C) Pie chart: Common but hard to compare slices (use only for 2–3 categories).
- D) Scatter plot: Shows relationships between variables, not proportions.
Practice EDA with pandas and matplotlib.
Descriptive & Diagnostic Analytics
Learn A/B testing and root-cause analysis.
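As a starting point for A/B testing, the sketch below runs a two-proportion z-test using only the standard library. The conversion counts are hypothetical, and the pooled-variance z-test is one common choice rather than the only valid method.

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

# Hypothetical experiment: variant A converts 100/1000, variant B 130/1000.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

For these counts z is about 2.1 and the p-value falls below the conventional 0.05 threshold, so the difference would usually be treated as statistically significant.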
Predictive Analytics
caret
Learn model evaluation (accuracy, precision, recall, ROC curves).
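To see how accuracy, precision, and recall differ, here is a minimal sketch that computes them from a confusion matrix by hand. The labels and predictions are invented; in practice `sklearn.metrics` provides equivalent functions.

```python
# Hypothetical binary labels (1 = positive class) and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(pairs)  # share of all predictions that are correct
precision = tp / (tp + fp)         # of predicted positives, how many are real
recall = tp / (tp + fn)            # of real positives, how many were found

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Here accuracy is 0.625, precision 0.6, and recall 0.75. The three diverge because the model produces more false positives than false negatives.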
Prescriptive Analytics
Explore tools like PuLP or OptaPlanner.
Advanced Topics