| Type | Description | Use Case | Examples |
|---|---|---|---|
| Supervised | Learning from labeled training data | Classification, regression | Image recognition, spam detection |
| Unsupervised | Learning from unlabeled data | Clustering, dimensionality reduction | Customer segmentation, anomaly detection |
| Reinforcement | Learning through interaction with environment | Game playing, robotics | Game AI, autonomous vehicles |
| Semi-Supervised | Combination of labeled and unlabeled data | When labeled data is scarce | Image classification with few labels |
| Self-Supervised | Learning from data itself without human labels | Pre-training for NLP, vision | Language models, image representations |

| Algorithm (Supervised) | Type | Best For | Advantages | Disadvantages |
|---|---|---|---|---|
| Linear Regression | Regression | Continuous output prediction | Simple, interpretable, fast | Assumes linear relationship |
| Logistic Regression | Classification | Binary classification | Simple, provides probabilities | Assumes linear decision boundary |
| Decision Trees | Classification/Regression | Interpretable decisions | Easy to understand, handles non-linear | Prone to overfitting |
| Random Forest | Classification/Regression | High accuracy | Reduces overfitting, handles non-linear | Less interpretable |
| SVM | Classification/Regression | High-dimensional data | Effective in high dimensions | Requires feature scaling |
| K-Nearest Neighbors | Classification/Regression | Local patterns | Simple, no training phase | Slow prediction, sensitive to features |
| Naive Bayes | Classification | Text classification | Fast, works with small datasets | Assumes feature independence |
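
As a concrete illustration of the trade-offs above, here is a minimal from-scratch sketch of k-nearest neighbors on toy 2-D data (`knn_predict` is an illustrative helper, not a library API):

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query.
    by_dist = sorted((math.dist(x, query), label)
                     for x, label in zip(train_X, train_y))
    # Majority vote among the k closest labels.
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

# Toy data: class "a" clusters near (0, 0), class "b" near (5, 5).
X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (0.5, 0.5)))  # -> a
print(knn_predict(X, y, (5.5, 5.5)))  # -> b
```

There is no fitting step, which matches the table: training is free, but every prediction scans the entire training set.
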

| Algorithm (Unsupervised) | Type | Best For | Advantages | Disadvantages |
|---|---|---|---|---|
| K-Means | Clustering | Grouping similar data points | Simple, efficient | Need to specify k, sensitive to initialization |
| Hierarchical Clustering | Clustering | Creating hierarchy of clusters | Creates tree structure, no need to specify k | Computationally expensive |
| DBSCAN | Clustering | Clusters of varying shapes | Finds arbitrary shapes, handles noise | Sensitive to parameters |
| Principal Component Analysis (PCA) | Dimensionality Reduction | Feature reduction, visualization | Reduces noise, decorrelates features | Loss of interpretability |
| Independent Component Analysis (ICA) | Dimensionality Reduction | Feature extraction | Finds independent components | Assumes independence |
| t-SNE | Dimensionality Reduction | Visualization of high-dimensional data | Good for visualization | Computationally expensive |
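
A minimal sketch of k-means, deliberately using the naive choice of the first k points as starting centroids, which is exactly the initialization sensitivity the table warns about (illustrative helper, not a library API):

```python
def kmeans(points, k, iters=10):
    """Plain k-means: assign points to nearest centroid, then recompute centroids."""
    centroids = [tuple(p) for p in points[:k]]  # naive init: first k points
    clusters = []
    for _ in range(iters):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(vals) / len(c) for vals in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centroids, clusters = kmeans(pts, k=2)
```

With well-separated data this converges in a few iterations; practical implementations add k-means++ initialization and multiple restarts to guard against bad starting centroids.
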

| Task | Metric | Formula | Best For | Range |
|---|---|---|---|---|
| Classification | Accuracy | (TP+TN) / (TP+TN+FP+FN) | Balanced datasets | [0,1] |
| Classification | Precision | TP / (TP+FP) | Minimizing false positives | [0,1] |
| Classification | Recall/Sensitivity | TP / (TP+FN) | Minimizing false negatives | [0,1] |
| Classification | F1-Score | 2×(Precision×Recall) / (Precision+Recall) | Imbalanced datasets | [0,1] |
| Classification | AUC-ROC | Area under ROC curve | Threshold-independent evaluation | [0,1] |
| Regression | Mean Squared Error (MSE) | 1/n Σ(yi - ŷi)² | Penalizing large errors | [0,∞) |
| Regression | Mean Absolute Error (MAE) | 1/n Σ\|yi - ŷi\| | Robust to outliers | [0,∞) |
| Regression | R² (R-squared) | 1 - (SSres/SStot) | Explained variance | (-∞,1] |
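
The classification formulas above all derive from the four confusion-matrix counts; a small self-contained sketch (illustrative helper name, not a library API):

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# 3 positives, 5 negatives; one missed positive (FN) and one false alarm (FP).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
```

Note how accuracy (0.75) looks healthier than precision and recall (both 2/3), which is why the table recommends F1 over accuracy for imbalanced data.
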

| Technique | Purpose | Method | When to Use |
|---|---|---|---|
| Feature Scaling | Normalize feature ranges | Min-Max, Standardization | Distance-based algorithms, neural networks |
| One-Hot Encoding | Convert categorical to numerical | Create binary columns | Categorical features |
| Feature Engineering | Create new informative features | Domain knowledge, transformations | Improve model performance |
| Feature Selection | Remove irrelevant features | Statistical tests, regularization | Reduce overfitting, improve speed |
| Handling Missing Values | Deal with incomplete data | Imputation, deletion | Datasets with missing values |
| Data Imbalance Handling | Deal with unequal class distributions | SMOTE, class weighting | Classification with imbalanced data |
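
Min-max scaling and one-hot encoding are simple enough to sketch from scratch (illustrative helpers, assuming dense list-of-values features):

```python
def min_max_scale(values):
    """Rescale a numeric feature to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Map each categorical value to a binary indicator vector."""
    categories = sorted(set(values))  # fix a column order
    return [[1 if v == c else 0 for c in categories] for v in values]

print(min_max_scale([10, 20, 30]))      # -> [0.0, 0.5, 1.0]
print(one_hot(["red", "blue", "red"]))  # columns ordered: blue, red
```

In practice the scaler's min/max must be learned on the training set only and reused on test data, otherwise information leaks from test to train.
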

| Method | Description | Advantages | Disadvantages |
|---|---|---|---|
| Train-Test Split | Divide data into training and testing sets | Simple, fast | Estimate varies with the single random split |
| K-Fold Cross-Validation | Split data into k folds, train/test k times | More robust evaluation | Computationally expensive |
| Stratified K-Fold | K-fold preserving class distribution | Good for imbalanced data | More complex implementation |
| Leave-One-Out | Each sample serves as the test set once | Nearly unbiased estimate | Very computationally expensive, high-variance estimate |
| Grid Search | Exhaustive search over hyperparameters | Finds best hyperparameters | Computationally expensive |
| Random Search | Random search over hyperparameters | More efficient than grid search | May miss optimal parameters |
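
A sketch of how k-fold index splitting works, assuming contiguous (unshuffled) folds; real implementations usually shuffle first (illustrative helper, not a library API):

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    # Distribute n samples as evenly as possible across k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        # Everything outside the current fold is training data.
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

for train, test in k_fold_indices(6, 3):
    print(train, test)
```

Each sample lands in exactly one test fold, so every data point is used for both training and evaluation across the k rounds.
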

| Problem | Definition | Signs | Solutions |
|---|---|---|---|
| Overfitting | Model performs well on training but poorly on test | High training accuracy, low test accuracy | Regularization, dropout, more data, simpler model |
| Underfitting | Model performs poorly on both training and test | Low accuracy on both sets | More complex model, feature engineering, less regularization |
| Bias-Variance Tradeoff | Balance between underfitting and overfitting | High bias (underfitting) vs high variance (overfitting) | Tune model complexity, e.g., via cross-validation |
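
The overfitting pattern in the table can be shown with a toy example (all data made up for illustration): a model that memorizes noisy training labels scores perfectly on training data but loses on held-out data to a simple threshold rule.

```python
# 1-D points: below 5 is class 0, above is class 1, except one
# mislabeled training point acting as label noise.
train = [(1, 0), (2, 0), (3, 0), (4, 1), (6, 1), (7, 1), (8, 1)]  # (4, 1) is noise
test = [(0, 0), (4.5, 0), (5.5, 1), (9, 1)]

def memorizer(x):
    """Overfit model: return the label of the nearest memorized training point."""
    return min(train, key=lambda t: abs(t[0] - x))[1]

def threshold(x):
    """Simple model: a single decision threshold at 5."""
    return 1 if x >= 5 else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print(accuracy(memorizer, train), accuracy(memorizer, test))  # 1.0 vs 0.75
print(accuracy(threshold, train), accuracy(threshold, test))  # ~0.857 vs 1.0
```

The memorizer shows the classic signature from the table: high training accuracy, lower test accuracy. The simpler model accepts some training error (it refuses to fit the noisy point) and generalizes better.
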

| Method | Approach | How It Works | Benefits |
|---|---|---|---|
| Bagging | Parallel training of models | Bootstrap samples, average predictions | Reduces variance, prevents overfitting |
| Boosting | Sequential training of models | Focus on misclassified examples | Reduces bias, handles complex patterns |
| Random Forest | Bagging with random features | Multiple decision trees with random features | Reduces overfitting, handles non-linear |
| AdaBoost | Adaptive boosting | Sequentially focus on hard examples | Good for weak learners |
| Gradient Boosting | Boosting with gradient descent | Sequentially minimize loss function | High accuracy, handles complex patterns |
| Voting Classifiers | Combine different models | Majority vote or average predictions | Reduces variance, leverages different models |
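
Hard voting is the simplest of these to sketch; assuming the predictions of three hypothetical classifiers are already available:

```python
from collections import Counter

def vote(predictions):
    """Hard-voting ensemble: return the majority label across model predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Three made-up classifiers, each wrong on a different sample.
model_preds = [
    [1, 0, 1, 1],  # model A
    [1, 1, 0, 1],  # model B
    [0, 1, 1, 1],  # model C
]
# Vote per sample (columns of the matrix above).
ensemble = [vote(sample) for sample in zip(*model_preds)]
print(ensemble)  # -> [1, 1, 1, 1]
```

Because each model errs on a different sample, the majority is right on every one, which is the variance-reduction effect the table attributes to voting: it pays off when the models' errors are not correlated.
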