Machine Learning Cheat Sheet

Machine Learning Types

| Type | Description | Use Case | Examples |
|------|-------------|----------|----------|
| Supervised | Learning from labeled training data | Classification, regression | Image recognition, spam detection |
| Unsupervised | Learning from unlabeled data | Clustering, dimensionality reduction | Customer segmentation, anomaly detection |
| Reinforcement | Learning through interaction with an environment | Game playing, robotics | Game AI, autonomous vehicles |
| Semi-Supervised | Combination of labeled and unlabeled data | When labeled data is scarce | Image classification with few labels |
| Self-Supervised | Learning from the data itself, without human labels | Pre-training for NLP and vision | Language models, image representations |

Supervised Learning Algorithms

| Algorithm | Type | Best For | Advantages | Disadvantages |
|-----------|------|----------|------------|---------------|
| Linear Regression | Regression | Continuous output prediction | Simple, interpretable, fast | Assumes a linear relationship |
| Logistic Regression | Classification | Binary classification | Simple, provides probabilities | Assumes a linear decision boundary |
| Decision Trees | Classification/Regression | Interpretable decisions | Easy to understand, handles non-linearity | Prone to overfitting |
| Random Forest | Classification/Regression | High accuracy | Reduces overfitting, handles non-linearity | Less interpretable |
| SVM | Classification/Regression | High-dimensional data | Effective in high dimensions | Requires feature scaling |
| K-Nearest Neighbors | Classification/Regression | Local patterns | Simple, no training phase | Slow prediction, sensitive to feature scaling |
| Naive Bayes | Classification | Text classification | Fast, works with small datasets | Assumes feature independence |
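As a concrete illustration of the "no training phase" trade-off noted for K-Nearest Neighbors, here is a minimal pure-Python sketch (function name and toy data are illustrative, not from any library): all the work happens at prediction time, which is why prediction is slow on large datasets.

```python
# Minimal k-nearest-neighbors classifier sketch: classify a point by
# majority vote among its k closest training points (Euclidean distance).
from collections import Counter
import math

def knn_predict(X_train, y_train, x, k=3):
    """Return the majority label among the k training points nearest to x."""
    dists = [(math.dist(xi, x), yi) for xi, yi in zip(X_train, y_train)]
    dists.sort(key=lambda d: d[0])                 # nearest first
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D dataset: two well-separated clusters.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(X, y, (0.15, 0.1)))  # near the first cluster -> "a"
print(knn_predict(X, y, (5.0, 5.1)))   # near the second cluster -> "b"
```

Because distances are computed against every stored training point, this also shows why feature scaling matters: a feature with a large numeric range would dominate the distance.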

Unsupervised Learning Algorithms

| Algorithm | Type | Best For | Advantages | Disadvantages |
|-----------|------|----------|------------|---------------|
| K-Means | Clustering | Grouping similar data points | Simple, efficient | Need to specify k, sensitive to initialization |
| Hierarchical Clustering | Clustering | Creating a hierarchy of clusters | Creates a tree structure, no need to specify k | Computationally expensive |
| DBSCAN | Clustering | Clusters of varying shapes | Finds arbitrary shapes, handles noise | Sensitive to parameters |
| Principal Component Analysis (PCA) | Dimensionality Reduction | Feature reduction, visualization | Reduces noise, enables visualization | Loss of interpretability |
| Independent Component Analysis (ICA) | Dimensionality Reduction | Feature extraction | Finds independent components | Assumes independence |
| t-SNE | Dimensionality Reduction | Visualization of high-dimensional data | Good for visualization | Computationally expensive |
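The K-Means row above can be made concrete with a short pure-Python sketch of Lloyd's algorithm. Note the table's caveat about initialization: this sketch takes fixed starting centroids for reproducibility, whereas practical implementations use smarter schemes such as k-means++.

```python
# K-means sketch (Lloyd's algorithm): alternate between assigning each
# point to its nearest centroid and moving each centroid to the mean of
# its assigned points.
import math

def kmeans(points, centroids, n_iter=10):
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster's mean
        # (keep the old centroid if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)  # one centroid per cluster of three points
```

With a bad initialization (e.g. both starting centroids inside the same cluster), the same code can converge to a worse partition, which is exactly the sensitivity the table warns about.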

Model Evaluation Metrics

| Task | Metric | Formula | Best For | Range |
|------|--------|---------|----------|-------|
| Classification | Accuracy | (TP+TN) / (TP+TN+FP+FN) | Balanced datasets | [0, 1] |
| Classification | Precision | TP / (TP+FP) | Minimizing false positives | [0, 1] |
| Classification | Recall/Sensitivity | TP / (TP+FN) | Minimizing false negatives | [0, 1] |
| Classification | F1-Score | 2 × (Precision × Recall) / (Precision + Recall) | Imbalanced datasets | [0, 1] |
| Classification | AUC-ROC | Area under the ROC curve | Threshold-independent evaluation | [0, 1] |
| Regression | Mean Squared Error (MSE) | (1/n) Σ(yᵢ − ŷᵢ)² | Penalizing large errors | [0, ∞) |
| Regression | Mean Absolute Error (MAE) | (1/n) Σ\|yᵢ − ŷᵢ\| | Robust to outliers | [0, ∞) |
| Regression | R² (R-squared) | 1 − (SSres / SStot) | Explained variance | (−∞, 1] |
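The formulas in the table translate directly into code. The sketch below computes the confusion-matrix counts and the listed metrics from scratch on a tiny made-up example (pure Python; the helper name is illustrative).

```python
# Compute classification and regression metrics from first principles.
def confusion(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) counts for a binary problem."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
tp, tn, fp, fn = confusion(y_true, y_pred)          # (2, 4, 1, 1)
accuracy  = (tp + tn) / (tp + tn + fp + fn)         # 6/8 = 0.75
precision = tp / (tp + fp)                          # 2/3
recall    = tp / (tp + fn)                          # 2/3
f1 = 2 * precision * recall / (precision + recall)

# Regression metrics on a toy example.
y, y_hat = [3.0, 5.0, 2.0], [2.5, 5.0, 3.0]
n = len(y)
mse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n
mae = sum(abs(a - b) for a, b in zip(y, y_hat)) / n
mean_y = sum(y) / n
ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
ss_tot = sum((a - mean_y) ** 2 for a in y)
r2 = 1 - ss_res / ss_tot
```

Note how precision and recall use different denominators (predicted positives vs. actual positives), which is why optimizing one can degrade the other.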

Data Preprocessing

| Technique | Purpose | Method | When to Use |
|-----------|---------|--------|-------------|
| Feature Scaling | Normalize feature ranges | Min-max scaling, standardization | Distance-based algorithms, neural networks |
| One-Hot Encoding | Convert categorical to numerical | Create binary columns | Categorical features |
| Feature Engineering | Create new informative features | Domain knowledge, transformations | Improve model performance |
| Feature Selection | Remove irrelevant features | Statistical tests, regularization | Reduce overfitting, improve speed |
| Handling Missing Values | Deal with incomplete data | Imputation, deletion | Datasets with missing values |
| Data Imbalance Handling | Deal with unequal class distributions | SMOTE, class weighting | Classification with imbalanced data |
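Two of the techniques above, min-max scaling and one-hot encoding, are simple enough to sketch in a few lines of pure Python (function names are illustrative; library implementations such as scikit-learn's also handle fitting on train data and transforming test data separately).

```python
# Min-max scaling maps a feature into [0, 1]; one-hot encoding turns a
# categorical feature into one binary column per category.
def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    categories = sorted(set(values))   # fixed column order
    return [[1 if v == c else 0 for c in categories] for v in values]

print(min_max_scale([10, 20, 30]))       # [0.0, 0.5, 1.0]
print(one_hot(["red", "green", "red"]))  # columns: green, red
```

A practical caveat: the min and max (and the category set) must be learned from the training data only, then reused on test data, or information leaks between the splits.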

Model Selection & Validation

| Method | Description | Advantages | Disadvantages |
|--------|-------------|------------|---------------|
| Train-Test Split | Divide data into training and testing sets | Simple, fast | Single evaluation with high variance |
| K-Fold Cross-Validation | Split data into k folds; train/test k times | More robust evaluation | Computationally expensive |
| Stratified K-Fold | K-fold that preserves class distribution | Good for imbalanced data | More complex implementation |
| Leave-One-Out | Each sample serves as the test set once | Nearly unbiased estimate | Very computationally expensive |
| Grid Search | Exhaustive search over a hyperparameter grid | Finds the best combination in the grid | Computationally expensive |
| Random Search | Random sampling of hyperparameter values | More efficient than grid search | May miss optimal parameters |
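The k-fold splitting logic is easy to see in code. This pure-Python sketch (illustrative function name; libraries also shuffle and stratify) yields each of the k folds once as the validation set, with the remaining samples as the training set:

```python
# Generate (train_indices, val_indices) pairs for k-fold cross-validation.
def k_fold_indices(n, k):
    """Partition range(n) into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

for train, val in k_fold_indices(n=6, k=3):
    print(train, val)   # each index appears in exactly one val fold
```

Setting k = n recovers leave-one-out, which shows directly why it is so expensive: the model must be retrained n times.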

Overfitting & Underfitting

| Problem | Definition | Signs | Solutions |
|---------|------------|-------|-----------|
| Overfitting | Model performs well on training data but poorly on test data | High training accuracy, low test accuracy | Regularization, dropout, more data, simpler model |
| Underfitting | Model performs poorly on both training and test data | Low accuracy on both sets | More complex model, feature engineering, less regularization |
| Bias-Variance Tradeoff | Balance between underfitting and overfitting | High bias (underfitting) vs. high variance (overfitting) | Find optimal model complexity |

Ensemble Methods

| Method | Approach | How It Works | Benefits |
|--------|----------|--------------|----------|
| Bagging | Parallel training of models | Bootstrap samples, average predictions | Reduces variance, prevents overfitting |
| Boosting | Sequential training of models | Focus on misclassified examples | Reduces bias, handles complex patterns |
| Random Forest | Bagging with random features | Multiple decision trees with random feature subsets | Reduces overfitting, handles non-linearity |
| AdaBoost | Adaptive boosting | Sequentially focus on hard examples | Good for weak learners |
| Gradient Boosting | Boosting with gradient descent | Sequentially minimize a loss function | High accuracy, handles complex patterns |
| Voting Classifiers | Combine different models | Majority vote or averaged predictions | Reduces variance, leverages diverse models |
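The voting-classifier row can be sketched in a few lines: combine the predictions of several base models by majority vote. The toy predictions below are made up to show the key property, that the ensemble can correct individual models' mistakes when their errors fall on different samples.

```python
# Hard-voting ensemble sketch: per-sample majority vote across models.
from collections import Counter

def majority_vote(predictions_per_model):
    """predictions_per_model: one prediction list per base model.
    Returns the per-sample majority label across models."""
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*predictions_per_model)   # group by sample
    ]

# Three models each misclassify a different sample (true labels all 1);
# the majority vote fixes all three errors.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 1, 1, 1]
print(majority_vote([model_a, model_b, model_c]))  # [1, 1, 1, 1]
```

This only helps when the models' errors are not strongly correlated, which is why ensembles like random forests deliberately inject randomness (bootstrap samples, random feature subsets) to decorrelate their members.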