Anomaly Detection Using Gaussian Distribution#
Jupyter Demos#
▶️ Demo | Anomaly Detection - find anomalies in server operational parameters like latency and threshold
Gaussian (Normal) Distribution#
The normal (or Gaussian) distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known. A random variable with a Gaussian distribution is said to be normally distributed and is called a normal deviate.
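Let's say:
$$x \in \mathbb{R}$$
If $x$ is normally distributed then it may be written as follows.
$$x \sim \mathcal{N}(\mu, \sigma^2)$$
- $\mu$ - mean value,
- $\sigma^2$ - variance,
- "$\sim$" means that "x is distributed as ..."
Then the Gaussian probability density (how likely a value $x$ is under a distribution with the given mean and variance) is given by:
$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$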
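The density can be evaluated directly with the standard library. The following is a minimal sketch; the function name `gaussian_pdf` and its argument names are illustrative assumptions, not part of the original demo.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x, where sigma2 is the variance."""
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    exponent = -((x - mu) ** 2) / (2.0 * sigma2)
    return coefficient * math.exp(exponent)

# For a standard normal (mu = 0, variance = 1) the density is highest at the
# mean and takes the same value at points equally far from the mean.
peak = gaussian_pdf(0.0, mu=0.0, sigma2=1.0)
left = gaussian_pdf(-2.0, mu=0.0, sigma2=1.0)
right = gaussian_pdf(2.0, mu=0.0, sigma2=1.0)
```

Note that the returned value is a density, not a probability, so it can exceed 1 when the variance is small.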
Estimating Parameters for a Gaussian#
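We may use the following formulas to estimate the Gaussian parameters (mean and variance) for the $i$-th feature:
$$\mu_i = \frac{1}{m} \sum_{j=1}^{m} x_i^{(j)}$$
$$\sigma_i^2 = \frac{1}{m} \sum_{j=1}^{m} \left(x_i^{(j)} - \mu_i\right)^2$$
- $m$ - number of training examples.
- $n$ - number of features.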
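As an illustration, the per-feature estimates might be computed as below. This sketch assumes the maximum-likelihood form that divides by m (not m - 1); the function name `estimate_gaussian` and the toy data are hypothetical.

```python
def estimate_gaussian(examples):
    """Return per-feature means and variances.

    `examples` is a list of m training examples, each a list of n feature values.
    """
    m = len(examples)
    n = len(examples[0])
    # Mean of feature i across all m examples.
    mu = [sum(row[i] for row in examples) / m for i in range(n)]
    # Variance of feature i, dividing by m (assumed convention).
    sigma2 = [
        sum((row[i] - mu[i]) ** 2 for row in examples) / m
        for i in range(n)
    ]
    return mu, sigma2

# Hypothetical toy data: m = 4 examples, n = 2 features.
X = [[1.0, 10.0], [2.0, 10.0], [3.0, 14.0], [4.0, 14.0]]
mu, sigma2 = estimate_gaussian(X)
# mu is [2.5, 12.0] and sigma2 is [1.25, 4.0]
```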
Density Estimation#
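So we have a training set:
$$\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}, \quad x^{(j)} \in \mathbb{R}^n$$
We assume that each feature of the training set is normally distributed:
$$x_i \sim \mathcal{N}(\mu_i, \sigma_i^2), \quad i = 1, \dots, n$$
Then, treating the features as independent:
$$p(x) = \prod_{i=1}^{n} p(x_i; \mu_i, \sigma_i^2)$$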
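A short sketch of the density estimate as a product of per-feature Gaussian densities follows. It assumes the features are treated as independent; the names `gaussian_pdf` and `density`, and the parameter values, are hypothetical.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density with mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def density(x, mu, sigma2):
    """p(x) as the product of per-feature Gaussian densities."""
    p = 1.0
    for x_i, mu_i, sigma2_i in zip(x, mu, sigma2):
        p *= gaussian_pdf(x_i, mu_i, sigma2_i)
    return p

# Hypothetical fitted parameters for n = 2 features.
mu = [2.5, 12.0]
sigma2 = [1.25, 4.0]

p_typical = density([2.5, 12.0], mu, sigma2)  # example located at the means
p_unusual = density([9.0, 25.0], mu, sigma2)  # example far from the means
```

With many features the product can underflow to zero, so summing log-densities instead is a common alternative.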
Anomaly Detection Algorithm#
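Choose features $x_i$ that might be indicative of anomalous examples ($\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$).
Fit parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$ using formulas:
$$\mu_i = \frac{1}{m} \sum_{j=1}^{m} x_i^{(j)}$$
$$\sigma_i^2 = \frac{1}{m} \sum_{j=1}^{m} \left(x_i^{(j)} - \mu_i\right)^2$$
Given new example $x$, compute $p(x)$:
$$p(x) = \prod_{i=1}^{n} p(x_i; \mu_i, \sigma_i^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left(-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right)$$
Anomaly if $p(x) < \varepsilon$
- $\varepsilon$ - probability threshold.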
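The steps of the algorithm can be sketched end to end as follows. This is an illustrative outline rather than the demo's actual implementation: the function names, the two-feature toy measurements, and the threshold value `epsilon = 1e-3` are all assumptions chosen for the example.

```python
import math

def fit(examples):
    """Estimate per-feature mean and variance (dividing by m)."""
    m, n = len(examples), len(examples[0])
    mu = [sum(row[i] for row in examples) / m for i in range(n)]
    sigma2 = [sum((row[i] - mu[i]) ** 2 for row in examples) / m for i in range(n)]
    return mu, sigma2

def density(x, mu, sigma2):
    """p(x): product of per-feature Gaussian densities."""
    p = 1.0
    for x_i, mu_i, s2_i in zip(x, mu, sigma2):
        p *= math.exp(-((x_i - mu_i) ** 2) / (2.0 * s2_i)) / math.sqrt(2.0 * math.pi * s2_i)
    return p

def is_anomaly(x, mu, sigma2, epsilon):
    """Flag x as anomalous when its density falls below the threshold."""
    return density(x, mu, sigma2) < epsilon

# Hypothetical training data: two operational parameters per server.
train = [[10.0, 5.0], [11.0, 5.5], [9.0, 4.5], [10.5, 5.2], [9.5, 4.8]]
mu, sigma2 = fit(train)

epsilon = 1e-3  # assumed threshold; in practice it would be tuned on labeled data
normal_flag = is_anomaly([10.2, 5.1], mu, sigma2, epsilon)   # close to the training data
outlier_flag = is_anomaly([20.0, 5.0], mu, sigma2, epsilon)  # first feature far from its mean
```

Here the second example lies many standard deviations from the mean in its first feature, so its density is far below the threshold and it is flagged.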
Algorithm Evaluation#
The algorithm may be evaluated using F1 score.
The F1 score is the harmonic mean of precision and recall. It reaches its best value at 1 (perfect precision and recall) and its worst at 0.
$$F_1 = \frac{2 \cdot precision \cdot recall}{precision + recall}$$
Where:
$$precision = \frac{tp}{tp + fp}$$
$$recall = \frac{tp}{tp + fn}$$
tp - number of true positives.
fp - number of false positives.
fn - number of false negatives.
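A small sketch of the F1 computation from predicted and actual anomaly labels is shown below. The function name `f1_score`, the toy labels, and the choice to return 0.0 when there are no true positives are assumptions made for this example.

```python
def f1_score(predicted, actual):
    """F1 score for boolean labels, where True marks an anomaly."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # flagged and truly anomalous
    fp = sum(1 for p, a in pairs if p and not a)    # flagged but actually normal
    fn = sum(1 for p, a in pairs if not p and a)    # missed anomalies
    if tp == 0:
        # Assumed convention: precision or recall is zero or undefined here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: tp = 2, fp = 1, fn = 1.
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]
score = f1_score(predicted, actual)  # precision = recall = 2/3, so F1 = 2/3
```

Because anomalies are usually rare, F1 tends to be more informative than plain accuracy, which a detector could maximize by never flagging anything.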