Anomaly Detection Using Gaussian Distribution

Jupyter Demos#

▶️ Demo | Anomaly Detection - find anomalies in server operational parameters such as latency and throughput

Gaussian (Normal) Distribution#

The normal (or Gaussian) distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known. A random variable with a Gaussian distribution is said to be normally distributed and is called a normal deviate.

Let’s say:

$x \in \mathbb{R}$

If x is normally distributed then it may be displayed as follows.

[Figure: Gaussian distribution]

$\mu$ - mean value,

$\sigma^2$ - variance.

$x \sim \mathcal{N}(\mu, \sigma^2)$ - “~” means that “x is distributed as …”

Then the Gaussian probability density (how likely a value x is under a distribution with the given mean and variance) is:

$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
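As a minimal sketch, the univariate Gaussian density with mean `mu` and variance `sigma2` can be evaluated with the Python standard library alone. The function name `gaussian_pdf` and the sample inputs are illustrative assumptions, not part of the original demo.

```python
import math


def gaussian_pdf(x: float, mu: float, sigma2: float) -> float:
    """Density p(x; mu, sigma^2) of a univariate Gaussian (hypothetical helper)."""
    if sigma2 <= 0:
        raise ValueError("variance must be positive")
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma2))


# Density at the mean of a standard normal (about 0.3989) ...
peak = gaussian_pdf(0.0, 0.0, 1.0)
# ... and three standard deviations away, where it is much smaller.
tail = gaussian_pdf(3.0, 0.0, 1.0)
```

Note that the returned value is a density, not a probability, so it can exceed 1 when the variance is small.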

Estimating Parameters for a Gaussian#

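We may use the following formulas to estimate the Gaussian parameters (mean and variance) for the $i$-th feature:

$$\mu_i = \frac{1}{m} \sum_{j=1}^{m} x_i^{(j)}$$

$$\sigma_i^2 = \frac{1}{m} \sum_{j=1}^{m} \left(x_i^{(j)} - \mu_i\right)^2$$

$m$ - number of training examples.

$n$ - number of features.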
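The per-feature estimates can be sketched as follows, assuming the training set is a list of $m$ rows with $n$ numeric features each. The name `estimate_gaussian` and the toy data are hypothetical. The variance divides by $m$ (not $m - 1$), matching the estimate described here.

```python
def estimate_gaussian(X):
    """Return per-feature means and variances for a list of m rows of n features."""
    m = len(X)       # number of training examples
    n = len(X[0])    # number of features
    mu = [sum(row[i] for row in X) / m for i in range(n)]
    sigma2 = [sum((row[i] - mu[i]) ** 2 for row in X) / m for i in range(n)]
    return mu, sigma2


# Toy training set (made-up values): 3 examples, 2 features.
X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
mu, sigma2 = estimate_gaussian(X_train)
```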

Density Estimation#

So we have a training set:

$\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$

$x^{(j)} \in \mathbb{R}^n$

We assume that each feature of the training set is normally distributed:

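$x_1 \sim \mathcal{N}(\mu_1, \sigma_1^2)$

$x_2 \sim \mathcal{N}(\mu_2, \sigma_2^2)$

$x_n \sim \mathcal{N}(\mu_n, \sigma_n^2)$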

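Then, treating the features as independent:

$$p(x) = p(x_1; \mu_1, \sigma_1^2) \, p(x_2; \mu_2, \sigma_2^2) \cdots p(x_n; \mu_n, \sigma_n^2)$$

$$p(x) = \prod_{i=1}^{n} p(x_i; \mu_i, \sigma_i^2)$$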
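A short sketch of this density estimate: the density of an example is the product of the univariate Gaussian densities of its features. The names `gaussian_pdf` and `density` and the parameter values are assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density with mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def density(x, mu, sigma2):
    """p(x) as the product of per-feature Gaussian densities."""
    return math.prod(gaussian_pdf(xi, mi, si) for xi, mi, si in zip(x, mu, sigma2))


# Two features, each assumed standard normal (illustrative parameters).
mu = [0.0, 0.0]
sigma2 = [1.0, 1.0]
p_center = density([0.0, 0.0], mu, sigma2)   # equals 1 / (2 * pi), about 0.159
p_far = density([3.0, -3.0], mu, sigma2)     # far from the mean, much smaller
```

With many features the product can underflow to zero. Summing log-densities is a common alternative, though it is not the formulation used here.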

Anomaly Detection Algorithm#

Choose features $x_i$ that might be indicative of anomalous examples.
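Fit the parameters using the formulas:

$$\mu_i = \frac{1}{m} \sum_{j=1}^{m} x_i^{(j)}$$

$$\sigma_i^2 = \frac{1}{m} \sum_{j=1}^{m} \left(x_i^{(j)} - \mu_i\right)^2$$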

Given a new example x, compute p(x):

$$p(x) = \prod_{i=1}^{n} p(x_i; \mu_i, \sigma_i^2)$$

Flag x as an anomaly if $p(x) < \varepsilon$.

$\varepsilon$ - probability threshold.
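The steps of the algorithm can be sketched end to end as below. This is an illustration under stated assumptions, not the demo's implementation: the function names, the toy "latency, throughput" readings, and the threshold value `EPSILON = 0.01` are all made up for the example.

```python
import math


def fit(X):
    """Estimate per-feature mean and variance from the training rows."""
    m, n = len(X), len(X[0])
    mu = [sum(row[i] for row in X) / m for i in range(n)]
    sigma2 = [sum((row[i] - mu[i]) ** 2 for row in X) / m for i in range(n)]
    return mu, sigma2


def density(x, mu, sigma2):
    """Product of per-feature Gaussian densities for one example."""
    p = 1.0
    for xi, mi, si in zip(x, mu, sigma2):
        p *= math.exp(-((xi - mi) ** 2) / (2.0 * si)) / math.sqrt(2.0 * math.pi * si)
    return p


def is_anomaly(x, mu, sigma2, epsilon):
    """Flag x when its estimated density falls below the threshold."""
    return density(x, mu, sigma2) < epsilon


# Hypothetical training readings: [latency, throughput].
X_train = [[10.0, 5.0], [10.5, 5.2], [9.5, 4.8], [10.2, 5.1], [9.8, 4.9]]
mu, sigma2 = fit(X_train)

EPSILON = 0.01  # assumed threshold, chosen by hand for this toy data
typical = is_anomaly([10.1, 5.0], mu, sigma2, EPSILON)   # close to the training data
outlier = is_anomaly([14.0, 2.0], mu, sigma2, EPSILON)   # far from the training data
```

In practice the threshold is not fixed by hand like this. It is usually tuned against labelled examples.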

Algorithm Evaluation#

The algorithm may be evaluated using the F1 score.

The F1 score is the harmonic mean of precision and recall. It reaches its best value at 1 (perfect precision and recall) and its worst at 0.

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
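A minimal sketch of the F1 computation, assuming labels are encoded as 1 for an anomaly and 0 for a normal example. It uses the standard definitions precision = tp / (tp + fp) and recall = tp / (tp + fn), where tp, fp and fn count true positives, false positives and false negatives. The function name and the sample labels are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 score for binary labels where 1 marks an anomaly."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
score = f1_score(y_true, y_pred)   # precision = recall = 2/3, so F1 = 2/3
```

F1 is a natural fit here because anomalies are usually rare, so plain accuracy would look high even for a detector that never flags anything.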