Reposted from:
https://forworktests.blogspot.com/2023/01/how-to-quickly-find-anomalies-in-number.html
Translated from Russian. The original article is here:
https://forworktests.blogspot.com/2022/12/blog-post.html
In practice, some problems require finding anomalies in a numerical series. For simplicity, we can think of anomalies as values that differ in some way from most of the numbers in the series (outliers, non-standard values, deviations from the norm). Such tasks arise in various areas:
- cleaning noisy data in Data Science;
- filtering outliers from the training sample for neural networks in Machine Learning;
- detecting abnormal network activity (such as hacker attacks) when monitoring traffic and events in Cybersecurity;
- detecting outliers or tails in a stream of stock market data in Algorithmic Trading;
- any other anomaly search task where the data can be represented as a numerical series.
The concept of a numerical series differs between mathematical analysis and statistics. Here we use the statistical meaning: a finite sequence of numbers (analogous to a sample). There are also various interpretations of what counts as an anomaly in a numerical series; we will consider them further on.
The article also shows examples of how to find anomalies in numerical series quickly and efficiently using a modified Hampel method (F. R. Hampel).
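To give a feel for the basic idea before going further, here is a minimal sketch of the classic Hampel identifier, not the modified variant this article is about. It flags values whose distance from the median exceeds a multiple of the scaled median absolute deviation (MAD). The function name `hampel_outliers`, the `n_sigmas` parameter, and the sample series are assumptions made for illustration only.

```python
from statistics import median


def hampel_outliers(values, n_sigmas=3.0):
    """Return indices of values flagged by the classic (global) Hampel identifier.

    Illustrative sketch only: the name, the default threshold and the use of
    the whole series as a single window are assumptions, not the article's method.
    """
    if not values:
        return []
    med = median(values)
    # Median absolute deviation (MAD) of the series around its median.
    mad = median(abs(v - med) for v in values)
    # 1.4826 scales the MAD so that it estimates the standard deviation
    # for normally distributed data.
    threshold = n_sigmas * 1.4826 * mad
    # If MAD is 0 (more than half the values are identical), any value
    # that differs from the median is flagged.
    return [i for i, v in enumerate(values) if abs(v - med) > threshold]


series = [10, 11, 9, 10, 12, 95, 10, 11]
print(hampel_outliers(series))  # prints [5]
```

Because both the median and the MAD are robust statistics, the single extreme value 95 barely shifts the threshold, which is why this approach tends to work better than a mean and standard deviation rule on noisy data.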