Huge amounts of data are processed by computers every day to analyze road traffic, network traffic, quality control on production lines, customer behavior and market trends, among many others. The number of applications is enormous.
One problem in large-scale analysis is erroneous data values caused by human mistakes or by errors in data processing or measurement. Such values, called outliers, can bias the analysis or even lead to completely wrong conclusions. In some cases, outliers are abnormal signals that carry useful information, such as credit card fraud, network attacks, or malignant tumours in medical images. Detecting them can save money or even lives.
In this thesis, methods for outlier detection were developed. A neighborhood-based method called mean-shift was used to detect outliers successfully, especially in very noisy data. It can also be used to smooth the results of existing outlier detection methods, and is shown to improve the accuracy of all existing methods by 9% on average.
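As a rough illustration of the neighborhood-based idea, the sketch below moves every point to the mean of its k nearest neighbors a few times and scores each point by how far it ends up from its original position. This is only one plausible reading of a mean-shift style detector, not the thesis implementation; the function names `knn_mean_shift` and `outlier_scores`, the parameters `k` and `iterations`, and the toy data are all assumptions made for the example.

```python
import math

def knn_mean_shift(points, k=3, iterations=3):
    """Repeatedly replace each point by the mean of its k nearest neighbors.

    The point itself counts as one of its own neighbors (distance 0).
    """
    shifted = [tuple(p) for p in points]
    for _ in range(iterations):
        updated = []
        for p in shifted:
            # k closest points in the current (already shifted) data set
            neighbors = sorted(shifted, key=lambda q: math.dist(p, q))[:k]
            n = len(neighbors)
            updated.append(tuple(sum(coord) / n for coord in zip(*neighbors)))
        shifted = updated
    return shifted

def outlier_scores(points, k=3, iterations=3):
    """Score each point by the distance between its original and shifted position."""
    shifted = knn_mean_shift(points, k, iterations)
    return [math.dist(p, s) for p, s in zip(points, shifted)]

# Hypothetical toy data: four clustered points and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = outlier_scores(points)
```

In this toy example the distant point is pulled a long way toward the cluster, so it receives the largest score, while the clustered points barely move.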
Outlier detection is also tested in two case studies. First, it is shown that standard outlier detection methods do not help when trying to extract an average road segment from noisy GPS records. This indicates that data-specific methods should be developed.
Second, a method called attention entropy is developed for analyzing heart rate variability. Instead of analyzing the frequency distribution of the data or of extracted patterns, it analyzes the frequency distribution of the intervals between peak points in the signal, which gives significantly more accurate results. The method improves detection accuracy from 0.72, achieved by the best baseline method, to 0.80. It could potentially be applied to early-stage detection of Covid-19 before any visible symptoms appear (if they appear at all).
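The idea of analyzing intervals between peak points can be sketched as follows: find the local maxima of the signal, measure the gaps between consecutive peaks, and compute the Shannon entropy of how often each gap length occurs. This is a simplified illustration under stated assumptions and is not the full attention entropy method from the thesis; the helper names `peak_indices` and `interval_entropy`, the strict local-maximum definition of a peak, and the toy signals are all hypothetical.

```python
import math
from collections import Counter

def peak_indices(signal):
    """Indices of strict local maxima (assumed peak definition for this sketch)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] > signal[i + 1]]

def interval_entropy(signal):
    """Shannon entropy (in bits) of the distribution of gaps between consecutive peaks."""
    peaks = peak_indices(signal)
    intervals = [b - a for a, b in zip(peaks, peaks[1:])]
    if not intervals:
        return 0.0
    counts = Counter(intervals)
    total = len(intervals)
    # sum of p * log2(1/p) over the observed interval lengths
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Hypothetical toy signals: evenly spaced peaks versus unevenly spaced peaks.
regular = [0, 1, 0, 1, 0, 1, 0]
irregular = [0, 1, 0, 1, 0, 0, 1, 0]
regular_entropy = interval_entropy(regular)
irregular_entropy = interval_entropy(irregular)
```

Evenly spaced peaks produce a single interval length and therefore zero entropy, whereas irregular spacing spreads the distribution over several lengths and raises the entropy.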
The doctoral dissertation of MSc Jiawei Yang, entitled Outlier detection techniques, will be examined online at the Faculty of Science and Forestry on the 10th of September. The opponent in the public examination will be Associate Professor Giacomo Boracchi, Politecnico di Milano, Italy, and the custos will be Professor Pasi Fränti, University of Eastern Finland. The public examination will be held in English.