
Data Smoothing and Outlier Detection

Data smoothing refers to techniques for eliminating unwanted noise or behaviors in data, while outlier detection identifies data points that are significantly different from the rest of the data.

Moving Window Methods

Moving window methods are ways to process data in smaller batches at a time, typically in order to statistically represent a neighborhood of points in the data. The moving average is a common data smoothing technique that slides a window along the data, computing the mean of the points inside each window. This can help to eliminate insignificant variations from one data point to the next.
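As a rough illustration of what a centered moving mean computes, the sketch below averages each element with its neighbors by hand and compares the result with movmean. It assumes an odd window length and the default endpoint behavior, in which the window shrinks near the ends so that only existing elements are averaged. The data vector is arbitrary and only for illustration.

```matlab
% Arbitrary example data (hypothetical values)
A = [4 8 6 -1 -2 -3 -1 3 4 5];
k = 3;                      % odd window length
half = (k - 1)/2;           % number of elements on each side of the center

% Manual centered moving mean; the window is truncated at the ends
Mmanual = zeros(size(A));
for i = 1:numel(A)
    iStart = max(1, i - half);
    iEnd = min(numel(A), i + half);
    Mmanual(i) = mean(A(iStart:iEnd));
end

% Built-in moving mean with the same window length
M = movmean(A, k);

% Largest difference between the two results (round-off only)
disp(max(abs(M - Mmanual)))
```

For this input both versions give 6, 6, 4.3333, 1, -2, -2, -0.3333, 2, 4, 4.5. The first and last values are means of only two elements because the window shrinks at the ends.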

For example, consider wind speed measurements taken every minute for about 3 hours. Use the movmean function with a window size of 5 minutes to smooth out high-speed wind gusts. Because the samples are one minute apart, a window of 5 elements spans 5 minutes.

load windData.mat
mins = 1:length(speed);
window = 5;
meanspeed = movmean(speed,window);
plot(mins,speed,mins,meanspeed)
axis tight
legend('Measured Wind Speed','Average Wind Speed over 5 min Window', ...
    'location','best')
xlabel('Time')
ylabel('Speed')

[Figure: Measured Wind Speed and Average Wind Speed over 5 min Window]
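movmean also accepts the window as a two-element vector [kb kf], giving the number of elements before and after the current one. The sketch below uses made-up values standing in for the wind data in windData.mat (not reproduced here). It shows that a scalar window of 5 matches [2 2], and that [4 0] gives a trailing average of the current sample and the four previous ones, which may suit data where future samples are not yet available.

```matlab
% Synthetic stand-in for the wind speed data (hypothetical values)
speed = [12 15 11 30 14 13 16 12 28 15];

centered = movmean(speed, 5);        % 2 samples before, 2 after
explicit = movmean(speed, [2 2]);    % same window written as [kb kf]
trailing = movmean(speed, [4 0]);    % current sample plus up to 4 previous ones

disp([centered; explicit; trailing])
```

The first rows agree element by element. In the trailing row, the first few values average fewer than five samples because the window shrinks at the start of the data.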

Similarly, you can compute the median wind speed over a sliding window using the movmedian function.

medianspeed = movmedian(speed,window);
plot(mins,speed,mins,medianspeed)
axis tight
legend('Measured Wind Speed','Median Wind Speed over 5 min Window', ...
    'location','best')
xlabel('Time')
ylabel('Speed')

[Figure: Measured Wind Speed and Median Wind Speed over 5 min Window]
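The median is less sensitive to an isolated spike than the mean, which is one reason to consider movmedian for gusty data. In this small sketch with made-up values, a single spike raises three consecutive moving means, while the moving medians stay unchanged.

```matlab
% Constant signal with one spike (hypothetical values)
x = [1 1 1 10 1 1 1];

mm = movmean(x, 3);     % [1 1 4 4 4 1 1]: the spike leaks into its neighbors
md = movmedian(x, 3);   % [1 1 1 1 1 1 1]: the spike is ignored

disp([mm; md])
```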

Not all data is suitable for smoothing with a moving window method. For example, create a sinusoidal signal with injected random noise.

t = 1:0.2:15;
A = sin(2*pi*t) + cos(2*pi*0.5*t);
Anoise = A + 0.5*rand(1,length(t));
plot(t,A,t,Anoise)
axis tight
legend('Original Data','Noisy Data','location','best')

[Figure: Original Data and Noisy Data]

Use a moving mean with a window size of 3 to smooth the noisy data.

window = 3;
Amean = movmean(Anoise,window);
plot(t,A,t,Amean)
axis tight
legend('Original Data','Moving Mean - Window Size 3')

[Figure: Original Data and Moving Mean - Window Size 3]

The moving mean follows the general shape of the data but does not approximate the valleys (local minima) very accurately. Because each valley point is averaged with two larger neighbors, the mean overestimates it. If you make the window size larger, the mean eliminates the shorter peaks altogether. For this type of data, you might consider alternative smoothing techniques.

Amean = movmean(Anoise,5);
plot(t,A,t,Amean)
axis tight
legend('Original Data','Moving Mean - Window Size 5', ...
    'location','best')

[Figure: Original Data and Moving Mean - Window Size 5]
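The valley argument can be checked numerically on the noise-free signal A. At a point that is strictly smaller than both of its neighbors, the mean of the three values must be larger than the point itself, so a window-3 moving mean always overestimates such a valley. The sketch below finds these points by direct comparison (islocalmin would be an alternative) and compares them with the moving mean.

```matlab
% Noise-free test signal from the example
t = 1:0.2:15;
A = sin(2*pi*t) + cos(2*pi*0.5*t);
Amean = movmean(A, 3);

% Interior points strictly smaller than both neighbors (local minima)
interior = 2:numel(A)-1;
isValley = A(interior) < A(interior-1) & A(interior) < A(interior+1);
valleyIdx = interior(isValley);

% Row 1: true valley values; row 2: 3-point moving mean at the same points
disp([A(valleyIdx); Amean(valleyIdx)])
```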

Common Smoothing Methods

The smoothdata function provides several smoothing options, such as the Savitzky-Golay method, which is a popular smoothing technique used in signal processing. By default, smoothdata chooses a best-guess window size for the method depending on the data.

Use the Savitzky-Golay method to smooth the noisy signal Anoise, and output the window size that it uses. This method approximates the valleys better than movmean.

[Asgolay,window] = smoothdata(Anoise,'sgolay');
plot(t,A,t,Asgolay)
axis tight
legend('Original Data','Savitzky-Golay','location','best')

[Figure: Original Data and Savitzky-Golay]

window
window = 3
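One way to see why Savitzky-Golay follows curvature better than movmean is to smooth an exact quadratic. A moving mean with window 5 shifts every interior value of k^2 up by a constant 2, the average of the squared offsets -2, -1, 0, 1, 2. A least-squares quadratic fit over the same window reproduces the data up to round-off. The sketch sets the polynomial degree explicitly with the 'Degree' name-value argument; the test data is made up.

```matlab
% A quadratic sampled at integer points (hypothetical test data)
k = 1:20;
x = k.^2;

m = movmean(x, 5);                            % moving mean, window 5
s = smoothdata(x, 'sgolay', 5, 'Degree', 2);  % quadratic Savitzky-Golay fit

% Compare interior points only, away from the shrinking endpoint windows
interior = 3:numel(x)-2;
disp(max(abs(m(interior) - x(interior))))   % bias of the moving mean
disp(max(abs(s(interior) - x(interior))))   % Savitzky-Golay error
```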

The robust Lowess method is another smoothing method that is particularly helpful when outliers are present in the data in addition to noise. Inject an outlier into the noisy data, and use robust Lowess to smooth the data, which eliminates the outlier.

Anoise(36) = 20;
Arlowess = smoothdata(Anoise,'rlowess',5);
plot(t,Anoise,t,Arlowess)
axis tight
legend('Noisy Data','Robust Lowess')

[Figure: Noisy Data and Robust Lowess]
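To see what the robust variant adds, the sketch below compares 'lowess' and 'rlowess' on a noisy linear trend with one injected outlier. The data, the seed, and the outlier position are all made up. The ordinary local regression is pulled toward the outlier, while the robust version downweights it, so its estimate at that point should stay much closer to the underlying trend value of about 15.

```matlab
rng(0)                                   % reproducible noise
n = 30;
x = (1:n) + 0.1*randn(1,n);              % noisy linear trend (hypothetical data)
x(15) = 50;                              % inject one outlier; trend value is about 15

xl = smoothdata(x, 'lowess', 5);         % local linear regression
xr = smoothdata(x, 'rlowess', 5);        % robust version that downweights outliers

fprintf('lowess at outlier:  %.2f\n', xl(15))
fprintf('rlowess at outlier: %.2f\n', xr(15))
```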

Detecting Outliers

Outliers in data can significantly skew data processing results and other computed quantities. For example, if you try to smooth data containing outliers with a moving median, you can get misleading peaks or valleys.

Amedian = smoothdata(Anoise,'movmedian');
plot(t,Anoise,t,Amedian)
axis tight
legend('Noisy Data','Moving Median')

[Figure: Noisy Data and Moving Median]

The isoutlier function returns logical 1 (true) for each element it detects as an outlier. Verify the index and value of the outlier in Anoise.

TF = isoutlier(Anoise);
ind = find(TF)
ind = 36
Aoutlier = Anoise(ind)
Aoutlier = 20
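By default, isoutlier flags elements that are more than three scaled median absolute deviations (MAD) from the median. The sketch below reproduces that default rule by hand on a small made-up vector; other detection methods, such as 'mean' or 'quartiles', use different criteria.

```matlab
x = [2 3 2 4 5 100 3 2];                 % made-up data with one suspicious value

% Scaled MAD: c*median(|x - median(x)|), with c = -1/(sqrt(2)*erfcinv(3/2)), about 1.4826
c = -1/(sqrt(2)*erfcinv(3/2));
med = median(x);
smad = c*median(abs(x - med));

% Default rule: more than three scaled MADs away from the median
TFmanual = abs(x - med) > 3*smad;
TF = isoutlier(x);

disp(find(TF))                           % index of the detected outlier (6)
```

Here the median is 3 and the scaled MAD is about 1.48, so only the value 100 lies beyond the threshold of roughly 4.45.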

You can use the filloutliers function to replace outliers in your data by specifying a fill method. For example, fill the outlier in Anoise with the value of its neighbor immediately to the right.

Afill = filloutliers(Anoise,'next');
plot(t,Anoise,t,Afill)
axis tight
legend('Noisy Data with Outlier','Noisy Data with Filled Outlier')

[Figure: Noisy Data with Outlier and Noisy Data with Filled Outlier]
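The fill method determines what replaces each detected outlier. On a small made-up vector whose only outlier is the sixth element, 'next' and 'previous' copy a neighboring non-outlier value, while 'linear' interpolates between the neighbors on either side.

```matlab
x = [2 3 2 4 5 100 3 2];              % made-up data; x(6) is the outlier

xNext = filloutliers(x, 'next');      % next non-outlier value:     x(6) becomes 3
xPrev = filloutliers(x, 'previous');  % previous non-outlier value: x(6) becomes 5
xLin  = filloutliers(x, 'linear');    % linear interpolation:       x(6) becomes 4

disp([xNext; xPrev; xLin])
```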

Nonuniform Data

Not all data consists of equally spaced points, which can affect methods for data processing. Create a datetime vector that contains irregular sampling times for the data in Airreg. The time vector represents samples taken every minute for the first 30 minutes, then hourly over two days.

t0 = datetime(2014,1,1,1,1,1);
timeminutes = sort(t0 + minutes(1:30));
timehours = t0 + hours(1:48);
time = [timeminutes timehours];
Airreg = rand(1,length(time));
plot(time,Airreg)
axis tight

[Figure: Airreg plotted against time]
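Before smoothing, it can help to confirm how irregular the sampling actually is. This sketch rebuilds the time vector from the example (without the random data) and lists the distinct gaps between consecutive samples. It should show 1-minute steps, a single 30-minute jump between the two segments, and 1-hour steps.

```matlab
t0 = datetime(2014,1,1,1,1,1);
timeminutes = t0 + minutes(1:30);
timehours = t0 + hours(1:48);
time = [timeminutes timehours];

gaps = unique(diff(time));      % distinct spacings between consecutive samples
disp(numel(time))               % number of samples
disp(minutes(gaps))             % distinct spacings, in minutes
```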

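By default, smoothdata smooths with respect to equally spaced integers, in this case 1,2,...,78. Because these integer time stamps do not match the actual sampling times of Airreg, a window of three neighboring samples covers only a couple of minutes during the first half hour but a couple of hours afterward. As a result, the first half hour of data still appears noisy after smoothing.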

Adefault = smoothdata(Airreg,'movmean',3);
plot(time,Airreg,time,Adefault)
axis tight
legend('Original Data','Smoothed Data with Default Sample Points')

[Figure: Original Data and Smoothed Data with Default Sample Points]
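The default behavior can be stated another way: omitting 'SamplePoints' is expected to behave like passing the indices 1,2,...,N explicitly. A window of 3 then always spans three neighboring samples, whatever their actual time separation. A small check with made-up data:

```matlab
x = [1 2 3 10 20];                                           % made-up data

a = smoothdata(x, 'movmean', 3);                             % default sample points
b = smoothdata(x, 'movmean', 3, 'SamplePoints', 1:numel(x)); % explicit 1,2,...,N

disp([a; b])    % both rows: 1.5  2  5  11  15
```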

Many data processing functions in MATLAB®, including smoothdata, movmean, and filloutliers, allow you to provide sample points, ensuring that data is processed relative to its sampling units and frequencies. To remove the high-frequency variation in the first half hour of data in Airreg, use the 'SamplePoints' option with the time stamps in time.

Asamplepoints = smoothdata(Airreg,'movmean', ...
    hours(3),'SamplePoints',time);
plot(time,Airreg,time,Asamplepoints)
axis tight
legend('Original Data','Smoothed Data with Sample Points')

[Figure: Original Data and Smoothed Data with Sample Points]
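With 'SamplePoints', the window length is measured in the units of the sample points rather than in elements: a centered window of length 3 covers the interval from about t(i)-1.5 to t(i)+1.5. In this sketch, made-up numeric sample points stand in for the datetime vector. The large gap between t = 2 and t = 10 keeps the value 10 out of the average at t = 2, unlike the index-based window. No sample point lies exactly on a window edge, so the result does not depend on how the edges are treated.

```matlab
x = [1 2 3 10 20];        % made-up data
t = [0 1 2 10 11];        % irregular sample points (for example, hours)

byIndex = movmean(x, 3);                     % three neighboring samples
byTime  = movmean(x, 3, 'SamplePoints', t);  % samples within 1.5 units of t(i)

disp([byIndex; byTime])   % rows: 1.5 2 5 11 15  and  1.5 2 2.5 15 15
```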


Related Topics