
Clean Outlier Data

Find, fill, or remove outliers in the Live Editor

Description

The Clean Outlier Data task lets you interactively handle outliers in data. The task automatically generates MATLAB® code for your live script.

Using this task, you can:

  • Find, fill, or remove outliers from data in a workspace variable.

  • Customize the method for finding and filling outliers.

  • Automatically visualize the outlier data and cleaned data.
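The three operations listed above can also be sketched at the command line. This is a minimal sketch, assuming the task corresponds to the MATLAB functions isoutlier, filloutliers, and rmoutliers. The code the task actually generates is not shown on this page and may differ. The sample vector is hypothetical.

```matlab
% Hypothetical sample vector with two obvious outliers (100 and 300).
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

% Find: logical mask marking the outliers (default detection method is "median").
isOut = isoutlier(A);

% Fill: replace each outlier by linear interpolation of neighboring nonoutliers.
filled = filloutliers(A, "linear");

% Remove: drop the outlier elements entirely.
removed = rmoutliers(A);
```

Filling keeps the length of the data unchanged, which matters when the values are aligned with a time axis. Removing shortens the vector.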

Open the Task

To add the Clean Outlier Data task to a live script in the MATLAB Editor:

  • On the Live Editor tab, select Task > Clean Outlier Data.

  • In a code block in the script, type a relevant keyword, such as outlier or clean. Select Clean Outlier Data from the suggested command completions.

Parameters

This task operates on data of type single or double contained in a vector or table variable. When providing a table or timetable for the input data, specify All supported variables to clean all variables with type single or double, or choose which single or double variables to clean by selecting Specified variables and then selecting the variables individually.
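Selecting individual table variables has a command-line counterpart. The sketch below assumes that the Specified variables option behaves like the DataVariables name-value argument of filloutliers. The table and its variable names are hypothetical.

```matlab
% Hypothetical table: two numeric variables and one text variable.
Temp = [20; 21; 95; 22; 21];
Pressure = [101; 100; 102; 101; 100];
Site = ["a"; "b"; "c"; "d"; "e"];
T = table(Temp, Pressure, Site);

% Clean only the Temp variable; Pressure and Site pass through unchanged.
Tclean = filloutliers(T, "linear", "DataVariables", "Temp");
```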

Specify the method for filling outliers using one of the following options.

Fill Method | Description
Linear interpolation | Linear interpolation of neighboring nonoutlier values.
Constant value | Specified scalar value, which is 0 by default.
Center value | Center value determined by the find method.
Clip to threshold value | Fills with the lower threshold value for elements smaller than the lower threshold determined by the find method. Fills with the upper threshold value for elements larger than the upper threshold determined by the find method.
Previous value | Previous nonoutlier value.
Next value | Next nonoutlier value.
Nearest value | Nearest nonoutlier value.
Spline interpolation | Piecewise cubic spline interpolation.
Shape-preserving cubic interpolation (PCHIP) | Shape-preserving piecewise cubic spline interpolation.
Modified Akima cubic interpolation | Modified Akima cubic Hermite interpolation.
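The fill options in the table can be compared side by side with filloutliers. The mapping from each task option to a filloutliers argument shown in the comments is an assumption, not something this page states, and the sample vector is hypothetical.

```matlab
% Hypothetical sample vector with outliers at positions 4 and 9.
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

% Assumed mapping from task fill options to filloutliers arguments.
fillLinear  = filloutliers(A, "linear");    % Linear interpolation
fillConst   = filloutliers(A, 0);           % Constant value (scalar 0)
fillCenter  = filloutliers(A, "center");    % Center value (the median for the default find method)
fillClip    = filloutliers(A, "clip");      % Clip to threshold value
fillPrev    = filloutliers(A, "previous");  % Previous value
fillNext    = filloutliers(A, "next");      % Next value
fillNearest = filloutliers(A, "nearest");   % Nearest value
fillSpline  = filloutliers(A, "spline");    % Spline interpolation
fillPchip   = filloutliers(A, "pchip");     % Shape-preserving cubic interpolation (PCHIP)
fillMakima  = filloutliers(A, "makima");    % Modified Akima cubic interpolation
```

The interpolating methods use the surrounding nonoutlier values, while Constant value, Center value, and Clip to threshold value ignore the neighbors entirely.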

Specify the detection method for finding outliers using one of the following options.

Method | Description
Moving median | Outliers are defined as elements more than the specified threshold of local scaled MAD from the local median over a specified window. The default threshold is 3.
Median | Outliers are defined as elements more than the specified threshold of scaled median absolute deviations (MAD) from the median. The default threshold is 3. For input data A, the scaled MAD is defined as c*median(abs(A-median(A))), where c=-1/(sqrt(2)*erfcinv(3/2)).
Mean | Outliers are defined as elements more than the specified threshold of standard deviations from the mean. The default threshold is 3. This method is faster but less robust than Median.
Quartiles | Outliers are defined as elements more than the specified threshold of interquartile ranges above the upper quartile (75 percent) or below the lower quartile (25 percent). The default threshold is 1.5. This method is useful when the input data is not normally distributed.
Grubbs | Outliers are detected using Grubbs's test, which removes one outlier per iteration based on hypothesis testing. This method assumes that the input data is normally distributed.
Generalized extreme Studentized deviate (GESD) | Outliers are detected using the generalized extreme Studentized deviate test for outliers. This iterative method is similar to Grubbs, but can perform better when multiple outliers mask each other.
Moving mean | Outliers are defined as elements more than the specified threshold of local standard deviations from the local mean over a specified window. The default threshold is 3.
Percentiles | Outliers are defined as elements outside of the percentile range specified by an upper and lower threshold. The default lower percentile threshold is 10, and the default upper percentile threshold is 90. Valid threshold values are in the interval [0,100].
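The Median rule in the table can be checked by hand. The sketch below applies the scaled MAD formula directly and compares the result with isoutlier. It assumes that the task's Median option matches the "median" method of isoutlier. The sample vector is hypothetical.

```matlab
% Hypothetical sample vector with outliers at positions 4 and 9.
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

% Scaling constant from the table; evaluates to approximately 1.4826.
c = -1/(sqrt(2)*erfcinv(3/2));

% Scaled MAD of the data, as defined in the Median row.
scaledMAD = c*median(abs(A - median(A)));

% Flag elements more than 3 scaled MAD from the median (default threshold).
threshold = 3;
manualMask = abs(A - median(A)) > threshold*scaledMAD;

% Assumed equivalent built-in call.
builtinMask = isoutlier(A, "median");
masksAgree = isequal(manualMask, builtinMask);

% Other detection rules from the table, for comparison.
meanMask     = isoutlier(A, "mean");                    % large outliers inflate the standard deviation
quartileMask = isoutlier(A, "quartiles");               % 1.5 interquartile ranges by default
pctMask      = isoutlier(A, "percentiles", [10 90]);    % outside the 10th to 90th percentile range
```

The constant c makes the scaled MAD a consistent estimate of the standard deviation for normally distributed data, which is why the Median and Mean methods share the same default threshold of 3.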

Specify the window type and size when the method for detecting outliers is Moving median or Moving mean.

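Window | Description
Centered | Specified window length centered about the current point.
Asymmetric | Specified window containing the number of elements before the current point and the number of elements after the current point.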

Window sizes are relative to the X-axis variable units.
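The two window types, and the role of the X-axis units, can be sketched with the window argument of isoutlier. This assumes that a scalar window corresponds to Centered, a two-element window to Asymmetric, and the SamplePoints argument to the X-axis variable. The data and sample points are hypothetical.

```matlab
% Hypothetical data with one spike at position 5.
A = [1 2 3 4 50 6 7 8 9 10];

% Centered: window of 5 elements centered about the current point.
centeredMask = isoutlier(A, "movmedian", 5);

% Asymmetric: 3 elements before and 1 element after the current point.
asymMask = isoutlier(A, "movmedian", [3 1]);

% With sample points, the window length is measured in units of t.
% Here the samples are 0.5 apart, so a window of 2.5 spans about 5 samples.
t = (0:9)*0.5;
unitsMask = isoutlier(A, "movmedian", 2.5, "SamplePoints", t);
```

An asymmetric window that looks only backward, such as [3 0], can be useful when later samples are not yet available.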

Introduced in R2019b