Curve Fitting and Distribution Fitting

This example shows how to perform curve fitting and distribution fitting, and discusses when each method is appropriate.

Choose Between Curve Fitting and Distribution Fitting

Curve fitting and distribution fitting are different types of data analysis.

  • Use curve fitting when you want to model a response variable as a function of a predictor variable.

  • Use distribution fitting when you want to model the probability distribution of a single variable.

Curve Fitting

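In the following experimental data, the predictor variable is time, the time after the ingestion of a drug. The response variable is conc, the concentration of the drug in the bloodstream. Assume that only the response data conc is affected by experimental error.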

time = [ 0.1  0.1  0.3  0.3  1.3  1.7  2.1  2.6  3.9  3.9 ...
         5.1  5.6  6.2  6.4  7.7  8.1  8.2  8.9  9.0  9.5 ...
         9.6 10.2 10.3 10.8 11.2 11.2 11.2 11.7 12.1 12.3 ...
        12.3 13.1 13.2 13.4 13.7 14.0 14.3 15.4 16.1 16.1 ...
        16.4 16.4 16.7 16.7 17.5 17.6 18.1 18.5 19.3 19.7]';
conc = [0.01 0.08 0.13 0.16 0.55 0.90 1.11 1.62 1.79 1.59 ...
        1.83 1.68 2.09 2.17 2.66 2.08 2.26 1.65 1.70 2.39 ...
        2.08 2.02 1.65 1.96 1.91 1.30 1.62 1.57 1.32 1.56 ...
        1.36 1.05 1.29 1.32 1.20 1.10 0.88 0.63 0.69 0.69 ...
        0.49 0.53 0.42 0.48 0.41 0.27 0.36 0.33 0.17 0.20]';

Suppose you want to model blood concentration as a function of time. Plot conc against time.

plot(time,conc,'o');
xlabel('Time');
ylabel('Blood Concentration');

Figure contains an axes object. The axes object contains an object of type line.

Assume that conc follows a two-parameter Weibull curve as a function of time. A Weibull curve has the form and parameters

$$y = c \, (x/a)^{(b-1)} \, e^{-(x/a)^{b}},$$

where a is a horizontal scaling, b is a shape parameter, and c is a vertical scaling.

Fit the Weibull model using nonlinear least squares.

modelFun = @(p,x) p(3) .* (x./p(1)).^(p(2)-1) .* exp(-(x./p(1)).^p(2));
startingVals = [10 2 5];
nlModel = fitnlm(time,conc,modelFun,startingVals);

Plot the Weibull curve onto the data.

xgrid = linspace(0,20,100)';
line(xgrid,predict(nlModel,xgrid),'Color','r');

Figure contains an axes object. The axes object contains 2 objects of type line.

The fitted Weibull model is problematic. fitnlm assumes the experimental errors are additive and come from a symmetric distribution with constant variance. However, the scatter plot shows that the error variance is proportional to the height of the curve. Furthermore, the additive, symmetric errors imply that a negative blood concentration measurement is possible.

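A more realistic assumption is that multiplicative errors are symmetric on the log scale. Under that assumption, fit a Weibull curve to the data by taking the log of both sides. Use nonlinear least squares to fit the curve: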

$$\log(y) = \log(c) + (b-1)\log(x/a) - (x/a)^{b}.$$
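One way to see where the log-scale equation comes from is to write the multiplicative error explicitly. The error term $\varepsilon$ below is notation assumed here for illustration; it does not appear in the original example.

```latex
y = c\,(x/a)^{b-1}\,e^{-(x/a)^{b}}\,e^{\varepsilon}
\quad\Longrightarrow\quad
\log(y) = \log(c) + (b-1)\log(x/a) - (x/a)^{b} + \varepsilon
```

If $\varepsilon$ is symmetric with constant variance, the error on the log scale is additive, which matches the least-squares assumptions, and predictions transformed back with exp are always positive.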

nlModel2 = fitnlm(time,log(conc),@(p,x) log(modelFun(p,x)),startingVals);

Add the new curve to the existing plot.

line(xgrid,exp(predict(nlModel2,xgrid)),'Color',[0 .5 0],'LineStyle','--');
legend({'Raw Data','Additive Errors Model','Multiplicative Errors Model'});

Figure contains an axes object. The axes object contains 3 objects of type line. These objects represent Raw Data, Additive Errors Model, Multiplicative Errors Model.

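The model object nlModel2 contains estimates of precision. A best practice is to check the model's goodness of fit. For example, make residual plots on the log scale to check the assumption of constant variance for the multiplicative errors.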
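The residual check described above might look like the following sketch. It uses synthetic stand-in data generated inside the block (an assumption, so that the block runs on its own) rather than the time and conc vectors, and the names t, y, logFun, and mdl are illustrative.

```matlab
% Synthetic stand-in data with multiplicative errors (illustrative only).
rng(0);                                  % fixed seed for reproducibility
t = linspace(0.5,20,50)';
modelFun = @(p,x) p(3) .* (x./p(1)).^(p(2)-1) .* exp(-(x./p(1)).^p(2));
y = modelFun([10 2 5],t) .* exp(0.1*randn(size(t)));

% Fit the log of the Weibull curve to the log of the response.
logFun = @(p,x) log(modelFun(p,x));
mdl = fitnlm(t,log(y),logFun,[10 2 5]);

% Plot raw log-scale residuals against fitted values.
res = mdl.Residuals.Raw;
plot(mdl.Fitted,res,'o');
line(xlim,[0 0],'LineStyle','--');       % zero reference line
xlabel('Fitted log response');
ylabel('Residual (log scale)');
```

A roughly even band of points around zero is consistent with constant variance on the log scale; a funnel shape would suggest otherwise.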

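In this example, using the multiplicative errors model has little effect on the model predictions. For an example where the type of model has more of an impact, see Pitfalls in Fitting Nonlinear Models by Transforming to Linearity.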

Functions for Curve Fitting

  • Statistics and Machine Learning Toolbox™ includes these functions for fitting models: fitnlm for nonlinear least-squares models, fitglm for generalized linear models, fitrgp for Gaussian process regression models, and fitrsvm for support vector machine regression models.

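  • Curve Fitting Toolbox™ provides command-line and graphical tools that simplify tasks in curve fitting. For example, the toolbox provides automatic choice of starting coefficient values for various models, as well as robust and nonparametric fitting methods.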

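  • Optimization Toolbox™ has functions for performing complicated types of curve fitting analyses, such as analyzing models with constraints on the coefficients.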

  • The MATLAB® function polyfit fits polynomial models, and the MATLAB function fminsearch is useful in other kinds of curve fitting.
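As a minimal sketch of how fminsearch can serve for curve fitting, the block below minimizes a sum of squared errors for the Weibull curve using only base MATLAB. The data are synthetic and the names curve, sse, q0, and pHat are illustrative assumptions; optimizing over the logs of the coefficients is a design choice made here to keep them positive, not something the original example prescribes.

```matlab
% Synthetic data from a Weibull-shaped curve with additive noise (illustrative only).
rng(1);
x = linspace(0.5,20,40)';
curve = @(p,x) p(3) .* (x./p(1)).^(p(2)-1) .* exp(-(x./p(1)).^p(2));
y = curve([10 2 5],x) + 0.05*randn(size(x));

% Optimize over q = log(p) so that every coefficient stays positive.
sse = @(q) sum((y - curve(exp(q),x)).^2);
q0 = log([8 1.5 4]);                     % starting values on the log scale
qHat = fminsearch(sse,q0);
pHat = exp(qHat);                        % back-transform to [a b c]
```

Unlike fitnlm, this approach returns only point estimates; it does not produce standard errors or other diagnostics.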

Distribution Fitting

Suppose you want to model the distribution of electrical component lifetimes. The variable life measures the time to failure for 50 identical electrical components.

life = [ 6.2 16.1 16.3 19.0 12.2  8.1  8.8  5.9  7.3  8.2 ...
        16.1 12.8  9.8 11.3  5.1 10.8  6.7  1.2  8.3  2.3 ...
         4.3  2.9 14.8  4.6  3.1 13.6 14.5  5.2  5.7  6.5 ...
         5.3  6.4  3.5 11.4  9.3 12.4 18.3 15.9  4.0 10.4 ...
         8.7  3.0 12.1  3.9  6.5  3.4  8.5  0.9  9.9  7.9]';

Visualize the data with a histogram.

binWidth = 2;
lastVal = ceil(max(life));
binEdges = 0:binWidth:lastVal+1;
h = histogram(life,binEdges);
xlabel('Time to Failure');
ylabel('Frequency');
ylim([0 10]);

Figure contains an axes object. The axes object contains an object of type histogram.

Because lifetime data often follows a Weibull distribution, one approach might be to use the Weibull curve from the previous curve fitting example to fit the histogram. To try this approach, convert the histogram to a set of points (x,y), where x is a bin center and y is a bin height, and then fit a curve to those points.

counts = histcounts(life,binEdges);
binCtrs = binEdges(1:end-1) + binWidth/2;
h.FaceColor = [.9 .9 .9];
hold on
plot(binCtrs,counts,'o');
hold off

Figure contains an axes object. The axes object contains 2 objects of type histogram, line.

However, fitting a curve to a histogram is problematic and usually not recommended.

  1. The process violates basic assumptions of least-squares fitting. The bin counts are nonnegative, implying that measurement errors cannot be symmetric. Also, the bin counts have different variability in the tails than in the center of the distribution. Finally, the bin counts have a fixed sum, implying that they are not independent measurements.

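  2. If you fit a Weibull curve to the bar heights, you have to constrain the curve because the histogram is a scaled version of an empirical probability density function (pdf).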

  3. For continuous data, fitting a curve to a histogram rather than data discards information.

  4. The bar heights in the histogram are dependent on the choice of bin edges and bin widths.
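Two of the points in the list above, the fixed sum of the bin counts and their dependence on the bin edges, can be seen in a small sketch. The sample is synthetic (drawn by inverting the Weibull cdf with an assumed scale of 10 and shape of 2), and the variable names are illustrative.

```matlab
% Synthetic Weibull-like sample via the inverse cdf (illustrative only).
rng(2);
sample = 10 .* (-log(rand(50,1))).^(1/2);
top = ceil(max(sample));

% Same data, three different binning choices.
countsA = histcounts(sample,0:2:top+2);     % width 2, edges at even numbers
countsB = histcounts(sample,-1:2:top+2);    % width 2, edges shifted by 1
countsC = histcounts(sample,0:4:top+4);     % width 4

disp(countsA);
disp(countsB);
disp(countsC);
```

Each set of counts sums to the sample size, so the counts are not independent, and the bar heights change with the binning even though the data do not.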

For many parametric distributions, maximum likelihood is a better way to estimate parameters because it avoids these problems. The Weibull pdf has almost the same form as the Weibull curve:

$$y = (b/a) \, (x/a)^{(b-1)} \, e^{-(x/a)^{b}}.$$

However, b/a replaces the scale parameter c because the function must integrate to 1. To fit a Weibull distribution to the data using maximum likelihood, use fitdist and specify 'Weibull' as the distribution name. Unlike least squares, maximum likelihood finds a Weibull pdf that best matches the scaled histogram without minimizing the sum of the squared differences between the pdf and the bar heights.

pd = fitdist(life,'Weibull');
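The normalization constant b/a in the Weibull pdf can be verified with a short calculation. The substitution variable $u$ is introduced here for illustration and is not part of the original example.

```latex
\int_0^\infty c\,(x/a)^{b-1} e^{-(x/a)^{b}}\,dx
  = \frac{c\,a}{b}\int_0^\infty e^{-u}\,du
  = \frac{c\,a}{b},
\qquad u = (x/a)^{b},\quad du = \frac{b}{a}\,(x/a)^{b-1}\,dx .
```

Setting the integral equal to 1 gives $c = b/a$, so the pdf has only the two free parameters $a$ and $b$.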

Plot a scaled histogram of the data and superimpose the fitted pdf.

h = histogram(life,binEdges,'Normalization','pdf','FaceColor',[.9 .9 .9]);
xlabel('Time to Failure');
ylabel('Probability Density');
ylim([0 0.1]);
xgrid = linspace(0,20,100)';
pdfEst = pdf(pd,xgrid);
line(xgrid,pdfEst)

Figure contains an axes object. The axes object contains 2 objects of type histogram, line.

A best practice is to check the model's goodness of fit.
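One possible goodness-of-fit check, sketched here under the assumption that comparing the empirical cdf with the fitted cdf is an acceptable diagnostic, is shown below. The sample is synthetic (generated with wblrnd), and maxGap is an illustrative, informal discrepancy measure rather than a formal test statistic.

```matlab
% Synthetic Weibull sample (illustrative only), then a maximum likelihood fit.
rng(3);
sample = wblrnd(10,2,50,1);
pd = fitdist(sample,'Weibull');

% Compare the empirical cdf with the fitted cdf at the observed values.
[f,x] = ecdf(sample);
fittedCdf = cdf(pd,x);
stairs(x,f);
hold on
plot(x,fittedCdf,'r');
hold off
xlabel('Time to Failure');
ylabel('Cumulative Probability');
legend({'Empirical cdf','Fitted Weibull cdf'},'Location','southeast');

% Largest vertical gap between the two curves at the observed values.
maxGap = max(abs(f - fittedCdf));
```

Large or systematic gaps between the two curves would suggest that the Weibull family does not describe the data well.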

Although fitting a curve to a histogram is usually not recommended, the process is appropriate in some cases. For an example, see Fit Custom Distributions.

Functions for Distribution Fitting

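  • Statistics and Machine Learning Toolbox™ includes the function fitdist for fitting probability distribution objects to data. It also includes dedicated fitting functions (such as wblfit) for fitting parametric distributions using maximum likelihood, the function mle for fitting custom distributions without dedicated fitting functions, and the function ksdensity for fitting nonparametric distribution models to data.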

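  • Statistics and Machine Learning Toolbox additionally provides the Distribution Fitter app, which simplifies many tasks in distribution fitting, such as generating visualizations and diagnostic plots.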

  • Functions in Optimization Toolbox™ enable you to fit complicated distributions, including those with constraints on the parameters.

  • The MATLAB® function fminsearch enables maximum likelihood distribution fitting.
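A minimal sketch of maximum likelihood fitting with fminsearch, using only base MATLAB, might look like the following. The sample is synthetic, the helper wblLogPdf and the other names are illustrative assumptions, and searching over the logs of the parameters is a choice made here to keep both parameters positive.

```matlab
% Synthetic Weibull sample via the inverse cdf, scale 10 and shape 2 (illustrative only).
rng(4);
sample = 10 .* (-log(rand(200,1))).^(1/2);

% Log of the Weibull pdf y = (b/a) (x/a)^(b-1) exp(-(x/a)^b).
wblLogPdf = @(x,a,b) log(b./a) + (b-1).*log(x./a) - (x./a).^b;

% Negative log-likelihood as a function of q = [log(a) log(b)].
negLogLik = @(q) -sum(wblLogPdf(sample,exp(q(1)),exp(q(2))));

q0 = [log(mean(sample)) 0];              % start at scale = sample mean, shape = 1
qHat = fminsearch(negLogLik,q0);
ab = exp(qHat);                          % estimated [scale shape]
```

The objective uses every observation directly, so no binning is involved. The estimates in ab should be close to the ones fitdist would give for the same sample.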
