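Curve Fitting and Distribution Fitting
This example shows how to perform curve fitting and distribution fitting, and discusses when each method is appropriate.
Choose Between Curve Fitting and Distribution Fitting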
Curve fitting and distribution fitting are different types of data analysis.
Use curve fitting when you want to model a response variable as a function of a predictor variable.
Use distribution fitting when you want to model the probability distribution of a single variable.
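Curve Fitting
In the following experimental data, the predictor variable is time, the time after the ingestion of a drug. The response variable is conc, the concentration of the drug in the bloodstream. Assume that only the response data conc is affected by experimental error.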
% Time after ingestion (50 observations)
time = [ 0.1   0.1   0.3   0.3   1.3   1.7   2.1   2.6   3.9   3.9 ...
         5.1   5.6   6.2   6.4   7.7   8.1   8.2   8.9   9.0   9.5 ...
         9.6  10.2  10.3  10.8  11.2  11.2  11.2  11.7  12.1  12.3 ...
        12.3  13.1  13.2  13.4  13.7  14.0  14.3  15.4  16.1  16.1 ...
        16.4  16.4  16.7  16.7  17.5  17.6  18.1  18.5  19.3  19.7]';
% Blood concentration at each time (50 observations)
conc = [0.01  0.08  0.13  0.16  0.55  0.90  1.11  1.62  1.79  1.59 ...
        1.83  1.68  2.09  2.17  2.66  2.08  2.26  1.65  1.70  2.39 ...
        2.08  2.02  1.65  1.96  1.91  1.30  1.62  1.57  1.32  1.56 ...
        1.36  1.05  1.29  1.32  1.20  1.10  0.88  0.63  0.69  0.69 ...
        0.49  0.53  0.42  0.48  0.41  0.27  0.36  0.33  0.17  0.20]';
Suppose you want to model blood concentration as a function of time. Plot conc against time.
plot(time,conc,'o');
xlabel('Time');
ylabel('Blood Concentration');
Assume that conc follows a two-parameter Weibull curve as a function of time. A Weibull curve has the form and parameters

$$y = c \left(\frac{x}{a}\right)^{b-1} e^{-(x/a)^{b}},$$

where a is a horizontal scaling, b is a shape parameter, and c is a vertical scaling.
Fit the Weibull model using nonlinear least squares.
% p(1) is the horizontal scaling, p(2) the shape, p(3) the vertical scaling
modelFun = @(p,x) p(3) .* (x./p(1)).^(p(2)-1) .* exp(-(x./p(1)).^p(2));
startingVals = [10 2 5];
nlModel = fitnlm(time,conc,modelFun,startingVals);
Plot the Weibull curve onto the data.
xgrid = linspace(0,20,100)';
line(xgrid,predict(nlModel,xgrid),'Color','r');
The fitted Weibull model is problematic. fitnlm assumes the experimental errors are additive and come from a symmetric distribution with constant variance. However, the scatter plot shows that the error variance is proportional to the height of the curve. Furthermore, the additive, symmetric errors imply that a negative blood concentration measurement is possible.
A more realistic assumption is that multiplicative errors are symmetric on the log scale. Under that assumption, fit a Weibull curve to the data by taking the log of both sides. Use nonlinear least squares to fit the curve:
nlModel2 = fitnlm(time,log(conc),@(p,x) log(modelFun(p,x)),startingVals);
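For reference, the log-scale model that this call fits can be written out by taking the logarithm of the Weibull curve in modelFun. The symbols a, b, and c are assumed here to stand for p(1), p(2), and p(3), respectively.

```latex
\log y = \log c + (b-1)\,\log\!\left(\frac{x}{a}\right) - \left(\frac{x}{a}\right)^{b}
```

Additive, symmetric errors on this log scale correspond to multiplicative errors on the original concentration scale, which is the assumption stated above.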
Add the new curve to the existing plot.
line(xgrid,exp(predict(nlModel2,xgrid)),'Color',[0 .5 0],'LineStyle','--');
legend({'Raw Data','Additive Errors Model','Multiplicative Errors Model'});
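The model object nlModel2 contains estimates of precision. A best practice is to check the model's goodness of fit. For example, make residual plots on the log scale to check the assumption of constant variance for the multiplicative errors.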
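One possible way to carry out this check is sketched below. It assumes that nlModel2, the log-scale fit from the preceding step, is still in the workspace. The variable names coefIntervals, fittedLog, and resLog are introduced here for illustration only.

```matlab
% Assumes nlModel2 (the log-scale fit from above) exists in the workspace.
coefIntervals = coefCI(nlModel2);   % 95% confidence intervals, one row per coefficient

fittedLog = nlModel2.Fitted;        % fitted values of log(conc)
resLog = nlModel2.Residuals.Raw;    % raw residuals on the log scale

figure
plot(fittedLog,resLog,'o');
hold on
plot([min(fittedLog) max(fittedLog)],[0 0],'k--');   % zero reference line
hold off
xlabel('Fitted log(conc)');
ylabel('Residual (log scale)');
```

If the constant-variance assumption is reasonable, the points should scatter evenly around zero without a funnel shape.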
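In this example, using the multiplicative errors model has little effect on the model predictions. For an example where the type of model has more of an impact, see Pitfalls in Fitting Nonlinear Models by Transforming to Linearity.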
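Functions for Curve Fitting
Statistics and Machine Learning Toolbox™ includes these functions for fitting models: fitnlm for nonlinear least-squares models, fitglm for generalized linear models, fitrgp for Gaussian process regression models, and fitrsvm for support vector machine regression models.
Curve Fitting Toolbox™ provides command-line and graphical tools that simplify tasks in curve fitting. For example, the toolbox provides automatic choice of starting coefficient values for various models, as well as robust and nonparametric fitting methods.
Optimization Toolbox™ has functions for performing complicated types of curve fitting analyses, such as analyzing models with constraints on the coefficients.
The MATLAB® function polyfit fits polynomial models, and the MATLAB function fminsearch is useful in other kinds of curve fitting.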
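As a minimal sketch of the polyfit usage mentioned above, the following fits a straight line to a small made-up data set. The data and variable names are hypothetical and are not part of the drug concentration example.

```matlab
% Hypothetical data: points lying exactly on the line y = 2x + 1
x = (0:5)';
y = 2*x + 1;

p = polyfit(x,y,1);     % degree-1 fit; coefficients in descending powers: [slope intercept]
yFit = polyval(p,x);    % evaluate the fitted polynomial at the data points
```

polyfit performs an ordinary least-squares fit, so the same caveats about additive, constant-variance errors discussed above apply.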
Distribution Fitting
Suppose you want to model the distribution of electrical component lifetimes. The variable life measures the time to failure for 50 identical electrical components.
life = [ 6.2  16.1  16.3  19.0  12.2   8.1   8.8   5.9   7.3   8.2 ...
        16.1  12.8   9.8  11.3   5.1  10.8   6.7   1.2   8.3   2.3 ...
         4.3   2.9  14.8   4.6   3.1  13.6  14.5   5.2   5.7   6.5 ...
         5.3   6.4   3.5  11.4   9.3  12.4  18.3  15.9   4.0  10.4 ...
         8.7   3.0  12.1   3.9   6.5   3.4   8.5   0.9   9.9   7.9]';
Visualize the data with a histogram.
binWidth = 2;
lastVal = ceil(max(life));
binEdges = 0:binWidth:lastVal+1;
h = histogram(life,binEdges);
xlabel('Time to Failure');
ylabel('Frequency');
ylim([0 10]);
Because lifetime data often follows a Weibull distribution, one approach might be to use the Weibull curve from the previous curve fitting example to fit the histogram. To try this approach, convert the histogram to a set of points (x,y), where x is a bin center and y is a bin height, and then fit a curve to those points.
counts = histcounts(life,binEdges);
binCtrs = binEdges(1:end-1) + binWidth/2;   % bin centers
h.FaceColor = [.9 .9 .9];
hold on
plot(binCtrs,counts,'o');
hold off
Fitting a curve to a histogram, however, is problematic and usually not recommended.
The process violates basic assumptions of least-squares fitting. The bin counts are nonnegative, implying that measurement errors cannot be symmetric. Also, the bin counts have different variability in the tails than in the center of the distribution. Finally, the bin counts have a fixed sum, implying that they are not independent measurements.
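If you fit a Weibull curve to the bar heights, you have to constrain the curve because the histogram is a scaled version of an empirical probability density function (pdf).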
For continuous data, fitting a curve to a histogram rather than data discards information.
The bar heights in the histogram are dependent on the choice of bin edges and bin widths.
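The dependence on bin edges can be seen with a short sketch. It assumes the variable life from above is in the workspace; the shifted edges edgesB are an arbitrary illustrative choice.

```matlab
% Assumes life (the lifetime data above) exists in the workspace.
edgesA = 0:2:20;     % bin edges equal to binEdges in the histogram above
edgesB = -1:2:21;    % same bin width, shifted by one unit (illustrative choice)

countsA = histcounts(life,edgesA);
countsB = histcounts(life,edgesB);

% Both sets of counts sum to numel(life), but the individual bar heights,
% and therefore the (x,y) points a curve would be fit to, differ.
disp([sum(countsA) sum(countsB)])
```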
For many parametric distributions, maximum likelihood is a better way to estimate parameters because it avoids these problems. The Weibull pdf has almost the same form as the Weibull curve:

$$y = \frac{b}{a} \left(\frac{x}{a}\right)^{b-1} e^{-(x/a)^{b}}$$

However, b/a replaces the vertical scale parameter c (p(3) in modelFun) because the function must integrate to 1. To fit a Weibull distribution to the data using maximum likelihood, use fitdist and specify 'Weibull' as the distribution name. Unlike least squares, maximum likelihood finds a Weibull pdf that best matches the scaled histogram without minimizing the sum of the squared differences between the pdf and the bar heights.
pd = fitdist(life,'Weibull');
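The requirement that the pdf integrate to 1 can be checked directly. The sketch below assumes a is the scale and b is the shape of the Weibull pdf, both positive, and uses the substitution u = (x/a)^b.

```latex
\int_0^\infty \frac{b}{a}\left(\frac{x}{a}\right)^{b-1} e^{-(x/a)^{b}}\,dx
  = \int_0^\infty e^{-u}\,du = 1,
\qquad u = \left(\frac{x}{a}\right)^{b},\quad
du = \frac{b}{a}\left(\frac{x}{a}\right)^{b-1} dx
```

By the same substitution, a curve with a free vertical scaling c integrates to ca/b, so it is a valid pdf only when c = b/a.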
Plot a scaled histogram of the data and superimpose the fitted pdf.
h = histogram(life,binEdges,'Normalization','pdf','FaceColor',[.9 .9 .9]);
xlabel('Time to Failure');
ylabel('Probability Density');
ylim([0 0.1]);
xgrid = linspace(0,20,100)';
pdfEst = pdf(pd,xgrid);
line(xgrid,pdfEst)
A best practice is to check the model's goodness of fit.
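One way such a check might look is sketched below. It assumes life and the fitted distribution object pd are still in the workspace; the variable name paramIntervals is introduced here for illustration.

```matlab
% Assumes life and pd (the fitted Weibull distribution object) exist in the workspace.
paramIntervals = paramci(pd);   % 95% confidence intervals; one column per parameter

figure
probplot('weibull',life);       % Weibull probability plot of the lifetimes
```

Points that fall close to the reference line in the probability plot suggest that a Weibull distribution is a reasonable model for the data.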
Although fitting a curve to a histogram is usually not recommended, the process is appropriate in some cases. For an example, see Fit Custom Distributions.
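Functions for Distribution Fitting
Statistics and Machine Learning Toolbox™ includes the function fitdist for fitting probability distribution objects to data. It also includes dedicated fitting functions (such as wblfit) for fitting parametric distributions using maximum likelihood, the function mle for fitting custom distributions without dedicated fitting functions, and the function ksdensity for fitting nonparametric distribution models to data.
Statistics and Machine Learning Toolbox additionally provides the Distribution Fitter app, which simplifies many tasks in distribution fitting, such as generating visualizations and diagnostic plots.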
Functions in Optimization Toolbox™ enable you to fit complicated distributions, including those with constraints on the parameters.
The MATLAB® function fminsearch enables maximum likelihood distribution fitting.
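As a rough sketch of how fminsearch could be used this way, the following minimizes the negative Weibull log-likelihood of the lifetime data. It assumes life is in the workspace. The log-parameterization, the starting guess, and the variable names are illustrative choices, not taken from the original example.

```matlab
% Assumes life (the lifetime data above) exists in the workspace.
% Negative log-likelihood of a Weibull distribution. Searching over the logs
% of the parameters keeps the scale and shape positive during the search.
negLogLik = @(q) -sum(log(wblpdf(life,exp(q(1)),exp(q(2)))));

qHat = fminsearch(negLogLik,log([10 2]));   % starting guess: scale 10, shape 2
paramsFmin = exp(qHat);                     % estimated [scale shape]

paramsWblfit = wblfit(life);                % dedicated maximum likelihood fit, for comparison
```

The two estimates should agree to within the tolerance of the search. For a standard distribution such as the Weibull, fitdist or wblfit is the simpler route; the fminsearch approach is more relevant when no dedicated fitting function is available.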
See Also
fitnlm | fitglm | fitrgp | fitrsvm | polyfit | fminsearch | fitdist | mle | ksdensity | Distribution Fitter