Polynomial Curve Fitting

This example shows how to fit polynomials up to sixth degree to some census data using Curve Fitting Toolbox™. It also shows how to fit a single-term exponential equation and compare this to the polynomial models.

The steps show how to:

  • Load data and create fits using different library models.

  • Search for the best fit by comparing graphical fit results, and by comparing numerical fit results including the fitted coefficients and goodness of fit statistics.

Load and Plot the Data

The data for this example is the file census.mat.

load census

The workspace contains two new variables:

  • cdate is a column vector containing the years 1790 to 1990 in 10-year increments.

  • pop is a column vector with the U.S. population figures that correspond to the years in cdate.

whos cdate pop
  Name        Size            Bytes  Class     Attributes

  cdate      21x1               168  double
  pop        21x1               168  double

plot(cdate,pop,'o')

Figure contains an axes object. The axes object contains an object of type line.

Create and Plot a Quadratic

Use the fit function to fit a polynomial to data. You specify a quadratic, or second-degree polynomial, using 'poly2'. The first output from fit is the polynomial, and the second output, gof, contains the goodness-of-fit statistics you will examine in a later step.

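[population2,gof] = fit(cdate,pop,'poly2');

To plot the fit, use the plot function. Add a legend in the top left corner.

plot(population2,cdate,pop);
legend('Location','NorthWest');

Figure contains an axes object. The axes object contains 2 objects of type line. These objects represent data, fitted curve.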
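To see what the fitted object holds, you can pull out the quadratic's coefficients and evaluate them by hand. The following is a minimal sketch, assuming census.mat is on the path. It uses coeffvalues (Curve Fitting Toolbox) and polyval (base MATLAB), neither of which is part of this example, and the names p, yhat, and maxDiff are illustrative.

```matlab
load census                                   % provides cdate and pop
[population2,gof] = fit(cdate,pop,'poly2');   % same call as in the example

p = coeffvalues(population2);   % row vector [p1 p2 p3], highest power first
yhat = polyval(p,cdate);        % evaluate p1*x^2 + p2*x + p3 by hand

% The hand evaluation should agree with calling the fit object directly.
maxDiff = max(abs(yhat - population2(cdate)))
```

This direct comparison works because 'poly2' was fitted without normalization; a fit created with 'Normalize','on' expects the centered and scaled predictor instead.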

Create and Plot a Selection of Polynomials

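To fit polynomials of different degrees, change the fit type. For example, for a cubic or third-degree polynomial use 'poly3'. The scale of the input, cdate, is quite large, so you can obtain better results by centering and scaling the data. To do this, use the 'Normalize' option.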
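Centering and scaling can be reproduced by hand to see what the fit works with. The sketch below assumes the normalization is z = (x - mean(x))/std(x), which is consistent with the "mean 1890 and std 62.05" reported in the population5 display later in this example. The names mu, sigma, and z are illustrative, and cdate is rebuilt from its description rather than loaded.

```matlab
cdate = (1790:10:1990).';      % census years, as described in this example

mu = mean(cdate);              % centering value
sigma = std(cdate);            % scaling value (sample standard deviation)
z = (cdate - mu)./sigma;       % normalized predictor, mean 0 and std 1

fprintf('mean = %g, std = %.2f\n',mu,sigma)   % prints mean = 1890, std = 62.05
```

With the predictor near 1900, powers such as x^5 span many orders of magnitude, whereas the normalized z stays within roughly -1.6 to 1.6.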

population3 = fit(cdate,pop,'poly3','Normalize','on');
population4 = fit(cdate,pop,'poly4','Normalize','on');
population5 = fit(cdate,pop,'poly5','Normalize','on');
population6 = fit(cdate,pop,'poly6','Normalize','on');

A simple model for population growth tells us that an exponential equation should fit this census data well. To fit a single-term exponential model, use 'exp1' as the fit type.

populationExp = fit(cdate,pop,'exp1');

Plot all the fits at once, and add a meaningful legend in the top left corner of the plot.

hold on
plot(population3,'b');
plot(population4,'g');
plot(population5,'m');
plot(population6,'b--');
plot(populationExp,'r-');
hold off
legend('cdate v pop','poly2','poly3','poly4','poly5','poly6','exp1',...
    'Location','NorthWest');

Figure contains an axes object. The axes object contains 7 objects of type line. These objects represent cdate v pop, poly2, poly3, poly4, poly5, poly6, exp1.

Plot the Residuals to Evaluate the Fit

To plot residuals, specify 'residuals' as the plot type in the plot function.

plot(population2,cdate,pop,'residuals');

Figure contains an axes object. The axes object contains 2 objects of type line. These objects represent data, zero line.

The fits and residuals for the polynomial equations are all similar, making it difficult to choose the best one.

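If the residuals display a systematic pattern, it is a clear sign that the model fits the data poorly.

plot(populationExp,cdate,pop,'residuals');

Figure contains an axes object. The axes object contains 2 objects of type line. These objects represent data, zero line.

The fit and residuals for the single-term exponential equation indicate it is a poor fit overall. Therefore, it is a poor choice and you can remove the exponential fit from the candidates for best fit.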
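The residuals shown in these plots can also be computed directly, which makes the comparison numeric. This is a sketch under the assumption that a residual is the observed value minus the fitted value. The names res2, resExp, sse2, sseExp, gof2, and gofExp are illustrative and not part of the example.

```matlab
load census
[population2,gof2] = fit(cdate,pop,'poly2');
[populationExp,gofExp] = fit(cdate,pop,'exp1');

res2 = pop - population2(cdate);       % residual = data minus fitted value
resExp = pop - populationExp(cdate);

% Sum of squared residuals; each should match the sse field returned by fit.
sse2 = sum(res2.^2)
sseExp = sum(resExp.^2)
```

A much larger sum of squared residuals for the exponential fit backs up what the residual plot shows.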

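Examine Fits Beyond the Data Range

Examine the behavior of the fits up to the year 2050. The goal of fitting the census data is to extrapolate the best fit to predict future population values.

By default, the fit is plotted over the range of the data. To plot a fit over a different range, set the x-limits of the axes before plotting the fit. For example, to see values extrapolated from the fit, set the upper x-limit to 2050.

plot(cdate,pop,'o');
xlim([1900, 2050]);
hold on
plot(population6);
hold off

Figure contains an axes object. The axes object contains 2 objects of type line. This object represents fitted curve.

Examine the plot. The behavior of the sixth-degree polynomial fit beyond the data range makes it a poor choice for extrapolation and you can reject this fit.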
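If you prefer numbers to a plot, you can evaluate the fits at years beyond the data and compare them side by side. The sketch below is one way to do that; the query years xq and the matrix extrap are illustrative choices, not part of the example.

```matlab
load census
population2 = fit(cdate,pop,'poly2');
population6 = fit(cdate,pop,'poly6','Normalize','on');

xq = (2000:10:2050).';                       % years beyond the data range
extrap = [population2(xq) population6(xq)];  % one column per fit

disp(table(xq,extrap(:,1),extrap(:,2), ...
    'VariableNames',{'Year','poly2','poly6'}))
```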

Plot Prediction Intervals

To plot prediction intervals, use 'predobs' or 'predfun' as the plot type. For example, to see the prediction bounds for the fifth-degree polynomial for a new observation up to year 2050:

plot(cdate,pop,'o');
xlim([1900, 2050])
hold on
plot(population5,'predobs');
hold off

Figure contains an axes object. The axes object contains 4 objects of type line. These objects represent fitted curve, prediction bounds.

Plot prediction intervals for the cubic polynomial up to year 2050:

plot(cdate,pop,'o');
xlim([1900, 2050])
hold on
plot(population3,'predobs')
hold off

Figure contains an axes object. The axes object contains 4 objects of type line. These objects represent fitted curve, prediction bounds.
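The bounds drawn in these plots can also be obtained as numbers with predint, using the same call form that appears later in this example for population2. The following sketch compares interval widths for the cubic and fifth-degree fits; xq, ci3, ci5, width3, and width5 are illustrative names.

```matlab
load census
population3 = fit(cdate,pop,'poly3','Normalize','on');
population5 = fit(cdate,pop,'poly5','Normalize','on');

xq = (2000:10:2050).';                              % years to predict
ci3 = predint(population3,xq,0.95,'observation');   % [lower upper] per row
ci5 = predint(population5,xq,0.95,'observation');

width3 = ci3(:,2) - ci3(:,1);   % width of the cubic's interval
width5 = ci5(:,2) - ci5(:,1);   % width of the fifth-degree interval
disp([xq width3 width5])
```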

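Examine Goodness-of-Fit Statistics

The struct gof shows the goodness-of-fit statistics for the 'poly2' fit. When you created the 'poly2' fit with the fit function in an earlier step, you specified the gof output argument.

gof
gof = struct with fields:
           sse: 159.0293
       rsquare: 0.9987
           dfe: 18
    adjrsquare: 0.9986
          rmse: 2.9724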

Examine the sum of squares due to error (SSE) and the adjusted R-square statistics to help determine the best fit. The SSE statistic is the least-squares error of the fit, with a value closer to zero indicating a better fit. The adjusted R-square statistic is generally the best indicator of the fit quality when you add additional coefficients to your model.
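The fields of gof are related to each other, which you can check with a little arithmetic. This sketch assumes the usual definitions, RMSE = sqrt(SSE/dfe) and adjusted R-square = 1 - (1 - R-square)(n - 1)/dfe, and uses the values from the gof display above. The observation count n = 21 comes from the size of cdate.

```matlab
% Values copied from the gof display for the 'poly2' fit.
sse = 159.0293;
rsquare = 0.9987;
dfe = 18;                 % residual degrees of freedom
n = 21;                   % observations in census.mat; dfe = n - 3 for 'poly2'

rmse = sqrt(sse/dfe)                              % about 2.9724
adjrsquare = 1 - (1 - rsquare)*(n - 1)/dfe        % about 0.9986
```

Because the adjusted statistic divides by the residual degrees of freedom, adding a coefficient that barely reduces the SSE can lower it, which is why it is useful for comparing models of different size.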

The large SSE for 'exp1' indicates it is a poor fit, which you already determined by examining the fit and residuals. The lowest SSE value is associated with 'poly6'. However, the behavior of this fit beyond the data range makes it a poor choice for extrapolation, so you already rejected this fit by examining the plots with new axis limits.

The next best SSE value is associated with the fifth-degree polynomial fit, 'poly5', suggesting it might be the best fit. However, the SSE and adjusted R-square values for the remaining polynomial fits are all very close to each other. Which one should you choose?
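To compare these statistics for all candidate fits at once, you can request the gof output for each fit type in a loop. The following is a sketch, not part of the original example. The names fitTypes, sse, and adjrsquare are illustrative, and the normalization choices mirror the calls used earlier ('poly2' and 'exp1' without normalization, higher degrees with it).

```matlab
load census
fitTypes = {'poly2','poly3','poly4','poly5','poly6','exp1'};
sse = zeros(numel(fitTypes),1);
adjrsquare = zeros(numel(fitTypes),1);

for k = 1:numel(fitTypes)
    % Match the example: normalize the polynomials above degree 2 only.
    if any(strcmp(fitTypes{k},{'poly2','exp1'}))
        [~,gof] = fit(cdate,pop,fitTypes{k});
    else
        [~,gof] = fit(cdate,pop,fitTypes{k},'Normalize','on');
    end
    sse(k) = gof.sse;                 % sum of squares due to error
    adjrsquare(k) = gof.adjrsquare;   % adjusted R-square
end

disp(table(fitTypes.',sse,adjrsquare, ...
    'VariableNames',{'FitType','SSE','AdjRsquare'}))
```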

Compare the Coefficients and Confidence Bounds to Determine the Best Fit

Resolve the best fit issue by examining the coefficients and confidence bounds for the remaining fits: the fifth-degree polynomial and the quadratic.

Examine population2 and population5 by displaying the models, the fitted coefficients, and the confidence bounds for the fitted coefficients:

population2
population2 =
     Linear model Poly2:
     population2(x) = p1*x^2 + p2*x + p3
     Coefficients (with 95% confidence bounds):
       p1 =    0.006541  (0.006124, 0.006958)
       p2 =      -23.51  (-25.09, -21.93)
       p3 =   2.113e+04  (1.964e+04, 2.262e+04)
population5
population5 =
     Linear model Poly5:
     population5(x) = p1*x^5 + p2*x^4 + p3*x^3 + p4*x^2 + p5*x + p6
       where x is normalized by mean 1890 and std 62.05
     Coefficients (with 95% confidence bounds):
       p1 =      0.5877  (-2.305, 3.48)
       p2 =      0.7047  (-1.684, 3.094)
       p3 =     -0.9193  (-10.19, 8.356)
       p4 =       23.47  (17.42, 29.52)
       p5 =       74.97  (68.37, 81.57)
       p6 =       62.23  (59.51, 64.95)

You can also get the confidence intervals by using confint:

ci = confint(population5)
ci = 2×6

   -2.3046   -1.6841  -10.1943   17.4213   68.3655   59.5102
    3.4801    3.0936    8.3558   29.5199   81.5696   64.9469

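The confidence bounds on the coefficients determine their accuracy. Check the fit equations (e.g., f(x)=p1*x+p2*x...) to see the model terms for each coefficient. Note that p2 refers to the p2*x term in 'poly2' and the p2*x^4 term in 'poly5'. Do not compare normalized coefficients directly with non-normalized coefficients.

The bounds cross zero on the p1, p2, and p3 coefficients for the fifth-degree polynomial. This means you cannot be sure that these coefficients differ from zero. If the higher order model terms may have coefficients of zero, they are not helping with the fit, which suggests that this model overfits the census data.

The fitted coefficients associated with the constant, linear, and quadratic terms are nearly identical for each normalized polynomial equation. However, as the polynomial degree increases, the coefficient bounds associated with the higher degree terms cross zero, which suggests overfitting.

However, the small confidence bounds do not cross zero on p1, p2, and p3 for the quadratic fit, indicating that the fitted coefficients are known fairly accurately.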
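The zero-crossing check can be done programmatically instead of by reading the display. The sketch below applies it to the bounds reported by confint(population5) above, entered as literals so that it runs without the toolbox. The name crossesZero is illustrative.

```matlab
% Bounds from the confint(population5) output:
% row 1 = lower, row 2 = upper, columns = p1..p6.
ci = [-2.3046 -1.6841 -10.1943 17.4213 68.3655 59.5102
       3.4801  3.0936   8.3558 29.5199 81.5696 64.9469];

% A coefficient is not distinguishable from zero when its interval
% has a negative lower bound and a positive upper bound.
crossesZero = ci(1,:) < 0 & ci(2,:) > 0
```

The result is true for p1, p2, and p3 and false for p4, p5, and p6, matching the conclusion that the three highest-order terms of the fifth-degree fit are not supported by the data.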

Therefore, after examining both the graphical and numerical fit results, you should select the quadratic population2 as the best fit to extrapolate the census data.

Evaluate the Best Fit at New Query Points

Now that you have selected the best fit, population2, for extrapolating this census data, evaluate the fit for some new query points:

cdateFuture = (2000:10:2020).';
popFuture = population2(cdateFuture)
popFuture = 3×1

  274.6221
  301.8240
  330.3341

To compute 95% confidence bounds on the prediction for the population in the future, use the predint method:

ci = predint(population2,cdateFuture,0.95,'observation')
ci = 3×2

  266.9185  282.3257
  293.5673  310.0807
  321.3979  339.2702
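The reported bounds are symmetric about the predictions, and they widen as the query year moves away from the data. The sketch below checks both points using the values printed above, entered as literals; halfWidth is an illustrative name.

```matlab
% Values copied from the popFuture and predint outputs.
cdateFuture = (2000:10:2020).';
popFuture = [274.6221; 301.8240; 330.3341];
ci = [266.9185 282.3257
      293.5673 310.0807
      321.3979 339.2702];

halfWidth = (ci(:,2) - ci(:,1))/2;    % distance from prediction to each bound
disp([cdateFuture popFuture halfWidth])
```

The half-widths grow from about 7.7 for 2000 to about 8.9 for 2020, so the further you extrapolate, the less certain the prediction becomes.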

Plot the predicted future population, with confidence intervals, against the fit and data.

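plot(cdate,pop,'o');
xlim([1900, 2040])
hold on
plot(population2)
h = errorbar(cdateFuture,popFuture,popFuture-ci(:,1),ci(:,2)-popFuture,'.');
hold off
legend('cdate v pop','poly2','prediction',...
    'Location','NorthWest')

Figure contains an axes object. The axes object contains 3 objects of type line, errorbar. These objects represent cdate v pop, poly2, prediction.

For more information, see Polynomial Models.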