最小二乘法多项式曲线拟合,根据给定的n个点,并不要求这条曲线精确地经过这些点,而是通过多项式曲线y=f(x)来近似地拟合这些数据点

原理

设多项式函数f(x)=a0+a1x+a2x2+...+akxkf(x)=a_0+a_1x+a_2x^2+...+a_kx^k,有nn个数据点(x0,y0),(x1,y1),...,(xn1,yn1)(x_0,y_0),(x_1,y_1),...,(x_{n-1},y_{n-1}),使用最小二乘法进行函数拟合,得到误差e=i=0n1[f(xi)yi]2=i=0n1[(a0+a1xi+a2xi2+...+akxik)yi]2e=\sum\limits_{i=0}^{n-1}[f(x_i)-y_i]^2=\sum\limits_{i=0}^{n-1}[(a_0+a_1x_i+a_2x_i^2+...+a_kx_i^k)-y_i]^2

求对a0,a1,...,aka_0,a_1,...,a_k的偏导数,并令其为0,此时误差最小:

ea0=2i=0n1[(a0+a1xi+a2xi2+...+akxik)yi]1=0\frac{\partial e}{\partial a_0}=2\sum\limits_{i=0}^{n-1}[(a_0+a_1x_i+a_2x_i^2+...+a_kx_i^k)-y_i]1=0

ea1=2i=0n1[(a0+a1xi+a2xi2+...+akxik)yi]xi=0\frac{\partial e}{\partial a_1}=2\sum\limits_{i=0}^{n-1}[(a_0+a_1x_i+a_2x_i^2+...+a_kx_i^k)-y_i]x_i=0

ea2=2i=0n1[(a0+a1xi+a2xi2+...+akxik)yi]xi2=0\frac{\partial e}{\partial a_2}=2\sum\limits_{i=0}^{n-1}[(a_0+a_1x_i+a_2x_i^2+...+a_kx_i^k)-y_i]x_i^2=0

\dots

eak=2i=0n1[(a0+a1xi+a2xi2+...+akxik)yi]xik=0\frac{\partial e}{\partial a_k}=2\sum\limits_{i=0}^{n-1}[(a_0+a_1x_i+a_2x_i^2+...+a_kx_i^k)-y_i]x_i^k=0

各式展开得到:

i=0n1a0xi0+i=0n1a1xi1+...+i=0n1akxik=i=0n1yixi0\sum\limits_{i=0}^{n-1}a_0x_i^0+\sum\limits_{i=0}^{n-1}a_1x_i^1+...+\sum\limits_{i=0}^{n-1}a_kx_i^k=\sum\limits_{i=0}^{n-1}y_ix_i^0

i=0n1a0xi1+i=0n1a1xi2+...+i=0n1akxik+1=i=0n1yixi1\sum\limits_{i=0}^{n-1}a_0x_i^1+\sum\limits_{i=0}^{n-1}a_1x_i^2+...+\sum\limits_{i=0}^{n-1}a_kx_i^{k+1}=\sum\limits_{i=0}^{n-1}y_ix_i^1

i=0n1a0xi2+i=0n1a1xi3+...+i=0n1akxik+2=i=0n1yixi2\sum\limits_{i=0}^{n-1}a_0x_i^2+\sum\limits_{i=0}^{n-1}a_1x_i^3+...+\sum\limits_{i=0}^{n-1}a_kx_i^{k+2}=\sum\limits_{i=0}^{n-1}y_ix_i^2

\dots

i=0n1a0xik+i=0n1a1xik+1+...+i=0n1akxik+k=i=0n1yixik\sum\limits_{i=0}^{n-1}a_0x_i^k+\sum\limits_{i=0}^{n-1}a_1x_i^{k+1}+...+\sum\limits_{i=0}^{n-1}a_kx_i^{k+k}=\sum\limits_{i=0}^{n-1}y_ix_i^k

写成矩阵相乘的形式:

[i=0n1xi0i=0n1xi1i=0n1xik i=0n1xi1i=0n1xi2i=0n1xik+1  i=0n1xiki=0n1xik+1i=0n1xik+k]×[a0 a1  ak]=[i=0n1yixi0 i=0n1yixi1  i=0n1yixik]\begin{bmatrix}\sum\limits_{i=0}^{n-1}x_i^0 & \sum\limits_{i=0}^{n-1}x_i^1 & \cdots & \sum\limits_{i=0}^{n-1}x_i^k \\\ \sum\limits_{i=0}^{n-1}x_i^1 & \sum\limits_{i=0}^{n-1}x_i^2 & \cdots & \sum\limits_{i=0}^{n-1}x_i^{k+1} \\\ \vdots & \vdots & \ddots & \vdots \\\ \sum\limits_{i=0}^{n-1}x_i^k & \sum\limits_{i=0}^{n-1}x_i^{k+1} & \cdots & \sum\limits_{i=0}^{n-1}x_i^{k+k}\end{bmatrix}\times\begin{bmatrix}a_0 \\\ a_1 \\\ \vdots \\\ a_k\end{bmatrix}=\begin{bmatrix}\sum\limits_{i=0}^{n-1}y_ix_i^0 \\\ \sum\limits_{i=0}^{n-1}y_ix_i^1 \\\ \vdots \\\ \sum\limits_{i=0}^{n-1}y_ix_i^k\end{bmatrix}

解此非齐次方程组即可得到a0,a1,...,aka_0,a_1,...,a_k的值

显然上式矩阵中每一项都可以表示为向量内积,最终分解为:XTXA=XTYX^TXA=X^TY,其中X=[x00x01x02x0k x10x11x12x1k  xn10xn11xn12xn1k]X=\begin{bmatrix}x_0^0 & x_0^1 & x_0^2 & \cdots & x_0^k \\\ x_1^0 & x_1^1 & x_1^2 & \cdots & x_1^k \\\ \vdots & \vdots & \vdots & \ddots & \vdots \\\ x_{n-1}^0 & x_{n-1}^1 & x_{n-1}^2 & \cdots & x_{n-1}^k\end{bmatrix}Y=[y0 y1  yn1]Y=\begin{bmatrix}y_0 \\\ y_1 \\\ \vdots \\\ y_{n-1}\end{bmatrix}A=[a0 a1  ak]A=\begin{bmatrix}a_0 \\\ a_1 \\\ \vdots \\\ a_k\end{bmatrix}

所以A=(XTX)1XTYA=(X^TX)^{-1}X^TY

实验

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
def fit_poly(x,y,k=2):
n=len(x)
if n!=len(y):
raise ValueError("x's length must be equal to y's length!")
X=np.zeros((n,k+1))
X[:,0]=1
X[:,1]=np.array(x)
for i in range(2,k+1):
X[:,i]=np.power(x,i)
Y=np.array(y)[:,None]
A=(linalg.inv(X.T.dot(X))).dot(X.T).dot(Y).flatten()
expr='y='+'+'.join(['*'.join(['('+str(A[i])+')']+['x' for j in range(i)]) for i in range(k+1)])
def poly(x):
x=np.array(x)
d={'x':x}
exec(expr,globals(),d)
return d.get('y')
return poly,A

拟合实例

最小二乘实现曲线拟合(旧版)

最小二乘法或最小平方法是一种数学优化方法,它通过最小化误差的平方和来寻找数据的最佳函数匹配。该方法最早由阿德里安-马里·勒让德提出

某次实验我们得到了一组样本数据(x,y)(x,y):(25,110)、(27,115)、(31,155)、(33,160)、(35,180),将其绘制到二维直角坐标系中,看上去它们符合某种线性关系,假设该线性关系为:f(x)=ax+bf(x)=a·x+b

如上图所见,我们认为这条虚直线就是隐藏在这批数据中的真实规律,由于存在噪声,每个数据点都或多或少地偏离了原本应处于的位置,若每个数据点都沿着yy方向作垂线,与虚线的距离(取正值)就是“误差”,为了得到尽可能合理的aabb值,根据最小二乘的思想,我们试图最小化这些误差和(由于f(xi)yif(x_i)-y_i的值有正有负,所以取平方),即什么样的aabb可以使得这些数据的平方误差和(记作ϵ\epsilonϵ\epsilon是关于aba、b的二元函数)最小:

mina,bϵ=i(f(xi)yi)2(1)\min_{a,b}\epsilon=\sum_i(f(x_i)-y_i)^2 \tag{1}

根据多元微分知识,当偏导数:

{aϵ=2i(axi+byi)xi=0bϵ=2i(axi+byi)=0\left\{\begin{array}{l}{\frac{\partial}{\partial a} \epsilon=2 \sum\limits_{i}\left(a x_{i}+b-y_{i}\right) x_{i}=0} \\ {\frac{\partial}{\partial b} \epsilon=2 \sum\limits_{i}\left(a x_{i}+b-y_{i}\right)=0}\end{array}\right.

时,ϵ\epsilon取最小值,于是解此二元一次方程组,得到aabb

{4629a+151b=22240151a+5b=720\left\{\begin{array}{l}{4629a+151b=22240} \\ {151a+5b=720}\end{array}\right.

解得:a7.2a\approx 7.2b73b\approx -73

推广
将上述示例推广到更一般的情况,假设有mm个样本,对于其中任一样本(X(j),Y(j))(X^{(j)},Y^{(j)})j{1,2,...,m}j\in\{1,2,...,m\},若X(j)=[x1(j),x2(j),...,xn(j)]TX^{(j)}=[x_1^{(j)},x_2^{(j)},...,x_n^{(j)}]^TY(j)Y^{(j)}是一个标量,设拟合函数:f(X)=f(x1,x2,...,xn)=θ0+θ1x1+θ2x2+...+θnxnf(X)=f(x_1,x_2,...,x_n)=\theta_0+\theta_1x_1+\theta_2x_2+...+\theta_nx_n,为了方便表示,给XX额外增加一个分量x0x_0X=[x0,x1,...,xn]TX=[x_0,x_1,...,x_n]^T,且x0x_0恒为1,于是f(X)=f(x0,x1,...,xn)=i=0nθixif(X)=f(x_0,x_1,...,x_n)=\sum\limits_{i=0}^{n}{\theta_ix_i},根据最小二乘法思想(式(1)(1)),这批样本的平方误差和为:

ϵ=j=1m(f(X(j))Y(j))2=j=1m(i=0nθixi(j)Y(j))2\epsilon=\sum\limits_{j=1}^{m}\left(f(X^{(j)})-Y^{(j)}\right)^2=\sum\limits_{j=1}^{m}\left(\sum\limits_{i=0}^{n}\theta_ix_i^{(j)}-Y^{(j)}\right)^2

ϵ=j=1m(f(X(j))Y(j))2=j=1m(i=0nθixi(j)Y(j))2\epsilon=\sum\limits_{j=1}^{m}\left(f(X^{(j)})-Y^{(j)}\right)^2\\ =\sum\limits_{j=1}^{m}\left(\sum\limits_{i=0}^{n}\theta_ix_i^{(j)}-Y^{(j)}\right)^2

然后利用ϵ\epsilonθi\theta_i的偏导数为0,i{0,1,...,n}i\in\{0,1,...,n\},求解一个n+1元一次方程组(略),从而得到全部待求的参数{θi}i=0,...,n\{\theta_i\}_{i=0,...,n}

矩阵法求解
用矩阵的形式对上述推广进行改写,对全部样本(X,Y)(X,Y),其中X=[x(1),x(2),...,x(m)]X=[\mathbf{x}^{(1)},\mathbf{x}^{(2)},...,\mathbf{x}^{(m)}]Y=[y(1),y(2),...,y(m)]Y=[\mathbf{y}^{(1)},\mathbf{y}^{(2)},...,\mathbf{y}^{(m)}]x(j)=[x0(j),x1(j),...,xn(j)]T\mathbf{x}^{(j)}=[x_0^{(j)},x_1^{(j)},...,x_n^{(j)}]^Ty(j)\mathbf{y}^{(j)}是标量,x0(j)=1x_0^{(j)}=1Θ=[θ0,θ1,...,θn]T\Theta=[\theta_0,\theta_1,...,\theta_n]^T,则有:

ϵ=ΘTXY2=(ΘTXY)(ΘTXY)T\epsilon=\|\Theta^TX-Y\|^2\\ =(\Theta^TX-Y)(\Theta^TX-Y)^T

ϵ=ΘTXY2=(ΘTXY)(ΘTXY)T\epsilon=\|\Theta^TX-Y\|^2=(\Theta^TX-Y)(\Theta^TX-Y)^T

注:2\|·\|^2为向量模的平方,等同于向量自身的内积,而ΘTXY2=(ΘTx(1)y(1))2+(ΘTx(2)y(2))2+...+(ΘTx(m)y(m))22=j=1m(ΘTx(j)y(j))2=j=1m(i=0nθixi(j)y(j))2\|\Theta^TX-Y\|^2=\sqrt{(\Theta^T\mathbf{x}^{(1)}-\mathbf{y}^{(1)})^2+(\Theta^T\mathbf{x}^{(2)}-\mathbf{y}^{(2)})^2+...+(\Theta^T\mathbf{x}^{(m)}-\mathbf{y}^{(m)})^2}^2=\sum\limits_{j=1}^{m}(\Theta^T\mathbf{x}^{(j)}-\mathbf{y}^{(j)})^2\\=\sum\limits_{j=1}^{m}(\sum\limits_{i=0}^{n}\theta_ix_i^{(j)}-\mathbf{y}^{(j)})^2

Θ\Theta形状为(n+1)×1(n+1)\times 1XX形状为(n+1)×m(n+1)\times mYY形状为1×m1\times m,应用矩阵求导术

d(ϵ)=d(ΘTXY)(ΘTXY)T+(ΘTXY)d((ΘTXY)T)d(\epsilon)=d(\Theta^TX-Y)(\Theta^TX-Y)^T+(\Theta^TX-Y)d((\Theta^TX-Y)^T)

d(ϵ)=d(ΘTXY)(ΘTXY)T+(ΘTXY)d((ΘTXY)T)d(\epsilon)=d(\Theta^TX-Y)(\Theta^TX-Y)^T\\ +(\Theta^TX-Y)d((\Theta^TX-Y)^T)

其中d(ΘTXY)=d(ΘT)X+ΘTd(X)d(Y)=d(ΘT)Xd(\Theta^TX-Y)=d(\Theta^T)X+\Theta^Td(X)-d(Y)=d(\Theta^T)Xd((ΘTXY)T)=(d(ΘTXY))T=(d(ΘT)X)T=XT(d(ΘT))T=XT((dΘ)T)T=XTd(Θ)d((\Theta^TX-Y)^T)=(d(\Theta^TX-Y))^T=(d(\Theta^T)X)^T=X^T(d(\Theta^T))^T=X^T((d\Theta)^T)^T=X^Td(\Theta),又d(ΘT)X(ΘTXY)Td(\Theta^T)X(\Theta^TX-Y)^T是标量,所以d(ΘT)X(ΘTXY)T=(d(ΘT)X(ΘTXY)T)T=(ΘTXY)XTd(Θ)d(\Theta^T)X(\Theta^TX-Y)^T=(d(\Theta^T)X(\Theta^TX-Y)^T)^T=(\Theta^TX-Y)X^Td(\Theta),则有:

d(ϵ)=2(ΘTXY)XTd(Θ)=(2X(ΘTXY)T)Td(Θ)d(\epsilon)=2(\Theta^TX-Y)X^Td(\Theta)=(2X(\Theta^TX-Y)^T)^Td(\Theta)

d(ϵ)=2(ΘTXY)XTd(Θ)=(2X(ΘTXY)T)Td(Θ)d(\epsilon)=2(\Theta^TX-Y)X^Td(\Theta)\\ =(2X(\Theta^TX-Y)^T)^Td(\Theta)

根据多元微分df=fx1dx1+fx2dx2+...+fxndxndf=\frac{\partial f}{\partial x_1}dx_1+\frac{\partial f}{\partial x_2}dx_2+...+\frac{\partial f}{\partial x_n}dx_n,其中f=f(x1,x2,...,xn)=f(X)f=f(x_1,x_2,...,x_n)=f(X),即有:

df=i=1nfxidxi=(fX)TdXdf=\sum\limits_{i=1}^{n}\frac{\partial f}{\partial x_i}dx_i=(\frac{\partial f}{\partial X})^TdX

成立,其中fX=[fx1,fx2,...,fxn]T\frac{\partial f}{\partial X}=[\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},...,\frac{\partial f}{\partial x_n}]^TdX=[dx1,dx2,...,dxn]TdX=[dx_1,dx_2,...,dx_n]^T

于是立即推:

ϵΘ=2X(ΘTXY)T=2X(XTΘYT)\frac{\partial \epsilon}{\partial \Theta}=2X(\Theta^TX-Y)^T=2X(X^T\Theta-Y^T)

ϵΘ=2X(ΘTXY)T=2X(XTΘYT)\frac{\partial \epsilon}{\partial \Theta}=2X(\Theta^TX-Y)^T\\ =2X(X^T\Theta-Y^T)

ϵΘ=0\frac{\partial \epsilon}{\partial \Theta}=0,解出Θ=(XXT)1XYT\Theta=(XX^T)^{-1}XY^T

从结果看,XXTXX^T必须可逆,否则不能直接使用最小二乘法,此时,可以使用其它方法如梯度下降法迭代求解最优解

在三维坐标系中的线性拟合示例(一个平面):

回到正题,仍使用第一个上述示例的(二维)数据,并设多项式函数f(x)=ax2+bx+cf(x)=ax^2+bx+c,使用最小平方法来进行曲线拟合(还是代入到式(1)(1)中),然后令ϵa\frac{\partial \epsilon}{\partial a}ϵb\frac{\partial \epsilon}{\partial b}ϵc\frac{\partial \epsilon}{\partial c}分别等于0,解方程组。类似的,再设f(x)=ax3+bx2+cx+df(x)=ax^3+bx^2+cx+df(x)=ax4+bx3+cx2+dx+ef(x)=ax^4+bx^3+cx^2+dx+e,对于f(x)=ax4+bx3+cx2+dx+ef(x)=ax^4+bx^3+cx^2+dx+e,它完全拟合数据点(误差为0):