Generalization

We now extend the example above to the general case. Suppose there are $m$ samples, and for any sample $(X^{(j)},Y^{(j)})$, $j\in\{1,2,...,m\}$, the input is $X^{(j)}=[x_1^{(j)},x_2^{(j)},...,x_n^{(j)}]^T$ and $Y^{(j)}$ is a scalar. Let the fitting function be $f(X)=f(x_1,x_2,...,x_n)=\theta_0+\theta_1x_1+\theta_2x_2+...+\theta_nx_n$. For convenience, give $X$ an extra component $x_0$, so that $X=[x_0,x_1,...,x_n]^T$ with $x_0$ always equal to 1. Then $f(X)=f(x_0,x_1,...,x_n)=\sum\limits_{i=0}^{n}{\theta_ix_i}$. Following the least-squares idea (Eq. $(1)$), the sum of squared errors over these samples is:
$$
\epsilon=\sum\limits_{j=1}^{m}\left(f(X^{(j)})-Y^{(j)}\right)^2=\sum\limits_{j=1}^{m}\left(\sum\limits_{i=0}^{n}\theta_ix_i^{(j)}-Y^{(j)}\right)^2
$$
Next, set the partial derivative of $\epsilon$ with respect to each $\theta_i$ to 0, $i\in\{0,1,...,n\}$. This gives a system of $n+1$ linear equations in $n+1$ unknowns (details omitted), and solving it yields all the parameters $\{\theta_i\}_{i=0,...,n}$.
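The omitted step can be made concrete. Setting $\partial\epsilon/\partial\theta_k=0$ gives, for each $k$, the equation $\sum_{i}\theta_i\sum_{j}x_i^{(j)}x_k^{(j)}=\sum_{j}Y^{(j)}x_k^{(j)}$. The sketch below builds that system and solves it by Gauss-Jordan elimination using only the Python standard library. The function names and the sample data are illustrative assumptions, not part of the original article.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    size = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        # Pick the row with the largest pivot to limit round-off error.
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(size):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def fit_by_partial_derivatives(samples, targets):
    """Fit theta_0..theta_n from the n+1 equations d(eps)/d(theta_k) = 0."""
    # Prepend x_0 = 1 to every sample, as in the text.
    rows = [[1.0] + list(x) for x in samples]
    dim = len(rows[0])
    # Equation k: sum_i theta_i * sum_j x_i^(j) x_k^(j) = sum_j Y^(j) x_k^(j)
    a = [[sum(r[i] * r[k] for r in rows) for i in range(dim)] for k in range(dim)]
    b = [sum(y * r[k] for r, y in zip(rows, targets)) for k in range(dim)]
    return solve_linear_system(a, b)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
samples = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
targets = [1 + 2 * x1 + 3 * x2 for x1, x2 in samples]
theta = fit_by_partial_derivatives(samples, targets)
print(theta)
```

Because the data is noise-free, the recovered parameters should match the generating coefficients up to floating-point error.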
Matrix solution

We now rewrite the generalization above in matrix form. Consider the full sample set $(X,Y)$, where $X=[\mathbf{x}^{(1)},\mathbf{x}^{(2)},...,\mathbf{x}^{(m)}]$, $Y=[\mathbf{y}^{(1)},\mathbf{y}^{(2)},...,\mathbf{y}^{(m)}]$, $\mathbf{x}^{(j)}=[x_0^{(j)},x_1^{(j)},...,x_n^{(j)}]^T$, $\mathbf{y}^{(j)}$ is a scalar, $x_0^{(j)}=1$, and $\Theta=[\theta_0,\theta_1,...,\theta_n]^T$. Then:
$$
\epsilon=\|\Theta^TX-Y\|^2=(\Theta^TX-Y)(\Theta^TX-Y)^T
$$
Note: $\|\cdot\|^2$ is the squared norm of a vector, which equals the inner product of the vector with itself, so that

$$
\|\Theta^TX-Y\|^2=\left(\sqrt{(\Theta^T\mathbf{x}^{(1)}-\mathbf{y}^{(1)})^2+(\Theta^T\mathbf{x}^{(2)}-\mathbf{y}^{(2)})^2+...+(\Theta^T\mathbf{x}^{(m)}-\mathbf{y}^{(m)})^2}\right)^2=\sum\limits_{j=1}^{m}\left(\Theta^T\mathbf{x}^{(j)}-\mathbf{y}^{(j)}\right)^2=\sum\limits_{j=1}^{m}\left(\sum\limits_{i=0}^{n}\theta_ix_i^{(j)}-\mathbf{y}^{(j)}\right)^2
$$
$\Theta$ has shape $(n+1)\times 1$, $X$ has shape $(n+1)\times m$, and $Y$ has shape $1\times m$. Applying the matrix differentiation technique:
$$
d(\epsilon)=d(\Theta^TX-Y)(\Theta^TX-Y)^T+(\Theta^TX-Y)d((\Theta^TX-Y)^T)
$$
Here $d(\Theta^TX-Y)=d(\Theta^T)X+\Theta^Td(X)-d(Y)=d(\Theta^T)X$, because $X$ and $Y$ are fixed data and so $d(X)=0$ and $d(Y)=0$. Likewise, $d((\Theta^TX-Y)^T)=(d(\Theta^TX-Y))^T=(d(\Theta^T)X)^T=X^T(d(\Theta^T))^T=X^T((d\Theta)^T)^T=X^Td(\Theta)$. Moreover, $d(\Theta^T)X(\Theta^TX-Y)^T$ is a scalar and therefore equals its own transpose, so $d(\Theta^T)X(\Theta^TX-Y)^T=(d(\Theta^T)X(\Theta^TX-Y)^T)^T=(\Theta^TX-Y)X^Td(\Theta)$. Hence:
$$
d(\epsilon)=2(\Theta^TX-Y)X^Td(\Theta)=(2X(\Theta^TX-Y)^T)^Td(\Theta)
$$
Recall the total differential of a multivariate function, $df=\frac{\partial f}{\partial x_1}dx_1+\frac{\partial f}{\partial x_2}dx_2+...+\frac{\partial f}{\partial x_n}dx_n$, where $f=f(x_1,x_2,...,x_n)=f(X)$. In vector form this reads:

$$
df=\sum\limits_{i=1}^{n}\frac{\partial f}{\partial x_i}dx_i=\left(\frac{\partial f}{\partial X}\right)^TdX
$$

where $\frac{\partial f}{\partial X}=[\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},...,\frac{\partial f}{\partial x_n}]^T$ and $dX=[dx_1,dx_2,...,dx_n]^T$.
Comparing this with the expression for $d(\epsilon)$, it follows immediately that:
$$
\frac{\partial \epsilon}{\partial \Theta}=2X(\Theta^TX-Y)^T=2X(X^T\Theta-Y^T)
$$
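A quick way to gain confidence in the gradient formula $2X(X^T\Theta-Y^T)$ is to compare it against central finite differences of $\epsilon$. The sketch below does this in plain Python, storing $X$ as $n+1$ rows by $m$ columns to match the shapes above. The helper names and the numbers are made up for illustration.

```python
def squared_error(theta, X, Y):
    """eps = ||Theta^T X - Y||^2, with X stored as (n+1) rows by m columns."""
    m = len(Y)
    return sum(
        (sum(t * row[j] for t, row in zip(theta, X)) - Y[j]) ** 2
        for j in range(m)
    )


def analytic_gradient(theta, X, Y):
    """2 X (X^T Theta - Y^T), returned as a list of n+1 numbers."""
    m = len(Y)
    residual = [
        sum(t * row[j] for t, row in zip(theta, X)) - Y[j] for j in range(m)
    ]
    return [2.0 * sum(row[j] * residual[j] for j in range(m)) for row in X]


def numeric_gradient(theta, X, Y, h=1e-6):
    """Central differences, perturbing one component of Theta at a time."""
    grad = []
    for k in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[k] += h
        down[k] -= h
        grad.append((squared_error(up, X, Y) - squared_error(down, X, Y)) / (2 * h))
    return grad


# Hypothetical data: 4 samples, 2 features plus the constant row x_0 = 1.
X = [[1.0, 1.0, 1.0, 1.0],
     [0.0, 1.0, 2.0, 3.0],
     [1.0, 0.0, 2.0, 1.0]]
Y = [1.0, 2.0, 0.5, 4.0]
theta = [0.3, -1.2, 0.7]

g_analytic = analytic_gradient(theta, X, Y)
g_numeric = numeric_gradient(theta, X, Y)
print(g_analytic)
print(g_numeric)
```

Since $\epsilon$ is quadratic in $\Theta$, the central difference has no truncation error, and the two gradients should agree up to round-off.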
Setting $\frac{\partial \epsilon}{\partial \Theta}=0$ gives $XX^T\Theta=XY^T$, and therefore $\Theta=(XX^T)^{-1}XY^T$.
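As a small worked check of the closed form, take three samples that lie exactly on the line $y=1+2x$ (data chosen here for illustration, not taken from the original article):

$$
X=\begin{bmatrix}1&1&1\\0&1&2\end{bmatrix},\quad Y=\begin{bmatrix}1&3&5\end{bmatrix},\quad XX^T=\begin{bmatrix}3&3\\3&5\end{bmatrix},\quad XY^T=\begin{bmatrix}9\\13\end{bmatrix}
$$

$$
\Theta=(XX^T)^{-1}XY^T=\frac{1}{6}\begin{bmatrix}5&-3\\-3&3\end{bmatrix}\begin{bmatrix}9\\13\end{bmatrix}=\frac{1}{6}\begin{bmatrix}6\\12\end{bmatrix}=\begin{bmatrix}1\\2\end{bmatrix}
$$

The result recovers $\theta_0=1$ and $\theta_1=2$, as expected for noise-free data.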
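This result requires $XX^T$ to be invertible; otherwise the closed-form least-squares solution cannot be used directly. In that case, other methods such as gradient descent can be used to find an optimal solution iteratively.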
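The gradient descent fallback can be sketched as follows, reusing the gradient $2X(X^T\Theta-Y^T)$ derived above. In this hypothetical data set the third row of $X$ duplicates the second, so $XX^T$ is singular and the closed form does not apply. The learning rate and step count are assumptions tuned for this toy example, not general recommendations.

```python
def predict(theta, X):
    """Theta^T X for X stored as (n+1) rows by m columns."""
    m = len(X[0])
    return [sum(t * row[j] for t, row in zip(theta, X)) for j in range(m)]


def gradient_descent(X, Y, lr=0.01, steps=5000):
    """Minimise ||Theta^T X - Y||^2 by plain gradient descent from Theta = 0."""
    m = len(Y)
    theta = [0.0] * len(X)
    for _ in range(steps):
        residual = [p - y for p, y in zip(predict(theta, X), Y)]
        # Gradient 2 X (X^T Theta - Y^T), one entry per row of X.
        grad = [2.0 * sum(row[j] * residual[j] for j in range(m)) for row in X]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical data on y = 1 + 2*x; rows 2 and 3 are identical features,
# which makes X X^T singular.
X = [[1.0, 1.0, 1.0, 1.0],
     [0.0, 1.0, 2.0, 3.0],
     [0.0, 1.0, 2.0, 3.0]]
Y = [1.0, 3.0, 5.0, 7.0]

theta = gradient_descent(X, Y)
predictions = predict(theta, X)
print(theta)
print(predictions)
```

When $XX^T$ is singular the minimiser is not unique: any split of the slope 2 between the two duplicated features fits the data equally well, and gradient descent simply returns one such solution.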
An example of linear fitting in a three-dimensional coordinate system (the fitted surface is a plane):
Download the plotting script