Least-Squares

Showing that the OLS estimator is scale equivariant?

  • November 1, 2017

I don’t have a formal definition of scale equivariance, but here’s what Introduction to Statistical Learning says about this on p. 217:

The standard least squares coefficients… are scale equivariant: multiplying $X_j$ by a constant $c$ simply leads to a scaling of the least squares coefficient estimates by a factor of $1/c$.

For simplicity, let's assume the general linear model $\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon$, where $\mathbf{y} \in \mathbb{R}^n$, $\mathbf{X}$ is an $n \times p$ matrix (where $p < n$) with all entries in $\mathbb{R}$, $\boldsymbol\beta \in \mathbb{R}^{p}$, and $\boldsymbol\epsilon$ is an $n$-dimensional vector of real-valued random variables with $\mathbb{E}[\boldsymbol\epsilon] = \mathbf{0}$.

From OLS estimation, we know that if $\mathbf{X}$ has full (column) rank,
$$\hat{\boldsymbol\beta}_{\mathbf{X}} = (\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{X}^{\prime}\mathbf{y}.$$

Suppose we multiplied a column of $\mathbf{X}$, say $\mathbf{x}_k$ for some $k \in \{1, \dots, p\}$, by a constant $c \neq 0$. This would be equivalent to the matrix
$$\mathbf{X}\mathbf{S} = \mathbf{X}\underbrace{\begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & c & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix}}_{\mathbf{S}},$$

where all other entries of the matrix above are $0$, and $c$ is in the $k$th entry of the diagonal of $\mathbf{S}$. Then $\mathbf{X}\mathbf{S}$ has full (column) rank as well, and the resulting OLS estimator using $\mathbf{X}\mathbf{S}$ as the new design matrix is
$$\hat{\boldsymbol\beta}_{\mathbf{X}\mathbf{S}} = \left[(\mathbf{X}\mathbf{S})^{\prime}(\mathbf{X}\mathbf{S})\right]^{-1}(\mathbf{X}\mathbf{S})^{\prime}\mathbf{y}.$$

After some work, one can show that
$$(\mathbf{X}\mathbf{S})^{\prime}(\mathbf{X}\mathbf{S}) = \mathbf{S}^{\prime}(\mathbf{X}^{\prime}\mathbf{X})\mathbf{S}$$

and
$$(\mathbf{X}\mathbf{S})^{\prime}\mathbf{y} = \mathbf{S}^{\prime}\mathbf{X}^{\prime}\mathbf{y}.$$

How do I go from here to show the claim quoted above (i.e., that $\hat{\boldsymbol\beta}_{\mathbf{X}\mathbf{S}} = \mathbf{S}^{-1}\hat{\boldsymbol\beta}_{\mathbf{X}}$, so that the $k$th coefficient is rescaled by $1/c$ and the others are unchanged)? It's not clear to me how to compute $\left[(\mathbf{X}\mathbf{S})^{\prime}(\mathbf{X}\mathbf{S})\right]^{-1}$.
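
(As a sanity check — not a proof — the claim is easy to confirm numerically. Here is a minimal NumPy sketch; the seed, the dimensions $n=50$, $p=4$, the scaled column $k=2$, and the constant $c=3.7$ are all arbitrary choices for illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k, c = 50, 4, 2, 3.7           # sample size, columns, scaled column, scale factor

X = rng.normal(size=(n, p))          # a full-column-rank design (almost surely)
y = rng.normal(size=n)

beta_X, *_ = np.linalg.lstsq(X, y, rcond=None)     # OLS on the original design

XS = X.copy()
XS[:, k] *= c                        # multiply column k by c, i.e. form X @ S
beta_XS, *_ = np.linalg.lstsq(XS, y, rcond=None)   # OLS on the rescaled design

expected = beta_X.copy()
expected[k] /= c                     # S^{-1} beta_X: only entry k changes, by 1/c
print(np.allclose(beta_XS, expected))  # True
```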

Since the assertion in the quotation is a collection of statements about rescaling the columns of $X$, you might as well prove them all at once. Indeed, it takes no more work to prove a generalization of the assertion:

When $X$ is right-multiplied by an invertible matrix $A$, the new coefficient estimate $\hat\beta_A$ is equal to $\hat\beta$ left-multiplied by $A^{-1}$.

The only algebraic facts you need are the (easily proven, well-known) ones that $(AB)^\prime = B^\prime A^\prime$ for any conformable matrices $A$ and $B$, and $(AB)^{-1} = B^{-1}A^{-1}$ for invertible matrices $A$ and $B$. (A subtler version of the latter is needed when working with generalized inverses: for invertible $A$ and $B$ and any $X$, $(AXB)^{-} = B^{-1}X^{-}A^{-1}$.)


Proof by algebra:

$$\hat\beta_A = \left((XA)^\prime(XA)\right)^{-}(XA)^\prime y = A^{-1}\left(X^\prime X\right)^{-}\left(A^\prime\right)^{-1}A^\prime X^\prime y = A^{-1}\hat\beta,$$

QED. (In order for this proof to be fully general, the $^{-}$ superscript refers to a generalized inverse.)
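
Written out with the two facts applied one at a time (and with $\left(A^\prime X^\prime X A\right)^{-}$ expanded via the generalized-inverse identity quoted above), the chain is

$$\begin{aligned}
\hat\beta_A &= \left((XA)^\prime(XA)\right)^{-}(XA)^\prime y
            = \left(A^\prime\,(X^\prime X)\,A\right)^{-}A^\prime X^\prime y \\
            &= A^{-1}\left(X^\prime X\right)^{-}\left(A^\prime\right)^{-1}A^\prime X^\prime y
            = A^{-1}\left(X^\prime X\right)^{-}X^\prime y
            = A^{-1}\hat\beta.
\end{aligned}$$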


Proof by geometry:

Given bases $E_p$ and $E_n$ of $\mathbb{R}^p$ and $\mathbb{R}^n$, respectively, $X$ represents a linear transformation from $\mathbb{R}^p$ to $\mathbb{R}^n$. Right-multiplication of $X$ by $A$ can be considered as leaving this transformation fixed but changing $E_p$ to $AE_p$ (that is, to the columns of $A$). Under that change of basis, the representation of any vector $\hat\beta \in \mathbb{R}^p$ must change via left-multiplication by $A^{-1}$, QED.

(This proof works, unmodified, even when $X^\prime X$ is not invertible.)


The quotation specifically refers to the case of diagonal matrices $A$ with $A_{ii} = 1$ for $i \neq j$ and $A_{jj} = c$.
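
In that special case $A^{-1}$ is explicit, and the general result specializes to the quoted claim:

$$A^{-1} = \operatorname{diag}\!\bigl(1, \dots, 1, \tfrac{1}{c}, 1, \dots, 1\bigr), \qquad \hat\beta_A = A^{-1}\hat\beta = \bigl(\hat\beta_1, \dots, \tfrac{1}{c}\,\hat\beta_j, \dots, \hat\beta_p\bigr)^\prime,$$

so every coefficient is unchanged except the $j$th, which is scaled by $1/c$.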


Connection with least squares

The objective here is to use first principles to obtain the result, with the principle being that of least squares: estimating coefficients that minimize the sum of squares of residuals.

Again, proving a (huge) generalization is no more difficult and is rather revealing. Suppose
$$\phi: V^p \to W^n$$

is any map (linear or not) of real vector spaces and suppose $Q$ is any real-valued function on $W^n$. Let $U \subset V^p$ be the (possibly empty) set of points $v$ for which $Q(\phi(v))$ is minimized.

Result: $U$, which is determined solely by $Q$ and $\phi$, does not depend on any choice of basis $E_p$ used to represent vectors in $V^p$.

Proof: QED.

There’s nothing to prove!

Application of the result: Let $F$ be a positive semidefinite quadratic form on $\mathbb{R}^n$, let $y \in \mathbb{R}^n$, and suppose $\phi$ is a linear map represented by $X$ when bases of $V^p = \mathbb{R}^p$ and $W^n = \mathbb{R}^n$ are chosen. Define $Q(x) = F(y - x)$. Choose a basis of $\mathbb{R}^p$ and suppose $\hat\beta$ is the representation of some $v \in U$ in that basis. This is least squares: $x = X\hat\beta$ minimizes the squared distance $F(y - x)$. Because $X$ is a linear map, changing the basis of $\mathbb{R}^p$ corresponds to right-multiplying $X$ by some invertible matrix $A$. That will left-multiply $\hat\beta$ by $A^{-1}$, QED.
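
A minimal NumPy sketch of this basis-independence (illustrative only; the seed, the dimensions, the deliberate redundancy in $X$, and the choice of $A$ are all arbitrary): even when $X^\prime X$ is not invertible, the minimizing point $x = X\hat\beta \in \mathbb{R}^n$ is the same whether the coefficients are expressed relative to $X$ or to $XA$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 5

X = rng.normal(size=(n, p))
X[:, 4] = X[:, 0] + X[:, 1]                   # make X rank deficient: X'X is singular
y = rng.normal(size=n)
A = rng.normal(size=(p, p)) + 5 * np.eye(p)   # an arbitrary invertible change of basis

# Least-squares coefficients in each basis (lstsq handles the rank deficiency
# by returning a minimum-norm solution).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_A, *_ = np.linalg.lstsq(X @ A, y, rcond=None)

# The fitted vector -- the minimizing point x in R^n -- does not depend on the
# basis of R^p.  (The particular coefficient vectors returned here are minimum-norm
# picks and need not satisfy beta_A = A^{-1} beta when X is rank deficient.)
print(np.allclose(X @ beta, (X @ A) @ beta_A))  # True
```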

Source: https://stats.stackexchange.com/questions/311198