Showing that the OLS estimator is scale equivariant?
I don’t have a formal definition of scale equivariance, but here’s what Introduction to Statistical Learning says about this on p. 217:
The standard least squares coefficients… are scale equivariant: multiplying $X_j$ by a constant $c$ simply leads to a scaling of the least squares coefficient estimates by a factor of $1/c$.
For simplicity, let’s assume the general linear model $\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon$, where $\mathbf{y} \in \mathbb{R}^n$, $\mathbf{X}$ is an $n \times p$ matrix (where $n > p$) with all entries in $\mathbb{R}$, $\boldsymbol\beta \in \mathbb{R}^p$, and $\boldsymbol\epsilon$ is an $n$-dimensional vector of real-valued random variables with $\mathbb{E}[\boldsymbol\epsilon] = \mathbf{0}$.
From OLS estimation, we know that if $\mathbf{X}$ has full (column) rank,
$$\hat{\boldsymbol\beta}_{\mathbf{X}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y}.$$
Suppose we multiplied a column of $\mathbf{X}$, say $\mathbf{x}_j$ for some $j \in \{1, \dots, p\}$, by a constant $c \neq 0$. This would be equivalent to the matrix
$$\mathbf{X}\mathbf{S} = \mathbf{X}\begin{bmatrix}
1 & & & & \\
 & \ddots & & & \\
 & & c & & \\
 & & & \ddots & \\
 & & & & 1
\end{bmatrix},$$
where all other entries of the matrix $\mathbf{S}$ above are $0$, and $c$ is in the $j$th entry of the diagonal of $\mathbf{S}$. Then, $\mathbf{X}\mathbf{S}$ has full (column) rank as well, and the resulting OLS estimator using $\mathbf{X}\mathbf{S}$ as the new design matrix is
$$\hat{\boldsymbol\beta}_{\mathbf{X}\mathbf{S}} = \left[(\mathbf{X}\mathbf{S})^{T}\mathbf{X}\mathbf{S}\right]^{-1}(\mathbf{X}\mathbf{S})^{T}\mathbf{y}.$$
After some work, one can show that
$$(\mathbf{X}\mathbf{S})^{T}\mathbf{X}\mathbf{S} = \mathbf{S}^{T}\mathbf{X}^{T}\mathbf{X}\mathbf{S}$$
and
$$(\mathbf{X}\mathbf{S})^{T}\mathbf{y} = \mathbf{S}^{T}\mathbf{X}^{T}\mathbf{y}.$$
How do I go from here to show the claim quoted above (i.e., that multiplying $\mathbf{x}_j$ by $c$ scales $\hat{\beta}_j$ by a factor of $1/c$)? It’s not clear to me how to compute $\left[\mathbf{S}^{T}\mathbf{X}^{T}\mathbf{X}\mathbf{S}\right]^{-1}$.
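For concreteness, here is a small numerical check of the claimed behavior (a rough NumPy sketch of my own; the dimensions, $j$, and $c$ are arbitrary choices), which confirms that only the $j$th coefficient changes, and it is divided by $c$:

```python
import numpy as np

# Numerical sanity check of the quoted claim (arbitrary sizes; j and c chosen arbitrarily).
rng = np.random.default_rng(0)
n, p, j, c = 50, 4, 2, 3.7
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

S = np.eye(p)
S[j, j] = c                      # the matrix S above: multiply column j of X by c
beta_hat_XS = np.linalg.lstsq(X @ S, y, rcond=None)[0]

print(np.allclose(beta_hat_XS[j], beta_hat[j] / c))       # True: beta_j is scaled by 1/c
mask = np.arange(p) != j
print(np.allclose(beta_hat_XS[mask], beta_hat[mask]))     # True: other coefficients unchanged
```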
Since the assertion in the quotation is a collection of statements about rescaling the columns of $X$, you might as well prove them all at once. Indeed, it takes no more work to prove a generalization of the assertion:
When $X$ is right-multiplied by an invertible matrix $A$, then the new coefficient estimate $\hat\beta_A$ is equal to $\hat\beta$ left-multiplied by $A^{-1}$.
The only algebraic facts you need are the (easily proven, well-known) ones that $(AB)^\prime = B^\prime A^\prime$ for any matrices $A, B$ and $(AB)^{-1} = B^{-1}A^{-1}$ for invertible matrices $A$ and $B$. (A subtler version of the latter is needed when working with generalized inverses: for invertible $A$ and $B$ and any $X$, $(AXB)^{-} = B^{-1}X^{-}A^{-1}$.)
Proof by algebra:
$$\hat\beta_A = \left((XA)^\prime (XA)\right)^{-}(XA)^\prime y = A^{-1}\left(X^\prime X\right)^{-}(A^\prime)^{-1}A^\prime X^\prime y = A^{-1}\left(X^\prime X\right)^{-}X^\prime y = A^{-1}\hat\beta,$$
QED. (In order for this proof to be fully general, the $^{-}$ superscript refers to a generalized inverse.)
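For readers who want to see the identity numerically, here is a minimal NumPy sketch of my own (with a full-rank $X$, so the ordinary inverse stands in for the generalized inverse; the sizes and the random $A$ are arbitrary):

```python
import numpy as np

# Check beta_hat_A = A^{-1} beta_hat for a random invertible A.
# X has full column rank (almost surely), so ordinary inverses suffice here.
rng = np.random.default_rng(1)
n, p = 60, 5
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
A = rng.normal(size=(p, p))      # a p x p Gaussian matrix is almost surely invertible

beta_hat   = np.linalg.solve(X.T @ X, X.T @ y)
beta_hat_A = np.linalg.solve((X @ A).T @ (X @ A), (X @ A).T @ y)

print(np.allclose(beta_hat_A, np.linalg.solve(A, beta_hat)))  # True: beta_hat_A = A^{-1} beta_hat
```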
Proof by geometry:
Given bases $E_p$ and $E_n$ of $\mathbb{R}^p$ and $\mathbb{R}^n$, respectively, $X$ represents a linear transformation from $\mathbb{R}^p$ to $\mathbb{R}^n$. Right-multiplication of $X$ by $A$ can be considered as leaving this transformation fixed but changing $E_p$ to $AE_p$ (that is, to the columns of $A$). Under that change of basis, the representation of any vector $\hat\beta \in \mathbb{R}^p$ must change via left-multiplication by $A^{-1}$, QED.
(This proof works, unmodified, even when $X^\prime X$ is not invertible.)
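To make the change-of-basis step concrete, here is a tiny NumPy check (my own illustration; the vector and $A$ are arbitrary): if a vector has coordinates $\hat\beta$ in the standard basis, its coordinates in the basis formed by the columns of $A$ are $A^{-1}\hat\beta$.

```python
import numpy as np

# Change of basis: coordinates of a fixed vector in the basis given by the columns of A.
rng = np.random.default_rng(2)
p = 5
A = rng.normal(size=(p, p))      # its columns form the new basis (almost surely invertible)
beta = rng.normal(size=p)        # coordinates of a vector v in the standard basis
v = beta.copy()                  # in the standard basis, the vector equals its coordinates

gamma = np.linalg.solve(A, v)    # coordinates of the same v in the basis A, i.e. A^{-1} v
print(np.allclose(A @ gamma, v)) # True: same vector, expressed in the new basis
```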
The quotation specifically refers to the case of diagonal matrices $A$ with $A_{ii} = 1$ for $i \ne j$ and $A_{jj} = c$: then $A^{-1}$ is diagonal with $1/c$ in the $j$th position and $1$ elsewhere, so left-multiplying $\hat\beta$ by $A^{-1}$ divides $\hat\beta_j$ by $c$ and leaves the other coefficients unchanged.
Connection with least squares
The objective here is to use first principles to obtain the result, with the principle being that of least squares: estimating coefficients that minimize the sum of squares of residuals.
Again, proving a (huge) generalization is no more difficult and is rather revealing. Suppose
$$\phi: V^p \to W^n$$
is any map (linear or not) of real vector spaces and suppose $Q$ is any real-valued function on $W^n$. Let $U \subset V^p$ be the (possibly empty) set of points $v$ for which $Q(\phi(v))$ is minimized.

Result: $U$, which is determined solely by $Q$ and $\phi$, does not depend on any choice of basis $E_p$ used to represent vectors in $V^p$.
Proof: QED.
There’s nothing to prove!
Application of the result: Let $F$ be a positive semidefinite quadratic form on $\mathbb{R}^n$, let $y \in \mathbb{R}^n$, and suppose $\phi$ is a linear map represented by $X$ when bases of $V^p = \mathbb{R}^p$ and $W^n = \mathbb{R}^n$ are chosen. Define $Q(x) = F(y - x)$. Choose a basis of $\mathbb{R}^p$ and suppose $\hat\beta$ is the representation of some $v \in U$ in that basis. This is least squares: $x = X\hat\beta$ minimizes the squared distance $F(y - x)$. Because $X$ is a linear map, changing the basis of $\mathbb{R}^p$ corresponds to right-multiplying $X$ by some invertible matrix $A$. That will left-multiply $\hat\beta$ by $A^{-1}$, QED.
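A short numerical illustration of this invariance (again a NumPy sketch of my own, with arbitrary sizes and a random invertible $A$): the minimizing point $x = X\hat\beta$, i.e. the vector of fitted values, is unchanged by the change of basis, even though the coefficient vector itself is not.

```python
import numpy as np

# The minimizing point x = X beta_hat does not depend on the basis:
# refitting with XA changes the coefficients but not the fitted values.
rng = np.random.default_rng(3)
n, p = 40, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
A = rng.normal(size=(p, p))

beta_hat   = np.linalg.lstsq(X, y, rcond=None)[0]
beta_hat_A = np.linalg.lstsq(X @ A, y, rcond=None)[0]

print(np.allclose(X @ beta_hat, (X @ A) @ beta_hat_A))  # True: identical fitted values
print(np.allclose(beta_hat, beta_hat_A))                # False (in general): different coordinates
```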