Showing that the OLS estimator is scale equivariant?
I don’t have a formal definition of scale equivariance, but here’s what Introduction to Statistical Learning says about this on p. 217:
The standard least squares coefficients… are scale equivariant: multiplying $ X_j $ by a constant $ c $ simply leads to a scaling of the least squares coefficient estimates by a factor of $ 1/c $.
For simplicity, let’s assume the general linear model $ y = X\beta + \epsilon $, where $ y \in \mathbb{R}^n $, $ X $ is an $ n \times p $ matrix (where $ p < n $) with all entries in $ \mathbb{R} $, $ \beta \in \mathbb{R}^p $, and $ \epsilon $ is an $ n $-dimensional vector of real-valued random variables with $ \mathbb{E}[\epsilon] = 0 $.
From OLS estimation, we know that if $ X $ has full (column) rank, $$ \hat\beta = (X^\prime X)^{-1}X^\prime y. $$
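(As a sanity check on that formula, here is a minimal numerical sketch, with simulated data and hypothetical names throughout, comparing the normal-equations solution with numpy’s least-squares routine.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))                 # simulated full-column-rank design matrix
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

# Normal-equations estimate: (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Agrees with numpy's built-in least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))    # True
```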
Suppose we multiplied a column of $ X $, say $ x_j $ for some $ j \in \{1, \dots, p\} $, by a constant $ c \neq 0 $. This would be equivalent to the matrix $$ \tilde X = XS, \qquad S = \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & c & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix}, $$
where all other entries of the matrix $ S $ above are $ 0 $, and $ c $ is in the $ j $th entry of the diagonal of $ S $. Then $ \tilde X $ has full (column) rank as well, and the resulting OLS estimator using $ \tilde X $ as the new design matrix is $$ \tilde\beta = (\tilde X^\prime \tilde X)^{-1}\tilde X^\prime y. $$
After some work, one can show that $$ \tilde X^\prime \tilde X = S X^\prime X S $$
and $$ \tilde X^\prime y = S X^\prime y. $$
How do I go from here to show the claim quoted above (i.e., that $ \tilde\beta_j = \hat\beta_j / c $)? It’s not clear to me how to compute $ (S X^\prime X S)^{-1} $.
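For what it’s worth, a quick numerical check (again a hypothetical sketch with simulated data) does agree with the quoted claim, dividing the $ j $th coefficient by $ c $ and leaving the others untouched, so what I’m missing is only the algebraic derivation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
j, c = 1, 7.0                               # rescale (0-based) column j by c; values are arbitrary
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

S = np.eye(p)
S[j, j] = c                                 # diagonal matrix with c in the jth diagonal entry
X_tilde = X @ S                             # same as multiplying column j of X by c

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
beta_tilde = np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ y)

print(np.allclose(beta_tilde[j], beta_hat[j] / c))                     # True
print(np.allclose(np.delete(beta_tilde, j), np.delete(beta_hat, j)))   # True: other coefficients unchanged
```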
Since the assertion in the quotation is a collection of statements about rescaling the columns of $ X $ , you might as well prove them all at once. Indeed, it takes no more work to prove a generalization of the assertion:
When $ X $ is right-multiplied by an invertible matrix $ A $, the new coefficient estimate $ \hat\beta_A $ is equal to $ \hat\beta $ left-multiplied by $ A^{-1} $; that is, $ \hat\beta_A = A^{-1}\hat\beta $.
The only algebraic facts you need are the easily proven, well-known ones that $ (AB)^\prime=B^\prime A^\prime $ for any conformable matrices $ A $ and $ B $ and that $ (AB)^{-1}=B^{-1}A^{-1} $ for invertible matrices $ A $ and $ B $. (A subtler version of the latter is needed when working with generalized inverses: for invertible $ A $ and $ B $ and any $ X $, $ B^{-1}X^{-}A^{-1} $ is a generalized inverse of $ AXB $.)
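As an aside, that subtler fact is easy to illustrate numerically. The sketch below (hypothetical numpy code with a deliberately rank-deficient $ X $) checks the defining property of a generalized inverse, $ M M^{-} M = M $, rather than equality with any particular pseudoinverse, because generalized inverses are not unique.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 6, 4
X = rng.normal(size=(n, p))
X[:, 3] = X[:, 0] + X[:, 1]        # make X rank-deficient, so no true inverse exists
A = rng.normal(size=(n, n))        # invertible (almost surely)
B = rng.normal(size=(p, p))        # invertible (almost surely)

X_ginv = np.linalg.pinv(X)         # the Moore-Penrose inverse is one generalized inverse of X
M = A @ X @ B
candidate = np.linalg.inv(B) @ X_ginv @ np.linalg.inv(A)

# candidate is a generalized inverse of M: it satisfies M @ candidate @ M == M
print(np.allclose(M @ candidate @ M, M))   # True
```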
Proof by algebra: $$ \hat\beta_A = \left((XA)^\prime (XA)\right)^{-}(XA)^\prime y = A^{-1}(X^\prime X)^{-}(A^\prime)^{-1}A^\prime X^\prime y = A^{-1}(X^\prime X)^{-}X^\prime y = A^{-1}\hat \beta, $$
QED. (In order for this proof to be fully general, the $ ^- $ superscript refers to a generalized inverse.)
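If it helps, here is the general statement seen numerically (a hypothetical sketch with simulated data, assuming $ X $ has full column rank so the estimates are unique): refitting after right-multiplying the design matrix by a random invertible $ A $ returns exactly $ A^{-1}\hat\beta $.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 4
X = rng.normal(size=(n, p))                        # full column rank (almost surely)
y = rng.normal(size=n)
A = rng.normal(size=(p, p))                        # invertible (almost surely)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]    # fit with design X
beta_A = np.linalg.lstsq(X @ A, y, rcond=None)[0]  # fit with design XA

print(np.allclose(beta_A, np.linalg.solve(A, beta_hat)))   # True: beta_A = A^{-1} beta_hat
```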
Proof by geometry:
Given bases $ E_p $ and $ E_n $ of $ \mathbb{R}^p $ and $ \mathbb{R}^n $, respectively, $ X $ represents a linear transformation from $ \mathbb{R}^p $ to $ \mathbb{R}^n $. Right-multiplication of $ X $ by $ A $ can be considered as leaving this transformation fixed but changing $ E_p $ to $ AE_p $ (that is, to the basis whose vectors are given, in $ E_p $ coordinates, by the columns of $ A $). Under that change of basis, the representation of any vector $ \hat\beta\in\mathbb{R}^p $ must change via left-multiplication by $ A^{-1} $, QED.
(This proof works, unmodified, even when $ X^\prime X $ is not invertible.)
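Concretely (another hypothetical numpy sketch): the fitted vector $ X\hat\beta $, which is the point of the column space closest to $ y $, is the same whether the design matrix is $ X $ or $ XA $; only its coordinate representation changes, via $ A^{-1} $.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
A = rng.normal(size=(p, p))            # a random change of basis (invertible almost surely)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
beta_A = np.linalg.lstsq(X @ A, y, rcond=None)[0]

# The fitted vector (the projection of y onto the column space) is basis-free...
print(np.allclose(X @ beta_hat, (X @ A) @ beta_A))          # True
# ...while its coordinate representation changes by A^{-1}.
print(np.allclose(beta_A, np.linalg.solve(A, beta_hat)))    # True
```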
The quotation specifically refers to the case of diagonal matrices $ A $ with $ A_{ii}=1 $ for $ i\ne j $ and $ A_{jj}=c $. In that case $ A^{-1} $ is diagonal with $ (A^{-1})_{jj}=1/c $ and $ (A^{-1})_{ii}=1 $ for $ i\ne j $, so left-multiplying $ \hat\beta $ by $ A^{-1} $ divides the $ j $th coefficient estimate by $ c $ and leaves all the others unchanged, which is exactly the quoted claim.
Connection with least squares
The objective here is to obtain the result from first principles, the principle in question being that of least squares: estimate the coefficients so as to minimize the sum of squared residuals.
Again, proving a (huge) generalization is no more difficult and is rather revealing. Suppose $$ \phi:V^p\to W^n $$ is any map (linear or not) of real vector spaces and suppose $ Q $ is any real-valued function on $ W^n $. Let $ U\subset V^p $ be the (possibly empty) set of points $ v $ for which $ Q(\phi(v)) $ is minimized.
Result: $ U $ , which is determined solely by $ Q $ and $ \phi $ , does not depend on any choice of basis $ E_p $ used to represent vectors in $ V^p $ .
Proof: QED.
There’s nothing to prove!
Application of the result: Let $ F $ be a positive semidefinite quadratic form on $ \mathbb{R}^n $, let $ y\in\mathbb{R}^n $, and suppose $ \phi $ is a linear map represented by $ X $ once bases of $ V^p=\mathbb{R}^p $ and $ W^n=\mathbb{R}^n $ are chosen. Define $ Q(x)=F(y-x) $. Choose a basis of $ \mathbb{R}^p $ and suppose $ \hat\beta $ is the representation of some $ v\in U $ in that basis. This is least squares: $ x=X\hat\beta $ minimizes the squared distance $ F(y-x) $ (ordinary least squares takes $ F(u)=u^\prime u $). Because $ \phi $ is linear, changing the basis of $ \mathbb{R}^p $ corresponds to right-multiplying $ X $ by some invertible matrix $ A $. That will left-multiply $ \hat\beta $ by $ A^{-1} $, QED.
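To tie this back to computation, here is one more hypothetical numpy sketch with simulated data and a deliberately rank-deficient design (echoing the earlier remark that invertibility of $ X^\prime X $ is not needed): the minimized sum of squares and the fitted vector are identical for $ X $ and $ XA $, even though the coefficient representations differ.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 4
base = rng.normal(size=(n, p - 1))
X = np.column_stack([base, base[:, 0] + base[:, 1]])   # rank-deficient design: X'X is singular
y = rng.normal(size=n)
A = rng.normal(size=(p, p))                            # change of basis (invertible almost surely)

def sum_sq(design, b):
    """The least-squares criterion Q(phi(v)) = ||y - design @ b||^2."""
    r = y - design @ b
    return r @ r

beta = np.linalg.lstsq(X, y, rcond=None)[0]        # one minimizer in the original basis
beta_A = np.linalg.lstsq(X @ A, y, rcond=None)[0]  # one minimizer after the change of basis

# The minimized criterion and the fitted vector are basis-free,
# even though the coefficient representations differ.
print(np.isclose(sum_sq(X, beta), sum_sq(X @ A, beta_A)))   # True
print(np.allclose(X @ beta, (X @ A) @ beta_A))              # True
```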