Showing that the OLS estimator is scale equivariant?
I don’t have a formal definition of scale equivariance, but here’s what Introduction to Statistical Learning says about this on p. 217:
The standard least squares coefficients… are scale equivariant: multiplying $X_j$ by a constant $c$ simply leads to a scaling of the least squares coefficient estimates by a factor of $1/c$.
For simplicity, let’s assume the general linear model $\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon$, where $\mathbf{y} \in \mathbb{R}^n$, $\mathbf{X}$ is an $n \times p$ matrix (where $n > p$) with all entries in $\mathbb{R}$, $\boldsymbol\beta \in \mathbb{R}^p$, and $\boldsymbol\epsilon$ is an $n$-dimensional vector of real-valued random variables with $\mathbb{E}[\boldsymbol\epsilon] = \mathbf{0}$.
From OLS estimation, we know that if $\mathbf{X}$ has full (column) rank,
$$\hat{\boldsymbol\beta}_{\mathbf{X}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y}.$$
Suppose we multiplied a column of $\mathbf{X}$, say $\mathbf{x}_j$ for some $j \in \{1, \dots, p\}$, by a constant $c \neq 0$. This would be equivalent to the matrix
$$\mathbf{X}\mathbf{S} = \mathbf{X}\begin{bmatrix}
1 & & & & \\
 & \ddots & & & \\
 & & c & & \\
 & & & \ddots & \\
 & & & & 1
\end{bmatrix},$$
where all other entries of the matrix $\mathbf{S}$ above are $0$, and $c$ is in the $j$th entry of the diagonal of $\mathbf{S}$. Then, $\mathbf{X}\mathbf{S}$ has full (column) rank as well, and the resulting OLS estimator using $\mathbf{X}\mathbf{S}$ as the new design matrix is
$$\hat{\boldsymbol\beta}_{\mathbf{X}\mathbf{S}} = \left[(\mathbf{X}\mathbf{S})^{T}\mathbf{X}\mathbf{S}\right]^{-1}(\mathbf{X}\mathbf{S})^{T}\mathbf{y}.$$
After some work, one can show that
$$(\mathbf{X}\mathbf{S})^{T}\mathbf{X}\mathbf{S} = \mathbf{S}^{T}\mathbf{X}^{T}\mathbf{X}\mathbf{S}$$
and
$$(\mathbf{X}\mathbf{S})^{T}\mathbf{y} = \mathbf{S}^{T}\mathbf{X}^{T}\mathbf{y}.$$
How do I go from here to show the claim quoted above (i.e., that multiplying $\mathbf{x}_j$ by $c$ scales $\hat{\beta}_j$ by a factor of $1/c$)? It’s not clear to me how to compute $\left[\mathbf{S}^{T}\mathbf{X}^{T}\mathbf{X}\mathbf{S}\right]^{-1}$.
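For concreteness, here is a small numerical check of the claimed behavior (a rough NumPy sketch of my own; the dimensions, $j$, and $c$ are arbitrary choices), which confirms that only the $j$th coefficient changes, and it is divided by $c$:

```python
import numpy as np

# Numerical sanity check of the quoted claim (arbitrary sizes; j and c chosen arbitrarily).
rng = np.random.default_rng(0)
n, p, j, c = 50, 4, 2, 3.7
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

S = np.eye(p)
S[j, j] = c                      # the matrix S above: multiply column j of X by c
beta_hat_XS = np.linalg.lstsq(X @ S, y, rcond=None)[0]

print(np.allclose(beta_hat_XS[j], beta_hat[j] / c))       # True: beta_j is scaled by 1/c
mask = np.arange(p) != j
print(np.allclose(beta_hat_XS[mask], beta_hat[mask]))     # True: other coefficients unchanged
```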
Since the assertion in the quotation is a collection of statements about rescaling the columns of $X$, you might as well prove them all at once. Indeed, it takes no more work to prove a generalization of the assertion:
When $X$ is right-multiplied by an invertible matrix $A$, then the new coefficient estimate $\hat\beta_A$ is equal to $\hat\beta$ left-multiplied by $A^{-1}$.
The only algebraic facts you need are the (easily proven, well-known) ones that $(AB)^\prime = B^\prime A^\prime$ for any matrices $A, B$ and $(AB)^{-1} = B^{-1}A^{-1}$ for invertible matrices $A$ and $B$. (A subtler version of the latter is needed when working with generalized inverses: for invertible $A$ and $B$ and any $X$, $(AXB)^{-} = B^{-1}X^{-}A^{-1}$.)
Proof by algebra:
$$\hat\beta_A = \left((XA)^\prime (XA)\right)^{-}(XA)^\prime y = A^{-1}\left(X^\prime X\right)^{-}(A^\prime)^{-1}A^\prime X^\prime y = A^{-1}\left(X^\prime X\right)^{-}X^\prime y = A^{-1}\hat\beta,$$
QED. (In order for this proof to be fully general, the $^{-}$ superscript refers to a generalized inverse.)
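For readers who want to see the identity numerically, here is a minimal NumPy sketch of my own (with a full-rank $X$, so the ordinary inverse stands in for the generalized inverse; the sizes and the random $A$ are arbitrary):

```python
import numpy as np

# Check beta_hat_A = A^{-1} beta_hat for a random invertible A.
# X has full column rank (almost surely), so ordinary inverses suffice here.
rng = np.random.default_rng(1)
n, p = 60, 5
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
A = rng.normal(size=(p, p))      # a p x p Gaussian matrix is almost surely invertible

beta_hat   = np.linalg.solve(X.T @ X, X.T @ y)
beta_hat_A = np.linalg.solve((X @ A).T @ (X @ A), (X @ A).T @ y)

print(np.allclose(beta_hat_A, np.linalg.solve(A, beta_hat)))  # True: beta_hat_A = A^{-1} beta_hat
```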
Proof by geometry:
Given bases $E_p$ and $E_n$ of $\mathbb{R}^p$ and $\mathbb{R}^n$, respectively, $X$ represents a linear transformation from $\mathbb{R}^p$ to $\mathbb{R}^n$. Right-multiplication of $X$ by $A$ can be considered as leaving this transformation fixed but changing $E_p$ to $AE_p$ (that is, to the columns of $A$). Under that change of basis, the representation of any vector $\hat\beta \in \mathbb{R}^p$ must change via left-multiplication by $A^{-1}$, QED.
(This proof works, unmodified, even when $X^\prime X$ is not invertible.)
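To make the change-of-basis step concrete, here is a tiny NumPy check (my own illustration; the vector and $A$ are arbitrary): if a vector has coordinates $\hat\beta$ in the standard basis, its coordinates in the basis formed by the columns of $A$ are $A^{-1}\hat\beta$.

```python
import numpy as np

# Change of basis: coordinates of a fixed vector in the basis given by the columns of A.
rng = np.random.default_rng(2)
p = 5
A = rng.normal(size=(p, p))      # its columns form the new basis (almost surely invertible)
beta = rng.normal(size=p)        # coordinates of a vector v in the standard basis
v = beta.copy()                  # in the standard basis, the vector equals its coordinates

gamma = np.linalg.solve(A, v)    # coordinates of the same v in the basis A, i.e. A^{-1} v
print(np.allclose(A @ gamma, v)) # True: same vector, expressed in the new basis
```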
The quotation specifically refers to the case of diagonal matrices $A$ with $A_{ii} = 1$ for $i \ne j$ and $A_{jj} = c$: then $A^{-1}$ is diagonal with $1/c$ in the $j$th position and $1$ elsewhere, so left-multiplying $\hat\beta$ by $A^{-1}$ divides $\hat\beta_j$ by $c$ and leaves the other coefficients unchanged.
Connection with least squares
The objective here is to use first principles to obtain the result, with the principle being that of least squares: estimating coefficients that minimize the sum of squares of residuals.
Again, proving a (huge) generalization is no more difficult and is rather revealing. Suppose
$$\phi: V^p \to W^n$$
is any map (linear or not) of real vector spaces and suppose $Q$ is any real-valued function on $W^n$. Let $U \subset V^p$ be the (possibly empty) set of points $v$ for which $Q(\phi(v))$ is minimized.

Result: $U$, which is determined solely by $Q$ and $\phi$, does not depend on any choice of basis $E_p$ used to represent vectors in $V^p$.
Proof: QED.
There’s nothing to prove!
Application of the result: Let $F$ be a positive semidefinite quadratic form on $\mathbb{R}^n$, let $y \in \mathbb{R}^n$, and suppose $\phi$ is a linear map represented by $X$ when bases of $V^p = \mathbb{R}^p$ and $W^n = \mathbb{R}^n$ are chosen. Define $Q(x) = F(y - x)$. Choose a basis of $\mathbb{R}^p$ and suppose $\hat\beta$ is the representation of some $v \in U$ in that basis. This is least squares: $x = X\hat\beta$ minimizes the squared distance $F(y - x)$. Because $X$ is a linear map, changing the basis of $\mathbb{R}^p$ corresponds to right-multiplying $X$ by some invertible matrix $A$. That will left-multiply $\hat\beta$ by $A^{-1}$, QED.
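A short numerical illustration of this invariance (again a NumPy sketch of my own, with arbitrary sizes and a random invertible $A$): the minimizing point $x = X\hat\beta$, i.e. the vector of fitted values, is unchanged by the change of basis, even though the coefficient vector itself is not.

```python
import numpy as np

# The minimizing point x = X beta_hat does not depend on the basis:
# refitting with XA changes the coefficients but not the fitted values.
rng = np.random.default_rng(3)
n, p = 40, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
A = rng.normal(size=(p, p))

beta_hat   = np.linalg.lstsq(X, y, rcond=None)[0]
beta_hat_A = np.linalg.lstsq(X @ A, y, rcond=None)[0]

print(np.allclose(X @ beta_hat, (X @ A) @ beta_hat_A))  # True: identical fitted values
print(np.allclose(beta_hat, beta_hat_A))                # False (in general): different coordinates
```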