Trying to calculate the Gini index of the StackOverflow reputation distribution?
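I'm trying to calculate the Gini index of the SO reputation distribution using the SO Data Explorer. The equation I'm trying to implement is this:

$$G = \frac{1}{n}\left(n + 1 - 2\,\frac{\sum_{i=1}^{n} (n + 1 - i)\, y_i}{\sum_{i=1}^{n} y_i}\right)$$

where $n$ = number of users on the site; $i$ = user serial number (1 to 1,225,000); $y_i$ = reputation of user $i$. This is how I implemented it (copied from here):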
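The formula G = (1/n)(n + 1 - 2·Σ(n + 1 - i)·y_i / Σy_i) can be checked on small inputs with a minimal Python sketch like the one below (the function name `gini` is only for illustration). It sorts the values ascending first, because this form of the formula, as given on Wikipedia's Gini coefficient page, assumes the y_i are indexed in non-decreasing order, so i is the rank of each value rather than an arbitrary id.

```python
def gini(values):
    """Gini index via G = (1/n) * (n + 1 - 2 * sum((n + 1 - i) * y_i) / sum(y_i)).

    The formula assumes y_1 <= y_2 <= ... <= y_n, so the values are sorted
    ascending first and i is the 1-based rank of each value.
    """
    y = sorted(values)
    n = len(y)
    total = sum(y)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a non-zero total")
    # (n + 1 - i) gives the smallest value the largest weight.
    weighted = sum((n + 1 - i) * y_i for i, y_i in enumerate(y, start=1))
    return (n + 1 - 2 * weighted / total) / n


print(gini([1, 2, 3, 4, 5]))  # ≈ 0.2667
print(gini([3, 3, 3, 3]))     # 0.0 (perfect equality)
```

Sorting inside the function is the design choice that matters here: the weights only make sense relative to the ordering of the values.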
```sql
DECLARE @numUsers int
SELECT @numUsers = COUNT(*) FROM Users

DECLARE @totalRep float
SELECT @totalRep = SUM(Users.Reputation) FROM Users

DECLARE @giniNominator float
SELECT @giniNominator = SUM(
    (@numUsers + 1 - CAST(Users.Id as Float)) * CAST(Users.Reputation as Float))
FROM Users

DECLARE @giniCalc float
SELECT @giniCalc = (@numUsers + 1 - 2*(@giniNominator / @totalRep)) / @numUsers

SELECT @giniCalc
```
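My result is (currently) -0.53, but that makes no sense: I'm not even sure how it can come out negative, and even in absolute value I'd expect the inequality to be much closer to 1, given how reputation tends to snowball the more of it you have.
Am I unknowingly ignoring some assumption about the reputation/user distribution?
What am I doing wrong?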
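One likely cause can be sketched in Python: the formula needs i to be each user's rank in ascending order of reputation, but the query uses `Users.Id`, which reflects account order rather than reputation rank. If lower Ids tend to hold more reputation (plausible for older accounts, but an assumption here), i runs roughly in descending order of reputation, and the formula then returns approximately the negative of the Gini index. If Ids are not contiguous (for example because of deleted accounts, again an assumption about the data), some Ids exceed n, making n + 1 - i negative. The helper `gini_with_index` below is hypothetical; it evaluates the same expression as the T-SQL query with an explicit index.

```python
def gini_with_index(values, index):
    """The question's formula, with i taken from `index` (standing in for
    CAST(Users.Id AS float)) instead of each value's ascending rank."""
    if len(values) != len(index):
        raise ValueError("values and index must have the same length")
    n = len(values)
    total = sum(values)
    weighted = sum((n + 1 - i) * y for i, y in zip(index, values))
    return (n + 1 - 2 * weighted / total) / n


reps = [1, 2, 3, 4, 5]

# i is the ascending rank of each value: the intended Gini index.
print(gini_with_index(reps, [1, 2, 3, 4, 5]))  # ≈ 0.2667

# Hypothetical ids where the lowest id holds the highest value:
# i runs in descending order of value and the sign flips.
print(gini_with_index(reps, [5, 4, 3, 2, 1]))  # ≈ -0.2667

# Hypothetical ids with gaps (some ids > n): n + 1 - i goes negative
# and the result falls outside [0, 1].
print(gini_with_index(reps, [1, 2, 3, 7, 9]))  # ≈ 1.12
```

In SQL the fix would follow the same idea: derive i from a ranking by reputation (ascending) rather than from `Users.Id`.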
Here's how to calculate it in SQL (BigQuery Standard SQL):
```sql
with balances as (
  select '2018-01-01' as date, balance from unnest([1,2,3,4,5]) as balance -- Gini coef: 0.2666666666666667
  union all
  select '2018-01-02' as date, balance from unnest([3,3,3,3]) as balance -- Gini coef: 0.0
  union all
  select '2018-01-03' as date, balance from unnest([4,5,1,8,6,45,67,1,4,11]) as balance -- Gini coef: 0.625
),
ranked_balances as (
  select
    date,
    balance,
    row_number() over (partition by date order by balance desc) as rank
  from balances
)
SELECT
  date,
  -- (1 − 2B) https://en.wikipedia.org/wiki/Gini_coefficient
  1 - 2 * sum((balance * (rank - 1) + balance / 2)) / count(*) / sum(balance) AS gini
FROM ranked_balances
GROUP BY date
ORDER BY date ASC
-- verify here http://shlegeris.com/gini
```
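The arithmetic of this query can be mirrored in a short Python sketch (the function name `gini_lorenz` is illustrative only). It ranks the balances in descending order, as `row_number() ... order by balance desc` does, sums `balance * (rank - 1) + balance / 2`, divides by n and by the total to get the area B under the piecewise-linear Lorenz curve, and returns 1 - 2B. Python's `/` always yields a float here, matching BigQuery's `/`.

```python
def gini_lorenz(balances):
    """Mirror of the BigQuery query: 1 - 2B, where B is the area under
    the piecewise-linear Lorenz curve of the balances."""
    n = len(balances)
    total = sum(balances)
    # row_number() over (order by balance desc), with rank starting at 1
    ranked = sorted(balances, reverse=True)
    # sum((balance * (rank - 1) + balance / 2))
    area_terms = sum(b * (rank - 1) + b / 2 for rank, b in enumerate(ranked, start=1))
    # / count(*) / sum(balance) gives B
    return 1 - 2 * area_terms / n / total


print(gini_lorenz([1, 2, 3, 4, 5]))                    # ≈ 0.2667
print(gini_lorenz([3, 3, 3, 3]))                       # 0.0
print(gini_lorenz([4, 5, 1, 8, 6, 45, 67, 1, 4, 11]))  # 0.625
```

These reproduce the values in the query's comments, and they agree with the rank-based formula from the question once i is the ascending reputation rank.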
The explanation is here: https://medium.com/@medvedev1088/calculating-gini-coefficient-in-bigquery-3bc162c82168