Gini

Trying to compute the Gini index on the StackOverflow reputation distribution?

  • August 25, 2012

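I'm trying to compute the Gini index on the SO reputation distribution using the SO Data Explorer. The equation I'm trying to implement is this:

$$G = \frac{1}{n}\left(n + 1 - 2\,\frac{\sum_{i=1}^{n} (n + 1 - i)\, y_i}{\sum_{i=1}^{n} y_i}\right)$$

where $n$ = the number of users on the site; $i$ = the user serial number (1 to 1,225,000); $y_i$ = the reputation of user $i$. This is how I implemented it (copied from here):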

DECLARE @numUsers int
SELECT @numUsers = COUNT(*) FROM Users
DECLARE @totalRep float
SELECT @totalRep = SUM(Users.Reputation) FROM Users
DECLARE @giniNominator float
SELECT @giniNominator = SUM( (@numUsers + 1 - CAST(Users.Id as Float)) * 
                             CAST(Users.Reputation as Float)) FROM Users
DECLARE @giniCalc float
SELECT @giniCalc = (@numUsers + 1 - 2*(@giniNominator / @totalRep)) / @numUsers
SELECT @giniCalc

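My result is (currently) -0.53, which makes no sense: I'm not even sure how it could come out negative, and even in absolute value I would have expected the inequality to be much closer to 1, given how reputation grows faster the more of it you have.

Am I unknowingly ignoring some assumption about the reputation/user distribution?

What am I doing wrong?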

Here is how to compute it in SQL (BigQuery syntax):

with balances as (
   select '2018-01-01' as date, balance
   from unnest([1,2,3,4,5]) as balance -- Gini coef: 0.2666666666666667
   union all
   select '2018-01-02' as date, balance
   from unnest([3,3,3,3]) as balance -- Gini coef: 0.0
   union all
   select '2018-01-03' as date, balance
   from unnest([4,5,1,8,6,45,67,1,4,11]) as balance -- Gini coef: 0.625
),
ranked_balances as (
   select date, balance, row_number() over (partition by date order by balance desc) as rank
   from balances
)
SELECT date, 
   -- (1 − 2B) https://en.wikipedia.org/wiki/Gini_coefficient
   1 - 2 * sum((balance * (rank - 1) + balance / 2)) / count(*) / sum(balance) AS gini
FROM ranked_balances
GROUP BY date
ORDER BY date ASC
-- verify here http://shlegeris.com/gini
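
As a rough cross-check outside the database, the same 1 − 2B calculation can be sketched in Python. This is only an illustration: the function name `gini_lorenz` and the `samples` dictionary are made up for this sketch, and the expected values are the ones given in the query's own comments.

```python
def gini_lorenz(balances):
    """Gini as 1 - 2B, where B is the area under the Lorenz curve."""
    # Same ordering as the query: balance desc.
    ordered = sorted(balances, reverse=True)
    n = len(ordered)
    total = sum(ordered)
    # SQL rank is 1-based, so (rank - 1) is the 0-based index k here.
    area = sum(b * k + b / 2 for k, b in enumerate(ordered))
    return 1 - 2 * area / n / total


# The three sample partitions from the query above.
samples = {
    "2018-01-01": [1, 2, 3, 4, 5],                        # query comment: 0.2666666666666667
    "2018-01-02": [3, 3, 3, 3],                           # query comment: 0.0
    "2018-01-03": [4, 5, 1, 8, 6, 45, 67, 1, 4, 11],      # query comment: 0.625
}

results = {date: gini_lorenz(values) for date, values in samples.items()}
for date, g in results.items():
    print(date, round(g, 4))
```

Sorting inside the function plays the role of the `ranked_balances` step, so the input order does not matter.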

The explanation is here: https://medium.com/@medvedev1088/calculating-gini-coefficient-in-bigquery-3bc162c82168
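
One likely reason for the negative value in the question: the formula assumes the y_i are sorted in non-decreasing order, so i has to be each user's rank by reputation, whereas the T-SQL plugs in Users.Id. A minimal Python sketch shows the effect. The (Id, Reputation) pairs are invented for illustration, with the earliest accounts holding the most reputation; that pattern is an assumption, not data from SO, and `gini_weighted` is a hypothetical helper name.

```python
def gini_weighted(values, positions):
    """Apply G = (n + 1 - 2 * sum((n + 1 - i) * y_i) / sum(y_i)) / n
    using whatever positions i are supplied."""
    n = len(values)
    total = sum(values)
    weighted = sum((n + 1 - i) * y for i, y in zip(positions, values))
    return (n + 1 - 2 * weighted / total) / n


# Hypothetical (Id, Reputation) pairs: low Ids hold the most reputation.
users = [(1, 67), (2, 45), (3, 11), (4, 8), (5, 6),
         (6, 5), (7, 4), (8, 4), (9, 1), (10, 1)]
ids = [uid for uid, _ in users]
reps = [rep for _, rep in users]

# What the question's query effectively does: i = Users.Id.
gini_by_id = gini_weighted(reps, ids)

# What the formula expects: i = rank by ascending reputation.
ordered = sorted(reps)
gini_by_rank = gini_weighted(ordered, range(1, len(ordered) + 1))

print(gini_by_id, gini_by_rank)
```

On this toy data the Id-based version comes out negative while the rank-based version is positive. In SQL the rank could come from something like `ROW_NUMBER() OVER (ORDER BY Reputation)`, in the same spirit as the `ranked_balances` step in the query above.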

Quoted from: https://stats.stackexchange.com/questions/35081
