如何使用 R 計算“通往白宮的路徑”？

November 4, 2012

我剛剛遇到了這個偉大的分析，它在視覺上既有趣又美麗：

http://www.nytimes.com/interactive/2012/11/02/us/politics/paths-to-the-white-house.html

我很好奇如何使用 R 構建這樣的“路徑樹”。構建這樣的路徑樹需要什麼數據和算法？

謝謝。

使用遞歸解決方案是很自然的。

數據必須包括參與競選的州名單、他們的選舉人票以及對左翼（“藍色”）候選人的假定起始優勢。（一個值接近於再現紐約時報的圖形。）在每一步，都會檢查兩種可能性（左贏或輸）；優勢更新；如果此時可以根據剩餘的票數確定結果（贏、輸或平局），則計算停止；否則，對列表中的剩餘狀態遞歸重複。因此：

paths.compute <- function(start, options, states) {
 if (start > sum(options)) x <- list(Id="O", width=1)
 else if (start < -sum(options)) x <- list(Id="R", width=1)
 else if (length(options) == 0 && start == 0) x <- list(Id="*", width=1)
 else {
   l <- paths.compute(start+options[1], options[-1], states[-1])
   r <- paths.compute(start-options[1], options[-1], states[-1])
   x <- list(Id=states[1], L=l, R=r, width=l$width+r$width, node=TRUE)
 }
 class(x) <- "path"
 return(x)
}

states <- c("FL", "OH", "NC", "VA", "WI", "CO", "IA", "NV", "NH")
votes <- c(29, 18, 15, 13, 10, 9, 5, 6, 4)
p <- paths.compute(47, votes, states)

這有效地修剪了每個節點的樹，比探索所有節點需要更少的計算可能的結果。其餘的只是圖形細節，所以我將只討論對有效可視化至關重要的算法部分。

完整的程序如下。它以適度靈活的方式編寫，使用戶能夠調整許多參數。繪圖算法的關鍵部分是樹形佈局。為此，plot.path使用一個width字段將剩餘的水平空間按比例分配給每個節點的兩個後代。該字段最初計算paths.compute為每個節點下的葉子（後代）總數。（如果不進行一些這樣的計算，並且在每個節點處將二叉樹簡單地分成兩半，那麼到第 9 個狀態時，只有每片葉子可用的總寬度，這太窄了。任何開始在紙上畫二叉樹的人很快就會遇到這個問題！）

節點的垂直位置以幾何級數排列（具有公比a），以便在樹的較深部分間距更近。樹枝的粗細和葉子符號的大小也按深度縮放。（這會導致葉子上的圓形符號出現問題，因為它們的縱橫比會隨著a變化而變化。我沒有費心去解決這個問題。）

paths.compute <- function(start, options, states) {
 if (start > sum(options)) x <- list(Id="O", width=1)
 else if (start < -sum(options)) x <- list(Id="R", width=1)
 else if (length(options) == 0 && start == 0) x <- list(Id="*", width=1)
 else {
   l <- paths.compute(start+options[1], options[-1], states[-1])
   r <- paths.compute(start-options[1], options[-1], states[-1])
   x <- list(Id=states[1], L=l, R=r, width=l$width+r$width, node=TRUE)
 }
 class(x) <- "path"
 return(x)
}

plot.path <- function(p, depth=0, x0=1/2, y0=1, u=0, v=1, a=.9, delta=0,
              x.offset=0.01, thickness=12, size.leaf=4, decay=0.15, ...) {
 #
 # Graphical symbols
 #
 cyan <- rgb(.25, .5, .8, .5); cyan.full <- rgb(.625, .75, .9, 1)
 magenta <- rgb(1, .7, .775, .5); magenta.full <- rgb(1, .7, .775, 1)
 gray <- rgb(.95, .9, .4, 1)
 #
 # Graphical elements: circles and connectors.
 #
 circle <- function(center, radius, n.points=60) {
   z <- (1:n.points) * 2 * pi / n.points
   t(rbind(cos(z), sin(z)) * radius + center)
 }
 connect <- function(x1, x2, veer=0.45, n=15, ...){
   x <- seq(x1[1], x1[2], length.out=5)
   y <- seq(x2[1], x2[2], length.out=5)
   y[2] = veer * y[3] + (1-veer) * y[2]
   y[4] = veer * y[3] + (1-veer) * y[4]
   s = spline(x, y, n)
   lines(s$x, s$y, ...)
 }
 #
 # Plot recursively:
 #
 scale <- exp(-decay * depth)
 if (is.null(p$node)) {
   if (p$Id=="O") {dx <- -y0; color <- cyan.full} 
   else if (p$Id=="R") {dx <- y0; color <- magenta.full}
   else {dx = 0; color <- gray}
   polygon(circle(c(x0 + dx*x.offset, y0), size.leaf*scale/100), col=color, border=NA)
   text(x0 + dx*x.offset, y0, p$Id, cex=size.leaf*scale)
 } else {  
   mid <- ((delta+p$L$width) * v + (delta+p$R$width) * u) / (p$L$width + p$R$width + 2*delta)
   connect(c(x0, (x0+u)/2), c(y0, y0 * a), lwd=thickness*scale, col=cyan, ...)
   connect(c(x0, (x0+v)/2), c(y0, y0 * a), lwd=thickness*scale, col=magenta,  ...)
   plot(p$L, depth=depth+1, x0=(x0+u)/2, y0=y0*a, u, mid, a, delta, x.offset, thickness, size.leaf, decay, ...)
   plot(p$R, depth=depth+1, x0=(x0+v)/2, y0=y0*a, mid, v, a, delta, x.offset, thickness, size.leaf, decay, ...)
 }
}

plot.grid <- function(p, y0=1, a=.9, col.text="Gray", col.line="White", ...) {
 #
 # Plot horizontal lines and identifiers.
 #
 if (!is.null(p$node)) {
   abline(h=y0, col=col.line, ...)
   text(0.025, y0*1.0125, p$Id, cex=y0, col=col.text, ...)
   plot.grid(p$L, y0=y0*a, a, col.text, col.line, ...)
   plot.grid(p$R, y0=y0*a, a, col.text, col.line, ...)
 }
}

states <- c("FL", "OH", "NC", "VA", "WI", "CO", "IA", "NV", "NH")
votes <- c(29, 18, 15, 13, 10, 9, 5, 6, 4)
p <- paths.compute(47, votes, states)

a <- 0.925
eps <- 1/26
y0 <- a^10; y1 <- 1.05

mai <- par("mai")
par(bg="White", mai=c(eps, eps, eps, eps))
plot(c(0,1), c(a^10, 1.05), type="n", xaxt="n", yaxt="n", xlab="", ylab="")
rect(-eps, y0 - eps * (y1 - y0), 1+eps, y1 + eps * (y1-y0), col="#f0f0f0", border=NA)
plot.grid(p, y0=1, a=a, col="White", col.text="#888888")
plot(p, a=a, delta=40, thickness=12, size.leaf=4, decay=0.2)
par(mai=mai)

引用自：https://stats.stackexchange.com/questions/41851

如何使用 R 計算“通往白宮的路徑”？

相關問答

確定兩個變量是線性還是非線性甚至不相關的最佳編程方法是什麼

為什麼隨機森林圖中有很多條線？

用於教學目的的著名統計勝利和恐怖故事

這個帶有兩個連接點的行的圖的名稱是什麼？

ReLu 激活的輸出分佈是什麼？

箱線圖的歷史是什麼，“箱線圖”設計是如何演變的？