Variable importance from GLMNET
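function are shown on the original variable scale)? If so, how can this be done (using the standard deviations of x and y) to standardize the regression coefficients? Example code: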
    library(glmnet)

    # data comes from
    # http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
    datasetTest <- read.csv('C:/Documents and Settings/E997608/Desktop/wdbc.data.txt', header=FALSE)

    # recode the diagnosis: M -> "0", B -> "1"
    # (per the glmnet docs, the last factor level is the target class
    # for family="binomial", not the first)
    datasetTest$V2 <- as.factor(ifelse(as.character(datasetTest$V2)=="M", "0", "1"))

    # cross validation to find the optimal lambda
    # using the lasso because alpha=1
    cv.result <- cv.glmnet(
      x=as.matrix(datasetTest[,3:ncol(datasetTest)]),
      y=datasetTest[,2],
      family="binomial",
      nfolds=10,
      type.measure="deviance",
      alpha=1
    )

    # values of lambda used
    hist(cv.result$lambda)

    # plot of the error measure (here deviance)
    # as a CI from each of the 10 folds
    # for each value of lambda (log lambda, actually)
    plot(cv.result)

    # the mean cross validation error (one for each of the
    # 100 values of lambda)
    cv.result$cvm

    # the value of lambda that minimizes the error measure
    # result: 0.001909601
    cv.result$lambda.min
    log(cv.result$lambda.min)

    # the largest value of lambda whose error is
    # within 1 SE of the minimum
    # result: 0.007024236
    cv.result$lambda.1se

    # the full sequence was fit in the object called cv.result$glmnet.fit
    # this is the same as a call to glmnet directly.
    # here are the coefficients at the 1-SE lambda
    coef(cv.result$glmnet.fit, s=cv.result$lambda.1se)
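As far as I know, glmnet does not compute standard errors for the regression coefficients (because it fits the model parameters by cyclical coordinate descent). So if you need standardized regression coefficients, you will have to use some other method (e.g. glm).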
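Having said that, if the explanatory variables are standardized before fitting and glmnet is called with "standardize = FALSE", then the less important coefficients will be smaller than the more important ones, so you can rank them simply by magnitude. This becomes even more pronounced with a non-trivial amount of shrinkage (i.e. non-zero lambda).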
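The ranking idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated (not the wdbc data from the question), and the names x_std, cv_fit, beta and ranking are made up for the example. It scales the predictors by hand, fits the lasso with standardize = FALSE, and sorts the coefficients at the 1-SE lambda by absolute value.

```r
library(glmnet)

set.seed(1)

# Simulated data (hypothetical, not the wdbc data): 200 rows and
# 5 predictors on very different scales. The binary outcome depends
# only on x1 and x2.
n <- 200
x <- cbind(
  x1 = rnorm(n, sd = 1),
  x2 = rnorm(n, sd = 100),
  x3 = rnorm(n, sd = 0.01),
  x4 = rnorm(n, sd = 10),
  x5 = rnorm(n, sd = 1)
)
eta <- 2 * x[, "x1"] + 0.01 * x[, "x2"]
y <- factor(rbinom(n, size = 1, prob = plogis(eta)))

# Standardize the predictors ourselves (mean 0, sd 1), then tell
# glmnet not to standardize again.
x_std <- scale(x)
cv_fit <- cv.glmnet(x_std, y, family = "binomial", alpha = 1,
                    standardize = FALSE)

# Coefficients at the 1-SE lambda, dropping the intercept (first row).
beta <- as.matrix(coef(cv_fit, s = "lambda.1se"))[-1, 1]

# Rank the predictors by the absolute size of their coefficients.
ranking <- sort(abs(beta), decreasing = TRUE)
print(ranking)
```

Because every column of x_std has unit standard deviation, each coefficient is a change in log-odds per one standard deviation of its predictor, which is what makes the magnitudes comparable. With the default standardize = TRUE, glmnet returns coefficients on the original scale, so multiplying each one by the standard deviation of its column should give a roughly equivalent ranking.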