中澤さんの 「誤解を生むグラフ」に示されているデータ
> cor(x)
YEAR CUCI BEEFCC BEEFSP
YEAR 1.0000000 0.9402893 0.7666001 0.7501338
CUCI 0.9402893 1.0000000 0.5671291 0.5509674
BEEFCC 0.7666001 0.5671291 1.0000000 0.9971032
BEEFSP 0.7501338 0.5509674 0.9971032 1.0000000
CUCI と BEEFCC, BEEFSP には,ほどほどの正の相関(0.567, 0.551)
しかし,いずれも YEAR とは,高度の正相関
YEAR を制御した(YEAR の影響を除いた)偏相関係数を求めてみると,
> # partial correlation r(CUCI, BEEFCC . YEAY)
> (cor(x$CUCI, x$BEEFCC) - cor(x$CUCI, x$YEAR) * cor(x$BEEFCC, x$YEAR)) /
+ sqrt((1-cor(x$CUCI, x$YEAR)^2)*(1-cor(x$BEEFCC, x$YEAR)^2))
[1] -0.7032116
> # partial correlation r(CUCI, BEEFSP . YEAY)
> (cor(x$CUCI, x$BEEFSP) - cor(x$CUCI, x$YEAR) * cor(x$BEEFSP, x$YEAR)) /
+ sqrt((1-cor(x$CUCI, x$YEAR)^2)*(1-cor(x$BEEFSP, x$YEAR)^2))
[1] -0.6858511
いずれも,やや強い負の相関になった。
YEAR で説明できない部分の散布図を描いておく。左上の 1 点を除いても,負の相関だ。
cuci.year = lm(CUCI ~ YEAR, data=x)$residuals
beefcc.year = lm(BEEFCC ~ YEAR, data=x)$residuals
plot(cuci.year ~ beefcc.year)
text(-1, 3000, sprintf("r = %.3f (partial cor. coef.)",
cor(cuci.year, beefcc.year)), pos=4)
cuci.year = lm(CUCI ~ YEAR, data=x)$residuals
beefsp.year = lm(BEEFSP ~ YEAR, data=x)$residuals
plot(cuci.year ~ beefsp.year)
text(-0.5, 3000, sprintf("r = %.3f (partial cor. coef.)",
cor(cuci.year, beefsp.year)), pos=4)