2021年3月6日のブログ記事一覧-裏 RjpWiki

Julia に翻訳--032　Excel の一変量統計関数

2021年03月06日 | ブログラミング

#==========
Julia の修行をするときに，いろいろなプログラムを書き換えるのは有効な方法だ。
以下のプログラムを Julia に翻訳してみる。

Excel にある一変量統計関数
 http://aoki2.si.gunma-u.ac.jp/R/univariate.html

ファイル名: univariate.jl
関数名: avedev, average, count, devsq, geomean, harmean, scale, skew, kurt
stdev, stdevp, trimmean, varp, large, small, fact, combin

翻訳するときに書いたメモ

Julia が持っている関数と定義が少し違っていたりするものもあるので，利用するときには注意が必要かな。

==========#

using Statistics
using StatsBase # trim, geomean, harmmean

x = [21.5, 63.3, 52.1, 50.7, 58.3, 62.2, 43.7, 52.0, 60.0, 45.9,
36.4, 44.1, 80.4, 56.2, 51.7, 60.4, 38.8, 66.8, 33.0, 46.5];

avedev(x) = mean(abs.(x .- mean(x)))
avedev(x) # 10.02

average(x) = mean(x)
average(x) # 51.2

count(x) = length(x)
count(x) # 20

devsq(x) = (length(x)-1) * var(x)
devsq(x) # 3345.8200000000006

geomean(x) = StatsBase.geomean(x)
geomean(x) # 49.352768555879464

harmean(x) = StatsBase.harmmean(x)
harmean(x) # 47.18603977346159

scale(x; corrected=true) = (x .- mean(x)) ./ std(x; corrected)
std(scale(x)) # 1.0
std(scale(x, corrected=false)) # 1.025978352085154

function skew(x; corrected = true)
n = length(x)
return corrected ?
n*sum(scale(x) .^ 3)/(n-1)/(n-2) :
sum(scale(x; corrected) .^ 3) / n
end
skew(x) # -0.11679753569198104
skew(x, corrected=false) # -0.10784856887717938

Julia の skewness() は corrected=false に相当するもの
skewness(x) # -0.10784856887717896

function kurt(x; corrected = true)
n = length(x)
return corrected ?
n*(n+1)*sum(scale(x) .^ 4)/(n-1)/(n-2)/(n-3)-3*(n-1)^2/(n-2)/(n-3) :
sum(scale(x; corrected) .^ 4)/n-3
end
kurt(x) # 0.6657329536764971
kurt(x, corrected=false) # 0.22484782913535772

Julia の kurtosis() は corrected=false に相当するもの
kurtosis(x) # 0.22484782913535906; corrected=false

stdev(x) = std(x)
stdev(x) # 13.27010887196048

stdevp(x) = std(x; corrected = false)
stdevp(x) # 12.934102210822367

trimmean(x; p) = mean(trim(x; prop=p/2)) # trim() は合わせて p を取り除く
trimmean(x, p = 0.2) # 51.39375

varp(x) = var(x; corrected=false)
varp(x) # 167.29100000000003

large(x, k) = sort!(x)[end-k+1]
large(x, 3) # 63.3

small(x, k) = sort!(x)[k]
small(x, 4) # 38.8

fact(n) = factorial(n)
fact(9) # 362880

combin(n, i) = binomial(n, i)
combin(8, 3) #56

頑張ってます。クリックお願いします。

Julia に翻訳--031　Box-Cox 変換の，最適のλ（図を描く）

2021年03月06日 | ブログラミング

#==========
Julia の修行をするときに，いろいろなプログラムを書き換えるのは有効な方法だ。
以下のプログラムを Julia に翻訳してみる。

Box-Cox 変換の，最適のλ（図を描く）
http://aoki2.si.gunma-u.ac.jp/R/Box-Cox-transformation.html

Box-Cox 変換の，最適のλ（数値で求める）
http://aoki2.si.gunma-u.ac.jp/R/Box-Cox-transformation2.html

ファイル名: boxcoxtransformation.jl
関数名: boxcoxtransformation, boxcoxtransformation2

翻訳するときに書いたメモ

変数の個数もタイプも同じなので，関数名を同じにできない。

==========#

using StatsBase, Statistics, Plots

# Box-Cox 変換の，最適なλを図により求める
function boxcoxtransformation(x::Vector{Float64};
l::Float64 = -3.0, r::Float64 = 3.0, delta::Float64 = 0.1)
BoxCoxsub(λ::Float64) = λ == 0 ? std(Gm .* log.(x)) :
std((x .^ λ .- 1)./(λ .* Gm .^ (λ .- 1)))
Gm = geomean(x)
lambda0 = l:delta:r
result = similar(lambda0)
for i in 1:length(lambda0)
result[i] = BoxCoxsub(lambda0[i])
end
return lambda0, result
end

# Box-Cox 変換の，最適なλをシンプレックス法によって求める
function boxcoxtransformation2(x::Vector{Float64}; loop::Int64 = 500,
EPSILON::Float64 = 1e-15,
α::Float64 = 2.0, β::Float64 = 0.5,
γ::Float64 = 2.0)::Float64
BoxCoxsub(λ::Float64) = λ == 0 ? std(Gm .* log.(x)) :
std((x .^ λ .- 1)./(λ .* Gm .^ (λ .- 1)))
Gm = geomean(x)
p1 = -3.0
p2 = p1+0.1
vec = [p1, p2]
for i in 1:loop
result = [BoxCoxsub(vec[1]), BoxCoxsub(vec[2])]
h = argmax(result)
s = argmin(result)
ph = vec[h]
fh = result[h]
ps = vec[s]
fs = result[s]
p0 = vec[s]
pr = (1+α)*p0-α*ph
fr = BoxCoxsub(pr)
if fr > fh
pc = β*ph+(1-β)*p0
vec[h] = pc
elseif fs > fr
pe = γ*pr+(1-γ)*p0
fe = BoxCoxsub(pe)
vec[h] = fr > fe ? pe : pr
else
vec[h] = pr
end
if abs((vec[1]-vec[2])/vec[1]) < EPSILON
break
end
end
mean(vec)
end

使用例

x = [5.0, 5.0, 3.3, 4.3, 4.0, 5.5, 4.0, 6.0, 5.0, 5.0, 4.0, 4.3, 5.3, 5.0,
6.0, 6.7, 6.5, 6.0, 6.0, 5.3, 7.0, 5.0, 6.3, 5.3, 4.5, 6.0, 7.0, 2.0,
2.5, 1.5, 1.7, 1.0, 1.0, 2.0, 1.0, 1.7, 2.0, 2.0, 1.0, 1.3, 2.5, 1.0,
2.0, 2.0, 1.0, 3.0, 1.3, 1.0, 2.0, 2.0, 1.7, 1.0, 1.0, 1.0, 4.0, 5.3,
3.0, 3.7, 2.7, 3.3, 5.0, 2.7, 4.7, 3.7, 3.7, 4.0, 4.7, 4.3, 5.7, 4.7,
5.3, 5.5, 4.3, 7.0, 5.0, 5.3, 5.7, 5.3, 4.3, 5.0, 5.0, 5.0, 4.3, 4.7,
1.3, 1.3, 1.3, 1.7, 2.0, 1.0, 1.3, 1.3, 1.7, 1.7, 1.3, 1.3, 1.3, 2.3,
2.0, 2.7, 2.0, 1.7, 2.3, 1.0, 2.0, 2.0, 2.0, 1.7, 2.0, 1.3, 1.3, 2.0,
1.3, 1.7, 5.0, 5.0, 3.7, 3.3, 3.0, 3.5, 3.3, 3.0, 3.0, 4.3, 3.7, 4.0,
4.0, 3.3, 5.0, 6.0, 4.0, 4.0, 5.7, 5.5, 5.5, 5.5, 5.3, 4.7, 5.0, 4.3,
4.3, 3.0, 6.0, 2.3, 1.7, 2.0, 1.0, 1.5, 1.0, 1.0, 2.0, 1.0, 2.0, 1.3,
1.7, 1.7, 2.0, 2.0, 2.0, 1.7, 2.0, 1.7, 0.5, 1.7, 1.3, 1.7, 2.3, 2.0,
1.0, 1.0, 1.3, 2.0, 2.0, 5.5, 3.5, 5.3, 5.0, 6.0, 8.0, 6.0, 4.0, 2.0,
3.0, 2.5, 2.0, 1.0, 2.0, 1.0, 3.0, 2.0]

λ, y = boxcoxtransformation(x)
using Plots
pyplot()
plot(λ, y, xlabel="λ", ylabel="result", label="")

argmin(y) # 33
λ[argmin(y)] # 0.2
using DataFrames
println(DataFrame(a = λ, b = y)[26:36, :])
# 11×2 DataFrame
# Row │ a b
# │ Float64 Float64
# ─────┼──────────────────
# 1 │ -0.5 1.78596
# 2 │ -0.4 1.74713
# 3 │ -0.3 1.71505
# 4 │ -0.2 1.68924
# 5 │ -0.1 1.66928
# 6 │ 0.0 1.65483
# 7 │ 0.1 1.6456
# 8 │ 0.2 1.64139
# 9 │ 0.3 1.64201
# 10 │ 0.4 1.64734
# 11 │ 0.5 1.65728

boxcoxtransformation2(x) # 0.23686275184155337

頑張ってます。クリックお願いします。

Julia に翻訳--030　指定された平均値と標準偏差，相関係数，相関係数行列を持つデータ生成，混合分布データ

2021年03月06日 | ブログラミング

#==========
Julia の修行をするときに，いろいろなプログラムを書き換えるのは有効な方法だ。
以下のプログラムを Julia に翻訳してみる。

指定された平均値と標準偏差を持つデータの生成
http://aoki2.si.gunma-u.ac.jp/R/gendat1.html

特定の相関係数を持つ二変量データの生成
http://aoki2.si.gunma-u.ac.jp/R/gendat2.html

特定の相関係数行列を持つ多変量データの生成
http://aoki2.si.gunma-u.ac.jp/R/gendat.html

混合分布に従う 1 変数データの生成
http://aoki2.si.gunma-u.ac.jp/R/mix1.html

混合分布に従う 2 変数データの生成
http://aoki2.si.gunma-u.ac.jp/R/mix2.html

ファイル名: gendat.jl　　関数名: gendat, mix
gendat は 3 個の関数， mix は 2 個の関数

翻訳するときに書いたメモ

同じ名前の関数，多くなると管理が面倒？

==========#

using LinearAlgebra # diag, svd, cholesky
using StatsBase # fit
using Statistics # cor
using Printf

# 平均値，標準偏差を与え，1変数データを返す。
function gendat(n::Int64; mu::Float64=0.0, sigma::Float64=1.0)
x = randn(n)
return (x .- mean(x)) ./ std(x) .* sigma .+ mu
end

# 1 個の相関係数を与え，2変数データを返す
function gendat(n::Int64, r::Float64)
return gendat(n, [1.0 r; r 1.0])
end

# 相関係数行列の下三角行列をベクトルで与え，多変数データを返す
function gendat(n::Int64, r::Vector{Float64})
return gendat(n, trimat(r))
end

# 相関係数行列を与え，多変数データを返す
function gendat(n::Int64, r::Array{Float64,2})
nrow, ncol = size(r)
nrow == ncol || error("'r' must be a square matrix.")
all(r .== transpose(r)) || error("r must be symmetric.")
diag(r) == ones(size(r, 1)) || error("some diagonal of r is not 1.")
all(-1 .<= r .<= 1) || error("some element of 'r' is not in a range [-1, 1].")
all(eigvals(r) .> 0) || error("'r' is not positive definite.")
x = randn(n, nrow)
t = fit(ZScoreTransform, x, dims = 1)
x = StatsBase.transform(t, x)
r2 = cor(x)
solver2 = inv(r2)
vect, valu, junk = svd(r2)
coeff = solver2 * (sqrt.(valu) .* vect')'
x * coeff * cholesky(r).U
end

# 1 変数の混合データ
function mix(n::Int64, mu::Array{Float64,1}, sigma::Array{Float64,1}; times::Array{Int64,1}=ones(Int64, length(mu)))
k = length(mu)
x = Array{Float64,2}(undef, n, k)
if length(sigma) == k
for i = 1:k
y = randn(n)
x[:, i] = gendat(n, mu=mu[i], sigma=sigma[i])
end
select = rand(repeateach(collect(1:k), times), n)
for i = 1:n
x[i, 1] = x[i, select[i]]
end
return x[:, 1], select
else
return NaN
end
end

# 2 変数の混合データ
function mix(n::Int64, mu::Array{Float64,2}, sigma::Array{Float64,2}, r::Array{Float64,1}; times::Array{Int64,1}=ones(Int64, length(mu)))
k = size(mu, 1)
x1 = Array{Float64,2}(undef, n, k);
x2 = Array{Float64,2}(undef, n, k);
if k == size(sigma, 1) && k == length(r) && k == length(times) &&
size(mu, 2) == 2 && size(sigma, 2) == 2
for i = 1:k # i = 1
x = gendat(n, r[i]);# typeof(x)
x1[:, i] = x[:, 1] .* sigma[i, 1] .+ mu[i, 1]
x2[:, i] = x[:, 2] .* sigma[i, 2] .+ mu[i, 2]
end
select = rand(repeateach(collect(1:k), times), n)
for i = 1:n
x1[i, 1] = x1[i, select[i]]
x2[i, 1] = x2[i, select[i]]
end
return x1[:, 1], x2[:, 1], select
else
return NaN
end
end

# 対角成分より下の下三角行列を表すベクトルを与えて対称行列を返す（対角成分は1）
function trimat(x::Vector{Float64})
l = length(x)
n = Int((sqrt(1 + 8l) + 1) / 2)
if l != Int(n * (n - 1) / 2)
println("length of vector is not just required")
exit()
end
r = ones(n, n)
suf = 1
for j = 1:n-1
for i = j+1:n
r[j, i] = r[i, j] = x[suf]
suf += 1
end
end
return r
end

function printmatrix(a::Array{Float64,2}; name::String="")
if name != ""
println("[[ $name ]]")
end
nrow, ncol = size(a)
for i = 1:nrow
for j = 1:ncol
@printf("%10.5f", a[i, j])
end
println()
end
end

function repeateach(x::Array{Int64,1}, times::Array{Int64,1})
res = typeof(x[1])[]
for i = 1:length(x)
res = vcat(res, repeat([x[i]], times[i]))
end
return res
end

########################### 使用例 ###########################

# 1 変数正規乱数データの生成
using Random
Random.seed!(123);
x = gendat(10)
# 10-element Array{Float64,1}:
# 0.8374983157243672
# 1.8440139616019378
# 0.7816333282175925
# :
# -0.23769817996780981
# -0.7868152750059677
mean(x) # -1.3322676295501878e-16
std(x) # 1.0

# 2 変数正規乱数データの生成
using Random
Random.seed!(123);
d = gendat(10, 0.5)
# 10×2 Array{Float64,2}:
# -0.493371 0.465154
# -1.59008 0.0138096
# -1.02338 -0.694882
# :
# -0.259941 -0.934208
# -0.00219235 -1.37104
cor(d)
# 2×2 Array{Float64,2}:
# 1.0 0.5
# 0.5 1.0

# 相関係数の下三角行列を与えて多変数正規乱数データを生成する
# 与える順序に注意（列優先）
# 1.0 0.3 0.2 0.4
# 0.3 1.0 0.1 0.5
# 0.2 0.1 1.0 0.6
# 0.4 0.5 0.6 1.0
using Random
Random.seed!(123);
d = gendat(10, [0.3, 0.2, 0.4, 0.1, 0.5, 0.6])
# 10×4 Array{Float64,2}:
# -0.755406 -0.521262 -0.34182 -1.04373
# -0.977335 1.04584 -0.382863 -0.672895
# -0.825122 0.329285 0.0253424 0.0978551
# :
# 0.0379912 0.417965 0.978723 0.898621
# 0.873382 1.47101 1.43973 1.98358
r = cor(d)
# 4×4 Array{Float64,2}:
# 1.0 0.3 0.2 0.4
# 0.3 1.0 0.1 0.5
# 0.2 0.1 1.0 0.6
# 0.4 0.5 0.6 1.0

# 1 変数混合データ
using Random
Random.seed!(123);
mu = [50., 80, 20] # 実数ベクトルで指定する
sigma = [10., 5, 3]
times = [1, 2, 3] # 生成割合（n の倍数）
n = 1000
x, g = mix(n, mu, sigma, times=times)
# 結果の確認
using DataFrames
df = DataFrame(x=x, g=g)
first(df, 5)
# 5 rows × 2 columns
#
# x g
# Float64 Int64
# 1 79.592 2
# 2 79.3002 2
# 3 61.47 1
# 4 22.2441 3
# 5 17.5919 3
gd = groupby(df, :g);
sort!(combine(gd, :x => mean, :x=> std, nrow), :g)
# 3 rows × 4 columns
#
# g x_mean x_std nrow
# Int64 Float64 Float64 Int64
# 1 1 51.0281 10.567 147
# 2 2 79.9744 4.91195 339
# 3 3 19.9814 2.95802 514

# 2 変数混合データ
using Random
Random.seed!(123);
mu = [50. 60; 45 50; 65 48] # 実数ベクトルで指定する
sigma = [8. 3; 7 4; 2 2]
r = [0.8, 0.6, 0.3] # 下三角行列
times = [6, 3, 1] # 生成割合（n の倍数）
n = 1000
x, y, g = mix(n, mu, sigma, r, times=times);
using DataFrames
df = DataFrame(x=x, y=y, g=g)
first(df, 5)
# 5 rows × 3 columns
#
# x y g
# Float64 Float64 Int64
# 1 62.5362 45.686 3
# 2 38.2142 54.0282 1
# 3 44.9595 56.8139 1
# 4 35.8422 42.2224 2
# 5 45.954 51.725 2
gd = groupby(df, :g)
res = combine(gd, :x => mean, :x=> std, [:x, :y] => cor, nrow)
sort!(res, :g)
# 3 rows × 5 columns
#
# g x_mean x_std x_y_cor nrow
# Int64 Float64 Float64 Float64 Int64
# 1 1 49.7517 8.0521 0.785667 593
# 2 2 44.6235 6.88509 0.560334 313
# 3 3 65.0047 1.94801 0.218032 94

頑張ってます。クリックお願いします。

記事一覧 | 画像一覧 | フォロワー一覧 | フォトチャンネル一覧

PVアクセスランキングにほんブログ村

プロフィール

フォロー中フォローするフォローする

バックナンバー

2025年02月

2025年01月

2024年12月

2024年11月

2024年10月

2024年09月

2024年08月

2024年07月

2024年06月

2024年05月

2024年04月

2024年03月

2024年02月

2024年01月

2023年12月

2023年11月

2023年10月

2023年09月

2023年08月

2023年07月

2023年06月

2023年05月

2023年04月

2023年03月

2023年02月

2023年01月

2022年12月

2022年11月

2022年10月

2022年09月

2022年08月

2022年07月

2022年06月

2022年05月

2022年04月

2022年03月

2022年02月

2022年01月

2021年12月

2021年11月

2021年10月

2021年09月

2021年08月

2021年07月

2021年06月

2021年05月

2021年04月

2021年03月

2021年02月

2021年01月

2020年12月

2020年11月

2020年10月

2020年09月

2020年08月

2020年07月

2020年06月

2020年05月

2020年04月

2020年03月

2020年02月

2020年01月

2019年12月

2019年11月

2019年09月

2019年08月

2019年07月

2019年06月

2019年04月

2018年12月

2018年09月

2018年08月

2018年07月

2018年06月

2018年05月

2018年03月

2018年02月

2018年01月

2017年12月

2017年11月

2017年10月

2017年09月

2017年08月

2017年07月

2017年06月

2017年05月

2017年04月

2017年03月

2017年02月

2017年01月

2016年12月

2016年11月

2016年10月

2016年09月

2016年08月

2016年07月

2016年06月

2016年05月

2016年04月

2016年03月

2016年02月

2016年01月

2015年12月

2015年11月

2015年10月

2015年09月

2015年08月

2015年07月

2015年06月

2015年05月

2015年04月

2015年03月

2015年02月

2015年01月

2014年12月

2014年11月

2014年10月

2014年09月

2014年08月

2014年07月

2014年06月

2014年05月

2014年04月

2014年03月

2014年02月

2014年01月

2013年12月

2013年11月

2013年10月

2013年09月

2013年08月

2013年07月

2013年06月

2013年05月

2013年04月

2013年03月

2013年02月

2013年01月

2012年12月

2012年11月

2012年10月

2012年09月

2012年08月

2012年07月

2012年06月

2012年05月

2012年04月

2012年03月

2012年02月

2012年01月

2011年12月

2011年11月

2011年10月

2011年09月

2011年07月

2011年06月

2011年05月

2011年04月

2011年03月

2011年02月

2011年01月

2010年09月

2010年08月

2010年07月

カレンダー

2021年3月
日	月	火	水	木	金	土
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

前月

次月

goo blog お知らせ

	ブログを読むだけ。毎月の訪問日数に応じてポイント進呈
	【コメント募集中】「交通系IC」は何を使っていますか？
	訪問者数に応じてdポイント最大1,000pt当たる！
	dポイントが当たる！無料『毎日くじ』