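While studying Bayesian network models recently, I tried Matlab, Python, R, Netica, GeNIe, and UnBBayes. Each tool has its strengths, but R is the one I want to recommend here. Corrections are welcome.
The R language
R is a language and environment for statistical computing and graphics. It is part of the GNU project and is free, open-source software, which makes it an excellent tool for statistical analysis and plotting.
So why recommend R in particular?
Attractive plots, a gentle learning curve, and an efficient workflow.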
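Installing R
R has to be installed before it can be used. Depending on how you prefer to work, there are two options: the R Console and RStudio.
R Console
Download address: https://cran.r-. Once there, pick the installer that matches your operating system; the details are not repeated here. After installation, the console looks as shown in the figure.
RStudio
Download address: /ide. As before, pick the build that matches your system. RStudio comes in Desktop and Server editions; choose Desktop. I have not installed it myself, so I will not cover it here.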
The Bayesian network package
bnlearn is an R package for learning the graphical structure of Bayesian networks, estimating their parameters, and performing some useful inference. It has been under continuous development for more than 10 years since its first release (and is still going strong). To get started, install either the CRAN release or the latest development snapshot.
Installing the bnlearn package
Run either one of the two commands below. During installation R will prompt you to choose a mirror; any mirror will do.
install.packages("bnlearn")
install.packages("/releases/bnlearn_latest.tar.gz")
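If the installation is run from a script rather than interactively, the mirror prompt can be avoided by naming a repository explicitly. The following is a minimal sketch, assuming the CRAN release is wanted and that the mirror URL https://cloud.r-project.org (not mentioned in the original text) is reachable from your machine:

```r
# Install bnlearn only if it is not already available.
# The repos URL is an assumption; any CRAN mirror can be substituted.
if (!requireNamespace("bnlearn", quietly = TRUE)) {
  install.packages("bnlearn", repos = "https://cloud.r-project.org")
}

# Report the installed version as a quick sanity check.
installed_version <- as.character(packageVersion("bnlearn"))
print(installed_version)
```

Guarding the call with `requireNamespace()` keeps the script from reinstalling the package on every run.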
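Loading the bnlearn package
library("bnlearn")
# Typing a function name without parentheses prints its definition,
# which confirms that the package is attached.
bn.boot
If the output matches the figure below, the package was loaded successfully.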
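Creating a Bayesian network model
Empty network
e = empty.graph(LETTERS[1:6])
class(e)
e
The first statement creates a Bayesian network with six nodes (A to F) and no arcs, the second shows the class of the object, and the third prints its contents.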
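An empty graph is usually only a starting point. As a small sketch of how arcs might be added afterwards, the example below uses `set.arc()` (which also appears later in this article) and inspects the result with `arcs()` and `modelstring()`. The arcs A -> B and B -> C are arbitrary choices made only for illustration:

```r
library(bnlearn)

# Start from the same empty six-node graph as above.
e <- empty.graph(LETTERS[1:6])

# Add two directed arcs, one at a time.
e <- set.arc(e, from = "A", to = "B")
e <- set.arc(e, from = "B", to = "C")

# arcs() returns a two-column character matrix ("from", "to").
arc_table <- arcs(e)
print(arc_table)

# modelstring() gives a compact text description of the structure.
print(modelstring(e))
```

`set.arc()` returns a modified copy rather than changing the graph in place, which is why the result is assigned back to `e` each time.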
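Non-empty network
# Marginal distributions of the root nodes A and B.
cptA = matrix(c(0.4, 0.6), ncol = 2, dimnames = list(NULL, c("LOW", "HIGH")))
cptB = matrix(c(0.8, 0.2), ncol = 2, dimnames = list(NULL, c("GOOD", "BAD")))
# Conditional distribution of C given A and B, stored as a 2 x 2 x 2 array.
cptC = c(0.5, 0.5, 0.4, 0.6, 0.3, 0.7, 0.2, 0.8)
dim(cptC) = c(2, 2, 2)
dimnames(cptC) = list("C" = c("TRUE", "FALSE"), "A" = c("LOW", "HIGH"),
                      "B" = c("GOOD", "BAD"))
# Structure: A and B are the parents of C.
net = model2network("[A][B][C|A:B]")
dfit = custom.fit(net, dist = list(A = cptA, B = cptB, C = cptC))
dfit
The last line prints the model together with its parameters.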
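The conditional probability tables above can be checked by hand. The sketch below uses only base R (no bnlearn calls) to confirm that each conditional distribution of C sums to one, and to compute the marginal probability P(C = TRUE) = Σ P(C = TRUE | A, B) P(A) P(B), which relies on A and B being independent root nodes in this network. The names `pA`, `pB`, and `joint_AB` are introduced here for illustration:

```r
# Marginals of the root nodes, written as named vectors.
pA <- c(LOW = 0.4, HIGH = 0.6)
pB <- c(GOOD = 0.8, BAD = 0.2)

# Conditional table of C given A and B, same values as in the article.
cptC <- array(c(0.5, 0.5, 0.4, 0.6, 0.3, 0.7, 0.2, 0.8),
              dim = c(2, 2, 2),
              dimnames = list(C = c("TRUE", "FALSE"),
                              A = c("LOW", "HIGH"),
                              B = c("GOOD", "BAD")))

# For every (A, B) configuration the probabilities over C must sum to 1.
col_sums <- apply(cptC, c(2, 3), sum)
print(col_sums)

# A and B have no parents, so their joint distribution is the outer product.
joint_AB <- outer(pA, pB)

# Marginalise A and B out of P(C = TRUE | A, B).
p_C_true <- sum(cptC["TRUE", , ] * joint_AB)
print(p_C_true)  # 0.4, up to floating-point rounding
```

A table whose columns do not sum to one is a common reason for a hand-specified network to be rejected, so this kind of check is worth running before fitting.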
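Parameter learning
data(learning.test)
# Learn the structure with the IAMB algorithm.
pdag = iamb(learning.test)
# Score the network with each possible orientation of the arc between A and B.
score(set.arc(pdag, from = "A", to = "B"), learning.test)
score(set.arc(pdag, from = "B", to = "A"), learning.test)
# bn.fit() needs a fully directed graph, so fix the direction A -> B
# (the orientation that appears in the output below).
dag = set.arc(pdag, from = "A", to = "B")
fit = bn.fit(dag, learning.test)
fit
The learned parameters are as follows.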
  Bayesian network parameters

  Parameters of node A (multinomial distribution)

Conditional probability table:
     a     b     c
 0.334 0.334 0.332

  Parameters of node B (multinomial distribution)

Conditional probability table:

   A
B        a      b      c
  a 0.8561 0.4449 0.1149
  b 0.0252 0.2210 0.0945
  c 0.1187 0.3341 0.7906

  Parameters of node C (multinomial distribution)

Conditional probability table:
      a      b      c
 0.7434 0.2048 0.0518

  Parameters of node D (multinomial distribution)

Conditional probability table:

, , C = a

   A
D        a      b      c
  a 0.8008 0.0925 0.1053
  b 0.0902 0.8021 0.1117
  c 0.1089 0.1054 0.7830

, , C = b

   A
D        a      b      c
  a 0.1808 0.8830 0.2470
  b 0.1328 0.0702 0.4939
  c 0.6864 0.0468 0.2591

, , C = c

   A
D        a      b      c
  a 0.4286 0.3412 0.1333
  b 0.2024 0.3882 0.4444
  c 0.3690 0.2706 0.4222

  Parameters of node E (multinomial distribution)

Conditional probability table:

, , F = a

   B
E        a      b      c
  a 0.8052 0.2059 0.1194
  b 0.0974 0.1797 0.1145
  c 0.0974 0.6144 0.7661

, , F = b

   B
E        a      b      c
  a 0.4005 0.3168 0.2376
  b 0.4903 0.3664 0.5067
  c 0.1092 0.3168 0.2557

  Parameters of node F (multinomial distribution)

Conditional probability table:
     a     b
 0.502 0.498
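For discrete data, the tables printed by `bn.fit()` can be read as conditional relative frequencies. The sketch below illustrates this for node B, assuming the default maximum-likelihood estimator and using the structure `[A][C][F][B|A][D|A:C][E|B:F]` (the model string used elsewhere in this article) in place of the learned graph. The names `counts`, `manual_B`, and `max_diff` are introduced here for illustration:

```r
library(bnlearn)
data(learning.test)

# Fix the structure directly instead of learning it.
dag <- model2network("[A][C][F][B|A][D|A:C][E|B:F]")
fit <- bn.fit(dag, learning.test)

# Count how often each value of B occurs together with each value of A,
# then normalise within each column (each value of A).
counts <- table(B = learning.test$B, A = learning.test$A)
manual_B <- prop.table(counts, margin = 2)
print(round(manual_B, 4))

# Largest absolute difference between the fitted table and the manual one.
max_diff <- max(abs(as.numeric(coef(fit$B)) - as.numeric(manual_B)))
print(max_diff)
```

If `max_diff` is essentially zero, the fitted table for B is exactly the column-normalised contingency table of B against its parent A.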
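Structure learning
data(learning.test)
data(gaussian.test)
# The object names learn.net and gauss.net are chosen here for illustration.
learn.net = empty.graph(names(learning.test))
modelstring(learn.net) = "[A][C][F][B|A][D|A:C][E|B:F]"
gauss.net = empty.graph(names(gaussian.test))
modelstring(gauss.net) = "[A][B][E][G][C|A:B][D|B][F|A:D:E:G]"
# Without a type argument, score() uses its default criterion for the data.
score(learn.net, learning.test)
score(gauss.net, gaussian.test)
# The same discrete network scored under three explicit criteria.
score(learn.net, learning.test, type = "bic")
score(learn.net, learning.test, type = "aic")
score(learn.net, learning.test, type = "bde")
The last three calls score the discrete network structure with the BIC, AIC, and BDe criteria, respectively.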
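To see what the BIC and AIC scores measure, it can help to rebuild them from the log-likelihood and the number of parameters. The sketch below assumes bnlearn's convention that both scores are penalised log-likelihoods (larger is better), with a penalty of log(n)/2 per parameter for BIC and 1 per parameter for AIC. The names `ll`, `k`, `n`, `bic_manual`, and `aic_manual` are introduced here for illustration:

```r
library(bnlearn)
data(learning.test)

net <- model2network("[A][C][F][B|A][D|A:C][E|B:F]")

# Ingredients: log-likelihood, number of free parameters, sample size.
ll <- score(net, learning.test, type = "loglik")
k <- nparams(net, learning.test)
n <- nrow(learning.test)

# Penalised log-likelihoods under the assumed convention.
bic_manual <- ll - k * log(n) / 2
aic_manual <- ll - k

# Scores as reported by bnlearn, for comparison.
bic_bnlearn <- score(net, learning.test, type = "bic")
aic_bnlearn <- score(net, learning.test, type = "aic")

print(c(manual = bic_manual, bnlearn = bic_bnlearn))
print(c(manual = aic_manual, bnlearn = aic_bnlearn))
```

Because the penalty grows with the number of parameters, adding an arc raises these scores only when the gain in log-likelihood outweighs the extra parameters it introduces.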