2021年7月19日19:54:12
贝叶斯网络的图结构存储在bn对象中,可以通过三种表示来创建bn对象,即the arc set of the graph, its adjacency matrix or a model formula(边集、邻接矩阵、模型公式)。此外,可以通过empty.graph()
和random.graph()
函数创建空网络结构或随机网络结构。
# 产生空网络 library(bnlearn) e = empty.graph(LETTERS[1:6]) e class(e) # 结果 Random/Generated Bayesian network model: [A][B][C][D][E][F] nodes: 6 arcs: 0 undirected arcs: 0 directed arcs: 0 average markov blanket size: 0.00 average neighbourhood size: 0.00 average branching factor: 0.00 generation algorithm: Empty > class(e) [1] "bn"
通常empty.graph()默认只产生一个bn
对象,可以指定参数num,使之产生一批网络对象。
empty.graph(LETTERS[1:6], num = 2) #此处指定num = 2,产生两个bn对象
要产生边集,首先需要指定一个边集的矩阵。以下代码产生一个矩阵,两列,按行填充,行名为from,to。
arc.set = matrix(c("A", "C", "B", "F", "C", "F"), ncol = 2, byrow = TRUE, dimnames = list(NULL, c("from", "to"))) arc.set from to [1,] "A" "C" [2,] "B" "F" [3,] "C" "F"
将产生的边集arc.set传给bn
对象使用arcs()
函数
arcs(e) = arc.set e Random/Generated Bayesian network model: [A][B][D][E][C|A][F|B:C] nodes: 6 arcs: 3 undirected arcs: 0 directed arcs: 3 average markov blanket size: 1.33 average neighbourhood size: 1.00 average branching factor: 0.50 generation algorithm: Empty
对于arcs()
函数,其会对边集做一定的检查,来看是否符合上下条件。
bogus = matrix(c("X", "Y", "W", "Z"), ncol = 2, byrow = TRUE, dimnames = list(NULL, c("from", "to"))) bogus from to [1,] "X" "Y" [2,] "W" "Z"
将bogus
传给bn
对象e
,会出现报错
arcs(e) = bogus Error in check.arcs(value, nodes = names(x$nodes)) : node(s) 'X' 'W' 'Y' 'Z' not present in the graph.
ignore.cycles = TURE
cycle = matrix(c("A", "C", "C", "B", "B", "A"), ncol = 2, byrow = TRUE, dimnames = list(NULL, c("from", "to"))) cycle from to [1,] "A" "C" [2,] "C" "B" [3,] "B" "A" arcs(e) = cycle Error in `arcs<-`(`*tmp*`, value = c("A", "C", "B", "C", "B", "A")) : the specified network contains cycles. arcs(e,ignore.cycles = TRUE) = cycle Error in `arcs<-`(`*tmp*`, ignore.cycles = TRUE, value = c("A", "C", "B", : 参数没有用(ignore.cycles = TRUE) > acyclic(e) # check whether the graph is acyclic/completely directed.检测是否无环 [1] TRUE
> loops = matrix(c("A", "A", "B", "B", "C", "D"), - ncol = 2, byrow = TRUE, - dimnames = list(NULL, c("from", "to"))) > loops from to [1,] "A" "A" [2,] "B" "B" [3,] "C" "D" > arcs(e) = loops Error in check.arcs(value, nodes = names(x$nodes)) : invalid arcs that are actually loops: A -> A B -> B
ignore.cycles
会覆盖这个检查。> edges = matrix(c("A", "B", "B", "A", "C", "D"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > edges from to [1,] "A" "B" [2,] "B" "A" [3,] "C" "D" > arcs(e) = edges > e Random/Generated Bayesian network model: [partially directed graph] nodes: 6 arcs: 2 undirected arcs: 1 #这里有个非直接环弧 directed arcs: 1 average markov blanket size: 0.67 average neighbourhood size: 0.67 average branching factor: 0.17 generation algorithm: Empty
使用amat()
函数可以用矩阵来产生图结构。
我们首先需要做的是产生一个矩阵,矩阵的元素为0/1整数(注意这里使用0L
和1L
,0和1也是可以的,但是会占更多的内存,这在大数据集尤为重要),1L就代表存在一条弧。
> adj = matrix(0L, ncol = 6, nrow = 6, + dimnames = list(LETTERS[1:6], LETTERS[1:6])) > adj A B C D E F A 0 0 0 0 0 0 B 0 0 0 0 0 0 C 0 0 0 0 0 0 D 0 0 0 0 0 0 E 0 0 0 0 0 0 F 0 0 0 0 0 0 > adj["A", "C"] = 1L > adj["B", "F"] = 1L > adj["C", "F"] = 1L > adj["D", "E"] = 1L > adj["A", "E"] = 1L > adj A B C D E F A 0 0 1 0 1 0 B 0 0 0 0 0 1 C 0 0 0 0 0 1 D 0 0 0 0 1 0 E 0 0 0 0 0 0 F 0 0 0 0 0 0
接着使用amat()
函数将弧(arcs)引入到图中
> amat(e) = adj > e Random/Generated Bayesian network model: [A][B][D][C|A][E|A:D][F|B:C] nodes: 6 arcs: 5 undirected arcs: 0 directed arcs: 5 average markov blanket size: 2.33 average neighbourhood size: 1.67 average branching factor: 0.83 generation algorithm: Empty
注意:amat()
函数和arcs()
函数一样,都会检查错误的图结构
模型公式的格式来源于deal包,定义如下
|
),父节点间用冒号:
隔开,但是都是在一个方括号中model2network()
函数,不用先产生空图> model2network("[A][C][B|A][D|C][F|A:B:C][E|F]") Random/Generated Bayesian network model: [A][C][B|A][D|C][F|A:B:C][E|F] nodes: 6 arcs: 6 undirected arcs: 0 directed arcs: 6 average markov blanket size: 2.67 average neighbourhood size: 2.00 average branching factor: 1.00 generation algorithm: Empty
modelstring()
函数和一个已存在的bn
对象,类似之前的arcs()
和amat()
> modelstring(e) = "[A][C][B|A][D|C][F|A:B:C][E|F]" > e Random/Generated Bayesian network model: [A][C][B|A][D|C][F|A:B:C][E|F] nodes: 6 arcs: 6 undirected arcs: 0 directed arcs: 6 average markov blanket size: 2.67 average neighbourhood size: 2.00 average branching factor: 1.00 generation algorithm: Empty
注意:model2network()
和modelstring()
函数同amat()
函数、arcs()
函数一样,都会检查错误的图结构。
默认情况下,random.graph()
函数产生的图和提供的节点顺序一致;在每个弧中,尾部节点排在头部节点的前面。弧是独立的样本,其包含概率由prob
参数指定,prob
参数的默认值设置为使图中平均有与节点一样多的弧。
> random.graph(LETTERS[1:6], prob = 0.1) #顺序是从A到F Random/Generated Bayesian network model: [A][B][D][E][C|B][F|C] #这里可以看出,[B]是排在[C|B]前面的 nodes: 6 arcs: 2 undirected arcs: 0 directed arcs: 2 average markov blanket size: 0.67 average neighbourhood size: 0.67 average branching factor: 0.33 generation algorithm: Full Ordering arc sampling probability: 0.1
和empty.graph()
函数一样,我们也可以指定num
参数来产生一批图,这样就会产生一个列表,可以使用lapply()
来很好地批量计算。
除了使用默认的方法method = "ordered"
,还可以使用method = "ic-dag"
来从具有均匀概率的连通的有向无环图空间中抽样,使用 Ide & Cozman MCMC sampler
请注意,建议的老化长度(length of the burn-in scales ?)与节点数量成二次函数关系,因此使用num
批量生成多个图要比一次生成一个图高效得多。
> random.graph(LETTERS[1:6], num = 2, method = "ic-dag") [[1]] Random/Generated Bayesian network model: [C][E][A|E][B|A:E][D|B:E][F|A:C:D] nodes: 6 arcs: 8 undirected arcs: 0 directed arcs: 8 average markov blanket size: 3.67 average neighbourhood size: 2.67 average branching factor: 1.33 generation algorithm: Ide & Cozman's Multiconnected DAGs burn in length: 216 maximum in-degree: Inf maximum out-degree: Inf maximum degree: Inf [[2]] Random/Generated Bayesian network model: [E][A|E][B|A:E][C|A][D|B:E][F|A:C:D] nodes: 6 arcs: 9 undirected arcs: 0 directed arcs: 9 average markov blanket size: 3.67 average neighbourhood size: 3.00 average branching factor: 1.50 generation algorithm: Ide & Cozman's Multiconnected DAGs burn in length: 216 maximum in-degree: Inf maximum out-degree: Inf maximum degree: Inf
此方法对生成的图的结构接受几个可选的约束,如下面的参数所指定:
random.graph(LETTERS[1:6], num = 2, method = "ic-dag", burn.in = 10^4, + every = 50, max.degree = 3) [[1]] Random/Generated Bayesian network model: [B][F|B][A|F][C|A:F][D|A:B][E|B:C] nodes: 6 arcs: 8 undirected arcs: 0 directed arcs: 8 average markov blanket size: 3.33 average neighbourhood size: 2.67 average branching factor: 1.33 generation algorithm: Ide & Cozman's Multiconnected DAGs burn in length: 10000 maximum in-degree: Inf maximum out-degree: Inf maximum degree: 3 [[2]] Random/Generated Bayesian network model: [A][C|A][F|C][D|F][B|D:F][E|B:C] nodes: 6 arcs: 7 undirected arcs: 0 directed arcs: 7 average markov blanket size: 2.67 average neighbourhood size: 2.33 average branching factor: 1.17 generation algorithm: Ide & Cozman's Multiconnected DAGs burn in length: 10000 maximum in-degree: Inf maximum out-degree: Inf maximum degree: 3
Melançon的 MCMC算法以均匀概率从有向无环图空间抽样(不需要连接not necessarily connected),指定method = "melancon"
即可。
> random.graph(LETTERS[1:6], method = "melancon") Random/Generated Bayesian network model: [A][B|A][F|A][C|A:B][E|B][D|B:C:E] nodes: 6 arcs: 8 undirected arcs: 0 directed arcs: 8 average markov blanket size: 3.00 average neighbourhood size: 2.67 average branching factor: 1.33 generation algorithm: Melancon's Uniform Probability DAGs burn in length: 216 maximum in-degree: Inf maximum out-degree: Inf maximum degree: Inf
至于可选的参数,它与Ide & Cozman算法是相同的。