贝叶斯网络R语言学习笔记1

本文主要是介绍贝叶斯网络R语言学习笔记1，对大家解决编程问题具有一定的参考价值，需要的程序猿们随着小编来一起学习吧！

贝叶斯网络R语言学习笔记1

2021年7月19日19:54:12

一、创建贝叶斯网络结构

1、创建空网络

贝叶斯网络的图结构存储在bn对象中，可以通过三种表示来创建bn对象，即the arc set of the graph, its adjacency matrix or a model formula（边集、邻接矩阵、模型公式）。此外，可以通过empty.graph() 和random.graph()函数创建空网络结构或随机网络结构。

# 产生空网络
library(bnlearn)
e = empty.graph(LETTERS[1:6])
e
class(e)
# 结果
  Random/Generated Bayesian network

  model:
   [A][B][C][D][E][F] 
  nodes:                                 6 
  arcs:                                  0 
    undirected arcs:                     0 
    directed arcs:                       0 
  average markov blanket size:           0.00 
  average neighbourhood size:            0.00 
  average branching factor:              0.00 

  generation algorithm:                  Empty 
 
 > class(e)
[1] "bn"

通常empty.graph()默认只产生一个bn对象，可以指定参数num，使之产生一批网络对象。

empty.graph(LETTERS[1:6], num = 2) #此处指定num = 2，产生两个bn对象

2、创建网络结构

①特定的边集

要产生边集，首先需要指定一个边集的矩阵。以下代码产生一个矩阵，两列，按行填充，行名为from，to。

arc.set = matrix(c("A", "C", "B", "F", "C", "F"),
            	ncol = 2, byrow = TRUE,
          		dimnames = list(NULL, c("from", "to")))
arc.set

     from to 
[1,] "A"  "C"
[2,] "B"  "F"
[3,] "C"  "F"

将产生的边集arc.set传给bn对象使用arcs()函数

arcs(e) = arc.set
e
  Random/Generated Bayesian network

  model:
   [A][B][D][E][C|A][F|B:C] 
  nodes:                                 6 
  arcs:                                  3 
    undirected arcs:                     0 
    directed arcs:                       3 
  average markov blanket size:           1.33 
  average neighbourhood size:            1.00 
  average branching factor:              0.50 

  generation algorithm:                  Empty

对于arcs()函数，其会对边集做一定的检查，来看是否符合上下条件。

边的标签要和图里的标签一致

bogus = matrix(c("X", "Y", "W", "Z"),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("from", "to")))
bogus
     from to 
[1,] "X"  "Y"
[2,] "W"  "Z"

将bogus传给bn对象e，会出现报错

arcs(e) = bogus

Error in check.arcs(value, nodes = names(x$nodes)) : 
  node(s) 'X' 'W' 'Y' 'Z' not present in the graph.

不能引入环(e.g. A → B → C → A) ，除非我们设置ignore.cycles = TURE

cycle = matrix(c("A", "C", "C", "B", "B", "A"),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("from", "to")))
cycle

     from to 
[1,] "A"  "C"
[2,] "C"  "B"
[3,] "B"  "A"

arcs(e) = cycle

Error in `arcs<-`(`*tmp*`, value = c("A", "C", "B", "C", "B", "A")) : 
  the specified network contains cycles.

arcs(e,ignore.cycles = TRUE) = cycle

Error in `arcs<-`(`*tmp*`, ignore.cycles = TRUE, value = c("A", "C", "B",  : 
  参数没有用(ignore.cycles = TRUE)

> acyclic(e) # check whether the graph is acyclic/completely directed.检测是否无环
[1] TRUE

不能引入圈 (e.g. A → A).

> loops = matrix(c("A", "A", "B", "B", "C", "D"),
-                ncol = 2, byrow = TRUE,
-                dimnames = list(NULL, c("from", "to")))
> loops
    from to 
[1,] "A"  "A"
[2,] "B"  "B"
[3,] "C"  "D"
> arcs(e) = loops
Error in check.arcs(value, nodes = names(x$nodes)) : 
 invalid arcs that are actually loops:
  A -> A 
  B -> B

可以通过在边集中包括双向边引入非直接边(e.g. A → B and B → A)，保证双向均未引入环，设置ignore.cycles会覆盖这个检查。

> edges = matrix(c("A", "B", "B", "A", "C", "D"),
+                  ncol = 2, byrow = TRUE,
+                  dimnames = list(NULL, c("from", "to")))
> edges
     from to 
[1,] "A"  "B"
[2,] "B"  "A"
[3,] "C"  "D"
> arcs(e) = edges
> e
  Random/Generated Bayesian network

  model:
    [partially directed graph]
  nodes:                                 6 
  arcs:                                  2 
    undirected arcs:                     1 #这里有个非直接环弧 
    directed arcs:                       1 
  average markov blanket size:           0.67 
  average neighbourhood size:            0.67 
  average branching factor:              0.17 

  generation algorithm:                  Empty

②特定的的邻接矩阵

使用amat()函数可以用矩阵来产生图结构。
我们首先需要做的是产生一个矩阵，矩阵的元素为0/1整数（注意这里使用0L和1L，0和1也是可以的，但是会占更多的内存，这在大数据集尤为重要），1L就代表存在一条弧。

> adj = matrix(0L, ncol = 6, nrow = 6,
+              dimnames = list(LETTERS[1:6], LETTERS[1:6]))
> adj
  A B C D E F
A 0 0 0 0 0 0
B 0 0 0 0 0 0
C 0 0 0 0 0 0
D 0 0 0 0 0 0
E 0 0 0 0 0 0
F 0 0 0 0 0 0
> adj["A", "C"] = 1L
> adj["B", "F"] = 1L
> adj["C", "F"] = 1L
> adj["D", "E"] = 1L
> adj["A", "E"] = 1L
> adj
  A B C D E F
A 0 0 1 0 1 0
B 0 0 0 0 0 1
C 0 0 0 0 0 1
D 0 0 0 0 1 0
E 0 0 0 0 0 0
F 0 0 0 0 0 0

接着使用amat()函数将弧（arcs）引入到图中

> amat(e) = adj
> e

  Random/Generated Bayesian network

  model:
   [A][B][D][C|A][E|A:D][F|B:C] 
  nodes:                                 6 
  arcs:                                  5 
    undirected arcs:                     0 
    directed arcs:                       5 
  average markov blanket size:           2.33 
  average neighbourhood size:            1.67 
  average branching factor:              0.83 

  generation algorithm:                  Empty

注意：amat()函数和arcs()函数一样，都会检查错误的图结构

③特定的模型公式

模型公式的格式来源于deal包，定义如下

每个节点在方括号中
每个节点的父节点在列在管道符后（级竖杆符号|），父节点间用冒号:隔开，但是都是在一个方括号中
可以采用两种方式来定义图结构

使用`model2network()`函数，不用先产生空图

> model2network("[A][C][B|A][D|C][F|A:B:C][E|F]")

  Random/Generated Bayesian network

  model:
   [A][C][B|A][D|C][F|A:B:C][E|F] 
  nodes:                                 6 
  arcs:                                  6 
    undirected arcs:                     0 
    directed arcs:                       6 
  average markov blanket size:           2.67 
  average neighbourhood size:            2.00 
  average branching factor:              1.00 

  generation algorithm:                  Empty

使用`modelstring()`函数和一个已存在的`bn`对象，类似之前的`arcs()` 和`amat()`

> modelstring(e) = "[A][C][B|A][D|C][F|A:B:C][E|F]"
> e

  Random/Generated Bayesian network

  model:
   [A][C][B|A][D|C][F|A:B:C][E|F] 
  nodes:                                 6 
  arcs:                                  6 
    undirected arcs:                     0 
    directed arcs:                       6 
  average markov blanket size:           2.67 
  average neighbourhood size:            2.00 
  average branching factor:              1.00 

  generation algorithm:                  Empty

注意：model2network()和modelstring()函数同amat()函数、arcs()函数一样，都会检查错误的图结构。

3、创建一个或多个随机结构

①特定节点顺序

默认情况下，random.graph()函数产生的图和提供的节点顺序一致；在每个弧中，尾部节点排在头部节点的前面。弧是独立的样本，其包含概率由prob参数指定，prob参数的默认值设置为使图中平均有与节点一样多的弧。

> random.graph(LETTERS[1:6], prob = 0.1) #顺序是从A到F

  Random/Generated Bayesian network

  model:
   [A][B][D][E][C|B][F|C]   #这里可以看出，[B]是排在[C|B]前面的
  nodes:                                 6 
  arcs:                                  2 
    undirected arcs:                     0 
    directed arcs:                       2 
  average markov blanket size:           0.67 
  average neighbourhood size:            0.67 
  average branching factor:              0.33 

  generation algorithm:                  Full Ordering 
  arc sampling probability:              0.1

和empty.graph()函数一样，我们也可以指定num 参数来产生一批图，这样就会产生一个列表，可以使用lapply() 来很好地批量计算。

②以均匀概率从连通的有向无环图空间中抽样

除了使用默认的方法method = "ordered"，还可以使用method = "ic-dag"来从具有均匀概率的连通的有向无环图空间中抽样，使用 Ide & Cozman MCMC sampler
请注意，建议的老化长度（length of the burn-in scales ？）与节点数量成二次函数关系，因此使用num批量生成多个图要比一次生成一个图高效得多。

> random.graph(LETTERS[1:6], num = 2, method = "ic-dag")
[[1]]

  Random/Generated Bayesian network

  model:
   [C][E][A|E][B|A:E][D|B:E][F|A:C:D] 
  nodes:                                 6 
  arcs:                                  8 
    undirected arcs:                     0 
    directed arcs:                       8 
  average markov blanket size:           3.67 
  average neighbourhood size:            2.67 
  average branching factor:              1.33 

  generation algorithm:                  Ide & Cozman's Multiconnected DAGs 
  burn in length:                        216 
  maximum in-degree:                     Inf 
  maximum out-degree:                    Inf 
  maximum degree:                        Inf 


[[2]]

  Random/Generated Bayesian network

  model:
   [E][A|E][B|A:E][C|A][D|B:E][F|A:C:D] 
  nodes:                                 6 
  arcs:                                  9 
    undirected arcs:                     0 
    directed arcs:                       9 
  average markov blanket size:           3.67 
  average neighbourhood size:            3.00 
  average branching factor:              1.50 

  generation algorithm:                  Ide & Cozman's Multiconnected DAGs 
  burn in length:                        216 
  maximum in-degree:                     Inf 
  maximum out-degree:                    Inf 
  maximum degree:                        Inf

此方法对生成的图的结构接受几个可选的约束，如下面的参数所指定:

burn.in: the number of iterations for the algorithm to converge to a stationary (and uniform) probability distribution.收敛迭代的次数
every: return only one graph every number of steps instead of all the generated graphs. High values of every result in a more diverse set.每个步骤只返回一个图，而不是所有生成的图。every 的高值会导致更多样化的集合。
max.degree: the maximum degree of a node.
max.in.degree: the maximum in-degree.
max.out.degree: the maximum out-degree.
举例

random.graph(LETTERS[1:6], num = 2, method = "ic-dag", burn.in = 10^4,
+              every = 50, max.degree = 3)
[[1]]

  Random/Generated Bayesian network

  model:
   [B][F|B][A|F][C|A:F][D|A:B][E|B:C] 
  nodes:                                 6 
  arcs:                                  8 
    undirected arcs:                     0 
    directed arcs:                       8 
  average markov blanket size:           3.33 
  average neighbourhood size:            2.67 
  average branching factor:              1.33 

  generation algorithm:                  Ide & Cozman's Multiconnected DAGs 
  burn in length:                        10000 
  maximum in-degree:                     Inf 
  maximum out-degree:                    Inf 
  maximum degree:                        3 


[[2]]

  Random/Generated Bayesian network

  model:
   [A][C|A][F|C][D|F][B|D:F][E|B:C] 
  nodes:                                 6 
  arcs:                                  7 
    undirected arcs:                     0 
    directed arcs:                       7 
  average markov blanket size:           2.67 
  average neighbourhood size:            2.33 
  average branching factor:              1.17 

  generation algorithm:                  Ide & Cozman's Multiconnected DAGs 
  burn in length:                        10000 
  maximum in-degree:                     Inf 
  maximum out-degree:                    Inf 
  maximum degree:                        3

③以均匀概率从有向无环图空间抽样

Melançon的 MCMC算法以均匀概率从有向无环图空间抽样（不需要连接not necessarily connected），指定method = "melancon"即可。

> random.graph(LETTERS[1:6], method = "melancon")

  Random/Generated Bayesian network

  model:
   [A][B|A][F|A][C|A:B][E|B][D|B:C:E] 
  nodes:                                 6 
  arcs:                                  8 
    undirected arcs:                     0 
    directed arcs:                       8 
  average markov blanket size:           3.00 
  average neighbourhood size:            2.67 
  average branching factor:              1.33 

  generation algorithm:                  Melancon's Uniform Probability DAGs 
  burn in length:                        216 
  maximum in-degree:                     Inf 
  maximum out-degree:                    Inf 
  maximum degree:                        Inf

至于可选的参数，它与Ide & Cozman算法是相同的。

4、总结

创建三种网络：空网络、特定网络、随机网络
创建特定网络网络的三种方法：边集、邻接矩阵、模型公式
存在的问题：对随机网络不太懂，为什么要创建随机网络，是给我们看看有这些节点可以产生什么样的结构吗？
随机网络的取样是啥意思？

这篇关于贝叶斯网络R语言学习笔记1的文章就介绍到这儿，希望我们推荐的文章对大家有所帮助，也希望大家多多支持为之网！

Java教程