
Open Source Bayesian Network Structure Learning API, Free


I’d like to introduce a new open source Bayesian network structure learning API called Free-BN (FBN). FBN is licensed under the Apache 2.0 license. Below, I’ll scratch the surface of FBN and walk you through an example of using it.

Why another Bayesian network structure learning API?

While working on my dissertation, I had a tough time finding open source APIs for constraint-based structure learning of Bayesian networks. Only a few of the open source APIs I found dealing with Bayesian networks were written in Java.

This page provides a long list of Bayesian network related software and APIs. One of the fruits of my dissertation work (though not reported or included in the dissertation itself) was the development of FBN, a Bayesian network structure learning API written in Java.

Some features of FBN

So, what can FBN currently do (related to Bayesian networks)? Here’s a non-exhaustive list.

  • Structural learning
    • constraint-based (PC, TPDA, PDFS)
    • search-and-scoring (K2)
    • mixed-type (SC*, CrUMB+-GA)
  • Exact inference (using PPTC algorithm)
  • Logic sampling

Working with FBN should be relatively easy. It’s meant to be an API (not an application). Currently, FBN can only learn from database sources, although you could extend the API to learn from flat files. FBN is designed around inversion of control (IoC), or dependency injection (DI), and uses the Spring Framework to achieve that design. Using DI and working primarily with interfaces means the API can easily be extended to include other structure learning algorithms.
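As a rough illustration of that extensibility, here is a minimal sketch of what plugging in your own algorithm might look like. The class name MyStructureLearner is hypothetical, and the method signature is inferred from the demo code further below (where the learner is called as learner.learn(variables) and returns a Graph); check the StructureLearner interface in the source for the exact contract.

import net.fbn.graph.intf.Graph;
import net.fbn.learner.struct.intf.StructureLearner;
import net.fdm.data.intf.Variable;

/**
 * Hypothetical custom learner. Because the built-in algorithms (TPDA, PC, K2, ...)
 * sit behind the StructureLearner interface, a new algorithm only needs to
 * implement learn(...) and can then be injected wherever the built-in learners
 * are used, either in Java or in a Spring XML context.
 */
public class MyStructureLearner implements StructureLearner {

    public Graph learn(Variable[] variables) {
        // build and return a Graph using your own search strategy; collaborators
        // (DAOs, independence tests, scores) would be injected via setters,
        // just like the built-in learners shown later in this post
        throw new UnsupportedOperationException("not implemented yet");
    }
}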

Walkthrough preliminaries

Before I walk through how to use FBN, let’s go over some background information. The dataset is generated using logic sampling from the Bayesian network reported by Cooper and Herskovits (1992). This Bayesian network has three variables: X1, X2, and X3. Its structure is a serial connection: X1 -> X2 -> X3. The local probability models reported are shown in the table below.

P(X1=present) = 0.6               P(X1=absent) = 0.4
P(X2=present|X1=present) = 0.8    P(X2=absent|X1=present) = 0.2
P(X2=present|X1=absent) = 0.3     P(X2=absent|X1=absent) = 0.7
P(X3=present|X2=present) = 0.9    P(X3=absent|X2=present) = 0.1
P(X3=present|X2=absent) = 0.15    P(X3=absent|X2=absent) = 0.85
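As a quick sanity check on these numbers, the serial structure means the joint distribution factors as P(X1, X2, X3) = P(X1) P(X2|X1) P(X3|X2). For example, P(X1=present, X2=present, X3=present) = 0.6 × 0.8 × 0.9 = 0.432, so roughly 43% of the sampled records should read ('present', 'present', 'present').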

The algorithm used to learn the Bayesian network from the data will be Three Phase Dependency Analysis (TPDA) (Cheng et al. 2002). TPDA is a constraint-based Bayesian network structure learning algorithm with three phases: drafting, thickening, and thinning. TPDA is implemented in FBN and will be used to learn the Bayesian network structure from the data generated using logic sampling.
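To give a feel for the kind of test that drives all three phases, here is a small, self-contained sketch of an information-theoretic conditional independence check. This is an illustration only, not FBN’s MiCondIndepTestImpl: it estimates the conditional mutual information I(X;Y|Z) from a table of counts and treats X and Y as conditionally independent given Z when the estimate falls below a small threshold.

/**
 * Minimal sketch of the information-theoretic conditional independence test
 * underlying constraint-based learners like TPDA. The counts array is indexed
 * as counts[x][y][z].
 */
public class CondMutualInformationSketch {

    public static double cmi(int[][][] counts) {
        int nx = counts.length, ny = counts[0].length, nz = counts[0][0].length;
        double n = 0.0;
        double[] countZ = new double[nz];
        double[][] countXZ = new double[nx][nz];
        double[][] countYZ = new double[ny][nz];

        // accumulate the total and the marginal counts needed by the formula
        for (int x = 0; x < nx; x++) {
            for (int y = 0; y < ny; y++) {
                for (int z = 0; z < nz; z++) {
                    double c = counts[x][y][z];
                    n += c;
                    countZ[z] += c;
                    countXZ[x][z] += c;
                    countYZ[y][z] += c;
                }
            }
        }

        // I(X;Y|Z) = sum over x,y,z of p(x,y,z) * log( p(x,y,z) p(z) / (p(x,z) p(y,z)) )
        double result = 0.0;
        for (int x = 0; x < nx; x++) {
            for (int y = 0; y < ny; y++) {
                for (int z = 0; z < nz; z++) {
                    double pxyz = counts[x][y][z] / n;
                    if (pxyz > 0.0) {
                        double pz = countZ[z] / n;
                        double pxz = countXZ[x][z] / n;
                        double pyz = countYZ[y][z] / n;
                        result += pxyz * Math.log((pxyz * pz) / (pxz * pyz));
                    }
                }
            }
        }
        return result;
    }

    // declare conditional independence when the measure is below the threshold
    public static boolean independent(int[][][] counts, double epsilon) {
        return cmi(counts) < epsilon;
    }
}

In TPDA, tests of this flavor, together with thresholds like the delta, theta, and epsilon values that appear in the wiring code below, decide whether an edge between two variables can be explained away by a conditioning set.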

Set up your data source

FBN takes as input data stored in a database accessible through a JDBC driver. Some examples of such databases are Oracle, MS SQL Server, and MySQL. In this walkthrough, I’ll show examples using MySQL.

The data must be stored in two separate tables: one table to specify the variables (denote this as vtable), and one table to hold the actual data (denote this as dtable). The vtable should have the following fields: name, type, and domain. An example of a DDL for a vtable using MySQL is:

create table vtable (
 name varchar(10),
 domain varchar(20),
 type varchar(10)
);

Since we have three binary variables (x1, x2, and x3), we have to insert values into the vtable to describe these variables.

insert into vtable(name, domain, type) values('x1','absent,present', '1');
insert into vtable(name, domain, type) values('x2','absent,present', '1');
insert into vtable(name, domain, type) values('x3','absent,present', '1');

The type is set to 1 for categorical variables. For all types see net.fdm.data.intf.Variable.

Now, we have to create a table to hold the data. The following is a sample MySQL DDL to create such a table.

create table dtable (
 x1 varchar(10),
 x2 varchar(10),
 x3 varchar(10)
);

Now that we have created the dtable, insert data into it.

insert into dtable(x1,x2,x3) values('present','present','present');
insert into dtable(x1,x2,x3) values('present','present','present');
insert into dtable(x1,x2,x3) values('present','present','present');
...
insert into dtable(x1,x2,x3) values('present','absent','absent');
insert into dtable(x1,x2,x3) values('absent','absent','absent');
insert into dtable(x1,x2,x3) values('absent','absent','absent');

If you download the source code for FBN, the MySQL scripts are located in demo/mysql.sql. The source code to create the Bayesian network and perform logic sampling is located in demo/com/vang/jee/fbn/demo/DataGenerator.java.
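If you want to reproduce this kind of data without the FBN classes, here is a minimal, stand-alone sketch of logic sampling for the X1 -> X2 -> X3 network above. It is not FBN’s DataGenerator, just an illustration of what logic sampling does for this network: sample X1 from its prior, then X2 given X1, then X3 given X2, and emit one INSERT statement per record (the class name and sample size are arbitrary).

import java.util.Random;

/**
 * Stand-alone logic sampling sketch for the serial network X1 -> X2 -> X3,
 * using the local probability models listed earlier. Prints INSERT statements
 * that can be fed into dtable.
 */
public class LogicSamplingSketch {

    public static void main(String[] args) {
        Random rng = new Random();
        int sampleSize = 1000; // number of records to generate

        for (int i = 0; i < sampleSize; i++) {
            // sample each node in topological order, parents before children
            boolean x1 = rng.nextDouble() < 0.6;               // P(X1=present) = 0.6
            boolean x2 = rng.nextDouble() < (x1 ? 0.8 : 0.3);  // P(X2=present|X1)
            boolean x3 = rng.nextDouble() < (x2 ? 0.9 : 0.15); // P(X3=present|X2)

            System.out.println("insert into dtable(x1,x2,x3) values('"
                    + label(x1) + "','" + label(x2) + "','" + label(x3) + "');");
        }
    }

    private static String label(boolean present) {
        return present ? "present" : "absent";
    }
}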

Set up your structure learning algorithm

Now it’s time to set up our structure learning algorithm of choice. We can do so in code (using Java) or, the better alternative, “wire up” the algorithm using Spring and XML files. The following code shows how to wire up the TPDA structure learning algorithm in Java.

/**
 * Copyright 2009 Jee Vang
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 *  Unless required by applicable law or agreed to in writing, software
 *  distributed under the License is distributed on an "AS IS" BASIS,
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *  See the License for the specific language governing permissions and
 *  limitations under the License.
 */
package com.vang.jee.fbn.demo;
import java.util.Iterator;
import javax.sql.DataSource;
import net.fbn.data.condcorr.impl.MiCondCorr;
import net.fbn.data.condcorr.impl.MiCondIndepTestImpl;
import net.fbn.data.corr.impl.MutualInformation;
import net.fbn.graph.algo.impl.MWSTImpl;
import net.fbn.graph.factory.impl.UnGraphFactoryImpl;
import net.fbn.graph.intf.Graph;
import net.fbn.learner.struct.cb.tpda.impl.DSeparateA;
import net.fbn.learner.struct.cb.tpda.impl.DSeparateB;
import net.fbn.learner.struct.cb.tpda.impl.SimpleOrientArcsImpl;
import net.fbn.learner.struct.cb.tpda.impl.StDraftImpl;
import net.fbn.learner.struct.cb.tpda.impl.StTPDALearnerImpl;
import net.fbn.learner.struct.cb.tpda.impl.StThickenImpl;
import net.fbn.learner.struct.cb.tpda.impl.StThinImpl;
import net.fbn.learner.struct.intf.StructureLearner;
import net.fdm.data.dao.impl.VariableDaoImpl;
import net.fdm.data.dao.intf.VariableDao;
import net.fdm.data.intf.Variable;
import org.apache.commons.dbcp.BasicDataSource;
/**
 * Demo for structure learning using TPDA.
 * @author Jee Vang
 *
 */
public class TestLearning {
    private DataSource _dataSource;
    private VariableDao _variableDao;
    private StructureLearner _structureLearner;
     
    /**
     * Gets a structure learner.
     * @return StructureLearner.
     */
    public StructureLearner getStructureLearner() {
        if(null == _structureLearner) {
            //set the algorithm to perform TPDA drafting phase
            StDraftImpl draft = new StDraftImpl();
            draft.setMwstAlgo(new MWSTImpl());
            draft.setUnGraphFactory(new UnGraphFactoryImpl());
             
            //these are some classes used to help TPDA proceed
            double delta = 0.01d;
            double theta = 0.001d;
            double epsilon = 0.001d;
             
            MutualInformation mi = new MutualInformation();
            mi.setVariableDao(getVariableDao());
             
            MiCondCorr miCondCorr = new MiCondCorr();
            miCondCorr.setVariableDao(getVariableDao());
            miCondCorr.setCorrMetric(mi);
             
            MiCondIndepTestImpl condIndepTest = new MiCondIndepTestImpl();
            condIndepTest.setVariableDao(getVariableDao());
            condIndepTest.setCondCorrMetric(miCondCorr);
            condIndepTest.setDelta(delta);
             
            DSeparateA dSeparateA = new DSeparateA();
            dSeparateA.setVariableDao(getVariableDao());
            dSeparateA.setCondIndepTest(condIndepTest);
             
            DSeparateB dSeparateB = new DSeparateB();
            dSeparateB.setVariableDao(getVariableDao());
            dSeparateB.setEpsilon(epsilon);
            dSeparateB.setCondIndepTest(condIndepTest);
             
            SimpleOrientArcsImpl orientArcs = new SimpleOrientArcsImpl();
            orientArcs.setCondIndepTest(condIndepTest);
            orientArcs.setEpsilon(epsilon);
             
            //set the algorithm to perform the TPDA thickening phase
            StThickenImpl thicken = new StThickenImpl();
            thicken.setDSeparate(dSeparateA);
             
            //set the algorithm to perform the TPDA thinning phase
            StThinImpl thin = new StThinImpl();
            thin.setDSeparateA(dSeparateA);
            thin.setDSeparateB(dSeparateB);
             
            //now wire up tpda
            StTPDALearnerImpl tpda = new StTPDALearnerImpl();
            tpda.setDelta(delta);
            tpda.setTheta(theta);
            tpda.setCorrMetric(mi);
            tpda.setRemoveInsignificantCorrelations(true);
            tpda.setDraft(draft);
            tpda.setThin(thin);
            tpda.setThicken(thicken);
            tpda.setOrientArcs(orientArcs);
             
            _structureLearner = tpda;
        }
        return _structureLearner;
    }
     
    /**
     * Gets variable data access object.
     * @return VariableDao.
     */
    public VariableDao getVariableDao() {
        if(null == _variableDao) {
            VariableDaoImpl variableDao = new VariableDaoImpl();
            variableDao.setDataSource(getDataSource());
            variableDao.setDataTable("dtable");
            variableDao.setDomainColumnName("domain");
            variableDao.setDomainDelimiter(",");
            variableDao.setTypeColumnName("type");
            variableDao.setVarTable("vtable");
             
            _variableDao = variableDao;
        }
         
        return _variableDao;
    }
     
    /**
     * Gets a data source.
     * @return DataSource.
     */
    public DataSource getDataSource() {
        if(null == _dataSource) {
            String driverClassName = "com.mysql.jdbc.Driver";
            String url = "jdbc:mysql://localhost/bn?user=jee&password=jee";
             
            BasicDataSource dataSource = new BasicDataSource();
            dataSource.setDriverClassName(driverClassName);
            dataSource.setUrl(url);
             
            _dataSource = dataSource;
        }
         
        return _dataSource;
    }
     
    /**
     * Gets an array of variables.
     * @return Array of Variable.
     * @throws Exception
     */
    public Variable[] getVariables() throws Exception {
        VariableDao variableDao = getVariableDao();
        Variable[] variables = variableDao.getVariables();
        return variables;
    }
    /**
     * Main method.
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        TestLearning testLearning = new TestLearning();
        Variable[] variables = testLearning.getVariables();
        StructureLearner learner = testLearning.getStructureLearner();
        Graph graph = learner.learn(variables);
        System.out.println("NODES");
        for(Iterator it = graph.getNodes().iterator(); it.hasNext(); ) {
            System.out.println(it.next());
        }
         
        System.out.println("ARCS");
        for(Iterator it = graph.getArcs().iterator(); it.hasNext(); ) {
            System.out.println(it.next());
        }
    }
}

The getDataSource method returns a DataSource pointing to your database (in this case, a MySQL instance). The getVariableDao method provides a reference to the VariableDao object that has access to the variables and the data. The getStructureLearner method wires up the TPDA implementation. In the main method, you get a reference to all the variables over which you want to perform Bayesian network structure learning and an instance of the structure learner. You then pass this array of variables into the learner to produce a Graph. The nodes in the graph should be: x1, x2, x3. The arcs in this graph are: x1–x2 and x2–x3. Therefore, the structure is: x1–x2–x3. Note that these arcs come back undirected: the serial connection X1 -> X2 -> X3 is Markov equivalent to other orientations of the same chain, so the constraint-based learner cannot orient the edges from the data alone, and the learned structure does not by itself satisfy the directed acyclic graph (DAG) requirement of a Bayesian network. The source code for this learning example is located in the source distribution under demo/src/com/vang/jee/fbn/demo/TestLearning.java.
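If you prefer the Spring XML wiring mentioned earlier, the same objects can be declared as beans in a context file and pulled out at runtime. The sketch below shows the idea; the file name fbn-context.xml and the bean ids variableDao and structureLearner are hypothetical names for this example, not files or ids that necessarily ship with FBN, and the bean definitions would simply mirror the setter calls in TestLearning above.

import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

import net.fbn.graph.intf.Graph;
import net.fbn.learner.struct.intf.StructureLearner;
import net.fdm.data.dao.intf.VariableDao;
import net.fdm.data.intf.Variable;

/**
 * Sketch of loading the learner from a Spring XML context instead of wiring
 * it up in Java.
 */
public class XmlWiredLearning {

    public static void main(String[] args) throws Exception {
        // load the bean definitions from an XML file on the classpath
        ApplicationContext ctx = new ClassPathXmlApplicationContext("fbn-context.xml");

        VariableDao variableDao = (VariableDao) ctx.getBean("variableDao");
        StructureLearner learner = (StructureLearner) ctx.getBean("structureLearner");

        Variable[] variables = variableDao.getVariables();
        Graph graph = learner.learn(variables);
        System.out.println(graph);
    }
}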

How to get the source and dependencies?

The FBN API depends on two other small projects: Free-Display and Free-GA (FGA). The Free-Display API is used to visualize Bayesian networks, while the FGA API is used for the search-and-scoring methods of Bayesian network structure learning. You may download all of these APIs; they are all licensed under the Apache 2.0 license.

Free-BN source
Free-BN binary
Free-Disp source
Free-Disp binary
Free-GA source
Free-GA binary

I hope this API helps you in your research. Happy research, data mining, and programming! Cheers! Sib ntsib dua mog! (Until we meet again!)

References

  • G. F. Cooper and E. Herskovits. “A Bayesian method for the induction of probabilistic networks from data,” Machine Learning, vol. 9, 1992, pp. 309–347.
  • J. Cheng, R. Greiner, J. Kelly, D. A. Bell, and W. Liu. “Learning Bayesian Networks from Data: an Information-Theory Based Approach,” Artificial Intelligence, vol. 137, 2002, pp. 43–90.
