Dawn E. Holmes, Lakhmi C. Jain (Eds.)
Innovations in Machine Learning
Studies in Fuzziness and Soft Computing, Volume 194
Editor-in-chief
Prof. Janusz Kacprzyk
Systems Research Institute
Polish Academy of Sciences
ul. Newelska 6
01-447 Warsaw
Poland
E-mail: kacprzyk@ibspan.waw.pl
Further volumes of this series
can be found on our homepage:
springer.com
Vol. 179. Mircea Negoita,
Bernd Reusch (Eds.)
Real World Applications of Computational
Intelligence, 2005
ISBN 3-540-25006-9
Vol. 180. Wesley Chu,
Tsau Young Lin (Eds.)
Foundations and Advances in Data Mining,
2005
ISBN 3-540-25057-3
Vol. 181. Nadia Nedjah,
Luiza de Macedo Mourelle
Fuzzy Systems Engineering, 2005
ISBN 3-540-25322-X
Vol. 182. John N. Mordeson,
Kiran R. Bhutani, Azriel Rosenfeld
Fuzzy Group Theory, 2005
ISBN 3-540-25072-7
Vol. 183. Larry Bull, Tim Kovacs (Eds.)
Foundations of Learning Classifier Systems,
2005
ISBN 3-540-25073-5
Vol. 184. Barry G. Silverman, Ashlesha Jain,
Ajita Ichalkaranje, Lakhmi C. Jain (Eds.)
Intelligent Paradigms for Healthcare
Enterprises, 2005
ISBN 3-540-22903-5
Vol. 185. Spiros Sirmakessis (Ed.)
Knowledge Mining, 2005
ISBN 3-540-25070-0
Vol. 186. Radim Bělohlávek,
Vilém Vychodil
Fuzzy Equational Logic, 2005
ISBN 3-540-26254-7
Vol. 187. Zhong Li, Wolfgang A. Halang,
Guanrong Chen (Eds.)
Integration of Fuzzy Logic and Chaos
Theory, 2006
ISBN 3-540-26899-5
Vol. 188. James J. Buckley, Leonard J.
Jowers
Simulating Continuous Fuzzy Systems, 2006
ISBN 3-540-28455-9
Vol. 189. Hans Bandemer
Mathematics of Uncertainty, 2006
ISBN 3-540-28457-5
Vol. 190. Ying-ping Chen
Extending the Scalability of Linkage
Learning Genetic Algorithms, 2006
ISBN 3-540-28459-1
Vol. 191. Martin V. Butz
Rule-Based Evolutionary Online Learning
Systems, 2006
ISBN 3-540-25379-3
Vol. 192. Jose A. Lozano, Pedro Larrañaga,
Iñaki Inza, Endika Bengoetxea (Eds.)
Towards a New Evolutionary Computation,
2006
ISBN 3-540-29006-0
Vol. 193. Ingo Glöckner
Fuzzy Quantifiers: A Computational Theory,
2006
ISBN 3-540-29634-4
Vol. 194. Dawn E. Holmes, Lakhmi C. Jain
(Eds.)
Innovations in Machine Learning, 2006
ISBN 3-540-30609-9
Dawn E. Holmes
Lakhmi C. Jain
(Eds.)
Innovations
in Machine Learning
Theory and Applications
Professor Dawn E. Holmes
Department of Statistics
and Applied Probability
University of California
at Santa Barbara
South Hall
Santa Barbara, CA 93106-3110
USA
E-mail: holmes@pstat.ucsb.edu
Professor Lakhmi C. Jain
School of Electrical
& Information Engineering
Knowledge-Based Intelligent
Engineering Systems Centre
University of South Australia
Mawson Lakes Campus
Adelaide, SA 5095
Australia
E-mail: lakhmi.jain@unisa.edu.au
Library of Congress Control Number: 2005937511
ISSN print edition: 1434-9922
ISSN electronic edition: 1860-0808
ISBN-10 3-540-30609-9 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-30609-2 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from Springer. Violations are
liable for prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2006
Printed in The Netherlands
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
Typesetting: by the author and TechBooks using a Springer LaTeX macro package
Cover design: Erich Kirchner, Heidelberg
Printed on acid-free paper SPIN: 10985687 89/TechBooks 5 4 3 2 1 0
This book is dedicated to our students.
Foreword
The study of innovation – the development of new knowledge and
artifacts – is of interest to scientists and practitioners. Innovations change
the day-to-day lives of individuals, transform economies, and even create
new societies. The conditions triggering innovation and the innovation
process itself set the stage for economic growth.
Scholars of technology have indicated that innovation lies at the
intersection of science and technology. One view proposes that
innovation is possible through advances in basic science and is realized in
concrete products within the context of applied science. Another view
states that the development of innovative products through applied science
generates new resources on which basic science draws to advance new
ideas and theories. Some believe that science and technology form a
symbiotic relationship, drawing from and contributing to one another's
progress. Following this view, innovation in any domain can be
enhanced by principles and insights from diverse disciplines.
This book addresses an important component of innovation dealing with
knowledge discovery. The discovery aspect creates a natural bridge
between machine learning concepts, models, and algorithms and
innovation. In years to come, machine learning will be seen as having
provided some of the early fundamentals of innovative science.
Andrew Kusiak
Professor of Mechanical and Industrial Engineering
Intelligent Systems Laboratory
The University of Iowa
Iowa City, Iowa
USA
Preface
There are many invaluable books available on machine learning. However,
in compiling a volume titled “Innovations in Machine Learning”, we wish
to introduce some of the latest developments to a broad audience of both
specialists and non-specialists in this field.
So, what is machine learning? Machine learning is a branch of artificial
intelligence that grew, as a research area, out of such diverse disciplines as
traditional computer science, linguistics, cognitive science, psychology
and philosophy. Although the philosophical roots of the subject may be
traced back to Leibniz and even ancient Greece, the modern era begins
with the work of Norbert Wiener, the father of Cybernetics, a term he
introduced in ‘Cybernetics: Or Control and Communication in the Animal
and the Machine’ (1948). However, it was not until 1955 that ‘The Logic
Theorist’, generally accepted as the first AI program, was presented by
Newell and Simon. In this ground-breaking work, Newell and Simon
showed that computers were more than just calculating machines, thus
ushering in the era of the computational model of the mind.
In his seminal 1950 paper ‘Computing Machinery and
Intelligence’, in which he first presented his famous eponymous test, Turing
hoped to establish the claim that human intelligence is not special but can
be explained in terms of computation. Research initially focused on the
misguided notion that machine intelligence should provide a model for
human intelligence. Ultimately, researchers in expert systems found that
this was not the way to go.
The intractable question ‘Can machines think?’ was soon modified to
‘Can machines learn?’, the answer to which directs the research area that is
the subject of this volume. Machine learning is, therefore, concerned with
building adaptive computer systems that are capable of improving their
performance through learning.
Minsky, a great pioneer in machine learning, built SNARC (Stochastic
Neural-Analog Reinforcement Calculator), the first randomly wired neural
network learning machine, in 1951. Machine learning in the 1960s was
largely concerned with knowledge representation and heuristic methods,
but by the early 1970s research in neural networks had begun to flourish.
With the fast-moving pace of AI research in the 1980s, it became realistic
to develop systems to solve real-world problems.
As we shall see below, each of the three main learning systems
(Symbolic Learning, Neural Networks and Genetic Algorithms) is
represented in this volume. The pioneering work of Mitchell on Version
Spaces resulted in a new paradigm for symbolic inductive learning, which
became a dynamic research area. Research in Neural Networks blossomed
after the publication of ‘Parallel Distributed Processing’, Volumes I and II
[Rumelhart and McClelland, 1986].
John Holland introduced Genetic Algorithms in the early 1970s. The
ongoing development of this area has resulted in a major paradigm for
research into automated computer program generation. The current volume
introduces the reader to research in classifier systems, the area of genetic
programming concerned with machine learning.
In compiling this volume we have sought to present innovative research
from prestigious contributors in the field of machine learning. Each of the
9 chapters is self-contained and is described below.
Chapter 1 by D. Heckerman, C. Meek and G. Cooper is on a Bayesian
approach to causal discovery. In addition to describing the general
Bayesian approach to causal discovery, the authors present a review of
approximation methods for missing data and hidden variables, and
illustrate differences between the Bayesian and constraint-based methods
using artificial and real examples.
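In its standard form (our summary here, not an excerpt from the chapter),
the Bayesian approach scores each candidate causal structure m, typically a
directed acyclic graph, by its posterior probability given data D:

$$ p(m \mid D) = \frac{p(D \mid m)\, p(m)}{\sum_{m'} p(D \mid m')\, p(m')} $$

where p(m) is a prior over structures and p(D | m) is the marginal
likelihood of the data under structure m; model selection and search then
amount to seeking structures for which this posterior is large.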
Chapter 2 by Neapolitan and Jiang presents a tutorial on learning causal
influences. In the last decade, related research in AI, cognitive science and
philosophy has resulted in a method for learning causal relationships
when we have data on at least four variables. This method is described in
this chapter. Recent research on learning causal relationships in the
presence of only two variables is also presented.
Chapter 3 by Roth is on learning based programming. The author
proposes a programming model that supports interaction with domain
elements at a semantic level, and presents some of the theoretical
foundations and first-generation implementations of a learning based
programming language.
In Chapter 4, Eberhardt, Glymour and Scheines present their
research on causal relations. By combining experimental interventions
with search procedures for graphical causal models, they show how
causal relationships can be determined from perfect data. This research
provides useful insight into active learning scenarios.
Chapter 5 by S.H. Muggleton, H. Lodhi, A. Amini and M.J.E. Sternberg
is on Support Vector Inductive Logic Programming (SVILP). The authors
have proposed a general method for constructing kernels for support vector
inductive logic programming. SVILP is evaluated empirically against
related approaches. The experimental results demonstrate that this novel
approach significantly outperforms all other approaches in the study.
Chapter 6 by Yoshua Bengio, Holger Schwenk, Jean-Sébastien Senécal,
Frédéric Morin and Jean-Luc Gauvain is on neural probabilistic language
models. The main goal of statistical language modeling is to learn the joint
probability function of sequences of words in a language. The authors have
proposed a new scheme to overcome the curse of dimensionality by
learning a distributed representation for words. A number of methods are
proposed to speed up both training and probability computation. The
authors have incorporated their new model into a speech recognizer for
conversational speech.
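As a rough illustration of the flavour of such models, the sketch below
computes next-word probabilities in the spirit of a neural probabilistic
language model; all names, shapes and parameters are our own illustrative
assumptions, not code from the chapter.

import numpy as np

def nplm_probs(context_ids, C, H, d, U, b, W):
    # C is a |V| x m table of learned word feature vectors (the
    # distributed representation); context_ids are the n-1 previous words.
    x = np.concatenate([C[i] for i in context_ids])  # concatenated context features
    h = np.tanh(d + H @ x)                           # hidden layer
    logits = b + W @ x + U @ h                       # direct plus hidden connections
    e = np.exp(logits - logits.max())                # numerically stable softmax
    return e / e.sum()                               # distribution over the next word

The costly step is the softmax over the whole vocabulary, which is the
kind of computation that speed-up methods for such models typically target.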
Chapter 7 by Adriaans and van Zaanen is on computational grammatical
inference. The authors present an overview of this area of research,
covering linguistic, empirical, and formal grammatical inference, and
discuss the work that falls in the areas where these fields overlap.
Chapter 8 by Jaz Kandola, John Shawe-Taylor, Andre Elisseeff and
Nello Cristianini is on kernel target alignment. Kernel-based methods are
increasingly being used for data modeling. The authors have presented their
research on measuring the degree of agreement between a kernel and a
learning task.
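To make the notion concrete, the empirical alignment between a kernel
matrix and the label matrix yy^T is commonly computed as follows; this
minimal sketch (with hypothetical names) is our own illustration, not the
authors' code.

import numpy as np

def kernel_target_alignment(K, y):
    # K is an m x m kernel (Gram) matrix; y holds m labels in {-1, +1}.
    Y = np.outer(y, y)                       # the 'ideal' kernel induced by the labels
    num = np.sum(K * Y)                      # Frobenius inner product <K, yy^T>_F
    den = np.linalg.norm(K, 'fro') * len(y)  # ||K||_F * ||yy^T||_F; note ||yy^T||_F = m
    return num / den                         # alignment, bounded by 1 in magnitude

A value near 1 indicates that the kernel already groups same-labelled
points together; this is the sense in which alignment measures the degree
of agreement between a kernel and a learning task.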
Chapter 9 by Ralf Herbrich, Thore Graepel, and Bob Williamson is on
the structure of version space. The authors have presented their research on
generalization performance of consistent classifiers, i.e. classifiers that are
contained in the so-called version space. Using a recent result in the PAC-
Bayesian framework, the authors have shown that, given a suitably chosen
hypothesis space, there exists a large fraction of classifiers with small
generalization error. The findings are validated using the kernel Gibbs
sampler algorithm.
This book will prove valuable to theoreticians as well as application
scientists/engineers in the broad area of artificial intelligence. Postgraduate
students will also find this a useful sourcebook since it shows the direction
of current research.
We have been fortunate in attracting contributions from top-class
researchers and wish to offer our thanks for their support in this project.
We also acknowledge the expertise and time of the reviewers. We
appreciate the assistance of Berend Jan van der Zwaag during the final
preparation of the manuscript. Finally, we also wish to thank Springer-Verlag
for their support.
USA Dawn E. Holmes
January 2006 Lakhmi C. Jain
Table of Contents
1 A Bayesian Approach to Causal Discovery .....................................1
1.1 Introduction .....................................................................................1
1.2 The Bayesian Approach ..................................................................2
1.3 Model Selection and Search............................................................6
1.4 Priors ...............................................................................................7
1.5 Example.........................................................................................10
1.6 Methods for Incomplete Data and Hidden Variables ....................13
1.6.1 Monte-Carlo Method............................................................14
1.7 A Case Study.................................................................................20
1.8 Open Issues ...................................................................................23
Acknowledgments .....................................................................................25
References..................................................................................................25
2 A Tutorial on Learning Causal Influence .......................................29
2.1 Introduction ...................................................................................29
2.1.1 Causation..............................................................................30
2.1.2 Causal networks ...................................................................33
2.2 Learning Causal Influences ...........................................................38
2.2.1 Making the Causal Faithfulness Assumption.......................36
2.2.2 Assuming Only Causal Embedded Faithfulness ..................42
2.2.3 Assuming Causal Embedded Faithfulness
with Selection Bias...............................................................53
2.3 Learning Causation From Data on Two Variables........................56
2.3.1 Preliminary Concepts ...........................................................56
2.3.2 Application to Causal Learning............................................62
2.3.3 Application to Quantum Mechanics.....................................64
References..................................................................................................69
3 Learning Based Programming ........................................................73
3.1 Introduction ...................................................................................74
3.2 Learning Based Programming.......................................................76
3.3 The LBP Programming Model ......................................................77
3.3.1 Knowledge Representations for LBP...................................79
3.3.2 Interaction.............................................................................88
3.3.3 Learning Operators in LBP ..................................................89
3.3.4 Inference...............................................................................91
3.3.5 Compilation..........................................................................92
3.4 Related Work ................................................................................ 92
3.5 Discussion ..................................................................................... 93
Acknowledgments ..................................................................................... 94
References.................................................................................................. 94
4 N-1 Experiments Suffice to Determine the Causal Relations
Among N Variables ............................................................................ 97
4.1 Introduction ................................................................................... 97
4.2 The Idea....................................................................................... 102
4.3 Discussion ................................................................................... 106
Acknowledgements.................................................................................. 107
Appendix: Proofs ..................................................................................... 107
References................................................................................................ 112
5 Support Vector Inductive Logic Programming ................................ 113
5.1 Introduction ................................................................................. 113
5.2 Background ................................................................................. 116
5.2.1 Kernels and Support Vector Machines............................... 116
5.2.2 Inductive Logic Programming ........................................... 169
5.3 Support Vector Inductive Logic Programming (SVILP) ............ 119
5.3.1 Family example .................................................................. 120
5.3.2 Definition of kernel ............................................................ 121
5.4 Related Work .............................................................................. 122
5.4.1 Propositionalisation............................................................ 125
5.4.2 Kernel within ILP............................................................... 126
5.5 Implementation ........................................................................... 127
5.6 Experiments................................................................................. 127
5.6.1 Materials............................................................................. 127
5.6.2 Methods.............................................................................. 128
5.7 Conclusions and Further Works.................................................. 131
Acknowledgements ................................................................................ 132
References................................................................................................ 132
6 Neural Probabilistic Language Models............................................. 137
6.1 Introduction ................................................................................. 138
6.1.1 Fighting the Curse of Dimensionality
with Distributed Representations ....................................... 140
6.1.2 Relation to Previous Work ..........................................