Integer Linear Programming in OpenGM

Today I will be discussing some of the ways in which inference algorithms especially the Linear Programming ones work in OpenGM.
So there are many ways to solve an energy minimization problem.
From the table below we can get an idea about the different accumulative operations used to do inference:

word_1

That is, if we need to solve for energy minimization problem, we have to go for a combination of opengm::Adder and opengm::Minimizer in OpenGM to bring the same effect.

Along with this, we have to choose one Inference algorithm. From OpenGM we get the list of available inference algorithms as below:

  1. Message Passing
    1. Belief Propogation
    2. TRBP
  2. TRW-S
  3. Dynamic Programming
  4. Dual Decomposition
    1. Using Sub-gradient Methods
    2. Using Bundle Methods
  5. A* Search
  6. (Integer) Linear Programming
  7. Graph Cuts
  8. QPBO
  9. Move Making
    1. Expansion
    2. Swap
    3. Iterative Conditional Modes(ICM)
    4. Lazy Flipper
    5. LOC
  10. Sampling
    1. Gibbs
    2. Swendsen-Wang
  11. Wrappers around other libraries
    1. libDAI
    2. MRF-LIB
  12. BruteForce Search

We will be concentrating here on the ILP Relaxation problem. Lets see how it goes!

Any energy minimization problem can be formulated as an (integer) linear problem. We can also sufficiently solve the problem by considering a relaxation of the original problem by relaxing the system of imequlities.

The corresponding parameter class can take many optional parameters which can be used to tune the solver. The most important are the flags

  • to switch to integer mode integerConstraint_
  • the number of threads that should be used numberOfThreads_
  • the verbose flag verbose_ that switches the CPLEX output on
  • the timeLimit_ that stops the method after N seconds realtime

 

A sample is shown below:

word_2

Getting OpenGM to work …

I had been trying hard to get accustomed with the OpenGM library so that I get to understand what are the things that I will be needing for my implementation of Approximate algorithms.

I will jot down my findings here:

Below is the most important image that will help me to write off things quite cleanly.

factor_graphs_opengm

 

The things that are needed by the OpenGM for completely specifying the above factor model are:

  1. The number of variables along with cardinality (the number of labels) of each of these. This is done by constructing an object called an Label Space.
  2. Functions describe the different small \varphi_i that are used to decompose the bigger \varphi.
  3. Operations can be addition, multiplication, max and min. These are the operations through which the \varphi decomposes into \varphi_i.
  4. Factors is the only missing link which connects the appropriate \varphi with respective x_v's intended as the inputs.

Writing a complete model in OpenGM using ExplicitFunctions.

explicit_func_code_blog

 

 

The above one was a very primitive model. Let us make from a complete one like present below:

factor_snap

 

 

download

GSoC 2015

It had been quite an awesome thing to be selected in Google Summer of Code for the second time. Along with it, I also got an offer to pursue masters in Johns Hopkins University in Computer Science in upcoming Fall. The god has been great to me! I thank almighty for all this .

Okay, back to work.

So my project this year deals with implementing “Approximate algorithms for inference in graphical models” in the awesome org. pgmpy. I will try to encompass everything that I have understood till now by reading various research papers and the help I had from the mentors in the form of naive questions. (I think I may need to keep the post updating. Below is what I have understood till now.)

 

What are graphical models? 

A graphical model is a probabilistic model where the conditional dependencies between the random variables are specified via a graph. Graphical models provide a flexible framework for modelling large collections of variables with complex interactions.

In this graphical representation, the nodes correspond to the variables in our domain, and the edges correspond to direct probabilistic interactions between them

Graphical models are good in doing the following things:

  1. Model Representation
    • It allows tractable representation of the joint distribution.
    • Very transparent
  2. Inference
    • It facilitates answering queries using our model of the world.
    • For example: algorithms for computing the posterior probability of some variables given the evidence on others.
    • Say: we observe that it is night and the owl is howling. We wish to know how likely a theft is going to happen, a query that can be formally be written as P(Theft=true | Time=Night, Conditions=Owl is howling)
  3. Learning
    • It facilitates the effective construction of these models by learning from data a model that provides a good approximation to our past experience
    • They sometimes reveal surprising connections between variables and provide novel insights about a domain.

 

Why we want to do inference? Ahem!! What is this “inference” actually!

The fundamental inference problem underlying many applications of graphical models is to find the most likely configuration of the probability distribution also called as the MAP (Maximum a Posteriori) assignment. This is what the underlying problem is while solving the stereo vision and protein design problems.

The graphical models that we consider involve discrete random variables. And it is known that a MAP inference can be formulated as an integer linear program.

There are many ways to solve the MAP inference problem namely:

  • Message Passing Algorithms: They solve the inference problem by passing messages along the edges of the graph that summarize each variable’s beliefs. Better check Yedidia et al., 2005 paper for a gentle introduction in message propagation. However, these algorithms can have trouble converging, and in difficult inference problems tend not to give good results
  • LP Relaxation: The LP relaxation is obtained by formulating inference as an integer linear program and then relaxing the integrality constraints on the variables. (More on this later!)

To correctly find the MAP assignment is equivalent to finding the assignment x_m that maximizes

\theta(x) = \Sigma_{ij \in E}\theta_{ij}(x_i, x_j)+\Sigma_{i \in V} \theta_i(x_i)

To understand what the above term is, we need to delve into the theory of pairwise Markov Random Fields. For the moment being, think of \theta_{ij} as the edge potential and \theta_{i} as the vertex potential.

To turn the above into an integer linear program( ILP ), we introduces variables

  1. \mu_i(x_i), one for each i \in V and state x_i
  2. \mu_{ij}(x_i, x_j), one for each edge ij \in E and pair of states x_i, x_j

The objective function is then:

\max_{\mu} \Sigma_{i \in V}\Sigma_{x_i}\theta_{i}(x_i) \mu_i(x_i)+\Sigma_{ij \in E}\Sigma_{x_i, x_j}\theta_{ij}(x_i, x_j) \mu_{ij}(x_i, x_j)

 

The set of \mu that arises from such joint distributions is known as the marginal polytope. There always exist a maximizing \mu that is integral- a vertex of the marginal polytope – and which corresponds to x_m

Although the number of variables in this LP is less, the difficulty comes from an exponential number of  linear inequalities typically required to describe the marginal polytope.

The idea in LP relaxations is to relax the difficult global constraint that the marginals in \mu arise from some common joint distribution. Instead we enforce this only over some subsets of variables that we refer to as clusters.

But what are constraints? How come you can use constraints and clusters interchangeably? 

What constraint is to Marginal Polytope is clusters to the primal LP problem.

So essentially, we here force every “cluster” of variables to choose a local assignment instead of enforcing that these local assignments are globally consistent. Had these local assignments been consistent globally, we would have the complete Marginal Polytope already.

We slowly and steadily add each of the clusters so that we tend towards the Marginal Polytope.

I am attaching few slides that I have found really useful in this case:

sontag_1

sontag_2 sontag_3 sontag_4

 

Through LP relaxation we solve for the case where the fewer clusters were assigned the local assignment( The local consistency polytope).The solution to this gives us the equation mentioned in the last above.

Why we need approximate algorithms? Is there any other way to do it like exact ones? If exact ones are there, why not use them!!

Solving for the marginal polytope gets too heavy. I will showcase the process in the form of some other slides that I really found useful.

sontag_a sontag_b sontag_c sontag_e sontag_f sontag_g sontag_h sontag_i sontag_k

What are the types of approximate algorithms that you are going to implement? Are there many types? 

LP Relaxation is the obvious one that is my target here. The other one above where I have added constraints to the marginal polytope is called the cutting plane method ( It happens to be a specific algorithm used in LP Relaxations to tend towards the MAP polytope)

Please describe step by step process of these algorithms. How does approximations helps when accuracy is concerned. Tell about the trade-offs that happen between accuracy and speed!

*I will get onto it soon*

Do previous implementation of what you are going to do already exists? If they, how you plan to make your implementation different from them. Is the language different?

I think they do exist in OpenGM library. I will surely take inspiration out of it. Yup, It is in C++ and we were to make our implementation in Python.

 

 

YOU MUST CHECK OUT THE AWESOME LIBRARY I AM WORKING WITH:  Pgmpy