Last Progress Report

I have successfully implemented the triplet clustering method from the 2nd paper. I was able to follow Abinash's comments this time, and he also seems happy with my coding so far. The results are all cool 🙂

They are 99% accurate when convergence is fast. I am checking accuracy by comparing against the MPLP code by Sontag. The results deviate when the number of iterations increases, and this is the issue I have spent most of my time on over the last few days. The conclusion is that… the deviation is not the result of an algorithmic error but of precision loss when rounding off fractions. I think we can live with such inaccuracies; they are very minute indeed.
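To make the point concrete, here is a tiny, self-contained Python illustration of the effect (my own toy example, nothing to do with the actual MPLP code): accumulating the same numbers in float32 and float64 drifts apart as the number of additions grows, which is why I compare my objective value against the reference one with a tolerance rather than with exact equality.

import numpy as np

# Toy illustration only: repeated additions in lower precision slowly drift
# away from the higher-precision result as the iteration count grows.
np.random.seed(0)
increments = np.random.rand(100000)

total_64 = np.float64(0.0)
total_32 = np.float32(0.0)
for x in increments:
    total_64 += np.float64(x)
    total_32 += np.float32(x)

print("float64 sum:", total_64)
print("float32 sum:", float(total_32))
print("drift      :", abs(total_64 - float(total_32)))

# The kind of tolerance check I mean by a "bearable" inaccuracy:
print("agree within rtol=1e-3:", np.isclose(total_64, float(total_32), rtol=1e-3))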

My next goal is to write a solid IPython notebook as part of the documentation, showcasing the algorithm step by step so that an ordinary user doesn't have to get into the cumbersome details. Another goal is, of course, to get the current PR (pull 420) merged. I know that I still need to implement the 3rd paper, but I feel that writing proper documentation in the form of an IPython notebook first and then extending the work is the better option.

Okay then. I must retire now. Bye bye

Progress report after mid-term

Whoa! I am enjoying every bit of coding now. I have figured out the whole algorithm from the basics! I had been a fool all this time 🙂. Still, I think the current understanding wouldn't have been possible if I hadn't made all those mistakes. So thumbs up!

Spending time with the exact inference code that is already implemented in the pgmpy library, along with a meticulous reading of those algorithms in Koller's PGM book, helped me enormously. Variable Elimination and Belief Propagation are great starting points for beginners who want to jump into approximate inference!

The messages are nothing but the Factor data structures of the pgmpy library. Gosh! I never knew that. I now understand how to fit the current MPLP implementation into the library. My status: I have completely reworked the 1st paper into Pythonic, class-oriented code. It works nicely on some of the example UAI files available here: PASCAL examples.
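To illustrate the point (the import path below is the one used by recent pgmpy versions; older releases expose the same class as pgmpy.factors.Factor, so treat the exact names as approximate):

import numpy as np
from pgmpy.factors.discrete import DiscreteFactor

# A pairwise potential theta_12(x1, x2) over two binary variables.
theta_12 = DiscreteFactor(['x1', 'x2'], [2, 2], np.array([1.0, 4.0, 3.0, 2.0]))

# "Sending a message from the edge to x2" is just max-marginalizing x1 out.
# The result is again a factor, now over x2 alone -- and that object is the message.
message_to_x2 = theta_12.maximize(['x1'], inplace=False)

print(message_to_x2.scope())   # ['x2']
print(message_to_x2.values)    # the max over x1 for each value of x2

Max-marginalizing a pairwise factor returns another factor over the remaining variable, and that returned object is exactly what MPLP passes around as a message.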

I am itching to finish off the coding; it has become so interesting now that I know what the algorithm actually means!

Check out my new PR: pull 449

I hope you notice improvements there.

As far as roadblocks are concerned… they never seem to leave me. After implementing the 1st paper twice (I hope you remember the old, crude implementation), I found out that Sontag later modified the algorithm a bit in his PhD thesis, compared with the original version in the Fixing Max-Product (2007) paper that I had implemented.


Of the set of 3 papers, the later two are written by Sontag and come with his C++ code, so I decided to change my implementation to follow the thesis version. This part was trivial, but it still counts as a roadblock!

The next thing to expect from me is the triplet clustering code. Till then, bye bye.

Mid-Term Summary

The mid-term is over; I am through the first half of Google Summer of Code! As far as accomplishments are concerned, I have almost implemented the combination of the 3 papers in Python and have kept updating my pull request here: pull 420.

Most of my effort went into getting the algorithm to work at all, so I was not able to polish the finer points. Abinash is not happy with the current code. There are many holes in it and my coding style isn't good. It looks like trying to understand the algorithm from the existing implementation wasn't the right approach; I needed a proper grip on the algorithm itself.

So I have spent a lot of time again on the theory. This time, I made meticulous notes. For notes check this out: http://kislaynotes.droppages.com/

My expectation for the next period is to completely rework the current implementation. The existing code should be thought of more as a sample to understand what lies ahead.

One thing became clear: I need to be more object-oriented. My current approach was essentially porting the C++ code to Python while disregarding pgmpy's inherent classes. I now see how superficially I read the exact inference algorithms already implemented in the library. I need to be more Pythonic; my current code reads like C++ code, which is very bad.

Anyway, my focus now is to spend time with the already implemented parts of the library, see how the 1st paper fits in, and discuss a lot with my mentor. I have lost some precious time, but now that I know some of the things I was doing wrong… I hope to proceed swiftly.

 

Bye! I hope to get some cool python coding done in the next few days!

Progress Report

Continuing from my previous post about understanding the gist of the 3 papers taken together: I think I have some ideas now, though still not 100%. Having been reading the papers and so much of the mathematics behind them, I thought it would be good to write code side by side.

This is because too much theory leaves me feeling stuck; writing code alongside helps me grasp the mathematical formulations properly. It doesn't always go smoothly, though: there are many cases where I have no idea how to implement a certain section of the paper, so I work out the implementation from the authors' code in the hope that I will understand it once that section is done. And it works, I really do understand that part afterwards. The only thing is that it is not the right way; it is cumbersome.

Moreover, the reference code written by Sontag et al. lacks proper documentation, which adds to my worries. Anyway, I am treading forward, slowly but steadily. I have started implementing the 1st paper in Python, following the C++ code and my understanding of the paper.

The part I have to deal with now is the message-update method. This forms the crux of the paper.

 

This goes in the following manner: I have to read the Markov model from the UAI file and take it in for the MAP query, then run the message-update method until we reach some convergence. For reading the model from the UAI file, I am thankful to Ankur, who gladly helped me write a simple UAI reader so that I could focus more on the algorithm than on the other nuances (check it out here: pull 411).
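To make the flow concrete, here is a rough, self-contained numpy sketch of how I currently picture it: a tiny hard-coded model standing in for the one read from the UAI file, the pairwise MPLP edge update as I understand it from the 2007 paper, a convergence check, and the decoding of the MAP assignment at the end. This is purely my own illustration, not the code in the PR, and all the names are made up.

import numpy as np

# A tiny two-variable model standing in for one read from a UAI file.
theta_1 = np.array([0.0, 0.5])            # node log-potential theta_1(x1)
theta_2 = np.array([0.2, 0.0])            # node log-potential theta_2(x2)
theta_12 = np.array([[0.0, 1.0],          # edge log-potential theta_12(x1, x2)
                     [1.0, 0.0]])

# Messages sent by the single edge (1, 2) to its two endpoints.
msg_to_1 = np.zeros(2)
msg_to_2 = np.zeros(2)

for iteration in range(100):
    old_1, old_2 = msg_to_1.copy(), msg_to_2.copy()

    # Beliefs on each node *excluding* what this edge itself sent; with only
    # one edge in the model these are just the node potentials.
    b1 = theta_1
    b2 = theta_2

    # The pairwise MPLP edge update in log space (my reading of the paper):
    # msg_to_j(x_j) = -1/2 * b_j(x_j) + 1/2 * max_{x_i} [ b_i(x_i) + theta_ij(x_i, x_j) ]
    msg_to_2 = -0.5 * b2 + 0.5 * np.max(b1[:, None] + theta_12, axis=0)
    msg_to_1 = -0.5 * b1 + 0.5 * np.max(b2[None, :] + theta_12, axis=1)

    if np.allclose(msg_to_1, old_1) and np.allclose(msg_to_2, old_2):
        break                             # "some convergence" reached

# Decode the MAP assignment from the node beliefs.
x1 = int(np.argmax(theta_1 + msg_to_1))
x2 = int(np.argmax(theta_2 + msg_to_2))
print("MAP assignment:", x1, x2)          # (1, 0) for this toy model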

For the other part, where I wrote a sample MPLP code, check out the other PR: pull 420. I really wanted to get my mentor's comments, so I pushed this PR; finding out how deep in the water you are, sooner rather than later, is always helpful. I intend to grow this PR and keep implementing the remaining parts.

Community bonding experience.

Hi all!

The community bonding period is over. I started by studying source code that already implements the same class of algorithms that I am hoping to implement over the GSoC timeline.

For me, that class is approximate inference in Probabilistic Graphical Models. OpenGM is one of the best C++ template libraries for discrete factor graph models and distributive operations on them. It includes state-of-the-art optimization and inference algorithms beyond message passing.

The method I intend to implement is basically a message-passing algorithm called Max-Product Linear Programming (MPLP) approximate inference. It was first proposed by Amir Globerson and Tommi Jaakkola in the 2007 paper "Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations".

I figured out that this is going to be the 1st phase of my implementation.

The rest of my timeline consists of implementing 2 more papers that are written as a hierarchy on top of this one. So I have been thinking that if this one is implemented correctly and efficiently, extending the code to the other 2 papers will be easy.

Both of these subsequent papers are by Sontag and build directly on top of the first one.

As far as the community bonding period is concerned, my learning curve went straight up. I read all 3 papers and tried to grasp the concepts of PGMs. It turned out not to be so easy, and I had to fall back on the basics.

I started learning PGMs from the basics by going through the introductory PGM videos from Daphne Koller's Coursera course. After going over them, I gave the papers another read. I can now figure out the objective much better than last time.

From my last GSoC experience, I am quite sure that understanding comes before coding. I just hope I understand the gist of the papers and implement them well.

Integer Linear Programming in OpenGM

Today I will be discussing some of the ways in which inference algorithms, especially the linear programming ones, work in OpenGM.
So there are many ways to solve an energy minimization problem.
From the table below we can get an idea about the different accumulative operations used to do inference:

[Table: combinations of operators and accumulators used for inference in OpenGM]

That is, if we need to solve an energy minimization problem, we choose the combination of opengm::Adder and opengm::Minimizer in OpenGM to get that effect.
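As a plain-Python analogue of that combination (this is not OpenGM code, just the idea spelled out): the operator adds the factor values into a single objective, and the accumulator takes the minimum of that objective over all assignments.

import itertools

unary = [0.0, 1.5]           # theta_1(x1)
pairwise = [[0.0, 2.0],      # theta_12(x1, x2)
            [2.0, 0.0]]

def objective(x1, x2):
    return unary[x1] + pairwise[x1][x2]   # the "Adder" part

# The "Minimizer" part: accumulate by taking the minimum over all assignments.
best = min(itertools.product(range(2), range(2)), key=lambda a: objective(*a))
print("best assignment:", best, "energy:", objective(*best))

Swapping the minimum for a maximum, or the addition for a multiplication, gives the other combinations from the table.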

Along with this, we have to choose one inference algorithm. OpenGM gives us the following list of available inference algorithms:

  1. Message Passing
    1. Belief Propagation
    2. TRBP
  2. TRW-S
  3. Dynamic Programming
  4. Dual Decomposition
    1. Using Sub-gradient Methods
    2. Using Bundle Methods
  5. A* Search
  6. (Integer) Linear Programming
  7. Graph Cuts
  8. QPBO
  9. Move Making
    1. Expansion
    2. Swap
    3. Iterative Conditional Modes(ICM)
    4. Lazy Flipper
    5. LOC
  10. Sampling
    1. Gibbs
    2. Swendsen-Wang
  11. Wrappers around other libraries
    1. libDAI
    2. MRF-LIB
  12. BruteForce Search

We will concentrate here on the (I)LP relaxation problem. Let's see how it goes!

Any energy minimization problem can be formulated as an (integer) linear program. We can also often solve the problem well enough by considering a relaxation of the original problem, obtained by relaxing its system of inequalities.

The corresponding parameter class can take many optional parameters that can be used to tune the solver. The most important are these flags:

  • integerConstraint_, which switches to integer mode
  • numberOfThreads_, the number of threads that should be used
  • verbose_, which switches the CPLEX output on
  • timeLimit_, which stops the method after N seconds of real time

 

A sample is shown below:

[Code sample: setting up the CPLEX-based (I)LP solver parameters in OpenGM]

Getting OpenGM to work …

I have been trying hard to get accustomed to the OpenGM library so that I understand what I will need for my implementation of approximate algorithms.

I will jot down my findings here:

Below is the most important image; it will help me lay things out quite cleanly.

[Figure: a factor graph model as represented in OpenGM]

 

The things that OpenGM needs in order to completely specify the above factor model are listed below (a toy Python sketch of the same four ingredients follows the list):

  1. The number of variables, along with the cardinality (the number of labels) of each of them. This is done by constructing an object called a label space.
  2. Functions describe the different small \varphi_i into which the bigger \varphi decomposes.
  3. Operations can be addition, multiplication, max, and min. These are the operations through which \varphi decomposes into the \varphi_i.
  4. Factors are the remaining link: each factor connects one of the functions \varphi_i with the respective variables x_v intended as its inputs.
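Here is the toy sketch promised above. It is emphatically not OpenGM's API, just the same four ingredients written out as plain Python data structures so that I remember what each one is for.

import itertools

# 1. Label space: the cardinality (number of labels) of every variable.
cardinalities = {'x0': 2, 'x1': 2, 'x2': 3}

# 2. Functions: the small varphi_i, written out as explicit value tables.
functions = {
    'f_unary':    {(0,): 0.1, (1,): 0.9},
    'f_pairwise': {(a, b): float(abs(a - b)) for a in range(2) for b in range(3)},
}

# 3. Operation: how the small functions combine into the big varphi
#    (addition here, i.e. an energy that we then minimize).
combine = sum

# 4. Factors: which function is attached to which variables.
factors = [('f_unary', ('x0',)), ('f_pairwise', ('x1', 'x2'))]

def energy(assignment):
    return combine(functions[name][tuple(assignment[v] for v in scope)]
                   for name, scope in factors)

# Brute-force minimization over all joint assignments, just to close the loop.
names = list(cardinalities)
best = min((dict(zip(names, labels))
            for labels in itertools.product(*(range(cardinalities[n]) for n in names))),
           key=energy)
print(best, energy(best))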

Writing a complete model in OpenGM using ExplicitFunctions.

[Code sample: specifying the above model in OpenGM with ExplicitFunction objects]

 

 

The one above was a very primitive model. Let us now build a more complete one, like the one shown below:

[Figure: a more complete factor graph model]

 

 


GSoC 2015

It has been quite awesome to be selected for Google Summer of Code for the second time. Along with it, I also got an offer to pursue a master's in Computer Science at Johns Hopkins University in the upcoming fall. God has been great to me! I thank the Almighty for all of this.

Okay, back to work.

So my project this year deals with implementing "Approximate algorithms for inference in graphical models" in the awesome organization pgmpy. I will try to summarize everything I have understood so far from reading various research papers and from the mentors' patient answers to my naive questions. (I may need to keep updating this post. Below is what I have understood so far.)

 

What are graphical models? 

A graphical model is a probabilistic model where the conditional dependencies between the random variables are specified via a graph. Graphical models provide a flexible framework for modelling large collections of variables with complex interactions.

In this graphical representation, the nodes correspond to the variables in our domain, and the edges correspond to direct probabilistic interactions between them.

Graphical models are good in doing the following things:

  1. Model Representation
    • It allows tractable representation of the joint distribution.
    • Very transparent
  2. Inference
    • It facilitates answering queries using our model of the world.
    • For example: algorithms for computing the posterior probability of some variables given the evidence on others.
    • Say we observe that it is night and the owl is howling. We wish to know how likely a theft is, a query that can formally be written as P(Theft=true | Time=Night, Conditions=Owl is howling).
  3. Learning
    • It facilitates the effective construction of these models by learning, from data, a model that provides a good approximation to our past experience.
    • They sometimes reveal surprising connections between variables and provide novel insights about a domain.

 

Why do we want to do inference? Ahem!! What is this "inference", actually?

The fundamental inference problem underlying many applications of graphical models is to find the most likely configuration of the probability distribution, also called the MAP (maximum a posteriori) assignment. This is the underlying problem in, for example, stereo vision and protein design.

The graphical models we consider involve discrete random variables, and it is known that MAP inference can be formulated as an integer linear program.

There are many ways to solve the MAP inference problem namely:

  • Message Passing Algorithms: These solve the inference problem by passing messages along the edges of the graph that summarize each variable's beliefs. See the Yedidia et al. (2005) paper for a gentle introduction to message propagation. However, these algorithms can have trouble converging, and on difficult inference problems they tend not to give good results.
  • LP Relaxation: The LP relaxation is obtained by formulating inference as an integer linear program and then relaxing the integrality constraints on the variables. (More on this later!)

Correctly finding the MAP assignment is equivalent to finding the assignment x_m that maximizes

\theta(x) = \Sigma_{ij \in E}\theta_{ij}(x_i, x_j)+\Sigma_{i \in V} \theta_i(x_i)

To understand what the above expression is, we need to delve into the theory of pairwise Markov random fields. For the time being, think of \theta_{ij} as the edge potential and \theta_i as the vertex potential.

To turn the above into an integer linear program (ILP), we introduce the variables

  1. \mu_i(x_i), one for each i \in V and state x_i
  2. \mu_{ij}(x_i, x_j), one for each edge ij \in E and pair of states x_i, x_j

The objective function is then:

\max_{\mu} \Sigma_{i \in V}\Sigma_{x_i}\theta_{i}(x_i) \mu_i(x_i)+\Sigma_{ij \in E}\Sigma_{x_i, x_j}\theta_{ij}(x_i, x_j) \mu_{ij}(x_i, x_j)
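This objective is maximized subject to constraints that I have not written out above. As I currently understand them, in the same notation they are:

\Sigma_{x_i} \mu_i(x_i) = 1 for every i \in V
\Sigma_{x_i} \mu_{ij}(x_i, x_j) = \mu_j(x_j) for every ij \in E and every x_j
\mu_i(x_i), \mu_{ij}(x_i, x_j) \in \{0, 1\}

Relaxing the last (integrality) line to \mu \geq 0 is what gives the LP relaxation discussed below.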

 

The set of \mu that arise from such joint distributions is known as the marginal polytope. There always exists a maximizing \mu that is integral (a vertex of the marginal polytope) and that corresponds to x_m.

Although the number of variables in this LP is modest, the difficulty comes from the exponential number of linear inequalities typically required to describe the marginal polytope.

The idea in LP relaxations is to relax the difficult global constraint that the marginals in \mu arise from some common joint distribution. Instead we enforce this only over some subsets of variables that we refer to as clusters.

But what are these constraints? And how come constraints and clusters can be used interchangeably?

What constraints are to the marginal polytope, clusters are to the primal LP problem.

So essentially, we force every "cluster" of variables to choose a local assignment instead of enforcing that these local assignments are globally consistent. Had the local assignments been globally consistent, we would already have the complete marginal polytope.

We add these clusters slowly and steadily so that we tend towards the marginal polytope.
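Concretely (this is my reading of Sontag's construction, so take the exact form with a pinch of salt), adding a cluster c, say a triplet of variables, means introducing a new variable \mu_c(x_c) together with consistency constraints of the form

\Sigma_{x_c \setminus \{x_i, x_j\}} \mu_c(x_c) = \mu_{ij}(x_i, x_j) for every edge ij contained in c and every x_i, x_j

Each such constraint cuts away a part of the relaxed polytope without ever cutting into the marginal polytope itself, which is why adding clusters tends the relaxation towards it.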

I am attaching a few slides that I have found really useful here:

[Slides: Sontag's lecture slides on the marginal polytope and LP relaxations]

 

Through the LP relaxation we solve the problem over the local consistency polytope, i.e. the case where only a few small clusters are required to agree on local assignments. Solving this gives us the relaxed version of the objective mentioned above.

Why do we need approximate algorithms? Is there any other way to do it, like exact methods? And if exact methods exist, why not use them?!

Optimizing over the exact marginal polytope gets too heavy. I will showcase why with some more slides that I found really useful.

[Slides: why optimizing exactly over the marginal polytope is intractable]

What are the types of approximate algorithms that you are going to implement? Are there many types? 

LP relaxation is the obvious one and is my target here. The approach above, where constraints (clusters) are added to the relaxation step by step, is called the cutting-plane method (it happens to be a specific algorithm used in LP relaxations to tighten the relaxation towards the marginal polytope).

Please describe the step-by-step process of these algorithms. How does approximation affect accuracy? Tell us about the trade-offs between accuracy and speed!

*I will get onto it soon*

Do previous implementations of what you are going to do already exist? If they do, how do you plan to make your implementation different from them? Is the language different?

I think one does exist in the OpenGM library. I will surely take inspiration from it. Yup, it is in C++, and we are to make our implementation in Python.

 

 

YOU MUST CHECK OUT THE AWESOME LIBRARY I AM WORKING WITH:  Pgmpy

Benchmarking OpenCV’s integration with Shogun

I have written some code for converting Mat objects from the OpenCV library into CDenseFeatures objects in the Shogun library. There are basically three methods of conversion:

  • Using for loops (i.e. the manual way).
  • Using memcpy.
  • Using constructors.

I have listed here the benchmarking output for converting random matrices from Mat to CDenseFeatures. At the end we will try to draw the major conclusions, for example choosing the fastest method as the default. All tests were done on my local machine, which runs Ubuntu 12.04 Precise.

Let’s start with the conversion of Mat to CDenseFeatures.

For instance, a cv::Mat object is initialized randomly with the CV_8U type and then converted to each of the following types:

  1. To uint8_t
  2. To int8_t
  3. To uint16_t
  4. To int16_t
  5. To int32_t
  6. To float32_t
  7. To float64_t

These 7 conversions are shown as sky-blue bars (the different Mat types are marked on the horizontal axis at the bottom of each chart).

Similarly, all the orange bars show the conversions from CV_8S to their respective output types.

The green bars show the conversions from CV_32F to their respective output types, and so on.

[Charts: benchmark timings for the Mat to CDenseFeatures conversions]

A few inferences that we can draw directly:

1. The manual method is consistent: it doesn't matter which type we convert from or to, it gives approximately equal timings in all cases.

2. The memcpy method is very good with respect to the time taken. It is roughly 2 times faster than the manual method on smaller matrices, especially for the uint8_t and int8_t types.

For the backward conversion, we start with a float CDenseFeatures object and convert it into the different types of Mat.


[Charts: benchmark timings for the CDenseFeatures to Mat conversions]

A few inferences that we can draw from these 3 figures:

1. With the constructor method, the time for converting float64_t CDenseFeatures objects into CV_64F is almost negligible.

2. In terms of speed, the methods can be ordered roughly as Constructor, then Memcpy, then Manual.

2nd week report.

My work is primarily to develop a clean way to convert between OpenCV's Mat object and Shogun's SGMatrix object. This will be really cool once we have it, as there will be seamless interaction between the two libraries: whenever we need machine learning capabilities we can switch to Shogun, and whenever we need image processing we can work with OpenCV.

Point:

Develop a factory such that, given a cv::Mat object as an argument, it returns a Shogun DenseFeatures object containing that matrix. The flexibility should be such that whatever the type of the input Mat (i.e. uint8, uint16, float32, etc.), the user can choose what kind of DenseFeatures object they want to work with. And yes, going the other way around will also be needed.

Things that need to be done:

We have 3 methods of converting the data, as suggested by Kevin in his gist here. Those are:

1. Using Constructors.

2. Using for loops (i.e doing it all manually).

3. Using Memcpy.

So, using this, I got a good idea of what the main conversion would look like. Now comes the design part.

That means the function declaration we want would look something like this:

CDenseFeatures<T>* CV2FeaturesFactory::getDenseFeatures(cv::Mat, CV2SGOptions).

The 1st argument is to hold the name of the cv::Mat object that we want to transform.

The 2nd argument is to hold the name of the different options.

The template argument T would specify the type of the output CDenseFeatures that we want.

Initial Steps that came to my mind:

Now, doing such a thing requires defining a template member function inside a non-templated class.

The non-templated class would look like :

class CV2FeaturesFactory
{
public:
    // A template member function inside a non-templated class; T is the
    // element type of the returned DenseFeatures object.
    template<class T>
    static CDenseFeatures<T>* getDenseFeatures(cv::Mat, CV2SGOptions);
};

We need to extract a few things from the cv::Mat before we start with Kevin's methods. Those are:

1. Number of Rows.

2. Number of Columns.

3. What type the cv::Mat object is to start with.

4. I think we may even need to convert it into the required type before we feed it into the 3 methods mentioned at the start.

Setting It all up:

Most things are quite straightforward.

  • To calculate the number of rows/cols of an object named cvMat, we have:
int r = cvMat.rows;
int c = cvMat.cols;
  • There is no need to know the original type of cvMat; we just convert it into the required type using
cvMat.convertTo(cvMat, outputType);   // outputType stands for the target OpenCV type enum, e.g. CV_32F
  • The only thing that needs to be done explicitly here is to map the typename from the template argument (i.e. <class T> in the function definition) to something OpenCV understands as the second argument of its convertTo() function.
  • I checked out the way OpenCV handles its types and found these enums:
8-bit unsigned integer (uchar)
8-bit signed integer (schar)
16-bit unsigned integer (ushort)
16-bit signed integer (short)
32-bit signed integer (int)
32-bit floating-point number (float)
64-bit floating-point number (double)
enum { CV_8U=0, CV_8S=1, CV_16U=2, CV_16S=3, CV_32S=4, CV_32F=5, CV_64F=6 };

This was a great finding. It means we only need to map the given T to the required integer. Say the user has specified 'uint16_t' as T; then we need to pass the integer value 2 into the second argument of the convertTo() function.

Now, the thing is that I cannot simply add my own enum values for uint8_t, uint16_t, float32_t, and so on; that is not allowed, and I kept getting 'redeclared' errors.

  • The only thing I could come up with was a templated function that returns the required integer depending on the template argument T. For example:
int ans = get<uint8_t>();    // would set ans to 0
int ans = get<float64_t>();  // would set ans to 6

I searched and found USER SPECIALIZATION (explicit template specialization). It does exactly this.

By default, a template gives a single definition to be used for every template argument. This doesn't always make sense: if I want my implementation to be different for different types, say for each of the 3 following types,

uint8, int16, float32, then I want to write three different implementations of the same templated function. This can be achieved using specialization.

So, I came up with this :

The general template must be declared before any specialization.

“The critical information supplied by the general template is the set of template parameters that the user must supply to use it or any of its specializations.”

// The general template is declared but never defined, so unsupported types fail to compile.
template<typename T> struct CV2SGTypeName;

// One explicit specialization per supported type, each returning the matching OpenCV enum value.
template<> struct CV2SGTypeName<uint8_t>  { static int get() { return 0; } };  // CV_8U
template<> struct CV2SGTypeName<int8_t>   { static int get() { return 1; } };  // CV_8S
template<> struct CV2SGTypeName<uint16_t> { static int get() { return 2; } };  // CV_16U
// ... and so on for the remaining types (CV_16S, CV_32S, CV_32F, CV_64F).

Here I have created a few specializations that serve as the implementation for each supported type.

So now we have a function CV2SGTypeName<T>::get() which can be used very flexibly to return integers based on the template argument.

i.e

const int myType = CV2SGTypeName<T>::get();

I will add the source code once it is ready to be merged.