Last Progress Report

I have successfully implemented the triplet clustering method from the 2nd paper. I was able to follow Abinash’s comments this time, and he also appears happy with my coding so far. The results are all cool 🙂

They are 99% accurate when the convergence is fast. I am checking accuracy by comparing my results against the MPLP code by Sontag. The results deviate as the number of iterations increases, and this is the issue I have spent most of my time on in the last few days. The conclusion is that … the deviation is not the result of an algorithmic error but of a loss of precision in rounding off fractions. I think we can bear such inaccuracies. Very minute indeed.
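
For the record, this is roughly how I compare the two runs (the file names and shapes here are made up for illustration; the point is only that I check agreement up to a tolerance rather than bit-for-bit):

    import numpy as np

    my_duals = np.loadtxt('my_dual_objectives.txt')      # dual objective per iteration, my code (hypothetical file)
    ref_duals = np.loadtxt('mplp_dual_objectives.txt')   # same quantity from the reference C++ code (hypothetical file)

    # Exact equality is hopeless after hundreds of floating-point iterations,
    # so I only require agreement up to a small relative/absolute tolerance.
    assert np.allclose(my_duals, ref_duals, rtol=1e-4, atol=1e-6)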

My next plan is to write a fantastic IPython notebook as part of the documentation, showcasing the algorithm step by step so that a normal user doesn’t have to get into the cumbersome details. Another goal is of course to get the current PR (pull 420) merged. I know that I need to implement the 3rd paper as well, but I feel that writing proper documentation in the form of an IPython notebook and then extending the work is the better option.

Okay then. I must retire now. Bye bye

Progress report after mid-term

Woah! I am enjoying every bit of coding now. I have figured out the whole algorithm from the basics! I had been a fool all this time 🙂 . Though I still think that the current understanding wouldn’t have been possible if I hadn’t made all those mistakes! So thumbs up!

Spending time with the exact inference code already implemented in the pgmpy library, along with a meticulous reading of those algorithms from Koller’s PGM book, helped me enormously. The Variable Elimination and Belief Propagation methods are really good starting points for beginners who want to jump into approximate inference!

The messages are nothing but the Factor data structures of the pgmpy library. Gosh! I never knew that. I am happy to understand how to fit the current MPLP implementation into the library. My status is that I have completely reworked the 1st paper into Pythonic, class-oriented code. It works great with some of the example UAI files that are present here: PASCAL examples.
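
Just to illustrate what clicked for me, a minimal sketch (I am using today’s DiscreteFactor name, which was plain Factor in the pgmpy of that time; the variable names and numbers are arbitrary):

    import numpy as np
    from pgmpy.factors.discrete import DiscreteFactor

    # A pairwise log-potential theta_ij(x1, x2) over two binary variables.
    theta_12 = DiscreteFactor(['x1', 'x2'], cardinality=[2, 2],
                              values=np.log([[1.0, 2.0], [3.0, 0.5]]))

    # An MPLP-style message to x1 is itself just a factor over x1:
    # maxing out x2 is a single call on the factor.
    message_to_x1 = theta_12.maximize(['x2'], inplace=False)
    print(message_to_x1.variables, message_to_x1.values)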

I am biting my fingers to finish off the coding, as it has become so interesting to code now that I know what the algorithm means!

Check out my new PR: pull 449

I hope you notice improvements there.

As far as the road-blocks are concerned… they seem to never leave me at all. After implementing the 1st paper two times (I hope you remember the old crude implementation), I got to know that Sontag later modified it a bit in his PhD thesis from the original one in the Fixing Max-Product 2007 paper that I had implemented. The new one is as follows:
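
I won’t try to reproduce the thesis equation verbatim here, but the generalized, cluster-based form it builds on looks roughly like this (my own paraphrase, so the exact bookkeeping may differ slightly):

\lambda_{c \to i}(x_i) = -\lambda_i^{-c}(x_i) + \frac{1}{|c|} \max_{x_{c \setminus i}} \Big[ \theta_c(x_c) + \sum_{j \in c} \lambda_j^{-c}(x_j) \Big]

where \lambda_i^{-c}(x_i) = \theta_i(x_i) + \sum_{c' \ni i,\ c' \neq c} \lambda_{c' \to i}(x_i). Taking the cluster c to be a single edge ij recovers the 2007 edge update.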


Of the set of 3 papers, the later 2 are written by Sontag along with the C++ code, so I intended to change my code to match. Though this part was trivial, it still counts as a road-block!

The next thing to expect from my side is to add the Triplet clustering code. Till then, bye bye.

Mid-Term Summary

The mid-term is over. I am through the first half of Google Summer of Code! As far as the accomplishments are concerned, I have almost implemented the culmination of the 3 papers in Python and have kept updating my pull request here: pull 420.

Most of my effort went into getting the algorithm to work, so I was not able to work on the finer points. Abinash is not happy with the current code. There are many loopholes, and my coding style isn’t good. It looks like the way I spent my time, trying to understand the algorithm from the existing implementation, wasn’t right. I needed to get a grip on the algorithm itself.

So I have spent a lot of time again on the theory. This time, I made meticulous notes. For notes check this out: http://kislaynotes.droppages.com/

My expectation for the next period is to completely rework the current implementation. The current code can be thought of more as a sample for understanding what lies ahead.

One thing has become clear: I need to get more object-oriented. My approach so far was more a port of the C++ code to Python, disregarding pgmpy’s inherent classes. I now see how sheepishly I read the exact inference algorithms already implemented in the library. I need to get more Pythonic; my current code looks like C++ code, which is very bad.
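
Roughly, the shape I have in mind for the rewrite is something like the skeleton below (all names here are my own working ones, not pgmpy’s; whether the class should inherit from pgmpy’s inference base class is something I still have to discuss with Abinash):

    class MPLP:
        """MAP inference via the max-product linear programming relaxation."""

        def __init__(self, markov_model):
            # Keep the node and edge potentials separately, the way the paper
            # writes theta_i and theta_ij, instead of mirroring the C++ structs.
            self.model = markov_model
            self.node_potentials = {}
            self.edge_potentials = {}
            self.messages = {}          # one message per (edge, endpoint) pair

        def _update_messages(self):
            """One pass of the MPLP edge updates over all edges."""
            raise NotImplementedError

        def map_query(self, max_iterations=1000, tolerance=2e-4):
            """Run message updates until the dual objective stops improving."""
            for _ in range(max_iterations):
                self._update_messages()
                # break out once the drop in the dual objective falls below tolerance
            return self._decode_assignment()

        def _decode_assignment(self):
            """Pick, for every node, the state that maximizes its local belief."""
            raise NotImplementedError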

Anyways, my focus now is to spend time with the already implemented library, see how the 1st paper fits in, and discuss a lot with my mentor. I have lost some precious time, but now that I know some of the things I was doing wrong … I hope to proceed swiftly.

 

Bye! I hope to get some cool python coding done in the next few days!

Progress Report

Continuing from my previous post on being able to understand the gist of the 3 papers, I think I have some ideas now. Still not 100%, though. Having been reading the papers and so much of the mathematics related to them, I thought that writing code side by side would be good.

This is because too much theory makes me feel handicapped. I mean, writing code alongside helps me grasp the mathematical formulations nicely. This doesn’t always go well, though, as there are many cases where I have no idea how to implement a certain section of the paper; I move on by working out the implementation from the author’s code in the hope that I will get it once that section is completed. And it works, I really do understand that part. The only thing is that it is not the right way! It is cumbersome.

Moreover, the code written by Sontag et al. lacks proper documentation, adding to my worries. Anyways, I am treading forward, slowly but steadily. I have started writing the 1st paper in Python, following the C++ code and my understanding of the paper.

The part that I have to deal with now is the message-updating method. This forms the crux of the paper.
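
To make the idea concrete for myself, here is a rough numpy sketch of the edge update as I currently read it from the 2007 paper (the function and variable names are mine, so treat this as a sketch rather than the final implementation):

    import numpy as np

    def mplp_edge_update(theta_ij, delta_i_no_j, delta_j_no_i):
        """One MPLP update for edge (i, j), as I read the 2007 paper.

        theta_ij     : 2-D array, theta_ij[x_i, x_j] is the edge potential
        delta_i_no_j : 1-D array, theta_i(x_i) plus all messages into i except this edge's
        delta_j_no_i : 1-D array, theta_j(x_j) plus all messages into j except this edge's
        Returns the new messages sent by the edge to i and to j.
        """
        msg_to_i = -0.5 * delta_i_no_j + 0.5 * np.max(theta_ij + delta_j_no_i[None, :], axis=1)
        msg_to_j = -0.5 * delta_j_no_i + 0.5 * np.max(theta_ij + delta_i_no_j[:, None], axis=0)
        return msg_to_i, msg_to_j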

 

This goes in the following manner. I have to read the Markov model from the UAI file and take it in for the MAP query, then run the message update method until we reach convergence. For reading the model from the UAI file, I am thankful to Ankur, who gladly helped me in writing a simple UAI reader so that I could focus more on the algorithm than on the other nuances (check it out here: pull 411 ).
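
A rough sketch of the driver I have in mind (the UAI file name and the helper functions are placeholders of my own; only the UAIReader import refers to the reader from pull 411, and even that import path is an assumption):

    from pgmpy.readwrite import UAIReader

    reader = UAIReader('grid4x4.uai')      # hypothetical example file
    model = reader.get_model()             # a Markov model built from the factors in the file

    def run_mplp(model, max_iterations=1000, tolerance=2e-4):
        # Sweep the edge update (sketched above) over every edge until the
        # decrease in the dual objective per sweep becomes negligible.
        messages = initialise_messages(model)               # hypothetical helper
        for _ in range(max_iterations):
            for edge in model.edges():
                update_edge_messages(edge, messages)        # wraps mplp_edge_update above
            if dual_objective_drop(messages) < tolerance:   # hypothetical helper
                break
        return decode_local_maximisers(messages)            # hypothetical helper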

For the other part, where I wrote a sample MPLP code, check out the other PR: pull 420. I really wanted to get my mentor’s comments, so I pushed this PR. Getting to know how deep in the water you are, sooner rather than later, is always helpful. I intend to grow this PR and keep implementing the remaining parts.

Community bonding experience.

Hi all!

The community bonding period is over. I tried to start by understanding the source code of projects that have implemented the same class of algorithms that I am hoping to implement over the GSoC timeline.

For me, that is approximate inference in Probabilistic Graphical Models. OpenGM is one of the best C++ template libraries for discrete factor graph models and distributive operations on these models. It includes state-of-the-art optimization and inference algorithms beyond message passing.

The method that I intend to implement is basically a message passing algorithm. It is called Max-product Linear Programming (MPLP) approximate inference. It was first proposed by Amir Globerson and Tommi Jaakkola in the paper “Fixing max-product: Convergent message passing algorithms for MAP LP-Relaxation” in 2007.

I figured out that this is going to be the 1st phase of my implementation.

The whole of my timeline consists of implementing 2 more subsequent papers, which build directly on this one. So I have been thinking that if this one is implemented correctly and efficiently, then extending the code to the other 2 papers will be easy.

The other 2 subsequent papers are:

So as far as the community bonding period is concerned, my learning curve was steep. I read all 3 papers and tried to grasp the concepts of PGMs. It turned out to be not so easy, and I had to fall back.

I started understanding PGMs from the basics by skimming through the introductory PGM videos on Coursera by Daphne Koller. After going over them, I tried another read of the papers. I am now able to figure out the objective quite a bit better than last time.

From my last GSoC experience, I am very sure that understanding comes before coding. I just hope I get to understand the gist of the papers and implement them well.

Integer Linear Programming in OpenGM

Today I will be discussing some of the ways in which inference algorithms, especially the linear programming ones, work in OpenGM.
So there are many ways to solve an energy minimization problem.
From the table below we can get an idea about the different accumulative operations used to do inference:

[table image: the operations and accumulators available for inference in OpenGM]

That is, if we need to solve an energy minimization problem, we have to go for the combination of opengm::Adder and opengm::Minimizer in OpenGM to get that effect.

Along with this, we have to choose one Inference algorithm. From OpenGM we get the list of available inference algorithms as below:

  1. Message Passing
    1. Belief Propagation
    2. TRBP
  2. TRW-S
  3. Dynamic Programming
  4. Dual Decomposition
    1. Using Sub-gradient Methods
    2. Using Bundle Methods
  5. A* Search
  6. (Integer) Linear Programming
  7. Graph Cuts
  8. QPBO
  9. Move Making
    1. Expansion
    2. Swap
    3. Iterative Conditional Modes(ICM)
    4. Lazy Flipper
    5. LOC
  10. Sampling
    1. Gibbs
    2. Swendsen-Wang
  11. Wrappers around other libraries
    1. libDAI
    2. MRF-LIB
  12. BruteForce Search

We will be concentrating here on the ILP relaxation problem. Let’s see how it goes!

Any energy minimization problem can be formulated as an (integer) linear program. We can also often solve the problem well enough by considering a relaxation of the original problem, obtained by relaxing the system of inequalities.

The corresponding parameter class can take many optional parameters which can be used to tune the solver. The most important are the flags

  • integerConstraint_: switches to integer mode
  • numberOfThreads_: the number of threads that should be used
  • verbose_: switches the CPLEX output on
  • timeLimit_: stops the method after N seconds of real time

 

A sample is shown below:

[code image: a sample showing how these CPLEX parameters are set]

Getting OpenGM to work …

I have been trying hard to get accustomed to the OpenGM library so that I understand what things I will need for my implementation of approximate algorithms.

I will jot down my findings here:

Below is the most important image; it will help me write things out quite cleanly.

[image: a factor graph as represented in OpenGM]

 

The things needed by OpenGM to completely specify the above factor model are:

  1. The number of variables, along with the cardinality (the number of labels) of each of them. This is done by constructing an object called a Label Space.
  2. Functions describe the different small \varphi_i that are used to decompose the bigger \varphi.
  3. Operations can be addition, multiplication, max and min. These are the operations through which \varphi decomposes into the \varphi_i.
  4. Factors are the missing link, connecting each \varphi_i with the respective x_v's intended as its inputs.

Writing a complete model in OpenGM using ExplicitFunctions.

[code screenshot: building the model with ExplicitFunctions]
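
Since the screenshot has not survived here, the sketch below is a rough equivalent using OpenGM’s Python bindings instead of the C++ shown in the original screenshot (I am assuming the usual graphicalModel/addFunction/addFactor interface; the numbers are arbitrary):

    import numpy as np
    import opengm

    # Three binary variables; operator='adder' corresponds to opengm::Adder.
    gm = opengm.graphicalModel([2, 2, 2], operator='adder')

    # Explicit value tables: a unary term on variable 0 and a pairwise term on (0, 1).
    unary = np.array([0.0, 1.5])
    pairwise = np.array([[0.0, 1.0],
                         [1.0, 0.0]])
    gm.addFactor(gm.addFunction(unary), [0])
    gm.addFactor(gm.addFunction(pairwise), [0, 1])

    # Energy minimisation = Adder + Minimizer: run min-sum belief propagation.
    inference = opengm.inference.BeliefPropagation(gm, accumulator='minimizer')
    inference.infer()
    print(inference.arg())   # one label per variable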

 

 

The above was a very primitive model. Let us now build a complete one like the one shown below:

[image: the complete factor graph model]

 

 


GSoC 2015

It has been quite an awesome thing to be selected for Google Summer of Code for the second time. Along with it, I also got an offer to pursue a master’s in Computer Science at Johns Hopkins University in the upcoming Fall. God has been great to me! I thank the Almighty for all this.

Okay, back to work.

So my project this year deals with implementing “Approximate algorithms for inference in graphical models” in the awesome org pgmpy. I will try to encompass everything that I have understood till now from reading various research papers and from the help I received from the mentors on my naive questions. (I think I may need to keep updating this post. Below is what I have understood till now.)

 

What are graphical models? 

A graphical model is a probabilistic model where the conditional dependencies between the random variables are specified via a graph. Graphical models provide a flexible framework for modelling large collections of variables with complex interactions.

In this graphical representation, the nodes correspond to the variables in our domain, and the edges correspond to direct probabilistic interactions between them.

Graphical models are good at the following things:

  1. Model Representation
    • It allows tractable representation of the joint distribution.
    • Very transparent
  2. Inference
    • It facilitates answering queries using our model of the world.
    • For example: algorithms for computing the posterior probability of some variables given the evidence on others.
    • Say we observe that it is night and the owl is howling. We wish to know how likely a theft is, a query that can formally be written as P(Theft=true | Time=Night, Conditions=Owl is howling). A small sketch of this query in pgmpy follows right after this list.
  3. Learning
    • It facilitates the effective construction of these models by learning from data a model that provides a good approximation to our past experience
    • They sometimes reveal surprising connections between variables and provide novel insights about a domain.
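
To make the owl example concrete, here is a minimal sketch in pgmpy (the network structure and all the numbers are invented purely for illustration; the class is called BayesianModel in older pgmpy versions):

    from pgmpy.models import BayesianNetwork
    from pgmpy.factors.discrete import TabularCPD
    from pgmpy.inference import VariableElimination

    # Toy network: Time -> Theft <- Owl.
    model = BayesianNetwork([('Time', 'Theft'), ('Owl', 'Theft')])
    cpd_time = TabularCPD('Time', 2, [[0.5], [0.5]])              # 0 = day, 1 = night
    cpd_owl = TabularCPD('Owl', 2, [[0.8], [0.2]])                # 0 = quiet, 1 = howling
    cpd_theft = TabularCPD('Theft', 2,
                           [[0.99, 0.90, 0.95, 0.70],             # P(Theft = false | Time, Owl)
                            [0.01, 0.10, 0.05, 0.30]],            # P(Theft = true  | Time, Owl)
                           evidence=['Time', 'Owl'], evidence_card=[2, 2])
    model.add_cpds(cpd_time, cpd_owl, cpd_theft)
    model.check_model()

    # P(Theft | Time = night, Owl = howling)
    posterior = VariableElimination(model).query(['Theft'], evidence={'Time': 1, 'Owl': 1})
    print(posterior)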

 

Why do we want to do inference? Ahem!! What is this “inference” actually?

The fundamental inference problem underlying many applications of graphical models is to find the most likely configuration of the probability distribution, also called the MAP (Maximum a Posteriori) assignment. This is the underlying problem when solving stereo vision and protein design problems.

The graphical models that we consider involve discrete random variables, and it is known that MAP inference can be formulated as an integer linear program.

There are many ways to solve the MAP inference problem namely:

  • Message Passing Algorithms: They solve the inference problem by passing messages along the edges of the graph that summarize each variable’s beliefs. Check the Yedidia et al. (2005) paper for a gentle introduction to message passing. However, these algorithms can have trouble converging and, on difficult inference problems, tend not to give good results.
  • LP Relaxation: The LP relaxation is obtained by formulating inference as an integer linear program and then relaxing the integrality constraints on the variables. (More on this later!)

Correctly finding the MAP assignment is equivalent to finding the assignment x_m that maximizes

\theta(x) = \sum_{ij \in E}\theta_{ij}(x_i, x_j) + \sum_{i \in V} \theta_i(x_i)

To understand what the above term is, we need to delve into the theory of pairwise Markov Random Fields. For the time being, think of \theta_{ij} as the edge potential and \theta_{i} as the vertex potential.
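
As a sanity check on the notation, a tiny brute-force example (the potentials here are invented numbers on a 3-variable chain):

    import itertools
    import numpy as np

    # Hypothetical toy instance: three binary variables on a chain 0 - 1 - 2.
    theta_i = [np.array([0.0, 0.5]), np.array([0.2, 0.0]), np.array([0.0, 0.3])]
    theta_ij = {(0, 1): np.array([[0.0, 1.0], [1.0, 0.0]]),
                (1, 2): np.array([[0.5, 0.0], [0.0, 0.5]])}

    def theta(x):
        """The objective written above: sum of vertex and edge potentials."""
        value = sum(theta_i[i][x[i]] for i in range(len(theta_i)))
        value += sum(t[x[i], x[j]] for (i, j), t in theta_ij.items())
        return value

    # Brute force over all 2^3 assignments; only feasible for tiny models,
    # which is exactly why the LP relaxation below is interesting.
    x_map = max(itertools.product([0, 1], repeat=3), key=theta)
    print(x_map, theta(x_map))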

To turn the above into an integer linear program (ILP), we introduce variables

  1. \mu_i(x_i), one for each i \in V and state x_i
  2. \mu_{ij}(x_i, x_j), one for each edge ij \in E and pair of states x_i, x_j

The objective function is then:

\max_{\mu} \sum_{i \in V}\sum_{x_i}\theta_{i}(x_i)\, \mu_i(x_i) + \sum_{ij \in E}\sum_{x_i, x_j}\theta_{ij}(x_i, x_j)\, \mu_{ij}(x_i, x_j)

 

The set of \mu that arises from such joint distributions is known as the marginal polytope. There always exists a maximizing \mu that is integral (a vertex of the marginal polytope) and which corresponds to x_m.

Although the number of variables in this LP is small, the difficulty comes from the exponential number of linear inequalities typically required to describe the marginal polytope.

The idea in LP relaxations is to relax the difficult global constraint that the marginals in \mu arise from some common joint distribution. Instead we enforce this only over some subsets of variables that we refer to as clusters.
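
Concretely, for the basic pairwise relaxation the only constraints we keep are the local consistency ones (written out here for reference, in the same notation as above):

\sum_{x_i} \mu_i(x_i) = 1 \quad \forall i \in V

\sum_{x_j} \mu_{ij}(x_i, x_j) = \mu_i(x_i) \quad \forall ij \in E,\ \forall x_i

\mu_i(x_i) \ge 0, \qquad \mu_{ij}(x_i, x_j) \ge 0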

But what are constraints? How come you can use constraints and clusters interchangeably? 

Constraints are to the marginal polytope what clusters are to the primal LP problem: each cluster we add brings in a set of consistency constraints that tightens the relaxation.

So essentially, we here force every “cluster” of variables to choose a local assignment instead of enforcing that these local assignments are globally consistent. Had these local assignments been consistent globally, we would have the complete Marginal Polytope already.

We slowly and steadily add each of the clusters so that we tend towards the Marginal Polytope.

I am attaching a few slides that I have found really useful in this case:

[slides 1-4: David Sontag's slides on the marginal polytope and LP relaxations]

 

Through the LP relaxation we solve the case where only these smaller clusters are required to agree on local assignments (the local consistency polytope). The solution to this gives us the objective mentioned above.

Why do we need approximate algorithms? Is there any other way to do it, like exact ones? If exact ones exist, why not use them?!

Solving over the full marginal polytope gets too heavy. I will showcase the process in the form of some other slides that I found really useful.

[slides a-k: David Sontag's slides on why exact optimization over the marginal polytope is intractable]

What are the types of approximate algorithms that you are going to implement? Are there many types? 

LP relaxation is the obvious one and my target here. The other one above, where I kept adding constraints to tighten the relaxation towards the marginal polytope, is called the cutting-plane method (it happens to be a specific scheme used with LP relaxations).

Please describe the step-by-step process of these algorithms. How do approximations help where accuracy is concerned? Tell us about the trade-offs between accuracy and speed!

*I will get onto it soon*

Do previous implementations of what you are going to do already exist? If they do, how do you plan to make your implementation different from them? Is the language different?

I think they do exist in the OpenGM library. I will surely take inspiration from it. Yup, it is in C++, and we are to make our implementation in Python.

 

 

YOU MUST CHECK OUT THE AWESOME LIBRARY I AM WORKING WITH:  Pgmpy

Self disparaging thoughts that cleared…

I understand the murky title I have put here. Always delving deep into the consciences of the people around me, I stumbled across some questions. It’s been ten days since I started looking for an answer that validated them. I posed these questions in front of my friends, and the more often I did it, the greater the longing became to quench my thirst.

I am a guy known to be idiosyncratic for his profound knowledge of human aspirations. Such is the case that many people come over to me to discuss their important career decisions, and they follow my say! While I enjoy this position a lot, there comes a responsibility that I have to deal with. I myself am not a very successful guy, nor the most intelligent one, and the reason they ask me for these favors is unbeknownst to me. But one thing that I get out of most of the interactions I have had with my friends has put me into a dilemma recently. That hopefully explains the title, as I am surging with thoughts to pen down the realization that I had today, which I think tries to answer that question.

Now, what is the question? It’s not easy to put up that question, as it will only apply to you if you have faced a similar situation. It’s about what to choose from the many choices presented to you. All the time, people have to choose one thing over another, sometimes life decisions which can have a near-permanent impact. At these times one has to overcome the void created by the option which we left out. Can this void replenish over time? is the question most sought after. While helping others solve this question, I was struck by a multitude of uncertainties from them. Rather than uncertainties, the better word would be fear. The fear of failing. Without exception, I found huge traces of that fear lying beneath myself. To repudiate this fact and continue helping others overcome this fear is equivalent to a crime. I stopped that day as I found myself unqualified, only to start again after overcoming this fear myself.

So the fear of failing it is! Do we really need to get rid of that fear? Nope. For some, this fear actually works wonders. When you fear something, you work to bypass it. You tend to remain safe and opt for a decision that won’t give you hiccups in the future. There is nothing wrong in this. Being safe is the simplest of all the tendencies that a person wants. I am happy for these guys for their non-polemical ideals. For these people life will be great, and they will have a fun life with their families. They may have to shun their passions, but that’s okay. I feel for the guys who are not okay with this, a group which sadly encompasses me. It’s difficult for guys in this category to opt for the safe choice, knowing that they may never be able to fill the void.

Then how do we overcome this fear? In the first place, why do you think that having a safe life won’t enrich you? Putting it very simply, people in this category want to deliver something to this world, and they know very well that doing anything at that scale needs hard work and dedication. Won’t you enjoy dedicating yourself to something that you love? That’s the point. They have the perfect idea that if they are allowed to work freely on their area of interest, nothing in this world can thwart the results.

To overcome this fear, we need to keep track of people who are like-minded! A conducive environment is what is needed. Terrible people create terrible environments, and we must never let them complicate our lives. Phew! I don’t know where the last few lines came from!

I will write it more clearly in the next posts to come.

Benchmarking OpenCV’s integration with Shogun

I have written some code for converting Mat objects from the OpenCV library to CDenseFeatures objects in the Shogun library. There are basically three methods of conversion:

  • Using for loops (i.e. the manual way)
  • Using the Memcpy command.
  • Using constructors.

I have listed here the benchmarking output for the conversion of random matrices from Mat to CDenseFeatures. We will try to draw the major conclusions from them at the end, such as choosing the fastest method as the default. All the tests were done on my local machine, which runs Ubuntu 12.04 Precise.

Let’s start with the conversion of Mat to CDenseFeatures.

For instance: a cv::Mat object initialized randomly in CV_8U type, and converted to the following types:

  1. To uint8_t
  2. To int8_t
  3. To uint16_t
  4. To int16_t
  5. To int32_t
  6. To float32_t
  7. To float64_t

These 7 conversions are shown with sky-blue bars (the different Mat types are marked on the horizontal axis label at the bottom of each chart).

Similarly, all the orange bars show the conversion from CV_8S to the respective output types.

Similarly, all the green bars show the conversion from CV_32F to the respective output types, and so on….

[benchmark charts: Mat → CDenseFeatures conversion times for the three methods, across source and destination types]

A few inferences that we can draw directly:

1. The manual method is consistent, i.e. it doesn’t matter which type we are converting from or into; it gives approximately equal results in all cases.

2. The Memcpy method is very good with respect to the time taken. It is roughly 2 times faster than the manual method on smaller matrices, especially for the uint8_t and int8_t types.

For the back conversion, we will require a float CDenseFeatures object, and we will be converting it into the different Mat types.

[benchmark charts: CDenseFeatures → Mat conversion times for the three methods]

A few inferences that we can draw from these 3 figures:

1. We see that in the constructor method, the time for converting float64_t CDenseFeatures objects into CV_64F is almost negligible.

2. In terms of speed, the rough order is: Constructor, then Memcpy, then the manual method.