VMC optimization

From QMC

Jump to: navigation, search

THIS IS NOT CORRECT/COMPLETE YET. DON'T TRY YET

We are interested in minimizing the function math where we are trying to optimize over all possible values of alpha. We've seen that one technique of optimizing such a function is to graph all values of math and simply choose the minimum. Although we got away with this for one parameter, we can already see for two parameters it is difficult and for many parameters it would be impossible.

These days, there are recent and sophisticated ways of handling this problem. In this section, we will discuss methods that are more naive. They are nonetheless interesting and a good starting point for understanding how one might deal with the optimization problem.

We have already decided that using a grid of points is impractical. A natural next thing to try is try to successfully iterate to more and more accurate sets of parameters. A number of natural optimization techniques (newton's method, conjugate gradient, etc.) allow one to do this by using local information about the derivative with respect to the parameters. If we had exact values of math this would be reasonable approach. Unfortunately we currently only have an approximation generated by monte carlo. This approximation has the disadvantage of being noisy. This makes iterative methods (especially one's using finite differences to get derivatives) nonstable and problematic.


One possible solution to this problem is to come up with an approximation to our function that is not noisy.


This function is smooth and optimizing over this function using canonical optimization techniques would be practical. One possible way of doing this is the following:

Imagine you have the energy and 10,000 configurations sampled with the probability distribution induced by the parameter math One approximation to our function induced by parameter math is

math

This function has the advantage that it is not noisy. It has at least two disadvantages which we will take in turn.

Problem 1: To begin with, for values math far from math we will find that, because math is small, there will be very few effective points of data. This means that although the function is smooth at these values of math it will be highly inaccurate.

Problem 2: The second problem is the following. Imagine you one had 2 (instead of 10,000) points in your sample. It's likely that you could find some set of parameters that makes this configurations especially small even though if you used these parameters for the entire integral, they would be suboptimal. This is called overfitting.

Even in the face of these two obstacles, one might imagine that using this technique would get us a closer estimate for a good set of parameters. Then we could iterate a couple times.


Let's try this! To begin with, modify your VMC function to return not only a list of energies but also a list of configurations.

Now, write a function

def ChangeAlpha(coordList,params1,params2,WF):
   return params2_energy

that takes a list of coordinates that have been sampled with params1 and calculates the energy of the WF with the set of params2.

Now, let us write a function

def Optimize2(WF,initParams):
   

that does one iteration of the above method by sampling once at initParams and then minimizing everything else using this approximate function. You may start optimizing by just searching for the minimum from a grid of points, but we will have to soon do better then this (that was the whole point of switching to another style of approximate function).

Now write

def Optimize3(WF):

that actually calls Optimize2 a number of times iteratively and selects the best set of parameters.

Now that you have things working, we should speed things up so that we don't have to scan across a whole grid of points. Instead replace the optimization procedure in Optimize2 with one that uses conjugate gradient to find the minimum values quickly. To do this you might want to take a look at scipy.optimize.


It turns out that before the recent developments in optimization methods, people had developed ways of getting around the above pathologies. To get around the pathology that a finite number of points had a tendency to be overfit, the objective function was changed. Instead of using an objective function of minimum energy, an objective function of minimum variance of the points was used.

Switch your objective function to use minimum variance instead of minimum energy. Do you see a drastic difference in your results? What happens if you use a very few number of monte carlo configurations (say 100).

The second pathology can be handled in the following way. Imagine that we approximate our function by assuming math This removes said pathology at the cost of a pretty bad approximation. Nonetheless, if we are only interested in using this method to get us to the next iteratively closer set of paramaters, it is not an unreasonable thing to try.

Modify your methods to use this technique. What happens? Are things better controlled?