Something I find is underemphasised in this type of material is the equivalence between computing the value of an integral and computing the expected value of a function of a random variable. This is a very useful intuition for framing various problems efficiently.
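A minimal sketch of that equivalence (all names mine): the integral of x^2 over [0, 1] is exactly the expectation of U^2 for U uniform on [0, 1], so averaging samples of U^2 estimates the integral.

```python
import random

random.seed(0)

# Estimate I = integral of x^2 over [0, 1] = 1/3 as E[f(U)], U ~ Uniform(0, 1).
# The integral and the expectation are the same object; the sample mean
# is the Monte Carlo estimate of both.
n = 100_000
estimate = sum(random.random() ** 2 for _ in range(n)) / n
print(estimate)  # close to 1/3
```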

This article brings it up, but only as a "look at what you can do with MC methods" and not "here's this fundamental relation that's really important."

There's a lot of useful understanding of importance sampling and related techniques that seems obvious once you look at sampling as an integration problem.
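As a sketch of why the integration view pays off (example and names mine): estimating the rare tail probability P(Z > 4) for a standard normal by naive sampling almost never produces a hit, but rewriting the integral against a proposal density centred on the tail and reweighting by the density ratio works with a modest sample size.

```python
import math
import random

random.seed(1)

# Importance sampling for P(Z > 4), Z ~ N(0, 1).
# Sample Y ~ N(4, 1) instead, and reweight each draw by the density
# ratio phi(y) / phi(y - 4) = exp(8 - 4y).
n = 20_000
total = 0.0
for _ in range(n):
    y = 4.0 + random.gauss(0.0, 1.0)      # proposal centred on the tail
    if y > 4.0:
        total += math.exp(8.0 - 4.0 * y)  # likelihood-ratio weight
estimate = total / n
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # 1 - Phi(4), about 3.17e-05
print(estimate, exact)
```

With naive sampling, 20,000 draws would give zero or one hit; the reweighted estimate is accurate to a few percent.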

Unfortunately my only exposure to this is a university course I took called Computer Intensive Methods in Mathematical Statistics[1].

It was a really good course, but at the time I was not in a mental space where I could get the most out of it, so I'm kind of hoping someone else can point to a solid textbook or something.

I liked the book "Statistical Mechanics: Algorithms and Computations" by Werner Krauth. It's oriented toward physics, but IIRC he links Monte Carlo simulations to the path integral formalism.

There is also a MOOC (same title, same author) on Coursera that I found interesting. It doesn't go into the same depth as the book, however. They complement each other: the book for theory, the MOOC for implementation exercises.

I don't know if I can plug some materials I prepared a few years ago here. They might help bridge the use of Monte Carlo techniques and their relation to integration:

Also, just to mention something else that is very interesting related to MC: "quasi-Monte Carlo techniques" (small example in https://github.com/ChristopheRappold/MCandPython/blob/master... ). For people interested in MC, just take a look at those keywords.

The compound interest explanation was particularly intriguing. In part because I haven't fully grokked the discrete-vs-continuous interest difference yet (also haven't tried very hard -- just remember being a bit surprised by some things mentioned in Paul Wilmott Introduces Quantitative Finance). I liked the intuition of simply "owning very many such bills" reducing to the expectation.

Nice fact! How about this for a proof: let X_j be a sequence of unif(0,1) random variables and S_n the sequence of partial sums (so S_4=X_1+X_2+X_3+X_4).

If N is the number of uniforms needed to have a sum bigger than 1, you can see that P(N>n)=P(S_n<1).

P(S_n<1) is not too bad to compute, or you can wikipedia it (Irwin-Hall distribution) to find the expression P(S_n<1)=1/n!

Combining the things, E[N]=sum(P(N>n),n=0..infinity)=sum(1/n!,n=0..infinity)=exp(1).
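A quick simulation of the fact (sketch, names mine): draw uniforms until the running sum exceeds 1, and average the count over many trials.

```python
import math
import random

random.seed(2)

# Simulate: the expected number of Uniform(0,1) draws needed for the
# running sum to exceed 1 is e.
def draws_to_exceed_one():
    total, n = 0.0, 0
    while total <= 1.0:
        total += random.random()
        n += 1
    return n

trials = 200_000
mean_n = sum(draws_to_exceed_one() for _ in range(trials)) / trials
print(mean_n, math.e)  # the mean should be close to e = 2.71828...
```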

Thank you, I will look into this. I've taken discrete math and some graph theory, but never stats, so all those families of distributions are alien to me :)

Even given their nuclear weapons origin, we lowly civil nuclear power engineers still use Monte Carlo methods all day every day. We all make our employers buy big supercomputers or get us access to the national lab leadership class HPC to just pound the hell out of our reactor design problems with random particle transport chains. Sure there are (dramatically) faster deterministic methods that are generally good enough, but Monte Carlo allows you to use exact geometry and not bother too much with the pesky art of computing average nuclear interaction probabilities.

Heck, my buddy at MIT made an open-source Monte Carlo code called OpenMC that's now run by Argonne National Lab [1]. Now everyone can do truly legit reactor design with Monte Carlo!

Monte Carlo methods are prominent all over physics. They can be efficient methods for approximating multidimensional integrals; so they are used in calculating collision terms in kinetic equations, for example.

Such a crucial skill set. The applications are remarkably broad: adaptive optics in astronomy, Bayes nets in genetics, boson sampling in quantum computing ... and of course DeepMind's AlphaGo ;)

For a more formal treatment, the projects in the MIT comp prob course are genuinely enjoyable to implement:

MITx 6.008.1x Computational Probability and Inference

A (minor) point not raised in the article is that the accuracy of these methods is typically bounded unless your RNG is truly random. So for example if you try to calculate pi as shown, you will not be able to refine the answer beyond a certain accuracy if you are using a typical language's pseudo-random number generator, because the true space of possible values is only partially sampled.

RNG quality is probably the least of your worries.

The curse of dimensionality means you often need a huge number of samples to guarantee a useful level of precision. Additionally, many quantities you'd like to know come up inside Monte Carlo simulations themselves. Nesting one simulation inside another creates an intractable O(N^2) problem.

Here's an example that I encountered at work yesterday:

You want to know P(x>=a and y>=b) where (x,y) is a standard bivariate normal with correlation R.

You can:

1. Simulate a bunch of (x,y) and take the mean of the indicator function 1[x>=a, y>=b]. This will take whole seconds to compute.

2. Use an adaptive quadrature method like scipy.integrate.nquad. This will take dozens to hundreds of milliseconds.

3. Write a fixed-grid Gauss-Legendre quadrature routine in C. This will take less than a microsecond.

I needed this value inside a Monte Carlo simulation. If I had to simulate the solution, the problem would be intractable.
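For the record, a sketch of option 1 in Python (names mine). For the special case a = b = 0 there is a classical closed form, 1/4 + arcsin(R)/(2*pi), to check the estimate against; the general (a, b) case is what actually needs quadrature.

```python
import math
import random

random.seed(3)

# MC estimate of P(x >= 0, y >= 0) for a standard bivariate normal with
# correlation R, built from two independent normals:
#   x = z1,  y = R*z1 + sqrt(1 - R^2)*z2.
R = 0.5
n = 200_000
hits = 0
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x, y = z1, R * z1 + math.sqrt(1 - R * R) * z2
    if x >= 0 and y >= 0:
        hits += 1
estimate = hits / n
exact = 0.25 + math.asin(R) / (2 * math.pi)  # = 1/3 for R = 0.5
print(estimate, exact)
```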

Does that mean we can use the calculation of e.g. pi and a check what value it converges to as a check of the sampling space of a rng? I guess the sampling space is only a necessary condition for rng "quality"?

My personal anecdote: Math is not my strong side, to put it mildly, but I can write simulators just fine. When solving the first n Euler problems I got stuck on the analytical solution of the chip defect problem. Simulating it was easy enough. But whoever put those Euler problems together was pretty clever: the simulated answer wasn't precise enough (8 digits instead of 10 if I recall), forcing you to find the analytical one, or to brute force the remaining two digits.

There are a couple of reasons that might happen. You’d have to use at least 64 bit doubles (or 64 bit ints if using fixed point), since a 32 bit float is ~7 digits of precision.

Also if you’re plugging random numbers directly into the simulator like this article does, the error will shrink as n^(-1/2), which means that going from 8 digits of precision to 10 digits will take 10,000 times longer. In order to get a better convergence rate, you could use jittering/stratification or a low-discrepancy sequence instead of a bare random number generator. Even with a perfect QMC convergence rate of 1/n, getting 10 digits of precision reliably will take 100 times longer than 8 digits of precision.
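To illustrate the stratification point with the article's pi example (a sketch, names mine): placing one jittered sample per grid cell instead of fully random points noticeably tightens the estimate for the same sample count.

```python
import math
import random

random.seed(4)

n_side = 100          # 100 x 100 grid = 10,000 samples
n = n_side * n_side

# Plain MC: n uniform points in the unit square, count hits in the
# quarter disc x^2 + y^2 <= 1.
plain_hits = sum(
    random.random() ** 2 + random.random() ** 2 <= 1.0 for _ in range(n)
)

# Stratified (jittered) MC: one random point per grid cell.
strat_hits = 0
for i in range(n_side):
    for j in range(n_side):
        x = (i + random.random()) / n_side
        y = (j + random.random()) / n_side
        if x * x + y * y <= 1.0:
            strat_hits += 1

print(4 * plain_hits / n, 4 * strat_hits / n, math.pi)
# The stratified estimate is typically much closer to pi.
```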

I don’t remember the chip defect problem, so I’m not sure whether I solved it. ;) But based on your description, and knowing that Euler problems are designed to be resistant to long computations, I might bet on the convergence rate being the issue. I saw your comment about using a better RNG, but if you used a 64-bit RNG, I doubt that is the problem, I think you’d get away with a low quality RNG if you waited long enough.

I used Monte Carlo simulations to good effect analyzing the card game Set. Specifically, the odds of not finding a set among the 12 cards. More or less impossible to solve analytically (at least for me), but no problem to write a short program and run simulations.
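A sketch of such a simulation (my encoding, not the original program): model Set cards as vectors in {0,1,2}^4, where three cards form a Set iff every coordinate sums to 0 mod 3 (i.e. all same or all different in each attribute).

```python
import random
from itertools import combinations, product

random.seed(5)

DECK = list(product(range(3), repeat=4))  # all 81 Set cards

def has_set(cards):
    # Three cards form a Set iff each of the 4 attributes sums to 0 mod 3.
    return any(
        all((a[k] + b[k] + c[k]) % 3 == 0 for k in range(4))
        for a, b, c in combinations(cards, 3)
    )

trials = 2_000
no_set = sum(not has_set(random.sample(DECK, 12)) for _ in range(trials))
print(no_set / trials)  # roughly 0.03: about 1 deal in 30 has no Set
```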

Huge caveat: the central limit theorem isn't all-encompassing, and the running estimate can diverge unexpectedly if you keep using it without checking the distribution first (e.g., heavy tails with infinite variance).

A nice example of how much effort MC simulations save is calculating the area of a shadow cast by a cylinder. If you orient it just right the answer is easy, but it's a properly taxing calculation to work out an analytical function by hand that works for any orientation. Seriously give it a try, it's much harder than it appears.

You can make a spreadsheet in 3 minutes that numerically solves the problem with MC. Now imagine that the cylinder is partly transparent and you want to know how much light makes it through. A few minutes of adjustment and the spreadsheet will do it no problem. Now imagine there is another object embedded in the first with its own transparency. Again only a relatively small amount of time is needed to get the simulation working. The analytical solution is not reachable by mortals.
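For the curious, a rough Python version of that spreadsheet (all names mine, and a crude scan over the ray parameter stands in for an exact ray-cylinder intersection): sample ground points in a bounding box and count those whose vertical line hits the solid cylinder.

```python
import math
import random

random.seed(6)

# Hit-or-miss estimate of the area of a cylinder's shadow under vertical
# light. A ground point (x, y) is in shadow iff the vertical line through
# it intersects the solid cylinder (radius R, half-length H, axis u
# through the origin).
def shadow_area(u, R=1.0, H=2.0, n=40_000, t_steps=21):
    half = H + R                        # bounding half-width on the ground
    box_area = (2 * half) ** 2
    ts = [-half + 2 * half * k / (t_steps - 1) for k in range(t_steps)]
    hits = 0
    for _ in range(n):
        x = random.uniform(-half, half)
        y = random.uniform(-half, half)
        for t in ts:
            s = x * u[0] + y * u[1] + t * u[2]    # coordinate along axis
            rad2 = x * x + y * y + t * t - s * s  # squared radial distance
            if abs(s) <= H and rad2 <= R * R:
                hits += 1
                break
    return box_area * hits / n

print(shadow_area((0, 0, 1)))  # axis vertical: exact answer is pi*R^2
print(shadow_area((1, 0, 0)))  # axis horizontal: exact answer is 4*H*R
```

The two axis-aligned orientations have easy exact answers, which is a handy sanity check before trusting the estimate for arbitrary orientations.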

You generally need to know the fundamental probabilities in advance to sample the random numbers properly. What Monte Carlo allows you to do is combine basic probabilities in a vastly more complex system to see the effective average probabilities of the whole.

> What Monte Carlo allows you to do is combine basic probabilities in a vastly more complex system to see the effective average probabilities of the whole

Could you elaborate on this a bit? I'm having a hard time seeing why and how you would combine basic probabilities with a complex system.

Problems solved with Monte Carlo methods are often ones where you need to draw random values from oddly-shaped multidimensional distributions.

The naive method of doing this is just to cover the space evenly and weight each value by its probability. This is often computationally inefficient because you spend lots of time generating samples that don't actually contribute much to the result.

This is why we have methods like Metropolis-Hastings or Gibbs sampling: these take an appropriately encoded description of the desired distribution and, through clever means, generate samples from it relatively cheaply, in proportion to how likely they are.
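A minimal random-walk Metropolis-Hastings sketch (names mine), using a standard normal target so the output can be sanity-checked; note that all it needs is the density up to a constant.

```python
import math
import random

random.seed(9)

# Random-walk Metropolis-Hastings with target p(x) proportional to
# exp(-x^2 / 2), i.e. a standard normal.
def log_p(x):
    return -0.5 * x * x

x = 0.0
samples = []
for step in range(50_000):
    proposal = x + random.gauss(0.0, 1.0)  # symmetric random-walk proposal
    # Accept with probability min(1, p(proposal) / p(x)).
    if math.log(random.random() + 1e-300) < log_p(proposal) - log_p(x):
        x = proposal
    samples.append(x)  # on rejection, the current x is recorded again

burned = samples[5_000:]  # discard burn-in
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
print(mean, var)  # should be near 0 and 1
```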

This part of the problem is likely what the GP alluded to.

Sure. I'll give an example. In nuclear engineering we know the probabilities, as functions of energy, of each nuclear reaction a neutron and nucleus can have. We measured these carefully in labs. They're fundamental. Now if you load up a Monte Carlo simulation of an entire nuclear reactor and use Monte Carlo to send neutrons through you can compute the average reaction rates all around the reactor with its complex structures just with those fundamentals.
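A toy version of that idea (all numbers and names mine, nothing like a real transport code): with a single measured fundamental, a total interaction cross-section, you can track particles through a purely absorbing slab by sampling exponential free paths and recover the macroscopic transmission probability.

```python
import math
import random

random.seed(7)

# Neutrons through a purely absorbing 10 cm slab with macroscopic
# cross-section sigma (1/cm). Free path lengths are exponential with
# mean 1/sigma; a neutron is transmitted if its first free path
# exceeds the slab thickness. The analytic answer is exp(-sigma * L).
sigma, L = 0.3, 10.0
n = 100_000
transmitted = sum(
    -math.log(1.0 - random.random()) / sigma > L for _ in range(n)
)
print(transmitted / n, math.exp(-sigma * L))
```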

So in this case it's complex because of the number of particles in the system, but because you can model each one with known probabilities you can therefore simulate the full system?

Yeah. A long time ago I wrote a small experiment where you can draw a crazy shape and approximate its area using Monte Carlo simulation.

The Monte Carlo method is even used by chess programs. Having the engine quickly play multiple games against itself, from a given position, is an alternative method of evaluation, probing how the position holds up "in practice".

Also Monte Carlo Tree Search was decisive in bringing Go (the ancient board game) engines from unable to compete with advanced amateur players to successfully competing with the cream of the pros!

I find it remarkable how well MCTS works. Even without the value or policy net, the vanilla algorithm is capable of finding meaningful results by solving the MDP.

I once used the Monte Carlo method to create a very simple but surprisingly effective AI for four-in-a-row. It's so easy to implement that it pretty much worked on the first try, and as a bonus the difficulty is easily tunable by changing the number of simulations (i.e. fewer simulations mean the AI makes more mistakes).
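The same flat-Monte-Carlo idea fits in a few lines for tic-tac-toe (a sketch, names mine): score each legal move by the win rate of uniformly random playouts and pick the best.

```python
import random

random.seed(8)

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] and b[i] == b[j] == b[k]:
            return b[i]
    return None

def playout(board, player):
    # Finish the game with uniformly random moves; return the winner.
    b = board[:]
    while True:
        w = winner(b)
        if w or None not in b:
            return w
        b[random.choice([i for i in range(9) if b[i] is None])] = player
        player = 'O' if player == 'X' else 'X'

def best_move(board, player, n_playouts=200):
    # Flat Monte Carlo: rank each legal move by random-playout win rate.
    def score(m):
        b = board[:]
        b[m] = player
        other = 'O' if player == 'X' else 'X'
        wins = sum(playout(b, other) == player for _ in range(n_playouts))
        return wins / n_playouts
    return max((i for i in range(9) if board[i] is None), key=score)

# X to move with two in a row: the playouts find the immediate win at 2.
board = ['X', 'X', None,
         'O', 'O', None,
         None, None, None]
print(best_move(board, 'X'))  # prints 2
```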

[1]: https://www.kth.se/student/kurser/kurs/SF2955?l=en

Does anyone know of an accessible proof of this statement?

edit: the tricky part is recognizing that P(sum < 1) is equal to 1/n!; after that, it's the world's most recognizable Taylor series

One solver in particular submitted several elegant solutions: http://math.uchicago.edu/~timblack/blog/addtoone.html

[1] https://github.com/openmc-dev/openmc

https://www.edx.org/course/computational-probability-and-inf...

If you're curious:

https://www.ams.org/journals/mcom/1978-32-141/S0025-5718-197...

https://lee-phillips.org/pidaydarts/

http://statweb.stanford.edu/~owen/mc/

More details here:

https://henrikwarne.com/2011/09/30/set-probabilities-revisit...

https://github.com/victorqribeiro/monteCarlo

A nice writeup that helped me understand MCTS: https://tianjun.me/static/essay_resources/Lets_Play_Hanabi/p...

> Of course, for n ≥ 365 you don’t need any calculations, it’s a straightforward consequence of the pigeonhole principle.
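Right, and for n < 365 the collision probability needs an actual calculation; a quick simulation gives it (sketch, names mine):

```python
import random

random.seed(10)

# Birthday problem by simulation: probability that n people (uniform
# birthdays over 365 days) include at least one shared birthday.
def collision_prob(n, trials=20_000):
    hits = 0
    for _ in range(trials):
        days = [random.randrange(365) for _ in range(n)]
        if len(set(days)) < n:  # fewer distinct days than people
            hits += 1
    return hits / trials

print(collision_prob(23))  # classically about 0.507
```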