Abstract
We consider the problem of learning a target probability distribution over a set of N binary variables from the knowledge of the expectation values (with respect to this target distribution) of M observables, drawn uniformly at random. The space of all probability distributions compatible with these M expectation values within some fixed accuracy, called the version space, is studied. We introduce a biased measure over the version space, which gives a boost increasing exponentially with the entropy of the distributions and with an arbitrary inverse ‘temperature’ \(\Gamma\). The choice of \(\Gamma\) allows us to interpolate smoothly between the unbiased measure over all distributions in the version space (\(\Gamma = 0\)) and the pointwise measure concentrated at the maximum entropy distribution (\(\Gamma \rightarrow \infty\)). Using the replica method we compute the volume of the version space and other quantities of interest, such as the distance R between the target distribution and the center-of-mass distribution over the version space, as functions of \(\alpha = (\log M)/N\) and \(\Gamma\) for large N. Phase transitions at critical values of \(\alpha\) are found, corresponding to qualitative improvements in the learning of the target distribution and to decreases of the distance R. However, for fixed \(\alpha\), the distance R does not vary with \(\Gamma\), which means that the maximum entropy distribution is not closer to the target distribution than any other distribution compatible with the observable values. Our results are confirmed by Monte Carlo sampling of the version space for small system sizes (\(N \le 10\)).

Keywords: Probabilistic inference · Maximum entropy principle · Replica method
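As a hedged formalization of the biased measure described above (the notation below is our assumption and may differ from the paper's conventions, e.g. in how the exponent scales with \(N\)), one can write

\[
d\mu_\Gamma(p) \;\propto\; \mathbb{1}\bigl[\,p \in \mathcal{V}\,\bigr]\; e^{\Gamma\, S[p]}\; dp,
\qquad
S[p] \;=\; -\sum_{\boldsymbol{\sigma}} p(\boldsymbol{\sigma}) \log p(\boldsymbol{\sigma}),
\]

where \(\mathcal{V}\) denotes the version space, i.e. the set of distributions \(p\) over the \(2^N\) configurations \(\boldsymbol{\sigma}\) whose expectation values of the M observables match the target values within the fixed accuracy. Setting \(\Gamma = 0\) recovers the flat measure over \(\mathcal{V}\), while \(\Gamma \rightarrow \infty\) concentrates all weight on the maximum entropy distribution in \(\mathcal{V}\).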
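The Monte Carlo check mentioned at the end of the abstract can be sketched as follows for small N. This is a minimal illustration, not the authors' code: the choice of observables, the proposal move, and the distance metric (plain \(L_2\) here, as a stand-in for the paper's R) are all assumptions.

```python
import numpy as np

# Minimal sketch: Metropolis sampling of the version space for small N,
# targeting the entropy-biased measure ~ exp(Gamma * S[p]).
rng = np.random.default_rng(0)

N, M = 4, 8            # binary variables and random observables
eps = 0.05             # accuracy defining the version space
Gamma = 1.0            # inverse 'temperature' biasing toward high entropy
n_conf = 2 ** N

# Random +/-1 observables O[mu, sigma], drawn uniformly at random.
O = rng.choice([-1.0, 1.0], size=(M, n_conf))

# Target distribution over the 2**N configurations and its expectations.
p_target = rng.dirichlet(np.ones(n_conf))
m_target = O @ p_target

def entropy(p):
    """Shannon entropy of a distribution over configurations."""
    q = p[p > 0]
    return -np.sum(q * np.log(q))

def in_version_space(p):
    """All M expectation values match the target within accuracy eps."""
    return np.all(np.abs(O @ p - m_target) <= eps)

# Start from the target itself, which lies in the version space by construction.
p = p_target.copy()
samples = []
for step in range(200_000):
    # Symmetric proposal: transfer a small mass delta from config i to config j.
    i, j = rng.integers(n_conf, size=2)
    delta = rng.uniform(0.0, 1e-3)
    p_new = p.copy()
    p_new[i] -= delta
    p_new[j] += delta
    ok = p_new[i] >= 0 and in_version_space(p_new)
    # Metropolis acceptance for the weight exp(Gamma * S[p]) on the version space.
    if ok and rng.uniform() < np.exp(Gamma * (entropy(p_new) - entropy(p))):
        p = p_new
    if step % 100 == 0:
        samples.append(p.copy())

# Center-of-mass distribution over the sampled version space and its
# distance to the target (L2 norm as a simple stand-in for R).
p_cm = np.mean(samples, axis=0)
print("distance to target:", np.linalg.norm(p_cm - p_target))
```

Repeating the run for several values of Gamma and eps gives a small-system analogue of the paper's observation that the distance to the target is insensitive to \(\Gamma\) at fixed \(\alpha\).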