MAP learning

When selecting the right hypothesis, a Bayesian approach is normally one of the best choices, because it takes into account all the factors and, as we'll see, even though it relies on a conditional independence assumption, such an approach often works well even when some factors are partially dependent. However, its complexity (in terms of the number of probabilities involved) can grow quickly, because all terms must always be taken into account. For example, a real coin is a very short cylinder, so, in tossing a coin, we should also consider the probability of an edge landing (when the coin comes to rest on its border). Let's say it's 0.001. This means that we have three possible outcomes: P(head) = P(tail) = (1.0 - 0.001)/2.0 and P(edge) = 0.001. The latter event is obviously unlikely but, in Bayesian learning, it must still be considered (even though it will be dominated by the much stronger terms).
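As a minimal sketch of this idea, the following snippet computes the full Bayesian posterior over the three coin outcomes; the likelihood values are purely illustrative (they are not taken from the text) and are only meant to show that the unlikely edge hypothesis keeps a small, non-zero weight in the posterior:

```python
import numpy as np

# Illustrative priors: P(head), P(tail), P(edge)
priors = np.array([0.4995, 0.4995, 0.001])

# Hypothetical likelihoods P(D|h) of some observed data D under each hypothesis
likelihoods = np.array([0.7, 0.29, 0.01])

# Full Bayesian posterior: every term, even the unlikely one, is kept
unnormalized = likelihoods * priors
posterior = unnormalized / unnormalized.sum()

print(posterior)  # the edge hypothesis retains a tiny but non-zero probability
```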

An alternative is picking the most probable hypothesis in terms of the a posteriori probability:

h_MAP = argmax_h P(h|D) = argmax_h P(D|h)P(h)

(the evidence P(D) is a common factor that doesn't depend on h, so it can be dropped from the maximization).

This approach is called MAP (maximum a posteriori) and it can really simplify the scenario when some hypotheses are quite unlikely (for example, in tossing a coin, a MAP hypothesis will simply discard P(edge)). However, it still has an important drawback: it depends on the a priori probabilities (remember that maximizing the a posteriori probability also means weighting by the a priori term). As Russell and Norvig (Artificial Intelligence: A Modern Approach, Russell S., Norvig P., Pearson) pointed out, this is often a delicate part of an inferential process, because there's always a theoretical background that can lead to a particular choice and exclude others. In order to rely only on the data, it's necessary to take a different approach.
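The following sketch illustrates both points under assumed, purely illustrative values (reusing the hypothetical likelihoods from the previous snippet): MAP keeps only the single best-scoring hypothesis, so the unlikely edge outcome is discarded, but changing the prior can change which hypothesis wins, which is exactly the dependence on a priori probabilities discussed above:

```python
import numpy as np

hypotheses = ['head', 'tail', 'edge']
likelihoods = np.array([0.7, 0.29, 0.01])  # hypothetical P(D|h)

def map_hypothesis(priors):
    # MAP: argmax of P(D|h) * P(h); the evidence P(D) is a common factor
    scores = likelihoods * priors
    return hypotheses[int(np.argmax(scores))]

# With an almost-fair prior, the data drives the choice
print(map_hypothesis(np.array([0.4995, 0.4995, 0.001])))  # -> 'head'

# A strongly biased prior flips the MAP hypothesis, showing its
# dependence on the a priori term
print(map_hypothesis(np.array([0.05, 0.949, 0.001])))     # -> 'tail'
```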