AyushNet_Blog‎ > ‎

"PAC" (Probably Approximately Correct) Learning and Calculus

posted Aug 29, 2015, 9:57 PM by Shyam Sarkar   [ updated Aug 31, 2015, 12:38 AM ]

The learning processes in machine learning algorithms are generalizations from past experiences. After having experienced a learning data set, the generalization process is the ability of a machine learning algorithm to accurately execute on new examples and tasks. The learner needs to build a general model about a problem space enabling a machine learning algorithm to produce sufficiently accurate predictions in future cases. The training examples come from some generally unknown probability distribution.

In theoretical computer science, computational learning theory performs computational analysis of machine learning algorithms and their performance. The training data set is limited in size and may not capture all forms of distributions in future data sets. The performance is represented by probabilistic bounds. Errors in generalization are quantified by bias-variance decompositions. The time complexity and feasibility of learning in computational learning theory describes a computation to be feasible if it is done in polynomial time. Positive results are determined and classified when a certain class of functions can be learned in polynomial time whereas negative results are determined and classified when learning cannot be done in polynomial time.

PAC (Probably Approximately Correct) learning is a framework for mathematical analysis of machine learning theory.  The basic idea of PAC learning is that a really bad hypothesis can be easy to identify.  A bad hypothesis will err on one of the training examples with high probability. A consistent hypothesis will be probably approximately correct. If there are more training examples, then the probability of “approximately correct” becomes much higher. The theory investigates questions about (a) sample complexity: how many training examples are needed to learn a successful hypothesis, (b) computational complexity: how much computational effort is needed to learn a successful hypothesis, and finally (c) bounds for mistakes: how many training examples will the learner misclassify before converging to a successful hypothesis.

Mathematically, let (1) X be the set of all possible examples, (2) D be the probability distribution over X from which observed instances are drawn, (3) C be the set of all possible concepts c, where c: X ® {0.1}, and (4) H be the set of possible hypothesis considered by a learner, H Í C. The true error of hypothesis h, with respect to the target concept c and observation distribution D is the probability P that h will misclassify an instance drawn according to D:

The error should be zero in the ideal case. A concept class C is “PAC learnable” by a hypothesis class H if and only if there exists a learning algorithm L such that given any target concept c in C, any target distribution D over the possible examples X, and any pair of real numbers 0 < ε, δ < 1, L takes as input a training set of m examples drawn according to D, where the size of m is bounded above by a polynomial in 1/ε and 1/δ and outputs an hypothesis h in H about which it is true with confidence (probability over all possible choices of the training set) greater than 1 −  δ, then the error of the hypothesis is less than ε.

A hypothesis is consistent with the training data if it returns the correct classification for every example presented it. A consistent learner returns only hypotheses that are consistent with the training data. Given a consistent learner, the number of examples sufficient to assure that any hypothesis will be probably (with probability (1 - δ)) approximately (within error ε) correct is

Calculus is an important branch of mathematics not considered so far as one of the building blocks of machine learning techniques. Calculus is used in every branch of physical science, computer science, statistics, engineering, economics, business, medicine, meteorology, epidemiology and in other fields wherever there is a need to mathematically model a problem to derive an optimal solution. It allows one to go from (non-constant) rates of change to the total change or vice versa. A mathematical model represented in calculus for a large data set can very well represent a hypothesis with very low error (ε) or zero error in machine learning. A complex hypothesis is possible to be constructed with one or more part(s) being represented in calculus based model(s).

The fundamental theorem of calculus states that differentiation and integration are inverse operations. More precisely, it relates the values of anti-derivatives to definite integrals. It can also be interpreted as a precise statement of the fact that differentiation is the inverse of integration. In machine learning, if a hypothesis involves model(s) represented in calculus then there must be complementing processes of differentiation and integration involved in the overall learning processes. Calculus based mathematical models can be used as part of a hypothesis for machine learning over a wide variety of data sets derived from devices such as heart monitoring implants, bio-chip transponders on farm animals, electric clams in coastal waters, automobiles with built-in sensors, smart homes, smart cities or airplanes with sensors. These devices or sensors used inside physical, biological or environmental systems collect large volumes of data. Efficient machine learning algorithms for such data sets can use hypothesis based on mathematical models involving both calculus and statistics.