### AyushNet_Blog

#### CALSTATDN Model

The CALSTATDN model combines methods from calculus (CAL), statistics (STAT) and database normalization (DN) to process and analyze large volumes of data captured from sensors, embedded devices or any other internet data source within physical, environmental or biological systems as part of the "Internet of Things" (IoT).

FIG 1 is a conceptual flow diagram illustrating the machine learning model underlying CALSTATDN. An unknown target function f(X) is shown. Samples are drawn from a probability distribution P over the data values X for training purposes. Derivatives, i.e. rates of change of the values in X, are computed and also used for training. The training sets are used to select a hypothesis that approximates the function f. The best hypothesis g must fit the training samples well. In the CALSTATDN model, if the hypothesis set H is chosen carefully to include models from both calculus and statistics, then the approximation will be very close to the true unknown function f, provided there are enough training sets. The best hypothesis g(X) applies both calculus- and statistics-based models, leading to a very small estimated error. The CALSTATDN model exploits the powerful ideas of rates of change, differentiation and integration from calculus, along with generalization over sets of values from statistical computing, to derive the best hypothesis g(X) explaining the behavior of the function f(X) with fewer data points, and with fewer generalization errors over unseen data points, than conventional machine learning methods.

FIG 2 is a flow diagram showing the implementation, in iterative stages, of the final machine learning hypothesis described in FIG 1. FIG 2 further illustrates partitioning of data sets, based on queries, into normalized data tables. Data partitions over primary keys are illustrated; these partitions are used for parallel execution at the next level of machine learning. A condition check decides whether more processing is needed. If the answer is "Yes", the next stage loops back to the calculus-based computing stage for further iteration(s). If the answer is "No", processing proceeds to integration operations in calculus, along with other operations, to generate the results of the analysis in the form of graphs and charts. The integration operation is necessary as the inverse of the derivatives computed at earlier stages of this method. In the CALSTATDN model, parallel execution at several iterative stages improves overall performance by orders of magnitude. The flow diagram of FIG 2 thus combines iterative stages of calculus-based operations, statistics-based operations, and database normalization for querying data partitions, leading to the next stage of parallel computation. These iterative stages are applied to analyze extremely large data sets with substantially better performance and higher levels of correctness for machine learning, with a corresponding reduction in errors.
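The stages above can be sketched in a few lines of code. This is a minimal, hypothetical illustration only: the toy sensor data, the key names and the linear model of the rate of change are my assumptions, not the published CALSTATDN method. It shows the three ingredients in sequence: a derivative feature (CAL), a least-squares fit per partition (STAT), and key-based partitioning that mimics a normalized table (DN), followed by integration as the inverse step.

```python
import numpy as np

# Hypothetical sketch of the FIG 2 pipeline: derivative features (CAL),
# a per-partition least-squares fit (STAT), and key-based partitioning (DN).
# Data, keys and model form are illustrative assumptions.

def fit_partition(t, x):
    """CAL: estimate dx/dt numerically; STAT: fit dx/dt ~ a*t + b."""
    dxdt = np.gradient(x, t)          # rate of change of the signal
    a, b = np.polyfit(t, dxdt, 1)     # least-squares line through dx/dt
    return a, b

def predict_partition(t, x0, a, b):
    """CAL inverse: integrate the fitted rate a*t + b from t[0], starting at x0."""
    return x0 + a * (t**2 - t[0]**2) / 2.0 + b * (t - t[0])

# DN: rows keyed by a sensor id behave like a normalized table partitioned on
# its primary key; each partition can be fitted independently (in parallel).
t = np.linspace(0.0, 1.0, 201)
data = {
    "sensor_1": 3.0 * t**2 + 1.0,    # true rate of change: 6t
    "sensor_2": -2.0 * t**2 + 5.0,   # true rate of change: -4t
}

models = {key: fit_partition(t, x) for key, x in data.items()}
for key, (a, b) in models.items():
    recon = predict_partition(t, data[key][0], a, b)
    # integration recovers the original signal from the fitted rate
    print(key, bool(np.max(np.abs(recon - data[key])) < 1e-2))
```

In a real deployment each partition's fit would run as a separate parallel task, and the condition check of FIG 2 would decide whether to loop back with refined derivative features.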

#### The THING in "Internet of Things"

A thing, in the Internet of Things (IoT), is an entity or physical object that has the ability to transfer data over a network, and every thing has an identifier. I have some questions about such a THING in the "Internet of Things". IoT is expected to generate large amounts of data from diverse locations, and these data grow very quickly, increasing the need to better index, store and process them. A thing can be a simple thing, like a sensor, or a composite thing made up of disparate or separate parts. IoT software vendors might like to master a particular stage of the machine-to-machine (M2M) data flow process, such as:

1. managing communication with connected devices/sensors or with composite things comprising many sensors/devices;
2. providing middleware for integration with data repositories, where definitions for both simple and composite things should be maintained;
3. storing data from simple or composite things with static or dynamic relationships;
4. securing the simple or composite data; and
5. analyzing and visualizing data for all types of things, including composite things.

Are we going to deal with objects, and interrelationships over objects, at all levels of the data flow process?
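One way to make the simple-versus-composite distinction concrete is a small data model. The classes and names below are purely hypothetical, not from any IoT standard; they only illustrate that every thing carries an identifier and that a composite thing is built from other (simple or composite) things, so queries can address objects and their interrelationships at every level.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical data model: names are illustrative, not from any IoT standard.

@dataclass
class Thing:
    identifier: str                  # every thing in the IoT has an identifier

@dataclass
class SimpleThing(Thing):
    kind: str = "sensor"             # e.g. a single temperature sensor

@dataclass
class CompositeThing(Thing):
    parts: List[Thing] = field(default_factory=list)   # disparate sub-things

    def all_identifiers(self) -> List[str]:
        """Walk the part hierarchy so every level can be addressed."""
        ids = [self.identifier]
        for p in self.parts:
            if isinstance(p, CompositeThing):
                ids += p.all_identifiers()
            else:
                ids.append(p.identifier)
        return ids

home = CompositeThing("home-1", parts=[
    SimpleThing("temp-1"),
    CompositeThing("hvac-1", parts=[SimpleThing("fan-1")]),
])
print(home.all_identifiers())   # ['home-1', 'temp-1', 'hvac-1', 'fan-1']
```

A middleware layer (stage 2 above) would maintain exactly such definitions for both simple and composite things in its data repository.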

#### "PAC" (Probably Approximately Correct) Learning and Calculus

The learning processes in machine learning algorithms are generalizations from past experience. After training on a learning data set, generalization is the ability of a machine learning algorithm to perform accurately on new examples and tasks. The learner needs to build a general model of the problem space that enables the algorithm to produce sufficiently accurate predictions in future cases. The training examples come from some generally unknown probability distribution. In theoretical computer science, computational learning theory performs computational analysis of machine learning algorithms and their performance. The training data set is limited in size and may not capture all forms of distribution in future data sets, so performance is represented by probabilistic bounds, and errors in generalization are quantified by bias-variance decompositions. In computational learning theory, a computation is considered feasible if it can be done in polynomial time: positive results show that a certain class of functions can be learned in polynomial time, whereas negative results show that learning cannot be done in polynomial time. PAC (Probably Approximately Correct) learning is a framework for the mathematical analysis of machine learning. The basic idea of PAC learning is that a really bad hypothesis is easy to identify, because it will err on one of the training examples with high probability; a hypothesis consistent with the training examples is therefore probably approximately correct. The more training examples there are, the higher the probability that a consistent hypothesis is approximately correct.
The theory investigates questions about (a) sample complexity: how many training examples are needed to learn a successful hypothesis; (b) computational complexity: how much computational effort is needed to learn a successful hypothesis; and (c) mistake bounds: how many training examples the learner will misclassify before converging to a successful hypothesis. Mathematically, let (1) X be the set of all possible examples, (2) D be the probability distribution over X from which observed instances are drawn, (3) C be the set of all possible concepts, and (4) H be the hypothesis space searched by the learner. A hypothesis is consistent with the training data if it returns the correct classification for every example presented to it, and a consistent learner returns only hypotheses that are consistent with the training data. Given a consistent learner, the number of examples m sufficient to assure that any hypothesis it returns will be probably (with probability 1 − δ) approximately (within error ε) correct is

m ≥ (1/ε)(ln |H| + ln(1/δ)).

Calculus is an important branch of mathematics not considered so far as one of the building blocks of machine learning techniques. Calculus is used in every branch of physical science, computer science, statistics, engineering, economics, business, medicine, meteorology and epidemiology, and in any other field where there is a need to mathematically model a problem to derive an optimal solution. It allows one to go from (non-constant) rates of change to the total change, or vice versa. A mathematical model expressed in calculus for a large data set can represent a hypothesis with very low error (ε), or even zero error, in machine learning, and a complex hypothesis can be constructed with one or more parts represented by calculus-based models. The fundamental theorem of calculus states that differentiation and integration are inverse operations. More precisely, it relates the values of antiderivatives to definite integrals; it can also be interpreted as a precise statement of the fact that differentiation is the inverse of integration. In machine learning, if a hypothesis involves models represented in calculus, then the overall learning process must involve the complementary operations of differentiation and integration.
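As a concrete check of the consistent-learner sample-complexity bound, m ≥ (1/ε)(ln |H| + ln(1/δ)), here is a small worked example; the particular values of ε, δ and |H| are illustrative choices, not from any specific learning problem.

```python
import math

def pac_sample_bound(eps: float, delta: float, h_size: int) -> int:
    """Smallest m with m >= (1/eps) * (ln|H| + ln(1/delta)),
    i.e. enough examples for a consistent learner to be
    (eps, delta)-probably approximately correct."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

# |H| = 1000 hypotheses, error eps = 0.1, confidence 1 - delta = 0.95:
print(pac_sample_bound(0.1, 0.05, 1000))   # 100 examples suffice
```

Note how the bound grows only logarithmically in |H| and in 1/δ, but linearly in 1/ε: halving the tolerated error doubles the required number of training examples.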
Calculus-based mathematical models can be used as part of a hypothesis for machine learning over a wide variety of data sets derived from devices such as heart monitoring implants, biochip transponders on farm animals, electric clams in coastal waters, automobiles with built-in sensors, smart homes, smart cities or airplanes with sensors. These devices and sensors, used inside physical, biological or environmental systems, collect large volumes of data. Efficient machine learning algorithms for such data sets can use hypotheses based on mathematical models involving both calculus and statistics.
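The inverse relationship between differentiation and integration invoked above can be verified numerically on a toy signal. This is only an illustrative sketch: the signal below stands in for any sensor stream, and the claim being checked is just the fundamental theorem of calculus, i.e. that cumulatively integrating the derivative recovers the original values up to the starting constant.

```python
import numpy as np

# Numerical check of the fundamental theorem of calculus:
# differentiate a signal, cumulatively integrate the derivative,
# and recover the original values (anchored at the starting point).
t = np.linspace(0.0, 2.0, 2001)
f = np.sin(t) + 0.5 * t**2           # toy stand-in for a sensor signal

dfdt = np.gradient(f, t)             # differentiation: rate of change

# trapezoidal cumulative integral of dfdt, anchored at f[0]
recon = f[0] + np.concatenate(
    ([0.0], np.cumsum(0.5 * (dfdt[1:] + dfdt[:-1]) * np.diff(t))))

# integration inverts differentiation up to discretization error
print(bool(np.max(np.abs(recon - f)) < 1e-4))   # True
```

In a learning pipeline this is exactly the role the integration stage plays: after a model of the rate of change has been fitted, integration recovers the quantity of interest itself.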