The CALSTATDN
model combines methods of calculus (CAL), statistics (STAT), and database normalization
(DN) to process and analyze large volumes of data captured from sensors,
embedded devices, or any other internet data source within physical,
environmental, or biological systems as part of the "Internet of Things" (IoT).

FIG 1 is a conceptual flow diagram illustrating the unique conceptual model
for machine learning in the CALSTATDN model. An unknown target function f(X) is
shown. Some samples are created from probability distribution P on data
values X for
training purposes. Derivatives or rates of changes of values in X are
computed and are also used for training a model. The training sets are used to
train a model for correct hypothesis in order to find an approximation of the
function f. The best hypothesis g must
fit the training samples very well. In CALSTATDN model, if the set H is
chosen carefully to involve models from both calculus and statistics then the
approximation will be very close to the true unknown function f provided
there are enough training data sets. The best hypothesis g(X) uniquely applies both calculus and statistics-based models
with high levels of correctness leading to very small error estimated. The
CALSTATDN model exploits the powerful ideas of rates of changes,
differentiation and integration in calculus along with ideas of generalizations
over sets of values in statistical computing to derive the best hypothesis g(X) to explain the behavior of any
function f(X)
with fewer data points and with few generalizations over unseen data
points compared to other conventional machine learning methods.FIG
2 is a flow diagram showing the implementation with iterative stages of final
machine learning hypothesis described in FIG 1. FIG
2 further illustrates partitioning of data sets based on queries into
normalized data tables. Data partitions formed over primary keys are
illustrated. These partitions are used for parallel execution at the next
level of machine learning. A condition check determines whether more
processing is needed. If more processing is needed (answer "Yes"), the next
stage of processing returns to the calculus-based computing stage for further
iteration(s). If no more processing is necessary (answer "No"), processing
proceeds to integration operations in calculus, along with other operations,
to generate the results of the analysis in the form of graphs and charts.
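The loop just described (partition on a primary key, run a statistical stage on each partition, then take the "Yes"/"No" branch of the condition check) can be sketched minimally in Python. All function names, the per-partition statistic, and the convergence criterion below are illustrative assumptions, not part of the described method:

```python
# Hypothetical sketch of the FIG 2 loop: partition rows on a primary key,
# fit each partition independently (standing in for the parallel machine
# learning stage), and iterate until the condition check answers "No".
from collections import defaultdict

def partition_by_key(rows, key):
    """Group rows into normalized partitions keyed on the primary key column."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return dict(parts)

def fit_partition(rows):
    """Placeholder statistical stage: the mean of the observed values."""
    values = [r["value"] for r in rows]
    return sum(values) / len(values)

def iterate_until_converged(rows, key, tolerance=1e-6, max_iters=10):
    """Condition-check loop: re-fit until estimates stop changing."""
    parts = partition_by_key(rows, key)
    estimates = {k: 0.0 for k in parts}
    for _ in range(max_iters):
        new = {k: fit_partition(v) for k, v in parts.items()}
        if all(abs(new[k] - estimates[k]) < tolerance for k in parts):
            return new          # answer "No": stop and hand off to integration
        estimates = new         # answer "Yes": go back for another pass
    return estimates
```

In a full implementation, each call to fit_partition could run in its own worker, since the partitions are independent; that independence is what the primary-key partitioning buys for the parallel stage.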
The integration operation is necessary as an inverse operation over derivatives
computed at earlier stages in this method. In the CALSTATDN model, parallel
execution at several iterative stages improves overall performance by several
orders of magnitude. The flow diagram of FIG 2 demonstrates unique iterative
stages over calculus-based operations, statistics-based operations, and
database normalization for querying data partitions, leading to the next stage
of parallel computation. These iterative stages are applied to analyze
extremely large data sets with much greater performance and much higher levels
of correctness for machine learning, with a reduction in errors.
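The derivative features of FIG 1 and the inverse integration step of FIG 2 can be sketched as a pair of operations. This is a hypothetical illustration under simple assumptions (forward differences over a sample grid, inverted by a cumulative sum); the function names are invented here, not taken from the described method:

```python
# Hypothetical sketch: forward differences produce the rate-of-change
# values used as training features, and a cumulative sum over the same
# grid inverts them, mirroring the integration stage that recovers
# values from derivatives computed earlier.

def finite_differences(xs, ys):
    """Forward-difference approximation of dy/dx over each interval."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
            for i in range(len(ys) - 1)]

def integrate_differences(xs, dys, y0):
    """Inverse operation: accumulate the differences from the initial
    value y0, reconstructing the original samples on the same grid."""
    ys = [y0]
    for i, d in enumerate(dys):
        ys.append(ys[-1] + d * (xs[i + 1] - xs[i]))
    return ys
```

On the same grid these two operations are exact inverses up to floating-point rounding, which is why the integration stage can recover the analyzed quantities from the derivatives; with real sensor data, the accuracy of either direction depends on the sampling interval.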
