Predictive Modelling in Analytics

Predictive analytics modelling is the process of creating, testing, and validating a model that estimates the probability of an outcome. Predictive analytics software solutions offer a range of modelling methods drawn from machine learning, artificial intelligence, and statistics for this task. A model is chosen on the basis of testing, validation, and evaluation, using detection theory to estimate the probability of an outcome for a given set of input data. A model can apply one or more classifiers to determine the probability that a set of data belongs to another set. The different models in a software solution's modelling portfolio let you derive new information about the data and build predictive models. Each model has its own strengths and weaknesses and is best suited to particular types of problems.

A model is reusable: it is created by training an algorithm on historical data and then saved, so that the common business rules it captures can be applied to similar data. New results can then be analysed with the trained algorithm alone, without access to the historical data.
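The train-save-reuse idea can be sketched in pure Python. Here pickle persistence stands in for a product's model store, and the "model" is a deliberately trivial one-parameter fit; the data, file name, and function names are illustrative, not from any particular product.

```python
# Sketch: train a trivial model on historical data, save it, and later
# reuse it on new inputs without the historical records.
import pickle

historical = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.2)]  # (x, y) pairs

# "Training": fit y ~ slope * x by least squares through the origin
slope = sum(x * y for x, y in historical) / sum(x * x for x, _ in historical)
model = {"slope": slope}

# Persist the trained model for later reuse
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later, possibly in another process: load the model and score new data
with open("model.pkl", "rb") as f:
    reloaded = pickle.load(f)

def predict(m, x):
    return m["slope"] * x

print(predict(reloaded, 5))
```

The point is that the saved artifact contains only what training learned (here, a single slope), which is exactly why the historical data is no longer needed at prediction time.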

Most predictive modelling software solutions can export model information to a local file in the industry-standard Predictive Model Markup Language (PMML) format, so the model can be shared with other PMML-compliant applications for analysis on similar data.

(Figure: Predictive analytics process flow)

The business process of predictive modelling

Creating the model: Software solutions allow you to create a model by running one or more algorithms on the data set.

Testing the model: Test the model on the data set. In some scenarios, the model is tested on past data to see how well it predicts.

Validating the model: Validate the model's results using visualisation tools and an understanding of the business data.

Evaluating the model: Evaluate the candidate models and choose the one best fitted to the data.

Predictive modelling process

The process involves running one or more algorithms on the data set on which prediction will be carried out. It is iterative, and often involves training the model, applying multiple models to the same data set, and finally settling on the best-fit model based on an understanding of the business data.
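The iterative loop described above can be sketched as: fit several candidate models on a training split, score each on held-out data, and keep the best performer. The two candidates here are deliberately trivial stand-ins (a mean predictor and a straight line through the origin); real tools would swap in regressions, trees, neural networks, and so on.

```python
# Sketch of iterative model selection on a held-out split.
train = [(1, 1.9), (2, 4.2), (3, 6.1), (4, 7.8)]
holdout = [(5, 10.1), (6, 11.8)]

def fit_mean(data):
    # Baseline: always predict the mean of the training targets
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y

def fit_line(data):
    # Least-squares line through the origin
    slope = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
    return lambda x: slope * x

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

candidates = {"mean": fit_mean(train), "line": fit_line(train)}
best_name = min(candidates, key=lambda name: mse(candidates[name], holdout))
print(best_name)  # the line model tracks the held-out data far better
```

Scoring on data the models never saw is what keeps the selection honest; comparing on training error alone would favour whichever model memorises best.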

Model categories

1. Predictive models: These models analyse past performance to make predictions about the future.

2. Descriptive models: These models quantify relationships in the data in a way that is often used to classify data sets into groups.

3. Decision models: These models describe the relationships among all the elements of a decision in order to predict the results of decisions involving many variables.

Algorithms

Algorithms perform data mining and statistical analysis in order to determine trends and patterns in data. Predictive analytics software solutions have built-in algorithms such as regression, time series, outlier detection, decision trees, k-means, and neural networks for this purpose. Most solutions also provide integration with open-source R libraries.

Time series algorithms perform time-based predictions. Example algorithms are Single Exponential Smoothing, Double Exponential Smoothing, and Triple Exponential Smoothing.
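Single exponential smoothing is simple enough to show in a few lines: each smoothed value blends the latest observation with the previous smoothed value. The smoothing factor alpha and the series are illustrative choices.

```python
# Minimal single exponential smoothing sketch.
def single_exponential_smoothing(series, alpha=0.5):
    smoothed = [series[0]]                 # seed with the first observation
    for value in series[1:]:
        # New estimate = alpha * latest observation + (1 - alpha) * old estimate
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

series = [10.0, 12.0, 13.0, 12.0, 15.0]
print(single_exponential_smoothing(series))  # → [10.0, 11.0, 12.0, 12.0, 13.5]
```

Double and triple exponential smoothing extend the same recurrence with trend and seasonality terms respectively.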

Regression algorithms predict continuous variables based on other variables in the data set. Example algorithms are Linear Regression, Exponential Regression, Geometric Regression, Logarithmic Regression, and Multiple Linear Regression.
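Simple linear regression has a closed-form least-squares solution: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A minimal sketch with made-up data:

```python
# Closed-form least-squares fit for simple linear regression.
def fit_linear(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]            # exactly y = 2x + 1
slope, intercept = fit_linear(xs, ys)
print(slope, intercept)      # → 2.0 1.0
```

Multiple linear regression generalises this to several predictors; the exponential, geometric, and logarithmic variants apply the same machinery after transforming the variables.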

Association algorithms find frequent patterns in large transactional data sets to generate association rules. An example algorithm is Apriori.
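The core Apriori idea, counting supports and pruning infrequent itemsets before building larger candidates, can be sketched on a toy basket data set. The transactions and the support threshold are illustrative.

```python
# First two Apriori passes: frequent single items, then candidate pairs
# built only from the surviving frequent items (the Apriori pruning step).
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]
min_support = 2  # absolute count; an illustrative threshold

# Pass 1: count single-item supports
counts = {}
for t in transactions:
    for item in t:
        counts[item] = counts.get(item, 0) + 1
frequent_items = {i for i, c in counts.items() if c >= min_support}

# Pass 2: count candidate pairs drawn only from frequent items
pair_counts = {}
for t in transactions:
    for pair in combinations(sorted(t & frequent_items), 2):
        pair_counts[pair] = pair_counts.get(pair, 0) + 1
frequent_pairs = {p for p, c in pair_counts.items() if c >= min_support}
print(sorted(frequent_pairs))
```

Association rules such as "bread implies milk" are then read off the frequent itemsets, with confidence computed from the same support counts.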

Clustering algorithms group observations into clusters of similar items. Example algorithms are K-Means, Kohonen, and TwoStep.
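K-means alternates between two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A one-dimensional sketch with fixed initial centroids (to keep the run deterministic; real implementations randomise initialisation):

```python
# Minimal k-means sketch on 1-D data.
def kmeans_1d(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

points = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8]   # two obvious groups
print(kmeans_1d(points, centroids=[0.0, 5.0]))
```

The two centroids settle near 1.0 and 9.0, matching the two visible groups in the data.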

Decision tree algorithms classify and predict one or more discrete variables based on other variables in the data set. Example algorithms are C4.5 and C&R Tree.

Outlier detection algorithms detect outlying values in the data set. Example algorithms are Interquartile Range and Nearest Neighbour Outlier.
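The interquartile-range rule flags values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]. The standard library's statistics.quantiles supplies the quartiles, so the sketch needs only a few lines; the data and the 1.5 multiplier are the conventional illustrative choices.

```python
# Interquartile-range outlier detection using only the standard library.
from statistics import quantiles

def iqr_outliers(values):
    q1, _, q3 = quantiles(values, n=4)    # quartile cut points
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lo or v > hi]

data = [10, 12, 11, 13, 12, 11, 95]       # 95 is the planted outlier
print(iqr_outliers(data))                 # → [95]
```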

Neural network algorithms perform forecasting, classification, and statistical pattern recognition. Example algorithms are NNet Neural Network and MONMLP Neural Network.
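The simplest relative of these networks is a single artificial neuron, the perceptron, whose weights are nudged whenever a prediction is wrong. The sketch below learns the logical AND function; it illustrates the learning rule only and is not an implementation of the packaged algorithms named above.

```python
# A single perceptron learning logical AND via error-driven weight updates.
def train_perceptron(samples, epochs=20, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            # Threshold activation: fire if the weighted sum is positive
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - pred
            # Nudge weights toward the correct answer when wrong
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_samples)
predict = lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
print([predict(x1, x2) for (x1, x2), _ in and_samples])  # → [0, 0, 0, 1]
```

Multilayer networks such as MONMLP stack many such units and train them with backpropagation instead of this simple rule.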

Ensemble models are a form of Monte Carlo analysis in which multiple numerical predictions are made using slightly different initial conditions.

Factor analysis describes variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. An example algorithm is Maximum Likelihood.

Naive Bayes is a probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions between features.
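The "naive" assumption means class-conditional feature probabilities are simply multiplied together, as if the features were independent. A minimal sketch for binary features, with made-up spam/ham data and Laplace smoothing so unseen feature values never zero out the product:

```python
# Minimal naive Bayes sketch for binary features.
samples = [
    ((1, 1), "spam"), ((1, 0), "spam"), ((1, 1), "spam"),
    ((0, 0), "ham"), ((0, 1), "ham"), ((0, 0), "ham"),
]
labels = {lbl for _, lbl in samples}

def posterior_score(x, label):
    rows = [f for f, lbl in samples if lbl == label]
    score = len(rows) / len(samples)          # class prior
    for i, xi in enumerate(x):
        # Laplace smoothing: add 1 to the count, 2 to the denominator
        match = sum(1 for f in rows if f[i] == xi)
        score *= (match + 1) / (len(rows) + 2)
    return score

def classify(x):
    return max(labels, key=lambda lbl: posterior_score(x, lbl))

print(classify((1, 1)), classify((0, 0)))    # → spam ham
```

Despite the unrealistic independence assumption, this multiply-and-compare scheme is fast and often surprisingly accurate in practice.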

Support vector machines are supervised learning models with associated learning algorithms that analyse data and recognise patterns; they are used for classification and regression analysis.

Uplift modelling models the incremental impact of a treatment on an individual's behaviour.

Survival analysis is the analysis of time-to-event data.
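A classic survival estimate is the Kaplan-Meier curve: at each observed event time, the running survival probability is multiplied by (1 - events / subjects-at-risk), while censored subjects simply leave the risk set. The sketch below assumes distinct event times and uses illustrative data.

```python
# Kaplan-Meier survival curve sketch (one event per distinct time assumed).
# (time, event) pairs; event=1 means the event occurred, 0 means censored.
observations = [(2, 1), (3, 0), (4, 1), (5, 1), (6, 0)]

at_risk = len(observations)
survival = 1.0
curve = []
for time, event in sorted(observations):
    if event:
        # One event among `at_risk` subjects shrinks survival proportionally
        survival *= 1 - 1 / at_risk
        curve.append((time, survival))
    at_risk -= 1                # everyone leaves the risk set at their time
print(curve)
```

The censored subjects at times 3 and 6 still inform the estimate: they shrink the risk set without forcing the curve downward.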

Features in Predictive Modelling

1) Data analysis and manipulation: Tools to analyse data and to create, modify, combine, categorise, merge, and filter data sets.

2) Visualisation: Visualisation features include interactive graphics and reports.

3) Statistics: Statistical tools to establish and confirm relationships between variables in the data. Some solutions can also integrate statistics from other statistical software.

4) Hypothesis testing: Creation of models, evaluation, and selection of the right model.

So there we have the modelling element of the overall process; in the next post we will tackle model deployment and monitoring.
