Data mining is the process of running data through sophisticated algorithms to uncover meaningful patterns and correlations that might otherwise remain hidden. These patterns can help you understand the business better, and they can also be used to improve future performance through predictive analytics. For example, data mining can warn you when there is a high probability that a specific customer won’t pay on time, based on an analysis of customers with similar characteristics.

Uncover hidden patterns and correlations in data
Predict future events based on historical patterns

 

 

Support for the whole process of data mining
 
Preparation of input data
Statistical evaluation of learning schemes
Visualization of input data and the result of learning
Tools and algorithms
 
69 data pre-processing tools
116 classification/regression algorithms
11 clustering algorithms
18 attribute/subset evaluators + 12 search algorithms for feature selection
5 algorithms for finding association rules
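Feature-selection tools rank attributes by how much they tell you about the outcome you want to predict. The sketch below illustrates one common ranking criterion, information gain, in plain Python. It is not Weka's own code, and the attribute names and records are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Drop in class entropy obtained by splitting rows on one nominal attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical records: did the customer pay late?
rows = [
    {"segment": "retail", "prior_late": "yes", "late": "yes"},
    {"segment": "retail", "prior_late": "no", "late": "no"},
    {"segment": "wholesale", "prior_late": "yes", "late": "yes"},
    {"segment": "wholesale", "prior_late": "no", "late": "no"},
]

# Rank attributes from most to least informative about "late".
ranking = sorted(["segment", "prior_late"],
                 key=lambda a: information_gain(rows, a, "late"), reverse=True)
print(ranking)  # ['prior_late', 'segment']
```

In this toy data "prior_late" separates the classes perfectly (gain of 1 bit), while "segment" tells you nothing (gain of 0).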
User Interfaces
 
Explorer - data exploration/visualization, model construction and export, preliminary evaluation
Experimenter - large-scale algorithm comparison with statistical tests for significant differences in performance
KnowledgeFlow - process-model view of data mining, export of the data mining process
 
Pentaho Data Mining Feature Details
Powerful Data Mining Engine
 
Provides a comprehensive set of machine learning algorithms from the Weka project including clustering, segmentation, decision trees, random forests, neural networks, and principal component analysis.
Pentaho has added integration with Pentaho Data Integration and has automated the process of transforming data into the format the data mining engine needs.
Algorithms can either be applied directly to a dataset or called from Java code.
Output can be viewed graphically, interacted with programmatically, or used as a data source for reports, further analysis, and other processes.
Filters are provided for discretization, normalization, re-sampling, attribute selection, and transforming and combining attributes.
Classifiers provide models for predicting nominal or numeric quantities. Learning schemes include decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayesian networks, and other advanced techniques.
The data mining engine is also well-suited for developing new machine learning schemes, enabling customers to incorporate their own models.
Inputs and outputs can be controlled programmatically, enabling developers to create completely custom solutions using the components provided.
Support for Predictive Model Markup Language (PMML)
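To illustrate how a normalization filter and an instance-based classifier work together, here is a minimal plain-Python sketch. It is not Weka's Java classes. It applies min-max normalization and then a k-nearest-neighbours estimate of the chance that a new customer pays late, taken from the most similar past customers. The two features and the records are hypothetical.

```python
from math import dist

def min_max_normalize(rows):
    """Rescale each numeric column to [0, 1] using that column's min and max."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def knn_late_probability(past, late_flags, new_customer, k=3):
    """Share of the k most similar past customers (Euclidean distance) who paid late."""
    # Normalize past and new customers together so every feature is on the same scale.
    scaled = min_max_normalize(past + [new_customer])
    query, history = scaled[-1], scaled[:-1]
    nearest = sorted(zip(history, late_flags), key=lambda p: dist(p[0], query))[:k]
    return sum(flag for _, flag in nearest) / k

# Hypothetical history: [orders per year, average days to pay]; 1 = paid late.
past = [[10, 45], [12, 50], [40, 10], [38, 12], [11, 48], [42, 8]]
late = [1, 1, 0, 0, 1, 0]
print(knn_late_probability(past, late, [13, 44]))  # 1.0
```

Normalizing first keeps the feature with the larger numeric range from dominating the distance calculation.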
Graphical Design Tools
 
Graphical user interfaces are provided for data pre-processing, classification, regression, clustering, association rules, and visualization.
KnowledgeFlow shows you the flow of data through the system and the processes that it goes through.
Uncover Hidden Patterns and Relationships
 

A classic example of data mining is a retailer who uncovers a relationship between sales of diapers and beer on Sunday afternoons – two items you wouldn’t normally consider as linked. The explanation is that husbands who are sent out to pick up a fresh supply of diapers are also likely to pick up some beer while they happen to be in the store – something that hadn’t been recognized as a significant sales driver before data mining uncovered it.
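Association-rule algorithms such as Apriori (one of those included in Weka) score candidate rules with measures such as support, confidence and lift. The plain-Python sketch below does not search for rules the way Apriori does. It only scores one given rule, diapers -> beer, on a handful of hypothetical shopping baskets.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    cons = sum(1 for t in transactions if consequent <= t)
    if ante == 0 or cons == 0:
        raise ValueError("antecedent or consequent never occurs")
    support = both / n                  # share of all baskets containing both
    confidence = both / ante            # share of diaper baskets that also contain beer
    lift = confidence / (cons / n)      # > 1 means beer is likelier when diapers are bought
    return support, confidence, lift

# Hypothetical baskets.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
]
support, confidence, lift = rule_metrics(baskets, {"diapers"}, {"beer"})
print(support, confidence, round(lift, 3))  # 0.5 0.75 1.125
```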

 
Exploit Insights to Improve Performance
 

Continuing the example above, retailers often act on the relationships they discover with tactics such as placing linked items together on end-of-aisle displays to spur additional purchases. All organizations can benefit from acting in a similar way: using newly discovered patterns and correlations as the basis for action to improve their efficiency and effectiveness.

Predict Future Performance
 

“Those who do not learn from history are doomed to repeat it” is a popular paraphrase of a famous line by philosopher George Santayana. In data mining, being able to predict outcomes from historical data can dramatically improve the quality and outcomes of decision making in the present. As a simple example, if the best indicator of whether a customer will pay on time turns out to be a combination of their market segment and whether they have paid previous bills on time, you can put that information to use when making current credit decisions.
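If the best indicators really are market segment and previous payment behaviour, the simplest way to use them is a lookup table of historical late-payment rates for each combination. Here is a minimal sketch with hypothetical records. A real model would also need to handle combinations with too few records to trust.

```python
from collections import defaultdict

def late_rate_by_group(history):
    """Observed late-payment rate for each (segment, paid_previous_on_time) pair."""
    counts = defaultdict(lambda: [0, 0])  # group -> [late payments, total invoices]
    for segment, paid_before, paid_late in history:
        group = (segment, paid_before)
        counts[group][0] += paid_late
        counts[group][1] += 1
    return {group: late / total for group, (late, total) in counts.items()}

# Hypothetical invoices: (segment, paid previous bills on time?, paid this one late? 1/0)
history = [
    ("retail", True, 0), ("retail", True, 0), ("retail", False, 1),
    ("retail", False, 1), ("wholesale", True, 0), ("wholesale", False, 1),
    ("wholesale", False, 0), ("wholesale", True, 1),
]
rates = late_rate_by_group(history)
print(rates[("retail", False)])  # 1.0
```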

Embed Insights into Your Applications
 

You can use data mining results to display a simple summary statement and recommendations within operational applications. For example, on a credit screen you could add: “Based on this new account profile there is an 85% chance this customer will pay late. It is therefore recommended you require a 50% prepayment on this order”. Reporting on aggregate results such as Days Sales Outstanding (DSO), broken down by whether or not the recommendations were followed, lets you measure the business improvement they deliver, so that you can fine-tune your model and recommendations over time.
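Below is a minimal sketch of both ideas. The first function builds the on-screen statement from a predicted probability. The second computes DSO with the standard formula: accounts receivable divided by credit sales for the period, multiplied by the number of days in the period. The 80% threshold, the 50% prepayment and the 90-day period are hypothetical policy choices, and so are the receivables figures.

```python
def credit_recommendation(p_late, threshold=0.8, prepayment=0.5):
    """On-screen message for a new account; threshold and prepayment are hypothetical policy values."""
    message = (f"Based on this new account profile there is an estimated {p_late:.0%} "
               "chance this customer will pay late.")
    if p_late >= threshold:
        message += f" It is therefore recommended you require a {prepayment:.0%} prepayment on this order."
    return message

def days_sales_outstanding(receivables, credit_sales, days=90):
    """DSO = accounts receivable / credit sales in the period * days in the period."""
    return receivables / credit_sales * days

print(credit_recommendation(0.85))

# Hypothetical quarter: orders where the recommendation was followed vs. ignored.
dso_followed = days_sales_outstanding(150_000, 300_000)
dso_ignored = days_sales_outstanding(240_000, 300_000)
print(dso_followed, dso_ignored)
```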

Wide Range of Algorithms
 

No algorithm is likely to be optimal in all situations, so it’s important to be able to try out a range of algorithms and find the one that best fits a particular set of data. If several data mining algorithms fit well, you can use all of them, for example: “Based on analysis of 3 predictive models, the chances this customer will pay late are: Model A: 95% (96% correct), Model B: 89% (92% correct), Model C: 76% (97% correct)”.
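The multi-model statement above can be produced from each model's prediction and its historical accuracy. This is a minimal sketch using the figures from the example. The helper that picks the most accurate model is an illustrative addition for choosing the best-fitting algorithm.

```python
def multi_model_summary(results):
    """Format (model name, chance of late payment, historical accuracy) tuples as one statement."""
    parts = [f"{name}: {p_late:.0%} ({accuracy:.0%} correct)" for name, p_late, accuracy in results]
    return (f"Based on analysis of {len(results)} predictive models, "
            "the chances this customer will pay late are: " + ", ".join(parts))

def most_accurate(results):
    """Name of the model with the highest historical accuracy."""
    return max(results, key=lambda r: r[2])[0]

results = [("Model A", 0.95, 0.96), ("Model B", 0.89, 0.92), ("Model C", 0.76, 0.97)]
summary = multi_model_summary(results)
print(summary)
print(most_accurate(results))  # Model C
```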

 
  Knowledge Explorer lets you explore your data and prepare it for data mining.
How Data Mining Works
  Choosing a Model
Analysts can work with a range of models graphically. These include many advanced forms of data mining such as clustering, segmentation, decision trees, random forests, neural nets, and principal component analysis.
  Adding Data
Analysts can enrich the data with derived, value-added features. For example, you can specify thresholds and have the system automatically “bucket” values or derive new columns for analysis.
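Bucketing by thresholds is a form of discretization. Weka provides discretization filters that can choose cut points automatically. The plain-Python sketch below uses hand-picked thresholds instead, and the column names are hypothetical.

```python
from bisect import bisect_left

def add_bucket_column(rows, column, thresholds, labels, new_column):
    """Derive a categorical column; each threshold is the inclusive upper bound of a bucket."""
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    for row in rows:
        row[new_column] = labels[bisect_left(thresholds, row[column])]
    return rows

# Hypothetical invoices bucketed into payment bands.
invoices = [{"days_to_pay": 12}, {"days_to_pay": 30}, {"days_to_pay": 45}, {"days_to_pay": 90}]
add_bucket_column(invoices, "days_to_pay", [30, 60],
                  ["up to 30", "31 to 60", "over 60"], "payment_band")
print([r["payment_band"] for r in invoices])
# ['up to 30', 'up to 30', '31 to 60', 'over 60']
```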
  Adapting
Each model adapts its parameters to fit the sample data as closely as possible. Analysts can let this happen automatically or adjust parameters manually (depending on the model).
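The smallest possible example of automatic adaptation is a one-level decision tree (a decision stump) with a single parameter, the cut point. The sketch below tries every candidate cut point and keeps the one with the fewest errors on the sample data. Real learners use more elaborate fitting criteria, and the data is hypothetical.

```python
def fit_threshold(values, labels):
    """Pick the cut point that best separates label 1 (value above cut) from label 0."""
    best_cut, best_errors = None, len(values) + 1
    for cut in sorted(set(values)):
        # Predict "late" when the value is above the cut; count disagreements with labels.
        errors = sum((v > cut) != bool(y) for v, y in zip(values, labels))
        if errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut, best_errors

# Hypothetical: average days late on previous bills vs. whether the next bill was late.
days_late = [0, 2, 3, 10, 15, 20, 1, 12]
next_late = [0, 0, 0, 1, 1, 1, 0, 1]
print(fit_threshold(days_late, next_late))  # (3, 0)
```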
  Evaluating
Results can be evaluated by applying the model to historical data, ideally records that were not used to train it, and comparing its predictions with the actual outcomes to measure its predictive power.
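In its simplest form, this evaluation compares the model's predictions with what actually happened on held-out records and reports accuracy plus a breakdown of the errors. Here is a minimal sketch. The "trained" rule and the records are hypothetical.

```python
def evaluate(predict, holdout):
    """Accuracy and confusion counts of `predict` on held-out (features, actual) records."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for features, actual in holdout:
        predicted = predict(features)
        if predicted and actual:
            counts["tp"] += 1
        elif predicted and not actual:
            counts["fp"] += 1
        elif not predicted and not actual:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    accuracy = (counts["tp"] + counts["tn"]) / len(holdout)
    return accuracy, counts

def flags_late_payer(avg_days_late):
    """Hypothetical trained rule: flag customers who were over 3 days late on average."""
    return avg_days_late > 3

# Hypothetical historical records not used in training: (avg days late, actually paid late?)
holdout = [(0, False), (5, True), (8, False), (1, False), (14, True)]
accuracy, counts = evaluate(flags_late_payer, holdout)
print(accuracy, counts)  # 0.8 {'tp': 2, 'fp': 1, 'tn': 2, 'fn': 0}
```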
  Perfecting
The cycle of adapting and evaluating the model until it is optimized is known as “training” the model. Once properly trained, the model should yield reliable results for the specific business purpose it is being applied to.
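One way to picture the training cycle is tuning a single setting: here, the number of neighbours k in a k-nearest-neighbours classifier. Each candidate is evaluated with leave-one-out cross-validation and the best one is kept. Weka has its own cross-validation support, which this plain-Python sketch does not reproduce. The data is hypothetical and deliberately contains two noisy points.

```python
def knn_predict(xs, ys, query, k):
    """Majority vote of the k training points closest to `query` (1-D distance)."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - query))[:k]
    return int(sum(y for _, y in nearest) > k / 2)

def leave_one_out_accuracy(xs, ys, k):
    """Hold out each point in turn, predict it from the rest, and count the hits."""
    correct = 0
    for i in range(len(xs)):
        train_x, train_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        correct += knn_predict(train_x, train_y, xs[i], k) == ys[i]
    return correct / len(xs)

def tune_k(xs, ys, candidates=(1, 3, 5)):
    """Evaluate every candidate k and keep the best (the smallest k wins ties)."""
    scores = {k: leave_one_out_accuracy(xs, ys, k) for k in candidates}
    return max(scores, key=scores.get), scores

# Hypothetical: average days late (xs) and whether the next bill was late (ys).
xs = [1, 2, 4, 5, 7, 20, 22, 25, 26, 30]
ys = [0, 0, 0, 1, 0, 1, 1, 1, 0, 1]
best_k, scores = tune_k(xs, ys)
print(best_k, scores)  # 3 {1: 0.4, 3: 0.8, 5: 0.8}
```

With k = 1 each noisy point misleads its neighbours. Voting over 3 neighbours smooths that out.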
  Delivering
Output can be in a multitude of forms. For example, you might choose to include a simple statement within another application, or output a graphical decision tree that users can navigate.
   
Copyright © 2010 4Sight Business Intelligence, Inc. All Rights Reserved.