Modeling Engine: CART (Decision Trees)
Modeling Engine: MARS (Nonlinear Regression)
Modeling Engine: TreeNet (Stochastic Gradient Boosting)
Modeling Engine: RandomForests for Classification
Reporting ROC curves during model building and model scoring
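As an illustration of what an ROC report computes, here is a minimal pure-Python sketch (not SPM's implementation; the function names are hypothetical). It sweeps a classification threshold down the sorted scores and accumulates true/false positive rates:

```python
def roc_points(scores, labels):
    """Sweep a threshold down the sorted scores, emitting one
    (false positive rate, true positive rate) point per record.
    Assumes distinct scores; tied scores would need grouping."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in pairs:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Perfect ranking: every positive scored above every negative.
pts = roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
```

A perfectly ranked score list like the one above traces the upper-left corner of the unit square, giving an area of 1.0.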
Model performance stats based on Cross Validation
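The partitioning step behind cross-validated performance stats can be sketched as follows (an illustrative helper, not SPM's internal fold assignment):

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle record indices 0..n-1 and deal them into k disjoint
    folds; each fold serves once as the test partition."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = kfold_indices(10, 3)
```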
Model performance stats based on out-of-bag data during bootstrapping
Reporting performance summaries on learn and test data partitions
Reporting Gains and Lift Charts during model building and model scoring
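The lift statistic behind these charts can be sketched in a few lines of Python (an illustration of the standard decile-lift computation, not SPM's report code):

```python
def lift_by_bin(scores, labels, n_bins=10):
    """Sort records by descending score, cut into n_bins groups, and
    report each group's lift: its response rate over the overall rate."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    overall = sum(labels) / len(labels)
    size = len(order) // n_bins
    lifts = []
    for b in range(n_bins):
        idx = order[b * size:(b + 1) * size] if b < n_bins - 1 else order[b * size:]
        rate = sum(labels[i] for i in idx) / len(idx)
        lifts.append(rate / overall)
    return lifts
```

With all responders concentrated in the top-scored half, the top bin shows a lift of 2.0 and the bottom bin 0.0.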
Automatic creation of Command Logs
Built-in support to create, edit, and execute command files
Translating models into SAS-compatible language
Reading and writing datasets in all current database/statistical file formats, including the CSV file format
Option to save processed datasets into all current database/statistical file formats
Select Cases in Score Setup
TreeNet Scoring Offset in Score Setup
Setting of focus class supported for all categorical variables
Scalable limits on terminal nodes: a special mode that ensures the ATOM and/or MINCHILD minimum node-size settings scale with the size of the learn sample
Descriptive Statistics: Summary Stats, Stratified Stats, Charts and Histograms
Activity Window: Brief data description, quick navigation to most common activities
Additional Modeling Engines: Regularized Regression (LASSO/Ridge/LARS/Elastic Net/GPS)
Data Binning Analysis Engine
Automatic creation of missing value indicators
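A minimal sketch of what missing value indicator creation does (illustrative only; the `_MIS` column suffix here is an assumption, not necessarily SPM's naming):

```python
def add_missing_indicators(rows, columns):
    """For each listed column, add a 0/1 column `<name>_MIS` flagging
    records where the value is None (missing). `_MIS` is a made-up suffix."""
    out = []
    for row in rows:
        new = dict(row)
        for col in columns:
            new[col + "_MIS"] = 1 if row.get(col) is None else 0
        out.append(new)
    return out

flagged = add_missing_indicators([{"AGE": None, "INCOME": 40}], ["AGE", "INCOME"])
```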
Option to treat missing values in a categorical predictor as a new level
Licensing for data capacity at any level supported by RAM (currently 32 MB to 1 TB)
License for multi-core capabilities
Using built-in BASIC Programming Language during data preparation
Automatic creation of lag variables based on user specifications during data preparation
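Lag variable creation can be sketched as follows (a generic illustration, not SPM's data-preparation code): each lag-k column is the original series shifted down by k positions, with the first k values missing.

```python
def add_lags(series, lags):
    """Return a dict mapping lag k -> a copy of `series` shifted by k
    positions; the first k entries are None (no history available)."""
    return {k: [None] * k + series[:len(series) - k] for k in lags}

lagged = add_lags([1, 2, 3, 4], [1, 2])
```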
Automatic creation and reporting of key overall and stratified summary statistics for a user-supplied list of variables
Display charts, histograms, and scatter plots for user-selected variables
Command Line GUI Assistant to simplify creating and editing command files
Translating models into SAS/PMML/C/Java/Classic and ability to create classic and specialized reports for existing models
Unsupervised Learning - Breiman's column scrambler
Scoring any Automate (pre-packaged scenario of runs) as an ensemble model
Summary statistics based on missing value imputation using the scoring mechanism
Impute options in Score Setup
GUI support for SCORE PARTITIONS (SCORE PARTITIONS=YES)
Quick Impute Analysis Engine: One-step statistical and model based imputation
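The statistical half of one-step imputation amounts to filling missing entries with a column summary such as the mean (a hedged sketch; the model-based fills mentioned above are not shown):

```python
def simple_impute(values):
    """Fill missing (None) entries of a numeric column with the
    mean of the non-missing values."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]
```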
Advanced Imputation via Automate TARGET. Control over fill selection and new impute variable creation
Computation of more than 10 different types of correlation
Save OOB predictions from cross-validation models
Custom selection of a new predictors list from an existing variable importance report
User defined bins for Cross Validation
Cross-Validation models can now be scored as an Ensemble
An alternative to variable importance based on Leo Breiman's scrambler
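Breiman's scrambler measures a predictor's importance by permuting that one column and recording how much the model's score drops. A minimal Python sketch of the idea (illustrative; not SPM's implementation, and the helper names are hypothetical):

```python
import random

def permutation_importance(predict, X, y, metric, seed=0):
    """Breiman-style importance: scramble one predictor column at a
    time and record the drop in the model's score on scrambled data."""
    rng = random.Random(seed)
    base = metric(predict(X), y)
    importances = {}
    for j in range(len(X[0])):
        shuffled = [row[j] for row in X]
        rng.shuffle(shuffled)
        Xp = [row[:j] + [shuffled[i]] + row[j + 1:] for i, row in enumerate(X)]
        importances[j] = base - metric(predict(Xp), y)
    return importances

# Toy model that only ever looks at column 0, so scrambling the
# unused column 1 should cost nothing.
X = [[i % 2, 7] for i in range(20)]
y = [i % 2 for i in range(20)]
predict = lambda rows: [r[0] for r in rows]
accuracy = lambda p, t: sum(a == b for a, b in zip(p, t)) / len(t)
imp = permutation_importance(predict, X, y, accuracy)
```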
Data Binning Results display (GUI feature)
Data Binning Analysis Engine bins variables using model-based binning (via AUTOMATE BIN), or using weights of evidence coding
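Weights-of-evidence coding replaces each level of a categorical variable with the log ratio of its share of positives to its share of negatives. A sketch of the standard computation (illustrative; the 0.5 floor for empty cells is a common smoothing choice, not necessarily SPM's):

```python
import math

def woe_coding(values, labels):
    """Map each level to log((share of positives) / (share of negatives))."""
    pos_total = sum(labels)
    neg_total = len(labels) - pos_total
    woe = {}
    for level in set(values):
        pos = sum(1 for v, y in zip(values, labels) if v == level and y == 1)
        neg = sum(1 for v, y in zip(values, labels) if v == level and y == 0)
        # A small floor avoids log(0) for pure levels (a smoothing assumption).
        p = max(pos, 0.5) / pos_total
        q = max(neg, 0.5) / neg_total
        woe[level] = math.log(p / q)
    return woe

codes = woe_coding(["a", "a", "b", "b"], [1, 1, 0, 0])
```

Levels dominated by positives get positive codes, levels dominated by negatives get negative codes.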
BIN ROUND, ADAPTIVEROUND methods (BIN METHOD=ROUND/ADAPTIVEROUND)
Controls for number of Bins and Deciles (BOPTIONS NBINS, NDECILES)
EVAL command and GUI display (GUI feature)
Summary stats for the correlations (Correlation Stats tab) (GUI feature)
TONUMERIC: create contiguous integer variables from other variables
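The effect of such recoding can be sketched as mapping each distinct value to a contiguous integer code (an illustration; the sorted-order code assignment here is an assumption, not necessarily TONUMERIC's ordering):

```python
def to_numeric(values):
    """Map each distinct value to a contiguous integer code 0, 1, 2, ...
    Codes are assigned in sorted order of the distinct values."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]
```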
Automation: Build two models reversing the roles of the learn and test samples (Automate FLIP)
Automation: Explore model stability by repeated random drawing of the learn sample from the original dataset (Automate DRAW)
Automation: For time series applications, build models based on sliding time window using a large array of user options (Automate DATASHIFT)
Automation: Explore mutual multivariate dependencies among available predictors (Automate TARGET)
Automated imputation of all missing values (via Automate TARGET)
Automation: Explore the effects of the learn sample size on the model performance (Automate LEARN CURVE)
Automation: Build a series of models by varying the random number seed (Automate SEED)
Automation: Explore the marginal contribution of each predictor to the existing model (Automate LOVO)
Automation: Explore model stability by repeated repartitioning of the data into learn, test, and possibly hold-out samples (Automate PARTITION)
Automation: Explore the nonlinear univariate relationships between the target and each available predictor (Automate ONEOFF)
Automation: Bootstrapping process (sampling with replacement from the learn sample) with a large array of user options (Random Forests-style sampling of predictors, saving in-bag and out-of-bag scores, proximity matrix, and node dummies) (Automate BOOTSTRAP)
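The core sampling step of a bootstrap cycle can be sketched as follows (a generic illustration of sampling with replacement, not Automate BOOTSTRAP itself); on average about 36.8% of records land out-of-bag:

```python
import random

def bootstrap_sample(n, seed=0):
    """Draw a bootstrap sample (with replacement) of record indices
    and return (in_bag, out_of_bag) index lists."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob

in_bag, oob = bootstrap_sample(100, seed=1)
```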
Automation: AUTOMATE ENABLETIMING=YES|NO to control timing reporting in Automates
Save out-of-bag predictions during Cross Validation
Use TREATMENT variables when scoring uplift models (SCORE EVAL)
Use TREATMENT variables when evaluating uplift model predictions (EVAL)
Automation: Shifts the crossover point between the learn and test samples with each cycle of the Automate (Automate LTCROSSOVER)
Automation: Build a series of models using different backward variable selection strategies (Automate SHAVING)
Automation: Build a series of models using the forward-stepwise variable selection strategy (Automate STEPWISE)
Automation: Explore nonlinear univariate relationships between each available predictor and the target (Automate XONY)
Automation: Build a series of models using randomly sampled predictors (Automate KEEP)
Automation: Explore the impact of a potential replacement of a given predictor by another one (Automate SWAP)
Automation: Parametric bootstrap process (Automate PBOOT)
Automation: Build a series of models for each strata defined in the dataset (Automate STRATA)
Automation: Generate detailed univariate stats on every continuous predictor to spot potential outliers and problematic records (Automate OUTLIERS)
Automation: Convert (bin) all continuous variables into categorical (discrete) versions using a large array of user options (equal width, weights of evidence, Naïve Bayes, supervised) (Automate BIN)
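Of the binning options above, equal-width discretization is the simplest to sketch (an illustration under the assumption of a fixed min-max range; not Automate BIN's code):

```python
def equal_width_bins(values, n_bins):
    """Discretize a continuous variable into n_bins equal-width
    intervals, returning a 0-based bin index per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```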
Automation: Build a series of models using every available data mining engine (Automate MODELS)
Automation: Run TreeNet for Predictor selection, Auto-bin predictors, then build a series of models using every available data mining engine (Automate GLM)
Modeling Pipelines: RuleLearner, ISLE