SPM General Components

  • Modeling Engine: CART (Decision Trees)
  • Modeling Engine: MARS (Nonlinear Regression)
  • Modeling Engine: TreeNet (Stochastic Gradient Boosting)
  • Modeling Engine: RandomForests for Classification
  • Reporting ROC curves during model building and model scoring
  • Model performance stats based on Cross Validation
  • Model performance stats based on out-of-bag data during bootstrapping
  • Reporting performance summaries on learn and test data partitions
  • Reporting Gains and Lift Charts during model building and model scoring
  • Automatic creation of Command Logs
  • Built-in support to create, edit, and execute command files
  • Translating models into SAS-compatible language
  • Reading and writing datasets in all current database/statistical file formats, including CSV
  • Option to save processed datasets into all current database/statistical file formats
  • Select Cases in Score Setup
  • TreeNet Scoring Offset in Score Setup
  • Setting of focus class supported for all categorical variables
  • Scalable limits on terminal nodes. This is a special mode that will ensure the ATOM and/or MINCHILD
  • Descriptive Statistics: Summary Stats, Stratified Stats, Charts and Histograms
  • Activity Window: Brief data description, quick navigation to most common activities
  • Additional Modeling Engines: Regularized Regression (LASSO/Ridge/LARS/Elastic Net/GPS)
  • Data analysis Binning Engine
  • Automatic creation of missing value indicators
  • Option to treat missing value in a categorical predictor as a new level
  • License to any level supported by RAM (currently 32MB to 1TB)
  • License for multi-core capabilities
  • Using built-in BASIC Programming Language during data preparation
  • Automatic creation of lag variables based on user specifications during data preparation
  • Automatic creation and reporting of key overall and stratified summary statistics for user supplied list of variables
  • Display charts, histograms, and scatter plots for user selected variables
  • Command Line GUI Assistant to simplify creating and editing command files
  • Translating models into SAS/PMML/C/Java/Classic and ability to create classic and specialized reports for existing models
  • Unsupervised Learning - Breiman's column scrambler
  • Scoring any Automate (pre-packaged scenario of runs) as an ensemble model
  • Summary statistics based on missing value imputation using scoring mechanism
  • Impute options in Score Setup
  • GUI support of SCORE PARTITIONS (GUI feature, SCORE PARTITIONS=YES)
  • Quick Impute Analysis Engine: One-step statistical and model based imputation
  • Advanced Imputation via Automate TARGET. Control over fill selection and new impute variable creation
  • Computation of over 10 different types of correlation
  • Save OOB predictions from cross-validation models
  • Custom selection of a new predictors list from an existing variable importance report
  • User defined bins for Cross Validation
  • Cross-Validation models can now be scored as an Ensemble
  • An alternative to variable importance based on Leo Breiman's scrambler
  • Data Binning Results display (GUI feature)
  • Data Binning Analysis Engine bins variables using model-based binning (via AUTOMATE BIN), or using weights of evidence coding
  • BIN ROUND, ADAPTIVEROUND methods (BIN METHOD=ROUND/ADAPTIVEROUND)
  • Controls for number of Bins and Deciles (BOPTIONS NBINS, NDECILES)
  • EVAL command and GUI display (GUI feature)
  • Summary stats for the correlations (Correlation Stats tab) (GUI feature)
  • TONUMERIC: create contiguous integer variables from other variables
  • Automation: Build two models reversing the roles of the learn and test samples (Automate FLIP)
  • Automation: Explore model stability by repeated random drawing of the learn sample from the original dataset (Automate DRAW)
  • Automation: For time series applications, build models based on sliding time window using a large array of user options (Automate DATASHIFT)
  • Automation: Explore mutual multivariate dependencies among available predictors (Automate TARGET)
  • Automated imputation of all missing values (via Automate TARGET)
  • Automation: Explore the effects of the learn sample size on the model performance (Automate LEARN CURVE)
  • Automation: Build a series of models by varying the random number seed (Automate SEED)
  • Automation: Explore the marginal contribution of each predictor to the existing model (Automate LOVO)
  • Automation: Explore model stability by repeated repartitioning of the data into learn, test, and possibly hold-out samples (Automate PARTITION)
  • Automation: Explore the nonlinear univariate relationships between the target and each available predictor (Automate ONEOFF)
  • Automation: Bootstrapping process (sampling with replacement from the learn sample) with a large array of user options (Random Forests-style sampling of predictors, saving in-bag and out-of-bag scores, proximity matrix, and node dummies) (Automate BOOTSTRAP)
  • Automation: AUTOMATE ENABLETIMING=YES|NO to control timing reporting in Automates
  • Save out of bag predictions during Cross Validation
  • Use TREATMENT variables when scoring uplift models (SCORE EVAL)
  • Use TREATMENT variables when evaluating uplift model predictions (EVAL)
  • Automation: "Shifts" the "crossover point" between learn and test samples with each cycle of the Automate (Automate LTCROSSOVER)
  • Automation: Build a series of models using different backward variable selection strategies (Automate SHAVING)
  • Automation: Build a series of models using the forward-stepwise variable selection strategy (Automate STEPWISE)
  • Automation: Explore nonlinear univariate relationships between each available predictor and the target (Automate XONY)
  • Automation: Build a series of models using randomly sampled predictors (Automate KEEP)
  • Automation: Explore the impact of a potential replacement of a given predictor by another one (Automate SWAP)
  • Automation: Parametric bootstrap process (Automate PBOOT)
  • Automation: Build a series of models for each stratum defined in the dataset (Automate STRATA)
  • Automation: Generate detailed univariate stats on every continuous predictor to spot potential outliers and problematic records (AUTOMATE OUTLIERS)
  • Automation: Convert (bin) all continuous variables into categorical (discrete) versions using a large array of user options (equal width, weights of evidence, Naïve Bayes, supervised) (AUTOMATE BIN)
  • Automation: Build a series of models using every available data mining engine (Automate MODELS)
  • Automation: Run TreeNet for Predictor selection, Auto-bin predictors, then build a series of models using every available data mining engine (Automate GLM)
  • Modeling Pipelines: RuleLearner, ISLE
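
The weights of evidence coding listed above as a binning option can be illustrated outside SPM. The following is a minimal Python sketch and not SPM's implementation: the function name weights_of_evidence, the smoothing parameter, and the sign convention (log of event share over non-event share) are assumptions made for illustration only.

```python
import math
from collections import Counter


def weights_of_evidence(bins, targets, smoothing=0.5):
    """Return {bin_label: WoE} for a binary target (1 = event, 0 = non-event).

    WoE(bin) = ln(share of all events in bin / share of all non-events in bin).
    `smoothing` is a hypothetical additive adjustment that keeps the log
    finite when a bin has no events or no non-events.
    """
    events = Counter()
    non_events = Counter()
    for label, y in zip(bins, targets):
        if y == 1:
            events[label] += 1
        else:
            non_events[label] += 1

    labels = set(events) | set(non_events)
    # Smoothed totals, so the per-bin shares still sum to 1.
    total_events = sum(events.values()) + smoothing * len(labels)
    total_non_events = sum(non_events.values()) + smoothing * len(labels)

    woe = {}
    for label in labels:
        p_event = (events[label] + smoothing) / total_events
        p_non_event = (non_events[label] + smoothing) / total_non_events
        woe[label] = math.log(p_event / p_non_event)
    return woe


if __name__ == "__main__":
    # Illustrative data only: bin labels for one predictor and a 0/1 target.
    example_bins = ["low", "low", "low", "high", "high", "high"]
    example_target = [1, 1, 0, 1, 0, 0]
    print(weights_of_evidence(example_bins, example_target))
```

Under this convention, a positive value marks a bin where events are over-represented relative to non-events, and a negative value marks the opposite. Other tools may use the reverse sign.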