/AWS1/CL_SGMTABULARJOBCONFIG
The collection of settings used by an AutoML job V2 for the tabular problem type.
CONSTRUCTOR
IMPORTING
Required arguments:
iv_targetattributename TYPE /AWS1/SGMTARGETATTRIBUTENAME
The name of the target variable in supervised learning, usually represented by 'y'.
Optional arguments:
io_candidategenerationconfig TYPE REF TO /AWS1/CL_SGMCANDIDATEGENERAT00
The configuration information of how model candidates are generated.
io_completioncriteria TYPE REF TO /AWS1/CL_SGMAUTOMLJOBCOMPLET00
How long a job is allowed to run, or how many candidates a job is allowed to generate.
iv_featurespecifications3uri TYPE /AWS1/SGMS3URI
A URL to the Amazon S3 data source containing selected features from the input data source to run an Autopilot job V2. You can input FeatureAttributeNames (optional) in JSON format as shown below:
{ "FeatureAttributeNames":["col1", "col2", ...] }
You can also specify the data type of the feature (optional) in the format shown below:
{ "FeatureDataTypes":{"col1":"numeric", "col2":"categorical" ... } }
These column keys may not include the target column.
In ensembling mode, Autopilot only supports the following data types: numeric, categorical, text, and datetime. In HPO mode, Autopilot can support numeric, categorical, text, datetime, and sequence.
If only FeatureDataTypes is provided, the column keys (col1, col2, ..) should be a subset of the column names in the input data.
If both FeatureDataTypes and FeatureAttributeNames are provided, then the column keys should be a subset of the column names provided in FeatureAttributeNames.
The key name FeatureAttributeNames is fixed. The values listed in ["col1", "col2", ...] are case sensitive and should be a list of strings containing unique values that are a subset of the column names in the input data. The list of columns provided must not include the target column.
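The rules above for the feature specification file can be checked locally before the file is uploaded to Amazon S3. The following is a minimal Python sketch, not part of the SDK: the function name `build_feature_spec` and its parameters are hypothetical, and the checks only mirror the constraints stated in this description (unique names, subset of the input columns, no target column, data types allowed per mode).

```python
import json

# Data types named in the description above, per training mode.
ENSEMBLING_TYPES = {"numeric", "categorical", "text", "datetime"}
HPO_TYPES = ENSEMBLING_TYPES | {"sequence"}


def build_feature_spec(input_columns, target, attribute_names=None,
                       data_types=None, mode="ENSEMBLING"):
    """Return the feature specification as a JSON string (hypothetical helper).

    Raises ValueError when one of the documented constraints is violated.
    """
    if mode == "ENSEMBLING":
        allowed_types = ENSEMBLING_TYPES
    elif mode == "HYPERPARAMETER_TUNING":
        allowed_types = HPO_TYPES
    else:
        raise ValueError(f"unsupported mode for this sketch: {mode}")

    spec = {}
    if attribute_names is not None:
        if len(set(attribute_names)) != len(attribute_names):
            raise ValueError("FeatureAttributeNames must contain unique values")
        if target in attribute_names:
            raise ValueError("FeatureAttributeNames must not include the target column")
        unknown = set(attribute_names) - set(input_columns)
        if unknown:
            raise ValueError(f"columns not in the input data: {sorted(unknown)}")
        spec["FeatureAttributeNames"] = list(attribute_names)

    if data_types is not None:
        if target in data_types:
            raise ValueError("FeatureDataTypes must not include the target column")
        # Keys must be a subset of FeatureAttributeNames when both are given,
        # otherwise a subset of the input data columns.
        universe = set(attribute_names) if attribute_names is not None else set(input_columns)
        unknown = set(data_types) - universe
        if unknown:
            raise ValueError(f"FeatureDataTypes keys not allowed: {sorted(unknown)}")
        bad = sorted(c for c, t in data_types.items() if t not in allowed_types)
        if bad:
            raise ValueError(f"data type not supported in {mode} mode for: {bad}")
        spec["FeatureDataTypes"] = dict(data_types)

    return json.dumps(spec)
```

The returned string is what would be stored in the S3 object that iv_featurespecifications3uri points to; column names are compared case sensitively, as the description requires.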
iv_mode TYPE /AWS1/SGMAUTOMLMODE
The method that Autopilot uses to train the data. You can either specify the mode manually or let Autopilot choose for you based on the dataset size by selecting AUTO. In AUTO mode, Autopilot chooses ENSEMBLING for datasets smaller than 100 MB, and HYPERPARAMETER_TUNING for larger ones.
The ENSEMBLING mode uses a multi-stack ensemble model to predict classification and regression tasks directly from your dataset. This machine learning mode combines several base models to produce an optimal predictive model. It then uses a stacking ensemble method to combine predictions from contributing members. A multi-stack ensemble model can provide better performance over a single model by combining the predictive capabilities of multiple models. See Autopilot algorithm support for a list of algorithms supported by ENSEMBLING mode.
The HYPERPARAMETER_TUNING (HPO) mode uses the best hyperparameters to train the best version of a model. HPO automatically selects an algorithm for the type of problem you want to solve. Then HPO finds the best hyperparameters according to your objective metric. See Autopilot algorithm support for a list of algorithms supported by HYPERPARAMETER_TUNING mode.
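To anticipate which mode AUTO resolves to, the size rule described above can be written as a small Python sketch. The function `resolve_mode` is hypothetical and runs only on the caller's side; it does not call Autopilot. Two assumptions are made here that the description does not settle: 1 MB is taken as 1024 * 1024 bytes, and a dataset of exactly 100 MB is treated as "larger".

```python
# Assumption: 1 MB = 1024 * 1024 bytes; the description only says "100 MB".
AUTO_THRESHOLD_BYTES = 100 * 1024 * 1024


def resolve_mode(mode, dataset_size_bytes):
    """Return the training mode implied by the documented AUTO rule."""
    if mode != "AUTO":
        # An explicitly chosen mode is used as given.
        return mode
    if dataset_size_bytes < AUTO_THRESHOLD_BYTES:
        return "ENSEMBLING"
    # Assumption: exactly 100 MB falls on the HYPERPARAMETER_TUNING side.
    return "HYPERPARAMETER_TUNING"
```

Knowing the resolved mode in advance matters for the other settings in this class, because the supported feature data types and sample weight support differ between the two modes.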
iv_generatecandidatedefnso00 TYPE /AWS1/SGMGENERATECANDIDATEDE00
Generates possible candidates without training the models. A model candidate is a combination of data preprocessors, algorithms, and algorithm parameter settings.
iv_problemtype TYPE /AWS1/SGMPROBLEMTYPE
The type of supervised learning problem available for the model candidates of the AutoML job V2. For more information, see SageMaker Autopilot problem types.
You must either specify the type of supervised learning problem in ProblemType and provide the AutoMLJobObjective metric, or none at all.
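The "both or neither" rule above is easy to get wrong when settings come from different sources. A minimal Python sketch of a pre-flight check follows; the function name `check_problem_type_and_objective` is hypothetical, and `None` stands in for "not specified".

```python
def check_problem_type_and_objective(problem_type, objective_metric):
    """Raise ValueError unless both values are given or both are omitted."""
    if (problem_type is None) != (objective_metric is None):
        raise ValueError(
            "Specify ProblemType together with the AutoMLJobObjective metric, "
            "or specify neither"
        )
    # Returns True when both were supplied, False when both were omitted.
    return problem_type is not None
```

When the check returns False, neither value was supplied and the job request would carry no problem type or objective metric.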
iv_sampleweightattributename TYPE /AWS1/SGMSAMPLEWEIGHTATTRNAME
If specified, this column name indicates which column of the dataset should be treated as sample weights for use by the objective metric during the training, evaluation, and the selection of the best model. This column is not considered as a predictive feature. For more information on Autopilot metrics, see Metrics and validation.
Sample weights should be numeric, non-negative, with larger values indicating which rows are more important than others. Data points that have invalid or no weight value are excluded.
Support for sample weights is available in Ensembling mode only.
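To estimate how many rows would survive the sample weight rules above, the dataset can be screened locally. The Python sketch below is an illustration only and does not reproduce Autopilot's actual filtering: the function `rows_with_valid_weight` is hypothetical, rows are assumed to be dictionaries keyed by column name, and treating NaN and infinite values as invalid is an assumption.

```python
import math


def rows_with_valid_weight(rows, weight_column):
    """Keep rows whose weight is numeric and non-negative."""
    kept = []
    for row in rows:
        raw = row.get(weight_column)
        try:
            weight = float(raw)
        except (TypeError, ValueError):
            # Missing or non-numeric weight: the row is excluded.
            continue
        # Assumption: NaN and infinite weights count as invalid.
        if math.isnan(weight) or math.isinf(weight) or weight < 0:
            continue
        kept.append(row)
    return kept
```

The weight column itself is not used as a predictive feature, so it should not be listed among the feature columns either.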
Queryable Attributes
CandidateGenerationConfig
The configuration information of how model candidates are generated.
Accessible with the following methods
| Method | Description |
|---|---|
| GET_CANDIDATEGENERATIONCFG() | Getter for CANDIDATEGENERATIONCONFIG |
CompletionCriteria
How long a job is allowed to run, or how many candidates a job is allowed to generate.
Accessible with the following methods
| Method | Description |
|---|---|
| GET_COMPLETIONCRITERIA() | Getter for COMPLETIONCRITERIA |
FeatureSpecificationS3Uri
A URL to the Amazon S3 data source containing selected features from the input data source to run an Autopilot job V2. You can input FeatureAttributeNames (optional) in JSON format as shown below:
{ "FeatureAttributeNames":["col1", "col2", ...] }
You can also specify the data type of the feature (optional) in the format shown below:
{ "FeatureDataTypes":{"col1":"numeric", "col2":"categorical" ... } }
These column keys may not include the target column.
In ensembling mode, Autopilot only supports the following data types: numeric, categorical, text, and datetime. In HPO mode, Autopilot can support numeric, categorical, text, datetime, and sequence.
If only FeatureDataTypes is provided, the column keys (col1, col2, ..) should be a subset of the column names in the input data.
If both FeatureDataTypes and FeatureAttributeNames are provided, then the column keys should be a subset of the column names provided in FeatureAttributeNames.
The key name FeatureAttributeNames is fixed. The values listed in ["col1", "col2", ...] are case sensitive and should be a list of strings containing unique values that are a subset of the column names in the input data. The list of columns provided must not include the target column.
Accessible with the following methods
| Method | Description |
|---|---|
| GET_FEATURESPECS3URI() | Getter for FEATURESPECIFICATIONS3URI, with configurable default |
| ASK_FEATURESPECS3URI() | Getter for FEATURESPECIFICATIONS3URI w/ exceptions if field has no value |
| HAS_FEATURESPECS3URI() | Determine if FEATURESPECIFICATIONS3URI has a value |
Mode
The method that Autopilot uses to train the data. You can either specify the mode manually or let Autopilot choose for you based on the dataset size by selecting AUTO. In AUTO mode, Autopilot chooses ENSEMBLING for datasets smaller than 100 MB, and HYPERPARAMETER_TUNING for larger ones.
The ENSEMBLING mode uses a multi-stack ensemble model to predict classification and regression tasks directly from your dataset. This machine learning mode combines several base models to produce an optimal predictive model. It then uses a stacking ensemble method to combine predictions from contributing members. A multi-stack ensemble model can provide better performance over a single model by combining the predictive capabilities of multiple models. See Autopilot algorithm support for a list of algorithms supported by ENSEMBLING mode.
The HYPERPARAMETER_TUNING (HPO) mode uses the best hyperparameters to train the best version of a model. HPO automatically selects an algorithm for the type of problem you want to solve. Then HPO finds the best hyperparameters according to your objective metric. See Autopilot algorithm support for a list of algorithms supported by HYPERPARAMETER_TUNING mode.
Accessible with the following methods
| Method | Description |
|---|---|
| GET_MODE() | Getter for MODE, with configurable default |
| ASK_MODE() | Getter for MODE w/ exceptions if field has no value |
| HAS_MODE() | Determine if MODE has a value |
GenerateCandidateDefinitionsOnly
Generates possible candidates without training the models. A model candidate is a combination of data preprocessors, algorithms, and algorithm parameter settings.
Accessible with the following methods
| Method | Description |
|---|---|
| GET_GENERATECANDIDATEDEFNS00() | Getter for GENERATECANDIDATEDEFNSONLY, with configurable default |
| ASK_GENERATECANDIDATEDEFNS00() | Getter for GENERATECANDIDATEDEFNSONLY w/ exceptions if field has no value |
| HAS_GENERATECANDIDATEDEFNS00() | Determine if GENERATECANDIDATEDEFNSONLY has a value |
ProblemType
The type of supervised learning problem available for the model candidates of the AutoML job V2. For more information, see SageMaker Autopilot problem types.
You must either specify the type of supervised learning problem in ProblemType and provide the AutoMLJobObjective metric, or none at all.
Accessible with the following methods
| Method | Description |
|---|---|
| GET_PROBLEMTYPE() | Getter for PROBLEMTYPE, with configurable default |
| ASK_PROBLEMTYPE() | Getter for PROBLEMTYPE w/ exceptions if field has no value |
| HAS_PROBLEMTYPE() | Determine if PROBLEMTYPE has a value |
TargetAttributeName
The name of the target variable in supervised learning, usually represented by 'y'.
Accessible with the following methods
| Method | Description |
|---|---|
| GET_TARGETATTRIBUTENAME() | Getter for TARGETATTRIBUTENAME, with configurable default |
| ASK_TARGETATTRIBUTENAME() | Getter for TARGETATTRIBUTENAME w/ exceptions if field has no value |
| HAS_TARGETATTRIBUTENAME() | Determine if TARGETATTRIBUTENAME has a value |
SampleWeightAttributeName
If specified, this column name indicates which column of the dataset should be treated as sample weights for use by the objective metric during the training, evaluation, and the selection of the best model. This column is not considered as a predictive feature. For more information on Autopilot metrics, see Metrics and validation.
Sample weights should be numeric, non-negative, with larger values indicating which rows are more important than others. Data points that have invalid or no weight value are excluded.
Support for sample weights is available in Ensembling mode only.
Accessible with the following methods
| Method | Description |
|---|---|
| GET_SAMPLEWEIGHTATTRNAME() | Getter for SAMPLEWEIGHTATTRIBUTENAME, with configurable default |
| ASK_SAMPLEWEIGHTATTRNAME() | Getter for SAMPLEWEIGHTATTRIBUTENAME w/ exceptions if field has no value |
| HAS_SAMPLEWEIGHTATTRNAME() | Determine if SAMPLEWEIGHTATTRIBUTENAME has a value |