

# Supported transforms for Visual ETL


The following section contains information on the supported transforms for Visual ETL in Amazon SageMaker Unified Studio:

**Topics**
+ [Add current timestamp transform](add-current-timestamp-transform.md)
+ [Aggregate transform](aggregate-transform.md)
+ [Array to columns transform](array-to-columns-transform.md)
+ [Autobalance processing transform](autobalance-processing-transform.md)
+ [Change columns transform](change-columns-transform.md)
+ [Concatenate columns transform](concatenate-columns-transform.md)
+ [Custom code transform](custom-code-transform.md)
+ [Derived column transform](derived-column-transform.md)
+ [Drop columns transform](drop-columns-transform.md)
+ [Drop duplicates transform](drop-duplicates-transform.md)
+ [Drop nulls transform](drop-null-transform.md)
+ [Explode array or map into rows transform](explode-array-map-rows-transform.md)
+ [Extract JSON path transform](extract-json-path-transform.md)
+ [Fill nulls transform](fill-null-transform.md)
+ [Filter transform](filter-transform.md)
+ [Flatten transform](flatten-transform.md)
+ [Format timestamp transform](format-timestamp-transform.md)
+ [Identifier transform](identifier-transform.md)
+ [Intersect transform](intersect-transform.md)
+ [Join transform](join-transform.md)
+ [Limit transform](limit-transform.md)
+ [Lookup transform](lookup-transform.md)
+ [Parse JSON column transform](parse-json-transform.md)
+ [Pivot rows into columns transform](pivot-transform.md)
+ [Regex extract transform](regex-extract-transform.md)
+ [Rename columns transform](rename-columns-transform.md)
+ [Select columns transform](select-columns-transform.md)
+ [Split string transform](split-string-transform.md)
+ [SQL query transform](sql-query-transform.md)
+ [To timestamp transform](to-timestamp-transform.md)
+ [Union transform](union-transform.md)
+ [Unpivot columns into rows transform](unpivot-transform.md)
+ [UUID transform](uuid-transform.md)

# Add current timestamp transform


The Add Current Timestamp transform lets you mark rows with the time at which the data was processed. This is useful for auditing or for tracking latency in the data pipeline. You can add the new column as a timestamp data type or as a formatted string.

**To add an Add Current Timestamp transform:**

1. Navigate to your Visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **Add Current Timestamp**.

1. Select the diagram to add the node to your Visual ETL job.

1. Select the node on the diagram to view details about the transform.

1. Under **Timestamp column name**, enter a custom name for the new column.

1. (Optional) Under **Timestamp format**, enter a format if you prefer the column to be a formatted date string.
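Visual ETL runs this transform on Spark; the following is only a rough sketch of the semantics in plain Python, with rows modeled as dictionaries (the function name and row representation are illustrative, not the generated code):

```python
from datetime import datetime, timezone

def add_current_timestamp(rows, column_name, fmt=None):
    """Append a processing-time column to each row.

    If fmt is given, the value is a formatted string; otherwise it is
    a datetime object (analogous to a timestamp column)."""
    now = datetime.now(timezone.utc)
    value = now.strftime(fmt) if fmt else now
    return [{**row, column_name: value} for row in rows]
```

For example, `add_current_timestamp([{"id": 1}], "processed_at", "%Y-%m-%d")` yields rows with a new `processed_at` string column.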

# Aggregate transform


You can use the Aggregate transform to perform summary calculations on selected fields.

**To use the Aggregate transform**

1. Add the Aggregate node to the visual job diagram.

1. (Optional) Click on the rename node icon to enter a new name for the node in the job diagram.

1. On the Node properties view, under **Fields to group by**, select one or more fields from the drop-down (optional). You can select multiple fields at a time or search for a field name by typing in the search bar.

   When fields are selected, the name and data type are shown. To remove a field, choose the 'X' on the field.

1. Choose **Aggregate another column**. At least one field is required.

1. Choose a field in the Field to aggregate drop-down.

1. Choose the aggregation function to apply to the chosen field:
   + avg - calculates the average
   + count - calculates the number of non-null values
   + max - returns the highest value that satisfies the 'group by' criteria
   + min - returns the lowest value that satisfies the 'group by' criteria
   + sum - the sum of all values in the group  
![\[The Amazon SageMaker Unified Studio UI showing the configuration options for the Aggregate transform.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/vis-etl/vis-etl-aggregate.png)
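The group-and-aggregate behavior can be sketched in plain Python (an illustrative analogue only; the real transform runs on Spark, and this helper is hypothetical):

```python
from collections import defaultdict

def aggregate(rows, group_by, field, func):
    """Group rows by the group_by columns and apply an aggregation
    (avg, count, max, min, sum) to the chosen field.

    count only considers non-null values, matching the description
    above."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in group_by)
        if row.get(field) is not None:
            groups[key].append(row[field])
    funcs = {
        "avg": lambda v: sum(v) / len(v),
        "count": len,
        "max": max,
        "min": min,
        "sum": sum,
    }
    return {key: funcs[func](vals) for key, vals in groups.items()}
```

For example, summing a `sales` field grouped by `state` produces one total per state.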

# Array to columns transform


The Array to Columns transform allows you to extract some or all of the elements of an array-type column into new columns. The transform fills the new columns as far as the array has values to extract, optionally taking the elements at the positions you specify.

For instance, if you have an array column "subnet", which was the result of applying the Split String transform to an IPv4 address, you can extract the first and fourth positions into new columns "first_octet" and "fourth_octet". The output of the transform in this example would be (notice the last two rows have shorter arrays than expected):


| subnet | first_octet | fourth_octet | 
| --- | --- | --- | 
| [54, 240, 197, 238] | 54 | 238 | 
| [192, 168, 0, 1] | 192 | 1 | 
| [192, 168] | 192 |  | 
| [] |  |  | 

**To add an Array to Columns transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **Array to Columns**.

1. Select the diagram to add the node to your visual ETL job.

1. Select the node on the diagram to view details about the transform.

1. Under **Array type column**, choose the column of type array from which the new columns are extracted.

1. Under **Output columns**, enter names for the output columns.

1. (Optional) Under **Array indexes to use**, enter numbers to indicate which columns to include.
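As a rough plain-Python sketch of the semantics (assuming 1-based positions, to match the first/fourth example above; the helper name and row-as-dict model are illustrative):

```python
def array_to_columns(rows, array_col, output_cols, indexes=None):
    """Extract elements of an array column into new columns.

    indexes are assumed 1-based positions; when omitted, elements are
    taken in order. Positions beyond the array length yield None,
    mirroring the partially filled rows in the example table."""
    if indexes is None:
        indexes = list(range(1, len(output_cols) + 1))
    out = []
    for row in rows:
        arr = row[array_col]
        new = dict(row)
        for name, idx in zip(output_cols, indexes):
            new[name] = arr[idx - 1] if idx <= len(arr) else None
        out.append(new)
    return out
```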

# Autobalance processing transform


Use this transform to repartition data to optimize future cluster resource usage. This transform is particularly useful for uneven datasets.

**To add an Autobalance Processing transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **Autobalance Processing**.

1. Select the diagram to add the node to your visual ETL job.

1. Select the node on the diagram to view details about the transform.

1. Under **Number of partitions**, enter the number of partitions to randomly distribute the data into. Or, switch the toggle off to use the number of cores as the number of partitions.

1. (Optional) Under **Repartition columns**, choose columns so that rows with the same values in those columns are assigned to the same partition.
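The two partitioning modes can be sketched as follows (a hypothetical plain-Python analogue of the Spark repartitioning; not the generated code):

```python
import random

def autobalance(rows, num_partitions, repartition_columns=None):
    """Assign each row to one of num_partitions partitions.

    Without repartition_columns, rows are distributed randomly; with
    them, rows sharing the same values in those columns land in the
    same partition (via hashing)."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        if repartition_columns:
            key = tuple(row[c] for c in repartition_columns)
            idx = hash(key) % num_partitions
        else:
            idx = random.randrange(num_partitions)
        partitions[idx].append(row)
    return partitions
```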

# Change columns transform


A Change columns transform remaps the source data property keys into the desired configuration for the target data. In a Change Columns transform node, you can:

1. Change the name of multiple columns.

1. Change the data type of the columns, if the new data type is supported and there is a transformation path between the two data types.

1. Use a custom expression to change the values of selected columns.

**To add a Change Schema node to your job diagram**

1. Open the menu and then choose Change Columns to add a new transform to your job diagram, if needed.

1. (Optional) Click on the rename node icon to enter a new name for the node in the job diagram.

1. Modify the input schema by clicking on the node, then:

   1. To rename a column, enter the new name in the Target column field.

   1. To change the data type for a selected column, select "Add new type cast", then choose the source column, target column, and new data type from the Target type list.

   1. To change the data values for a selected column, add a custom expression.

1. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel.

![\[The Amazon SageMaker Unified Studio UI showing the configuration options for the Change Columns transform.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/vis-etl/vis-etl-change-columns.png)


# Concatenate columns transform


The Concatenate transform allows you to build a new string column using the values of other columns with an optional spacer. For example, if we define a concatenated column “date” as the concatenation of “year”, “month” and “day” (in that order) with “-” as the spacer, we would get:


| day | month | year | date | 
| --- | --- | --- | --- | 
| 01 | 01 | 2020 | 2020-01-01 | 
| 02 | 01 | 2020 | 2020-01-02 | 
| 03 | 01 | 2020 | 2020-01-03 | 
| 04 | 01 | 2020 | 2020-01-04 | 

**To add a Concatenate Columns transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **Concatenate Columns**.

1. Select the diagram to add the node to your visual ETL job.

1. Select the node on the diagram to view details about the transform.

1. Under **Concatenated column**, enter the name of a new column that will be generated.

1. Under **Columns**, select the input columns.

1. (Optional) Under **Spacer**, enter a string to place between concatenated fields.

1. (Optional) Under **Null value**, enter the string to use when a column value is null.
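A minimal plain-Python sketch of the semantics, including the spacer and null-value options (illustrative helper, not the generated Spark code):

```python
def concatenate_columns(rows, new_col, columns, spacer="", null_value=""):
    """Build a new string column by joining the values of the input
    columns with a spacer; null values are replaced by null_value."""
    out = []
    for row in rows:
        parts = [str(row[c]) if row[c] is not None else null_value
                 for c in columns]
        out.append({**row, new_col: spacer.join(parts)})
    return out
```

For instance, joining `year`, `month`, and `day` with the spacer `-` reproduces the `date` column shown in the table above.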

# Custom code transform


If you need to perform more complicated transformations on your data, or want to add data property keys to the dataset, you can add a Custom code transform to your job diagram. The Custom code node allows you to enter a script that performs the transformation.

**To add a custom code node to your job diagram**

1. Open the Resource panel and then choose Custom Code to add a custom transform to your job diagram.

1. (Optional) Click on the rename node icon to enter a new name for the node in the job diagram.

1. Enter desired code changes.

The following examples show the format of the code to enter in the code box:

```
def FilterPopulationAbove1000(input_df):
    # Keep only the rows whose 'population' value exceeds 1000
    df = input_df[input_df['population'] > 1000]
    return df
```

![\[The Amazon SageMaker Unified Studio UI showing the configuration options for the Custom Code transform.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/vis-etl/vis-etl-customcode.png)


# Derived column transform


The Derived Column transform allows you to define a new column based on a math formula or SQL expression that can use other columns in the data, as well as constants and literals. For instance, to derive a "percentage" column from the columns "success" and "count", you can enter the SQL expression: `success * 100 / count || '%'`.

Example result:


| success | count | percentage | 
| --- | --- | --- | 
| 14 | 100 | 14% | 
| 6 | 20 | 30% | 
| 3 | 40 | 7.5% | 

**To add a Derived Column transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **Derived Column**.

1. Select the diagram to add the node to your visual ETL job.

1. Select the node on the diagram to view details about the transform.

1. Under **Name of derived column**, enter the name of a new column that will be generated.

1. Under **Column expression**, enter a SQL expression to define the new column based on existing columns.
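The example expression above can be sketched in plain Python (an illustrative analogue of the SQL expression `success * 100 / count || '%'`, not the generated code):

```python
def derive_percentage(rows):
    """Add a 'percentage' column computed from 'success' and 'count',
    mirroring SQL arithmetic and string concatenation (||)."""
    out = []
    for row in rows:
        pct = row["success"] * 100 / row["count"]
        # %g trims a trailing .0 so 14.0 renders as "14"
        out.append({**row, "percentage": f"{pct:g}%"})
    return out
```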

# Drop columns transform


You can create a subset of data property keys from the dataset using the Drop columns transform. You indicate which data property keys you want to remove from the dataset and the rest of the keys are retained.

**To add a Drop columns transform node to your job diagram**

1. Open the Resource panel and then choose Drop columns to add a new transform to your diagram.

1. (Optional) Click on the rename node icon to enter a new name for the node in the job diagram.

1. Click on the node and view the Node properties panel.

1. Choose the data property keys from the "Field" column to drop the column from the data source.

1. (Optional) After configuring the node properties and transform properties, you can preview using the Data preview tab in job diagram.

![\[The Amazon SageMaker Unified Studio UI showing the configuration options for the Drop columns transform.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/vis-etl/vis-etl-drop-cols.png)


# Drop duplicates transform


The Drop duplicates transform removes rows from your data source in one of two ways. You can remove rows that are completely identical to another row, or you can choose the fields to match on and remove only rows that are duplicates based on those fields.

For example, the following data set contains duplicate rows: in some rows every value exactly matches another row, while in other rows only some of the values match.


**Example Data Set**  

| Row | Name | Email | Age | State | Note | 
| --- | --- | --- | --- | --- | --- | 
| 1 | Joy | joy@gmail | 33 | NY |  | 
| 2 | Tim | tim@gmail | 45 | OH |  | 
| 3 | Rose | rose@gmail | 23 | NJ |  | 
| 4 | Tim | tim@gmail | 42 | OH |  | 
| 5 | Rose | rose@gmail | 23 | NJ |  | 
| 6 | Tim | tim@gmail | 42 | OH | This is a duplicate row and matches completely on all values as row 4 | 
| 7 | Rose | rose@gmail | 23 | NJ | This is a duplicate row and matches completely on all values as row 5 | 

If you choose to match entire rows, rows 6 and 7 will be removed from the data set. The data set is now:


**Data Set After Matching Entire Rows**  

| Row | Name | Email | Age | State | 
| --- | --- | --- | --- | --- | 
| 1 | Joy | joy@gmail | 33 | NY | 
| 2 | Tim | tim@gmail | 45 | OH | 
| 3 | Rose | rose@gmail | 23 | NJ | 
| 4 | Tim | tim@gmail | 42 | OH | 
| 5 | Rose | rose@gmail | 23 | NJ | 

If you choose to specify keys, you can remove rows that match on 'name' and 'email'. This gives you finer control over what counts as a 'duplicate row' for your data set. By specifying 'name' and 'email', the data set is now:


**Data Set After Specifying Keys**  

| Row | Name | Email | Age | State | 
| --- | --- | --- | --- | --- | 
| 1 | Joy | joy@gmail | 33 | NY | 
| 2 | Tim | tim@gmail | 45 | OH | 
| 3 | Rose | rose@gmail | 23 | NJ | 

Some things to keep in mind:
+ For rows to be recognized as duplicates, values are case sensitive: all values must have the same casing. This applies to either option you choose (Match entire rows or Specify keys).
+ All values are read in as strings.
+ The Drop duplicates transform utilizes the Spark dropDuplicates command.
+ When using the Drop duplicates transform, the first row is kept and other rows are dropped.
+ The Drop duplicates transform does not change the schema of the dataframe.

**To add a Drop duplicates transform node to your job diagram**

1. Open the Resource panel and then choose Drop duplicates to add a new transform to your diagram.

1. (Optional) Click on the rename node icon to enter a new name for the node in the job diagram.

1. Click on the node and view the Node properties panel.

1. Choose if you prefer to drop duplicates by matching entire rows or specific keys.

1. (Optional) After configuring the node properties and transform properties, you can preview using the Data preview tab in job diagram.

![\[The Amazon SageMaker Unified Studio UI showing the configuration options for the Drop duplicates transform.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/vis-etl/vis-etl-drop-dupes.png)
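The matching semantics described above (keep the first occurrence, match on entire rows or on specific keys, as with Spark's `dropDuplicates`) can be sketched in plain Python (illustrative helper only):

```python
def drop_duplicates(rows, keys=None):
    """Remove duplicate rows, keeping the first occurrence.

    With keys=None, entire rows are compared; otherwise only the
    listed fields are compared."""
    seen = set()
    out = []
    for row in rows:
        key = (tuple(sorted(row.items())) if keys is None
               else tuple(row[k] for k in keys))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out
```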


# Drop nulls transform


Use the Drop nulls transform to remove fields from the dataset if all values in the field are 'null'. By default, Visual ETL recognizes null objects, but some values, such as empty strings, strings that read "null", integers of -1, or other placeholders such as zeros, are not automatically recognized as nulls.

**To use the Drop nulls transform**

1. Add a Drop nulls node to the diagram, if needed.

1. Set a threshold value if rows should be dropped only when a certain number of nulls is present.

1. Choose whether to drop nulls for all columns or to check null values only for specific columns. If you choose the subset option, select the desired column names from the drop-down list.

![\[The Amazon SageMaker Unified Studio UI showing the configuration options for the Drop nulls transform.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/vis-etl/vis-etl-dropnull.png)


# Explode array or map into rows transform


You can use the Explode transform to extract values from a nested structure into individual rows that are easier to manipulate. In the case of an array, the transform will generate a row for each value of the array, replicating the values for the other columns in the row. In the case of a map, the transform will generate a row for each entry with the key and value as columns plus any other columns in the row.

For example, consider this dataset, which has a "category" array column with multiple values:


| product_id | category | 
| --- | --- | 
| 1 | [sports, winter] | 
| 2 | [garden, tools] | 
| 3 | [videos] | 
| 4 | [games, electronics, social] | 
| 5 | [] | 

If you explode the 'category' column into a column with the same name, you overwrite the column. You can also choose to include NULLs to get the following (ordered for illustration purposes):


| product_id | category | 
| --- | --- | 
| 1 | sports | 
| 1 | winter | 
| 2 | garden | 
| 2 | tools | 
| 3 | videos | 
| 4 | games | 
| 4 | electronics | 
| 4 | social | 
| 5 |  | 

**To add an Explode Array Or Map Into Rows transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **Explode Array or Map Into Rows**.

1. Select the diagram to add the node to your visual ETL job.

1. Select the node on the diagram to view details about the transform.

1. Under **Column to explode**, enter the name of an existing column that will be exploded.

1. Under **New column name**, enter the name of a new column that will be generated.

1. (Optional) Under **Values column**, if you are exploding a map, specify a name for a column to contain the values.

1. Use the toggle to indicate whether or not to include null values. By default, if the column to explode is NULL or has an empty structure, it will be omitted on the exploded dataset.
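The array case can be sketched in plain Python (a hypothetical analogue; maps additionally emit key and value columns, which this sketch omits):

```python
def explode(rows, column, new_column, include_nulls=False):
    """Generate one output row per element of an array column,
    replicating the other columns.

    Rows whose array is null or empty are omitted unless
    include_nulls is set, in which case a single row with a null
    value is emitted instead."""
    out = []
    for row in rows:
        values = row.get(column) or []
        if not values:
            if include_nulls:
                out.append({**row, new_column: None})
            continue
        for v in values:
            out.append({**row, new_column: v})
    return out
```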

# Extract JSON path transform


Use this transform to extract new columns from a string column with JSON data.

**To add an Extract JSON path transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **Extract JSON Path**.

1. Select the diagram to add the node to your visual ETL job.

1. Connect the transform node to a data source node.

1. Select the node on the diagram to view details about the transform.

1. Under **JSON column**, select a column from the data source.

1. Under **JSON paths**, enter a JSON path or multiple paths separated by commas to specify the location to extract the data from.

1. Under **Output columns**, enter the names of the columns to create from each of the JSON paths.

1. (Optional) If desired, use the toggle to drop the JSON column.
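A minimal plain-Python sketch of the behavior, supporting only simple dotted paths (the real transform's JSON path syntax may be richer; the helper is illustrative):

```python
import json

def extract_json_path(rows, json_col, paths, output_cols, drop=False):
    """Extract values at dotted JSON paths (e.g. 'user.name') from a
    string column into new columns; optionally drop the source."""
    out = []
    for row in rows:
        data = json.loads(row[json_col])
        new = dict(row)
        for path, col in zip(paths, output_cols):
            value = data
            for part in path.split("."):
                value = value.get(part) if isinstance(value, dict) else None
            new[col] = value
        if drop:
            del new[json_col]
        out.append(new)
    return out
```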

# Fill nulls transform


Fill nulls allows you to fill null values in a column with a chosen fill value. Select a column with nulls you want to fill and provide a fill value. You can use multiple fill conditions at a time. To consider only a subset of the data, use the dropdown menu to select columns.

**To add a Fill nulls node to your job diagram**

1. Open the menu and then choose Fill nulls to add a new transform to your job diagram.

1. (Optional) Click on the rename node icon to enter a new name for the node in the job diagram.

1. Modify the input schema:

   1. To create a subset, use the dropdown menu and select a column or columns to consider.

   1. To specify a value to fill nulls with, select "Add new fill null value", then choose a column to fill and provide a Fill value.

1. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel.

![\[The Amazon SageMaker Unified Studio UI showing the configuration options for the Fill nulls transform.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/vis-etl/vis-etl-fill-null.png)
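The fill behavior can be sketched in plain Python (illustrative helper; `fills` stands in for the column/fill-value pairs you configure in the UI):

```python
def fill_nulls(rows, fills):
    """Replace null values in the given columns with their fill
    values. fills maps column name -> fill value; other columns
    pass through unchanged."""
    return [{k: (fills[k] if v is None and k in fills else v)
             for k, v in row.items()}
            for row in rows]
```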


# Filter transform


Use the Filter transform to create a new dataset by filtering records from the input dataset. Rows that don't satisfy the filter condition are removed from the output. You can select from two filter types: Global AND or Global OR. You must select a column name to serve as the key, select a comparison operation, and provide a value to filter on.

**To add a Filter node to your job diagram**

1. Open the menu and then choose Filter to add a new transform to your job diagram.

1. (Optional) Click on the rename node icon to enter a new name for the node in the job diagram.

1. Modify the input schema:

   1. Select "Add new filter condition".

   1. Choose a filter type: "Global AND" or "Global OR".

   1. Select a Key column to filter.

   1. Select a comparison operation.

   1. Type in a value to compare in the "Value" box.

1. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel.

![\[The Amazon SageMaker Unified Studio UI showing the configuration options for the Filter transform.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/vis-etl/vis-etl-filter.png)
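The Global AND / Global OR combination of key, operation, and value conditions can be sketched as (illustrative plain-Python analogue):

```python
import operator

# Comparison operations, keyed by symbol
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       ">": operator.gt, "<=": operator.le, ">=": operator.ge}

def filter_rows(rows, conditions, combine="AND"):
    """Keep rows satisfying the filter conditions.

    Each condition is a (key, op, value) tuple; combine="AND" keeps
    rows satisfying all conditions, "OR" rows satisfying any."""
    def matches(row):
        checks = (OPS[op](row[key], value) for key, op, value in conditions)
        return all(checks) if combine == "AND" else any(checks)
    return [row for row in rows if matches(row)]
```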


# Flatten transform


You can flatten the fields of nested structs in the data so that they become top-level fields. The new fields are named using the original field name prefixed with the names of the struct fields traversed to reach it, separated by dots.

For example, suppose the data has a Struct-type field named "phone_numbers" which, among other fields, has a Struct-type field named "home_phone" with two fields: "country_code" and "number". After flattening, these two fields become top-level fields named "phone_numbers.home_phone.country_code" and "phone_numbers.home_phone.number" respectively.

**To add a Flatten transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **Flatten**.

1. Select the diagram to add the node to your visual ETL job.

1. Select the node on the diagram to view details about the transform.

1. Under **Max levels to flatten**, enter the maximum number of nesting levels to flatten.

1. Under **Separator**, enter a character to use in the new column names to separate the different levels.
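The naming scheme from the example above can be sketched with a recursive plain-Python helper (illustrative only; structs are modeled as nested dicts):

```python
def flatten(record, separator=".", max_levels=None, _prefix="", _level=0):
    """Promote nested struct (dict) fields to top level, joining
    names with the separator, descending at most max_levels deep."""
    out = {}
    for key, value in record.items():
        name = _prefix + key
        if isinstance(value, dict) and (max_levels is None or _level < max_levels):
            out.update(flatten(value, separator, max_levels,
                               name + separator, _level + 1))
        else:
            out[name] = value
    return out
```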

# Format timestamp transform


Use this transform to convert a timestamp column into a formatted string.

**To add a Format Timestamp transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **Format Timestamp**.

1. Select the diagram to add the node to your visual ETL job.

1. Select the node on the diagram to view details about the transform.

1. Under **Column**, choose the column that contains the timestamps.

1. Under **Timestamp format**, input the format you want to use for the strings.

1. (Optional) Enter a name for the new formatted column.
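A minimal plain-Python sketch of the conversion (illustrative; the format strings in the UI may differ from Python's `strftime` codes):

```python
from datetime import datetime

def format_timestamp(rows, column, fmt, new_column=None):
    """Convert a timestamp (datetime) column into a formatted string,
    written to new_column, or in place when no name is given."""
    target = new_column or column
    return [{**row, target: row[column].strftime(fmt)} for row in rows]
```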

# Identifier transform


Use this transform to assign a numeric identifier for each row in the dataset.

**To add an Identifier transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **Identifier**.

1. Select the diagram to add the node to your visual ETL job.

1. Connect the transform node to a data source node.

1. Select the node on the diagram to view details about the transform.

1. Under **Column name**, enter a name for the new column. This column is named id by default.

1. (Optional) If you want to make the ID unique among the job runs, switch the toggle to on to include a timestamp and extend the column type from long to decimal.

# Intersect transform


Use this transform to select only the rows common to two datasets. Duplicates are removed by default; you can keep them by checking the **Intersect all** checkbox.

**To add an Intersect transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **Intersect**.

1. Select the diagram to add the node to your visual ETL job.

1. Select the node on the diagram to view details about the transform.

1. (Optional) If you want to intersect duplicated rows as well, check the **Intersect all** checkbox.

# Join transform


The Join transform allows you to combine two datasets into one. You specify the key names in the schema of each dataset to compare. The output frame contains rows whose keys meet the join condition. The rows from each dataset that meet the join condition are combined into a single output row that contains all the columns found in either dataset. You can select one of the following join types: Inner, Left, Right, Full, Cross, Semi, and Anti.

**To add a Join node to your job diagram**

1. Open the menu and then choose Join to add a new transform to your job diagram, if needed.

1. (Optional) Click on the rename node icon to enter a new name for the node in the job diagram.

1. Ensure two data sources are connected to the Join node.

1. Modify the input schema:

   1. Select a join type from the "Join type" dropdown menu.

   1. Select a column for the "Left data source" using the dropdown menu.

   1. Select a column for the "Right data source" using the dropdown menu.

1. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel.

![\[The Amazon SageMaker Unified Studio UI showing the configuration options for the Join transform.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/vis-etl/vis-etl-join.png)


**Note**  
If columns in your data sources use the same names, the join results in a COLUMN_ALREADY_EXISTS error. To avoid this error, either: (1) use Rename columns before the join on one of your data sources, or (2) use Drop columns after the join to remove both duplicated columns.
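The inner-join case can be sketched in plain Python (illustrative only; the other join types keep or exclude unmatched rows differently, and the real transform runs on Spark):

```python
def inner_join(left, right, left_key, right_key):
    """Combine two row lists on matching key values (inner join).

    Each matched pair merges all columns from both rows; note that a
    shared column name would silently collide here, analogous to the
    COLUMN_ALREADY_EXISTS issue described above."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    out = []
    for lrow in left:
        for rrow in index.get(lrow[left_key], []):
            out.append({**lrow, **rrow})
    return out
```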

# Limit transform


Use this transform to limit the number of rows in a dataset.

**To add a Limit transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **Limit**.

1. Select the diagram to add the node to your visual ETL job.

1. Select the node on the diagram to view details about the transform.

1. Under **Number of records**, enter the maximum number of rows you want to see or process from the dataset.

# Lookup transform


Use this transform to add matching columns by looking them up in another catalog table.

**To add a Lookup transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **Lookup**.

1. Select the diagram to add the node to your visual ETL job.

1. Connect the transform node to a data source node.

1. Select the node on the diagram to view details about the transform.

1. Under **Catalog**, select the catalog to use for lookup.

1. Under **Database**, select a database.

1. Under **Table**, select a table.

1. Under **Lookup key columns to match**, enter columns in the lookup table, separated by commas.

1. Under **Lookup columns to take**, enter columns in the lookup table to add to the data when a match is found.

# Parse JSON column transform


Use this transform to parse a string column that contains JSON data.

**To add a Parse JSON Column transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **Parse JSON Column**.

1. Select the diagram to add the node to your visual ETL job.

1. Connect the transform node to a data source node.

1. Select the node on the diagram to view details about the transform.

1. Under **JSON column**, choose the column that contains JSON data.

1. Enter a number to use as the ratio of samples used for inferring the schema. This must be a value between 0.1 and 1, where 1 means all samples are used. If your JSON strings are consistent, you can use a lower value to speed up the inference.

1. (Optional) Enter a name for the new column.

# Pivot rows into columns transform


Use this transform to aggregate the values of a numeric column into categories derived from the values of the columns you configure.

**To add a Pivot Rows Into Columns transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **Pivot Rows Into Columns**.

1. Select the diagram to add the node to your visual ETL job.

1. Connect the transform node to a data source node.

1. Select the node on the diagram to view details about the transform.

1. Under **Aggregation column**, select a numeric column to use.

1. Under **Aggregation**, select a Spark function to apply to the aggregation column.

1. Under **Columns to convert**, select columns whose values will become new columns.
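The pivot can be sketched in plain Python (an illustrative analogue with `sum` as the aggregation; the real transform offers the Spark aggregation functions):

```python
from collections import defaultdict

def pivot(rows, group_by, pivot_column, value_column, agg=sum):
    """Turn the distinct values of pivot_column into new columns,
    aggregating value_column within each group."""
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        key = tuple(row[c] for c in group_by)
        groups[key][row[pivot_column]].append(row[value_column])
    out = []
    for key, cols in groups.items():
        rec = dict(zip(group_by, key))
        rec.update({col: agg(vals) for col, vals in cols.items()})
        out.append(rec)
    return out
```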

# Regex extract transform


Use this transform to extract new columns from a string column by applying a regular expression.

**To add a Regex Extract transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **Regex Extract**.

1. Select the diagram to add the node to your visual ETL job.

1. Select the node on the diagram to view details about the transform.

1. Under **Column to extract from**, choose a column from the data source you connected to the transform.

1. Under **Regular expression**, enter a regular expression to apply to the column. If you define multiple output columns, the expression must have a number of capture groups equal to the number of columns.

1. Under **Extracted column**, enter the name of the column to contain the extracted values.
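The capture-group behavior can be sketched in plain Python using the standard `re` module (illustrative helper; rows are modeled as dicts):

```python
import re

def regex_extract(rows, column, pattern, output_cols):
    """Extract one new column per capture group in the pattern.

    Rows that do not match get None in every output column."""
    compiled = re.compile(pattern)
    out = []
    for row in rows:
        m = compiled.search(row[column])
        groups = m.groups() if m else (None,) * len(output_cols)
        out.append({**row, **dict(zip(output_cols, groups))})
    return out
```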

# Rename columns transform


You can use the Rename columns transform to change the names of multiple columns in the dataset. You select a column that you want to rename from a list of available columns and provide the new column name.

**To add a Rename columns node to your job diagram**

1. Open the menu and then choose Rename columns to add a new transform to your job diagram, if needed.

1. (Optional) Click on the rename node icon to enter a new name for the node in the job diagram.

1. Modify the input schema:

   1. Select "Add new rename pair".

   1. Select the current column to rename from the "Current name" column.

   1. Provide a name for the column in the "New name" box.

1. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel.

![\[The Amazon SageMaker Unified Studio UI showing the configuration options for the Rename transform.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/vis-etl/vis-etl-renamecolumns.png)


# Select columns transform


You can create a subset of columns in a dataset using the Select columns transform. You indicate which columns you want to keep and the rest are removed from the dataset.

**To add a Select columns node to your job diagram**

1. Open the menu and then choose Select columns to add a new transform to your job diagram, if needed.

1. (Optional) Click on the rename node icon to enter a new name for the node in the job diagram.

1. Modify the input schema:

   1. Select the columns you want to keep in the output frame by checking the corresponding box under the "Field" option.

1. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel.

![\[The Amazon SageMaker Unified Studio UI showing the configuration options for the Select columns transform.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/vis-etl/vis-etl-selectclumns.png)
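Checked columns pass through to the output; everything else is dropped. A minimal Python sketch of that behavior (the column names are hypothetical):

```python
# Hypothetical set of checked columns to keep.
selected = {"customer_id", "amount"}

row = {"customer_id": 42, "amount": 9.99, "internal_flag": True}

# Unchecked columns are removed from the output.
subset = {name: value for name, value in row.items() if name in selected}
# subset == {"customer_id": 42, "amount": 9.99}
```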


# Split string transform


Use this transform to split a string column into an array.

**To add a Split String transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **Split String**.

1. Select the diagram to add the node to your visual ETL job.

1. Connect the transform node to a data source node.

1. Select the node on the diagram to view details about the transform.

1. Under **Column to split**, select the column that you want to split.

1. Under **Splitting regular expression**, enter a regular expression that defines the separator token.

1. (Optional) Under **Array column name**, enter a name for the new column that contains the extracted array.
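The separator is a regular expression, so a single pattern can match several delimiter variants at once. A short Python sketch of the splitting logic using the standard `re` module (the column name and pattern are illustrative):

```python
import re

# Illustrative separator pattern: one or more commas or semicolons.
separator = re.compile(r"[,;]+")

row = {"tags": "etl,spark;sql"}

# The new array column holds the split parts.
row["tags_array"] = separator.split(row["tags"])
# row["tags_array"] == ["etl", "spark", "sql"]
```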

# SQL query transform


You can use a SQL query transform to write your own transform in the form of a SQL query. Writing the SQL query in Visual ETL creates a subset of the data corresponding to the query. A SQL transform node can have multiple datasets as inputs, but produces only a single dataset as output. It contains a text field, where you enter the SQL query.

**To add an SQL query node to your job diagram**

1. Open the menu and then choose SQL query to add a new transform to your job diagram, if needed.

1. (Optional) Click on the rename node icon to enter a new name for the node in the job diagram.

1. Modify the input schema:

   1. Ensure the alias in the "SQL aliases" box is appropriate. Visual ETL will autopopulate this field, but you can change it.

   1. Write an SQL statement that queries the data to suit your needs.

1. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel.

![\[The Amazon SageMaker Unified Studio UI showing the configuration options for the SQL query transform.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/vis-etl/vis-etl-sqlquery.png)
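The transform runs your query against the input datasets under the aliases shown in the node. The same idea can be sketched with Python's standard `sqlite3` module, where an in-memory table stands in for an aliased input dataset (the table name, columns, and query are illustrative):

```python
import sqlite3

# In-memory table standing in for an aliased input dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10.0), (2, 25.5), (3, 5.0)])

# The query produces a subset of the data, as the transform does.
result = conn.execute(
    "SELECT id, amount FROM orders WHERE amount > 8 ORDER BY id"
).fetchall()
# result == [(1, 10.0), (2, 25.5)]
conn.close()
```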


# To timestamp transform


You can use the To Timestamp transform to change the data type of a numeric or string column to timestamp, so that it can be stored with that data type or used with other transforms that require a timestamp.

**To add a To Timestamp transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **To Timestamp**.

1. Select the diagram to add the node to your visual ETL job.

1. Select the node on the diagram to view details about the transform.

1. Under **Column to convert**, select the column that you want to convert.

1. Under **Time format**, define how to parse the column selected by choosing the format.

   If the value is a number, it can be expressed in seconds (Unix/Python timestamp), milliseconds, or microseconds.

   If the value is a formatted string, choose the "iso" type. The string needs to conform to one of the variants of the ISO format, for example: "2022-11-02T14:40:59.915Z".

   If you don't know the type at this point, or different rows use different types, you can choose "autodetect" and the system will make its best guess, at a small performance cost.

1. Under **Column name**, enter a name for a new column that will be generated for the timestamps. If a column name is not specified, the existing column will be replaced instead of generating a new column.
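The format choice determines how each value is parsed. A minimal Python sketch of the parsing rules using the standard `datetime` module (the function and format names are illustrative, not the product's API):

```python
from datetime import datetime, timezone

def to_timestamp(value, fmt):
    """Parse a value into a UTC datetime according to the chosen format."""
    if fmt == "seconds":
        # Unix/Python timestamp in seconds.
        return datetime.fromtimestamp(value, tz=timezone.utc)
    if fmt == "milliseconds":
        return datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    if fmt == "iso":
        # Replace the trailing "Z" so fromisoformat accepts it on older Pythons.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    raise ValueError(f"unsupported format: {fmt}")

ts = to_timestamp("2022-11-02T14:40:59.915Z", "iso")
# ts.year == 2022, ts.minute == 40
```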

# Union transform


You use the Union transform node when you want to combine rows from more than one data source that have the same schema. Select "Union by name" to match columns by name rather than by position. Selecting this option also lets you select "Allow missing columns", which fills any columns missing from an input with null values.

**To add a Union transform node to your job diagram**

1. Open the menu and then choose "Union transform" to add a new transform to your job diagram, if needed.

1. (Optional) Click on the union node icon to enter a new name for the node in the job diagram.

1. Modify the input schema:

   1. Select "Union by name" if you want the union to be done on columns with the same name.

   1. If you have enabled Union by name, select "Allow missing columns" if you want to fill missing columns with null values.

1. (Optional) After configuring the node properties and transform properties, you can preview the modified dataset by choosing the Data preview tab in the node details panel.

![\[The Amazon SageMaker Unified Studio UI showing the configuration options for the Union transform.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/vis-etl/vis-etlunion.png)
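The options above can be sketched in plain Python, with each input frame as a list of row dictionaries (the function and column names are illustrative):

```python
def union_by_name(frames, allow_missing=False):
    """Combine rows from several inputs, matching columns by name
    rather than by position."""
    columns = []
    for frame in frames:
        for name in frame[0]:
            if name not in columns:
                columns.append(name)
    if not allow_missing:
        for frame in frames:
            if set(frame[0]) != set(columns):
                raise ValueError("inputs do not share the same columns")
    # With allow_missing, absent columns are filled with None (null).
    return [{c: row.get(c) for c in columns}
            for frame in frames for row in frame]

a = [{"id": 1, "name": "x"}]
b = [{"id": 2}]
combined = union_by_name([a, b], allow_missing=True)
# combined[1] == {"id": 2, "name": None}
```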


# Unpivot columns into rows transform


Use this transform to convert column names into values of a new column and generate one row for each column in the original row.

**To add an Unpivot Columns Into Rows transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **Unpivot Columns Into Rows**.

1. Select the diagram to add the node to your visual ETL job.

1. Connect the transform node to a data source node.

1. Select the node on the diagram to view details about the transform.

1. Under **Unpivot names column**, enter a name for the new column that will contain the source column names.

1. Under **Unpivot values column**, enter a name for the new column that will contain the source column values.

1. Under **Columns to unpivot into the new value column**, select the column or columns to unpivot. Each selected column's name becomes a value in the names column, and its values fill the values column.
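The steps above turn each selected column into its own output row. A minimal Python sketch of the unpivot (melt) logic, with illustrative row and column names:

```python
def unpivot(rows, names_col, values_col, columns):
    """Turn each selected column into its own row: the column name goes
    into names_col and its value into values_col; other columns repeat."""
    out = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in columns}
        for col in columns:
            out.append({**kept, names_col: col, values_col: row[col]})
    return out

rows = [{"id": 1, "q1": 10, "q2": 20}]
long_rows = unpivot(rows, "quarter", "sales", ["q1", "q2"])
# long_rows == [{"id": 1, "quarter": "q1", "sales": 10},
#               {"id": 1, "quarter": "q2", "sales": 20}]
```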

# UUID transform


Use this transform to generate a 36-character unique identifier for each row in your data source.

**To add a UUID transform:**

1. Navigate to your visual ETL job in Amazon SageMaker Unified Studio.

1. Choose the plus icon to open the **Add nodes** menu.

1. Under **Transforms**, choose **UUID**.

1. Select the diagram to add the node to your visual ETL job.

1. Connect the transform node to a data source node.

1. Select the node on the diagram to view details about the transform.

1. Under **UUID column name**, enter a name to use for the new column that contains the unique identifiers.
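The 36-character identifier is the standard string form of a UUID (32 hex digits plus 4 hyphens). A quick Python sketch of the effect using the standard `uuid` module (row and column names are illustrative):

```python
import uuid

rows = [{"name": "a"}, {"name": "b"}]

# str(uuid.uuid4()) is the 36-character form of a random UUID.
for row in rows:
    row["row_id"] = str(uuid.uuid4())
```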