

# Column structure recipe steps


Use these column structure recipe steps to modify the column structure of your data.

**Topics**
+ [

# BOOLEAN\$1OPERATION
](recipe-actions.BOOLEAN_OPERATION.md)
+ [

# CASE\$1OPERATION
](recipe-actions.CASE_OPERATION.md)
+ [

# FLAG\$1COLUMN\$1FROM\$1NULL
](recipe-actions.FLAG_COLUMN_FROM_NULL.md)
+ [

# FLAG\$1COLUMN\$1FROM\$1PATTERN
](recipe-actions.FLAG_COLUMN_FROM_PATTERN.md)
+ [

# MERGE
](recipe-actions.MERGE.md)
+ [

# SPLIT\$1COLUMN\$1BETWEEN\$1DELIMITER
](recipe-actions.SPLIT_COLUMN_BETWEEN_DELIMITER.md)
+ [

# SPLIT\$1COLUMN\$1BETWEEN\$1POSITIONS
](recipe-actions.SPLIT_COLUMN_BETWEEN_POSITIONS.md)
+ [

# SPLIT\$1COLUMN\$1FROM\$1END
](recipe-actions.SPLIT_COLUMN_FROM_END.md)
+ [

# SPLIT\$1COLUMN\$1FROM\$1START
](recipe-actions.SPLIT_COLUMN_FROM_START.md)
+ [

# SPLIT\$1COLUMN\$1MULTIPLE\$1DELIMITER
](recipe-actions.SPLIT_COLUMN_MULTIPLE_DELIMITER.md)
+ [

# SPLIT\$1COLUMN\$1SINGLE\$1DELIMITER
](recipe-actions.SPLIT_COLUMN_SINGLE_DELIMITER.md)
+ [

# SPLIT\$1COLUMN\$1WITH\$1INTERVALS
](recipe-actions.SPLIT_COLUMN_WITH_INTERVALS.md)

# BOOLEAN\$1OPERATION


Create a new column, based on the result of logical condition IF. Return true value if the boolean expression is true, false value if the boolean expression is false, or return a custom value.

**Parameters**
+ `trueValueExpression` – Result when the condition is met.
+ `falseValueExpression` – Result when the condition is not met.
+ `valueExpression` – Boolean condition.
+ `withExpressions` – Configuration for aggregate results.
+ `targetColumn` – A name for the newly created column.

You can use constant values, column references, and aggregate results in trueValueExpression, falseValueExpression and valueExpression.

**Example: Constant values**  
Values that remain unchanged, like a number or a sentence.  
  

```
{
  "RecipeStep": {
    "Action": {
      "Operation": "BOOLEAN_OPERATION",
      "Parameters": {
        "trueValueExpression": "It is true.",
        "falseValueExpression": "It is false.",
        "valueExpression": "`column.1` < 2000",
        "targetColumn": "result.column"
      }
    }
  }
}
```

**Example: Column references**  
Values that are columns in the dataset.  
  

```
{
  "RecipeStep": {
    "Action": {
      "Operation": "BOOLEAN_OPERATION",
      "Parameters": {
        "trueValueExpression": "`column.2`",
        "falseValueExpression": "`column.3`",
        "valueExpression": "`column.1` < `column.4`",
        "targetColumn": "result.column"
      }
    }
  }
}
```

**Example: Aggregate results**  
Values that are calculated by aggregate functions. An aggregate function performs a calculation on a column, and returns a single value.  
  

```
{
  "RecipeStep": {
    "Action": {
      "Operation": "BOOLEAN_OPERATION",
      "Parameters": {
        "trueValueExpression": "`:mincolumn.2`",
        "falseValueExpression": "`:maxcolumn.3`",
        "valueExpression": "`column.1` < `:avgcolumn.4`",
        "withExpressions": "[{\"name\":\"mincolumn.2\",\"value\":\"min(`column.2`)\",\"type\":\"aggregate\"},{\"name\":\"maxcolumn.3\",\"value\":\"max(`column.3`)\",\"type\":\"aggregate\"},{\"name\":\"avgcolumn.4\",\"value\":\"avg(`column.4`)\",\"type\":\"aggregate\"}]",
        "targetColumn": "result.column"
      }
    }
  }
}
```
Users need to convert the JSON to a string by escaping.  
Note that the parameter names in trueValueExpression, falseValueExpression, and valueExpression must match the names in withExpressions. To use the aggregate results from some columns, you need to create parameters for them and provide the aggregate functions.

**Example:**  

```
{
  "RecipeStep": {
    "Action": {
      "Operation": "BOOLEAN_OPERATION",
      "Parameters": {
        "trueValueExpression": "It is true.",
        "falseValueExpression": "It is false.",
        "valueExpression": "`column.1` < 2000",
        "targetColumn": "result.column"
      }
    }
  }
}
```

**Example: and/or**  
You can use and and or to combine multiple conditions.  
  

```
{
  "RecipeStep": {
    "Action": {
      "Operation": "BOOLEAN_OPERATION",
      "Parameters": {
        "trueValueExpression": "It is true.",
        "falseValueExpression": "It is false.",
        "valueExpression": "`column.1` < 2000 and `column.2` >= `column.3",
        "targetColumn": "result.column"
      }
    }
  }
}
{
  "RecipeStep": {
    "Action": {
      "Operation": "BOOLEAN_OPERATION",
      "Parameters": {
        "trueValueExpression": "`column.4`",
        "falseValueExpression": "`column.5`",
        "valueExpression": "startsWith(`column1`, 'value1') or endsWith(`column2`, 'value2')",
        "targetColumn": "result.column"
      }
    }
  }
}
```

## Valid aggregate functions


The table below shows all of the valid aggregate functions that can be used in a boolean operation.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/databrew/latest/dg/recipe-actions.BOOLEAN_OPERATION.html)

## Valid conditions in a valueExpression


The table below shows supported conditions and the value expressions you can use.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/databrew/latest/dg/recipe-actions.BOOLEAN_OPERATION.html)

# CASE\$1OPERATION


Create a new column, based on the result of logical condition CASE. The case operation goes through case conditions and returns a value when the first condition is met. Once a condition is true, the operation stops reading and returns the result. If no conditions are true, it returns the default value.

**Parameters**
+ `valueExpression` – Conditions.
+ `withExpressions` – Configuration for aggregate results.
+ `targetColumn` – Name for the newly created column.

**Example**  
  

```
{
  "RecipeStep": {
    "Action": {
      "Operation": "CASE_OPERATION",
      "Parameters": {
        "valueExpression": "case when `column11` < `column.2` then 'result1' when `column2` < 'value2' then 'result2' else 'high' end",
        "targetColumn": "result.column"
      }
    }
  }
}
```

## Valid aggregate functions


The table below shows all of the valid aggregate functions that can be used in a case operation.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/databrew/latest/dg/recipe-actions.CASE_OPERATION.html)

## Valid conditions in a valueExpression


The table below shows supported conditions and the value expressions you can use.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/databrew/latest/dg/recipe-actions.CASE_OPERATION.html)

# FLAG\$1COLUMN\$1FROM\$1NULL


Creates a new column, based on the presence of null values in an existing column.

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `targetColumn` – The name of a new column to be created.
+ `flagType` – A value that must be set to `Null values`.
+ `trueString` – A value for the new column, if a null value is found in the source. If no value is specified, the default is `True`.
+ `falseString` – A value for the new column, if a non-null value is found in the source. If no value is specified, the default is `False`.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "FLAG_COLUMN_FROM_NULL",
        "Parameters": {
            "flagType": "Null values",
            "sourceColumn": "weight_kg",
            "targetColumn": "is_weight_kg_missing"
        }
    }
}
```

# FLAG\$1COLUMN\$1FROM\$1PATTERN


Creates a new column, based on the presence of a user-specified pattern in an existing column.

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `targetColumn` – The name of a new column to be created.
+ `flagType` – A value that must be set to `Pattern`.
+ `pattern` – A regular expression, indicating the pattern to be evaluated.
+ `trueString` – A value for the new column, if a null value is found in the source. If no value is specified, the default is `True`.
+ `falseString` – A value for the new column, if a non-null value is found in the source. If no value is specified, the default is `False`.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "FLAG_COLUMN_FROM_PATTERN",
        "Parameters": {
            "falseString": "No",
            "flagType": "Pattern",
            "pattern": "N.*",
            "sourceColumn": "wind_direction",
            "targetColumn": "northerly",
            "trueString": "yes"
        }
    }
}
```

# MERGE


Merges two or more columns into a new column.

**Parameters**
+ `sourceColumns` – A JSON-encoded string representing a list of one or more columns to be merged.
+ `delimiter` – An optional separator between the values, to appear in the target column.
+ `targetColumn` – The name of the merged column to be created.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "MERGE",
        "Parameters": {
            "delimiter": " ",
            "sourceColumns": "[\"first_name\",\"last_name\"]",
            "targetColumn": "Merged Column 1"
        }
    }
}
```

# SPLIT\$1COLUMN\$1BETWEEN\$1DELIMITER


Splits a column into three new columns, according to a beginning and ending delimiter.

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `patternOption1` – A JSON-encoded string representing one or more characters that indicate the first delimiter.
+ `patternOption2` – A JSON-encoded string representing one or more characters that indicate the second delimiter.
+ `pattern` – One or more characters to use as a separator, when splitting the data.
+ `includeInSplit` – If true, includes the pattern in the new column; otherwise, the pattern is discarded.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "SPLIT_COLUMN_BETWEEN_DELIMITER",
        "Parameters": {
            "patternOption1": "{\"pattern\":\"H\",\"includeInSplit\":true}",
            "patternOption2": "{\"pattern\":\"M\",\"includeInSplit\":true}",
            "sourceColumn": "last_name"
        }
    }
}
```

# SPLIT\$1COLUMN\$1BETWEEN\$1POSITIONS


Splits a column into three new columns, according to offsets that you specify.

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `startPosition` – The character position where the split is to begin.
+ `endPosition` – The character position where the split is to end.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "SPLIT_COLUMN_BETWEEN_POSITIONS",
        "Parameters": {
            "endPosition": "12",
            "sourceColumn": "last_name",
            "startPosition": "2"
        }
    }
}
```

# SPLIT\$1COLUMN\$1FROM\$1END


Splits a column into two new columns, at an offset from the end of the string.

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `position` – The character position, from the right end of the string, where the split is to occur.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "SPLIT_COLUMN_FROM_END",
        "Parameters": {
            "position": "1",
            "sourceColumn": "nationality"
        }
    }
}
```

# SPLIT\$1COLUMN\$1FROM\$1START


Splits a column into two new columns, at an offset from the beginning of the string.

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `position` – The character position, from the left end of the string, where the split is to occur.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "SPLIT_COLUMN_FROM_START",
        "Parameters": {
            "position": "1",
            "sourceColumn": "first_name"
        }
    }
}
```

# SPLIT\$1COLUMN\$1MULTIPLE\$1DELIMITER


Splits a column according to multiple delimiters.

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `patternOptions` – A JSON-encoded string representing one or more patterns that determine the split criteria.
+ `pattern` – One or more characters to use as a separator, when splitting the data.
+ `limit` – How many splits to perform. The minimum is 1; the maximum is 20.
+ `includeInSplit` – If true, includes the pattern in the new column; otherwise, the pattern is discarded.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "SPLIT_COLUMN_MULTIPLE_DELIMITER",
        "Parameters": {
            "limit": "1",
            "patternOptions": "[{\"pattern\":\",\",\"includeInSplit\":true},{\"pattern\":\" \",\"includeInSplit\":true}]",
            "sourceColumn": "description"
        }
    }
}
```

# SPLIT\$1COLUMN\$1SINGLE\$1DELIMITER


Splits a column into one or more new columns, according to a specific delimiter.

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `pattern` – One or more characters to use as a separator, when splitting the data.
+ `limit` – How many splits to perform. The minimum is 1; the maximum is 20.
+ `includeInSplit` – If true, includes the pattern in the new column; otherwise, the pattern is discarded.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "SPLIT_COLUMN_SINGLE_DELIMITER",
        "Parameters": {
            "includeInSplit": "true",
            "limit": "1",
            "pattern": "/",
            "sourceColumn": "info_url"
        }
    }
}
```

# SPLIT\$1COLUMN\$1WITH\$1INTERVALS


Splits a column at intervals of *n* characters, where you specify *n*.

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `startPosition` – The character position where the split is to begin.
+ `interval` – The number of characters to skip before the next split.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "SPLIT_COLUMN_WITH_INTERVALS",
        "Parameters": {
            "interval": "4",
            "sourceColumn": "nationality",
            "startPosition": "1"
        }
    }
}
```