

# Data cleaning recipe steps


Use these data cleaning recipe steps to perform simple transformations on existing data.

**Topics**
+ [

# CAPITAL\$1CASE
](recipe-actions.CAPITAL_CASE.md)
+ [

# FORMAT\$1DATE
](recipe-actions.FORMAT_DATE.md)
+ [

# LOWER\$1CASE
](recipe-actions.LOWER_CASE.md)
+ [

# UPPER\$1CASE
](recipe-actions.UPPER_CASE.md)
+ [

# SENTENCE\$1CASE
](recipe-actions.SENTENCE_CASE.md)
+ [

# ADD\$1DOUBLE\$1QUOTES
](recipe-actions.ADD_DOUBLE_QUOTES.md)
+ [

# ADD\$1PREFIX
](recipe-actions.ADD_PREFIX.md)
+ [

# ADD\$1SINGLE\$1QUOTES
](recipe-actions.ADD_SINGLE_QUOTES.md)
+ [

# ADD\$1SUFFIX
](recipe-actions.ADD_SUFFIX.md)
+ [

# EXTRACT\$1BETWEEN\$1DELIMITERS
](recipe-actions.EXTRACT_BETWEEN_DELIMITERS.md)
+ [

# EXTRACT\$1BETWEEN\$1POSITIONS
](recipe-actions.EXTRACT_BETWEEN_POSITIONS.md)
+ [

# EXTRACT\$1PATTERN
](recipe-actions.EXTRACT_PATTERN.md)
+ [

# EXTRACT\$1VALUE
](recipe-actions.EXTRACT_VALUE.md)
+ [

# REMOVE\$1COMBINED
](recipe-actions.REMOVE_COMBINED.md)
+ [

# REPLACE\$1BETWEEN\$1DELIMITERS
](recipe-actions.REPLACE_BETWEEN_DELIMITERS.md)
+ [

# REPLACE\$1BETWEEN\$1POSITIONS
](recipe-actions.REPLACE_BETWEEN_POSITIONS.md)
+ [

# REPLACE\$1TEXT
](recipe-actions.REPLACE_TEXT.md)

# CAPITAL\$1CASE


Changes each string in a column to capitalize each word. In *capital case, *the first letter of each word is capitalized and the rest of the word is transformed to lowercase. An example is: The Quick Brown Fox Jumped Over The Fence.

**Parameters**
+ `sourceColumn` – The name of an existing column.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "CAPITAL_CASE",
        "Parameters": {
            "sourceColumn": "last_name"
        }
    }
}
```

# FORMAT\$1DATE


Returns a column in which a date string is converted into a formatted value.

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `targetDateFormat` – One of the following date formats:
  + `mm/dd/yyyy`
  + `mm-dd-yyyy`
  + `dd month yyyy`
  + `month yyyy`
  + `dd month`

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "FORMAT_DATE",
        "Parameters": {
            "sourceColumn": "birth_date",
            "targetDateFormat": "mm-dd-yyyy"
        }
    }
}
```

# LOWER\$1CASE


Changes each string in a column to lowercase, for example: the quick brown fox jumped over the fence

**Parameters**
+ `sourceColumn` – The name of an existing column.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "LOWER_CASE",
        "Parameters": {
            "sourceColumn": "nationality"
        }
    }
}
```

# UPPER\$1CASE


Changes each string in a column to uppercase, for example: THE QUICK BROWN FOX JUMPED OVER THE FENCE

**Parameters**
+ `sourceColumn` – The name of an existing column.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "UPPER_CASE",
        "Parameters": {
            "sourceColumn": "nationality"
        }
    }
}
```

# SENTENCE\$1CASE


Changes each string in a column to sentence case. In *sentence case, *the first letter of each sentence is capitalized, and the rest of the sentence is transformed to lowercase. An example is: The quick brown fox. Jumped over. The fence

**Parameters**
+ `sourceColumn` – The name of an existing column.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "SENTENCE_CASE",
        "Parameters": {
            "sourceColumn": "description"
        }
    }
}
```

# ADD\$1DOUBLE\$1QUOTES


Encloses the characters in a column with double quotation marks.

**Parameters**
+ `sourceColumn` – The name of an existing column.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "ADD_DOUBLE_QUOTES",
        "Parameters": {
            "sourceColumn": "info_url"
        }
    }
}
```

# ADD\$1PREFIX


Adds one or more characters, concatenating them as a prefix to the beginning of a column.

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `pattern` – The character or characters to place at the beginning of the column values.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "ADD_PREFIX",
        "Parameters": {
            "pattern": "aaa",
            "sourceColumn": "info_url"
        }
    }
}
```

# ADD\$1SINGLE\$1QUOTES


Encloses the characters in a column with single quotation marks.

**Parameters**
+ `sourceColumn` – The name of an existing column.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "ADD_SINGLE_QUOTES",
        "Parameters": {
            "sourceColumn": "info_url"
        }
    }
}
```

# ADD\$1SUFFIX


Adds one more characters concatenating them as a suffix to the end of a column.

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `pattern` – The character or characters to place at the end of the column.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "ADD_SUFFIX",
        "Parameters": {
            "pattern": "bbb",
            "sourceColumn": "info_url"
        }
    }
}
```

# EXTRACT\$1BETWEEN\$1DELIMITERS


Creates a new column, based on delimiters, from the values in an existing column. 

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `targetColumn` – The name of the new column to be created.
+ `startPattern` – A regular expression, indicating the character or characters that begin the delimited values.
+ `endPattern` – A regular expression, indicating the delimiter character or characters that end the delimited values.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "EXTRACT_BETWEEN_DELIMITERS",
        "Parameters": {
            "endPattern": "\\/",
            "sourceColumn": "info_url",
            "startPattern": "\\/\\/",
            "targetColumn": "raw_url"
        }
    }
}
```

# EXTRACT\$1BETWEEN\$1POSITIONS


Creates a new column, based on character positions, from the values in an existing column. 

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `targetColumn` – The name of the new column to be created.
+ `startPosition` – The character position at which to perform the extract.
+ `endPosition` – The character position at which to end the extract.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "EXTRACT_BETWEEN_POSITIONS",
        "Parameters": {
            "endPosition": "9",
            "sourceColumn": "last_name",
            "startPosition": "3",
            "targetColumn": "characters_3_to_9"
        }
    }
}
```

# EXTRACT\$1PATTERN


Creates a new column, based on a regular expression, from the values in an existing column. 

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `targetColumn` – The name of the new column to be created.
+ `pattern` – A regular expression that indicates which character or characters to extract and create the new column from.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "EXTRACT_PATTERN",
        "Parameters": {
            "pattern": "^....*...$",
            "sourceColumn": "last_name",
            "targetColumn": "first_and_last_few_characters"
        }
    }
}
```

# EXTRACT\$1VALUE


Creates a new column with an extracted value from a user-specified path. If the source column is of the Map, Array, or Struct type, each field in the path should be escaped using back ticks (for example, `name`).

**Parameters**
+ `targetColumn` – The name of the target column.
+ `sourceColumn` – Name of the source column from which the value is to be extracted.
+ `path` – The path to the specific key that the user wants to extract. If the source column is of the Map, Array, or Struct type, each field in the path should be escaped using back ticks (for example, `name`).

  Consider the following example of user information:

  ```
                     user {
                        name: “Ammy” 
                        address: {
                           state: "CA",
                           zipcode: 12345
                        },
                        phoneNumber:{"home": "123123123", "work": "456456456"}
                        citizenship: ["Canada", "USA", "Mexico", "India"]
                     }
  ```

  The following are examples of the paths you would provide, depending on the type of the source column:
  + If the source column is of the type **map**, the path for extracting the home phone number is:

    ``user`.`phoneNumber`.`home``
  + If the source column is of the type **array**, the path for extracting the second "citizenship" value is:

    ``user`.`citizenship`[1]`
  + If the source column is of the type **struct**, the path for extracting the zip code is:

    ``user`.`address`.`zipcode``



**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "EXTRACT_VALUE",
        "Parameters": {
            "sourceColumn": "age",
            "targetColumn": "columnName",
            "path": "`age`.`name`",
        }
    }
}
```

# REMOVE\$1COMBINED


Removes one or more characters from a column, according to what a user specifies.

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `collapseConsecutiveWhitespace` – If `true`, replaces two or more white-space characters with exactly one white-space character. 
+ `removeAllPunctuation` – If `true`, removes all of the following characters: `. ! , ?`
+ `removeAllQuotes` – If `true`, removes all single quotation marks and double quotation marks.
+ `removeAllWhitespace` – If `true`, removes all white-space characters.
+ `customCharacters` – One or more characters that can be acted upon.
+ `customValue` – A value that can be acted upon.
+ `removeCustomCharacters` – If `true`, removes all characters specified by `customCharacters` parameter.
+ `removeCustomValue` – If `true`, removes all characters specified by `customValue` parameter.
+ `punctuationally` – If `true`, removes the following characters if they occur at the start or end of the value:`. ! , ?`
+ `antidisestablishmentarianism` – If `true`, removes single quotation marks and double quotation marks from the beginning and end of the value.
+ `removeLeadingAndTrailingWhitespace` – If `true`, removes all white spaces from the beginning and end of the value.
+ `removeLetters` – If `true`, removes all uppercase and lowercase alphabetic characters (`A` through `Z`; `a` through `z`).
+ `removeNumbers` – If `true`, removes all numeric characters (`0` through `9`).
+ `removeSpecialCharacters` – If `true`, removes all of the following characters: `! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~`

**Examples**  
  

```
{
    "RecipeAction": {
        "Operation": "REMOVE_COMBINED",
        "Parameters": {
            "collapseConsecutiveWhitespace": "false",
            "removeAllPunctuation": "false",
            "removeAllQuotes": "false",
            "removeAllWhitespace": "false",
            "removeCustomCharacters": "false",
            "removeCustomValue": "false",
            "removeLeadingAndTrailingPunctuation": "false",
            "removeLeadingAndTrailingQuotes": "false",
            "removeLeadingAndTrailingWhitespace": "false",
            "removeLetters": "false",
            "removeNumbers": "false",
            "removeSpecialCharacters": "true",
            "sourceColumn": "info_url"
        }
    }
}
```

```
{
    "RecipeAction": {
        "Operation": "REMOVE_COMBINED",
        "Parameters": {
            "collapseConsecutiveWhitespace": "false",
            "customCharacters": "¶",
            "removeAllPunctuation": "false",
            "removeAllQuotes": "false",
            "removeAllWhitespace": "false",
            "removeCustomCharacters": "true",
            "removeCustomValue": "false",
            "removeLeadingAndTrailingPunctuation": "false",
            "removeLeadingAndTrailingQuotes": "false",
            "removeLeadingAndTrailingWhitespace": "false",
            "removeLetters": "false",
            "removeNumbers": "false",
            "removeSpecialCharacters": "false",
            "sourceColumn": "info_url"
        }
    }
}
```

```
{
    "RecipeAction": {
        "Operation": "REMOVE_COMBINED",
        "Parameters": {
            "collapseConsecutiveWhitespace": "true",
            "customValue": "M",
            "removeAllPunctuation": "true",
            "removeAllQuotes": "false",
            "removeAllWhitespace": "false",
            "removeCustomCharacters": "false",
            "removeCustomValue": "true",
            "removeLeadingAndTrailingPunctuation": "false",
            "removeLeadingAndTrailingQuotes": "true",
            "removeLeadingAndTrailingWhitespace": "true",
            "removeLetters": "true",
            "removeNumbers": "true",
            "removeSpecialCharacters": "false",
            "sourceColumn": "info_url"
        }
    }
}
```

```
{
    "RecipeAction": {
        "Operation": "REMOVE_COMBINED",
        "Parameters": {
            "collapseConsecutiveWhitespace": "false",
            "removeAllPunctuation": "false",
            "removeAllQuotes": "false",
            "removeAllWhitespace": "false",
            "removeCustomCharacters": "false",
            "removeCustomValue": "false",
            "removeLeadingAndTrailingPunctuation": "false",
            "removeLeadingAndTrailingQuotes": "false",
            "removeLeadingAndTrailingWhitespace": "false",
            "removeLetters": "false",
            "removeNumbers": "true",
            "removeSpecialCharacters": "false",
            "sourceColumn": "first_name"
        }
    }
}
```

```
{
    "RecipeAction": {
        "Operation": "REMOVE_COMBINED",
        "Parameters": {
            "collapseConsecutiveWhitespace": "false",
            "removeAllPunctuation": "false",
            "removeAllQuotes": "false",
            "removeAllWhitespace": "false",
            "removeCustomCharacters": "false",
            "removeCustomValue": "false",
            "removeLeadingAndTrailingPunctuation": "false",
            "removeLeadingAndTrailingQuotes": "false",
            "removeLeadingAndTrailingWhitespace": "false",
            "removeLetters": "false",
            "removeNumbers": "true",
            "removeSpecialCharacters": "false",
            "sourceColumn": "first_name"
        }
    }
}
```

# REPLACE\$1BETWEEN\$1DELIMITERS


Replaces the characters between two delimiters with user-specified text.

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `startPattern` – Character or characters or a regular expression, indicating where the substitution is to begin.
+ `endPattern` – Character or characters or a regular expression, indicating where the substitution is to end.
+ `value` – The replacement character or characters to be substituted.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "REPLACE_BETWEEN_DELIMITERS",
        "Parameters": {
            "endPattern": ">",
            "sourceColumn": "last_name",
            "startPattern": "&lt;",
            "value": "?"
        }
    }
}
```

# REPLACE\$1BETWEEN\$1POSITIONS


Replaces the characters between two positions with user-specified text.

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `startPosition` – A number indicting at what character position in the string the substitution is to begin.
+ `endPosition` – A number indicting at what character position in the string the substitution is to end.
+ `value` – The replacement character or characters to be substituted.

**Example**  
  

```
{
    "RecipeAction": {
        "Operation": "REPLACE_BETWEEN_POSITIONS",
        "Parameters": {
            "endPosition": "20",
            "sourceColumn": "nationality",
            "startPosition": "10",
            "value": "E"
        }
    }
}
```

# REPLACE\$1TEXT


Replaces a specified sequence of characters with another.

**Parameters**
+ `sourceColumn` – The name of an existing column.
+ `pattern` – Character or characters or a regular expression, indicating which characters should be replaced in the source column.
+ `value` – The replacement character or characters to be substituted.

**Examples**  
  

```
{
    "RecipeAction": {
        "Operation": "REPLACE_TEXT",
        "Parameters": {
            "pattern": "x",
            "sourceColumn": "first_name",
            "value": "a"
        }
    }
}
```

```
{
    "RecipeAction": {
        "Operation": "REPLACE_TEXT",
        "Parameters": {
            "pattern": "[0-9]",
            "sourceColumn": "nationality",
            "value": "!"
        }
    }
}
```