

# Use path extractors
<a name="ion-serde-using-path-extractors"></a>

Amazon Ion is a document style file format, but Apache Hive is a flat columnar format. You can use special Amazon Ion SerDe properties called `path extractors` to map between the two formats. Path extractors flatten the hierarchical Amazon Ion format, map Amazon Ion values to Hive columns, and can be used to rename fields.

Athena can generate the extractors for you, but you can also define your own extractors if necessary.

**Topics**
+ [Use Athena generated path extractors](ion-serde-generated-path-extractors.md)
+ [Specify your own path extractors](ion-serde-specifying-your-own-path-extractors.md)
+ [Use search paths in path extractors](ion-serde-using-search-paths-in-path-extractors.md)
+ [Path extractor examples](ion-serde-examples.md)

# Use Athena generated path extractors
<a name="ion-serde-generated-path-extractors"></a>

By default, Athena searches for top level Amazon Ion values that match Hive column names and creates path extractors at runtime based on these matching values. If your Amazon Ion data format matches the Hive table schema, Athena dynamically generates the extractors for you, and you do not need to add any additional path extractors. These default path extractors are not stored in the table metadata.

The following example shows how Athena generates extractors based on column name.

```
-- Example Amazon Ion Document
{
    identification: {
        name: "John Smith",
        driver_license: "XXXX"
    },
    
    alias: "Johnny"    
}

-- Example DDL
CREATE EXTERNAL TABLE example_schema2 (
    identification MAP<STRING, STRING>,
    alias STRING
)
STORED AS ION
LOCATION 's3://amzn-s3-demo-bucket/path_extraction1/'
```

The following example extractors are generated by Athena. The first extracts the `identification` field to the `identification` column, and the second extracts the `alias` field to the `alias` column.

```
'ion.identification.path_extractor' = '(identification)'
'ion.alias.path_extractor' = '(alias)'
```

The following example shows the extracted table.

```
|                  identification                    |  alias   |
|----------------------------------------------------|----------|
|{["name", "driver_license"],["John Smith", "XXXX"]} | "Johnny" |
```

# Specify your own path extractors
<a name="ion-serde-specifying-your-own-path-extractors"></a>

If your Amazon Ion fields do not map neatly to Hive columns, you can specify your own path extractors. In the `WITH SERDEPROPERTIES` clause of your `CREATE TABLE` statement, use the following syntax.

```
WITH SERDEPROPERTIES (
   "ion.path_extractor.case_sensitive" = "<Boolean>", 
   "ion.<column_name>.path_extractor" = "<path_extractor_expression>"
)
```

**Note**  
By default, path extractors are case insensitive. To override this setting, set the [ion.path_extractor.case_sensitive](ion-serde-using-ion-serde-properties.md#ioncase) SerDe property to `true`.

# Use search paths in path extractors
<a name="ion-serde-using-search-paths-in-path-extractors"></a>

The SerDe property syntax for path extractor contains a *<path\$1extractor\$1expression>*:

```
"ion.<column_name>.path_extractor" = "<path_extractor_expression>"         
```

You can use the *<path\$1extractor\$1expression>* to specify a search path that parses the Amazon Ion document and finds matching data. The search path is enclosed in parenthesis and can contain one or more of the following components separated by spaces.
+ **Wild card** – Matches all values.
+ **Index** – Matches the value at the specified numerical index. Indices are zero-based.
+ **Text** – Matches all values whose field names match are equivalent to the specified text.
+ **Annotations** – Matches values specified by a wrapped path component that has the annotations specified.

The following example shows an Amazon Ion document and some example search paths.

```
-- Amazon Ion document
{
    foo: ["foo1", "foo2"] ,
    bar: "myBarValue", 
    bar: A::"annotatedValue"
}

-- Example search paths
(foo 0)       # matches "foo1"
(1)           # matches "myBarValue"
(*)           # matches ["foo1", "foo2"], "myBarValue" and A::"annotatedValue"
()            # matches {foo: ["foo1", "foo2"] , bar: "myBarValue", bar: A::"annotatedValue"}
(bar)         # matches "myBarValue" and A::"annotatedValue"
(A::bar)      # matches A::"annotatedValue"
```

# Path extractor examples
<a name="ion-serde-examples"></a>

The following path extractor examples show how to flatten and rename fields or extract data as Amazon Ion text.

## Flatten and rename fields
<a name="ion-serde-flattening-and-renaming-fields"></a>

The following example shows a set of search paths that flatten and rename fields. The example uses search paths to do the following:
+ Map the `nickname` column to the `alias` field
+ Map the `name` column to the `name` subfield located in the `identification` struct.

Following is the example Amazon Ion document.

```
-- Example Amazon Ion Document
{
    identification: {
        name: "John Smith",
        driver_license: "XXXX"
    },
    
    alias: "Johnny"    
}
```

The following is the example `CREATE TABLE` statement that defines the path extractors.

```
-- Example DDL Query
CREATE EXTERNAL TABLE example_schema2 (
    name STRING,
    nickname STRING
)
ROW FORMAT SERDE
 'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES (
 'ion.nickname.path_extractor' = '(alias)',
 'ion.name.path_extractor' = '(identification name)'
 )
STORED AS ION
LOCATION 's3://amzn-s3-demo-bucket/path_extraction2/'
```

The following example shows the extracted data.

```
-- Extracted Table
| name         |   nickname   |
|--------------|--------------|
| "John Smith" |  "Johnny"    |
```

For more information about search paths and additional search path examples, see the [Ion Java Path Extraction](https://github.com/amzn/ion-java-path-extraction) page on GitHub.

## Extract flight data to text format
<a name="ion-serde-extracting-flight-data-to-text-format"></a>

The following example `CREATE TABLE` query uses `WITH SERDEPROPERTIES` to add path extractors to extract flight data and specify the output encoding as Amazon Ion text. The example uses the `STORED AS ION` syntax.

```
CREATE EXTERNAL TABLE flights_ion (
    yr INT,
    quarter INT,
    month INT,
    dayofmonth INT,
    dayofweek INT,
    flightdate STRING,
    uniquecarrier STRING,
    airlineid INT,
)
ROW FORMAT SERDE
 'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES (
 'ion.encoding' = 'TEXT',
 'ion.yr.path_extractor'='(year)',
 'ion.quarter.path_extractor'='(results quarter)',
 'ion.month.path_extractor'='(date month)')
STORED AS ION
LOCATION 's3://amzn-s3-demo-bucket/'
```