

# Work with CSV data in AWS Glue
<a name="schema-csv"></a>

This page describes how to use AWS Glue to create a schema from CSV files that enclose data values in quotes, or from CSV files that include header rows.

## Handling CSV data enclosed in quotes
<a name="schema-csv-quotes"></a>

Suppose a CSV file has data fields enclosed in double quotes, as in the following example.

```
"John","Doe","123-555-1231","John said \"hello\""
"Jane","Doe","123-555-9876","Jane said \"hello\""
```
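Before changing the table properties, you can preview locally how such rows split into fields. Python's built-in `csv` module supports the same quote and escape characters that you configure for the OpenCSVSerDe, so the following sketch (sample data taken from the rows above) is one way to check the expected parsing:

```python
import csv
import io

# Sample rows matching the example above: fields enclosed in double
# quotes, embedded quotes escaped with a backslash.
sample = (
    '"John","Doe","123-555-1231","John said \\"hello\\""\n'
    '"Jane","Doe","123-555-9876","Jane said \\"hello\\""\n'
)

# quotechar and escapechar mirror the SerDe parameters set below;
# doublequote=False tells the reader not to treat "" as an escaped quote.
reader = csv.reader(
    io.StringIO(sample), quotechar='"', escapechar="\\", doublequote=False
)
rows = list(reader)
print(rows[0][3])  # John said "hello"
```

Each row parses into four fields, with the escaped quotes preserved inside the last value.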

To run a query in Athena on a table created from a CSV file that has quoted values, you must modify the table properties in AWS Glue to use the OpenCSVSerDe. For more information about the OpenCSVSerDe, see [Open CSV SerDe for processing CSV](csv-serde.md).

**To edit table properties in the AWS Glue console**

1. In the AWS Glue console navigation pane, choose **Tables**.

1. Choose the link for the table that you want to edit, and then choose **Actions**, **Edit table**.

1. On the **Edit table** page, make the following changes:
   + For **Serialization lib**, enter `org.apache.hadoop.hive.serde2.OpenCSVSerde`.
   + For **Serde parameters**, enter the following values for the keys `escapeChar`, `quoteChar`, and `separatorChar`: 
     + For `escapeChar`, enter a backslash (**\\**).
     + For `quoteChar`, enter a double quote (**"**).
     + For `separatorChar`, enter a comma (**,**).

1. Choose **Save**.

For more information, see [Viewing and editing table details](https://docs.aws.amazon.com/glue/latest/dg/console-tables.html#console-tables-details) in the *AWS Glue Developer Guide*.

You can also update AWS Glue table properties programmatically. Use the AWS Glue [UpdateTable](https://docs.aws.amazon.com/glue/latest/webapi/API_UpdateTable.html) API operation or the [update-table](https://docs.aws.amazon.com/cli/latest/reference/glue/update-table.html) AWS CLI command to modify the `SerDeInfo` block in the table definition, as in the following JSON example.

```
"SerDeInfo": {
   "name": "",
   "serializationLib": "org.apache.hadoop.hive.serde2.OpenCSVSerde",
   "parameters": {
      "separatorChar": ",",
      "quoteChar": "\"",
      "escapeChar": "\\"
   }
},
```
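As a sketch of the programmatic route using the AWS SDK for Python (the database and table names below are hypothetical), you can fetch the existing table definition, swap in the OpenCSVSerDe, and pass the result back to `UpdateTable`. Note that the boto3 API uses CamelCase keys (`SerdeInfo`, `SerializationLibrary`, `Parameters`), and the exact set of read-only fields to strip before calling `update_table` can vary by SDK version:

```python
# SerDe settings matching the console steps above, in boto3's key style.
serde_info = {
    "SerializationLibrary": "org.apache.hadoop.hive.serde2.OpenCSVSerde",
    "Parameters": {
        "separatorChar": ",",
        "quoteChar": '"',
        "escapeChar": "\\",
    },
}

def update_serde(database: str, table: str) -> None:
    """Fetch the current table definition, set the OpenCSVSerDe,
    and write the definition back with UpdateTable."""
    import boto3  # requires the AWS SDK for Python (boto3)

    glue = boto3.client("glue")
    table_input = glue.get_table(DatabaseName=database, Name=table)["Table"]
    # get_table returns read-only fields that UpdateTable rejects; drop them.
    for key in ("DatabaseName", "CreateTime", "UpdateTime", "CreatedBy",
                "IsRegisteredWithLakeFormation", "CatalogId", "VersionId"):
        table_input.pop(key, None)
    table_input["StorageDescriptor"]["SerdeInfo"] = serde_info
    glue.update_table(DatabaseName=database, TableInput=table_input)

# Example (hypothetical names):
# update_serde("default", "my_csv_table")
```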

## Handling CSV files with headers
<a name="schema-csv-headers"></a>

When you define a table in Athena with a `CREATE TABLE` statement, you can use the `skip.header.line.count` table property to ignore headers in your CSV data, as in the following example.

```
...
STORED AS TEXTFILE
LOCATION 's3://amzn-s3-demo-bucket/csvdata_folder/'
TBLPROPERTIES ("skip.header.line.count"="1");
```
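Conceptually, `"skip.header.line.count"="1"` tells Athena to discard the first line of each file before parsing the remaining lines as data. The following minimal local illustration (the sample data is made up) shows the same effect:

```python
import csv
import io

sample = (
    "first_name,last_name,phone\n"
    "John,Doe,123-555-1231\n"
    "Jane,Doe,123-555-9876\n"
)

SKIP = 1  # analogous to "skip.header.line.count"="1"
lines = io.StringIO(sample)
for _ in range(SKIP):
    next(lines)  # discard the header line(s) before parsing

rows = list(csv.reader(lines))
print(rows)  # [['John', 'Doe', '123-555-1231'], ['Jane', 'Doe', '123-555-9876']]
```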

Alternatively, you can remove the CSV headers beforehand so that the header information is not included in Athena query results. One way to achieve this is to use AWS Glue jobs, which perform extract, transform, and load (ETL) work. You can write scripts in AWS Glue using a language that is an extension of the PySpark Python dialect. For more information, see [Authoring Jobs in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/author-job-glue.html) in the *AWS Glue Developer Guide*.

The following example shows a function in an AWS Glue script that writes out a dynamic frame using `from_options` and sets the `writeHeader` format option to `False`, which removes the header information:

```
glueContext.write_dynamic_frame.from_options(
    frame = applymapping1,
    connection_type = "s3",
    connection_options = {"path": "s3://amzn-s3-demo-bucket/MYTABLEDATA/"},
    format = "csv",
    format_options = {"writeHeader": False},
    transformation_ctx = "datasink2",
)
```