

# Working with CQL queries in Amazon Keyspaces

This section gives an introduction to working with queries in Amazon Keyspaces (for Apache Cassandra). The CQL statements available to query, transform, and manage data are `SELECT`, `INSERT`, `UPDATE`, and `DELETE`. The following topics outline some of the more complex options available when working with queries. For the complete language syntax with examples, see [DML statements (data manipulation language) in Amazon Keyspaces](cql.dml.md).

**Topics**
+ [Use the `IN` operator with the `SELECT` statement in a query in Amazon Keyspaces](in.select.md)
+ [Use batch statements in Amazon Keyspaces](batchStatements.md)
+ [Order results with `ORDER BY` in Amazon Keyspaces](ordering-results.md)
+ [Paginate results in Amazon Keyspaces](paginating-results.md)

# Use the `IN` operator with the `SELECT` statement in a query in Amazon Keyspaces

You can query data from tables using the `SELECT` statement, which reads one or more columns for one or more rows in a table and returns a result-set containing the rows matching the request. A `SELECT` statement contains a `select_clause` that determines which columns to read and to return in the result-set. The clause can contain instructions to transform the data before returning it. The optional `WHERE` clause specifies which rows must be queried and is composed of relations on the columns that are part of the primary key. Amazon Keyspaces supports the `IN` keyword in the `WHERE` clause. This section uses examples to show how Amazon Keyspaces processes `SELECT` statements with the `IN` keyword.

The following example demonstrates how Amazon Keyspaces breaks down a `SELECT` statement with the `IN` keyword into *subqueries*. In this example, we use a table named `my_keyspace.customers`. The table has one partition key column, `department_id`, two clustering columns, `sales_region_id` and `sales_representative_id`, and one regular column, `customer_name`, which contains the name of the customer.

```
SELECT * FROM my_keyspace.customers;

         department_id | sales_region_id | sales_representative_id | customer_name
        ---------------+-----------------+-------------------------+--------------
          0            |        0        |            0            |    a
          0            |        0        |            1            |    b
          0            |        1        |            0            |    c
          0            |        1        |            1            |    d
          1            |        0        |            0            |    e
          1            |        0        |            1            |    f
          1            |        1        |            0            |    g
          1            |        1        |            1            |    h
```

Using this table, you can run a `SELECT` statement with the `IN` keyword in the `WHERE` clause to find the customers in the departments and sales regions that you're interested in, as the following example shows.

```
SELECT * FROM my_keyspace.customers WHERE department_id IN (0, 1) AND sales_region_id IN (0, 1);
```

Amazon Keyspaces divides this statement into four subqueries as shown in the following output.

```
SELECT * FROM my_keyspace.customers WHERE department_id = 0 AND sales_region_id = 0;

 department_id | sales_region_id | sales_representative_id | customer_name
---------------+-----------------+-------------------------+--------------
  0            |        0        |           0             |    a
  0            |        0        |           1             |    b

SELECT * FROM my_keyspace.customers WHERE department_id = 0 AND sales_region_id = 1;

 department_id | sales_region_id | sales_representative_id | customer_name
---------------+-----------------+-------------------------+--------------
  0            |        1        |          0              |    c
  0            |        1        |          1              |    d

SELECT * FROM my_keyspace.customers WHERE department_id = 1 AND sales_region_id = 0;

 department_id | sales_region_id | sales_representative_id | customer_name
---------------+-----------------+-------------------------+--------------
  1            |        0        |          0              |    e
  1            |        0        |          1              |    f

SELECT * FROM my_keyspace.customers WHERE department_id = 1 AND sales_region_id = 1;

 department_id | sales_region_id | sales_representative_id | customer_name
---------------+-----------------+-------------------------+--------------
  1            |        1        |           0             |    g
  1            |        1        |           1             |    h
```
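The decomposition above can be sketched in Python: the subqueries are the Cartesian product of the `IN` values, taken in the order they appear in the statement. This is an illustrative sketch of how the statement expands, not service code; the table and column names come from the example above.

```python
from itertools import product

def expand_in_subqueries(in_terms: dict[str, list]) -> list[str]:
    """Enumerate the single-value subqueries that a SELECT with IN
    expands into. `in_terms` maps each key column to the values listed
    in its IN clause, in the order they appear in the statement."""
    columns = list(in_terms)
    subqueries = []
    for combo in product(*in_terms.values()):
        where = " AND ".join(
            f"{col} = {val}" for col, val in zip(columns, combo)
        )
        subqueries.append(f"SELECT * FROM my_keyspace.customers WHERE {where};")
    return subqueries

# The example query: department_id IN (0, 1) AND sales_region_id IN (0, 1)
queries = expand_in_subqueries({"department_id": [0, 1], "sales_region_id": [0, 1]})
for q in queries:
    print(q)  # four subqueries, in the order Amazon Keyspaces processes them
```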

When the `IN` keyword is used, Amazon Keyspaces automatically paginates the results in any of the following cases:
+ After every 10th subquery is processed.
+ After processing 1 MB of logical IO.
+ If you configured a `PAGE SIZE`, after reading the number of rows specified by the `PAGE SIZE`.
+ If you use the `LIMIT` keyword to reduce the number of rows returned, after reading the number of rows specified by the `LIMIT`.

For more information about pagination, see [Paginate results in Amazon Keyspaces](paginating-results.md).

The following table is used to illustrate pagination with an example.

```
SELECT * FROM my_keyspace.customers;

         department_id | sales_region_id | sales_representative_id | customer_name
        ---------------+-----------------+-------------------------+--------------
          2            |        0        |          0              |    g
          2            |        1        |          1              |    h
          2            |        2        |          2              |    i
          0            |        0        |          0              |    a
          0            |        1        |          1              |    b
          0            |        2        |          2              |    c
          1            |        0        |          0              |    d
          1            |        1        |          1              |    e
          1            |        2        |          2              |    f
          3            |        0        |          0              |    j
          3            |        1        |          1              |    k
          3            |        2        |          2              |    l
```

You can run the following statement on this table to see how pagination works.

```
SELECT * FROM my_keyspace.customers WHERE department_id IN (0, 1, 2, 3) AND sales_region_id IN (0, 1, 2) AND sales_representative_id IN (0, 1);
```

Amazon Keyspaces processes this statement as 24 subqueries, because the cardinality of the Cartesian product of all the `IN` terms contained in this query is 24 (4 × 3 × 2).

```
 department_id | sales_region_id | sales_representative_id | customer_name
---------------+-----------------+-------------------------+--------------
  0            |        0        |          0              |    a
  0            |        1        |          1              |    b
  1            |        0        |          0              |    d
  1            |        1        |          1              |    e

---MORE---
 department_id | sales_region_id | sales_representative_id | customer_name
---------------+-----------------+-------------------------+--------------
  2            |        0        |          0              |    g
  2            |        1        |          1              |    h
  3            |        0        |          0              |    j

---MORE---
 department_id | sales_region_id | sales_representative_id | customer_name
---------------+-----------------+-------------------------+--------------
  3            |        1        |          1              |    k
```
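The page boundaries above can be sketched as a simple chunking of the subqueries. This sketch models only the every-10th-subquery rule from the list above; real page boundaries also depend on the 1 MB read increment and any `PAGE SIZE` or `LIMIT` settings.

```python
from itertools import product

# The Cartesian product of the IN terms: 4 x 3 x 2 = 24 subqueries.
subqueries = list(product([0, 1, 2, 3], [0, 1, 2], [0, 1]))

# Amazon Keyspaces returns a page (with a pagination token) after every
# 10th subquery, so 24 subqueries yield pages of 10, 10, and 4 subqueries.
PAGE_EVERY = 10
pages = [subqueries[i:i + PAGE_EVERY] for i in range(0, len(subqueries), PAGE_EVERY)]

print(len(subqueries))          # 24
print([len(p) for p in pages])  # [10, 10, 4]
```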

This example shows how you can use the `ORDER BY` clause in a `SELECT` statement with the `IN` keyword.

```
SELECT * FROM my_keyspace.customers WHERE department_id IN (3, 2, 1) ORDER BY sales_region_id DESC;
        
         department_id | sales_region_id | sales_representative_id | customer_name
        ---------------+-----------------+-------------------------+--------------
          3            |        2        |          2              |    l
          3            |        1        |          1              |    k
          3            |        0        |          0              |    j
          2            |        2        |          2              |    i
          2            |        1        |          1              |    h
          2            |        0        |          0              |    g
          1            |        2        |          2              |    f
          1            |        1        |          1              |    e
          1            |        0        |          0              |    d
```

Subqueries are processed in the order in which the partition key and clustering column values are listed in the query. In the following example, subqueries for partition key value `2` are processed first, followed by the subqueries for partition key values `3` and `1`. The results of a given subquery are ordered according to the query's ordering clause, if present, or else by the table's clustering order defined at table creation.

```
SELECT * FROM my_keyspace.customers WHERE department_id IN (2, 3, 1) ORDER BY sales_region_id DESC;

         department_id | sales_region_id | sales_representative_id | customer_name
        ---------------+-----------------+-------------------------+--------------
          2            |        2        |          2              |    i
          2            |        1        |          1              |    h
          2            |        0        |          0              |    g
          3            |        2        |          2              |    l
          3            |        1        |          1              |    k
          3            |        0        |          0              |    j
          1            |        2        |          2              |    f
          1            |        1        |          1              |    e
          1            |        0        |          0              |    d
```

# Use batch statements in Amazon Keyspaces

You can combine multiple `INSERT`, `UPDATE`, and `DELETE` operations into a `BATCH` statement. `LOGGED` batches are the default.

```
batch_statement ::=     BEGIN [ UNLOGGED ] BATCH
                        [ USING update_parameter( AND update_parameter)* ]
                        modification_statement ( ';' modification_statement )*
                        APPLY BATCH
modification_statement ::= insert_statement | update_statement | delete_statement
```

When you run a batch statement, the driver combines all statements in the batch into a single batch operation.

To decide which type of batch operation to use, you can consider the following guidelines.

Use logged batches when:  
+ You need atomic transaction guarantees.
+ Slightly higher latencies are an acceptable trade-off.

Use unlogged batches when:  
+ You need to optimize single-partition operations.
+ You want to reduce network overhead.
+ You have high-throughput requirements.

For information about batch statement quotas, see [Quotas for Amazon Keyspaces (for Apache Cassandra)](quotas.md). 

## Unlogged batches


With **unlogged batches**, Amazon Keyspaces processes multiple operations as a single request without maintaining a batch log. With an unlogged batch operation, it's possible that some of the actions succeed while others fail. Unlogged batches are useful when you want to:
+ Optimize operations within a single partition.
+ Reduce network traffic by grouping related requests.

The syntax for an unlogged batch is similar to that of a logged batch, with the addition of the `UNLOGGED` keyword.

```
BEGIN UNLOGGED BATCH
    INSERT INTO users (id, firstname, lastname) VALUES (1, 'John', 'Doe');
    INSERT INTO users (id, firstname, lastname) VALUES (2, 'Jane', 'Smith');
APPLY BATCH;
```

## Logged batches


A **logged** batch combines multiple write actions into a single atomic operation. When you run a logged batch:
+ All actions either succeed together or fail together.
+ The operation is synchronous and idempotent.
+ You can write to multiple Amazon Keyspaces tables, as long as they are in the same AWS account and AWS Region.

Logged batches may have slightly higher latencies. For high-throughput applications, consider using unlogged batches.

There is no additional cost to use logged batches in Amazon Keyspaces. You pay only for the writes that are part of your batch operations. Amazon Keyspaces performs two underlying writes of every row in the batch: one to prepare the row for the batch and one to commit the batch. When planning capacity for tables that use logged batches, remember that each row in a batch requires twice the capacity of a standard write operation. For example, if your application runs one logged batch per second with three 1 KB rows, you need to provision six write capacity units (WCUs), compared to only three WCUs for individual writes or unlogged batches.
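The capacity math can be sketched as follows. This is a rough back-of-the-envelope estimate that assumes provisioned capacity mode and a cost of one WCU per 1 KB of row size per write; it is not the service's exact metering.

```python
import math

def logged_batch_wcus(rows: int, row_size_kb: float) -> int:
    """Estimate the write capacity units (WCUs) per second consumed by
    running one logged batch per second. Each row in a logged batch is
    written twice (once to prepare, once to commit), so it costs twice
    the capacity of a standard write. A standard write is assumed to
    cost one WCU per 1 KB of row size, rounded up."""
    wcus_per_standard_write = math.ceil(row_size_kb)
    return rows * wcus_per_standard_write * 2

# The example from the text: one logged batch of three 1 KB rows per
# second needs 6 WCUs; the same rows as individual writes need only 3.
print(logged_batch_wcus(rows=3, row_size_kb=1.0))  # 6
```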

For information about pricing, see [Amazon Keyspaces (for Apache Cassandra) pricing](https://aws.amazon.com/keyspaces/pricing).

### Best practices for batch operations


Consider the following recommended practices when using Amazon Keyspaces batch operations.
+ Enable automatic scaling so that you have sufficient throughput capacity for your tables to handle batch operations and the additional throughput requirements of logged batches.
+ Use individual operations or unlogged batches when operations can run independently without affecting application correctness.
+ Design your application to minimize concurrent updates to the same rows, as simultaneous batch operations can conflict and fail.
+ For high-throughput bulk data ingestion without atomicity requirements, use individual write operations or unlogged batches.

### Consistency and concurrency


Amazon Keyspaces enforces the following consistency and concurrency rules for logged batches:
+ All batch operations use `LOCAL_QUORUM` consistency level.
+ Concurrent batches affecting different rows can execute simultaneously.
+ Concurrent `INSERT`, `UPDATE`, or `DELETE` operations on rows involved in an ongoing batch fail with a conflict.

### Supported operators and conditions


Supported `WHERE` clause operators:  
+ Equality (`=`)

Unsupported operators:  
+ Range operators (`>`, `<`, `>=`, `<=`)
+ `IN` operator
+ `LIKE` operator
+ `BETWEEN` operator

Not supported in logged batches:  
+ Multiple statements affecting the same row
+ Counter operations
+ Range deletes

### Failure conditions of logged batch statements


A logged batch operation may fail in any of the following cases:
+ Condition expressions (like `IF NOT EXISTS` or `IF`) evaluate to false.
+ One or more operations contain invalid parameters.
+ The request conflicts with another batch operation running on the same rows.
+ The table lacks sufficient provisioned capacity.
+ A row exceeds the maximum size limit.
+ The input data format is invalid.

### Batch statements and multi-Region replication


In multi-Region deployments:
+ Source Region operations are synchronous and atomic.
+ Destination Region operations are asynchronous.
+ All batch operations replicate to destination Regions, but may not maintain isolation during application.

### Monitor batch operations

You can monitor batch operations using Amazon CloudWatch metrics to track performance, errors, and usage patterns. Amazon Keyspaces provides the following CloudWatch metrics for monitoring batch operations per table:
+ `SuccessfulRequestCount` – Track successful batch operations.
+ `Latency` – Measure batch operation performance.
+ `ConsumedWriteCapacityUnits` – Monitor capacity consumption of batch operations.

For more information, see [Amazon Keyspaces metrics](metrics-dimensions.md#keyspaces-metrics-dimensions).

In addition to CloudWatch metrics, you can use AWS CloudTrail to log all Amazon Keyspaces API actions. Each API action in the batch is logged in CloudTrail, making it easier to track and audit batch operations in your Amazon Keyspaces tables.

### Batch operation examples

The following is an example of a basic logged batch statement.

```
BEGIN BATCH
    INSERT INTO users (id, firstname, lastname) VALUES (1, 'John', 'Doe');
    INSERT INTO users (id, firstname, lastname) VALUES (2, 'Jane', 'Smith');
APPLY BATCH;
```

This is an example of a batch that includes `INSERT`, `UPDATE`, and `DELETE` statements.

```
BEGIN BATCH
    INSERT INTO users (id, firstname, lastname) VALUES (1, 'John', 'Doe');
    UPDATE users SET firstname = 'Johnny' WHERE id = 2;
    DELETE FROM users WHERE id = 3;
APPLY BATCH;
```

This is an example of a batch using client-side timestamps.

```
BEGIN BATCH
    INSERT INTO users (id, firstname, lastname) VALUES (1, 'John', 'Stiles') USING TIMESTAMP 1669069624;
    INSERT INTO users (id, firstname, lastname) VALUES (2, 'Jane', 'Doe') USING TIMESTAMP 1669069624;
APPLY BATCH;

BEGIN BATCH
    UPDATE users USING TIMESTAMP 1669069624 SET firstname = 'Carlos' WHERE id = 1;
    UPDATE users USING TIMESTAMP 1669069624 SET firstname = 'Diego' WHERE id = 2;
APPLY BATCH;
```

This is an example of a conditional batch.

```
BEGIN BATCH
    INSERT INTO users (id, firstname, lastname) VALUES (1, 'Jane', 'Doe') IF NOT EXISTS;
    INSERT INTO users (id, firstname, lastname) VALUES (2, 'John', 'Doe') IF NOT EXISTS;
APPLY BATCH;


BEGIN BATCH
    UPDATE users SET lastname = 'Stiles' WHERE id = 1 IF lastname = 'Doe';
    UPDATE users SET lastname = 'Stiles' WHERE id = 2 IF lastname = 'Doe';
APPLY BATCH;
```

This is an example of a batch using Time to Live (TTL).

```
BEGIN BATCH
    INSERT INTO users (id, firstname, lastname) VALUES (1, 'John', 'Doe') USING TTL 3600;
    INSERT INTO users (id, firstname, lastname) VALUES (2, 'Jane', 'Smith') USING TTL 7200;
APPLY BATCH;
```

This is an example of a batch statement that updates multiple tables.

```
BEGIN BATCH
    INSERT INTO users (id, firstname) VALUES (1, 'John');
    INSERT INTO user_emails (user_id, email) VALUES (1, 'john@example.com');
APPLY BATCH;
```

This is an example of a batch operation using user-defined types (UDTs). The example assumes that the UDT `address` exists.

```
BEGIN BATCH
    INSERT INTO users (id, firstname, address)
    VALUES (1, 'John', {street: '123 Main St', city: 'NYC', zip: '10001'});
    INSERT INTO users (id, firstname, address)
    VALUES (2, 'Jane', {street: '456 Oak Ave', city: 'LA', zip: '90210'});
APPLY BATCH;

BEGIN BATCH
    UPDATE users SET address.zip = '10002' WHERE id = 1;
    UPDATE users SET address.city = 'Boston' WHERE id = 2;
APPLY BATCH;
```

# Order results with `ORDER BY` in Amazon Keyspaces

The `ORDER BY` clause specifies the sort order of the results returned in a `SELECT` statement. The statement takes a list of column names as arguments, and for each column you can specify the sort order for the data. You can specify only clustering columns in ordering clauses; non-clustering columns are not allowed.

The two available sort order options for the returned results are `ASC` for ascending and `DESC` for descending sort order. 

```
SELECT * FROM my_keyspace.my_table ORDER BY col1 ASC, col2 DESC, col3 ASC;

         col1 | col2 | col3  
        ------+------+------
          0   |  6   |  a   
          1   |  5   |  b   
          2   |  4   |  c   
          3   |  3   |  d   
          4   |  2   |  e   
          5   |  1   |  f   
          6   |  0   |  g
```

```
SELECT * FROM my_keyspace.my_table ORDER BY col1 DESC, col2 ASC, col3 DESC;

         col1 | col2 | col3  
        ------+------+------
          6   |  0   |  g   
          5   |  1   |  f   
          4   |  2   |  e   
          3   |  3   |  d   
          2   |  4   |  c   
          1   |  5   |  b   
          0   |  6   |  a
```

If you don't specify the sort order in the query statement, the default ordering of the clustering column is used. 

The possible sort orders you can use in an ordering clause depend on the sort order assigned to each clustering column at table creation. Query results can only be sorted in the order defined for all clustering columns at table creation or the inverse of the defined sort order. Other possible combinations are not allowed.

For example, if the table's `CLUSTERING ORDER` is (col1 ASC, col2 DESC, col3 ASC), then the valid parameters for `ORDER BY` are either (col1 ASC, col2 DESC, col3 ASC) or (col1 DESC, col2 ASC, col3 DESC). For more information on `CLUSTERING ORDER`, see `table_options` under [CREATE TABLE](cql.ddl.table.md#cql.ddl.table.create).
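A small sketch makes the rule concrete: given a table's `CLUSTERING ORDER`, the only valid ordering clauses are that order and its exact inverse. The helper below is illustrative only and is not part of any driver API.

```python
def valid_order_by(clustering_order: list[tuple[str, str]]) -> list:
    """Return the only two ORDER BY column lists a query may use:
    the table's CLUSTERING ORDER, or its exact inverse."""
    invert = {"ASC": "DESC", "DESC": "ASC"}
    inverse = [(col, invert[direction]) for col, direction in clustering_order]
    return [clustering_order, inverse]

# The example from the text: CLUSTERING ORDER (col1 ASC, col2 DESC, col3 ASC).
table_order = [("col1", "ASC"), ("col2", "DESC"), ("col3", "ASC")]
for option in valid_order_by(table_order):
    print(", ".join(f"{col} {direction}" for col, direction in option))
# col1 ASC, col2 DESC, col3 ASC
# col1 DESC, col2 ASC, col3 DESC
```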

# Paginate results in Amazon Keyspaces

Amazon Keyspaces automatically *paginates* the results from `SELECT` statements when the data read to process the `SELECT` statement exceeds 1 MB. With pagination, the `SELECT` statement results are divided into "pages" of data that are 1 MB in size (or less). An application can process the first page of results, then the second page, and so on. Clients should always check for pagination tokens when processing `SELECT` queries that return multiple rows.

 If a client supplies a `PAGE SIZE` that requires reading more than 1 MB of data, Amazon Keyspaces breaks up the results automatically into multiple pages based on the 1 MB data-read increments.

For example, if the average size of a row is 100 KB and you specify a `PAGE SIZE` of 20, Amazon Keyspaces paginates data automatically after it reads 10 rows (1000 KB of data read). 
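This rule of thumb can be sketched as: a page break occurs at whichever limit is reached first, the configured `PAGE SIZE` or the number of rows that fits in a 1 MB read increment. The function below is a back-of-the-envelope estimate, not the service's exact accounting.

```python
import math

ONE_MB_KB = 1024  # Amazon Keyspaces reads data in 1 MB increments

def rows_before_page_break(avg_row_kb: float, page_size: int) -> int:
    """Rows read before a page is returned: whichever limit is reached
    first, the configured PAGE SIZE or the 1 MB read increment."""
    rows_per_mb = math.floor(ONE_MB_KB / avg_row_kb)
    return min(page_size, rows_per_mb)

# The example from the text: 100 KB rows with PAGE SIZE 20 paginate
# after 10 rows, because 10 rows is roughly 1 MB of data read.
print(rows_before_page_break(avg_row_kb=100, page_size=20))  # 10
```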

Because Amazon Keyspaces paginates results based on the number of rows that it reads to process a request and not the number of rows returned in the result set, some pages may not contain any rows if you are running filtered queries. 

For example, if you set `PAGE SIZE` to 10 and Amazon Keyspaces evaluates 30 rows to process your `SELECT` query, it returns three pages. If only a subset of the rows matches your query, some pages may contain fewer than 10 rows. For an example of how the `PAGE SIZE` of `LIMIT` queries can affect read capacity, see [Estimate the read capacity consumption of limit queries](limit_queries.md).

For a comparison with Apache Cassandra pagination, see [Pagination](functional-differences.md#functional-differences.paging).