I was exploring the DynamoDB query command in the AWS CLI. What confused me was the page-size option that the AWS CLI adds to the command. The common assumption is that adding page-size to a query command makes it return that many items in the result set.
So when I ran a command like the one below, I expected to get at most 5 items, since I had passed page-size 5 on the command line.
aws dynamodb query --table-name test_table --key-condition-expression "pk=:pk" --expression-attribute-values file://test-query.json --endpoint-url http://localhost:4566 --page-size 5
But I got 14 items in the result set, which was the total number of items that satisfied the query condition. So I went through the documentation online and found that in the AWS CLI, the page-size option has a different purpose.
To give you an example, suppose our query matches 1000 items. The AWS CLI will make a DynamoDB service call to fetch them. If querying and returning 1000 items in a single DynamoDB call takes too long, the AWS CLI might time out waiting for the DynamoDB service, and you will get an error on the command line.
To solve this, the page-size option is used. When I add page-size 5 to the AWS CLI query command, the AWS CLI makes multiple DynamoDB service calls, and each call returns 5 items or fewer (fewer when there are no more items to return). The AWS CLI aggregates all these results internally and hands you the combined output. That way you can avoid a timeout error from the AWS service. Instead of one large call for all 1000 items, you are making smaller calls of 5 items each internally.
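The behaviour described above can be sketched in Python with an in-memory stand-in for the table. This is only an illustrative model, not the AWS CLI's actual code: `fake_query` and `query_all_pages` are hypothetical names, the integer resume key is a simplification, and the 14 items mirror the test table used in this post.

```python
from typing import Optional

# Hypothetical in-memory stand-in for the 14 items that match the query.
TABLE = [{"pk": "user#1", "sk": i} for i in range(14)]


def fake_query(limit: Optional[int] = None,
               exclusive_start_key: Optional[int] = None) -> dict:
    """Model a single Query call: at most `limit` items from a start position."""
    start = 0 if exclusive_start_key is None else exclusive_start_key + 1
    end = len(TABLE) if limit is None else min(start + limit, len(TABLE))
    items = TABLE[start:end]
    response = {"Items": items, "Count": len(items), "ScannedCount": len(items)}
    if end < len(TABLE):
        # More items remain, so tell the caller where to resume.
        response["LastEvaluatedKey"] = items[-1]["sk"]
    return response


def query_all_pages(page_size: int) -> dict:
    """Model what page-size does: many small calls, one merged result."""
    merged = {"Items": [], "Count": 0, "ScannedCount": 0}
    calls = 0
    start_key = None
    while True:
        page = fake_query(limit=page_size, exclusive_start_key=start_key)
        calls += 1
        merged["Items"].extend(page["Items"])
        merged["Count"] += page["Count"]
        merged["ScannedCount"] += page["ScannedCount"]
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            break
    merged["Calls"] = calls  # extra bookkeeping for this sketch only
    return merged


result = query_all_pages(page_size=5)
print(result["Count"], result["ScannedCount"], result["Calls"])  # prints: 14 14 3
```

With a page size of 5 the model makes three calls (5, 5 and 4 items), yet the caller still sees all 14 items in one merged result, which matches what I observed on the command line.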
As a client, you won’t see any difference in the result set between calls with and without the page-size option. The response payload for my query showed “ScannedCount”: 14, which was the same as the total number of matching items in the DynamoDB test table.
What about the limit option, then? limit is a parameter of the DynamoDB query itself rather than of the AWS CLI, and it enables pagination. When I ran the query below with limit 5, I got 5 items in the result set and a “LastEvaluatedKey” field for forward pagination. “LastEvaluatedKey” is the key of the last item DynamoDB evaluated; pass it in the next request as ExclusiveStartKey to fetch the next set of results.
aws dynamodb query --table-name test_table --key-condition-expression "pk=:pk" --expression-attribute-values file://test-query.json --endpoint-url http://localhost:4566 --limit 5
The result JSON also showed “ScannedCount”: 5, which means the DynamoDB query evaluated only 5 items instead of all 14. That is an improvement, and it gives us an efficient pagination technique.
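Forward pagination with limit can be sketched in Python. The snippet is a rough in-memory model, not a real DynamoDB client: `fake_query` is a hypothetical stand-in for one Query call with Limit, and the integer key is an assumption made for brevity (a real LastEvaluatedKey is a map of key attributes). Each call returns a single page, and the caller passes the previous LastEvaluatedKey back as the exclusive start key.

```python
from typing import Optional

# Hypothetical stand-in for the 14 matching items in the test table.
TABLE = [{"pk": "user#1", "sk": i} for i in range(14)]


def fake_query(limit: int, exclusive_start_key: Optional[int] = None) -> dict:
    """Model one Query call with Limit: a single page plus a resume key."""
    start = 0 if exclusive_start_key is None else exclusive_start_key + 1
    items = TABLE[start:start + limit]
    response = {"Items": items, "Count": len(items), "ScannedCount": len(items)}
    if start + limit < len(TABLE):
        # Simplification: only return a key while items remain.
        response["LastEvaluatedKey"] = items[-1]["sk"]
    return response


# First page: like running the query with limit 5.
first = fake_query(limit=5)
print(first["ScannedCount"], first.get("LastEvaluatedKey"))  # prints: 5 4

# Next page: pass the previous LastEvaluatedKey as the exclusive start key.
second = fake_query(limit=5, exclusive_start_key=first["LastEvaluatedKey"])

# Keep going until no LastEvaluatedKey comes back.
page_sizes = [first["Count"], second["Count"]]
page = second
while "LastEvaluatedKey" in page:
    page = fake_query(limit=5, exclusive_start_key=page["LastEvaluatedKey"])
    page_sizes.append(page["Count"])
print(page_sizes)  # prints: [5, 5, 4]
```

Unlike page-size, nothing is merged here: the caller decides when to request the next page, and each call evaluates at most 5 items.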
I hope the difference between the page-size and limit options in an AWS CLI query is clear now.