
PK Chunking for Large Data Volumes in Salesforce

In this blog, ‘PK Chunking for Large Data Volumes in Salesforce’, I am going to walk through how PK Chunking can help you get the desired results when working with large data volumes in Salesforce.

What is PK Chunking?

PK Chunking stands for Primary Key Chunking. The Primary Key is nothing but the object’s record ID, which is always indexed. PK Chunking is a method in which chunks of records with sequential IDs within a table are first identified, then separate queries bounded by those chunked Primary Keys are executed, and finally the results are combined.

Business Use Case

A B2C company has 100,000 Accounts and Contacts and logs orders against every account in the Salesforce system. The Salesforce Architect has suggested using a custom object for orders (Order__c), which sits on the detail side of a master-detail relationship with Account. On average, each customer orders twice a month, which results in a large data set over a year. The business is asking for a report on how many orders have been placed so far from the accounts in the North region. The current size of this year’s order data is estimated at 20 million records. Now, when a system administrator tries to run a Bulk API query to get all the orders placed in the North region, a timeout error occurs. The Salesforce Architect has to roll up his sleeves and look for a solution to fix the issue.

Other Common Errors

  • “System.QueryException: Non-selective query against large object type.”
  • “Too many query rows”
  • “Response size exceeded”
  • “Aggregate query has too many rows”

Solution Approach

The Salesforce Architect has already tried implementing custom indexing, skinny tables, and selective queries, but due to the large data volumes the timeout error is still not resolved. Since Salesforce recommends enabling Primary Key chunking when querying tables with more than 10 million records or when a bulk query consistently times out, the architect recommends considering PK Chunking. Let’s go through the details.

How does PK Chunking work?

In our case, the Order__c object has 20,000,000 records. Suppose you want to find the orders with status ‘Placed’ in the North region, using the query below:


SELECT Id, Name, Status__c, Region__c FROM Order__c
WHERE Status__c = 'Placed' AND Region__c = 'North'

When PK Chunking is applied, the above query is divided into chunks using Primary Key boundaries. With 20 million total records and a chunk size of 250,000, the query is broken down into 80 chunks, each bounded by Primary Keys as shown below. For this example, assume 092-78-0-000000000 is the first ID. A 15-character Salesforce ID has the format Object (3 characters)-Instance (2 characters)-Reserved (1 character)-Record Counter (9 characters); the last 9 characters determine the “distance” between IDs. You do not have to worry about how this works, but if you are curious, check out Distance Between Salesforce Ids:


SELECT Id, Name, Status__c, Region__c FROM Order__c
WHERE Status__c = 'Placed' AND Region__c = 'North'
AND Id >= '092780000000000' AND Id < '09278000000132G'

SELECT Id, Name, Status__c, Region__c FROM Order__c
WHERE Status__c = 'Placed' AND Region__c = 'North'
AND Id >= '09278000000132G' AND Id < '09278000000264W'

SELECT Id, Name, Status__c, Region__c FROM Order__c
WHERE Status__c = 'Placed' AND Region__c = 'North'
AND Id >= '09278000000264W' AND Id < '09278000000396m'
... and 77 more queries, one per remaining chunk.
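
Salesforce computes these boundary IDs for you, but for the curious, here is a minimal Python sketch of how they can be derived. It assumes the last 9 characters of a 15-character ID form a base-62 counter over the alphabet 0-9, A-Z, a-z and that the records were created with sequential IDs; this is purely illustrative, not part of the Bulk API.


# Illustrative only: derive PK chunk boundary IDs from a starting
# 15-character Salesforce ID and a chunk size.
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # 62 symbols

def suffix_to_int(suffix: str) -> int:
    """Convert the 9-character base-62 record counter to an integer."""
    value = 0
    for ch in suffix:
        value = value * 62 + ALPHABET.index(ch)
    return value

def int_to_suffix(value: int) -> str:
    """Convert an integer back to a 9-character base-62 counter."""
    chars = []
    for _ in range(9):
        value, rem = divmod(value, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def chunk_boundaries(first_id: str, total_records: int, chunk_size: int):
    """Yield (lower, upper) ID pairs covering total_records records."""
    prefix, start = first_id[:6], suffix_to_int(first_id[6:])
    for lo in range(start, start + total_records, chunk_size):
        yield prefix + int_to_suffix(lo), prefix + int_to_suffix(lo + chunk_size)

# The first three ranges reproduce the boundaries in the queries above.
for lower, upper in list(chunk_boundaries("092780000000000", 20_000_000, 250_000))[:3]:
    print(f"Id >= '{lower}' AND Id < '{upper}'")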

Explanation

In the above chunking process, 80 queries are processed, and each query’s chunk is defined by a range of IDs, a.k.a. Primary Keys. We want the orders placed in the North region; Query 1 searches for matching orders within its ID range, which covers 250,000 records. If each query returns, say, 10 matching orders, then the 80 queries together yield 800 desired results once the results from all the queries are summed up. So, in a nutshell, the process is:

  • Chunk the data set into defined ID ranges.
  • Sum up the desired results from all the queries (a sketch of this flow follows).
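
To make these two steps concrete, here is a hedged sketch that performs the chunking client-side with the simple-salesforce library, reusing the chunk_boundaries helper from the sketch above; the credentials are placeholders. With the PK Chunking header described in the next section, Salesforce performs this chunking server-side for you.


# Illustrative client-side chunking; credentials are placeholders.
from simple_salesforce import Salesforce

sf = Salesforce(username="user@example.com",
                password="password",
                security_token="token")

SOQL = ("SELECT Id, Name, Status__c, Region__c FROM Order__c "
        "WHERE Status__c = 'Placed' AND Region__c = 'North' "
        "AND Id >= '{lo}' AND Id < '{hi}'")

all_records = []
for lo, hi in chunk_boundaries("092780000000000", 20_000_000, 250_000):
    result = sf.query_all(SOQL.format(lo=lo, hi=hi))  # step 1: query each ID range
    all_records.extend(result["records"])             # step 2: sum up the results

print(len(all_records), "orders placed in the North region")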

PK Chunking Header

Bulk API uses a custom HTTP request header to enable PK Chunking.

To implement the PK Chunking header, we just need to add the following:


Sforce-Enable-PKChunking: chunkSize=250000
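
As a minimal sketch, this is how the header might be attached when creating a Bulk API (1.0) query job over HTTP with Python’s requests library; the instance URL and session ID below are placeholder assumptions, not real values.


# Minimal sketch: create a Bulk API (1.0) query job with PK chunking enabled.
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"  # placeholder
SESSION_ID = "<session token>"                           # placeholder

job_xml = """<?xml version="1.0" encoding="UTF-8"?>
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
  <operation>query</operation>
  <object>Order__c</object>
  <concurrencyMode>Parallel</concurrencyMode>
  <contentType>CSV</contentType>
</jobInfo>"""

response = requests.post(
    f"{INSTANCE_URL}/services/async/58.0/job",
    headers={
        "X-SFDC-Session": SESSION_ID,
        "Content-Type": "application/xml; charset=UTF-8",
        "Sforce-Enable-PKChunking": "chunkSize=250000",  # the PK chunking header
    },
    data=job_xml,
)
print(response.status_code, response.text)  # jobInfo XML including the new job ID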

Below is the header field information you can add (reference: Salesforce Documentation):

Field name: Sforce-Enable-PKChunking

Field values:

  • TRUE: Enables PK chunking with the default chunk size, starting from the first record ID in the queried table.
  • FALSE: Disables PK chunking. If the header isn’t provided in the request, the default is FALSE.
  • chunkSize: Specifies the number of records within the ID boundaries for each chunk. The default is 100,000, and the maximum size is 250,000. If the query contains filters or soft-deleted records, the number of returned results for each chunk can be less than the chunk size.
  • parent: Specifies the parent object when you’re enabling PK chunking for queries on sharing objects. The chunks are based on the parent object’s records rather than the sharing object’s records. For example, when querying on AccountShare, specify Account as the parent object. PK chunking is supported for sharing objects as long as the parent object is supported. Similarly, for case history, specify Case as the parent object.
  • startRow: Specifies the 15-character or 18-character record ID to be used as the lower boundary for the first chunk. Use this parameter to specify a starting ID when restarting a job that failed between batches.
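
Multiple fields can be combined in a single header by separating them with semicolons. For example, to restart a failed job from a specific record with a smaller chunk size (the record ID below is illustrative):


Sforce-Enable-PKChunking: chunkSize=50000; startRow=00130000000xEftMGH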
