Shard cluster definitions
The definition for a shard cluster includes information about the shard servers, the data to shard, and the sharding method.
- A name for the sharding definition
- The name of the database that contains the table that is being sharded
- The name of the user that owns the table that is being sharded
- The sqlhosts file group name for each database server in the shard cluster
- The column that is used as a shard key
- Which sharding method the database server uses for determining where rows are distributed to:
- With consistent hash-based sharding, the data is automatically distributed between shard servers in a way that minimizes the data movement when you add or remove shard servers.
- With hash-based sharding, the data is automatically divided between shard servers, but when you change the shard cluster, all data is redistributed.
- With expression-based sharding, you specify how the data is divided between shard servers. You must also specify the shard server to receive the data that is outside the scope of the expression.
- How you want to distribute the data:
- Insert rows on any shard server, replicate the rows to the appropriate shard server, and then delete duplicate rows from the original server. The delete method is the default method and is the same behavior as when you define sharding with MongoDB commands.
- Insert rows on any shard server, replicate the rows to the appropriate shard server, but then keep duplicate rows on the original server. The keep method is similar to a data dissemination system.
- Insert rows on the appropriate shard server and do not replicate rows. The informational method is useful if you want to query across multiple servers that have the same table, but you do not need to shard the data during loading. For example, you have a different database server for each of your three stores. The data from each store is always inserted in the appropriate server. You set up the sharding definition with an expression that matches database servers with their store identifiers. Then you can run sharded queries to aggregate data from all three stores.
- The table column or collection field for tracking row updates
Consistent hash-based sharding
When you create a consistent hash-based sharding definition, HCL OneDB™ uses a hash value of a specific column or field to distribute data to the servers of a shard cluster in a consistent pattern. When you add or remove a shard server, the consistent hashing algorithm redistributes a fraction of the data. You specify how many hashing partitions to create on each shard server. The default number of hashing partitions is three. The more hashing partitions, the more evenly the data is distributed among shard servers. However, if you specify more than 10 hashing partitions, the resulting SQL statement to create the sharded table might fail because it exceeds the maximum character limit for an SQL statement.
For example, the following command creates a consistent hashing index that has three partitions on each shard server:
cdr define shardCollection collection_1 db_1:john.customers
--type=delete --key=b --strategy=chash --partitions=3 --versionCol=column_3
g_shard_server_1 g_shard_server_2 g_shard_server_3
You can dynamically change the number of hashing partitions per shard server by running the cdr change shardCollection command.
Hash-based sharding
When you create a hash-based sharding definition, HCL OneDB uses a hash value of a specific column or field to distribute data to the servers of a shard cluster. When you add or remove a shard server, the hashing algorithm redistributes all the data.
cdr define shardCollection collection_1 db_1:john.customers
--type=delete --key=state --strategy=hash --versionCol=version
g_shard_server_A g_shard_server_B g_shard_server_C g_shard_server_D
Expression-based sharding
When you create an expression-based sharding definition, HCL OneDB uses WHERE-clause syntax on a specific column or field to distributes data to the servers of a shard cluster.
NV
to
g_shard_server_B:cdr define shardCollection collection_1 db_1:joe.clients
--type=delete --key=state --strategy=expression --versionCol=version
g_shard_server_A "IN ('WA','OR','ID')"
g_shard_server_B "IN ('CA','NV','UT','AZ')"
g_shard_server_C "IN ('TX','OK','NM','AR','LA')"
g_shard_server_D REMAINDER
Sharding definitions must include the REMAINDER
expression for rows or documents
that have values that are not accounted for by the other expressions. For example, the previous
sharding definition sends rows with a shard-key value of 'NY'
to
g_shard_server_D.
40
to 60
would be sent to both g_shard_server_A and
g_shard_server_B.cdr define shardCollection collection_1 db_1:joe.clients)
--type=delete --key=age --strategy=expression --versionCol=version
g_shard_server_A "BETWEEN 0 AND 60"
g_shard_server_B "BETWEEN 40 AND 100"
g_shard_server_C REMAINDER