Jobs can be partitioned into “chunks” to be executed sequentially on the computational nodes. Chunks are defined by providing a data frame with columns “job.id” and “chunk” (integer) to submitJobs. All jobs with the same chunk number will be grouped together on one node to form a single computational job.

The function chunk simply splits x into either a fixed number of groups, or into a variable number of groups each containing at most a fixed number of elements.
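Conceptually, the fixed-size mode (with shuffle = FALSE) amounts to giving every chunk.size consecutive elements the same group number. A minimal sketch of that idea, not batchtools' actual implementation:

```r
# Hypothetical sketch: assign consecutive runs of chunk.size elements
# the same chunk number (corresponds to chunk(x, chunk.size = 4, shuffle = FALSE)).
x = 1:10
chunk.size = 4
ch = ceiling(seq_along(x) / chunk.size)
ch  # 1 1 1 1 2 2 2 2 3 3
```

Shuffling (the default) then permutes these group labels so that expensive jobs submitted next to each other do not all end up in the same chunk.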

The function lpt also groups x into a fixed number of chunks, but uses the actual values of x in a greedy “Longest Processing Time” algorithm. As a result, the maximum sum of elements per chunk is minimized.
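The greedy LPT heuristic sorts the weights in decreasing order and always assigns the next weight to the chunk with the smallest running sum. A minimal sketch of this idea (not batchtools' actual implementation):

```r
# Hypothetical sketch of the greedy "Longest Processing Time" heuristic:
# process weights from largest to smallest, always adding the next weight
# to the chunk whose current sum is smallest.
lpt_sketch = function(x, n.chunks) {
  ord = order(x, decreasing = TRUE)
  sums = numeric(n.chunks)   # running sum per chunk
  res = integer(length(x))   # chunk number per element
  for (i in ord) {
    k = which.min(sums)
    sums[k] = sums[k] + x[i]
    res[i] = k
  }
  res
}

x = c(6, 5, 4, 3, 2, 2)
sapply(split(x, lpt_sketch(x, n.chunks = 2)), sum)  # both chunks sum to 11
```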

binpack splits x into a variable number of groups whose sum of elements does not exceed the upper limit given by chunk.size.
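A common heuristic for this problem is first-fit decreasing: sort the weights in decreasing order and place each one into the first group that still has room, opening a new group when none does. A sketch under that assumption (the exact heuristic batchtools uses may differ):

```r
# Hypothetical first-fit-decreasing sketch of bin packing:
# each weight goes into the first bin with enough remaining capacity,
# or into a freshly opened bin.
binpack_sketch = function(x, chunk.size = max(x)) {
  ord = order(x, decreasing = TRUE)
  sums = numeric(0)          # running sum per open bin
  res = integer(length(x))   # bin number per element
  for (i in ord) {
    k = which(sums + x[i] <= chunk.size)[1]  # first bin with room
    if (is.na(k)) {                          # no bin fits: open a new one
      sums = c(sums, 0)
      k = length(sums)
    }
    sums[k] = sums[k] + x[i]
    res[i] = k
  }
  res
}

x = c(7, 6, 4, 3)
sapply(split(x, binpack_sketch(x, chunk.size = 10)), sum)  # two bins, each summing to 10
```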

See the examples of estimateRuntimes for an application of binpack and lpt.

chunk(x, n.chunks = NULL, chunk.size = NULL, shuffle = TRUE)

lpt(x, n.chunks = 1L)

binpack(x, chunk.size = max(x))

Arguments

x

[numeric]
For chunk, an atomic vector (usually the job.id). For binpack and lpt, the weights to group.

n.chunks

[integer(1)]
Requested number of chunks. The function chunk distributes the number of elements in x evenly, while lpt tries to even out the sum of elements in each chunk. If more chunks than necessary are requested, empty chunks are ignored. Mutually exclusive with chunk.size.

chunk.size

[integer(1)]
Requested maximum size of each single chunk. For chunk this is the number of elements in x; for binpack it is the sum of the grouped values of x. Mutually exclusive with n.chunks.

shuffle

[logical(1)]
If TRUE (default), the groups are shuffled.

Value

[integer] giving the chunk number for each element of x.

Examples

ch = chunk(1:10, n.chunks = 2)
table(ch)
#> ch
#> 1 2 
#> 5 5 
ch = chunk(rep(1, 10), chunk.size = 2)
table(ch)
#> ch
#> 1 2 3 4 5 
#> 2 2 2 2 2 
set.seed(1)
x = runif(10)
ch = lpt(x, n.chunks = 2)
sapply(split(x, ch), sum)
#>        1        2 
#> 2.808393 2.706746 
set.seed(1)
x = runif(10)
ch = binpack(x, 1)
sapply(split(x, ch), sum)
#>         1         2         3         4         5         6 
#> 0.9446753 0.9699941 0.8983897 0.9263065 0.8307960 0.9449773 
# Job chunking
tmp = makeRegistry(file.dir = NA, make.default = FALSE)
#> No readable configuration file found
#> Created registry in '/tmp/batchtools-example/reg1' using cluster functions 'Interactive'
ids = batchMap(identity, 1:25, reg = tmp)
#> Adding 25 jobs ...
### Group into chunks with 10 jobs each
library(data.table)
ids[, chunk := chunk(job.id, chunk.size = 10)]
#>     job.id chunk
#>  1:      1     3
#>  2:      2     1
#>  3:      3     1
#>  4:      4     2
#>  5:      5     3
#>  6:      6     1
#>  7:      7     3
#>  8:      8     3
#>  9:      9     2
#> 10:     10     1
#> 11:     11     1
#> 12:     12     2
#> 13:     13     2
#> 14:     14     1
#> 15:     15     2
#> 16:     16     1
#> 17:     17     3
#> 18:     18     1
#> 19:     19     2
#> 20:     20     1
#> 21:     21     2
#> 22:     22     3
#> 23:     23     2
#> 24:     24     3
#> 25:     25     3
#>     job.id chunk
print(ids[, .N, by = chunk])
#>    chunk N
#> 1:     3 8
#> 2:     1 9
#> 3:     2 8
### Group into 4 chunks
ids[, chunk := chunk(job.id, n.chunks = 4)]
#>     job.id chunk
#>  1:      1     2
#>  2:      2     3
#>  3:      3     4
#>  4:      4     3
#>  5:      5     4
#>  6:      6     1
#>  7:      7     4
#>  8:      8     1
#>  9:      9     2
#> 10:     10     2
#> 11:     11     3
#> 12:     12     3
#> 13:     13     4
#> 14:     14     1
#> 15:     15     3
#> 16:     16     2
#> 17:     17     1
#> 18:     18     2
#> 19:     19     3
#> 20:     20     4
#> 21:     21     1
#> 22:     22     2
#> 23:     23     4
#> 24:     24     1
#> 25:     25     1
#>     job.id chunk
print(ids[, .N, by = chunk])
#>    chunk N
#> 1:     2 6
#> 2:     3 6
#> 3:     4 6
#> 4:     1 7
### Submit to batch system
submitJobs(ids = ids, reg = tmp)
#> Submitting 25 jobs in 4 chunks using cluster functions 'Interactive' ...
# Grouped chunking
tmp = makeExperimentRegistry(file.dir = NA, make.default = FALSE)
#> No readable configuration file found
#> Created registry in '/tmp/batchtools-example/reg2' using cluster functions 'Interactive'
prob = addProblem(reg = tmp, "prob1", data = iris, fun = function(job, data) nrow(data))
#> Adding problem 'prob1'
prob = addProblem(reg = tmp, "prob2", data = Titanic, fun = function(job, data) nrow(data))
#> Adding problem 'prob2'
algo = addAlgorithm(reg = tmp, "algo", fun = function(job, data, instance, i, ...) instance)
#> Adding algorithm 'algo'
prob.designs = list(prob1 = data.table(), prob2 = data.table(x = 1:2))
algo.designs = list(algo = data.table(i = 1:3))
addExperiments(prob.designs, algo.designs, repls = 3, reg = tmp)
#> Adding 9 experiments ('prob1'[1] x 'algo'[3] x repls[3]) ...
#> Adding 18 experiments ('prob2'[2] x 'algo'[3] x repls[3]) ...
### Group into chunks of 5 jobs, but do not put multiple problems into the same chunk
# -> only one problem has to be loaded per chunk, and only once because it is cached
ids = getJobTable(reg = tmp)[, .(job.id, problem, algorithm)]
ids[, chunk := chunk(job.id, chunk.size = 5), by = "problem"]
#>     job.id problem algorithm chunk
#>  1:      1   prob1      algo     1
#>  2:      2   prob1      algo     1
#>  3:      3   prob1      algo     2
#>  4:      4   prob1      algo     2
#>  5:      5   prob1      algo     1
#>  6:      6   prob1      algo     2
#>  7:      7   prob1      algo     1
#>  8:      8   prob1      algo     1
#>  9:      9   prob1      algo     2
#> 10:     10   prob2      algo     2
#> 11:     11   prob2      algo     1
#> 12:     12   prob2      algo     1
#> 13:     13   prob2      algo     3
#> 14:     14   prob2      algo     3
#> 15:     15   prob2      algo     3
#> 16:     16   prob2      algo     2
#> 17:     17   prob2      algo     2
#> 18:     18   prob2      algo     2
#> 19:     19   prob2      algo     2
#> 20:     20   prob2      algo     4
#> 21:     21   prob2      algo     1
#> 22:     22   prob2      algo     1
#> 23:     23   prob2      algo     3
#> 24:     24   prob2      algo     4
#> 25:     25   prob2      algo     1
#> 26:     26   prob2      algo     4
#> 27:     27   prob2      algo     4
#>     job.id problem algorithm chunk
ids[, chunk := .GRP, by = c("problem", "chunk")]
#>     job.id problem algorithm chunk
#>  1:      1   prob1      algo     1
#>  2:      2   prob1      algo     1
#>  3:      3   prob1      algo     2
#>  4:      4   prob1      algo     2
#>  5:      5   prob1      algo     1
#>  6:      6   prob1      algo     2
#>  7:      7   prob1      algo     1
#>  8:      8   prob1      algo     1
#>  9:      9   prob1      algo     2
#> 10:     10   prob2      algo     3
#> 11:     11   prob2      algo     4
#> 12:     12   prob2      algo     4
#> 13:     13   prob2      algo     5
#> 14:     14   prob2      algo     5
#> 15:     15   prob2      algo     5
#> 16:     16   prob2      algo     3
#> 17:     17   prob2      algo     3
#> 18:     18   prob2      algo     3
#> 19:     19   prob2      algo     3
#> 20:     20   prob2      algo     6
#> 21:     21   prob2      algo     4
#> 22:     22   prob2      algo     4
#> 23:     23   prob2      algo     5
#> 24:     24   prob2      algo     6
#> 25:     25   prob2      algo     4
#> 26:     26   prob2      algo     6
#> 27:     27   prob2      algo     6
#>     job.id problem algorithm chunk
dcast(ids, chunk ~ problem)
#> Using 'chunk' as value column. Use 'value.var' to override
#> Aggregate function missing, defaulting to 'length'
#>    chunk prob1 prob2
#> 1:     1     5     0
#> 2:     2     4     0
#> 3:     3     0     5
#> 4:     4     0     5
#> 5:     5     0     4
#> 6:     6     0     4