Jobs can be partitioned into “chunks” to be executed sequentially on the computational nodes. Chunks are defined by providing a data frame with columns “job.id” and “chunk” (integer) to submitJobs(). All jobs with the same chunk number will be grouped together on one node to form a single computational job.
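
For example, the following minimal sketch (assuming a default registry with unsubmitted jobs already exists) groups the jobs into chunks of at most 50 and submits them:

library(batchtools)
library(data.table)

ids = findJobs()                               # data.table with column "job.id"
ids[, chunk := chunk(job.id, chunk.size = 50)] # add the "chunk" column
submitJobs(ids)                                # each chunk is executed as one computational job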
The function chunk() simply splits x into either a fixed number of groups or into a variable number of groups with a fixed maximum number of elements.
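
For illustration (the exact assignment is randomized because shuffle defaults to TRUE):

chunk(1:9, n.chunks = 3)    # exactly 3 groups, 3 elements each
chunk(1:9, chunk.size = 4)  # as few groups as needed, at most 4 elements per group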
The function lpt() also groups x into a fixed number of chunks, but uses the actual values of x in a greedy “Longest Processing Time” algorithm. As a result, the maximum sum of elements per chunk is minimized.
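
Conceptually, the algorithm visits the elements in decreasing order and greedily assigns each one to the chunk with the smallest running sum. A rough sketch of the idea in plain R (for illustration only, not necessarily the package's internal implementation):

lpt_sketch = function(x, n.chunks = 1L) {
  chunks = integer(length(x))   # chunk number assigned to each element of x
  loads = numeric(n.chunks)     # current sum of elements per chunk
  for (i in order(x, decreasing = TRUE)) {
    j = which.min(loads)        # chunk with the smallest load so far
    chunks[i] = j
    loads[j] = loads[j] + x[i]
  }
  chunks
}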
binpack() splits x into a variable number of groups whose sum of elements does not exceed the upper limit provided by chunk.size. See the examples of estimateRuntimes() for an application of binpack() and lpt().
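
A typical pattern looks like the following sketch. It assumes that ids is a table of job ids as returned by findJobs() and that runtimes is a numeric vector of estimated runtimes in seconds, one per job, e.g. obtained via estimateRuntimes():

library(data.table)

# cap the estimated work per chunk at one hour
ids[, chunk := binpack(runtimes, chunk.size = 3600)]

# alternatively: balance the estimated work across 10 chunks
ids[, chunk := lpt(runtimes, n.chunks = 10)]

submitJobs(ids)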
Usage

chunk(x, n.chunks = NULL, chunk.size = NULL, shuffle = TRUE)

lpt(x, n.chunks = 1L)

binpack(x, chunk.size = max(x))
Arguments

x
    [numeric] Vector of values to group: typically job ids for chunk(), or weights such as estimated runtimes for lpt() and binpack().

n.chunks
    [integer(1)] Requested number of chunks. Mutually exclusive with chunk.size.

chunk.size
    [integer(1)] Maximum number of elements per chunk (chunk()), or upper limit for the sum of elements per chunk (binpack()). Mutually exclusive with n.chunks.

shuffle
    [logical(1)] Shuffle the elements of x before chunking? Only used by chunk(). Default is TRUE.
Value

[integer] giving the chunk number for each element of x.
Examples

ch = chunk(1:10, n.chunks = 2)
table(ch)
#> ch
#> 1 2
#> 5 5

ch = chunk(1:10, chunk.size = 2)
table(ch)
#> ch
#> 1 2 3 4 5
#> 2 2 2 2 2

x = runif(10)
ch = lpt(x, n.chunks = 2)
sapply(split(x, ch), sum)
#>        1        2
#> 2.808393 2.706746

ch = binpack(x, chunk.size = 1)
sapply(split(x, ch), sum)
#>         1         2         3         4         5         6
#> 0.9446753 0.9699941 0.8983897 0.9263065 0.8307960 0.9449773

### Chunk the jobs of a Registry
tmp = makeRegistry(file.dir = NA, make.default = FALSE)
ids = batchMap(identity, 1:25, reg = tmp)

### Group into chunks with 10 jobs each
library(data.table)
ids[, chunk := chunk(job.id, chunk.size = 10)]
#>     job.id chunk
#>  1:      1     3
#>  2:      2     1
#>  3:      3     1
#>  4:      4     2
#>  5:      5     3
#>  6:      6     1
#>  7:      7     3
#>  8:      8     3
#>  9:      9     2
#> 10:     10     1
#> 11:     11     1
#> 12:     12     2
#> 13:     13     2
#> 14:     14     1
#> 15:     15     2
#> 16:     16     1
#> 17:     17     3
#> 18:     18     1
#> 19:     19     2
#> 20:     20     1
#> 21:     21     2
#> 22:     22     3
#> 23:     23     2
#> 24:     24     3
#> 25:     25     3
#>     job.id chunk
print(ids[, .N, by = chunk])
#>    chunk N
#> 1:     3 8
#> 2:     1 9
#> 3:     2 8

### Group into 4 chunks
ids[, chunk := chunk(job.id, n.chunks = 4)]
#>     job.id chunk
#>  1:      1     2
#>  2:      2     3
#>  3:      3     4
#>  4:      4     3
#>  5:      5     4
#>  6:      6     1
#>  7:      7     4
#>  8:      8     1
#>  9:      9     2
#> 10:     10     2
#> 11:     11     3
#> 12:     12     3
#> 13:     13     4
#> 14:     14     1
#> 15:     15     3
#> 16:     16     2
#> 17:     17     1
#> 18:     18     2
#> 19:     19     3
#> 20:     20     4
#> 21:     21     1
#> 22:     22     2
#> 23:     23     4
#> 24:     24     1
#> 25:     25     1
#>     job.id chunk
print(ids[, .N, by = chunk])
#>    chunk N
#> 1:     2 6
#> 2:     3 6
#> 3:     4 6
#> 4:     1 7

### Chunk the jobs of an ExperimentRegistry
tmp = makeExperimentRegistry(file.dir = NA, make.default = FALSE)
prob1 = addProblem(reg = tmp, "prob1", fun = function(job, data, ...) NULL)
prob2 = addProblem(reg = tmp, "prob2", fun = function(job, data, x, ...) x)
algo = addAlgorithm(reg = tmp, "algo", fun = function(job, data, instance, i, ...) i)
prob.designs = list(prob1 = data.table(), prob2 = data.table(x = 1:2))
algo.designs = list(algo = data.table(i = 1:3))
addExperiments(prob.designs, algo.designs, repls = 3, reg = tmp)

### Group into chunks of 5 jobs, but do not put multiple problems into the same chunk
# -> only one problem has to be loaded per chunk, and only once because it is cached
ids = getJobTable(reg = tmp)[, .(job.id, problem, algorithm)]
ids[, chunk := chunk(job.id, chunk.size = 5), by = "problem"]
#>     job.id problem algorithm chunk
#>  1:      1   prob1      algo     1
#>  2:      2   prob1      algo     1
#>  3:      3   prob1      algo     2
#>  4:      4   prob1      algo     2
#>  5:      5   prob1      algo     1
#>  6:      6   prob1      algo     2
#>  7:      7   prob1      algo     1
#>  8:      8   prob1      algo     1
#>  9:      9   prob1      algo     2
#> 10:     10   prob2      algo     2
#> 11:     11   prob2      algo     1
#> 12:     12   prob2      algo     1
#> 13:     13   prob2      algo     3
#> 14:     14   prob2      algo     3
#> 15:     15   prob2      algo     3
#> 16:     16   prob2      algo     2
#> 17:     17   prob2      algo     2
#> 18:     18   prob2      algo     2
#> 19:     19   prob2      algo     2
#> 20:     20   prob2      algo     4
#> 21:     21   prob2      algo     1
#> 22:     22   prob2      algo     1
#> 23:     23   prob2      algo     3
#> 24:     24   prob2      algo     4
#> 25:     25   prob2      algo     1
#> 26:     26   prob2      algo     4
#> 27:     27   prob2      algo     4
#>     job.id problem algorithm chunk

### Chunk numbers are duplicated across problems, renumber them to be unique
ids[, chunk := .GRP, by = c("problem", "chunk")]
#>     job.id problem algorithm chunk
#>  1:      1   prob1      algo     1
#>  2:      2   prob1      algo     1
#>  3:      3   prob1      algo     2
#>  4:      4   prob1      algo     2
#>  5:      5   prob1      algo     1
#>  6:      6   prob1      algo     2
#>  7:      7   prob1      algo     1
#>  8:      8   prob1      algo     1
#>  9:      9   prob1      algo     2
#> 10:     10   prob2      algo     3
#> 11:     11   prob2      algo     4
#> 12:     12   prob2      algo     4
#> 13:     13   prob2      algo     5
#> 14:     14   prob2      algo     5
#> 15:     15   prob2      algo     5
#> 16:     16   prob2      algo     3
#> 17:     17   prob2      algo     3
#> 18:     18   prob2      algo     3
#> 19:     19   prob2      algo     3
#> 20:     20   prob2      algo     6
#> 21:     21   prob2      algo     4
#> 22:     22   prob2      algo     4
#> 23:     23   prob2      algo     5
#> 24:     24   prob2      algo     6
#> 25:     25   prob2      algo     4
#> 26:     26   prob2      algo     6
#> 27:     27   prob2      algo     6
#>     job.id problem algorithm chunk

### Cross-tabulation shows that no chunk mixes two problems
dcast(ids, chunk ~ problem)
#>    chunk prob1 prob2
#> 1:     1     5     0
#> 2:     2     4     0
#> 3:     3     0     5
#> 4:     4     0     5
#> 5:     5     0     4
#> 6:     6     0     4
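
Once the chunk column is in place, the table can be passed to submitJobs() as described above, e.g. for the experiment registry constructed in the last example:

submitJobs(ids, reg = tmp)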