Wednesday, 8 November 2023

Gnuplot parallel do for [ ... ] { ... }, pt. 1

While doing some simple exercises in three-body system modelling, I got sick of waiting for all the animation frames to render. The frames are made with gnuplot, using its loop feature and row ranges in the data columns. The reason for this approach is straightforward: my output file has the following format:

t, x1, y1, z1, vx1, vy1, vz1, ... , x3, y3, z3, vx3, vy3, vz3
(in 2-d there are no "z" position and velocity coordinates).

During the first successful runs (recompiling every time I needed to change the initial conditions, hunting for bugs, etc.) I didn't care about the file's size. But simulating 10 k time units with a time step of the order of 0.01 already gives on the order of a million output lines. Now, drawing the trajectories of the three bodies looks somewhat like

do for [i = 1:n:step] {
    # set output file name based on i, using sprintf
    # print some progress information to the console
    plot datafile using 2:3 every ::1::i with {some trajectory line style},\
         datafile using 2:3 every ::i::i with {some dot/point style},\
         datafile using 8:9 every ::1::i with ...,\
         datafile using 8:9 every ::i::i with ...,\
         datafile using 14:15 every ::1::i with ...,\
         datafile using 14:15 every ::i::i with ...
}

In other words, we have roughly n/step loop cycles, but the larger i is, the more data is read from the datafile because of the every ::1::i clause. Since iteration i draws its lines from i datapoints, the total time grows roughly quadratically with the number of frames, n/step.
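To see the quadratic growth in numbers, here is a tiny sketch in Python. It only counts datapoints per body and assumes the cost of a frame is proportional to the number of points it draws; the function name points_read is made up for this illustration.

```python
def points_read(n, step):
    """Total number of datapoints (per body) drawn over all frames.

    Frame i (i = 1, 1 + step, ..., up to n) draws a trajectory from i points,
    mirroring `do for [i = 1:n:step]` combined with `every ::1::i`.
    Assumption: drawing cost is proportional to the number of points.
    """
    return sum(range(1, n + 1, step))


if __name__ == "__main__":
    # Doubling n with the same step should roughly quadruple the work.
    for n in (1_000, 2_000, 4_000):
        print(n, points_read(n, step=10))
```

Doubling n at a fixed step roughly quadruples the count, which matches the waiting times I see.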

Hence the idea to parallelize the gnuplot loop. In order to do that on the most basic level, we have to

0. Implement an "output every ... steps" option in the simulation program itself rather than thinning the data in gnuplot. Or, if we need the full data, copy every ...-th line to a separate file.

1. Create the ordinary ("serial") script, i.e. one containing a do for[...]{ ... } loop like the one above, that fits our datafile format;

2. Clean it (strip indentation, comments "#..." and console logging that starts with "system(");

3. Divide the script into three parts: the header (coordinate range settings, terminal, picture size etc.), the loop body (+ its parameter and step if present) and the footer (whatever comes after the loop);

4. Locate the loop index and write a function that replaces it with a given number.

5. Implement some clever (as if) way of editing the loop body using the file index, creating a batch of scripts and running gnuplot in parallel. Perhaps using in-memory data rather than forcing gnuplot to open the datafile will increase performance, at the cost of more memory. The batches should of course get smaller as the file index increases, probably as 1/idx^2.

6. (?) Drop creating script files and use single-line, comma-separated sets of commands.
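Steps 3 to 6 could be sketched in Python roughly as follows. This is not a finished tool, just an outline under several assumptions: the header, loop body and footer have already been cut out as strings; the loop variable only appears as a whole word; and the cost of frame i is proportional to i. All function names are hypothetical. Instead of the 1/idx^2 guess above, the batching here uses a simple greedy balancing, and the scripts are piped to gnuplot through stdin instead of being written to files.

```python
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor


def substitute_index(body, var, value):
    """Replace whole-word occurrences of the loop variable with a number.

    Limitation: a bare `i` inside a quoted string would be replaced too.
    """
    return re.sub(rf"\b{re.escape(var)}\b", str(value), body)


def balanced_batches(indices, workers):
    """Split frame indices into `workers` batches of similar total cost.

    Assumption: frame i costs about i. Greedy rule: give the most expensive
    remaining frame to the batch that is currently cheapest.
    """
    batches = [[] for _ in range(workers)]
    costs = [0] * workers
    for i in sorted(indices, reverse=True):
        k = costs.index(min(costs))
        batches[k].append(i)
        costs[k] += i
    return batches


def build_script(header, body, footer, var, batch):
    """Unroll the loop for one batch: header, one body copy per index, footer."""
    frames = [substitute_index(body, var, i) for i in sorted(batch)]
    return "\n".join([header, *frames, footer]) + "\n"


def run_parallel(scripts, gnuplot="gnuplot"):
    """Start one gnuplot process per script, feeding the script via stdin."""
    def run(script):
        return subprocess.run([gnuplot], input=script, text=True).returncode

    with ThreadPoolExecutor(max_workers=len(scripts)) as pool:
        return list(pool.map(run, scripts))
```

A call would look like run_parallel([build_script(header, body, footer, "i", b) for b in balanced_batches(range(1, n + 1, step), 4)]), with one gnuplot process per batch. Threads are enough here because the real work happens in the gnuplot child processes.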

[...work in progress... I'm stuck on automatic script preparation and on getting rid of unused columns in the datafile, which - I'm not sure, but strongly suspect - should speed up the whole process. The reptile language should help]
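For the column part, a first attempt in Python might look like the sketch below. It assumes whitespace-separated columns in a single data block (a comma-separated file would need split(",") instead), and the name prune is made up. Whether this actually speeds gnuplot up is still only my suspicion.

```python
def prune(lines, keep_cols, every=1):
    """Yield every `every`-th data line, reduced to the given 1-based columns.

    Assumptions: whitespace-separated columns and a single data block,
    so blank lines and "#" comment lines can simply be skipped.
    """
    row = 0
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        if row % every == 0:
            yield " ".join(fields[c - 1] for c in keep_cols)
        row += 1
```

With keep_cols = (2, 3, 8, 9, 14, 15) the pruned file has six columns, so the using 2:3, 8:9 and 14:15 specifications in the plot command would have to become 1:2, 3:4 and 5:6.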
