how_to_make_any_perl_loop_parallel
— | how_to_make_any_perl_loop_parallel [2010/12/03 23:28] (current) – created tkbletsc | ||
---|---|---|---|
+ | ===== Introduction ===== | ||
+ | Let's say you have a loop like: | ||
+ | |||
+ | <code> | ||
+ | foreach my $host (@hosts) { | ||
+ | system "ssh $host reboot"; | ||
+ | } | ||
+ | </code> | ||
+ | |||
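+ | This is well and good, but it takes forever if you want to do 1500 clients, because each reboot has to finish before the next one starts. There are two ways to run the iterations in parallel: | ||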
+ | |||
+ | * Install a module that does this for you. | ||
+ | * Copy/paste a bit of code (saves you from having to muck about with CPAN). | ||
+ | |||
+ | We'll cover both methods. | ||
+ | |||
+ | ===== Using a parallelization module ===== | ||
+ | |||
+ | The simplest module to do this is [[http:// | ||
+ | |||
+ | We'll apply this module to the same code we looked at before: | ||
+ | |||
+ | use Parallel:: | ||
+ | | ||
+ | $pm = new Parallel:: | ||
+ | | ||
+ | foreach my $host (@hosts) { | ||
+ | my $pid = $pm-> | ||
+ | | ||
+ | system "ssh $host reboot"; | ||
+ | | ||
+ | $pm-> | ||
+ | } | ||
+ | | ||
+ | $pm-> | ||
+ | |||
+ | Since we're just ssh' | ||
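
The module name and method calls are cut off in the listing above, so the following is only a sketch. It assumes the module meant is Parallel::ForkManager from CPAN and uses its documented `new`, `start`, `finish` and `wait_all_children` calls. The helper name `run_with_forkmanager` and its arguments are hypothetical, introduced for illustration.

```perl
use strict;
use warnings;

# Assumption: Parallel::ForkManager (CPAN) is installed. It is loaded at
# run time so this file still compiles on a machine without it.
# Hypothetical helper: run $work->($item) for every item, with at most
# $max_procs child processes alive at once.
sub run_with_forkmanager {
    my ($max_procs, $work, @items) = @_;
    require Parallel::ForkManager;
    my $pm = Parallel::ForkManager->new($max_procs);

    for my $item (@items) {
        $pm->start and next;   # parent gets the child's PID (true) and moves on
        $work->($item);        # only the child reaches this line
        $pm->finish;           # child exits here
    }

    $pm->wait_all_children;    # parent blocks until every child is done
    return;
}
```

With that helper, the reboot loop could be written as `run_with_forkmanager(20, sub { system "ssh $_[0] reboot" }, @hosts);`, where 20 is an arbitrary example limit.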
+ | |||
+ | ===== Using hand-rolled parallel code ===== | ||
+ | |||
+ | We can achieve the same result without the module and without much code; it's actually pretty simple: | ||
+ | |||
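+ | my $MAX_PROCESSES = 10; # example value (an assumption); tune it to your machine | ||
+ | my $numChildren = 0; | ||
+ | | ||
+ | for my $host (@hosts) { | ||
+ | while ($numChildren >= $MAX_PROCESSES) { # limit forked children to <= $MAX_PROCESSES | ||
+ | my $deadKid = wait(); # block until one child exits | ||
+ | $numChildren--; | ||
+ | } | ||
+ | my $pid = fork(); | ||
+ | die "fork failed: $!" unless defined $pid; # otherwise the parent would run the child's code and exit | ||
+ | if ($pid) { # parent: count the new child and loop to do the next host | ||
+ | $numChildren++; | ||
+ | next; | ||
+ | } | ||
+ | | ||
+ | system "ssh $host reboot"; # the child does the actual work | ||
+ | | ||
+ | exit; # the child must exit here so it doesn't continue the loop | ||
+ | } | ||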
+ | | ||
+ | # Reap remaining kids | ||
+ | while ($numChildren > 0) { | ||
+ | my $deadKid = wait(); | ||
+ | $numChildren--; | ||
+ | } | ||
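
If the same pattern is needed for more than one loop, it can be wrapped in a function. The following is a minimal sketch, not a definitive implementation: the name `parallel_each`, its arguments, and the convention that the callback returns true on success are all assumptions made for illustration. It additionally counts children that exited with a non-zero status, using the `$?` that `wait` sets.

```perl
use strict;
use warnings;

# Hypothetical helper: run $work->($item) for each item in its own process,
# with at most $max children alive at once. Returns how many children
# exited with a non-zero status.
sub parallel_each {
    my ($max, $work, @items) = @_;
    die "max must be at least 1" if $max < 1;
    my ($running, $failed) = (0, 0);

    # Reap one child and record whether it failed.
    my $reap = sub {
        my $pid = wait();
        if ($pid == -1) {        # no children left to wait for
            $running = 0;
            return;
        }
        $running--;
        $failed++ if $? != 0;    # $? holds the reaped child's wait status
    };

    for my $item (@items) {
        $reap->() while $running >= $max;

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid) {              # parent: count the child, go to the next item
            $running++;
            next;
        }
        # child: a true return value from the callback becomes exit code 0
        exit($work->($item) ? 0 : 1);
    }

    $reap->() while $running > 0;    # reap the remaining children
    return $failed;
}
```

For the reboot example, a call might look like `my $failed = parallel_each(20, sub { system("ssh", $_[0], "reboot") == 0 }, @hosts);`.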
+ | |||
+ | ===== Tricks and caveats ===== | ||
+ | |||
+ | There are some things to be aware of when working with parallel code. | ||
+ | |||
+ | ==== No inter-iteration interaction ==== | ||
+ | |||
+ | Each iteration occurs in its own process, so no iteration can have any effect on any other. | ||
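
A minimal sketch of what this means in practice: a variable changed inside a forked child is changed only in that child's copy of the program, so the parent never sees the new value. The function name `counter_after_child` is hypothetical.

```perl
use strict;
use warnings;

# Returns the parent's copy of a counter after a child process has
# incremented its own copy.
sub counter_after_child {
    my $counter = 0;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        $counter++;    # changes only the child's private copy
        exit 0;
    }

    waitpid($pid, 0);  # wait for the child to finish
    return $counter;   # the parent's copy is unchanged
}
```

The same applies to arrays and hashes: results that the parent needs have to travel back through a file, a pipe, or the child's exit status.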
+ | |||
+ | ==== Parallel user input is a bad idea ==== | ||
+ | |||
+ | If you need to ask something of the user, you should do it up front before all the forking starts. | ||
+ | |||
+ | ==== Output will be mixed together ==== | ||
+ | |||
+ | All the children will run at the same time, so their console output will be mixed together. | ||
+ | |||
+ | If you want to make your output really coherent (e.g. sorted or otherwise post-processed) you need to set up a pipe to the parent process. | ||
+ | |||
+ | <code> | ||
+ | # Fork a child and redirect the parent's STDOUT to it; the child collects | ||
+ | # and sorts the output. | ||
+ | # from the perl cookbook. | ||
+ | sub sortMyOutput { | ||
+ | my $pid; | ||
+ | # Forking a child to sort us... | ||
+ | if ($pid = open STDOUT, "|-") { | ||
+ | # Sorting child forked, parent returning. | ||
+ | return; | ||
+ | } | ||
+ | | ||
+ | die "cannot fork: $!" unless defined $pid; | ||
+ | # Sorting child ready - will read all STDIN into a list, sort it, and print it. | ||
+ | print sort <STDIN>; | ||
+ | exit; | ||
+ | } | ||
+ | </code> | ||
+ | |||
+ | Then, after the loop, it helps to explicitly close STDOUT so the child knows to proceed before we wait on it. So the overall algorithm becomes: | ||
+ | |||
+ | sortMyOutput(); | ||
+ | foreach () {} # parallel loop goes here | ||
+ | close STDOUT; | ||
+ | |||
+ | If you want to process the output with knowledge of which output goes to which iteration, you'll need to build a Unix pipe manually. |
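
One way to do that is to give every child its own pipe and have the parent read each pipe separately. This is a sketch under stated assumptions rather than a definitive implementation: the name `capture_each` is hypothetical, item names are assumed to be unique, and for brevity it forks all children at once instead of limiting their number as the loops above do.

```perl
use strict;
use warnings;

# Hypothetical helper: run $work->($item) in one child per item and return
# a hash mapping each item to the text its child printed on STDOUT.
sub capture_each {
    my ($work, @items) = @_;
    my %reader;    # item => read end of that child's pipe

    for my $item (@items) {
        pipe(my $r, my $w) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {                   # child
            close $r;
            close $_ for values %reader;   # read ends inherited from earlier iterations
            open STDOUT, '>&', $w or die "dup failed: $!";
            close $w;
            $work->($item);
            exit 0;                        # exiting flushes STDOUT into the pipe
        }

        close $w;                          # parent keeps only the read end
        $reader{$item} = $r;
    }

    my %output;
    for my $item (@items) {
        my $fh = $reader{$item};
        local $/;                          # slurp mode: read until EOF
        my $text = <$fh>;
        $output{$item} = defined $text ? $text : '';
        close $fh;
    }

    1 while wait() != -1;                  # reap every child
    return %output;
}
```

A call such as `my %out = capture_each(sub { system "ssh $_[0] uptime" }, @hosts);` would then leave each host's output in `$out{$host}`; the `uptime` command is only an example.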
how_to_make_any_perl_loop_parallel.txt · Last modified: 2010/12/03 23:28 by tkbletsc