====== Introduction ======
Let's say you have a loop like:
 foreach my $host (@hosts) {
    system "ssh $host reboot";
 }
This is well and good, but it takes forever if you want to do 1500 clients.  Therefore, we'd want to parallelize the loop.  There are two ways to go about this:
  * Install a module that does this for you.
  * Copy/paste a bit of code (saves you from having to muck about with CPAN).
We'll cover both methods.  **Note: you can only parallelize a loop when each iteration doesn't depend on the previous!**
====== Using a parallelization module ======
The simplest module to do this is [[http://search.cpan.org/~dlux/Parallel-ForkManager-0.7.5/ForkManager.pm|Parallel::ForkManager]].
We'll apply this module to the same code we looked at before:
  use Parallel::ForkManager;
  
  my $pm = Parallel::ForkManager->new($MAX_PROCESSES);  # at most $MAX_PROCESSES children at once
  
  foreach my $host (@hosts) {
    my $pid = $pm->start and next;  # Fork; the parent loops to do the next host and the child does the following:
    
    system "ssh $host reboot";
    
    $pm->finish; # Terminate the child process
  }
  
  $pm->wait_all_children;
Since we're just ssh'ing to a bunch of machines, the children spend nearly all their time waiting on the network, so we can set $MAX_PROCESSES as high as 60-100 on a reasonable desktop machine, or 200-400 on a beefy server like Mario.
====== Using hand-rolled parallel code ======
We can achieve the same result without the module, and without a huge amount of code. It's actually pretty simple:
  my $numChildren=0; # number of forked children currently active
  
  for my $host (@hosts) {
    while ($numChildren >= $MAX_PROCESSES) {  # limit forked children to <= $MAX_PROCESSES
      my $deadKid = wait();
      $numChildren--;
    }
    $numChildren++;   # We're about to fork another kid
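    my $pid = fork();
    die "Cannot fork: $!\n" unless defined $pid;  # without this check, a failed fork would make the parent run the child code and exit
    next if $pid;  # Parent: loop on to the next host.  The child does the following: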
    
    system "ssh $host reboot";
    
    exit;
  }
  
  # Reap remaining kids
  while ($numChildren > 0) {
    my $deadKid = wait();
    $numChildren--;
  }
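The loop above throws away what ''wait()'' returns, so a host whose command failed goes unnoticed. Here is a minimal sketch of one way to track that, assuming each child reports success or failure through its exit code. The names ''run_parallel'', ''%item_for'' and the host list are made up for illustration; the sketch relies only on ''wait()'' returning the dead child's pid and leaving its status in ''$?''.

  use strict;
  use warnings;
  
  # Run $work->($item) in up to $max children at once.  $work returns an exit
  # code (0 = success).  Returns the sorted list of items whose child failed.
  # Hypothetical helper, not part of any module.
  sub run_parallel {
      my ($max, $work, @items) = @_;
      my %item_for;   # pid => item handled by that child
      my @failed;
  
      my $reap = sub {
          my $pid = wait();
          if ($pid == -1) { %item_for = (); return; }  # no children left
          my $item = delete $item_for{$pid};
          return unless defined $item;                 # not one of ours
          push @failed, $item if $? != 0;              # $? holds the child's status
      };
  
      for my $item (@items) {
          $reap->() while keys(%item_for) >= $max;     # throttle
          my $pid = fork();
          die "Cannot fork: $!\n" unless defined $pid;
          exit $work->($item) if $pid == 0;            # child: do the work, report via exit code
          $item_for{$pid} = $item;                     # parent: remember who is doing what
      }
      $reap->() while %item_for;                       # reap remaining kids
  
      @failed = sort @failed;
      return @failed;
  }
  
  # Stand-in work: pretend that "beta" fails.
  my @failed = run_parallel(2, sub { $_[0] eq 'beta' ? 1 : 0 }, qw(alpha beta gamma));
  print "failed: @failed\n";

For the reboot example, the work callback could be something like ''sub { system("ssh", $_[0], "reboot") == 0 ? 0 : 1 }'', so that any ssh failure shows up as a non-zero exit code.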
====== Tricks and caveats ======
There are some things to be aware of when working with parallel code.
===== No inter-iteration interaction =====
Each iteration occurs in its own process, so no iteration can have any effect on any other.  Once the fork occurs, unless you set up a pipe or something, the parent and child are two separate animals entirely.
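To make the separation concrete, here is a small sketch (the sub name ''counter_after_child_increment'' is made up for illustration). The child increments a variable, but that only changes the child's own copy, so the parent still sees the old value.

  use strict;
  use warnings;
  
  # Returns the parent's copy of a counter after a forked child has
  # incremented its own copy of it.
  sub counter_after_child_increment {
      my $counter = 0;
      my $pid = fork();
      die "Cannot fork: $!\n" unless defined $pid;
      if ($pid == 0) {
          $counter++;   # changes only the child's copy
          exit 0;
      }
      waitpid($pid, 0); # wait for the child to finish
      return $counter;  # the parent's copy is untouched
  }
  
  print counter_after_child_increment(), "\n";   # prints 0

The same goes for arrays, hashes and anything else in memory: if the parent needs a result back from a child, it has to come through a pipe, a file, or the child's exit code.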
===== Parallel user input is a bad idea =====
If you need to ask something of the user, you should do it up front before all the forking starts.
===== Output will be mixed together =====
All the children run at the same time, so their console output will be mixed together.  To keep each iteration's output in one piece, collect it in a string and emit it with a single print call; a reasonably short blob written that way will stay together.  The blobs will still come out in an unpredictable order, since there's no telling which process will get to print first.
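A minimal sketch of the single-print approach, assuming each child can build its whole report as a string first. The helper ''report_for'' and the host names are made up for illustration.

  use strict;
  use warnings;
  
  # Hypothetical helper: build all of one host's output as a single string.
  sub report_for {
      my ($host) = @_;
      my $out = "== $host ==\n";
      $out .= "  step 1 done\n";
      $out .= "  step 2 done\n";
      return $out;
  }
  
  for my $host (qw(alpha beta gamma)) {
      my $pid = fork();
      die "Cannot fork: $!\n" unless defined $pid;
      next if $pid;              # parent: on to the next host
      print report_for($host);   # child: exactly one print call
      exit 0;
  }
  1 while wait() != -1;          # reap all the children

Each host's three lines should come out as one block, though which host appears first can change from run to run.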
If you want to make your output really coherent (e.g. sorted or otherwise post-processed) you need to set up a pipe to the parent process.  Basically, this means doing one fork before the loop, and having a process whose sole job is to collect the STDOUT of all the subsequent processes via its own STDIN and process it.  That may sound complicated, but it's not that bad.  Let's say you simply wanted to sort all your output and print it at the end.  All you'd have to do is call the following function before the parallel loop:
 # Fork a child and redirect the parent's output to that child.  The child reads
 # and sorts the output.  This is adapted from "filtering your own output" recipe
 # from the perl cookbook.  Also, it is awesome.
 sub sortMyOutput {
 	my $pid;
 	# Forking a child to sort us...
 	if ($pid = open STDOUT, "|-") {
 		# Sorting child forked, parent returning.
 		return;
 	}
 	
 	die "Cannot fork for self-sort: $!\n" unless defined $pid;
 	# Sorting child ready - will read all STDIN into a list, sort it, and print it.
 	print sort <STDIN>;
 	exit; # child is done, exit
 }
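Then, after the loop (and after any remaining children have been reaped), explicitly close STDOUT.  The sorting child only sees end-of-file once every process holding the pipe has closed it, and closing a piped handle also makes the parent wait until the sorter has finished printing.  So the overall algorithm becomes: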
 sortMyOutput();
 foreach () {} # parallel loop goes here
 close STDOUT;
If you want to process the output with knowledge of which output goes to which iteration, you'll need to build a Unix pipe manually.  This gets a bit hairy, so look at the "pipe" and "perlipc" documentation if you really care.  Or ask Tyler.
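For a rough idea of what the manual version looks like, here is a sketch that gives every child its own pipe and has the parent read them back keyed by item. The name ''run_and_collect'' and the host list are made up for illustration, and there is no throttling, so as written it only suits a small number of items.

  use strict;
  use warnings;
  
  # Run $work->($item) in one child per item and return { item => output }.
  # Hypothetical helper; no limit on the number of children.
  sub run_and_collect {
      my ($work, @items) = @_;
      my %reader;   # item => read end of that child's pipe
  
      for my $item (@items) {
          pipe(my $read, my $write) or die "Cannot create pipe: $!\n";
          my $pid = fork();
          die "Cannot fork: $!\n" unless defined $pid;
          if ($pid == 0) {
              close $read;                    # child only writes
              close $_ for values %reader;    # drop read ends inherited from earlier pipes
              print {$write} $work->($item);
              close $write;
              exit 0;
          }
          close $write;                       # parent only reads
          $reader{$item} = $read;
      }
  
      my %output;
      for my $item (@items) {
          my $fh = $reader{$item};
          local $/;                           # slurp everything up to end-of-file
          $output{$item} = <$fh>;
          close $fh;
      }
      1 while wait() != -1;                   # reap all the children
      return \%output;
  }
  
  my $results = run_and_collect(sub { "hello from $_[0]\n" }, qw(alpha beta gamma));
  print "$_: $results->{$_}" for sort keys %$results;

The parent drains the pipes one at a time, so a child with a lot of output may block until its turn comes. Reading from many pipes at once needs ''select'' or similar, which is where the hairy part starts.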