Building an iterator that can grow

A few weeks ago I ran into a tricky to solve issue in CakePHP. It involved an iterator that needs be grown during iteration, and nested loops over that same iterator. While infrequent, there are scenarios where you would want to grow an iterator as it is being iterated. My situation is the plugin registry for CakePHP. Plugins support a bootstrap hook method that is used to initialize a plugin. During that hook a plugin may load additional plugins if it depends on other plugins. These child plugins also need their bootstrap hook fired. Instead of trying to solve this problem with multiple loops, we use an iterator that grows in place. The problematic loop code looks like:

Show Plain Text
  1. // $plugins is the iterator containing all the plugins in an application.
  2. foreach ($plugins as $plugin) {
  3.     $plugin->bootstrap();
  4. }

While this code is simple, the problem arose from a plugin that iterated plugins looking for sub-classes of its interfaces. This effectively created something like:

Show Plain Text
  1. $plugins = new PluginIterator([new Plugin('one'), new Plugin('discover'), new Plugin('three')]);
  2. foreach ($plugins as $plugin) {
  3.     // Assume this condition is hit mid-way through
  4.     // the loop
  5.     if ($plugin->getName() == 'discover') {
  6.         foreach ($plugins as $inner) {
  7.             $plugin->discoverTasks($inner);
  8.         }
  9.     }
  10.     echo "Found {$thing->getName()}\n";
  11. }

If discover was in the third item out of five, the echo in the outer loop would never be reached for the fourth and fifth items. In order to understand why we should review the various ways you can make custom iterators in PHP and the properties each approach imparts.

Creating a PHP Iterator

The simplest way to make an object iterable in PHP is to implement IteratorAggregate interface. This interface allows you to delegate iteration or use a generator:

Show Plain Text
  1. class PluginIterator implements IteratorAggregate
  2. {
  3.     public function __construct($items)
  4.     {
  5.         $this->items = $items;
  6.     }
  7.  
  8.     public function getIterator()
  9.     {
  10.         return new ArrayIterator($this->items);
  11.     }
  12.  
  13.     public function add($item)
  14.     {
  15.         $this->items[] = $item;
  16.     }
  17. }

Using this approach will allow your iterator to be used in nested loops safely. However, each loop will take a state snapshot when iteration is started and items appended to the iterator using add() will require additional loops as we can’t grow the copy being iterated. This snapshot behavior is also shared by arrays in PHP, which is one of the reasons we need to use an iterator in the first place.

Another approach is to implement the Iterator interface directly. This gives us more control over how our object is iterated. A simple Iterator implementation could look like:

Show Plain Text
  1. class PluginIterator implements Iterator {
  2.     private $position = 0;
  3.     private $items;
  4.  
  5.     public function __construct(array $items) {
  6.         $this->items = $items;
  7.         $this->position = 0;
  8.     }
  9.  
  10.     public function rewind() {
  11.         $this->position = 0;
  12.     }
  13.  
  14.     public function current() {
  15.         return $this->items[$this->position];
  16.     }
  17.  
  18.     public function key() {
  19.         return $this->position;
  20.     }
  21.  
  22.     public function next() {
  23.         $this->position++;
  24.     }
  25.  
  26.     public function valid() {
  27.         return isset($this->items[$this->position]);
  28.     }
  29.  
  30.     public function add($item)
  31.     {
  32.         $this->items[] = $item;
  33.     }
  34. }

This implementation allows us to grow the iterator as it is being iterated. However, it will not work in a nested loop scenario. When PHP starts iteration it calls rewind() instructing the iterator to go back to the beginning. If we revisit the example from before:

Show Plain Text
  1. $plugins = new PluginIterator([new Plugin('one'), new Plugin('discover'), new Plugin('three')]);
  2. // rewind() is called here
  3. foreach ($plugins as $plugin) {
  4.     $plugin->bootstrap();
  5.     if ($plugin->getName() == 'discover') {
  6.         // rewind() is called here too!
  7.         foreach ($plugins as $inner) {
  8.             $plugin->discoverTasks($inner);
  9.         }
  10.         // The iterator's position is now at the end of its data,
  11.         // and the outer loop's valid() call will return false
  12.         // stopping iteration.
  13.     }
  14.     echo "Found {$thing->getName()}\n";
  15. }

Our code will never call bootstrap() on item three, and never output Found three which isn’t what we want.

My solution

The problem boils down to only having a single integer to track the iterator’s current position. Given that we want to support nested loops, we either need to freeze state, or keep track of multiple positions. While freezing state would be simpler, it would also break our ability to grow the iterator in place. That leaves tracking multiple positions. I ended up creating a stack of position indices. The basic implementation looks like:

Show Plain Text
  1. class PluginIterator implements Iterator {
  2.     private $positions = [];
  3.     private $loopDepth = -1;
  4.     private $items;
  5.  
  6.     public function __construct(array $items) {
  7.         $this->items = $items;
  8.     }
  9.  
  10.     public function rewind() {
  11.         $this->positions[] = 0;
  12.         $this->loopDepth += 1;
  13.     }
  14.  
  15.     public function current() {
  16.         $position = $this->positions[$this->loopDepth];
  17.  
  18.         return $this->items[$position];
  19.     }
  20.  
  21.     public function key() {
  22.         return $this->positions[$this->loopDepth];
  23.     }
  24.  
  25.     public function next() {
  26.         $this->positions[$this->loopDepth]++;
  27.     }
  28.  
  29.     public function valid() {
  30.         $position = $this->positions[$this->loopDepth];
  31.         $valid = isset($this->items[$position]);
  32.         if (!$valid) {
  33.             array_pop($this->positions);
  34.             $this->loopDepth -= 1;
  35.         }
  36.  
  37.         return $valid;
  38.     }
  39.  
  40.     public function add($item)
  41.     {
  42.         $this->items[] = $item;
  43.     }
  44. }

While this solves the nested loop problem, it isn’t perfect. It fails if the inner loop uses break. It will also accumulate position values each time iteration is stopped prematurely with a return. Aside from those two caveats it does enable us to perform nested loops on an iterator that can be modified in-place. Hopefully this can help you in the future should you need to build an iterator with similar constraints.

Comments

While I agree that iterations levels can be useful, I think in that case you have kinda rewrite this part:

<pre>
foreach ($plugins as $plugin) { $plugin->bootstrap(); if ($plugin->getName() == ‘discover’) { foreach ($plugins as $inner) { $plugin->discoverTasks($inner); } }
}
</pre>

To this:

<pre>
$discover = null;
foreach ($plugins as $plugin) { $plugin->bootstrap(); if ($plugin->getName() == ‘discover’) { $discover = $plugin; }
}
foreach ($plugins as $plugin) { $discover->discoverTasks($plugin);
}
</pre>

And in this case you will not discover tasks for un-inited plugins.

May be it is not an issue in this case, but it simplier.

ooooo on 5/26/19

ooooo: If the first loop adds plugins, which can also add their own dependencies. Won’t you be missing calling the @bootstrap@ method on the 2nd level of plugins? Wouldn’t you have to add yet another loop after the first bootstrap call to get the newly added items? One of my goals was to generalize the solution and avoid having to traverse the set twice, and support arbitrary depth.

mark story on 7/28/19

Have your say: