In the last article we finished with a SPE-based fractal renderer, but with a limited maximum fractal size of 64 × 64 pixels:

We'd like to generate full-size fractals, but the DMAs (which we use to transfer the fractal image out of the SPE) have a maximum size of 64kB. The solution is to perform multiple DMAs each containing a subset of the image's rows.

Each invocation of `render_fractal()`

should render a DMA-able
chunk of fractal data, then we perform the DMA. We do this until the SPE has
processed the entire image:

We just need to modify the spe-fractal code (`spe-fractal.c`

) a
little. At present, we just render the whole fractal in one pass and DMA the
data in the `main()`

function:

render_fractal(&args.fractal); mfc_put(args.fractal.imgbuf, ppe_buf, args.fractal.rows * args.fractal.cols * sizeof(struct pixel), 0, 0, 0); /* Wait for the DMA to complete */ mfc_write_tag_mask(1 << 0); mfc_read_tag_status_all();

First, we need to modify our `render_fractal()`

fuction to take
a starting row, and a number of rows to render. This is the new prototype
of `render_fractal()`

:

static void render_fractal(struct fractal_params *params, int start_row, int n_rows)

In the SPE program's `main()`

, we just need to set up some
convenience variables:

bytes_per_row = sizeof(*buf) * args.fractal.cols; rows_per_dma = sizeof(buf) / bytes_per_row;

And do the rendering and DMAs in a loop:

for (row = 0; row < args.fractal.rows; row += rows_per_dma) { render_fractal(&args.fractal, row, rows_per_dma); mfc_put(buf, ppe_buf + row * bytes_per_dma, rows_per_dma * bytes_per_row, 0, 0, 0); /* Wait for the DMA to complete */ mfc_write_tag_mask(1 << 0); mfc_read_tag_status_all(); }

This loop will render as many image rows as will fit into a single DMA, then DMA the rendered data back to main memory.

Now, we're able to render fractals larger than 64 × 64 pixels:

The source for the updated fractal renderer is available in fractal.2.tar.gz.

### performance

Now that we can generate full-size fractals, we can compare the running times with the PPE-based fractal renderer. The following table shows running times with a standard fractal (using these fractal parameters).

Implementation | Time (sec) |
---|---|

PPE | 55.7 |

1 SPE | 40.7 |

So, we get a 27% speedup by moving the fractal generation code to run on a SPE. We're still a way behind the optimal performance though, and benchmarking on other systems gives better times (for example, generating the same fractal on an Intel Core 2 Duo @ 2.4GHz takes 13.8 seconds).

We can improve the Cell performance by a large amount - stay tuned for the next article to see how.