Thursday, February 27, 2014

Noise Generation Multi-Threading

Multi-threading is an extremely powerful tool at a coder's disposal, but it is devilishly hard to understand and control. Before multiple cores became common on computers, multi-threading was a way for multiple programs to run "simultaneously". In reality, the programs just took turns on the processor. But with the advent of multi-core cpus, on which actual simultaneous processing is possible, multi-threading actually gives you a significant speed increase if used correctly. In this tutorial, we will look at using multi-threading to increase the speed of noise generation. This speed boost can be significant. With noise, since almost all of the calculations can be done concurrently, you will get a speed boost similar to the number of cores you have. (I get an average of for all types of noise of 1.8 times faster on my dual-core machine)



The first step in making a multi-threaded noise generation system is deciding on a way to split up the generation. The most obvious solution is just to make each thread calculate an area of the noise array as follows:
However that solution has two immediately apparent problems. The first and most obvious one is that we either have to have special cases for every possible number of cores or implement a fancy splitting algorithm to divide the area up into sections based on the number of available cores. A less obvious issue is that Java does not guarantee that threads will receive equal processing time, so if one thread takes especially long to complete, it will "waste" processing time.

Another way to do it would be to have a pixel counter that increments every time a thread asks for the next pixel to do. That way it is core-count independent and doesn't wait for the slowest thread. However this solution has excessive overhead for each pixel. Many values in noise can be calculated for an entire row or column and don't need to be recalculated for each pixel.

So instead we strike a middle ground. While there are more columns to process, get next column, precompute values for that column and then calculate the pixels. This gets the best of both worlds, core-count independent, min-time not max-time, negligible per-pixel overhead.

We can encapsulate this into a runnable instance:
/**
 * 
 * Class that represents a task that will be executed simultaneously on multiple threads over an array.
 * 
 * @author F4113nb34st
 *
 */
public abstract class ArrayTask implements Runnable
{
   /**
    * The maximum value. The task stops when we go over this value.
    */
   private final int max;
   /**
    * The current index. Incremented on getX() calls.
    */
   private volatile int index;
 
   /**
    * Creates a new ArrayTask with the given min and max values.
    * @param mi The min value.
    * @param ma The max value.
    */
   public ArrayTask(int mi, int ma)
   {
       max = ma;
       //since index is incremented before returned, start one less than min so first value is min
       index = mi - 1;
   }
 
    /**
     * Gets the next index value.
     * @return The index value or -1 if we are done.
     */
    private synchronized int getX()
    {
        index++;
        if(index <= max)
        {
            return index;
        }else
        {
            return -1;
        }
    }
 
    @Override
    public void run()
    {
        try
        {
            int x;
            //loop until done (x == -1)
            while(true)
            {
                x = getX();
                if(x == -1)
                {
                    break;
                }
                run(x);
            }
        }catch(ArrayIndexOutOfBoundsException ex)
        {
            //rarely we go over, if we do, don't worry about it
        }
    }
 
    /**
     * Performs the required operation on the given x value.
     * @param x The index value to operate on.
     */
    public abstract void run(int x);
}
This runnable takes a min and max int, in this case "0" and the "noise.getWidth() - 1". Notice this class does not specify the code that actually performs operations based on the column. This is because the operations vary from noise function to noise function. Those operations will be handled by a subclass. It is important that all of the threads run the same instance of ArrayTask so the index variable is shared.

Next we need something to actually run our Runnable on. Ideally this object should handle menial tasks related to initing the threads, waiting for the operations to complete, and handing Runnables off to the subthreads. What we need is a thread pool.
A thread pool is an object that manages a variable size group of threads and hands tasks to the threads to be run. By default Java implements thread pools in ExecutorServive, but we will write our own so we have more control over exactly what it does. We also want our thread pool to "pause" all of its child threads if there are no tasks to be completed so as to not waste cpu cycles. The main thread should also pause while it waits for the tasks to be completed if desired. Here is my implementation of a thread pool for this purpose.
/**
 * 
 * Object that manages a variable sized pool of threads.
 * 
 * @author F4113nb34st
 *
 */
public class ThreadPool
{
    /**
     * List of available tasks.
     */
    private volatile LinkedList taskQueue;
    /**
     * Array of the threads in this ThreadPool.
     */
    private Thread[] threads;
    /**
     * The hibernation lock to suspend the child threads until they are needed.
     */
    private Object hibernate = new Object();
    /**
     * The wait lock to suspend the main thread until the tasks are completed.
     */
    private Object waitLock = new Object();
    /**
     * The number of hibernating threads.
     */
    private volatile int numHibernating;
 
    /**
     * Creates a new ThreadPool with the given number of child threads.
     * @param size The number of child threads.
     */
    public ThreadPool(int size)
    {
        taskQueue = new LinkedList();
        threads = new Thread[size];
        for(int i = 0; i < threads.length; i++)
        {
            //create and start all child threads
            threads[i] = new PoolThread();
            threads[i].start();
        }
    }
 
    /**
     * Returns the number of child threads in this pool.
     * @return The number of child threads.
     */
    public int poolSize()
    {
        return threads.length;
    }
 
    /**
     * Adds a new task that will be run once by each thread.
     * @param runner The task.
     */
    public synchronized void addGlobalTask(Runnable runner)
    {
        //equivalent to adding the runner poolSize() times to the taskQueue
        for(int i = 0; i < threads.length; i++)
        {
            taskQueue.add(runner);
        }
    }
 
    /**
     * Adds the given task to the taskQueue of this ThreadPool.
     * @param runner The task.
     */
    public synchronized void addTask(Runnable runner)
    {
        taskQueue.add(runner);
    }
 
    /**
     * Starts all of the child threads to begin executing the availible tasks.
     */
    public void start()
    {
        //signal all hibernating threads to resume
        synchronized(hibernate)
        {
            hibernate.notifyAll();
        }
    }
 
    /**
     * Starts all of the child threads to begin executing the availible tasks and waits for all tasks to be completed.
     */
    public void startAndWait()
    {
        //start
        start();
        //wait til all tasks done
        synchronized(waitLock)
        {
            while(moreTasks())
            {
                try
                {
                    waitLock.wait();
                } catch(InterruptedException ex)
                {
                    //ex.printStackTrace();
                }
            }
        }
    }
 
    /**
     * Hibernates the current thread if there are no more tasks to be completed. Also alerts the main thread if it is waiting.
     * Called by child threads.
     */
    private void hibernate()
    {
        synchronized(hibernate)
        {
            //if more tasks, don't hibernate silly
            while(!moreTasks())
            {
                //if the last thread to finish
                if(numHibernating == poolSize() - 1)
                {
                    //notify the sleeping main thread
                    synchronized(waitLock)
                    {
                        waitLock.notifyAll();
                    }
                }
                //increase hibernating count
                numHibernating++;
                try
                {
                    //hibernate
                    hibernate.wait();
                } catch(InterruptedException ex)
                {
                    //ex.printStackTrace();
                }
                //we're done hibernating, reduce hibernating count and return
                numHibernating--;
            }
        }
    }
 
    /**
     * Returns true if there are more tasks to be completed.
     * @return True if more tasks left.
     */
    private synchronized boolean moreTasks()
    {
        return !taskQueue.isEmpty();
    }
 
    /**
     * Returns a new task from the queue.
     * @return A task from the queue.
     */
    private synchronized Runnable getTask()
    {
        return taskQueue.pollFirst();
    }
 
    /**
     * The child thread's class
     */
    public class PoolThread extends Thread
    {
        public PoolThread()
        {
         //all pool threads are daemon
         setDaemon(true);
        }
  
        public void run()
        {
            try
            {
                //infinitely loop, if we need to die, program will kill us since we are daemon
                while(true)
                {
                    Runnable task;
                    //while more tasks
                    while(true)
                    {
                        //get a task
                        task = getTask();
                        if(task == null)
                        {
                            break;
                        }
                        //perform it
                        task.run();
                    }
                    //hibernate til more tasks
                    hibernate();
                }
            }catch(Exception ex)
            {
                ex.printStackTrace();
            }
        }
    }
}
Now we have everything we need to make a multi-threaded noise generator. Here is how it would be implemented for BasicNoise:
public void fillMultiThreaded(NoiseArray noise, ThreadPool pool)
{
    pool.addGlobalTask(new ColumnTask(noise, 0, noise.getWidth() - 1));
    pool.startAndWait();
}
 
/**
 * Fills columns with basic noise until all have been filled.
 */
private class ColumnTask extends ArrayTask
{
    /**
     * NoiseArray to fill.
     */
    private final NoiseArray noise;
  
    private ColumnTask(NoiseArray array, int min, int max)
    {
        super(min, max);
        noise = array;
    }

    @Override
    public void run(int x)
    {
        //for all y's in column
        for(int y = 0; y <= noise.getHeight(); y++)
        {
            //set to noise value
            noise.setRelative(x, y, gen.noise_gen(seed, x, y));
        }
    }
}

Also, last but not least, we need the number of cpu cores available to our program to init the ThreadPool. This is done by calling "Runtime.getRuntime().availableProcessors()".

For specific implementations, look at the sources. There is a test class to compare speeds.

Demos: https://drive.google.com/folderview?id=0B_jXYEquMamINGVGQTM0cTJzdmM&usp=sharing
Source Code: https://github.com/f4113nb34st/Println/

No comments:

Post a Comment