Archive for the ‘ GameDev::Programming ’ Category

Multi-threading my way

In D.S.S. I have to handle tons and tons of ships and projectiles in many star-systems. Consequently, if I want a decent framerate I’ll have to multi-process all this to take advantage of new multi-core CPUs.
So without further ado, here is how I did it. It’s actually a pretty standard way to do it. It’s just the simplest way I could think to implement it. I’ll be using common primitive like mutex, event and thread so I won’t go into details for these classes. I’m using FastDelegate library as well cause I just like it a lot.
The model is basically: queue a bunch of tasks (or jobs) and process them with several threads. The magic happens in 3 classes: JobQueue, MPWorker and MultiProcess.

JobQueue:
This one synchronizes job addition and retrieving. It really just acts as a synchronization primitive.

MPWorker:
this guy process job from the JobQueue. There are usually several of them spawned at initialization time. They literally fight each other to get and execute jobs. They are pretty good workers indeed.

MultiProcess just ties everything together.

So here goes a wall of code:
JobQueue:

class JobQueue{
public:
    JobQueue()
    : muiNbWorkingJob(0)
    , muiMaxSimultaneous(0){}
    void addJob(const fastdelegate::FastDelegate0<>& a){
        Mutex::Lock lock(mJobMutex);
        mAllDone.reset();
        mJobs.push_back(a);
    }
    fastdelegate::FastDelegate0<> retrieveNextJob(){
        Mutex::Lock lock(mJobMutex);
        if(mJobs.size()>0){
            fastdelegate::FastDelegate0<> job = mJobs.front();
            mJobs.pop_front();
            _InterlockedIncrement(&muiNbWorkingJob);
            updateMaxSimultanious();
            return job;
        } else{
            return nullptr;
        }
    }
    // called when a job is done
    void markJobDone(fastdelegate::FastDelegate0<>& a){
        Mutex::Lock lock(mJobMutex);
        // _InterlockedDecrement is used to force updating local variable 
        // with up-to-date variable changed by other thread
        // a volatile pointer would probably work as well.
        uint32 uiNbWorking = _InterlockedDecrement(&muiNbWorkingJob); 
        if(uiNbWorking == 0 && mJobs.size()==0 ){
            mAllDone.raise();
        }
    }
    // waits until all job are done
    void waitForAllDone(){
        mAllDone.wait();
    }
    uint32 getMaxSimultanious()const {return muiMaxSimultaneous;}
    void resetMaxSimultanious(){muiMaxSimultaneous=0;}
private:
    // prevent copying this class
    JobQueue(const JobQueue&);
    JobQueue& operator=(const JobQueue&);
    void updateMaxSimultanious(){
        if(muiNbWorkingJob>muiMaxSimultaneous) muiMaxSimultaneous=muiNbWorkingJob;
    }
    std::list< fastdelegate::FastDelegate0<> > mJobs;
    Mutex mJobMutex;
    long muiNbWorkingJob; // number of job currently executing at the moment
    Event mAllDone; // Event raised when all jobs have been processed
    uint32 muiMaxSimultaneous; // max number of simultaneous worker working at the same time. use for statistics
};

MPWorker

class MPWorker{
public:
    MPWorker(uint32 auiId, JobQueue& aJq)
        :muiId(auiId)
        ,mJobQueue(aJq) {
        mpThread = new Thread(fastdelegate::MakeDelegate(this,& MPWorker::workerFunc));
        mpThread->start();
    }
    ~MPWorker(){
        delete mpThread;
    }
    void workerFunc(){
        while(1){
            fastdelegate::FastDelegate0<> job = mJobQueue.retrieveNextJob();
            if(job){
                job();
                mJobQueue.markJobDone(job);
            }
        }
    }
    uint32 muiId; // don't really need this, it is just for debug purpose
    Thread * mpThread;
    JobQueue& mJobQueue; // the queue from which to get jobs
private:
    MPWorker(const MPWorker& a);
    MPWorker& operator=(const MPWorker& a);
};

MultiProcess

class MultiProcess{
public:
    MultiProcess()
        :mbWorkInSingleProcess(true){}
    // initialize system with number of desired workers
    void init(uint32 auiNbWroker){
        if(auiNbWroker==0){
            mbWorkInSingleProcess=true;
        } else {
            mbWorkInSingleProcess=false;
            for(uint32 i = 0; i < auiNbWroker; ++i){
                mWorkers.push_back( new MPWorker(i, mJobQueue));
            }
        }
    }
    ~MultiProcess(){
        for(uint32 i = 0; i < mWorkers.size(); ++i){
            delete mWorkers[i];
        }
    }
    void addJob(const fastdelegate::FastDelegate0<>& aJob){
        if(mbWorkInSingleProcess){
            aJob();
        } else {
            mJobQueue.addJob(aJob);
        }
    }

    // called when the user want all jobs to be processed
    void flush(){
        fastdelegate::FastDelegate0<> job = mJobQueue.retrieveNextJob();
        while(job){
            job();
            mJobQueue.markJobDone(job);
            job = mJobQueue.retrieveNextJob();
        }
        if(!mbWorkInSingleProcess)
            mJobQueue.waitForAllDone();
    }
    // some stats
    uint32 getMaxSimultaniousWorker()const {return mJobQueue.getMaxSimultanious();}
    void resetMaxSimultaniousWorker(){mJobQueue.resetMaxSimultanious();}
private:
    JobQueue mJobQueue;
    std::vector< MPWorker* > mWorkers;
    bool mbWorkInSingleProcess; // if work in single process, job are executed when added. otherwise they are put in the job queue.
    //prevent copying this class
    MultiProcess(const MultiProcess&a);
    MultiProcess& operator=(const MultiProcess&a);
};

Now to use it, do something like:

class MPUser{
public:
    void expensiveJob(){
        float fValue=0;
        for(uint32 i = 0; i < 50000; ++i){
            fValue+=rand()/(float) RAND_MAX;
        }
        printf("", fValue);
    }
    void use(){
        MultiProcess mp;
        mp.init(3);
        for(uint32 i = 0; i < 100; ++i){
            mp.addJob(fastdelegate::MakeDelegate(this, &MPUser::expensiveJob));
        }
        mp.flush();
    }
};

If you don’t know how much worker you should spawn, I suggest using the CPU number of processor – 1

    SYSTEM_INFO sysinfo;
    GetSystemInfo( &sysinfo );
    uint32 nbWorker = sysinfo.dwNumberOfProcessors-1;

Be aware that if your jobs are too small, the overhead of adding and getting job will be quite high. For instance, in D.S.S., I update whole star-systems in a job. Updating each ships with its own jobs would be terrible. Also, as ships usually interact only with other ships in the same star-system, I don’t have a lot of synchronization to handle.

Here is the full code with VS 2010 solution.

Todo

I’m very forgetful so I tend to write every little things down for later reference. Over the years I learned to coop with that trait of mine in many ways. I even bought an Ipod touch so i can take notes and pictures of what i should remember. So it’s not surprising that my todo list got quite big and exhaustive over the ~4 months I’ve been working on my new project. But the problem is that I add a new item in the list and just never go back to do it. I just forget about my todo list… I’m that forgetful. So as an incentive to do my todo list items, I did this:

#define JOIN_STR2__(x) #x
#define JOIN_STR1__(x) JOIN_STR2__(x)
#define LOC_TODO __FILE__ "("JOIN_STR1__(__LINE__)") : Warning _TODO: "
#define TODO(x) __pragma(message(LOC_TODO x))

So when I think about something i should remember and actually do later I just write it down at the right place in the code like this:

TODO("Make this real");

So when I compile I get a warning message that look like:
Warning 110 warning _TODO: Make this real c:\p4\test\test.cpp 96

I like getting a warning message, It reminds me that I should fix it while I’m working in the same file.

Edge iteration

If you’ve done some computational geometry on polygons you probably have encountered this little problem: iteration on polygon edges.
Common solution is using the modulo operator:

void processPolygon(unsigned int n){
    for(unsigned int i = 0; i < n; ++i){
        precessEdge(i, (i+1)%n);
    }
}

Which yields on VS C++ 2010 with optimization on:

01061027  lea         esi,[ecx+1]  
0106102A  xor         edx,edx  
0106102C  mov         eax,esi  
0106102E  div         eax,edi  
01061030  mov         eax,edx  
01061032  call        precessEdge (1061000h)  
01061037  mov         ecx,esi  
01061039  cmp         ecx,edi  
0106103B  jb          processPolygon+7 (1061027h)  
0106103D  pop         esi  

The div is the result of the modulo, we don’t want it.
There is a better and faster way to do this:

void processPolygon2(unsigned int n){
    for(unsigned int i = n-1,j=0 ; j < n; i=j, ++j){
        precessEdge(i, j);
    }
}

this one yields:

01061050  mov         eax,esi  
01061052  call        precessEdge (1061000h)  
01061057  mov         ecx,esi  
01061059  inc         esi  
0106105A  cmp         esi,edi  
0106105C  jb          processPolygon2+10h (1061050h)  
0106105E  pop         esi  

No div, i.e. faster.

This is known as the Barrett Enumeration

unsigned -1

Are you the kind of programmer that uses unsigned variable index and genuinely needs to set it to FFs or -1 for special purpose?
Stuff like:

class Foo{
	/*snip snip*/
	unsigned int mIndex;
	void bar(){
		mIndex = unsigned int(-1);
	}
	void monkey(){
		if(mIndex == unsigned int(-1)){
			/* snip snip */
		}
	}
};

If you are like me you probably like to make thing generic without warning while involving as less code as possible.
So here is a thing for you.

class CMax{
public:
	template< class T >
	operator T()const{return (std::numeric_limits<T>::max)();}
};

template< class T >
T MaxOf(const T& a){ return CMax(); }

template< class T >
bool IsMax(const T& a){ return a == MaxOf(a); }

Use them like so:

unsigned int ui = CMax();
unsigned short us = CMax();
if( IsMax(ui) ){
	/* snip snip */
}
if( IsMax(us) ){
	/* snip snip */
}

Better see?


class Foo{
	/*snip snip*/
	unsigned int mIndex;
	void bar(){
		mIndex = CMax();
	}
	void monkey(){
		if( IsMax(mIndex)){
			/* snip snip */
		}
	}
};

Changing the type of mIndex will not affect the rest of the code. One less thing to think about.