Peterson’s Solution on Modern Multiprocessors
I wrote about the Peterson's Solution in one of my previous posts. I gave an example of how the Peterson’s n-Process Protocol can be implemented in C. It is a classical software based solution to the critical section problem. It isn't desirable to implement on modern systems since it wastes CPU time (busy waiting).
In addition, there is one -very important- reason why it isn't implemented today; there is no guarantee that Peterson's Solution will work on modern multiprocessor. When I had first heard it, I couldn't understand the exact reason why it will not work. Yesterday afternoon, I had time to search. I was thinking I can understand the whole picture, every details in a short time (how naive).
I found the first well described information in Solaris' Multithreaded Programming Guide. Briefly, it says, Peterson's Algorithm works when the multiprocessor has strongly ordered memory. However today's multiprocessors (not all) have store buffers. When a process/thread executes a store instruction, the data is put into the store buffer. The buffered data is sent to the cache sooner or later, but not necessarily right away.

Consequently, two threads running on separate processors both could enter the critical section. Each thread sets its own interested array slot (interested[process] = TRUE) and then loads from the others slot (interested[other] == TRUE). Both threads could read the old values (FALSE) and then assume that the other party is not interested. It means both enter the critical section.
#define TRUE 1
int turn;
int interested[2];
void enterRegion(int process)
{
int other = 1 - process;
interested[process] = TRUE;
turn = process;
while (interested[other] == TRUE && turn == process)
;
}
void leaveRegion(int process)
{
interested[process] = FALSE;
}
The first explanation is pretty clear. However when you dive into the details, it becomes a complex subject (memory models, instruction reordering etc.). In addition it is hardware depended, you should read different documents for Intel, SPARC or other architectures. Anyway, I don't like to write about advanced topics which I don't have accurate knowledge yet. I found and downloaded lots of documents. I think it will take at least few months to read and to understand them. I had estimated few hours at the beginning.
After explained the problem, you might think of the question, then how can we implement multithreading? It is really easy as an application developer, just use the synchronization objects provided by operating systems (mutex locks, condition variables, read-write locks, semaphores etc.). Kernel developers consider all hardware depended issues and implement the synchronization objects in proper way.
I had been wondering if I can reproduce the buggy case on modern processors. Therefore I have implemented a test code with Peterson's Algorithm and performed some tests on three different hardware. The test code is very simple, one thread adds 1 to a variable and the other subtracts 1. Results are interesting;
- First machine has got 2 x SPARC64-VII CPU as following
bash-3.00$ psrinfo -vp The physical processor has 8 virtual processors (0-7) SPARC64-VII (portid 1024 impl 0x7 ver 0x91 clock 2400 MHz) The physical processor has 8 virtual processors (8-15) SPARC64-VII (portid 1032 impl 0x7 ver 0x91 clock 2400 MHz)
Threads run about a minute and after 32.577.239 calculations (add/sub), I have seen 17 cases where both entered critical section.
- Second machine has got 2 x UltraSPARC-T2+ CPU as following
bash-3.00$ psrinfo -vp The physical processor has 48 virtual processors (0-23 32-55) UltraSPARC-T2+ (cpuid 0 clock 1162 MHz) The physical processor has 48 virtual processors (64-111) UltraSPARC-T2+ (cpuid 64 clock 1162 MHz)
Threads run about 5 hours (please consider that first machine performed 32.577.239 calculations in 1 minute), I haven't seen any case where both enter critical section (I am not sure if it means it never happens on UltraSPARC-T2+, I am going to read the CPU specs to understand its memory model).
- Last machine has got 1 x Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz CPU as following
vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz stepping : 10 cpu MHz : 1998.000 cache size : 2048 KB
Threads run about 5 minutes, I have seen 3 cases where both entered critical section.
Graphical Editor – Programming Challenges
In my previous post, I wrote about the website includes programming problems. I aimed to emphasize the importance of careful problem text reading.
After that post, I should be more careful. However, I spent last 3 hours to find out what was wrong with my solution. I had checked every line again and again. I even implemented same functionality in different ways. I had submitted 30 times and received the 'wrong answer' every time.
I found the problem only ten minutes ago (BTW, time is about 2.25 am). Here is the problem that took 3 hours...
The problem text says "Create a new M x N image with all pixels initially colored white (O)". Yes, color white is capital letter O, not 0 (Zero). My last 3 hours and 30 submits included 0 (zero) instead O.
Programming Challenges
I discovered a new website while I was reading 'The Algorithm Design Manual' from Steve Skiena (I have read only one chapter so far. However, I can say it is worth to read all).
On that website, You can find hundreds of problems that are like the ones used during programming contests. You can submit your solutions in C/C++, Java and Pascal.
This is really interesting opportunity to the developers who haven't got many challenging tasks at work and to who like to solve puzzles. I am some more lucky, my company develops real time billing systems on Unix/Linux. Therefore, I do much more interesting than reading/writing database. However, even in my company, the most challenging parts are already developed and they are put to the libraries. Therefore, I don't face challenging problems every day. I mostly don't do more than calling library functions in order or fixing cheesy bugs. Fortunately there are times that I am asked to develop tough projects.
I have solved only two of the problems so far. I received 'wrong answer' for one of them due to the '\n' character at the end of the file. The code printing one less '\n' character is accepted. The results are checked by a software and every byte of the output is important. If you receive 'wrong answer' as a result and you are sure about your solution, you should read the problem text carefully for the output format.
The Algorithm Design Manual
Tower of Hanoi
The tower of Hanoi is a mathematical game invented by French mathematician Edouard Lucas in 1883. We are given a tower of eight disks, initially stacked in decreasing size on one of three pegs.

The objective is to transfer entire tower A to the peg B, moving only one disk at a time and never moving a larger one onto a smaller one.
The algorithm to transfer n disks from A to B in general: We first transfer n - 1 smallest disks to peg C, then move the largest one to the peg B and finally transfer the n - 1 smallest back onto largest (peg B).
The number of necessary moves to transfer n disks can be found by
Tn = 2^n - 1
Here is the sample solution in C:
void hanoi(int n_disks, char source, char destination, char temp)
{
static n_move = 0;
if (n_disks == 1) {
printf("%d - Move disk 1 from %c to %c\n", ++n_move, source, destination);
} else {
hanoi(n_disks - 1, source, temp, destination);
printf("%d - Move disk %d from %c to %c\n", ++n_move, n_disks, source, destination);
hanoi(n_disks -1, temp, destination, source);
}
}
int main(int argc, char *argv)
{
int n_disk = 8;
char source = 'A';
char destination = 'B';
char temp = 'C';
hanoi(n_disk, source, destination, temp);
return 0;
}