http://www.alper.net/ Eyup Alper Yoney

17Mar/103

Peterson’s Solution on Modern Multiprocessors

I wrote about the Peterson's Solution in one of my previous posts. I gave an example of how the Peterson’s n-Process Protocol can be implemented in C. It is a classical software based solution to the critical section problem. It isn't desirable to implement on modern systems since it wastes CPU time (busy waiting).

In addition, there is one -very important- reason why it isn't implemented today; there is no guarantee that Peterson's Solution will work on modern multiprocessor. When I had first heard it, I couldn't understand the exact reason why it will not work.  Yesterday afternoon, I had time to search. I was thinking I can understand the whole picture, every details in a short time (how naive).

I found the first well described information in Solaris' Multithreaded Programming Guide. Briefly, it says, Peterson's Algorithm works when the multiprocessor has strongly ordered memory. However today's multiprocessors (not all) have store buffers. When a process/thread executes a store instruction, the data is put into the store buffer. The buffered data is  sent to the cache sooner or later, but not necessarily right away.

Store Buffer

Consequently, two threads running on separate processors both could enter the critical section.  Each thread sets its own interested array slot (interested[process] = TRUE) and then loads from the others slot (interested[other] == TRUE). Both threads could read the old values (FALSE) and then assume that the other party is not interested. It means both enter the critical section.

#define FALSE  0
#define TRUE   1

int turn;
int interested[2];

void enterRegion(int process)
{
  int other = 1 - process;
  interested[process] = TRUE;
  turn = process;

  while (interested[other] == TRUE && turn == process)
    ;
}

void leaveRegion(int process)
{
  interested[process] = FALSE;
}
 

The first explanation is pretty clear. However when you dive into the details, it becomes a complex subject (memory models,  instruction reordering etc.).  In addition it is hardware depended, you should read different documents for Intel, SPARC or other architectures.  Anyway, I don't like to write about advanced topics which I don't have accurate knowledge yet. I found and downloaded lots of documents. I think it will take at least few months to read and to understand them. I had estimated few hours at the beginning. :)

After explained the problem, you might think of the question, then how can we implement multithreading? It is really easy as an application developer, just use the synchronization objects provided by operating systems (mutex locks, condition variables, read-write locks, semaphores etc.). Kernel developers consider all hardware depended issues and implement the synchronization objects in proper way.

I had been wondering if I can reproduce the buggy case on modern processors. Therefore I have implemented a test code with Peterson's Algorithm and performed some tests on three different hardware. The test code is very simple, one thread adds 1 to a variable and the other subtracts 1. Results are interesting;

- First machine has got 2 x SPARC64-VII CPU as following

bash-3.00$ psrinfo -vp
The physical processor has 8 virtual processors (0-7)
SPARC64-VII (portid 1024 impl 0x7 ver 0x91 clock 2400 MHz)
The physical processor has 8 virtual processors (8-15)
SPARC64-VII (portid 1032 impl 0x7 ver 0x91 clock 2400 MHz)

Threads run about a minute and after 32.577.239 calculations (add/sub), I have seen 17 cases where both entered critical section.

- Second machine has got 2 x UltraSPARC-T2+ CPU as following

bash-3.00$ psrinfo -vp
The physical processor has 48 virtual processors (0-23 32-55)
  UltraSPARC-T2+ (cpuid 0 clock 1162 MHz)
The physical processor has 48 virtual processors (64-111)
  UltraSPARC-T2+ (cpuid 64 clock 1162 MHz)

Threads run about 5 hours (please consider that first machine performed 32.577.239 calculations in 1 minute), I haven't seen any case where both enter critical section (I am not sure if it means it never happens on UltraSPARC-T2+, I am going to read the CPU specs to understand its memory model).

- Last machine has got 1 x Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz CPU as following

vendor_id         : GenuineIntel
cpu family        : 6
model              : 23
model name      : Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz
stepping          : 10
cpu MHz          : 1998.000
cache size       : 2048 KB

Threads run about 5 minutes, I have seen 3 cases where both entered critical section.

3Mar/100

The Mythical Man-Month

I am currently reading The Mythical Man-Month from Fred Brooks. This book consists of essays on software engineer. These essays draw from his experience as project manager for the IBM System/360 computer family and then OS/360, its massive software system.

I suggest read this book if you take part in any software project. I believe you will find parts similar to your own projects and learn from Brooks' experience.

For example, when a task cannot be partitioned because of sequential constraints, the application of more effort has no effect on the schedule. However, project managers sometimes try to shorten the schedule by assigning men. The following phase from the book perfectly explains the situation. :)



The Mythical Man-Month


The bearing of a child takes nine months, no matter how many women are assigned.

12Jan/100

Endianness

I found a paper about endianness while I was looking on Intel's developer documentations. I have been wondering what kind of advantages can be gained to store the data in little endian or big endian format. There are some explained advantages and disadvantages in that document. However, I haven't been enough convinced with these explanations. I think there are many others related to CPU and hardware design.

Anyway, that document well describes the endianness and the portability issues. It should be read if you have got some questions in mind.

I implemented a function which checks the endiannes of the CPU on run time.  It is possible to decide the architecture by comparing one byte. However, I like to perform strict check on multi-byte data.

Here is the link of the Intel's document.

#include <stdio.h>

union {
  char array[sizeof(int)];
  int intVal;
} testUnion;

int main()
{
  int i, bigEndVal = 0, littleEndVal = 0;
  char c = 'a';

  for(i = 0; i < sizeof(int); i++) {
    testUnion.array[i] = c;
    bigEndVal = (bigEndVal << 8) | c;
    littleEndVal = littleEndVal | (c << 8 * i);
    c++;
  }

  if (testUnion.intVal == bigEndVal)
    printf("Big Endiann");
  else if (testUnion.intVal == littleEndVal)
    printf("Little Endiann");
  else
    printf("Unknownn");

  return 0;
}
 


2Jan/100

C++, Functions That Return Two Values

I am going to explain the class pair that is provided to treat two values as one. The class pair is used several times in STL. The container classes map and multimap use the class pair to handle the key/value pairs. The other usage of the pair class is functions that return two values. I choose the title as 'Functions That Return Two Value' instead of 'The class pair' because it sounds much more attractive since developers used to hear functions can return one value.

The pair structure is defined in <utility> as follows;

namespace std {
  template<class T1, class T2>
  struct pair {
    typedef T1 first_type;
    typedef T2 second_type;

    T1 fist;
    T2 second;

    pair() : first(T1()), second(T2()) {}
    pair(const T1& val1, const T2& val2) : first(val1), second(val2) {}
    template<class Other1, class Other2>
    pair(const pair<Other1, Other2>& p) : first(p.first), second(p.second) {}
  };
}
 

The pair is defined as struct. Therefore, accessing the members (first and second) is possible. The definition is short and straightforward. However, the template version of the copy constructor may seen confusing. The template copy constructor is called whenever implicit conversion is needed. There are many good books about C++, STL and Templates, two of them are as follows;

1. C++ Primer, by Stanley B. Lippman, Josée LaJoie, Barbara E. Moo
2. The C++ Standard Library: A Tutorial and Reference, by Nicolai M. Josuttis

The make_pair() template function makes the life easier by creating the pair without explicit types.

namespace std {
  template<class T1, class T2>
  make_pair(const T1& val1, const T2& val2)
  {
    return pair<T1, T2>(val1, val2);
  }
}
 

In conclusion, here is an example of function that returns two values, including use of the pair class and make_pair function.

#include <cstdlib>
#include <iostream>
#include <utility>

using namespace std;

const double PI = 3.14;

pair<double, double> areaAndCircumferenceOfCircle(double r)
{
  return make_pair((PI * r * r), (2 * PI * r));
}

int main(int argc, char *argv[])
{
  pair<double, double> circle;
 
  circle = areaAndCircumferenceOfCircle(5);
  cout << "Area: " << circle.first << endl;
  cout << "Circumference: " << circle.second << endl;
 
  return 0;
}
 

31Dec/091

Graphical Editor – Programming Challenges

In my previous post, I wrote about the website includes programming problems. I aimed to emphasize the  importance of careful problem text reading.

After that post, I should be more careful. However, I spent last 3 hours to find out what was wrong with my solution. I had checked every line again and again. I even implemented same functionality in different ways. I had submitted 30 times and received the 'wrong answer' every time.

I found the problem only ten minutes ago (BTW, time is about 2.25 am). Here is the problem that took 3 hours...

The problem text says "Create a new M x N image with all pixels initially colored white (O)". Yes, color white is capital letter O, not 0 (Zero). My last 3 hours and 30 submits included 0 (zero) instead O. :(

Tagged as: 1 Comment