OpenMP support


What is OpenMP?

OpenMP is an API accessed through language directives for multi-threaded programming, see also http://www.openmp.org. Currently, OpenMP syntax is only defined for C/C++ and Fortran. This page collects ideas for settling on a Pascal syntax for it.

Pascal syntax for OpenMP

Proposal 1

Foreword

At first, I must admit that there are some parts of the OpenMP specification I still don't understand. They did a terribly good job of throwing away all the common terms ever used in a multi-threading context and inventing their own.

Syntax vs. Compiler directives

OpenMP for C and C++ is implemented using compiler directives (pragmas), mainly for reasons of source code compatibility (or: standards compliance). A conforming program is intended to behave the same regardless of whether the compiler compiling it supports those special pragmas or not.

For Free Pascal I don't think this is the way to go, because first it turns comments into code, and second it makes the program far less readable. For C programs this doesn't seem to be an issue, if you get my meaning. But in my opinion, readability is a far more important issue than compatibility with older/different compilers. If all else fails, a preprocessor could be provided to strip out the parallel-specific parts, as has been suggested by Marco. Note that such a preprocessor would be needed for the directive approach too, because older FPC versions and Delphi don't skip unknown directives.
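
To make the trade-off concrete, here is a purely illustrative sketch of the two styles for a simple loop. Neither the {$OMP ...} directive spelling nor the parallel keyword exists in FPC today; both are only meant to show how the source would read:

  // Directive style, as C/C++ do it with #pragma omp ({$OMP ...} is a hypothetical spelling).
  // Note that older FPC/Delphi versions reject unknown directives rather than skipping them.
  {$OMP PARALLEL FOR}
  for i := 0 to High (b) do
     b[i] := a[i] * 2.0;

  // Keyword style, as proposed on this page:
  parallel for i := 0 to High (b) do
     b[i] := a[i] * 2.0;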

Well, enough talk; I'll start with the easier directives, which are luckily the more fundamental ones.

Ok, I got more input than I expected and less time than I wished. :) Anyway, against my own objection, the idea of enclosing the parallel code in (local) functions looks very appealing, so I've changed the example accordingly.

parallel

The parallel construct can only be used for a structured block. That means in Pascal it would have to be enclosed in some sort of begin/end pair anyway - so, as has been suggested, we could use a (in this particular example non-local) function instead. Though I don't know yet whether this clashes with other parts of the spec, as this is still evolving. Let's try:

(Original example A.4.1.c of the OpenMP V2.5 specification):

  procedure SubDomain (var x       : array of Float;
                           istart  : Integer;
                           ipoints : Integer);
  var
     i : Integer;
  begin
     for i := 0 to ipoints - 1 do
        x[istart + i] := 123.456;
  end {SubDomain};
  
  parallel procedure Sub (var x : array of Float);
  // Variables declared here have private context.
  // So each instance of the parallel function has its own set, as usual.
  var
     iam     : Integer;
     nt      : Integer;
     ipoints : Integer;
     istart  : Integer;
  // Any variable access outside of the function's scope accesses the variable in
  // a shared context.
  // This might prove problematic, especially because it causes special semantics
  // on the function's parameters, probably depending on the parameter mode or worse:
  // On the calling convention actually used (call-by-value vs. call-by-reference).
  begin // of (possibly) parallel section
     iam := OMP.Get_Thread_Num;  // OMP library calls.
     nt  := OMP.Get_Num_Threads;
        
     ipoints := Length (x) div nt; // size of partition
     istart  := iam * ipoints;     // starting array index
        
     if iam = Pred (nt) then
        ipoints := Length (x) - istart; // last thread may do more
        
     SubDomain (x, istart, ipoints);
  end {Sub};
  
  var
     arr : array[0 .. 9999] of Float;
  begin  // Main program
     Sub (arr);
  end.

I don't like the idea of declaring variables inside the actual statements; this looks very unpascalish. Maybe we can find a way around it. --FPK 10:22, 26 July 2006 (CEST)

I agree with Florian that this is not the way to go. Why not require all parallelizable code to be in local functions? After all, that's almost what you are doing: declaring a local function. That would be a simple extension of the current syntax. You have access to all local variables; all you'd need to do is add a parallel keyword to the local function declaration.


Ok, so what do you think about the changed example above? OpenMP really is about coarse-grained parallelism, so I see indeed no strong reason why parallel blocks shouldn't be enclosed in procedures. Parallel functions obviously do not make sense, as every thread could return its own return value, but the block calling the parallel function can only evaluate one. I would have liked the notion of a local block, though (I'm quite used to it), but as I seem to be the only one... --V.hoefler 21:03, 27 July 2006 (CEST)

parallel for

This is simply a parallel for-loop. There's nothing special about it. OMP 2.5 states that the for-loop iteration variable is private in this construct, which I consider rather redundant: I can hardly imagine correctly behaving code with a shared loop iteration variable. It also places some restrictions on the allowed loop statements (no change of the iteration variable inside the loop, simple iteration constructs, ...), but these restrictions already exist in the language, so there's no need to elaborate on them much further.

(Example A.1.1c):

  procedure a1 (      n : Integer;
                const a : array of Float;
                var   b : array of Float);
  var
     i : Integer;
  begin
     parallel for i := Succ (Low (a)) to High (a) do
        b[i] := (a[i] + a[i - 1]) / 2.0;
  end {a1};

That's it. Now probably someone sees the reason why I wouldn't use the parallel keyword as a function modifier the way inline or cdecl are used, but rather prepend it to the function header itself. I think it's a more consistent usage of a new keyword. -- V.hoefler 21:17, 27 July 2006 (CEST)
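
For comparison, the two placements under discussion would look like this (both are proposed syntax; neither is implemented):

  // Keyword prepended to the header (the style used in proposal 1):
  parallel procedure Sub (var x : array of Float);

  // Keyword as a trailing modifier, like inline or cdecl (the style used in proposal 2 below):
  procedure Sub (var x : array of Float); parallel;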

data sharing attributes

To me these seem like quite complex constructs, considering that most of the time you probably won't need them at all, because the default is fine and follows normal programming logic. So if anyone has an idea whether and why we need to support them explicitly, here's the place.

threadprivate

This attribute closely resembles what Free Pascal already knows as threadvar, so I even think we can reuse this keyword here. I see a semantic issue though:

The OMP2.5 specification states:

The values of the data in the threadprivate objects of threads other than the initial thread are guaranteed to persist between two consecutive parallel regions only if all the conditions hold:

and then follows a list of conditions, which basically state that the number of threads used in both regions must be the same.

So to write some simple pseudo code to demonstrate:

procedure Thread_Vars;
threadvar
   Count : Integer;
var
   i     : Integer;
begin
   Count := 0; // initial state

   parallel for i := 1 to SOME_VALUE do
      Count := Count + 1;

   // Point A

   parallel for i := 1 to SOME_OTHER_VALUE do
      Count := Count + 2;

   // Point B

end {Thread_Vars};

Now, each iteration of the loop is executed in parallel, and each thread gets its own copy of Count. At Point A after the loop, Count would equal 1, because if each loop iteration was executed by a single thread, the incrementing operation would have happened only once (per thread). Leaving aside the question of which copy is seen after the loop, it gets more interesting. What value is seen at Point B? Well, if I understood the specification correctly, the value would be 3 if and only if the actual values of the place-holding constants SOME_VALUE and SOME_OTHER_VALUE are equal. In any other case, the value of Count at Point B would be undefined.

Proposal 2: Using local functions

Instead of using new block types (like parallel), this proposal uses a nested procedure with a parallel modifier.

parallel

  procedure SubDomain (var x       : array of Float;
                           istart  : Integer;
                           ipoints : Integer); 
  var
     i : Integer;
  begin
     for i := 0 to ipoints - 1 do
        x[istart + i] := 123.456;
  end {SubDomain};
  
  procedure Sub (var x : array of Float);
 
    procedure ParallelBlock; parallel;
    var     
        iam     : Integer;
        nt      : Integer;
        ipoints : Integer;
        istart  : Integer;
    begin
         iam := OMP.Get_Thread_Num;  // OMP library calls.
         nt  := OMP.Get_Num_Threads;
         
         ipoints := Length (x) div nt; // size of partition
         istart  := iam * ipoints;     // starting array index
         
         if iam = Pred (nt) then
            ipoints := Length (x) - istart; // last thread may do more
        
         SubDomain (x, istart, ipoints);
    end;

  begin
       ParallelBlock;
  end {Sub};
  
  var
     arr : array[0 .. 9999] of Float;
  begin  // Main program
     Sub (arr);
  end.

Proposal 3: Thread events

The parallel paradigm imposes a huge change in how a program is written, compiled and "read" by a fellow programmer. With it we would stray quite far from the safe ground of the Delphi language.

The compiler/runtime needs to dynamically create an unbounded number of threads. With parallel for this is under the control of the user; with a naked parallel block the compiler would need to decide which code sequence is to be run in which thread. All this seems very tricky to me.

OTOH, Delphi language programmers are used to working with "run to completion" events: any code is placed in an "event handler" (a "callback") and is "fired" by events that happen as a consequence of hardware or OS activity, or is directly fired (called) by other Delphi-language event handlers.

But due to the make-up of the RTL, such "events" are only applicable to the "main thread", while all "worker threads" (created by TThread objects) just run in a "one-shot" or cyclic way. Event driven programming is impossible here.

To make working with threads more usable for the "crowd", the same "event driven" paradigm should (optionally) hold for worker threads, too. To enable this, the RTL needs to implement an "event queue" for all threads in a similar way as is done for the main thread: the user code in a thread is placed in multiple event handlers that can be assigned to messages. Incoming messages are queued in a FIFO, and the event handlers are "fired" by the RTL one after the other as each message is taken from the top of the FIFO.

implementation of the "thread event queue"

Implementing this should be a lot easier and more straightforward than parallel, as the threads to be used are managed explicitly under user control and need not be created and destroyed automatically by the RTL.

I think a new class called (e.g.) TEventThread should be provided by the Free Pascal RTL. This might (or might not) be a descendant of TThread. Each object of this type has its own "event queue". This "event queue" is not implemented in the same way as the "message queue" of the main thread (which only exists for GUI projects, e.g. done in Lazarus, and uses the mechanisms of the underlying GUI framework, e.g. Windows or Qt). It is implemented just by Delphi code creating a FIFO for the "event specification" and some SyncObject to allow the thread to wait for the next event.
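
A minimal sketch of what such a class could look like, assuming a dynamic-array FIFO guarded by a TCriticalSection and a TEvent (both from the syncobjs unit) to let the thread sleep until the next event. All names (TEventThread, TThreadEvent, PostEvent) are only illustrative, and cleanup in a destructor is omitted for brevity:

  uses
     Classes, SyncObjs;

  type
     // An event handler to be executed in the context of the event thread.
     TThreadEvent = procedure (Data: Pointer) of object;

     TQueuedEvent = record
        Handler : TThreadEvent;
        Data    : Pointer;
     end;
     TQueuedEventArray = array of TQueuedEvent;

     TEventThread = class (TThread)
     private
        FLock   : TCriticalSection;   // guards the FIFO
        FWakeUp : TEvent;             // signalled whenever something is posted
        FQueue  : TQueuedEventArray;  // the event FIFO itself
     protected
        procedure Execute; override;
     public
        constructor Create;
        // May be called from any thread: append the event and wake the worker.
        procedure PostEvent (Handler: TThreadEvent; Data: Pointer);
     end;

  constructor TEventThread.Create;
  begin
     FLock   := TCriticalSection.Create;
     FWakeUp := TEvent.Create (nil, False, False, '');   // auto-reset event
     inherited Create (False);
  end;

  procedure TEventThread.PostEvent (Handler: TThreadEvent; Data: Pointer);
  begin
     FLock.Acquire;
     try
        SetLength (FQueue, Length (FQueue) + 1);
        FQueue[High (FQueue)].Handler := Handler;
        FQueue[High (FQueue)].Data    := Data;
     finally
        FLock.Release;
     end;
     FWakeUp.SetEvent;
  end;

  procedure TEventThread.Execute;
  var
     Pending : TQueuedEventArray;
     i       : Integer;
  begin
     while not Terminated do
     begin
        // Wait with a timeout so that Terminated is re-checked regularly.
        if FWakeUp.WaitFor (100) <> wrSignaled then
           Continue;
        // Take the queued events out of the FIFO under the lock ...
        FLock.Acquire;
        try
           Pending := FQueue;
           FQueue  := nil;
        finally
           FLock.Release;
        end;
        // ... and fire the handlers one after the other, outside the lock.
        for i := 0 to High (Pending) do
           Pending[i].Handler (Pending[i].Data);
     end;
  end;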

This way, a non-GUI project can be written using the "normal" event driven Delphi language paradigm. IMHO this is a major benefit!

Of course a "procedure ... message" method can't be defined for such a thread; that is only possible in the main (GUI) thread.

When doing a GUI project, a mechanism needs to be defined to use the main thread as a target for "thread events" as well. Thus a class similar to TEventThread (e.g. called TEventMainThread) should be provided that creates an event queue, uses the appropriate GUI-based send-message mechanism to notify the main thread, and is hooked into the TApplication message scheduling loop.

When doing a non-GUI project, TEventMainThread could provide an event scheduling loop based on polling using CheckSynchronize() (or similar).
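
For the non-GUI case, a rough sketch of such a polling loop, reusing the existing CheckSynchronize machinery from the Classes unit instead of a hand-made queue (the Done flag is only a placeholder for whatever termination condition the program uses):

  uses
     {$IFDEF UNIX} cthreads, {$ENDIF}
     Classes;

  var
     Done : Boolean = False;   // would be set when the program should exit

  begin
     // Worker threads can hand work to the main thread via TThread.Synchronize
     // (or TThread.Queue in newer FPC versions); CheckSynchronize executes those
     // queued calls in the context of the main thread.
     while not Done do
        CheckSynchronize (100);   // wait up to 100 ms for queued calls, then loop
  end.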

Thread events

To make this more easily usable, this paradigm can be enhanced by introducing "Thread Events" (possibly by modifying the compiler).

Instead of adding new language constructs, we could extend the "event" paradigm to optionally use a different thread for executing the code fired by the event. That should be more easily understandable and usable for Delphi programmers, as it does not change much in their picture of the project.

For defining "events", the Delphi language does not use a dedicated keyword; you just define a property of a procedure or function type. Here I could imagine an "of Thread" directive used similarly to the of Object clause:

defining a Thread Event

  type
     TThreadEventHandler = procedure (x: Integer) of Thread;
     test = class (TObject)
        property ev : TThreadEventHandler;
     end;

Similar to of Object, a variable of an of Thread type stores the entry address and a Self pointer. The Self pointer here points to an instance of a TThread descendant (or something similar). If at runtime the content is not of such a type, of Thread works like of Object.

Thus, in the end, we don't even need a new "of Thread" keyword, but can reuse the of Object keyword, as at runtime the meaning can easily be detected.
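
A sketch of how such runtime detection could work, assuming the TEventThread class sketched above and a hypothetical PostCall helper on it; TMethod is the existing RTL record that exposes the Code/Data pair stored in an of Object value:

  type
     TThreadEventHandler = procedure (x: Integer) of object;

  procedure FireEvent (const Handler: TThreadEventHandler; x: Integer);
  var
     Target : TObject;
  begin
     // The Data field of TMethod is the Self pointer stored in the event.
     Target := TObject (TMethod (Handler).Data);
     if Target is TEventThread then
        // Thread event: queue the call in that thread's event FIFO.
        // (PostCall is a hypothetical helper; it would store the code pointer
        // and the argument x in the FIFO sketched earlier.)
        TEventThread (Target).PostCall (TMethod (Handler).Code, x)
     else
        Handler (x);   // ordinary "of object" event: call it directly
  end;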

All this can be done without modifying the language or the compiler just by enhancing the RTL.

using a Thread Event

The most straightforward way to use a thread event is to set the variable (usually a property of some other object) to a method of an appropriate TThread (or similar) descendant. With that, the Self pointer is stored in the same way as with a normal event, due to the of Object mechanism.
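
A small, purely illustrative usage sketch under these assumptions (TEventThread as sketched above, the test class from the example above; whether the @ operator is needed depends on the compiler mode):

  type
     TWorker = class (TEventThread)
        procedure HandleData (x: Integer);   // will be executed in the worker thread
     end;

  var
     Worker : TWorker;
     Obj    : test;
  begin
     Worker := TWorker.Create;
     Obj    := test.Create;
     Obj.ev := @Worker.HandleData;   // Self = Worker, so the event fires in Worker's context
  end;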

Maybe a mechanism (a language construct or just a library function) to define a thread event by explicitly stating both the function to call and the thread context to use would be desirable. But then the function is not a method of the TThread to be used, and thus we would need to store both the Self pointer and the TThread pointer, making things a lot more complicated. So IMHO, to begin with, only methods of a TThread descendant can be thread events.

implementation of Thread Events

Of course the implementation of this involves a lot of hassle, e.g.:

- there needs to be an event queue for each of the appropriate 
  thread types (TThread descendants).
  This is independent of, but necessary for, "thread events"
- additionally, the arguments for the event handler need to be 
  stored in the queue and used when the callback function in 
  the thread is actually started (a possible queue entry is 
  sketched after this list)
- result values (var parameters, function result) need to be 
  returned in a decent way (either when the thread is ready or 
  when it stores a value to the var parameter)
- the thread that fires a thread event needs to be informed 
  about the state of the threads running in parallel
- the thread that fires a thread event needs to be able to wait 
  for the fired thread(s) to finish
- ...
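
A possible shape for such a queue entry, extending the TQueuedEvent record sketched earlier so it carries the argument along with the handler (again, all names are only illustrative):

  uses
     SyncObjs;

  type
     // One entry of the per-thread event FIFO, now carrying the call's argument.
     TQueuedThreadEvent = record
        Handler  : procedure (x: Integer) of object;  // the event handler to fire
        ArgX     : Integer;                           // copy of the argument, taken when queued
        Finished : TEvent;                            // optional: signalled when the handler returns,
                                                      // so the firing thread can wait for completion
     end;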

--Mschnell 12:02, 5 December 2007 (CET)

Proposal 4

<not yet done>