WebAssembly/Threads


Thread support

This page collects information on the features needed for thread support in WebAssembly (in the browser).

Thread support consists of 4 parts:

  • Atomic instructions.
  • Shared memory and passive segments.
  • Thread Local Storage (threadvars).
  • Actually starting a thread.


Atomic instructions

See the proposed specs.

When the Free Pascal RTL is compiled with -CTwasmthreads, the following RTL functions use the new atomic instructions and should therefore be thread-safe in a multithreaded environment:

InterlockedDecrement
InterlockedIncrement
InterlockedExchange
InterlockedCompareExchange
InterlockedExchangeAdd

Note that these require the target to be properly aligned (4 bytes); otherwise they trap (i.e. terminate the program with a stack trace).
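The alignment requirement can be checked before an interlocked call, which is useful while debugging unexpected traps. The following is a minimal sketch: IsAligned4 and SafeIncrement are hypothetical helper names, not part of the RTL. Ordinary LongInt variables are normally aligned already; fields of packed records are the usual way to end up with an unaligned target.

```pascal
{$mode objfpc}

{ Hypothetical helper: True when Addr is a multiple of 4 bytes. }
function IsAligned4(Addr: Pointer): Boolean;
begin
  Result := (PtrUInt(Addr) and 3) = 0;
end;

{ Hypothetical helper: increments Target atomically, but only if it is
  4-byte aligned. Returns False (and leaves Target untouched) otherwise,
  instead of letting the atomic instruction trap. }
function SafeIncrement(var Target: LongInt): Boolean;
begin
  Result := IsAligned4(@Target);
  if Result then
    InterlockedIncrement(Target);
end;
```

In production code the check is usually unnecessary; declaring shared counters as plain (non-packed) variables or fields is enough.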

In addition, the WebAssembly unit provides many more atomic functions:

const
  { Special values for the TimeoutNanoseconds parameter of AtomicWait }
  awtInfiniteTimeout = -1;
  { AtomicWait result values }
  awrOk = 0;       { woken by another agent in the cluster }
  awrNotEqual = 1; { the loaded value did not match the expected value }
  awrTimedOut = 2; { not woken before timeout expired }

procedure AtomicFence; inline;

function AtomicLoad(constref Mem: Int8): Int8; inline;
function AtomicLoad(constref Mem: UInt8): UInt8; inline;
function AtomicLoad(constref Mem: Int16): Int16; inline;
function AtomicLoad(constref Mem: UInt16): UInt16; inline;
function AtomicLoad(constref Mem: Int32): Int32; inline;
function AtomicLoad(constref Mem: UInt32): UInt32; inline;
function AtomicLoad(constref Mem: Int64): Int64; inline;
function AtomicLoad(constref Mem: UInt64): UInt64; inline;

procedure AtomicStore(out Mem: Int8; Data: Int8); inline;
procedure AtomicStore(out Mem: UInt8; Data: UInt8); inline;
procedure AtomicStore(out Mem: Int16; Data: Int16); inline;
procedure AtomicStore(out Mem: UInt16; Data: UInt16); inline;
procedure AtomicStore(out Mem: Int32; Data: Int32); inline;
procedure AtomicStore(out Mem: UInt32; Data: UInt32); inline;
procedure AtomicStore(out Mem: Int64; Data: Int64); inline;
procedure AtomicStore(out Mem: UInt64; Data: UInt64); inline;

function AtomicAdd(var Mem: Int8; Data: Int8): Int8; inline;
function AtomicAdd(var Mem: UInt8; Data: UInt8): UInt8; inline;
function AtomicAdd(var Mem: Int16; Data: Int16): Int16; inline;
function AtomicAdd(var Mem: UInt16; Data: UInt16): UInt16; inline;
function AtomicAdd(var Mem: Int32; Data: Int32): Int32; inline;
function AtomicAdd(var Mem: UInt32; Data: UInt32): UInt32; inline;
function AtomicAdd(var Mem: Int64; Data: Int64): Int64; inline;
function AtomicAdd(var Mem: UInt64; Data: UInt64): UInt64; inline;

function AtomicSub(var Mem: Int8; Data: Int8): Int8; inline;
function AtomicSub(var Mem: UInt8; Data: UInt8): UInt8; inline;
function AtomicSub(var Mem: Int16; Data: Int16): Int16; inline;
function AtomicSub(var Mem: UInt16; Data: UInt16): UInt16; inline;
function AtomicSub(var Mem: Int32; Data: Int32): Int32; inline;
function AtomicSub(var Mem: UInt32; Data: UInt32): UInt32; inline;
function AtomicSub(var Mem: Int64; Data: Int64): Int64; inline;
function AtomicSub(var Mem: UInt64; Data: UInt64): UInt64; inline;

function AtomicAnd(var Mem: Int8; Data: Int8): Int8; inline;
function AtomicAnd(var Mem: UInt8; Data: UInt8): UInt8; inline;
function AtomicAnd(var Mem: Int16; Data: Int16): Int16; inline;
function AtomicAnd(var Mem: UInt16; Data: UInt16): UInt16; inline;
function AtomicAnd(var Mem: Int32; Data: Int32): Int32; inline;
function AtomicAnd(var Mem: UInt32; Data: UInt32): UInt32; inline;
function AtomicAnd(var Mem: Int64; Data: Int64): Int64; inline;
function AtomicAnd(var Mem: UInt64; Data: UInt64): UInt64; inline;

function AtomicOr(var Mem: Int8; Data: Int8): Int8; inline;
function AtomicOr(var Mem: UInt8; Data: UInt8): UInt8; inline;
function AtomicOr(var Mem: Int16; Data: Int16): Int16; inline;
function AtomicOr(var Mem: UInt16; Data: UInt16): UInt16; inline;
function AtomicOr(var Mem: Int32; Data: Int32): Int32; inline;
function AtomicOr(var Mem: UInt32; Data: UInt32): UInt32; inline;
function AtomicOr(var Mem: Int64; Data: Int64): Int64; inline;
function AtomicOr(var Mem: UInt64; Data: UInt64): UInt64; inline;

function AtomicXor(var Mem: Int8; Data: Int8): Int8; inline;
function AtomicXor(var Mem: UInt8; Data: UInt8): UInt8; inline;
function AtomicXor(var Mem: Int16; Data: Int16): Int16; inline;
function AtomicXor(var Mem: UInt16; Data: UInt16): UInt16; inline;
function AtomicXor(var Mem: Int32; Data: Int32): Int32; inline;
function AtomicXor(var Mem: UInt32; Data: UInt32): UInt32; inline;
function AtomicXor(var Mem: Int64; Data: Int64): Int64; inline;
function AtomicXor(var Mem: UInt64; Data: UInt64): UInt64; inline;

function AtomicExchange(var Mem: Int8; Data: Int8): Int8; inline;
function AtomicExchange(var Mem: UInt8; Data: UInt8): UInt8; inline;
function AtomicExchange(var Mem: Int16; Data: Int16): Int16; inline;
function AtomicExchange(var Mem: UInt16; Data: UInt16): UInt16; inline;
function AtomicExchange(var Mem: Int32; Data: Int32): Int32; inline;
function AtomicExchange(var Mem: UInt32; Data: UInt32): UInt32; inline;
function AtomicExchange(var Mem: Int64; Data: Int64): Int64; inline;
function AtomicExchange(var Mem: UInt64; Data: UInt64): UInt64; inline;

function AtomicCompareExchange(var Mem: Int8; Compare, Data: Int8): Int8; inline;
function AtomicCompareExchange(var Mem: UInt8; Compare, Data: UInt8): UInt8; inline;
function AtomicCompareExchange(var Mem: Int16; Compare, Data: Int16): Int16; inline;
function AtomicCompareExchange(var Mem: UInt16; Compare, Data: UInt16): UInt16; inline;
function AtomicCompareExchange(var Mem: Int32; Compare, Data: Int32): Int32; inline;
function AtomicCompareExchange(var Mem: UInt32; Compare, Data: UInt32): UInt32; inline;
function AtomicCompareExchange(var Mem: Int64; Compare, Data: Int64): Int64; inline;
function AtomicCompareExchange(var Mem: UInt64; Compare, Data: UInt64): UInt64; inline;

function AtomicWait(constref Mem: Int32; Compare: Int32; TimeoutNanoseconds: Int64): Int32; inline;
function AtomicWait(constref Mem: UInt32; Compare: UInt32; TimeoutNanoseconds: Int64): Int32; inline;
function AtomicWait(constref Mem: Int64; Compare: Int64; TimeoutNanoseconds: Int64): Int32; inline;
function AtomicWait(constref Mem: UInt64; Compare: UInt64; TimeoutNanoseconds: Int64): Int32; inline;

function AtomicNotify(constref Mem: Int32; Count: UInt32): UInt32; inline;
function AtomicNotify(constref Mem: UInt32; Count: UInt32): UInt32; inline;
function AtomicNotify(constref Mem: Int64; Count: UInt32): UInt32; inline;
function AtomicNotify(constref Mem: UInt64; Count: UInt32): UInt32; inline;
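AtomicWait and AtomicNotify are typically used together to let one thread sleep until another thread publishes a value. The following is a minimal sketch using only the declarations listed above; ReadyFlag, WaitUntilReady and SignalReady are hypothetical names. It assumes a program compiled for the wasm32 target with -CTwasmthreads, and it assumes the waiting thread is allowed to block (hosts may forbid blocking waits on the main browser thread).

```pascal
{$mode objfpc}
uses
  WebAssembly;

var
  { 0 = not ready, 1 = ready. Hypothetical shared flag. }
  ReadyFlag: Int32 = 0;

{ Blocks the calling thread until SignalReady has been called. }
procedure WaitUntilReady;
begin
  while AtomicLoad(ReadyFlag) = 0 do
    { Sleeps only while ReadyFlag still holds 0. If another thread has
      already changed it, the call returns awrNotEqual immediately and
      the loop condition is re-checked. }
    AtomicWait(ReadyFlag, 0, awtInfiniteTimeout);
end;

{ Publishes the flag and wakes all waiters.
  Returns the number of threads that were woken. }
function SignalReady: UInt32;
begin
  AtomicStore(ReadyFlag, 1);
  Result := AtomicNotify(ReadyFlag, High(UInt32));
end;
```

The loop around AtomicWait matters: a wake-up only says that somebody called AtomicNotify on that address, so the flag is loaded again before the wait is considered finished.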

Shared memory and passive segments

First, the memory needs to be declared shared.

Secondly, the data segments need to be declared as passive segments, and the compiler should generate extra startup code that initializes them only once. Without this, instantiating the module on a new Web Worker (in order to start a new thread) resets memory to its initial state, which is not what we want when starting a thread.

Some info: Shared Memory and Passive Segments

It turns out that the LLVM linker does all of this (including generating the initialization startup code) when given the appropriate command-line options. The compiler now passes these options to the linker when a program is compiled with -CTwasmthreads. As a side effect, such programs no longer work with "wasmtime run --enable-features threads", because wasmtime's threads support is incomplete.

Thread Local Storage (threadvars)

Special consideration is needed to support threadvars.

More info here: Thread Local Storage

Code generation for threadvar access is now implemented in FPC and follows the Emscripten ABI convention for TLS. However, it causes the LLVM 14 linker to crash; the LLVM 15 (release candidate) linker from Emscripten seems to work.

The following intrinsics exist in the system unit. They should be used for tasks such as setting up the TLS of the new thread, or allocating memory for the TLS in the calling thread before passing it to the JavaScript helper:

function fpc_wasm32_tls_size: SizeUInt; - returns the TLS size in bytes for the entire program
function fpc_wasm32_tls_align: SizeUInt; - returns the alignment requirements for the TLS block for the program
function fpc_wasm32_tls_base: Pointer; - the start of the TLS block for the current thread. Only becomes valid after calling __wasm_init_tls (which is a special function, generated by the linker).
procedure fpc_wasm32_init_tls(memory: Pointer); external name '__wasm_init_tls';
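Allocating the TLS block for a new thread means reserving a region of the required size whose start address satisfies the required alignment. The following is a minimal sketch of that arithmetic; AlignUp and AllocTlsBlock are hypothetical helpers. On wasm32 the TlsSize and TlsAlign arguments would presumably come from fpc_wasm32_tls_size and fpc_wasm32_tls_align. The sketch assumes the alignment is a power of two and that the heap may be used for this purpose, which this page still lists as an open question.

```pascal
{$mode objfpc}

{ Hypothetical helper: rounds Addr up to the next multiple of Alignment.
  Assumes Alignment is a power of two. }
function AlignUp(Addr, Alignment: PtrUInt): PtrUInt;
begin
  Result := (Addr + Alignment - 1) and not (Alignment - 1);
end;

{ Hypothetical helper: allocates room for one TLS block from the heap.
  Returns the aligned start of the block; RawBlock receives the pointer
  that must later be passed to FreeMem. }
function AllocTlsBlock(TlsSize, TlsAlign: PtrUInt; out RawBlock: Pointer): Pointer;
begin
  if TlsAlign = 0 then
    TlsAlign := 1;
  { Over-allocate by TlsAlign - 1 bytes so that an aligned start address
    always fits inside the raw allocation. }
  RawBlock := GetMem(TlsSize + TlsAlign - 1);
  Result := Pointer(AlignUp(PtrUInt(RawBlock), TlsAlign));
end;
```

The aligned pointer is what would be handed to the new thread and eventually passed to fpc_wasm32_init_tls; the raw pointer is kept only so the memory can be released when the thread ends.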

TODO: initialize threadvars for the main thread, during system unit initialization, before the first ever threadvar is accessed.

Actually starting a thread

WebAssembly relies on the hosting environment to actually start threads.

Some extra info:

Unfortunately, the WASI Native Threads API proposal is very incomplete. Emscripten implements threads using a different API/ABI, but it's quite messy and poorly documented.

Starting a thread requires the following steps:

  • WebAssembly: allocating a block of memory for the stack and the TLS (threadvar) block of the new thread. This needs to be done in a thread-safe manner (can we use the heap?). TODO: How do we determine the stack size for the new thread? Do we use the same main stack size as specified by the {$M stacksize} directive?
  • WebAssembly: calling an external (JavaScript) function and passing at least the following data, which the JavaScript code must forward to the new thread:
 - the start address of the new thread procedure
 - the args that need to be passed to the new thread procedure
 - the stack and TLS address
 - TODO: how/where do we determine the thread ID?

What does Emscripten do?

int __pthread_create_js(struct pthread *thread, const pthread_attr_t *attr, void *(*start_routine) (void *), void *arg);

Basically, it exposes the pthread structure to the JavaScript code, which expects certain things (stack start, stack limit, TLS start, thread ID, etc.) to be at certain fixed offsets.

What does the WASI Native Threads API proposal do?

status thread_spawn(thread_id* thread_id, const thread_attributes* attrs, thread_start_func* function, thread_args* arg);

It doesn't (yet) define what thread_id or thread_attributes are.

  • JavaScript: TBD... delegate the new thread to a Web Worker, pass the arguments, etc. Decisions: Should the main thread be started in the main JavaScript GUI thread or inside a worker? Emscripten supports both (via PROXY_TO_PTHREAD). Should we support both as well? Other caveats: [1] and [2]
  • WebAssembly: a function that sets up the new thread:
 - setting up the stack pointer in linear memory (might need inline asm, an external wasm module or some compiler magic helper code generation, because no local variables can be used before this is set up)
 - initializing the global variables that hold the TLS (threadvar) block (calling __wasm_init_tls). Threadvars should not be used before this point.
 - maybe initializing some threadvars that contain information about the current thread (is it the main thread, is it run in the main browser thread or in a worker, can it use atomic wait, or should it busy-wait instead?)
 - call the actual thread function and pass its parameters

What does Emscripten do?

void _emscripten_thread_init(pthread_t ptr, int is_main, int is_runtime, int can_block, int start_profiling);

What does the WASI Native Threads API proposal do?

- It only mentions a function called _start_thread, but doesn't specify its parameters.
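The data that the WebAssembly side has to hand to the host when starting a thread (thread procedure, argument, stack and TLS addresses, as listed in the steps above) could be bundled into a single record whose address is passed to the JavaScript helper. The following is a minimal sketch only: TThreadFunc, TThreadStartInfo and both routines are hypothetical, the record layout is not an agreed ABI, the thread ID is left out because it is still an open question, and the TLS alignment requirement is ignored for brevity.

```pascal
{$mode objfpc}

type
  { Signature assumed here for a thread procedure (hypothetical). }
  TThreadFunc = function(Arg: Pointer): PtrInt;

  { Hypothetical bundle of everything the new thread needs at startup. }
  TThreadStartInfo = record
    Func: TThreadFunc;   { start address of the new thread procedure }
    Arg: Pointer;        { argument passed to the thread procedure }
    StackBase: Pointer;  { start of the memory reserved for the stack }
    StackSize: PtrUInt;  { size of that memory in bytes }
    TlsBlock: Pointer;   { memory reserved for the threadvar block }
  end;
  PThreadStartInfo = ^TThreadStartInfo;

{ Allocates and fills a start record. Uses the heap, which is an
  assumption; the page leaves open whether that is acceptable. }
function NewThreadStartInfo(Func: TThreadFunc; Arg: Pointer;
  StackSize, TlsSize: PtrUInt): PThreadStartInfo;
begin
  New(Result);
  Result^.Func := Func;
  Result^.Arg := Arg;
  Result^.StackSize := StackSize;
  Result^.StackBase := GetMem(StackSize);
  Result^.TlsBlock := GetMem(TlsSize);
end;

{ Releases everything allocated by NewThreadStartInfo. }
procedure FreeThreadStartInfo(Info: PThreadStartInfo);
begin
  FreeMem(Info^.TlsBlock);
  FreeMem(Info^.StackBase);
  Dispose(Info);
end;
```

Passing one pointer keeps the imported JavaScript function's signature small, at the cost of making the record layout part of the contract between the RTL and the JavaScript helper, much like Emscripten's fixed offsets into its pthread structure.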