Parallel programming

Cooperative Multitasking vs Preemptive Multitasking

Preemptive Multitasking
In preemptive multitasking, the operating system preempts a program to allow another waiting task to run on the CPU. Programs or threads cannot determine how long they can use the CPU. The operating system's scheduler decides which thread or program gets to use the CPU next and for how much time.
Cooperative Multitasking
Cooperative Multitasking involves well-behaved programs voluntarily giving up control to the scheduler so that another program can run. A program or thread may give up control after a period of time has expired, or if it becomes idle or logically blocked. The operating system's scheduler has no control over how long a program or thread runs.


In C++, a new thread starts its work immediately. It can run in the foreground or background and gets its data by copy or reference.

Shared Variables

Access to shared variables between threads needs to be coordinated. This coordination can be done in different ways using mutexes or locks. Often, it is sufficient to protect the initialization of the data as it will be immutable during its lifetime.
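In Java terms, one common way to protect only the initialization is the initialization-on-demand holder idiom: the JVM synchronizes the one-time construction, and the immutable result can then be read without any locking. A sketch with illustrative names (the Config class and the "prod" value are not from the text):

```java
public class Config {
    // holder idiom: the JVM initializes the nested class exactly once,
    // on first access, with all needed synchronization built in
    private static class Holder {
        static final Config INSTANCE = new Config("prod");
    }

    private final String environment;   // immutable after construction

    private Config(String environment) {
        this.environment = environment;
    }

    public static Config get() {
        return Holder.INSTANCE;
    }

    public String environment() {
        return environment;
    }

    public static void main(String[] args) {
        System.out.println(Config.get().environment());   // prints prod
        System.out.println(Config.get() == Config.get()); // prints true
    }
}
```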

Thread-Local Variable

Declaring a variable as thread-local ensures that a thread gets its own copy, so there is no conflict.
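Java expresses the same guarantee with ThreadLocal: each thread operates on its own copy of the value, so no coordination is needed. A small illustrative sketch:

```java
public class ThreadLocalDemo {
    // each thread gets an independent counter, initialized to 0
    private static final ThreadLocal<Integer> counter =
            ThreadLocal.withInitial(() -> 0);

    static int incrementAndGet() {
        counter.set(counter.get() + 1);
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        incrementAndGet();
        incrementAndGet();
        System.out.println("main: " + counter.get());   // prints main: 2

        Thread t = new Thread(() -> {
            // this thread's copy starts fresh at 0, unaffected by main
            System.out.println("worker: " + incrementAndGet());   // prints worker: 1
        });
        t.start();
        t.join();
    }
}
```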

Condition Variables

Condition variables are a classic solution to implement sender-receiver workflows. The key idea is that the sender notifies the receiver when it’s done with its work so that the receiver can start.
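The pattern can be sketched in Java with a lock and its condition variable (class and field names are illustrative):

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class SenderReceiver {
    private final Lock lock = new ReentrantLock();
    private final Condition dataReady = lock.newCondition();
    private boolean ready = false;   // the predicate
    private int data;

    // sender: publish the data, then notify the receiver
    public void send(int value) {
        lock.lock();
        try {
            data = value;
            ready = true;
            dataReady.signal();
        } finally {
            lock.unlock();
        }
    }

    // receiver: wait until the predicate holds, then consume
    public int receive() throws InterruptedException {
        lock.lock();
        try {
            while (!ready) {          // guard against spurious wakeups
                dataReady.await();
            }
            return data;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SenderReceiver channel = new SenderReceiver();
        Thread receiver = new Thread(() -> {
            try {
                System.out.println("received: " + channel.receive());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();
        channel.send(42);
        receiver.join();   // prints received: 42
    }
}
```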


Tasks have a lot in common with threads. While a programmer explicitly creates a thread, a task will be implicitly created by the C++ runtime. Tasks are like data channels. The promise puts data into the data channel, and the future picks the value up. The data can be a value, an exception, or simply a notification.
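Java offers a close analogue of this promise/future channel in CompletableFuture: complete() is the promise end that puts the value in, get() is the future end that picks it up. A minimal sketch:

```java
import java.util.concurrent.CompletableFuture;

public class PromiseFutureDemo {
    public static void main(String[] args) throws Exception {
        // the "data channel": one end fills it, the other end reads it
        CompletableFuture<Integer> channel = new CompletableFuture<>();

        // promise side: a worker thread puts a value into the channel
        Thread producer = new Thread(() -> channel.complete(6 * 7));
        producer.start();

        // future side: blocks until the value is available
        System.out.println(channel.get());   // prints 42
        producer.join();
    }
}
```

An exception can travel through the same channel via completeExceptionally(), matching the text's note that the data can be a value, an exception, or a notification.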


As the name implies, a mutex provides mutual exclusion. A mutex is used to guard shared data such as a linked list, an array, or any primitive type. A mutex allows only a single thread to access a resource or critical section.
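A sketch of this in Java, using ReentrantLock as the mutex guarding a shared counter (class name and iteration counts are illustrative):

```java
import java.util.concurrent.locks.ReentrantLock;

public class Counter {
    private final ReentrantLock mutex = new ReentrantLock();
    private int value = 0;

    public void increment() {
        mutex.lock();          // only one thread enters the critical section
        try {
            value++;
        } finally {
            mutex.unlock();    // must be released by the same (owning) thread
        }
    }

    public int get() {
        mutex.lock();
        try {
            return value;
        } finally {
            mutex.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Counter c = new Counter();
        Runnable task = () -> { for (int i = 0; i < 10_000; i++) c.increment(); };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(c.get());   // prints 20000
    }
}
```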


A semaphore, on the other hand, is used for limiting access to a collection of resources. A semaphore with a single permit is called a binary semaphore.
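For example, java.util.concurrent.Semaphore can cap concurrent access to a pool of resources (the connection-pool framing and the permit count of 3 are illustrative):

```java
import java.util.concurrent.Semaphore;

public class ConnectionPool {
    // at most 3 threads may hold a "connection" at the same time
    private final Semaphore permits = new Semaphore(3);

    public void withConnection(Runnable work) throws InterruptedException {
        permits.acquire();        // blocks when all 3 permits are taken
        try {
            work.run();
        } finally {
            permits.release();    // hand the permit back
        }
    }

    public int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) throws InterruptedException {
        ConnectionPool pool = new ConnectionPool();
        pool.withConnection(() -> System.out.println("using a connection"));
        System.out.println(pool.availablePermits());   // prints 3
    }
}
```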

Critical Section

A critical section is any piece of code that may be executed concurrently by more than one thread of the application and that accesses shared data or resources.
The most important difference between the two is that, in the case of a mutex, the same thread must call acquire and subsequent release on the mutex. In contrast, with a binary semaphore, different threads can call acquire and release on the semaphore.
A mutex is owned by a thread, whereas a semaphore has no concept of ownership. If a mutex is locked, it must be unlocked by the same thread. A semaphore, on the other hand, can be acted upon by different threads, even if it has a permit of one.
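In Java, java.util.concurrent.Semaphore shows this directly: a semaphore created with zero permits can be released by one thread and acquired by another, which a mutex would forbid. A sketch:

```java
import java.util.concurrent.Semaphore;

public class SemaphoreSignal {
    public static void main(String[] args) throws InterruptedException {
        // zero permits: acquire() blocks until some other thread releases
        Semaphore done = new Semaphore(0);
        StringBuilder log = new StringBuilder();

        Thread worker = new Thread(() -> {
            log.append("work;");
            done.release();    // a different thread supplies the permit
        });
        worker.start();

        done.acquire();         // main thread takes it: no ownership required
        log.append("main");
        System.out.println(log);   // prints work;main
        worker.join();
    }
}
```

The release/acquire pair also establishes a happens-before edge, so the main thread reliably sees the worker's write.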
In Mesa monitors - named after the Mesa language developed by Xerox researchers in the 1970s - there is a possibility that, between the moment thread B calls notify() and releases its mutex and the moment the sleeping thread A wakes up and reacquires the mutex, the predicate is changed back to false by a third thread, different from both the signaler and the awoken thread. The awoken thread competes with other threads to acquire the mutex once the signaling thread B exits the monitor. On signaling, thread B doesn't give up the monitor just yet; it continues to own the monitor until it exits the monitor section.
In contrast, in Hoare monitors - named after C.A.R. Hoare, one of the original inventors of monitors - the signaling thread B relinquishes the monitor to the awoken thread A, which immediately enters the monitor, while thread B waits. This guarantees that the predicate has not changed, so an if-clause suffices instead of checking the predicate in a while loop. The awoken thread A starts execution as soon as the signaling thread B signals that the predicate has changed; no other thread gets a chance to change the predicate, since no other thread can enter the monitor in between.
Java, in particular, follows Mesa monitor semantics, and developers are always expected to check for the condition/predicate in a while loop. Mesa monitors are more efficient than Hoare monitors.
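The resulting idiom: always call wait() inside a while loop that re-checks the predicate. A sketch (class and method names are illustrative):

```java
public class MesaQueue {
    private final Object monitor = new Object();
    private boolean ready = false;   // the predicate

    public void produce() {
        synchronized (monitor) {
            ready = true;
            monitor.notify();   // wake one waiting consumer
        }
    }

    public void consume() throws InterruptedException {
        synchronized (monitor) {
            // Mesa semantics: the predicate may have changed between the
            // notify() and our reacquisition of the monitor, so re-check it
            while (!ready) {
                monitor.wait();
            }
            ready = false;      // consume the signal
        }
    }

    public boolean isReady() {
        synchronized (monitor) {
            return ready;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MesaQueue q = new MesaQueue();
        Thread consumer = new Thread(() -> {
            try {
                q.consume();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        q.produce();
        consumer.join();
        System.out.println("consumed");   // prints once the hand-off completes
    }
}
```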

Differences Between Semaphore and Monitor

  • Monitors and semaphores are interchangeable in theory, as one can be constructed out of the other or reduced to the other. However, monitors atomically acquire necessary locks, while the developer is responsible for appropriately acquiring and releasing locks with semaphores, which can be error-prone.
  • Semaphores are lightweight compared to monitors, which are bloated. However, the tendency to misuse semaphores is greater than monitors. When using a semaphore and mutex pair as an alternative to a monitor, it is easy to lock the wrong mutex or forget to lock altogether. Although both constructs can solve the same problem, monitors provide a pre-packaged solution with less dependence on a developer's skill to properly lock.
  • In Java, monitors enforce correct locking by throwing an IllegalMonitorStateException when condition variable methods are invoked without first acquiring the associated lock. The exception indicates that either the object's lock/mutex was not acquired or an incorrect lock was acquired.
  • A semaphore can allow several threads to access a given resource or critical section, but only a single thread can own the monitor and access the associated resource at any given time.
  • Semaphores can address the issue of missed signals, but monitors require additional state called the predicate, apart from the condition variable and the mutex that make up the monitor, to solve the problem of missed signals.
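The enforcement described above can be seen by calling wait() on an object without holding its monitor; the call fails immediately:

```java
public class MonitorCheck {
    public static void main(String[] args) {
        Object lock = new Object();
        try {
            lock.wait();   // called outside synchronized (lock): illegal
        } catch (IllegalMonitorStateException e) {
            System.out.println("IllegalMonitorStateException");   // this path runs
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // not reached here
        }
    }
}
```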
Amdahl's Law
Blindly adding threads to speed up program execution may not always be a good idea. Amdahl's Law puts an upper bound on the speed-up gained by parallelizing a program, where:
  • S(n) is the speed-up achieved by using n cores or threads.
  • P is the fraction of the program that is parallelizable.
  • (1 - P) is the fraction of the program that must be executed serially.
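In these terms, Amdahl's Law reads S(n) = 1 / ((1 - P) + P / n). A short sketch evaluating it (the example figures are illustrative):

```java
public class Amdahl {
    // S(n) = 1 / ((1 - P) + P / n)
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        // a program that is 90% parallelizable, run on 8 cores
        System.out.printf("%.2f%n", speedup(0.9, 8));          // prints 4.71
        // the serial 10% caps the speed-up near 10x, regardless of core count
        System.out.printf("%.2f%n", speedup(0.9, 1_000_000));
    }
}
```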
Java specification
  • Assignments and reads for primitive data types, except for double and long, are always atomic.
  • Reads and writes to double and long primitives aren't guaranteed to be atomic (unless the variable is declared volatile).
  • All reference assignments are atomic.

Creating a Thread in Java

In Java, there are two ways to create a new thread of execution. The first way is to declare a class as a subclass of Thread. This subclass should override the run method of class Thread. An instance of the subclass can then be allocated and started.
The second way to create a thread is to declare a class that implements the Runnable interface. This class should then implement the run method. An instance of the class can then be allocated, passed as an argument when creating a Thread, and started.
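A sketch showing both approaches side by side (class names are illustrative):

```java
public class ThreadCreation {
    // way 1: subclass Thread and override run()
    static class Worker extends Thread {
        @Override
        public void run() {
            System.out.println("from Thread subclass");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Worker();

        // way 2: implement Runnable (here as a lambda) and pass it to Thread
        Thread t2 = new Thread(() -> System.out.println("from Runnable"));

        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

The Runnable form is generally preferred, since it keeps the task separate from the threading mechanism and leaves the class free to extend something else.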


Java’s most fundamental construct for thread synchronization is the synchronized keyword. It can be used to restrict access to a critical section to one thread at a time.
class Employee {

    // shared variable
    private String name;

    // method is synchronized on 'this' object
    public synchronized void setName(String name) { = name;

    // also synchronized on the same object
    public synchronized void resetName() { = "";

    // equivalent of adding synchronized to the method
    // definition
    public String getName() {
        synchronized (this) {
            return name;

The wait method is available on every Java object. Each Java object can act as a condition variable. When a thread executes the wait method, it releases the monitor for the object and is placed in the wait queue. Note that the thread must be inside a synchronized block of code that synchronizes on the same object as the one on which wait() is being called. In other words, the thread must hold the monitor of the object on which it will call wait(). If not, an IllegalMonitorStateException is thrown.


Like the wait() method, notify() can only be called by a thread which owns the monitor for the object on which notify() is being called; otherwise, an IllegalMonitorStateException is thrown. The notify method awakens one of the threads in the associated wait queue, i.e., one of the threads waiting on the object's monitor.
However, this thread will not be scheduled for execution immediately and will compete with other active threads that are trying to synchronize on the same object. The thread which executed notify will also need to give up the object's monitor, before any one of the competing threads can acquire the monitor and proceed forward.


The notifyAll() method is the same as notify(), except that it wakes up all the threads that are waiting on the object's monitor.
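A sketch in which notifyAll() releases several waiting threads at once (the Gate class and its names are illustrative):

```java
public class Gate {
    private boolean open = false;

    public synchronized void awaitOpen() throws InterruptedException {
        while (!open) {        // re-check the predicate after every wake-up
            wait();            // releases this object's monitor while sleeping
        }
    }

    public synchronized void openGate() {
        open = true;
        notifyAll();           // wake every thread in the wait queue, not just one
    }

    public static void main(String[] args) throws InterruptedException {
        Gate gate = new Gate();
        Thread[] waiters = new Thread[3];
        for (int i = 0; i < waiters.length; i++) {
            waiters[i] = new Thread(() -> {
                try {
                    gate.awaitOpen();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            waiters[i].start();
        }
        gate.openGate();       // all three waiters proceed
        for (Thread w : waiters) {
            w.join();
        }
        System.out.println("all waiters released");
    }
}
```

Had openGate() used notify() instead, only one waiter would have woken, and the other two would have slept forever since the predicate is only signaled once.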


piscina - the Node.js worker pool
Clusters of Node.js processes can be used to run multiple instances of Node.js that distribute workloads among their application threads. When process isolation is not needed, use the worker_threads module instead, which allows running multiple application threads within a single Node.js instance.
For CPU-intensive tasks, set the number of threads to N (the number of CPU cores) + 1. The additional thread keeps the CPU busy when a running thread is briefly suspended by thrashing or some other cause, which would otherwise hurt performance.
For IO-intensive tasks, where the system spends most of its time handling I/O interactions, threads do not occupy the CPU while waiting on I/O, so additional threads can be configured for other work. The commonly recommended value is 2N.
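These heuristics apply beyond Node.js; the same arithmetic in Java, using availableProcessors() for N (class and method names are illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    static int cpuBoundThreads(int cores) { return cores + 1; }   // N + 1
    static int ioBoundThreads(int cores)  { return cores * 2; }   // 2N

    public static void main(String[] args) {
        int n = Runtime.getRuntime().availableProcessors();

        // CPU-bound pool: one thread per core, plus one spare
        ExecutorService cpuPool = Executors.newFixedThreadPool(cpuBoundThreads(n));
        // IO-bound pool: threads idle during I/O, so oversubscribe
        ExecutorService ioPool = Executors.newFixedThreadPool(ioBoundThreads(n));

        System.out.println("cpu pool: " + cpuBoundThreads(n)
                + ", io pool: " + ioBoundThreads(n));
        cpuPool.shutdown();
        ioPool.shutdown();
    }
}
```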