The smallest execution unit a computer system can schedule is the thread, which in principle can even be migrated between cores. Older computers could only execute a single thread from each program on a single CPU core, so there was no need to distinguish between threads and processes. With the advance of multi-core systems, we need to build levels of abstraction.
- A thread is now truly the smallest execution unit.
- A process can launch multiple threads among a number of cores within one processor.
- A program can have multiple processes on different processors.
- The abstraction means there is no strict 1-to-1 mapping between physical processors and virtual processes. For instance, an MPI program can launch more processes than are physically available on the system.
Be aware of the difference between a specification (or protocol) and an implementation.
- Message Passing Interface (MPI) is a standardized, portable specification for message passing, designed to function on parallel computing architectures.
- MPICH and Open MPI are implementations of the MPI standard.
- Even though originally only C, C++, and Fortran were supported, there are now bindings, i.e. libraries that extend MPI support to other languages by wrapping an existing MPI implementation such as MPICH or Open MPI.
- OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran.
- OpenMP implementations are built into vendors' compilers that claim to support a given OpenMP version.
Any low-level parallel library needs to deal with the operating system. In C and C++, the POSIX threads library (pthread) lets you interact with system threads directly, a level of control that the high-level OpenMP does not expose. In Julia, the standard library Distributed is a native implementation of one-sided communication between processes. The implementation details involve many networking concepts such as handshaking and data packing/serializing/deserializing.
How to Get a Faster Multi-Processing Framework
Factors that affect the parallel system’s performance:
- network connection speed
- checking procedures when establishing the connection, sending/receiving messages, and closing the connection
- the size of the data that must be communicated
- the frequency of communication
From a software programmer’s perspective, only the last two points can be controlled.
- Make sure every "worker" is capable of working efficiently (efficient serial execution);
- Reduce unnecessary meetings: once tasks are handed out, communicate sparingly and efficiently, and aggregate the results in one final step (reducing communication).
Modern Philosophy of Concurrency
Quoted from the Go language documentation:
> Do not communicate by sharing memory; instead, share memory by communicating.
A channel is a programming construct for implementing this idea. A channel has two halves: a transmitter and a receiver. A channel is said to be closed if either the transmitter or the receiver half is dropped.