From f5e991bee01104342ade57d8f2ea51527190c50d Mon Sep 17 00:00:00 2001 From: Gabriel Majeri Date: Sun, 9 Sep 2018 14:22:58 +0300 Subject: [PATCH 1/6] Expand the documentation for the std::sync module Provides an overview on why synchronization is required, as well a short summary of what sync primitives are available. --- src/libstd/sync/mod.rs | 123 +++++++++++++++++++++++++++++++++++++++-- 1 file changed, 119 insertions(+), 4 deletions(-) diff --git a/src/libstd/sync/mod.rs b/src/libstd/sync/mod.rs index e12ef8d9eda2d..e06e299406933 100644 --- a/src/libstd/sync/mod.rs +++ b/src/libstd/sync/mod.rs @@ -10,10 +10,125 @@ //! Useful synchronization primitives. //! -//! This module contains useful safe and unsafe synchronization primitives. -//! Most of the primitives in this module do not provide any sort of locking -//! and/or blocking at all, but rather provide the necessary tools to build -//! other types of concurrent primitives. +//! ## The need for synchronization +//! +//! On an ideal single-core CPU, the timeline of events happening in a program +//! is linear, consistent with the order of operations in the code. +//! +//! Considering the following code, operating on some global static variables: +//! +//! ```rust +//! # static mut A: u32 = 0; +//! # static mut B: u32 = 0; +//! # static mut C: u32 = 0; +//! # unsafe { +//! A = 3; +//! B = 4; +//! A = A + B; +//! C = B; +//! println!("{} {} {}", A, B, C); +//! C = A; +//! # } +//! ``` +//! +//! It appears _as if_ some variables stored in memory are changed, an addition +//! is performed, result is stored in A and the variable C is modified twice. +//! When only a single thread is involved, the results are as expected: +//! the line `7 4 4` gets printed. +//! +//! As for what happens behind the scenes, when an optimizing compiler is used +//! the final generated machine code might look very different from the code: +//! +//! - first store to `C` might be moved before the store to `A` or `B`, +//! _as if_ we had written `C = 4; A = 3; B = 4;` +//! +//! - last store to `C` might be removed, since we never read from it again. +//! +//! - assignment of `A + B` to `A` might be removed, the sum can be stored in a +//! in a register until it gets printed, and the global variable never gets +//! updated. +//! +//! - the final result could be determined just by looking at the code at compile time, +//! so [constant folding] might turn the whole block into a simple `println!("7 4 4")` +//! +//! The compiler is allowed to perform any combination of these optimizations, as long +//! as the final optimized code, when executed, produces the same results as the one +//! without optimizations. +//! +//! When multiprocessing is involved (either multiple CPU cores, or multiple +//! physical CPUs), access to global variables (which are shared between threads) +//! could lead to nondeterministic results, **even if** compiler optimizations +//! are disabled. +//! +//! Note that thanks to Rust's safety guarantees, accessing global (static) +//! variables requires `unsafe` code, assuming we don't use any of the +//! synchronization primitives in this module. +//! +//! [constant folding]: https://en.wikipedia.org/wiki/Constant_folding +//! +//! ## Out-of-order execution +//! +//! Instructions can execute in a different order from the one we define, due to +//! various reasons: +//! +//! - **Compiler** reordering instructions: if the compiler can issue an +//! instruction at an earlier point, it will try to do so. For example, it +//! 
might hoist memory loads at the top of a code block, so that the CPU can +//! start [prefetching] the values from memory. +//! +//! In single-threaded scenarios, this can cause issues when writing signal handlers +//! or certain kinds of low-level code. +//! Use [compiler fences] to prevent this reordering. +//! +//! - **Single processor** executing instructions [out-of-order]: modern CPUs are +//! capable of [superscalar] execution, i.e. multiple instructions might be +//! executing at the same time, even though the machine code describes a +//! sequential process. +//! +//! This kind of reordering is handled transparently by the CPU. +//! +//! - **Multiprocessor** system, where multiple hardware threads run at the same time. +//! In multi-threaded scenarios, you can use two kinds of primitives to deal +//! with synchronization: +//! - [memory fences] to ensure memory accesses are made visibile to other +//! CPUs in the right order. +//! - [atomic operations] to ensure simultaneous access to the same memory +//! location doesn't lead to undefined behavior. +//! +//! [prefetching]: https://en.wikipedia.org/wiki/Cache_prefetching +//! [compiler fences]: atomic::compiler_fence +//! [out-of-order]: https://en.wikipedia.org/wiki/Out-of-order_execution +//! [superscalar]: https://en.wikipedia.org/wiki/Superscalar_processor +//! [memory fences]: atomic::fence +//! [atomics operations]: atomic +//! +//! ## Higher-level synchronization objects +//! +//! Most of the low-level synchronization primitives are quite error-prone and +//! inconvenient to use, which is why the standard library also exposes some +//! higher-level synchronization objects. +//! +//! These abstractions can be built out of lower-level primitives. For efficiency, +//! the sync objects in the standard library are usually implemented with help +//! from the operating system's kernel, which is able to reschedule the threads +//! while they are blocked on acquiring a lock. +//! +//! ## Efficiency +//! +//! Higher-level synchronization mechanisms are usually heavy-weight. +//! While most atomic operations can execute instantaneously, acquiring a +//! [`Mutex`] can involve blocking until another thread releases it. +//! For [`RwLock`], while! any number of readers may acquire it without +//! blocking, each writer will have exclusive access. +//! +//! On the other hand, communication over [channels] can provide a fairly +//! high-level interface without sacrificing performance, at the cost of +//! somewhat more memory. +//! +//! The more synchronization exists between CPUs, the smaller the performance +//! gains from multithreading will be. +//! +//! [channels]: mpsc #![stable(feature = "rust1", since = "1.0.0")] From e0df0ae734ec97ad7cc67cf6bed0d142275571b9 Mon Sep 17 00:00:00 2001 From: Gabriel Majeri Date: Sun, 16 Sep 2018 12:56:44 +0300 Subject: [PATCH 2/6] Make example code use global variables Because `fn main()` was added automatically, the variables were actually local statics. --- src/libstd/sync/mod.rs | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/src/libstd/sync/mod.rs b/src/libstd/sync/mod.rs index e06e299406933..df153561b4b16 100644 --- a/src/libstd/sync/mod.rs +++ b/src/libstd/sync/mod.rs @@ -18,17 +18,20 @@ //! Considering the following code, operating on some global static variables: //! //! ```rust -//! # static mut A: u32 = 0; -//! # static mut B: u32 = 0; -//! # static mut C: u32 = 0; -//! # unsafe { -//! A = 3; -//! B = 4; -//! A = A + B; -//! C = B; -//! 
println!("{} {} {}", A, B, C); -//! C = A; -//! # } +//! static mut A: u32 = 0; +//! static mut B: u32 = 0; +//! static mut C: u32 = 0; +//! +//! fn main() { +//! unsafe { +//! A = 3; +//! B = 4; +//! A = A + B; +//! C = B; +//! println!("{} {} {}", A, B, C); +//! C = A; +//! } +//! } //! ``` //! //! It appears _as if_ some variables stored in memory are changed, an addition @@ -42,8 +45,6 @@ //! - first store to `C` might be moved before the store to `A` or `B`, //! _as if_ we had written `C = 4; A = 3; B = 4;` //! -//! - last store to `C` might be removed, since we never read from it again. -//! //! - assignment of `A + B` to `A` might be removed, the sum can be stored in a //! in a register until it gets printed, and the global variable never gets //! updated. From f3fdbbfae8646be30d7a19db059b9cdc42fadbc4 Mon Sep 17 00:00:00 2001 From: Gabriel Majeri Date: Thu, 27 Sep 2018 20:25:04 +0300 Subject: [PATCH 3/6] Address review comments Reword the lead paragraph and turn the list items into complete sentences. --- src/libstd/sync/mod.rs | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/src/libstd/sync/mod.rs b/src/libstd/sync/mod.rs index df153561b4b16..bdb6e49aabc2a 100644 --- a/src/libstd/sync/mod.rs +++ b/src/libstd/sync/mod.rs @@ -12,8 +12,9 @@ //! //! ## The need for synchronization //! -//! On an ideal single-core CPU, the timeline of events happening in a program -//! is linear, consistent with the order of operations in the code. +//! Conceptually, a Rust program is simply a series of operations which will +//! be executed on a computer. The timeline of events happening in the program +//! is consistent with the order of the operations in the code. //! //! Considering the following code, operating on some global static variables: //! @@ -35,22 +36,22 @@ //! ``` //! //! It appears _as if_ some variables stored in memory are changed, an addition -//! is performed, result is stored in A and the variable C is modified twice. +//! is performed, result is stored in `A` and the variable `C` is modified twice. //! When only a single thread is involved, the results are as expected: //! the line `7 4 4` gets printed. //! -//! As for what happens behind the scenes, when an optimizing compiler is used -//! the final generated machine code might look very different from the code: +//! As for what happens behind the scenes, when optimizations are enabled the +//! final generated machine code might look very different from the code: //! -//! - first store to `C` might be moved before the store to `A` or `B`, -//! _as if_ we had written `C = 4; A = 3; B = 4;` +//! - The first store to `C` might be moved before the store to `A` or `B`, +//! _as if_ we had written `C = 4; A = 3; B = 4`. //! -//! - assignment of `A + B` to `A` might be removed, the sum can be stored in a -//! in a register until it gets printed, and the global variable never gets -//! updated. +//! - Assignment of `A + B` to `A` might be removed, since the sum can be stored +//! in a temporary location until it gets printed, with the global variable +//! never getting updated. //! -//! - the final result could be determined just by looking at the code at compile time, -//! so [constant folding] might turn the whole block into a simple `println!("7 4 4")` +//! - The final result could be determined just by looking at the code at compile time, +//! so [constant folding] might turn the whole block into a simple `println!("7 4 4")`. //! //! 
The compiler is allowed to perform any combination of these optimizations, as long //! as the final optimized code, when executed, produces the same results as the one @@ -77,8 +78,8 @@ //! might hoist memory loads at the top of a code block, so that the CPU can //! start [prefetching] the values from memory. //! -//! In single-threaded scenarios, this can cause issues when writing signal handlers -//! or certain kinds of low-level code. +//! In single-threaded scenarios, this can cause issues when writing +//! signal handlers or certain kinds of low-level code. //! Use [compiler fences] to prevent this reordering. //! //! - **Single processor** executing instructions [out-of-order]: modern CPUs are From bcec6bb525032b48d8d1793854f61892c21fe8af Mon Sep 17 00:00:00 2001 From: Gabriel Majeri Date: Thu, 27 Sep 2018 22:12:09 +0300 Subject: [PATCH 4/6] Fix broken links --- src/libstd/sync/mod.rs | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/src/libstd/sync/mod.rs b/src/libstd/sync/mod.rs index bdb6e49aabc2a..5ba569bf7ce5e 100644 --- a/src/libstd/sync/mod.rs +++ b/src/libstd/sync/mod.rs @@ -98,11 +98,11 @@ //! location doesn't lead to undefined behavior. //! //! [prefetching]: https://en.wikipedia.org/wiki/Cache_prefetching -//! [compiler fences]: atomic::compiler_fence +//! [compiler fences]: crate::sync::atomic::compiler_fence //! [out-of-order]: https://en.wikipedia.org/wiki/Out-of-order_execution //! [superscalar]: https://en.wikipedia.org/wiki/Superscalar_processor -//! [memory fences]: atomic::fence -//! [atomics operations]: atomic +//! [memory fences]: crate::sync::atomic::fence +//! [atomic operations]: crate::sync::atomic //! //! ## Higher-level synchronization objects //! @@ -120,7 +120,7 @@ //! Higher-level synchronization mechanisms are usually heavy-weight. //! While most atomic operations can execute instantaneously, acquiring a //! [`Mutex`] can involve blocking until another thread releases it. -//! For [`RwLock`], while! any number of readers may acquire it without +//! For [`RwLock`], while any number of readers may acquire it without //! blocking, each writer will have exclusive access. //! //! On the other hand, communication over [channels] can provide a fairly @@ -130,7 +130,9 @@ //! The more synchronization exists between CPUs, the smaller the performance //! gains from multithreading will be. //! -//! [channels]: mpsc +//! [`Mutex`]: crate::sync::Mutex +//! [`RwLock`]: crate::sync::RwLock +//! [channels]: crate::sync::mpsc #![stable(feature = "rust1", since = "1.0.0")] From 7e921aa59090096593cb4fa202041c91a5d1e36b Mon Sep 17 00:00:00 2001 From: Gabriel Majeri Date: Fri, 28 Sep 2018 10:59:45 +0300 Subject: [PATCH 5/6] Rewrite section on concurrency --- src/libstd/sync/mod.rs | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/libstd/sync/mod.rs b/src/libstd/sync/mod.rs index 5ba569bf7ce5e..edbed430e3866 100644 --- a/src/libstd/sync/mod.rs +++ b/src/libstd/sync/mod.rs @@ -57,16 +57,17 @@ //! as the final optimized code, when executed, produces the same results as the one //! without optimizations. //! -//! When multiprocessing is involved (either multiple CPU cores, or multiple -//! physical CPUs), access to global variables (which are shared between threads) -//! could lead to nondeterministic results, **even if** compiler optimizations -//! are disabled. +//! Due to the [concurrency] involved in modern computers, assumptions about +//! the program's execution order are often wrong. 
Access to global variables +//! can lead to nondeterministic results, **even if** compiler optimizations +//! are disabled, and it is **still possible** to introduce synchronization bugs. //! //! Note that thanks to Rust's safety guarantees, accessing global (static) //! variables requires `unsafe` code, assuming we don't use any of the //! synchronization primitives in this module. //! //! [constant folding]: https://en.wikipedia.org/wiki/Constant_folding +//! [concurrency]: https://en.wikipedia.org/wiki/Concurrency_(computer_science) //! //! ## Out-of-order execution //! From 6ba55847129e9a35b477e43b7a381ca00fd2a339 Mon Sep 17 00:00:00 2001 From: Gabriel Majeri Date: Fri, 5 Oct 2018 08:50:17 +0300 Subject: [PATCH 6/6] Address review comments --- src/libstd/sync/mod.rs | 110 +++++++++++++++++++++++++---------------- 1 file changed, 67 insertions(+), 43 deletions(-) diff --git a/src/libstd/sync/mod.rs b/src/libstd/sync/mod.rs index edbed430e3866..d69ebc1762272 100644 --- a/src/libstd/sync/mod.rs +++ b/src/libstd/sync/mod.rs @@ -12,11 +12,11 @@ //! //! ## The need for synchronization //! -//! Conceptually, a Rust program is simply a series of operations which will -//! be executed on a computer. The timeline of events happening in the program -//! is consistent with the order of the operations in the code. +//! Conceptually, a Rust program is a series of operations which will +//! be executed on a computer. The timeline of events happening in the +//! program is consistent with the order of the operations in the code. //! -//! Considering the following code, operating on some global static variables: +//! Consider the following code, operating on some global static variables: //! //! ```rust //! static mut A: u32 = 0; @@ -35,8 +35,10 @@ //! } //! ``` //! -//! It appears _as if_ some variables stored in memory are changed, an addition -//! is performed, result is stored in `A` and the variable `C` is modified twice. +//! It appears as if some variables stored in memory are changed, an addition +//! is performed, result is stored in `A` and the variable `C` is +//! modified twice. +//! //! When only a single thread is involved, the results are as expected: //! the line `7 4 4` gets printed. //! @@ -50,17 +52,19 @@ //! in a temporary location until it gets printed, with the global variable //! never getting updated. //! -//! - The final result could be determined just by looking at the code at compile time, -//! so [constant folding] might turn the whole block into a simple `println!("7 4 4")`. +//! - The final result could be determined just by looking at the code +//! at compile time, so [constant folding] might turn the whole +//! block into a simple `println!("7 4 4")`. //! -//! The compiler is allowed to perform any combination of these optimizations, as long -//! as the final optimized code, when executed, produces the same results as the one -//! without optimizations. +//! The compiler is allowed to perform any combination of these +//! optimizations, as long as the final optimized code, when executed, +//! produces the same results as the one without optimizations. //! -//! Due to the [concurrency] involved in modern computers, assumptions about -//! the program's execution order are often wrong. Access to global variables -//! can lead to nondeterministic results, **even if** compiler optimizations -//! are disabled, and it is **still possible** to introduce synchronization bugs. +//! Due to the [concurrency] involved in modern computers, assumptions +//! 
about the program's execution order are often wrong. Access to +//! global variables can lead to nondeterministic results, **even if** +//! compiler optimizations are disabled, and it is **still possible** +//! to introduce synchronization bugs. //! //! Note that thanks to Rust's safety guarantees, accessing global (static) //! variables requires `unsafe` code, assuming we don't use any of the @@ -74,7 +78,7 @@ //! Instructions can execute in a different order from the one we define, due to //! various reasons: //! -//! - **Compiler** reordering instructions: if the compiler can issue an +//! - The **compiler** reordering instructions: If the compiler can issue an //! instruction at an earlier point, it will try to do so. For example, it //! might hoist memory loads at the top of a code block, so that the CPU can //! start [prefetching] the values from memory. @@ -83,20 +87,20 @@ //! signal handlers or certain kinds of low-level code. //! Use [compiler fences] to prevent this reordering. //! -//! - **Single processor** executing instructions [out-of-order]: modern CPUs are -//! capable of [superscalar] execution, i.e. multiple instructions might be -//! executing at the same time, even though the machine code describes a -//! sequential process. +//! - A **single processor** executing instructions [out-of-order]: +//! Modern CPUs are capable of [superscalar] execution, +//! i.e. multiple instructions might be executing at the same time, +//! even though the machine code describes a sequential process. //! //! This kind of reordering is handled transparently by the CPU. //! -//! - **Multiprocessor** system, where multiple hardware threads run at the same time. -//! In multi-threaded scenarios, you can use two kinds of primitives to deal -//! with synchronization: -//! - [memory fences] to ensure memory accesses are made visibile to other -//! CPUs in the right order. -//! - [atomic operations] to ensure simultaneous access to the same memory -//! location doesn't lead to undefined behavior. +//! - A **multiprocessor** system executing multiple hardware threads +//! at the same time: In multi-threaded scenarios, you can use two +//! kinds of primitives to deal with synchronization: +//! - [memory fences] to ensure memory accesses are made visibile to +//! other CPUs in the right order. +//! - [atomic operations] to ensure simultaneous access to the same +//! memory location doesn't lead to undefined behavior. //! //! [prefetching]: https://en.wikipedia.org/wiki/Cache_prefetching //! [compiler fences]: crate::sync::atomic::compiler_fence @@ -111,29 +115,49 @@ //! inconvenient to use, which is why the standard library also exposes some //! higher-level synchronization objects. //! -//! These abstractions can be built out of lower-level primitives. For efficiency, -//! the sync objects in the standard library are usually implemented with help -//! from the operating system's kernel, which is able to reschedule the threads -//! while they are blocked on acquiring a lock. +//! These abstractions can be built out of lower-level primitives. +//! For efficiency, the sync objects in the standard library are usually +//! implemented with help from the operating system's kernel, which is +//! able to reschedule the threads while they are blocked on acquiring +//! a lock. +//! +//! The following is an overview of the available synchronization +//! objects: +//! +//! - [`Arc`]: Atomically Reference-Counted pointer, which can be used +//! in multithreaded environments to prolong the lifetime of some +//! 
data until all the threads have finished using it. +//! +//! - [`Barrier`]: Ensures multiple threads will wait for each other +//! to reach a point in the program, before continuing execution all +//! together. +//! +//! - [`Condvar`]: Condition Variable, providing the ability to block +//! a thread while waiting for an event to occur. //! -//! ## Efficiency +//! - [`mpsc`]: Multi-producer, single-consumer queues, used for +//! message-based communication. Can provide a lightweight +//! inter-thread synchronisation mechanism, at the cost of some +//! extra memory. //! -//! Higher-level synchronization mechanisms are usually heavy-weight. -//! While most atomic operations can execute instantaneously, acquiring a -//! [`Mutex`] can involve blocking until another thread releases it. -//! For [`RwLock`], while any number of readers may acquire it without -//! blocking, each writer will have exclusive access. +//! - [`Mutex`]: Mutual Exclusion mechanism, which ensures that at +//! most one thread at a time is able to access some data. //! -//! On the other hand, communication over [channels] can provide a fairly -//! high-level interface without sacrificing performance, at the cost of -//! somewhat more memory. +//! - [`Once`]: Used for thread-safe, one-time initialization of a +//! global variable. //! -//! The more synchronization exists between CPUs, the smaller the performance -//! gains from multithreading will be. +//! - [`RwLock`]: Provides a mutual exclusion mechanism which allows +//! multiple readers at the same time, while allowing only one +//! writer at a time. In some cases, this can be more efficient than +//! a mutex. //! +//! [`Arc`]: crate::sync::Arc +//! [`Barrier`]: crate::sync::Barrier +//! [`Condvar`]: crate::sync::Condvar +//! [`mpsc`]: crate::sync::mpsc //! [`Mutex`]: crate::sync::Mutex +//! [`Once`]: crate::sync::Once //! [`RwLock`]: crate::sync::RwLock -//! [channels]: crate::sync::mpsc #![stable(feature = "rust1", since = "1.0.0")]
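
The documentation assembled by these patches discusses atomic operations only in the abstract. As a minimal illustrative sketch (not part of the patch text itself), the global-counter idea from the module example can be expressed with the standard library's `AtomicUsize`, which makes concurrent access well-defined and needs no `unsafe`:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Unlike the `static mut` globals in the example above, an atomic
// can be read and modified from safe code, and concurrent access
// to it is well-defined.
static COUNTER: AtomicUsize = AtomicUsize::new(0);

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..1000 {
                    // A single atomic read-modify-write: no increment
                    // can be lost to an interleaving thread.
                    COUNTER.fetch_add(1, Ordering::SeqCst);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Always prints 4000, regardless of scheduling.
    println!("{}", COUNTER.load(Ordering::SeqCst));
}
```

Because each `fetch_add` is one indivisible read-modify-write, the final count does not depend on how the threads interleave.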
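
The overview of higher-level synchronization objects can likewise be made concrete. The following sketch, illustrative rather than taken from the patches, combines `Arc` and `Mutex` so that several threads mutate one shared vector:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // The vector is owned by the `Mutex`; the `Arc` lets several
    // threads share ownership of that mutex.
    let data = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..4)
        .map(|i| {
            let data = Arc::clone(&data);
            thread::spawn(move || {
                // `lock` blocks until no other thread holds the lock,
                // then grants exclusive access to the vector.
                data.lock().unwrap().push(i);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // All four values are present once every thread has finished;
    // only their order depends on scheduling.
    println!("{:?}", data.lock().unwrap());
}
```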
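
Finally, the `mpsc` bullet mentions message-based communication. A minimal sketch of a multi-producer channel, again offered only as an illustration of the API the documentation points to, might look like this:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // Multi-producer, single-consumer: the sender can be cloned and
    // moved into several threads, while the receiver stays on this one.
    let (tx, rx) = mpsc::channel();

    for i in 0..4 {
        let tx = tx.clone();
        thread::spawn(move || {
            // `send` transfers ownership of the value to the receiver.
            tx.send(i).unwrap();
        });
    }
    // Drop the original sender so the receiver can observe that all
    // producers are gone and the loop below terminates.
    drop(tx);

    for value in rx {
        println!("got {}", value);
    }
}
```

Compared to locking a shared structure, the channel trades some extra memory for a queue that needs no explicit synchronization at the call sites, which is the trade-off the documentation describes.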