RCU - Misplaced Pages

In computer science , read-copy-update ( RCU ) is a synchronization mechanism that avoids the use of lock primitives while multiple threads concurrently read and update elements that are linked through pointers and that belong to shared data structures (e.g., linked lists , trees , hash tables ).

#916083

58-507: RCU may refer to: Science and technology [ edit ] Read-copy-update , a computer operating system synchronization mechanism Remote concentrator unit in telephony Organocopper complexes (RCu), in reactions of organocopper reagents Organizations [ edit ] Radio Club Uruguayo Rogue Credit Union , a federal credit union in Medford, Oregon Royal College Union ,

116-404: A NULL pointer or a valid pointer to the new structure, but not some mash-up of the two values. Additional properties of rcu_assign_pointer are described later in this article. This procedure demonstrates how new data may be inserted into a linked data structure even though readers are concurrently traversing the data structure before, during, and after the insertion. The second diagram on

174-399: A funnel or serializing tokens can avoid the biggest problem: deadlocks. Alternatives to locking include non-blocking synchronization methods, like lock-free programming techniques and transactional memory . However, such alternative methods often require that the actual lock mechanisms be implemented at a more fundamental level of the operating software. Therefore, they may only relieve

232-431: A Relativistic Program" PSU Tech Report TR-09-04 ( http://www.pdx.edu/sites/www.pdx.edu.computer-science/files/tr0904.pdf ) Locking (computer science) In computer science , a lock or mutex (from mutual exclusion ) is a synchronization primitive that prevents state from being modified or accessed by multiple threads of execution at once. Locks enforce mutual exclusion concurrency control policies, and with

290-604: A closely related technique. RCU is also the topic of one claim in the SCO v. IBM lawsuit . RCU is available in a number of operating systems and was added to the Linux kernel in October 2002. User-level implementations such as liburcu are also available. The implementation of RCU in version 2.6 of the Linux kernel is among the better-known RCU implementations and will be used as an inspiration for

348-473: A context switch, then all preceding RCU read-side critical sections will have completed. Although RCU can be used in many different ways, a very common use of RCU is analogous to reader–writer locking. The following side-by-side code display shows how closely related reader–writer locking and RCU can be. The differences between the two approaches are quite small. Read-side locking moves to rcu_read_lock and rcu_read_unlock , update-side locking moves from

406-461: A field, a field, a record, a data page, or an entire table. Coarse granularity, such as using table locks, tends to give the best performance for a single user, whereas fine granularity, such as record locks, tends to give the best performance for multiple users. Database locks can be used as a means of ensuring transaction synchronicity. i.e. when making transaction processing concurrent (interleaving transactions), using 2-phased locks ensures that

464-608: A first lock blocks another lock, the two locks are called incompatible ; otherwise the locks are compatible . Often, lock types blocking interactions are presented in the technical literature by a Lock compatibility table . The following is an example with the common, major lock types: Comment: In some publications, the table entries are simply marked "compatible" or "incompatible", or respectively "yes" or "no". Lock-based resource protection and thread/process synchronization have many disadvantages: Some concurrency control strategies avoid some or all of these problems. For example,

522-413: A given grace period must complete before the end of that grace period, which constitutes the fundamental guarantee provided by RCU. In addition, the wait-for-readers operation must wait for at least one grace period to elapse. It turns out that this guarantee can be provided with extremely small read-side overheads, in fact, in the limiting case that is actually realized by server-class Linux-kernel builds,

580-430: A linked data structure even though readers are concurrently traversing the data structure before, during, and after the deletion. Given insertion and deletion, a wide variety of data structures can be implemented using RCU. RCU's readers execute within read-side critical sections , which are normally delimited by rcu_read_lock and rcu_read_unlock . Any statement that is not within an RCU read-side critical section

638-741: A number of problems, including management of metadata used in dynamic analysis, managing the lifetime of clustered objects, managing object lifetime in the K42 research operating system, and optimizing software transactional memory implementations. Dragonfly BSD uses a technique similar to RCU that most closely resembles Linux's Sleepable RCU (SRCU) implementation. The ability to wait until all readers are done allows RCU readers to use much lighter-weight synchronization—in some cases, absolutely no synchronization at all. In contrast, in more conventional lock-based schemes, readers must use heavy-weight synchronization in order to prevent an updater from deleting

SECTION 10

#1732869439917

696-444: A reader-writer lock to a simple spinlock, and a synchronize_rcu precedes the kfree . However, there is one potential catch: the read-side and update-side critical sections can now run concurrently. In many cases, this will not be a problem, but it is necessary to check carefully regardless. For example, if multiple independent list updates must be seen as a single atomic update, converting to RCU will require special care. Also,

754-425: A realtime process to miss its scheduling deadline, precipitate priority inversion , or result in high lock contention . However, in this toy RCU implementation, blocking within an RCU read-side critical section is illegal, just as is blocking while holding a pure spinlock. The implementation of synchronize_rcu moves the caller of synchronize_cpu to each CPU, thus blocking until all CPUs have been able to perform

812-424: A reference. A wait-for-readers operation transitions to the third state. Note that this wait-for-readers operation need only wait for pre-existing readers, but not new readers. Element B is now colored green to indicate that readers can no longer be referencing it. Therefore, it is now safe for the updater to free element B , thus transitioning to the fourth and final state. It is important to reiterate that in

870-404: A single process is accessing the protected data, but worse performance when multiple processes are running concurrently. This is because of increased lock contention . The more coarse the lock, the higher the likelihood that the lock will stop an unrelated process from proceeding. Conversely, using a fine granularity (a larger number of locks, each protecting a fairly small amount of data) increases

928-640: A specifically defined "cascade" order.) Some languages do support locks syntactically. An example in C# follows: C# introduced System.Threading.Lock in C# 13 on .NET 9. The code lock(this) can lead to problems if the instance can be accessed publicly. Similar to Java , C# can also synchronize entire methods, by using the MethodImplOptions.Synchronized attribute. Before being introduced to lock granularity, one needs to understand three concepts about locks: There

986-438: A variety of possible methods there exist multiple unique implementations for different applications. Generally, locks are advisory locks , where each thread cooperates by acquiring the lock before accessing the corresponding data. Some systems also implement mandatory locks , where attempting unauthorized access to a locked resource will force an exception in the entity attempting to make the access. The simplest type of lock

1044-426: Is read concurrently with a thread copying in order to do an update , hence the name "read-copy update". The abbreviation "RCU" was one of many contributions by the Linux community. Other names for similar techniques include passive serialization and MP defer by VM/XA programmers and generations by K42 and Tornado programmers. A key property of RCU is that readers can access a data structure even when it

1102-409: Is a binary semaphore . It provides exclusive access to the locked data. Other schemes also provide shared access for reading data. Other widely implemented access modes are exclusive, intend-to-exclude and intend-to-upgrade. Another way to classify locks is by what happens when the lock strategy prevents the progress of a thread. Most locking designs block the execution of the thread requesting

1160-427: Is a specialized technique that works best in situations with mostly reads and few updates but is often less applicable to update-only workloads. For another example, although the fact that RCU readers and updaters may execute concurrently is what enables the lightweight nature of RCU's read-side primitives, some algorithms may not be amenable to read/update concurrency. Despite well over a decade of experience with RCU,

1218-410: Is a tradeoff between decreasing lock overhead and decreasing lock contention when choosing the number of locks in synchronization. An important property of a lock is its granularity . The granularity is a measure of the amount of data the lock is protecting. In general, choosing a coarse granularity (a small number of locks, each protecting a large segment of data) results in less lock overhead when

SECTION 20

#1732869439917

1276-409: Is crucial and is an example of space–time tradeoff , enabling fast operations at the cost of more space. This makes all readers proceed as if there were no synchronization involved, hence they will be fast, but also making updates more difficult. The name comes from the way that RCU is used to update a linked structure in place. A thread wishing to do this uses the following steps: So the structure

1334-418: Is different from Wikidata All article disambiguation pages All disambiguation pages Read-copy-update Whenever a thread is inserting or deleting elements of data structures in shared memory , all readers are guaranteed to see and traverse either the older or the new structure, therefore avoiding inconsistencies (e.g., dereferencing null pointers ). It is used when performance of reads

1392-416: Is in the process of being updated: RCU updaters cannot block readers or force them to retry their accesses. This overview starts by showing how data can be safely inserted into and deleted from linked structures despite concurrent readers. The first diagram on the right depicts a four-state insertion procedure, with time advancing from left to right. The first state shows a global pointer named gptr that

1450-538: Is initially NULL , colored red to indicate that it might be accessed by a reader at any time, thus requiring updaters to take care. Allocating memory for a new structure transitions to the second state. This structure has indeterminate state (indicated by the question marks) but is inaccessible to readers (indicated by the green color). Because the structure is inaccessible to readers, the updater may carry out any desired operation without fear of disrupting concurrent readers. Initializing this new structure transitions to

1508-460: Is left intact in order to allow readers currently referencing element B to traverse the remainder of the list. Readers accessing the link from element A will either obtain a reference to element B or element C , but either way, each reader will see a valid and correctly formatted linked list. Element B is now colored yellow to indicate that while pre-existing readers might still have a reference to element B , new readers have no way to obtain

1566-503: Is often helpful for an entirely different thread to do the reclamation. Reference counting can be used to let the reader perform removal so, even if the same thread performs both the update step (step (2) above) and the reclamation step (step (4) above), it is often helpful to think of them separately. RCU is perhaps the most common non-blocking algorithm for a shared data structure. RCU is completely wait-free for any number of readers. Single-writer implementations RCU are also lock-free for

1624-409: Is required is because of concurrency, where more than one task executes the same logic. For example, consider the following C code: The above example does not guarantee that the task has the lock, since more than one task can be testing the lock at the same time. Since both tasks will detect that the lock is free, both tasks will attempt to set the lock, not knowing that the other task is also setting

1682-413: Is said to be in a quiescent state , and such statements are not permitted to hold references to RCU-protected data structures, nor is the wait-for-readers operation required to wait for threads in quiescent states. Any time period during which each thread resides at least once in a quiescent state is called a grace period . By definition, any RCU read-side critical section in existence at the beginning of

1740-422: Is the great strength of classic RCU in a non-preemptive kernel: read-side overhead is precisely zero, as smp_read_barrier_depends() is an empty macro on all but DEC Alpha CPUs; such memory barriers are not needed on modern CPUs. The ACCESS_ONCE() macro is a volatile cast that generates no additional code in most cases. And there is no way that rcu_read_lock can participate in a deadlock cycle, cause

1798-436: Is the semantics of modern CPUs guarantee that readers will see either the old or the new version of the data structure rather than a partially updated reference. Once a grace period has elapsed, there can no longer be any readers referencing the old version, so it is then safe for the reclamation phase to free ( reclaim ) the data items that made up that old version. Splitting an update into removal and reclamation phases allows

RCU - Misplaced Pages Continue

1856-462: The application level from the details of implementing locks, with the problems listed above still needing to be dealt with beneath the application. In most cases, proper locking depends on the CPU providing a method of atomic instruction stream synchronization (for example, the addition or deletion of an item into a pipeline requires that all contemporaneous operations needing to add or delete other items in

1914-576: The RCU API in the remainder of this article. The core API ( Application Programming Interface ) is quite small: The diagram on the right shows how each API communicates among the reader, updater, and reclaimer. The RCU infrastructure observes the time sequence of rcu_read_lock , rcu_read_unlock , synchronize_rcu , and call_rcu invocations in order to determine when (1) synchronize_rcu invocations may return to their callers and (2) call_rcu callbacks may be invoked. Efficient implementations of

1972-656: The RCU infrastructure make heavy use of batching in order to amortize their overhead over many uses of the corresponding APIs. RCU has extremely simple "toy" implementations that can aid understanding of RCU. This section presents one such "toy" implementation that works in a non-preemptive environment . In the code sample, rcu_assign_pointer and rcu_dereference can be ignored without missing much. However, they are needed in order to suppress harmful compiler optimization and to prevent CPUs from reordering accesses. Note that rcu_read_lock and rcu_read_unlock do nothing. This

2030-722: The alumni association of Royal College Colombo, Sri Lanka Regional Cadet Units of the Australian Army Cadets Regional Coordinating Unit in the Northwest Pacific Action Plan Regional Crime Unit of the Hong Kong Police Force Other uses [ edit ] Rocket City United , an American soccer team RC Unterföhring , a German rugby union club Las Higueras Airport (IATA code), Argentina A remote control unit Topics referred to by

2088-402: The concurrent execution of the transaction turns out equivalent to some serial ordering of the transaction. However, deadlocks become an unfortunate side-effect of locking in databases. Deadlocks are either prevented by pre-determining the locking order between transactions or are detected using waits-for graphs . An alternate to locking for database synchronicity while avoiding deadlocks involves

2146-434: The context switch. Recall that this is a non-preemptive environment and that blocking within an RCU read-side critical section is illegal, which imply that there can be no preemption points within an RCU read-side critical section. Therefore, if a given CPU executes a context switch (to schedule another process), we know that this CPU must have completed all preceding RCU read-side critical sections. Once all CPUs have executed

2204-441: The data structure out from under them. The reason is that lock-based updaters typically update data in place and must therefore exclude readers. In contrast, RCU-based updaters typically take advantage of the fact that writes to single aligned pointers are atomic on modern CPUs, allowing atomic insertion, removal, and replacement of data in a linked structure without disrupting readers. Concurrent RCU readers can then continue accessing

2262-496: The exact extent of its applicability is still a research topic. The technique is covered by U.S. software patent U.S. patent 5,442,758 , issued August 15, 1995, and assigned to Sequent Computer Systems , as well as by U.S. patent 5,608,893 (expired 2009-03-30), U.S. patent 5,727,209 (expired 2010-04-05), U.S. patent 6,219,690 (expired 2009-05-18), and U.S. patent 6,886,162 (expired 2009-05-25). The now-expired US Patent U.S. patent 4,809,168 covers

2320-417: The lock until it is allowed to access the locked resource. With a spinlock , the thread simply waits ("spins") until the lock becomes available. This is efficient if threads are blocked for a short time, because it avoids the overhead of operating system process re-scheduling. It is inefficient if the lock is held for a long time, or if the progress of the thread that is holding the lock depends on preemption of

2378-450: The lock. Dekker's or Peterson's algorithm are possible substitutes if atomic locking operations are not available. Careless use of locks can result in deadlock or livelock . A number of strategies can be used to avoid or recover from deadlocks or livelocks, both at design-time and at run-time . (The most common strategy is to standardize the lock acquisition sequences so that combinations of inter-dependent locks are always acquired in

RCU - Misplaced Pages Continue

2436-400: The locked thread. Locks typically require hardware support for efficient implementation. This support usually takes the form of one or more atomic instructions such as " test-and-set ", " fetch-and-add " or " compare-and-swap ". These instructions allow a single process to test if the lock is free, and if free, acquire the lock in a single atomic operation. Uniprocessor architectures have

2494-406: The modules or at least knowing about their internals. Simon Peyton Jones (an advocate of software transactional memory ) gives the following example of a banking application: design a class Account that allows multiple concurrent clients to deposit or withdraw money to an account, and give an algorithm to transfer money from one account to another. The lock-based solution to the first part of

2552-503: The old versions and can dispense with the atomic read-modify-write instructions, memory barriers, and cache misses that are so expensive on modern SMP computer systems, even in absence of lock contention. The lightweight nature of RCU's read-side primitives provides additional advantages beyond excellent performance, scalability, and real-time response. For example, they provide immunity to most deadlock and livelock conditions. Of course, RCU also has disadvantages. For example, RCU

2610-418: The option of using uninterruptible sequences of instructions—using special instructions or instruction prefixes to disable interrupts temporarily—but this technique does not work for multiprocessor shared-memory machines. Proper support for locks in a multiprocessor environment can require quite complex hardware or software support, with substantial synchronization issues. The reason an atomic operation

2668-404: The overhead of the locks themselves but reduces lock contention. Granular locking where each process must hold multiple locks from a common set of locks can create subtle lock dependencies. This subtlety can increase the chance that a programmer will unknowingly introduce a deadlock . In a database management system , for example, a lock could protect, in order of decreasing granularity, part of

2726-504: The pipe be suspended during the manipulation of the memory content required to add or delete the specific item). Therefore, an application can often be more robust when it recognizes the burdens it places upon an operating system and is capable of graciously recognizing the reporting of impossible demands. One of lock-based programming's biggest problems is that "locks don't compose ": it is hard to combine small, correct lock-based modules into equally correct larger programs without modifying

2784-419: The presence of synchronize_rcu means that the RCU version of delete can now block. If this is a problem, call_rcu could be used like call_rcu (kfree, p) in place of synchronize_rcu . This is especially useful in combination with reference counting. Techniques and mechanisms resembling RCU have been independently invented multiple times: Bauer, R.T., (June 2009), "Operational Verification of

2842-424: The problem is: The second part of the problem is much more complicated. A transfer routine that is correct for sequential programs would be In a concurrent program, this algorithm is incorrect because when one thread is halfway through transfer , another might observe a state where amount has been withdrawn from the first account, but not yet deposited into the other account: money has gone missing from

2900-445: The read-side overhead is exactly zero. RCU's fundamental guarantee may be used by splitting updates into removal and reclamation phases. The removal phase removes references to data items within a data structure (possibly by replacing them with references to new versions of these data items) and can run concurrently with RCU read-side critical sections. The reason that it is safe to run the removal phase concurrently with RCU readers

2958-415: The right depicts a four-state deletion procedure, again with time advancing from left to right. The first state shows a linked list containing elements A , B , and C . All three elements are colored red to indicate that an RCU reader might reference any of them at any time. Using list_del_rcu to remove element B from this list transitions to the second state. Note that the link from element B to C

SECTION 50

#1732869439917

3016-403: The same term [REDACTED] This disambiguation page lists articles associated with the title RCU . If an internal link led you here, you may wish to change the link to point directly to the intended article. Retrieved from " https://en.wikipedia.org/w/index.php?title=RCU&oldid=1032286127 " Category : Disambiguation pages Hidden categories: Short description

3074-483: The second state different readers can see two different versions of the list, either with or without element B . In other words, RCU provides coordination in space (different versions of the list) as well as in time (different states in the deletion procedures). This is in stark contrast with more traditional synchronization primitives such as locking or transactions that coordinate in time, but not in space. This procedure demonstrates how old data may be removed from

3132-462: The system. This problem can only be fixed completely by putting locks on both accounts prior to changing either one, but then the locks have to be placed according to some arbitrary, global ordering to prevent deadlock: This solution gets more complicated when more locks are involved, and the transfer function needs to know about all of the locks, so they cannot be hidden . Programming languages vary in their support for synchronization: A mutex

3190-434: The third state, which shows the initialized values of the structure's fields. Assigning a reference to this new structure to gptr transitions to the fourth and final state. In this state, the structure is accessible to readers, and is therefore colored red. The rcu_assign_pointer primitive is used to carry out this assignment and ensures that the assignment is atomic in the sense that concurrent readers will either see

3248-417: The updater to perform the removal phase immediately, and to defer the reclamation phase until all readers active during the removal phase have completed, in other words, until a grace period has elapsed. So, the typical RCU update sequence goes something like the following: In the above procedure (which matches the earlier diagram), the updater is performing both the removal and the reclamation step, but it

3306-399: The use of totally ordered global timestamps. There are mechanisms employed to manage the actions of multiple concurrent users on a database—the purpose is to prevent lost updates and dirty reads. The two types of locking are pessimistic locking and optimistic locking : Several variations and refinements of these major lock types exist, with respective variations of blocking behavior. If

3364-431: The writer. Some multi-writer implementations of RCU are lock-free. Other multi-writer implementations of RCU serialize writers with a lock. By early 2008, there were almost 2,000 uses of the RCU API within the Linux kernel including the networking protocol stacks and the memory-management system. As of March 2014 , there were more than 9,000 uses. Since 2006, researchers have applied RCU and similar techniques to

#916083