
ACID

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.

In computer science, ACID (atomicity, consistency, isolation, durability) is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps. In the context of databases, a sequence of database operations that satisfies the ACID properties (which can be perceived as a single logical operation on the data) is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction.


In 1983, Andreas Reuter and Theo Härder coined the acronym ACID, building on earlier work by Jim Gray, who had named atomicity, consistency, and durability, but not isolation, when characterizing the transaction concept. These four properties are the major guarantees of the transaction paradigm, which has influenced many aspects of development in database systems.

The results of a database transaction are visible to all nodes simultaneously; that is, once the transaction has been committed, all parties attempting to access the database can see those results at the same time. A good example of the importance of transaction consistency is a database that handles the transfer of money.

Guaranteeing ACID properties in a distributed transaction across a distributed database, where no single node is responsible for all data affecting the transaction, presents additional complications. Network connections might fail, or one node might successfully complete its part of the transaction and then be required to roll back its changes because of a failure on another node. The two-phase commit protocol (not to be confused with two-phase locking) provides atomicity for distributed transactions, ensuring that each participant in the transaction agrees on whether the transaction should be committed or not.
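The two-phase commit round described above can be sketched in a few lines. This is a toy model, not a real protocol implementation; the `Participant` class and its vote flags are illustrative assumptions:

```python
# A toy two-phase commit round. The coordinator first asks every participant
# to prepare (phase one); only if all vote "yes" does it formalize the commit
# (phase two), otherwise everyone aborts. Class and flag names are illustrative.
class Participant:
    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):              # phase one: vote on the outcome
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):       # phase two: apply the coordinator's decision
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # phase one: collect votes
    decision = all(votes)                         # commit only if all prepared
    for p in participants:                        # phase two: broadcast decision
        p.finish(decision)
    return decision

nodes = [Participant("a", True), Participant("b", False)]
print(two_phase_commit(nodes), [n.state for n in nodes])  # → False ['aborted', 'aborted']
```

A single "no" vote in phase one aborts every participant, which is exactly the all-or-nothing guarantee the protocol provides.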

As a consequence, the transaction cannot be observed to be in progress by another database client: at one moment in time it has not yet happened, and at the next it has already occurred in whole (or nothing happened, if the transaction was cancelled in progress). Consistency ensures that a transaction can only bring the database from one consistent state to another, preserving database invariants.

A corresponding index entry is then added (at the 20% mark). Because the backup is already halfway done and the index has already been copied, the backup will be written with the article data present but with the index reference missing. As a result of this inconsistency, the file is considered corrupted. In real life, a database such as Wikipedia's may be edited thousands of times per hour, and references are virtually always spread throughout the file.

Suppose a money transfer requires two operations: writing a debit in one place and a credit in another. If the system crashes or shuts down when one operation has completed but the other has not, and there is nothing in place to correct this, the system can be said to lack transaction consistency. With a money transfer, it is desirable that either the entire transaction completes or none of it completes; both of these outcomes keep the balance in check. Transaction consistency ensures exactly that.

In 1983, Reuter joined the IBM Research Center in San José as a postdoc. In 1985 he was appointed professor at the University of Stuttgart, where he became the founding director of the Institute for Parallel and Distributed High-Performance Computer Systems in 1988. From 1992 until 1996 he was Vice-President for Academic Affairs at the University of Stuttgart. In 1996 he declined an offered position as director at the Max Planck Institute for Computer Science in Saarbrücken. Instead, from 1997 on, he co-founded and developed the private International University in Germany.

If the power gets shut off after element 4 has been written, the battery-backed memory contains the record of commitment for the other three items and ensures that they are written ("flushed") to the disk at the next available opportunity. Consistency, in the realm of distributed database systems, refers to the property of many ACID databases that the results of a database transaction are visible to all nodes simultaneously.

To demonstrate isolation, we assume two transactions, T1 and T2, execute at the same time, each attempting to modify the same data. One of the two must wait until the other completes in order to maintain isolation. Combined, the two transactions comprise four actions. If these operations are performed in order, isolation is maintained.

The battery back-up unit on such a controller's cache memory lets it offer the performance gains of write caching while mitigating the risk of unintended shutdowns. The unit keeps the memory powered even during a shutdown, so that when the computer is powered back up it can quickly complete any writes it has previously committed. With such a controller, the operating system may request four writes (1-2-3-4) in that order, but the controller may carry them out in a different order.

A system can be programmed to detect incomplete transactions when powered on and to undo (or "roll back") the portion of any incomplete transactions that are found. Application consistency, similar to transaction consistency, is applied on a grander scale: instead of having the scope of a single transaction, data must be consistent within the confines of many different transaction streams from one or more applications. An application may be made up of many different types of data, various types of files, and data feeds from other applications.


Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially. Isolation is the main goal of concurrency control; depending on the isolation level used, the effects of an incomplete transaction might not be visible to other transactions.

Durability guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure (e.g., a power outage or crash). This usually means that completed transactions (or their effects) are recorded in non-volatile memory. The following examples further illustrate the ACID properties. In these examples, the database table has two columns, A and B.

An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and depositing it into account B. We would not want to see the amount removed from account A before we are sure it has also been transferred into account B. Performing these operations in an atomic transaction ensures that the database remains in a consistent state; that is, money is neither debited nor credited if either of the two operations fails.
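The transfer can be sketched with SQLite, whose transactions are atomic. The `accounts` schema and the simulated mid-transaction failure below are hypothetical, chosen only to illustrate the all-or-nothing behavior:

```python
import sqlite3

# A sketch of the transfer: either both the debit and the credit happen,
# or neither does. Schema and failure condition are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

def transfer(amount):
    try:
        with conn:  # opens a transaction: commit on success, rollback on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'A'", (amount,))
            if amount > 100:
                raise RuntimeError("simulated failure between debit and credit")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'B'", (amount,))
    except RuntimeError:
        pass  # the debit above was rolled back automatically

transfer(10)    # succeeds: A = 90, B = 10
transfer(999)   # fails mid-transaction; the rollback leaves A = 90, B = 10
print(dict(conn.execute("SELECT name, balance FROM accounts")))  # → {'A': 90, 'B': 10}
```

The failed second transfer leaves both balances untouched, so the invariant (the balances always sum to 100) survives the failure.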

Among other positions, Reuter is an External Scientific Member of the Max Planck Institute for Computer Science (MPII) in Saarbrücken and a member of the Board of Trustees of the Max Planck Institute for Astronomy (MPIA) in Heidelberg. He was elected an ACM Fellow in 2019 "for contributions to database concurrency control and for service to the community".

Data consistency

Data consistency refers to whether the same data kept at different places do or do not match.

Because consistency is checked after each transaction, it is known that A + B = 100 before the transaction begins. If the transaction removes 10 from A successfully, atomicity will be achieved. However, a validation check will show that A + B = 90, which is inconsistent with the rules of the database. The entire transaction must be cancelled and the affected rows rolled back to their pre-transaction state. If there had been other constraints, triggers, or cascades, every single change operation would have been checked in the same way as above before the transaction was committed.

Isolation is maintained, although T2 must wait. Consider what happens if T1 fails halfway through: the database eliminates T1's effects, and T2 sees only valid data. If the transactions are interleaved instead, the actual order of actions differs. Again, consider what happens if T1 fails while modifying B in step 4: by the time T1 fails, T2 has already modified A, and A cannot be restored to the value it had before T1 without leaving the database invalid.

If the record confirming success (item #4) is missing, the operating system concludes that the save operation was unsuccessful, and so it undoes any incomplete steps already taken to save the file (e.g., marking sector 123 free, since it never was properly filled, and removing any record of XYZ from the file directory). This relies on the items being committed to disk in sequential order. Now suppose a caching algorithm determines it would be fastest to write these items to disk in the order 4-3-1-2.

Consistency is a very general term, which demands that the data meet all validation rules. In the previous example, the validation is the requirement that A + B = 100. All validation rules must be checked to ensure consistency. Assume that a transaction attempts to subtract 10 from A without altering B.

If user A is running a transaction that has to read a row of data that user B wants to modify, user B must wait until user A's transaction completes. Two-phase locking is often applied to guarantee full isolation. An alternative to locking is multiversion concurrency control, in which the database provides each reading transaction with the prior, unmodified version of data that is being modified by another active transaction. This allows readers to operate without acquiring locks: writing transactions do not block reading transactions, and readers do not block writers.
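The multiversion idea can be sketched with a toy in-memory store; this is an illustration of the concept, not how any particular DBMS implements it, and all names are assumptions:

```python
import itertools

# A toy multiversion store: every write appends a new (version, value) pair,
# and a reading transaction is pinned to the version counter at its start, so
# writers never block readers and vice versa. All names are illustrative.
class MVStore:
    def __init__(self):
        self._clock = itertools.count(1)
        self._versions = {}            # key -> list of (version, value) pairs

    def write(self, key, value):
        self._versions.setdefault(key, []).append((next(self._clock), value))

    def begin_read(self):
        return next(self._clock)       # snapshot: only versions below this are visible

    def read(self, snapshot, key):
        visible = [v for ver, v in self._versions.get(key, []) if ver < snapshot]
        return visible[-1] if visible else None

store = MVStore()
store.write("row", "old")          # committed before the reader starts
snap = store.begin_read()          # reading transaction begins
store.write("row", "new")          # a concurrent writer does not block the reader
print(store.read(snap, "row"))     # → old  (the prior, unmodified version)
```

The reader pinned to `snap` keeps seeing "old" even after the concurrent write, which is the consistent view described above.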

Further, the file system's free space map will not contain any entry showing that sector 123 is occupied, so later it will likely assign that sector to the next file to be saved, believing it is available. The file system will then have two files both unexpectedly claiming the same sector (known as a cross-linked file). As a result, a write to one of the files will overwrite part of the other, invisibly damaging it.


Reuter was Managing Director of the Heidelberg Institute for Theoretical Studies (HITS gGmbH) from 2010 until 2016. In October 2015, he was appointed Senior Professor at the University of Heidelberg. During his school days he volunteered in the company founded by Konrad Zuse in Bad Hersfeld. After graduating in 1968, he worked as a freelance programmer for companies and public authorities. From 1973 on, he studied computer science at the Technical University of Munich and at the Department of Computer Science of the Technische Universität Darmstadt.

Reuter remained managing director of the Heidelberg Institute for Theoretical Studies until April 2016. In 2007, he accepted the endowed chair for “Dependable Systems”, supported by the Klaus Tschira Foundation, at the University of Kaiserslautern. In 2011 he transferred to Heidelberg University, where he held an endowed chair for “Distributed Systems”, also supported by the Klaus Tschira Foundation. The Technical University of Donetsk (Ukraine) awarded him an honorary doctorate in 1994. Andreas Reuter's research focuses on databases, transaction systems, and parallel and distributed computer systems.

According to Gray and Reuter, the IBM Information Management System supported ACID transactions as early as 1973 (although the acronym was created later). The characteristics of these four properties, as defined by Reuter and Härder, are as follows. Transactions are often composed of multiple statements. Atomicity guarantees that each transaction is treated as a single "unit", which either succeeds completely or fails completely.

Consider a scenario in which an editor creates a new article at the same time a backup is being performed. The backup is made as a simple "file copy" which copies from the beginning to the end of the large file(s) and does not consider data consistency, and at the time of the article edit it is 50% complete. The new article is added to the article space (at the 75% mark).

Reuter completed his studies in Darmstadt with a Diplom in 1978. As a researcher he received his doctorate (Dr.-Ing.) under Theo Härder and Hartmut Wedekind in 1981. He then worked as an assistant professor at the University of Kaiserslautern from 1981 to 1983.

The controller may decide the quickest way to write them is 4-3-1-2. The controller essentially lies to the operating system and reports that the writes have been completed in order (a lie that improves performance at the expense of data corruption if power is lost), and the battery backup hedges against the risk of data corruption by giving the controller a way to silently fix any damage that could occur as a result.

A guarantee of atomicity prevents updates to the database from occurring only partially, which can cause greater problems than rejecting the whole series outright. In other words, atomicity means indivisibility and irreducibility. Alternatively, we may say that a logical transaction may be composed of several physical transactions; unless and until all component physical transactions are executed, the logical transaction will not have occurred.

Any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof. This prevents database corruption by an illegal transaction. An example of a database invariant is referential integrity, which guarantees the primary key–foreign key relationship. Transactions are often executed concurrently (e.g., multiple transactions reading from and writing to a table at the same time).

If the data structures cease to reference each other properly, the database can be said to be corrupted. The importance of point-in-time consistency can be illustrated by what would happen if a backup were made without it. Assume Wikipedia's database is one huge file which has an important index located 20% of the way through and which saves article data at the 75% mark.

In the given Wikipedia example, point-in-time consistency would ensure that the backup was written without the added article at the 75% mark, so that the article data would be consistent with the index data previously written. Point-in-time consistency is also relevant to computer disk subsystems.


Returning to the example: when user A's transaction requests data that user B is modifying, the database provides A with the version of that data that existed when user B started his transaction. User A gets a consistent view of the database even if other users are changing data. One implementation, namely snapshot isolation, relaxes the isolation property.

Operating systems and file systems are designed with the expectation that the computer system they are running on could lose power, crash, fail, or otherwise cease operating at any time. When properly designed, they ensure that data will not be unrecoverably corrupted if the power is lost. They do this by ensuring that data is written to the hard disk in a certain order, and they rely on that order to detect and recover from unexpected shutdowns.

Together with the Turing Award laureate James "Jim" Gray, Reuter published the book "Transaction Processing: Concepts and Techniques" in 1992, which became a standard reference work for researchers and developers around the world and was translated into Chinese and Japanese, among other languages.

These references can number into the millions, billions, or more. A sequential "copy" backup would literally contain so many small corruptions that it would be completely unusable without a lengthy repair process, which could provide no guarantee as to the completeness of what was recovered. A backup process that properly accounts for data consistency instead ensures that the backup is a snapshot of how the entire database looked at a single moment.

A disk caching subsystem that ensures point-in-time consistency guarantees that, in the event of an unexpected shutdown, the four elements would be written in one of only five possible ways: completely (1-2-3-4), partially (1, 1-2, or 1-2-3), or not at all. High-end hardware disk controllers of the type found in servers include a small battery back-up unit on their cache memory.

In write-ahead logging, durability is guaranteed by writing the prospective change to a persistent log before changing the database; this allows the database to return to a consistent state in the event of a crash. In shadowing, updates are applied to a partial copy of the database, and the new copy is activated when the transaction commits. Many databases rely upon locking to provide ACID capabilities.
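The write-ahead-logging idea described above can be sketched under some simplifying assumptions: a JSON-lines file stands in for the persistent log, a dictionary stands in for the database, and all names are illustrative:

```python
import json
import os
import tempfile

# A minimal write-ahead-logging sketch: the prospective change is appended to a
# persistent log (and flushed) before the in-memory "database" is touched, so a
# crash between the two steps is repaired by replaying the log on startup.
class WalStore:
    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._recover()

    def _recover(self):
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:                  # replay every logged change
                    rec = json.loads(line)
                    self.data[rec["key"]] = rec["value"]

    def put(self, key, value):
        with open(self.log_path, "a") as f:     # step 1: log the change...
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())                # ...and force it to disk
        self.data[key] = value                  # step 2: only then update the database

path = os.path.join(tempfile.mkdtemp(), "wal.log")
WalStore(path).put("balance", 90)
recovered = WalStore(path)                      # simulate a restart after a crash
print(recovered.data["balance"])                # → 90
```

Because the log entry hits disk before the database changes, a crash at any point leaves either no trace of the change or enough information to redo it.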

The online encyclopedia Wikipedia needs to be operational around the clock, but it must also be backed up regularly to protect against disaster. Portions of Wikipedia are constantly being updated, every minute of every day, while Wikipedia's database is stored on servers in the form of one or several very large files which require minutes or hours to back up. These large files, as with any database, contain numerous data structures which reference each other by location. For example, some structures are indexes which permit the database subsystem to quickly find search results.

Suppose the writes are carried out in the order 4-3-1-2 and the power is shut down after 4 is written, before 3, 1, and 2, so those writes never occur. When the computer is turned back on, the file system shows that it contains a file named XYZ located in sector 123, but this sector really does not contain the file. (Instead, the sector will contain garbage, zeroes, or a random portion of some old file, and that is what will show if the file is opened.)

On the other hand, rigorously writing data to disk in the order that maximizes data integrity also impacts performance. A process of write caching is used to consolidate and re-sequence write operations so that they can be done faster, by minimizing the time spent moving disk heads. Data consistency concerns arise when write caching changes the sequence in which writes are carried out, because it creates the possibility of an unexpected shutdown that violates the operating system's expectation that all writes will be committed sequentially.

For example, in order to save a typical document or picture file, an operating system might write a sequence of records to the disk in a fixed order. It relies on the assumption that if it sees that item #1 is present (saying the file is about to be saved) but that item #4 (confirming success) is missing, the save operation was unsuccessful.
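This recovery rule can be sketched with hypothetical "begin" and "commit" marker records standing in for item #1 and item #4 (the record names and log layout are assumptions, not taken from any real file system):

```python
# A sketch of the recovery rule above: on startup, any save that has a begin
# marker but no matching commit marker never finished, so its partial effects
# must be rolled back. Marker names and log layout are illustrative.
def saves_to_roll_back(records):
    begun = {r["file"] for r in records if r["type"] == "begin"}
    committed = {r["file"] for r in records if r["type"] == "commit"}
    return sorted(begun - committed)    # files whose saves must be undone

log = [
    {"type": "begin", "file": "XYZ"},   # item #1: the file is about to be saved
    # items #2 and #3 (data and directory writes) would appear here;
    # item #4 (the commit marker) is missing: power was lost first
]
print(saves_to_roll_back(log))  # → ['XYZ']
```

Had the commit marker made it to disk, the set difference would be empty and nothing would need to be undone.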


At the private International University in Germany, Reuter worked as dean and vice-president until 2004. On January 1, 1998 he was appointed scientific and managing director of the EML European Media Laboratory GmbH, which Klaus Tschira had founded in 1997. Together with Klaus Tschira, he was essentially involved in building up the company and its affiliate, EML Research gGmbH (from 2003 on). In 2010, EML Research became the Heidelberg Institute for Theoretical Studies.

Point-in-time consistency is an important property of backup files and a critical objective of software that creates backups. It is also relevant to the design of disk memory systems, specifically regarding what happens when they are unexpectedly shut down. As a backup example, consider a website with a database, such as the online encyclopedia Wikipedia.

Similar issues may arise with other constraints. We may have required the data types of both A and B to be integers: if we were then to enter, say, the value 13.5 for A, the transaction would be cancelled, or the system might raise an alert in the form of a trigger (if/when a trigger has been written to that effect). Another example is an integrity constraint that would not allow us to delete a row in one table whose primary key is referred to by at least one foreign key in another table.

If any of the statements constituting a transaction fails to complete, the entire transaction fails and the database is left unchanged. An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes. A guarantee of atomicity prevents updates to the database from occurring only partially, which can cause greater problems than rejecting the whole series outright.

Each participant agrees on whether the transaction should be committed or not. Briefly, in the first phase one node (the coordinator) interrogates the other nodes (the participants), and only when all reply that they are prepared does the coordinator, in the second phase, formalize the transaction.

Andreas Reuter

Andreas Reuter (born October 31, 1949) is a German computer science professor and research manager. His research focuses on databases, transaction systems, and parallel and distributed computer systems. Reuter has been scientific and executive director of EML European Media Laboratory GmbH and gGmbH since 1998 and was Managing Director of the Heidelberg Institute for Theoretical Studies.

Locking means that the transaction marks the data it accesses so that the DBMS knows not to allow other transactions to modify it until the first transaction succeeds or fails. The lock must always be acquired before processing data, including data that is read but not modified. Non-trivial transactions typically require a large number of locks, resulting in substantial overhead as well as blocking other transactions.
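Per-row locking can be sketched with a toy lock manager; the class, the row names, and the two-step "transactions" are illustrative assumptions, not a real DBMS mechanism:

```python
import threading

# A minimal sketch of lock-based isolation: every transaction must acquire a
# per-row lock before touching the row, so two transactions that target the
# same row are forced to run one after the other. All names are illustrative.
class LockManager:
    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def lock_for(self, row):
        with self._guard:  # create at most one lock object per row
            return self._locks.setdefault(row, threading.Lock())

mgr = LockManager()
trace = []

def txn(name, row):
    with mgr.lock_for(row):      # blocks until the other transaction is done
        trace.append(name + " begin")
        trace.append(name + " commit")

t1 = threading.Thread(target=txn, args=("T1", "row"))
t2 = threading.Thread(target=txn, args=("T2", "row"))
t1.start(); t2.start()
t1.join(); t2.join()
print(trace)  # the two transactions never interleave on the same row
```

Whichever thread wins the lock runs to its commit before the other begins, so the trace always shows each transaction's begin immediately followed by its commit.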

Along with Theo Härder, Reuter developed a definition of the transactional processing model in (distributed) databases, a model which to this day is often cited by its acronym ACID (atomicity, consistency, isolation, durability). In addition to his research, he has conducted numerous consulting projects and held lectures on many topics in both the university and industrial sectors. He is involved in numerous advisory boards in academia and industry.

At that point the user is told the transaction was a success; however, the changes are still queued in the disk buffer, waiting to be committed to disk. The power fails and the changes are lost, but the user understandably assumes that the changes persist. Processing a transaction often requires a sequence of operations that is subject to failure for a number of reasons. For instance, the system may have no room left on its disk drives, or it may have used up its allocated CPU time. There are two popular families of techniques: write-ahead logging and shadow paging. In both cases, locks must be acquired on all information to be updated and, depending on the level of isolation, possibly on all data that may be read as well.

An integrity constraint requires that the value in A and the value in B must sum to 100. Atomicity is the guarantee that the series of database operations in an atomic transaction will either all occur (a successful operation) or none will occur (an unsuccessful operation). The series of operations cannot be separated, with only some of them being executed, which makes the series of operations "indivisible". The following SQL code creates a table as described above:
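The CREATE TABLE statement itself is not preserved in this snapshot. A minimal reconstruction is shown below via Python's built-in sqlite3 module, so the constraint can also be exercised; the table name `acidtest` is an assumption:

```python
import sqlite3

# Reconstruction of the table described above: integer columns A and B with
# the invariant A + B = 100 enforced by a CHECK constraint. The table name
# "acidtest" is an assumption, not taken from the article.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acidtest (A INTEGER, B INTEGER, CHECK (A + B = 100))")
conn.execute("INSERT INTO acidtest (A, B) VALUES (60, 40)")      # satisfies the invariant
try:
    conn.execute("INSERT INTO acidtest (A, B) VALUES (60, 50)")  # violates A + B = 100
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The second insert is rejected by the database itself, which is the consistency check the examples in this article rely on.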

A cannot be restored to the value it had before T1 without leaving an invalid database. This is known as a write-write contention, because two transactions attempted to write to the same data field. In a typical system, the problem would be resolved by reverting to the last known good state, canceling the failed transaction T1, and restarting the interrupted transaction T2 from that good state. Consider a transaction that transfers 10 from A to B: first it removes 10 from A, then it adds 10 to B.
