Misplaced Pages

InnoDB

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.

A database engine (or storage engine) is the underlying software component that a database management system (DBMS) uses to create, read, update and delete (CRUD) data from a database. Most database management systems include their own application programming interface (API) that allows the user to interact with their underlying engine without going through the user interface of the DBMS.


InnoDB is a storage engine for the database management systems MySQL and MariaDB. Since the release of MySQL 5.5.5 in 2010, it has been MySQL's default table type, replacing MyISAM. It provides the standard ACID-compliant transaction features, along with foreign key support (declarative referential integrity). It is included as standard in most binaries distributed by MySQL AB,

A backup rotation scheme, which is a system of backing up data to computer media that limits the number of backups of different dates retained separately, by appropriate re-use of the data storage media by overwriting of backups no longer needed. The scheme determines how and when each piece of removable storage is used for a backup operation and how long it is retained once it has backup data stored on it. The 3-2-1 rule can aid in

A disk array (possibly connected to a SAN) is an example of an online backup. This type of storage is convenient and speedy, but is vulnerable to being deleted or overwritten, either by accident, by malevolent action, or in the wake of a data-deleting virus payload. Nearline storage is typically less accessible and less expensive than online storage, but still useful for backup data storage. A mechanical device

A linear address space where every bit of data has a unique address. In practice, only a very small percentage of addresses are kept as initial reference points, which also require storage. Most data is accessed instead by indirection using displacement calculations (distance in bits from the reference points) and data structures which define access paths (using pointers) to all needed data in an effective manner, optimized for

A balance between accessibility, security and cost. These media management methods are not mutually exclusive and are frequently combined to meet the user's needs. Using on-line disks for staging data before it is sent to a near-line tape library is a common example. Online backup storage is typically the most accessible type of data storage, and can begin a restore in milliseconds. An internal hard disk or

A cable. Because the media are not accessible via any computer except during limited periods in which they are written or read back, they are largely immune to on-line backup failure modes. Access time varies depending on whether the media are on-site or off-site. Backup media may be sent to an off-site vault to protect against a disaster or other site-specific problem. The vault can be as simple as

A computer bus, which is usually a volatile storage component. Computer memory communicates data to and from external storage, typically through standard storage interfaces or networks (e.g., Fibre Channel, iSCSI). A storage array, a common external storage unit, typically has a storage hierarchy of its own. A fast cache, typically consisting of volatile and fast DRAM, is connected (via standard interfaces) to drives. These drives may have different speeds, like flash drives and non-volatile magnetic disk drives. Speed and price are generally correlated. The drives may be connected to magnetic tapes, on which

A computer system or other complex configuration such as a computer cluster, Active Directory server, or database server. A backup system contains at least one copy of all data considered worth saving. The data storage requirements can be large. An information repository model may be used to provide structure to this storage. There are different types of data storage devices used for copying backups of data that

A corrupted file that is unusable. This is also the case across interrelated files, as may be found in a conventional database or in applications such as Microsoft Exchange Server. The term fuzzy backup can be used to describe a backup of live data that looks like it ran correctly, but does not represent the state of the data at a single point in time. Backup options for data files that cannot be or are not quiesced include: Not all information stored on

A data security risk if they are lost or stolen. Encrypting the data on these media can mitigate this problem; however, encryption is a CPU-intensive process that can slow down backup speeds, and the security of the encrypted backups is only as effective as the security of the key management policy. When there are many more computers to be backed up than there are destination storage devices,

A database is stored in the form of bits, laid out into data structures on storage hardware. These data structures are designed for efficient reads and writes to and from the storage hardware. Typically the storage hardware itself is designed to meet the requirements of various systems, including databases, that extensively utilize storage. An operating DBMS always utilizes several storage types simultaneously. These different storage types, such as flash memory and external disk storage, each require different data layout methods. In principle, database storage can be viewed as


A fault of the drive typically just halts the spinning. Optical media are modular; the storage controller is not tied to the media itself, as it is with hard drives or flash storage (see flash memory controller), allowing the media to be removed and accessed through a different drive. However, recordable media may degrade earlier under long-term exposure to light. Some optical storage systems allow for cataloged data backups without human contact with

A layer of data protection. However, the users must trust the provider to maintain the privacy and integrity of their data, with confidentiality enhanced by the use of encryption. Because speed and availability are limited by a user's online connection, users with large amounts of data may need to use cloud seeding and large-scale recovery. Various methods can be used to manage backup media, striking

A limited period of time, so an offsite copy still remains as the ideal choice. Because there is no perfect storage, many backup experts recommend maintaining a second copy on a local physical device, even if the data is also backed up offsite. An unstructured repository may simply be a stack of tapes, DVD-Rs or external HDDs with minimal information about what was backed up and when. This method

A query. In large databases, this can reduce query time/cost by orders of magnitude. The simplest form of index is a sorted list of values that can be searched using a binary search with an adjacent reference to the location of the entry, analogous to the index in the back of a book. The same data can have multiple indexes (an employee database could be indexed by last name and hire date). Indexes affect performance, but not results. Database designers can add or remove indexes without changing application logic, reducing maintenance costs as
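The sorted-list index described above can be sketched in Python. This is only an illustration under stated assumptions: the employee rows, the `build_index` and `lookup` helpers, and the use of the standard `bisect` module are hypothetical choices, not how any particular storage engine implements its indexes.

```python
import bisect

# Hypothetical employee table: each row is (last_name, hire_year).
rows = [
    ("Okafor", 2019),
    ("Adams", 2021),
    ("Silva", 2015),
    ("Chen", 2021),
]

def build_index(rows, column):
    """Return a sorted list of (value, row_position) pairs for one column."""
    return sorted((row[column], pos) for pos, row in enumerate(rows))

def lookup(index, value):
    """Binary-search the index and return the positions of matching rows."""
    # (value,) sorts before every (value, position) pair, so bisect_left
    # lands on the first entry whose key is >= value.
    start = bisect.bisect_left(index, (value,))
    matches = []
    for key, pos in index[start:]:
        if key != value:
            break
        matches.append(pos)
    return matches

# The same data carries two independent indexes.
by_name = build_index(rows, 0)
by_year = build_index(rows, 1)
```

Here `lookup(by_name, "Chen")` finds the row without scanning the whole table, and dropping either index changes only how fast a lookup runs, not what it returns.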

A record of an "item" in stock with all its respective "order" records. The decision of whether to cluster certain objects or not depends on the objects' utilization statistics, object sizes, cache sizes, storage types, etc. Indexing is a technique some storage engines use for improving database performance. The many types of indexes share the common property that they reduce the need to examine every entry when running

A shock-absorbing case around the hard disk, and claim a range of higher drop specifications. Over a period of years the stability of hard disk backups is shorter than that of tape backups. External hard disks can be connected via local interfaces like SCSI, USB, FireWire, or eSATA, or via longer-distance technologies like Ethernet, iSCSI, or Fibre Channel. Some disk-based backup systems, via Virtual Tape Libraries or otherwise, support data deduplication, which can reduce

A standard configuration to many systems rather than as a tool for making ongoing backups of diverse systems. An incremental backup stores data changed since a reference point in time. Duplicate copies of unchanged data are not copied. Typically a full backup of all files is made once or at infrequent intervals, serving as the reference point for an incremental repository. Subsequently, a number of incremental backups are made after successive time periods. Restores begin with

A system administrator's home office or as sophisticated as a disaster-hardened, temperature-controlled, high-security bunker with facilities for backup media storage. A data replica can be off-site but also on-line (e.g., an off-site RAID mirror). A backup site or disaster recovery center is used to store data that can enable computer systems and networks to be restored and properly configured in

Is already in secondary storage onto archive files. There are also different ways these devices can be arranged to provide geographic dispersion, data security, and portability. Data is selected, extracted, and manipulated for storage. The process can include methods for dealing with live data, including open files, as well as compression, encryption, and de-duplication. Additional techniques apply to enterprise client-server backup. Backup schemes may include dry runs that validate

Is an appended ".bak" extension to the file name. A reverse incremental backup method stores a recent archive file "mirror" of the source data and a series of differences between the "mirror" in its current state and its previous states. A reverse incremental backup method starts with a non-image full backup. After the full backup is performed, the system periodically synchronizes the full backup with


Is frequently used interchangeably with "database server" or "database management system". A "database instance" refers to the processes and memory structures of the running database engine. Many modern DBMSs support multiple storage engines within the same database. For example, MySQL supports InnoDB as well as MyISAM. Some storage engines are transactional. Additional engine types include: Information in

Is selected upon DBMS development to best meet the operations needed for the types of data it contains. The type of data structure selected for a certain task typically also takes into consideration the type of storage it resides in (e.g., speed of access, minimal size of storage chunk accessed, etc.). In some DBMSs database administrators have the flexibility to select among options of data structures to contain user data for performance reasons. Sometimes

Is stored in discrete units, known as files. These files are organized into filesystems. Deciding what to back up at any given time involves tradeoffs. By backing up too much redundant data, the information repository will fill up too quickly. Backing up an insufficient amount of data can eventually lead to the loss of critical information. Files that are actively being updated present a challenge to back up. One way to back up live data

Is the IBM 3592 (also referred to as the TS11xx series). The Oracle StorageTek T10000 was discontinued in 2016. The use of hard disk storage has increased over time as it has become progressively cheaper. Hard disks are usually easy to use, widely available, and can be accessed quickly. However, hard disks are close-tolerance mechanical devices and may be more easily damaged than tapes, especially while being transported. In

Is the easiest to implement, but unlikely to achieve a high level of recoverability as it lacks automation. A repository using this backup method contains complete source data copies taken at one or more specific points in time. Copying system images, this method is frequently used by computer technicians to record known good configurations. However, imaging is generally more useful as a way of deploying

Is to temporarily quiesce them (e.g., close all files), take a "snapshot", and then resume live operations. At this point the snapshot can be backed up through normal methods. A snapshot is an instantaneous function of some filesystems that presents a copy of the filesystem as if it were frozen at a specific point in time, often by a copy-on-write mechanism. Snapshotting a file while it is being changed results in

Is usually used to move media units from storage into a drive where the data can be read or written. Generally it has safety properties similar to on-line storage. An example is a tape library with restore times ranging from seconds to a few minutes. Off-line storage requires some direct action to provide access to the storage media: for example, inserting a tape into a tape drive or plugging in

The ability to use a single storage device with several simultaneous backups can be useful. However, cramming the scheduled backup window via "multiplexed backup" is only used for tape destinations. The process of rearranging the sets of backups in an archive file is known as refactoring. For example, if a backup system uses a single tape each day to store the incremental backups for all the protected computers, restoring one of

The accumulated changes in data) increases, so does the time to perform the differential backup. Restoring an entire system requires starting from the most recent full backup and then applying just the last differential backup. A differential backup copies files that have been created or changed since the last full backup, regardless of whether any other differential backups have been made since, whereas an incremental backup copies files that have been created or changed since
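The distinction between the two methods comes down to which reference time is used when selecting files: a differential backup compares against the last full backup, while an incremental backup compares against the most recent backup of any type. A minimal Python sketch, assuming modification timestamps are the only change signal and using a hypothetical `files_to_copy` helper with made-up timestamps:

```python
def files_to_copy(mtimes, last_full, last_backup, mode):
    """Return the paths modified after the reference time for the given mode.

    mtimes      -- dict mapping path to last-modification timestamp
    last_full   -- timestamp of the most recent full backup
    last_backup -- timestamp of the most recent backup of any type
    mode        -- "differential" or "incremental"
    """
    if mode == "differential":
        reference = last_full      # always measured from the last full backup
    elif mode == "incremental":
        reference = last_backup    # measured from the latest backup of any kind
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(path for path, mtime in mtimes.items() if mtime > reference)

# Hypothetical state: full backup at t=100, an incremental at t=200.
mtimes = {"a.txt": 50, "b.txt": 150, "c.txt": 250}
differential = files_to_copy(mtimes, last_full=100, last_backup=200, mode="differential")
incremental = files_to_copy(mtimes, last_full=100, last_backup=200, mode="incremental")
```

The differential set keeps growing until the next full backup, which is why it takes longer to produce over time but needs only two archives to restore.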

The amount of disk storage capacity consumed by daily and weekly backup data. Optical storage uses lasers to store and retrieve data. Recordable CDs, DVDs, and Blu-ray Discs are commonly used with personal computers and are generally cheap. The capacities and speeds of these discs have typically been lower than hard disks or tapes. Advances in optical media may shrink that gap in the future. Potential future data losses caused by gradual media degradation can be predicted by measuring


The backup process. It states that there should be at least 3 copies of the data, stored on 2 different types of storage media, and one copy should be kept offsite, in a remote location (this can include cloud storage). 2 or more different media should be used to eliminate data loss due to similar reasons (for example, optical discs may tolerate being underwater while LTO tapes may not, and SSDs cannot fail due to head crashes or damaged spindle motors since they do not have any moving parts, unlike hard drives). An offsite copy protects against fire, theft of physical media (such as tapes or discs) and natural disasters like floods and earthquakes. Physically protected hard drives are an alternative to an offsite copy, but they have limitations like only being able to resist fire for
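The three conditions of the 3-2-1 rule can be checked mechanically. The sketch below is only an illustration: the `BackupCopy` record and the `satisfies_3_2_1` function are hypothetical names, and the medium labels are free-form strings chosen for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BackupCopy:
    medium: str    # e.g. "disk", "tape", "cloud" (labels are illustrative)
    offsite: bool  # True if the copy is kept at a remote location

def satisfies_3_2_1(copies):
    """Check for at least 3 copies, on at least 2 media types, with 1 offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({copy.medium for copy in copies}) >= 2
    has_offsite = any(copy.offsite for copy in copies)
    return enough_copies and enough_media and has_offsite

good = [BackupCopy("disk", False), BackupCopy("disk", False), BackupCopy("tape", True)]
all_local = [BackupCopy("disk", False), BackupCopy("disk", False), BackupCopy("tape", False)]
```

With these inputs, `good` passes the check, while `all_local` fails it because no copy is kept offsite.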

The computer is stored in files. Accurately recovering a complete system from scratch requires keeping track of this non-file data too. It is frequently useful or required to manipulate the data being backed up to optimize the backup process. These manipulations can improve backup speed, restore speed, data security, and media usage, and/or reduce bandwidth requirements. Out-of-date data can be automatically deleted, but for personal backup applications (as opposed to enterprise client-server backup applications, where automated data "grooming" can be customized) the deletion can at most be globally delayed or be disabled. Various schemes can be employed to shrink

The computers could require many tapes. Refactoring could be used to consolidate all the backups for a single computer onto a single tape, creating a "synthetic full backup". This is especially useful for backup systems that do incrementals-forever-style backups. Sometimes backups are copied to a staging disk before being copied to tape. This process is sometimes referred to as D2D2T, an acronym for disk-to-disk-to-tape. It can be useful if there

The consistency of live data, protecting self-consistent files but requiring applications "be quiesced and made ready for backup." Near-CDP is more practicable for ordinary personal backup applications, as opposed to true CDP, which must be run in conjunction with a virtual machine or equivalent and is therefore generally used in enterprise client-server backups. Software may create copies of individual files such as written documents, multimedia projects, or user preferences, to prevent failed write events caused by power outages, operating system crashes, or exhausted disk space, from causing data loss. A common implementation

The data frozen at a particular point in time. Near-CDP (except for Apple Time Machine) intent-logs every change on the host system, often by saving byte or block-level differences rather than file-level differences. This backup method differs from simple disk mirroring in that it enables a roll-back of the log and thus a restoration of old images of data. Intent-logging allows precautions for

The data has to be copied onto an archive file data storage medium. The medium used is also referred to as the type of backup destination. Magnetic tape was for a long time the most commonly used medium for bulk data storage, backup, archiving, and interchange. It was previously a less expensive option, but this is no longer the case for smaller amounts of data. Tape is a sequential access medium, so

The data structures have selectable parameters to tune the database performance. Databases may store data in many data structure types. Common examples are the following: In contrast to conventional row-orientation, relational databases can also be column-oriented or correlational in the way they store data in any particular structure. In general, substantial performance improvement is gained if different types of database objects that are usually utilized together are laid in storage in proximity, being "clustered". This usually allows related objects to be retrieved from storage in a minimum number of input operations (each sometimes substantially time-consuming). Even for in-memory databases, clustering provides a performance advantage due to common utilization of large caches for input-output operations in memory, with similar resulting behavior. For example, it may be beneficial to cluster

The data. This allows restoration of data to any point in time and is the most comprehensive and advanced data protection. Near-CDP backup applications (often marketed as "CDP") automatically take incremental backups at a specific interval, for example every 15 minutes, one hour, or 24 hours. They can therefore only allow restores to an interval boundary. Near-CDP backup applications use journaling and are typically based on periodic "snapshots", read-only copies of

The database grows and database usage evolves. Indexes can speed up data access, but they consume space in the database, and must be updated each time the data is altered. Indexes therefore can speed data access but slow data maintenance. These two properties determine whether a given index is worth the cost.

Backup

In information technology, a backup, or data backup, is a copy of computer data taken and stored elsewhere so that it may be used to restore

The discs, allowing for longer data integrity. A French study in 2008 indicated that the lifespan of typically sold CD-Rs was 2–10 years, but one manufacturer later estimated the longevity of its CD-Rs with a gold-sputtered layer to be as high as 100 years. Sony's proprietary Optical Disc Archive could, as of 2016, reach a read rate of 250 MB/s. Solid-state drives (SSDs) use integrated circuit assemblies to store data. Flash memory, thumb drives, USB flash drives, CompactFlash, SmartMedia, Memory Sticks, and Secure Digital card devices are relatively expensive for their low capacity, but convenient for backing up relatively low data volumes. A solid-state drive does not contain any movable parts, making it less susceptible to physical damage, and can have huge throughput of around 500 Mbit/s up to 6 Gbit/s. Available SSDs have become more capacious and cheaper. Flash memory backups are stable for fewer years than hard disk backups. Remote backup services or cloud backups involve service providers storing data offsite. This has been used to protect against events such as fires, floods, or earthquakes which could destroy locally stored backups. Cloud-based backup (through services such as Google Drive and Microsoft OneDrive) provides


The event of a disaster. Some organisations have their own data recovery centres, while others contract this out to a third party. Due to high costs, backing up is rarely considered the preferred method of moving data to a DR site. A more typical way would be remote disk mirroring, which keeps the DR data as up to date as possible. A backup operation starts with selecting and extracting coherent units of data. Most data on modern computer systems

The exception being some OEM versions. InnoDB became a product of Oracle Corporation after its acquisition of the Finland-based company Innobase in October 2005. The software is dual licensed; it is distributed under the GNU General Public License, but can also be licensed to parties wishing to combine InnoDB in proprietary software. InnoDB supports:

Database engine

The term "database engine"

The last full backup and then apply the incrementals. Some backup systems can create a synthetic full backup from a series of incrementals, thus providing the equivalent of frequently doing a full backup. When done to modify a single archive file, this speeds restores of recent versions of files. Continuous Data Protection (CDP) refers to a backup that instantly saves a copy of every change made to

The least active parts of a large database may reside. This may also be where backups are located. A data structure is an abstract construct that embeds data in a well-defined manner. An efficient data structure allows manipulation of the data in efficient ways. The data manipulation may include data insertion, deletion, updating and retrieval in various modes. A certain data structure type may be very effective in certain operations, and very ineffective in others. A data structure type

The live copy, while storing the data necessary to reconstruct older versions. This can be done either using hard links, as Apple Time Machine does, or using binary diffs. A differential backup saves only the data that has changed since the last full backup. This means a maximum of two backups from the repository are used to restore the data. However, as time from the last full backup (and thus

The mid-2000s, several drive manufacturers began to produce portable drives employing ramp loading and accelerometer technology (sometimes termed a "shock sensor"), and by 2010 the industry average in drop tests for drives with that technology showed drives remaining intact and working after a 36-inch non-operating drop onto industrial carpeting. Some manufacturers also offer "ruggedized" portable hard drives, which include

The most recent backup of any type (full or incremental). Changes in files may be detected through a more recent date/time of last modification file attribute, and/or changes in file size. Other variations of incremental backup include multi-level incrementals and block-level incrementals that compare parts of files instead of just entire files. Regardless of the repository model that is used,

The needed data access operations. A database, while in operation, resides simultaneously in several types of storage, forming a storage hierarchy. Inside a contemporary computer hosting a DBMS, most of the "database" part resides, partially replicated, in volatile storage. Data that are actively being processed and manipulated reside inside the processor, possibly in the processor's caches. These data are read from and written to memory, typically through

The original after a data loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup". Backups can be used to recover data after its loss from data deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of IT disaster recovery; however, not all backup systems are able to reconstitute

The rate of continuously writing or reading data can be very fast. While tape media itself has a low cost per space, tape drives are typically dozens of times as expensive as hard disk drives and optical drives. Many tape formats have been proprietary or specific to certain markets like mainframes or a particular brand of personal computer. By 2014 LTO had become the primary tape technology. The other remaining viable "super" format


The rate of correctable minor data errors, of which too many in succession increase the risk of uncorrectable sectors. Support for error scanning varies among optical drive vendors. Many optical disc formats are WORM type, which makes them useful for archival purposes since the data cannot be changed. Moreover, optical discs are not vulnerable to head crashes, magnetism, imminent water ingress or power surges; and,

The reliability of the data being backed up. There are limitations and human factors involved in any backup scheme. A backup strategy requires an information repository, "a secondary storage space for data" that aggregates backups of data "sources". The repository could be as simple as a list of all backup media (DVDs, etc.) and the dates produced, or could include a computerized index, catalog, or relational database. The backup data needs to be stored, requiring

The size of the source data to be stored so that it uses less storage space. Compression is frequently a built-in feature of tape drive hardware. Redundancy due to backing up similarly configured workstations can be reduced, thus storing just one copy. This technique can be applied at the file or raw block level. This potentially large reduction is called deduplication. It can occur on a server before any data moves to backup media, sometimes referred to as source/client side deduplication. This approach also reduces bandwidth required to send backup data to its target media. The process can also occur at
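Block-level deduplication can be illustrated with a small content-addressed store. This is a sketch under assumptions, not a description of any backup product: the `deduplicate` and `restore` helpers are hypothetical, and SHA-256 from Python's standard `hashlib` module is simply one convenient way to identify identical blocks.

```python
import hashlib

def deduplicate(blocks):
    """Store each distinct block once and record the order needed to rebuild."""
    store = {}    # digest -> block contents, one entry per unique block
    recipe = []   # digests in original order, one per input block
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy
        recipe.append(digest)
    return store, recipe

def restore(store, recipe):
    """Rebuild the original byte stream from the store and the recipe."""
    return b"".join(store[digest] for digest in recipe)

# Two workstations with similar configurations share an identical block.
blocks = [b"AAAA", b"BBBB", b"AAAA"]
store, recipe = deduplicate(blocks)
```

Only two blocks are stored for the three that were backed up, and `restore(store, recipe)` still reproduces the original data.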

The target storage device, sometimes referred to as inline or back-end deduplication. Sometimes backups are duplicated to a second set of storage media. This can be done to rearrange the archive files to optimize restore speed, or to have a second copy at a different location or on a different storage medium, as in the disk-to-disk-to-tape capability of enterprise client-server backup. High-capacity removable storage media such as backup tapes present
