Amazon Simple Storage Service (S3) is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its e-commerce network. Amazon S3 can store any type of object, which enables uses such as storage for Internet applications, backups, disaster recovery, data archives, data lakes for analytics, and hybrid cloud storage. AWS launched Amazon S3 in the United States on March 14, 2006, and in Europe in November 2007.
Amazon S3 manages data with an object storage architecture which aims to provide scalability, high availability, and low latency with high durability. The basic storage units of Amazon S3 are objects, which are organized into buckets. Each object is identified by a unique, user-assigned key. Buckets can be managed using the console provided by Amazon S3, programmatically with the AWS SDK, or
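The bucket-and-key model above can be sketched as a minimal in-memory data structure. This is an illustrative model only, not the AWS SDK (boto3) API; the bucket and key names are made up.

```python
# Illustrative model of the S3 data model: buckets hold objects,
# each identified by a unique, user-assigned key. Not the real AWS API.

class Bucket:
    def __init__(self, name):
        self.name = name
        self.objects = {}          # key -> object bytes

    def put_object(self, key, body):
        # keys are arbitrary strings, often path-like, e.g. "logs/2024/01.txt"
        self.objects[key] = body

    def get_object(self, key):
        return self.objects[key]

b = Bucket("example-bucket")
b.put_object("photos/cat.jpg", b"\xff\xd8binary-data")
assert b.get_object("photos/cat.jpg").startswith(b"\xff\xd8")
```

In the real service the same operations are exposed over the REST API and through SDK clients.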
A computer a brief window of time to move information from primary volatile storage into non-volatile storage before the batteries are exhausted. Some systems, for example EMC Symmetrix, have integrated batteries that maintain volatile storage for several minutes. Utilities such as hdparm and sar can be used to measure IO performance in Linux. Full disk encryption, volume and virtual disk encryption, and/or file/folder encryption
A designated timeframe. They should understand how deviations from SLAs are calculated, as these parameters, along with SLA percentages and conditions, may differ from those of other AWS services. These requirements can impose a significant burden on customers. In cases of data loss due to hardware failure attributable to Amazon, the company does not provide monetary compensation; instead, affected users may receive credits if they meet
A drive. When the computer has finished reading the information, the robotic arm will return the medium to its place in the library. Tertiary storage is also known as nearline storage because it is "near to online". The formal distinction between online, nearline, and offline storage is: For example, always-on spinning hard disk drives are online storage, while spinning drives that spin down automatically, such as in massive arrays of idle disks (MAID), are nearline storage. Removable media such as tape cartridges that can be automatically loaded, as in tape libraries, are nearline storage, while tape cartridges that must be manually loaded are offline storage. Off-line storage
A durability guarantee of 99.999999999% (referred to as "11 nines"), primarily addressing data loss from hardware failures. However, this guarantee does not extend to losses resulting from human errors (such as accidental deletion), misconfigurations, third-party failures and subsequent data corruptions, natural disasters, force majeure events, or security breaches. Customers are responsible for monitoring SLA compliance and must submit claims for any unmet SLAs within
A file system. The semantics of the Amazon S3 file system are not that of a POSIX file system, so the file system may not behave entirely as expected. Amazon S3 offers nine different storage classes with different levels of durability, availability, and performance requirements. The Amazon S3 Glacier storage classes above are distinct from Amazon Glacier, which is a separate product with its own APIs. An object in S3 can be between 0 bytes and 5 TB. If an object
A goal to form a committee and design a specification based on the SCSI interface protocol. This defined objects as abstracted data, with unique identifiers and metadata, how objects related to file systems, along with many other innovative concepts. Anderson presented many of these ideas at the SNIA conference in October 1999. The presentation revealed an IP Agreement that had been signed in February 1997 between
A list of identifiers for objects within a partition, optionally filtered by matches against their attribute values. A list command can also return selected attributes of the listed objects. Read and write commands can be combined, or piggy-backed, with commands to get and set attributes. This ability reduces the number of times a high-level storage system has to cross the interface to the OSD, which can improve overall efficiency. A second generation of
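An OSD-style list command of the kind described, filtering by attribute values and piggy-backing selected attributes onto the result, can be sketched as follows. The data layout and attribute names are hypothetical, not taken from the OSD standard's wire format.

```python
# Sketch of an OSD-style LIST: return object identifiers in a partition,
# optionally filtered by attribute matches, optionally returning selected
# attributes alongside each identifier (piggy-backed, one crossing of the
# interface instead of a LIST followed by per-object GET-ATTR calls).

partition = {
    0x01: {"type": "photo", "bytes": 2048},
    0x02: {"type": "log",   "bytes": 512},
    0x03: {"type": "photo", "bytes": 4096},
}

def list_objects(partition, match=None, select=()):
    result = []
    for oid, attrs in partition.items():
        if match and any(attrs.get(k) != v for k, v in match.items()):
            continue                      # attribute filter failed
        result.append((oid, {k: attrs[k] for k in select}))
    return result

photos = list_objects(partition, match={"type": "photo"}, select=("bytes",))
assert [oid for oid, _ in photos] == [0x01, 0x03]
```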
A memory in which they store their operating instructions and data. Such computers are more versatile in that they do not need to have their hardware reconfigured for each new program, but can simply be reprogrammed with new in-memory instructions; they also tend to be simpler to design, in that a relatively simple processor may keep state between successive computations to build up complex procedural results. Most modern computers are von Neumann machines. A modern digital computer represents data using
A series of fixed-size blocks which are numbered starting at 0. Data must be that exact fixed size and can be stored in a particular block, which is identified by its logical block number (LBN). Later, one can retrieve that block of data by specifying its unique LBN. With a key–value store, data is identified by a key rather than an LBN. A key might be "cat" or "olive" or "42". It can be an arbitrary sequence of bytes of arbitrary length. Data (called
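The fixed-size, LBN-addressed model just described can be sketched in a few lines. The block size and block count here are arbitrary example values.

```python
# Sketch of a block device: fixed-size blocks addressed by logical
# block number (LBN). Writes must be exactly one block in size.

BLOCK_SIZE = 512
disk = [bytes(BLOCK_SIZE) for _ in range(8)]   # 8 zeroed blocks, LBN 0..7

def write_block(lbn, data):
    if len(data) != BLOCK_SIZE:
        raise ValueError("block writes must be exactly BLOCK_SIZE bytes")
    disk[lbn] = data

def read_block(lbn):
    return disk[lbn]

write_block(3, b"A" * BLOCK_SIZE)
assert read_block(3) == b"A" * BLOCK_SIZE      # retrieved by its LBN
```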
A source to read instructions from, in order to start the computer. Hence, non-volatile primary storage containing a small startup program (BIOS) is used to bootstrap the computer, that is, to read a larger program from non-volatile secondary storage to RAM and start to execute it. A non-volatile technology used for this purpose is called ROM, for read-only memory (the terminology may be somewhat confusing as most ROM types are also capable of random access). Many types of "ROM" are not literally read only, as updates to them are possible; however it
A storage system while simultaneously other clients store files on the same storage system. Other vendors in the area of hybrid cloud storage are using cloud storage gateways to provide a file access layer over object storage, implementing file access protocols such as SMB and NFS. Some large Internet companies developed their own software when object-storage products were not commercially available or use cases were very specific. Facebook famously invented their own object-storage software, code-named Haystack, to address their particular massive-scale photo management needs efficiently. Object storage at
A value in this parlance) does not need to be a fixed size and also can be an arbitrary sequence of bytes of arbitrary length. One stores data by presenting the key and data (value) to the data store and can later retrieve the data by presenting the key. This concept is seen in programming languages: Python calls them dictionaries, Perl calls them hashes, Java, Rust, and C++ call them maps, etc. Several data stores also implement key–value stores, such as Memcached, Redis, and CouchDB. Object stores are similar to key–value stores in two respects. First,
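The key–value idea the text compares to Python dictionaries looks like this in practice; the keys and values are arbitrary examples.

```python
# Key-value addressing, as in a Python dictionary: keys identify data,
# and values need not share a fixed size.

store = {}
store["cat"] = b"meow"
store["42"] = b"x" * 1000        # a much larger value under another key

assert store["cat"] == b"meow"   # retrieval by key, not by block number
assert len(store["42"]) == 1000
```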
Is Quantum ActiveScale Object Storage Platform. More general-purpose object-storage systems came to market around 2008. Lured by the incredible growth of "captive" storage systems within web applications like Yahoo Mail and the early success of cloud storage, object-storage systems promised the scale and capabilities of cloud storage, with the ability to deploy the system within an enterprise, or at an aspiring cloud-storage service provider. A few object-storage systems support Unified File and Object storage, allowing clients to store objects on
Is a core function and fundamental component of computers. The central processing unit (CPU) of a computer is what manipulates data by performing computations. In practice, almost all computers use a storage hierarchy, which puts fast but expensive and small storage options close to the CPU and slower but less expensive and larger options further away. Generally, the fast technologies are referred to as "memory", while slower persistent technologies are referred to as "storage". Even
Is a form of volatile memory that also requires the stored information to be periodically reread and rewritten, or refreshed, otherwise it would vanish. Static random-access memory is a form of volatile memory similar to DRAM with the exception that it never needs to be refreshed as long as power is applied; it loses its content when the power supply is lost. An uninterruptible power supply (UPS) can be used to give
Is a level below secondary storage. Typically, it involves a robotic mechanism which will mount (insert) and dismount removable mass storage media into a storage device according to the system's demands; such data are often copied to secondary storage before use. It is primarily used for archiving rarely accessed information since it is much slower than secondary storage (e.g. 5–60 seconds vs. 1–10 milliseconds). This
Is computer data storage on a medium or a device that is not under the control of a processing unit. The medium is recorded, usually in a secondary or tertiary storage device, and then physically removed or disconnected. It must be inserted or connected by a human operator before a computer can access it again. Unlike tertiary storage, it cannot be accessed without human interaction. Off-line storage
Is estimable using S.M.A.R.T. diagnostic data that includes the hours of operation and the count of spin-ups, though its reliability is disputed. Flash storage may experience downspiking transfer rates as a result of accumulating errors, which the flash memory controller attempts to correct. The health of optical media can be determined by measuring correctable minor errors, of which high counts signify deteriorating and/or low-quality media. Too many consecutive minor errors can lead to data corruption. Not all vendors and models of optical drives support error scanning. As of 2011,
Is larger than 5 TB, it must be divided into chunks prior to uploading. When uploading, Amazon S3 allows a maximum of 5 GB in a single upload operation; hence, objects larger than 5 GB must be uploaded via the S3 multipart upload API. The broad adoption of Amazon S3 and related tooling has given rise to competing services based on the S3 API. These services use the standard programming interface but are differentiated by their underlying technologies and business models. A standard interface enables better competition from rival providers and allows economies of scale in implementation, among other benefits. Amazon Web Services introduced Amazon S3 in 2006. In November 2017, AWS added default encryption capabilities at the bucket level. Amazon S3 provides
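The chunking rule above (objects over the single-upload ceiling must be split into parts) can be sketched as a part-planning helper. The 5 GB ceiling comes from the text; the 1 GB part size is an illustrative choice, not an AWS requirement.

```python
# Plan byte ranges for a multipart upload. Objects at or under the
# single-upload limit go up in one operation; larger ones are split.

GB = 1024 ** 3
MAX_SINGLE_UPLOAD = 5 * GB   # single-operation ceiling, per the text
PART_SIZE = 1 * GB           # assumed part size for this sketch

def plan_parts(object_size):
    if object_size <= MAX_SINGLE_UPLOAD:
        return [(0, object_size)]            # one ordinary upload
    parts, offset = [], 0
    while offset < object_size:
        end = min(offset + PART_SIZE, object_size)
        parts.append((offset, end))          # half-open byte range
        offset = end
    return parts

parts = plan_parts(7 * GB)                   # 7 GB object -> multipart
assert len(parts) == 7
assert parts[-1] == (6 * GB, 7 * GB)
```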
Is primarily useful for extraordinarily large data stores, accessed without human operators. Typical examples include tape libraries and optical jukeboxes. When a computer needs to read information from the tertiary storage, it will first consult a catalog database to determine which tape or disc contains the information. Next, the computer will instruct a robotic arm to fetch the medium and place it in
Is readily available for most storage devices. Hardware memory encryption is available in Intel Architecture, supporting Total Memory Encryption (TME) and page-granular memory encryption with multiple keys (MKTME), and in the SPARC M7 generation since October 2015. Distinct types of data storage have different points of failure and various methods of predictive failure analysis. Vulnerabilities that can instantly lead to total loss are head crashing on mechanical hard drives and failure of electronic components on flash storage. Impending failure on hard disk drives
Is slow and memory must be erased in large portions before it can be re-written. Some embedded systems run programs directly from ROM (or similar), because such programs are rarely changed. Standard computers do not store non-rudimentary programs in ROM, and rather, use large capacities of secondary storage, which is non-volatile as well, and not as costly. Recently, primary storage and secondary storage in some uses refer to what
Is stored in metadata servers and file data is stored in object storage servers. File system client software interacts with the distinct servers, and abstracts them to present a full file system to users and applications. Some early incarnations of object storage were used for archiving, as implementations were optimized for data services like immutability, not performance. EMC Centera and Hitachi HCP (formerly known as HCAP) are two commonly cited object storage products for archiving. Another example
Is the only one directly accessible to the CPU. The CPU continuously reads instructions stored there and executes them as required. Any data actively operated on is also stored there in a uniform manner. Historically, early computers used delay lines, Williams tubes, or rotating magnetic drums as primary storage. By 1954, those unreliable methods were mostly replaced by magnetic-core memory. Core memory remained dominant until
Is typically associated with a variable amount of metadata, and a globally unique identifier. Object storage can be implemented at multiple levels, including the device level (object-storage device), the system level, and the interface level. In each case, object storage seeks to enable capabilities not addressed by other storage architectures, like interfaces that are directly programmable by
Is typically corrected upon detection. A bit or a group of malfunctioning physical bits (the specific defective bit is not always known; group definition depends on the specific storage device) is typically automatically fenced out, taken out of use by the device, and replaced with another functioning equivalent group in the device, where the corrected bit values are restored (if possible). The cyclic redundancy check (CRC) method
Is typically measured in milliseconds (thousandths of a second), while the access time per byte for primary storage is measured in nanoseconds (billionths of a second). Thus, secondary storage is significantly slower than primary storage. Rotating optical storage devices, such as CD and DVD drives, have even longer access times. Other examples of secondary storage technologies include USB flash drives, floppy disks, magnetic tape, paper tape, punched cards, and RAM disks. Once
Is typically used in communications and storage for error detection. A detected error is then retried. Data compression methods allow, in many cases (such as a database), a string of bits to be represented by a shorter bit string ("compress") and the original string to be reconstructed ("decompress") when needed. This utilizes substantially less storage (by tens of percent) for many types of data at the cost of more computation (compressing and decompressing when needed). Analysis of
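Both mechanisms just described, CRC error detection and lossless compression, are available in Python's standard zlib module and can be demonstrated directly; the payload is an arbitrary example.

```python
import zlib

# CRC error detection: flipping a single bit changes the checksum.
# Lossless compression: repetitive data shrinks, then round-trips exactly.

payload = b"hello storage" * 100
checksum = zlib.crc32(payload)

corrupted = bytearray(payload)
corrupted[0] ^= 0x01                               # flip one bit
assert zlib.crc32(bytes(corrupted)) != checksum    # corruption detected

packed = zlib.compress(payload)
assert len(packed) < len(payload)                  # repetitive data compresses well
assert zlib.decompress(packed) == payload          # lossless round trip
```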
Is used to transfer information since the detached medium can easily be physically transported. Additionally, it is useful for cases of disaster, where, for example, a fire destroys the original data, a medium in a remote location will be unaffected, enabling disaster recovery. Off-line storage increases general information security since it is physically inaccessible from a computer, and data confidentiality or integrity cannot be affected by computer-based attack techniques. Also, if
Is used to manage over 500 million objects a day. As of March 3, 2014, EMC claims to have sold over 1.5 exabytes of Atmos storage. On July 1, 2014, Los Alamos National Lab chose the Scality RING as the basis for a 500-petabyte storage environment, which would be among the largest ever. "Captive" object storage systems like Facebook's Haystack have scaled impressively. In April 2009, Haystack
the CPU (secondary or tertiary storage), typically hard disk drives, optical disc drives, and other devices slower than RAM but non-volatile (retaining contents when powered down). Historically, memory has, depending on technology, been called central memory, core memory, core storage, drum, main memory, real storage, or internal memory. Meanwhile, slower persistent storage devices have been referred to as secondary storage, external memory, or auxiliary/peripheral storage. Primary storage (also known as main memory, internal memory, or prime memory), often referred to simply as memory,
the International Committee for Information Technology Standards (INCITS). T10 is responsible for all SCSI standards. One of the first object-storage products, Lustre, is used in 70% of the Top 100 supercomputers and ~50% of the Top 500. As of June 16, 2013, this includes 7 of the top 10, including the then fourth-fastest system on the list, China's Tianhe-2, and the seventh fastest,
the REST application programming interface. Objects can be up to five terabytes in size. Requests are authorized using an access control list associated with each object and bucket, and buckets support versioning, which is disabled by default. Since buckets are typically the size of an entire file system mount in other systems, this access control scheme is very coarse-grained. In other words, unique access controls cannot be associated with individual files. Amazon S3 can be used to replace static web-hosting infrastructure with HTTP client-accessible objects, index document support, and error document support. The Amazon AWS authentication mechanism allows
the Titan supercomputer at the Oak Ridge National Laboratory. Object-storage systems had good adoption in the early 2000s as an archive platform, particularly in the wake of compliance laws like Sarbanes-Oxley. After five years in the market, EMC's Centera product claimed over 3,500 customers and 150 petabytes shipped by 2007. Hitachi's HCP product also claims many petabyte-scale customers. Newer object storage systems have also gotten some traction, particularly around very large custom applications like eBay's auction site, where EMC Atmos
the arithmetic logic unit (ALU). The former controls the flow of data between the CPU and memory, while the latter performs arithmetic and logical operations on data. Without a significant amount of memory, a computer would merely be able to perform fixed operations and immediately output the result. It would have to be reconfigured to change its behavior. This is acceptable for devices such as desk calculators, digital signal processors, and other specialized devices. Von Neumann machines differ in having
the binary numeral system. Text, numbers, pictures, audio, and nearly any other form of information can be converted into a string of bits, or binary digits, each of which has a value of 0 or 1. The most common unit of storage is the byte, equal to 8 bits. A piece of information can be handled by any computer or device whose storage space is large enough to accommodate the binary representation of
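The bit-string representation described above can be shown concretely: a short text becomes a string of 0s and 1s, eight per byte.

```python
# Any information reduces to bits: encode a text as ASCII bytes, then
# render each byte as its 8-bit binary representation.

text = "Hi"
bits = "".join(format(byte, "08b") for byte in text.encode("ascii"))

assert bits == "0100100001101001"   # 'H' = 0x48, 'i' = 0x69
assert len(bits) == 8 * len(text)   # one byte = 8 bits per character
```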
the disk read/write head on HDDs reaches the proper placement and the data, subsequent data on the track are very fast to access. To reduce the seek time and rotational latency, data are transferred to and from disks in large contiguous blocks. Sequential or block access on disks is orders of magnitude faster than random access, and many sophisticated paradigms have been developed to design efficient algorithms based on sequential and block access. Another way to reduce
the "OBJECT BASED STORAGE DEVICES Command Set Proposal" dated 10/25/1999 was submitted by Seagate, edited by Seagate's Dave Anderson, and was the product of work by the National Storage Industry Consortium (NSIC), including contributions by Carnegie Mellon University, Seagate, IBM, Quantum, and StorageTek. This paper was proposed to INCITS T-10 (International Committee for Information Technology Standards) with
the 1970s, when advances in integrated circuit technology allowed semiconductor memory to become economically competitive. This led to modern random-access memory (RAM). It is small and light, but quite expensive. The particular types of RAM used for primary storage are volatile, meaning that they lose the information when not powered. Besides storing opened programs, it serves as disk cache and write buffer to improve both reading and writing performance. Operating systems borrow RAM capacity for caching so long as it is not needed by running software. Spare memory can be utilized as a RAM drive for temporary high-speed data storage. As shown in
the I/O bottleneck is to use multiple disks in parallel to increase the bandwidth between primary and secondary memory. Secondary storage is often formatted according to a file system format, which provides the abstraction necessary to organize data into files and directories, while also providing metadata describing the owner of a certain file, the access time, the access permissions, and other information. Most computer operating systems use
the OSD might physically copy the data to the new partition. The standard defines clones, which are writeable, and snapshots, which are read-only. A collection is a special kind of object that contains the identifiers of other objects. There are operations to add and delete from collections, and there are operations to get or set attributes for all the objects in a collection. Collections are also used for error reporting. If an object becomes damaged by
the OSD, such as the number of bytes in an object and the modification time of an object. There is a special policy tag attribute that is part of the security mechanism. Other attributes are uninterpreted by the OSD. These are set on objects by the higher-level storage systems that use the OSD for persistent storage. For example, attributes might be used to classify objects, or to capture relationships among different objects stored on different OSDs. A list command returns
the SCSI command set, "Object-Based Storage Devices - 2" (OSD-2) added support for snapshots, collections of objects, and improved error handling. A snapshot is a point-in-time copy of all the objects in a partition into a new partition. The OSD can implement a space-efficient copy using copy-on-write techniques so that the two partitions share objects that are unchanged between the snapshots, or
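The space-efficient, copy-on-write snapshot idea can be sketched as follows: the snapshot initially shares every object with the live partition, and only objects written after the snapshot get their own copy. This is an illustration of the concept, not the OSD-2 standard's actual interface.

```python
# Copy-on-write snapshot of a partition: snapshot() makes a shallow
# copy of the object table (values are shared, not duplicated), and a
# write rebinds only the written object in the partition that is written.

class Partition:
    def __init__(self, objects=None):
        self.objects = dict(objects or {})   # oid -> bytes

    def snapshot(self):
        return Partition(self.objects)       # shallow copy: values shared

    def write(self, oid, data):
        self.objects[oid] = data             # copy-on-write: rebind here only

live = Partition({1: b"alpha", 2: b"beta"})
snap = live.snapshot()
live.write(1, b"alpha-v2")

assert live.objects[1] == b"alpha-v2"        # live sees the new data
assert snap.objects[1] == b"alpha"           # snapshot keeps the old data
assert snap.objects[2] is live.objects[2]    # unchanged object is shared
```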
the acronym, he explained his motivation behind the coinage, saying, "A blob is the thing that ate Cincinnatti [sic], Cleveland, or whatever," referring to the 1958 science fiction film The Blob. In 1995, research led by Garth Gibson on Network-Attached Secure Disks first promoted the concept of splitting less common operations, like namespace manipulations, from common operations, like reads and writes, to optimize
the addressing and identification of individual objects by more than just file name and file path. Object storage adds a unique identifier within a bucket, or across the entire system, to support much larger namespaces and eliminate name collisions. Object storage explicitly separates file metadata from data to support additional capabilities. As opposed to fixed metadata in file systems (filename, creation date, type, etc.), object storage provides for full-function, custom, object-level metadata in order to: Additionally, in some object-based file-system implementations: Object-based storage devices (OSD) as well as some software implementations (e.g., DataCore Swarm) manage metadata and data at
the application, a namespace that can span multiple instances of physical hardware, and data-management functions like data replication and data distribution at object-level granularity. Object storage systems allow retention of massive amounts of unstructured data in which data is written once and read once (or many times). Object storage is used for purposes such as storing objects like videos and photos on Facebook, songs on Spotify, or files in online collaboration services, such as Dropbox. One of
the computer to detect errors in coded data and correct them based on mathematical algorithms. Errors generally occur in low probabilities due to random bit value flipping, or "physical bit fatigue", the loss of a physical bit's ability to maintain a distinguishable value (0 or 1), or due to errors in inter- or intra-computer communication. A random bit flip (e.g. due to random radiation)
the concept of virtual memory, allowing the utilization of more primary storage capacity than is physically available in the system. As the primary memory fills up, the system moves the least-used chunks (pages) to a swap file or page file on secondary storage, retrieving them later when needed. If a lot of pages are moved to slower secondary storage, the system performance is degraded. Secondary storage, including HDDs, ODDs, and SSDs, is usually block-addressable. Tertiary storage or tertiary memory
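The paging behavior described, evicting least-used pages to swap and faulting them back in, can be sketched with a tiny LRU simulation. The frame count and page contents are made-up values, and real operating systems use more elaborate replacement policies.

```python
from collections import OrderedDict

# Sketch of virtual-memory paging: a fixed number of RAM frames; when
# full, the least-recently-used page is evicted to a simulated swap area,
# and a later access faults it back in.

FRAMES = 2
ram, swap = OrderedDict(), {}

def touch(page, data=None):
    if page in ram:
        ram.move_to_end(page)                 # mark as recently used
    else:
        if page in swap:
            data = swap.pop(page)             # page fault: read back from swap
        if len(ram) >= FRAMES:
            victim, vdata = ram.popitem(last=False)
            swap[victim] = vdata              # evict LRU page to swap
        ram[page] = data
    return ram[page]

touch("A", b"a"); touch("B", b"b"); touch("C", b"c")   # third page evicts A
assert "A" in swap and "A" not in ram
assert touch("A") == b"a"                              # fault brings A back
```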
the concepts developed by the NASD team. Seagate Technology played a central role in the development of object storage. According to the Storage Networking Industry Association (SNIA), "Object storage originated in the late 1990s: Seagate specifications from 1999 Introduced some of the first commands and how operating system effectively removed from consumption of the storage." A preliminary version of
the creation of authenticated URLs, valid for a specified amount of time. Every item in a bucket can also be served as a BitTorrent feed. The Amazon S3 store can act as a seed host for a torrent and any BitTorrent client can retrieve the file. This can drastically reduce the bandwidth cost for the download of popular objects. A bucket can be configured to save HTTP log information to a sibling bucket; this can be used in data mining operations. There are various Filesystem in Userspace (FUSE)-based file systems for Unix-like operating systems (for example, Linux) that can be used to mount an S3 bucket as
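The general idea behind time-limited authenticated URLs can be sketched with an HMAC signature over the path and an expiry timestamp. This mirrors the concept of S3 presigned URLs but is not the AWS signature algorithm; the secret key, paths, and timestamps are made-up values.

```python
import hashlib, hmac

# Sign an object path plus expiry with a server-side secret; reject
# requests whose signature fails or whose expiry has passed.

SECRET = b"example-secret-key"   # hypothetical key, held by the server

def sign(path, expires):
    msg = f"{path}?expires={expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def verify(path, expires, signature, now):
    if now > expires:
        return False                              # link has expired
    return hmac.compare_digest(sign(path, expires), signature)

sig = sign("/bucket/report.pdf", expires=1_700_000_000)
assert verify("/bucket/report.pdf", 1_700_000_000, sig, now=1_699_999_000)
assert not verify("/bucket/report.pdf", 1_700_000_000, sig, now=1_700_000_001)
assert not verify("/bucket/other.pdf", 1_700_000_000, sig, now=1_699_999_000)
```

Because the expiry is part of the signed message, a client cannot extend a link's lifetime without invalidating the signature.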
the desired data to primary storage. Secondary storage is non-volatile (retaining data when its power is shut off). Modern computer systems typically have two orders of magnitude more secondary storage than primary storage because secondary storage is less expensive. In modern computers, hard disk drives (HDDs) or solid-state drives (SSDs) are usually used as secondary storage. The access time per byte for HDDs or SSDs
the desired location of data. Then it reads or writes the data in the memory cells using the data bus. Additionally, a memory management unit (MMU) is a small device between CPU and RAM recalculating the actual memory address, for example to provide an abstraction of virtual memory or other tasks. As the RAM types used for primary storage are volatile (uninitialized at start up), a computer containing only such storage would not have
the diagram, traditionally there are two more sub-layers of the primary storage, besides main large-capacity RAM: Main memory is directly or indirectly connected to the central processing unit via a memory bus. It is actually two buses (not on the diagram): an address bus and a data bus. The CPU firstly sends a number through an address bus, a number called memory address, that indicates
the eligibility criteria.

Object storage

Object storage (also known as object-based storage or blob storage) is a computer data storage approach that manages data as "blobs" or "objects", as opposed to other storage architectures like file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks. Each object
the first computer designs, Charles Babbage's Analytical Engine and Percy Ludgate's Analytical Machine, clearly distinguished between processing and memory (Babbage stored numbers as rotations of gears, while Ludgate stored numbers as displacements of rods in shuttles). This distinction was extended in the von Neumann architecture, where the CPU consists of two main parts: the control unit and
the first version of the OSD standard, objects are specified with a 64-bit partition ID and a 64-bit object ID. Partitions are created and deleted within an OSD, and objects are created and deleted within partitions. There are no fixed sizes associated with partitions or objects; they are allowed to grow subject to physical size limitations of the device or logical quota constraints on a partition. An extensible set of attributes describe objects. Some attributes are implemented directly by
the former using standard MOSFETs and the latter using floating-gate MOSFETs. In modern computers, primary storage almost exclusively consists of dynamic volatile semiconductor random-access memory (RAM), particularly dynamic random-access memory (DRAM). Since the turn of the century, a type of non-volatile floating-gate semiconductor memory known as flash memory has steadily gained share as off-line storage for home computers. Non-volatile semiconductor memory
the information stored for archival purposes is rarely accessed, off-line storage is less expensive than tertiary storage. In modern personal computers, most secondary and tertiary storage media are also used for off-line storage. Optical discs and flash memory devices are the most popular, and to a much lesser extent removable hard disk drives; older examples include floppy disks and Zip disks. In enterprise uses, magnetic tape cartridges are predominant; older examples include open-reel magnetic tape and punched cards. Storage technologies at all levels of
the limitations with object storage is that it is not intended for transactional data, as object storage was not designed to replace NAS file access and sharing; it does not support the locking and sharing mechanisms needed to maintain a single, accurately updated version of a file. Jim Starkey coined the term "blob" while working at Digital Equipment Corporation to refer to opaque data entities. The terminology
the lower a storage is in the hierarchy, the lower its bandwidth and the greater its access latency from the CPU. This traditional division of storage into primary, secondary, tertiary, and off-line storage is also guided by cost per bit. In contemporary usage, memory is usually fast but temporary semiconductor read-write memory, typically DRAM (dynamic RAM) or other such devices. Storage consists of storage devices and their media not directly accessible by
the most commonly used data storage media are semiconductor, magnetic, and optical, while paper still sees some limited usage. Some other fundamental storage technologies, such as all-flash arrays (AFAs), are proposed for development. Semiconductor memory uses semiconductor-based integrated circuit (IC) chips to store information. Data are typically stored in metal–oxide–semiconductor (MOS) memory cells. A semiconductor memory chip may contain millions of memory cells, consisting of tiny MOS field-effect transistors (MOSFETs) and/or MOS capacitors. Both volatile and non-volatile forms of semiconductor memory exist,
the object identifier or URL (the equivalent of the key) can be an arbitrary string. Second, data may be of an arbitrary size. There are, however, a few key differences between key–value stores and object stores. First, object stores also allow one to associate a limited set of attributes (metadata) with each piece of data. The combination of a key, value, and set of attributes is referred to as an object. Second, object stores are optimized for large amounts of data (hundreds of megabytes or even gigabytes), whereas for key–value stores
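The combination the text defines, a key, a value, and a set of attributes forming an object, can be sketched directly; the key, payload, and attribute names are illustrative examples.

```python
# An object as key + value + attributes (metadata), the combination
# that distinguishes an object store from a plain key-value store.

class Object:
    def __init__(self, key, value, **attributes):
        self.key = key                 # arbitrary string identifier
        self.value = value             # arbitrary-size payload
        self.attributes = attributes   # limited set of metadata

store = {}
obj = Object("videos/intro.mp4", b"\x00" * 4096,
             content_type="video/mp4", owner="alice")
store[obj.key] = obj

assert store["videos/intro.mp4"].attributes["content_type"] == "video/mp4"
assert len(store["videos/intro.mp4"].value) == 4096
```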
5120-659: The object-based-storage market annually using its MarketScape methodology. IDC describes the MarketScape as: "...a quantitative and qualitative assessment of the characteristics that assess a vendor's current and future success in the said market or market segment and provide a measure of their ascendancy to become a Leader or maintain a leadership. IDC MarketScape assessments are particularly helpful in emerging markets that are often fragmented, have several players, and lack clear leaders." In 2019, IDC rated Dell EMC , Hitachi Data Systems , IBM , NetApp , and Scality as leaders. In
5200-483: The occurrence of a media defect (i.e., a bad spot on the disk) or by a software error within the OSD implementation, its identifier is put into a special error collection. The higher-level storage system that uses the OSD can query this collection and take corrective action as necessary. The border between an object store and a key–value store is blurred, with key–value stores being sometimes loosely referred to as object stores. A traditional block storage interface uses
5280-761: The original collaborators (with Seagate represented by Anderson and Chris Malakapalli) and covered the benefits of object storage, scalable computing, platform independence, and storage management. One of the design principles of object storage is to abstract some of the lower layers of storage away from the administrators and applications. Thus, data is exposed and managed as objects instead of blocks or (exclusively) files. Objects contain additional descriptive properties which can be used for better indexing or management. Administrators do not have to perform lower-level storage functions like constructing and managing logical volumes to utilize disk capacity or setting RAID levels to deal with disk failure. Object storage also allows
5360-425: The performance and scale of both. In the same year, a Belgian company - FilePool - was established to build the basis for archiving functions. Object storage was proposed at Gibson's Carnegie Mellon University lab as a research project in 1996. Another key concept was abstracting the writes and reads of data to more flexible data containers (objects). Fine grained access control through object storage architecture
5440-518: The piece of information , or simply data . For example, the complete works of Shakespeare , about 1250 pages in print, can be stored in about five megabytes (40 million bits) with one byte per character. Data are encoded by assigning a bit pattern to each character , digit , or multimedia object. Many standards exist for encoding (e.g. character encodings like ASCII , image encodings like JPEG , and video encodings like MPEG-4 ). By adding bits to each encoded unit, redundancy allows
5520-566: The protocol and device layer was proposed 20 years ago and approved for the SCSI command set nearly 10 years ago as "Object-based Storage Device Commands" (OSD), however, it had not been put into production until the development of the Seagate Kinetic Open Storage platform. The SCSI command set for Object Storage Devices was developed by a working group of the SNIA for the T10 committee of
5600-600: The storage back-end to many popular applications like Smugmug and Dropbox , Amazon S3 has grown to massive scale, citing over 2-trillion objects stored in April 2013. Two months later, Microsoft claimed that they stored even more objects in Azure at 8.5 trillion. By April 2014, Azure claimed over 20-trillion objects stored. Windows Azure Storage manages Blobs (user files), Tables (structured storage), and Queues (message delivery) and counts them all as objects. IDC has begun to assess
5680-525: The storage device level: Object storage provides programmatic interfaces to allow applications to manipulate data. At the base level, this includes Create, read, update and delete ( CRUD ) functions for basic read, write and delete operations. Some object storage implementations go further, supporting additional functionality like object/file versioning , object replication, life-cycle management and movement of objects between different tiers and types of storage. Most API implementations are REST -based, allowing
5760-413: The storage hierarchy can be differentiated by evaluating certain core characteristics as well as measuring characteristics specific to a particular implementation. These core characteristics are volatility, mutability, accessibility, and addressability. For any particular implementation of any storage technology, the characteristics worth measuring are capacity and performance. Non-volatile memory retains
5840-422: The stored information even if not constantly supplied with electric power. It is suitable for long-term storage of information. Volatile memory requires constant power to maintain the stored information. The fastest memory technologies are volatile ones, although that is not a universal rule. Since the primary storage is required to be very fast, it predominantly uses volatile memory. Dynamic random-access memory
5920-421: The trade-off between storage cost saving and costs of related computations and possible delays in data availability is done before deciding whether to keep certain data compressed or not. For security reasons , certain types of data (e.g. credit card information) may be kept encrypted in storage to prevent the possibility of unauthorized information reconstruction from chunks of storage snapshots. Generally,
6000-596: The use of many standard HTTP calls. The vast majority of cloud storage available in the market leverages an object-storage architecture. Some notable examples are Amazon Web Services S3 , which debuted in March 2006, Microsoft Azure Blob Storage, Rackspace Cloud Files (whose code was donated in 2010 to Openstack project and released as OpenStack Swift ), and Google Cloud Storage released in May 2010. Some distributed file systems use an object-based architecture, where file metadata
6080-412: The value is expected to be relatively small (kilobytes). Finally, object stores usually offer weaker consistency guarantees such as eventual consistency , whereas key–value stores offer strong consistency . Computer data storage Computer data storage or digital data storage is a technology consisting of computer components and recording media that are used to retain digital data . It
6160-488: Was adopted for Rdb/VMS . "Blob" is often humorously explained to be an abbreviation for "binary large object". According to Starkey, this backronym arose when Terry McKiever, working in marketing at Apollo Computer felt that the term needed to be an abbreviation. McKiever began using the expansion "Basic Large Object". This was later eclipsed by the retroactive explanation of blobs as "Binary Large Objects". According to Starkey, "Blob don't stand for nothin'." Rejecting
6240-827: Was further described by one of the NASD team, Howard Gobioff, who later was one of the inventors of the Google File System . Other related work includes the Coda filesystem project at Carnegie Mellon , which started in 1987, and spawned the Lustre file system . There is also the OceanStore project at UC Berkeley, which started in 1999 and the Logistical Networking project at the University of Tennessee Knoxville, which started in 1998. In 1999, Gibson founded Panasas to commercialize
6320-441: Was historically called, respectively, secondary storage and tertiary storage . The primary storage, including ROM , EEPROM , NOR flash , and RAM , are usually byte-addressable . Secondary storage (also known as external memory or auxiliary storage ) differs from primary storage in that it is not directly accessible by the CPU. The computer usually uses its input/output channels to access secondary storage and transfer
6400-403: Was managing 60 billion photos and 1.5 petabytes of storage, adding 220 million photos and 25 terabytes a week. Facebook more recently stated that they were adding 350 million photos a day and were storing 240 billion photos. This could equal as much as 357 petabytes. Cloud storage has become pervasive as many new web and mobile applications choose it as a common way to store binary data . As