
MsQuic

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license. Give it a read and then ask your questions in the chat. We can research this topic together.

MsQuic is a free and open-source implementation of the IETF QUIC protocol, written in C, that is officially supported on Microsoft Windows (including Windows Server), Linux, and Xbox. The project also provides libraries for macOS and Android, which are unsupported. It is designed as a cross-platform, general-purpose QUIC library optimized for client and server applications that benefit from maximal throughput and minimal latency. By the end of 2021 the codebase had over 200,000 lines of production code, including 50,000 lines of "core" code shareable across platforms. The source code is licensed under the MIT License and is available on GitHub.


Its features include support for asynchronous I/O, receive-side scaling (RSS), UDP send and receive coalescing, and connection migration, which preserves connections between client and server across client IP or port changes, such as when moving through mobile networks. Both the HTTP/3 and SMB stacks of Microsoft Windows use MsQuic, with msquic.sys providing kernel-mode functionality. Because kernel mode depends on Schannel for TLS 1.3, it does not support 0-RTT. User-mode programs can use MsQuic, with 0-RTT support, through msquic.dll, which can be built from source code or downloaded as a shared library from the binary releases on the repository. Its support for the Microsoft Game Development Kit makes MsQuic available on both Xbox and Windows.

Asynchronous IO

In computer science, asynchronous I/O (also non-sequential I/O) is a form of input/output processing that permits other processing to continue before

A disk operation that takes ten milliseconds to perform, a processor that is clocked at one gigahertz could have performed ten million instruction-processing cycles. A simple approach to I/O would be to start the access and then wait for it to complete. But such an approach, called synchronous I/O or blocking I/O, would block the progress of a program while the communication is in progress, leaving system resources idle. When

A process that does not use asynchronous I/O or that uses one of the other forms, hampering code reuse. Does not require additional special synchronization mechanisms or thread-safe libraries, nor are the textual (code) and time (event) flows separated. Available in VMS and AmigaOS (often used in conjunction with a completion port). Bears many of the characteristics of the completion queue method, as it

A program makes many I/O operations (such as a program mainly or largely dependent on user input), this means that the processor can spend almost all of its time idle waiting for I/O operations to complete. Alternatively, it is possible to start the communication and then perform processing that does not require that the I/O be completed. This approach is called asynchronous input/output. Any task that depends on

A queue at the speed of the computer, then retrieved and printed at the speed of the printer. Multiple processes can write documents to the spool without waiting, and can then perform other tasks, while the "spooler" process operates the printer. For example, when a large organization prepares payroll cheques, the computation takes only a few minutes or even seconds, but the printing process might take hours. If

A queue, until the first callback returns. Light-weight processes (LWPs) or threads are available in most modern operating systems. Like the process method, but with lower overhead and without the data isolation that hampers coordination of the flows. Each LWP or thread itself uses traditional blocking synchronous I/O, which simplifies programming logic; this is a common paradigm used in many programming languages, including Java and Rust. Multithreading needs to use kernel-provided synchronization mechanisms and thread-safe libraries. This method
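The thread-per-flow idea can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's API; the pipe, names, and payloads are invented for the example. Each worker thread issues plain blocking reads, while the other threads keep running:

```python
import os
import threading

def read_all(fd, results, key):
    """Worker: plain blocking reads; only this thread is suspended while waiting."""
    chunks = []
    while True:
        data = os.read(fd, 4096)   # blocks this thread, not the whole program
        if not data:               # b"" means the writer closed its end
            break
        chunks.append(data)
    results[key] = b"".join(chunks)

# Two independent pipes, with one blocking-I/O thread each.
results = {}
threads = []
writers = []
for key in ("a", "b"):
    r, w = os.pipe()
    writers.append((key, w))
    t = threading.Thread(target=read_all, args=(r, results, key))
    t.start()
    threads.append(t)

for key, w in writers:
    os.write(w, key.encode() * 3)
    os.close(w)                    # EOF lets the blocking read loop finish

for t in threads:
    t.join()

print(results)
```

The programming logic inside each thread is ordinary sequential I/O; concurrency comes entirely from running several such flows at once, which is why the text notes that shared state then needs kernel-provided synchronization.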

A small number of printers. They are also valuable when a single job can produce multiple documents. Depending on the configuration, banner pages might be generated on each client computer, on a centralized print server, or by the printer itself. On printers using fanfold continuous forms, a leading banner page would often be printed twice, so that one copy would always be face-up when the jobs were separated. The page might include lines printed over

Is also required, as it is very common for multiple potential sources of interrupts to share a common interrupt signal line, in which case polling is used within the device driver to resolve the actual source. (This resolution time also contributes to an interrupt system's performance penalty. Over the years a great deal of work has been done to try to minimize the overhead associated with servicing an interrupt. Current interrupt systems are rather lackadaisical when compared to some highly tuned earlier ones, but

Is available; see batch processing. The spool itself refers to the sequence of jobs, or the storage area where they are held. In many cases, the spooler is able to drive devices at their full rated speed with minimal impact on other processing. Spooling is a combination of buffering and queueing. Nowadays, the most common use of spooling is printing: documents formatted for printing are stored in
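The buffering-plus-queueing combination can be modeled as a toy sketch in Python (all names and the sleep standing in for the slow device are invented): producers hand jobs to a queue instantly, while one dedicated "spooler" thread drains it at the device's own rate.

```python
import queue
import threading
import time

spool = queue.Queue()          # the "spool": an ordered queue of pending jobs
printed = []

def spooler():
    """Dedicated flow that drives the slow peripheral at its own rate."""
    while True:
        job = spool.get()
        if job is None:        # sentinel: no more jobs
            break
        time.sleep(0.01)       # stand-in for the slow mechanical device
        printed.append(job)

worker = threading.Thread(target=spooler)
worker.start()

# Producers "hand off" work and return immediately.
for doc in ("payroll-1", "payroll-2", "report"):
    spool.put(doc)
spool.put(None)
worker.join()

print(printed)                 # jobs emerge in submission order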

Is designed to maximize CPU utilization and throughput by offloading most I/O onto a coprocessor. The coprocessor has onboard DMA, handles device interrupts, is controlled by the main CPU, and only interrupts the main CPU when it is truly necessary. This architecture also supports so-called channel programs that run on the channel processor to do heavy lifting for I/O activities and protocols. Available in Windows Server 2012 and Windows 8. Optimized for applications that process large numbers of small messages to achieve higher I/O operations per second with reduced jitter and latency. The vast majority of general-purpose computing hardware relies entirely upon two methods of implementing asynchronous I/O: polling and interrupts. Usually both methods are used together,



Is essentially a completion queue of depth one. To simulate the effect of queue 'depth', an additional event flag is required for each potential unprocessed (but completed) event, or event information can be lost. Waiting for the next available event in such a clump requires synchronizing mechanisms that may not scale well to larger numbers of potentially parallel events. Available in mainframes by IBM, Groupe Bull, and Unisys. Channel I/O

Is issued asynchronously, and when it is completed a signal (interrupt) is generated. As in low-level kernel programming, the facilities available for safe use within the signal handler are limited, and the main flow of the process could have been interrupted at nearly any point, resulting in inconsistent data structures as seen by the signal handler. The signal handler is usually not able to issue further asynchronous I/O by itself. The signal approach, though relatively simple to implement within

Is modeled after the BSD implementation. A variation on the theme of polling, a select loop uses the select system call to sleep until a condition occurs on a file descriptor (e.g., when data is available for reading), a timeout occurs, or a signal is received (e.g., when a child process dies). By examining the return parameters of the select call, the loop finds out which file descriptor has changed and executes
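A minimal select loop might look like the following Python sketch (the pipes and variable names are invented for the example). The loop sleeps in select until at least one descriptor is readable, then services exactly those descriptors the call reports:

```python
import os
import select

# Two independent data sources, simulated with pipes.
r1, w1 = os.pipe()
r2, w2 = os.pipe()
os.write(w1, b"hello")
os.write(w2, b"world")

received = {}
pending = {r1, r2}
while pending:
    # Sleep until a condition occurs on one of the watched descriptors.
    readable, _, _ = select.select(list(pending), [], [])
    for fd in readable:
        received[fd] = os.read(fd, 4096)
        pending.discard(fd)       # this source has been fully serviced

print(sorted(received.values()))
```

Note that the watched set is rebuilt and passed in on every call, which is exactly the per-invocation traversal cost the text later attributes to select.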

Is not the most suitable approach for extremely large-scale applications like web servers, due to the large numbers of threads needed. This approach is also used in the Erlang programming language runtime system. The Erlang virtual machine performs asynchronous I/O using a small pool of only a few threads, or sometimes just one process, to handle I/O from up to millions of Erlang processes. I/O handling in each process

Is possible that some may provide all of them. Available in early Unix. In a multitasking operating system, processing can be distributed across different processes, which run independently, have their own memory, and process their own I/O flows; these flows are typically connected in pipelines. Processes are fairly expensive to create and maintain, so this solution only works well if the set of processes

Is required to prevent this. When exposing asynchronous I/O to applications, there are a few broad classes of implementation. The form of the API provided to the application does not necessarily correspond with the mechanism actually provided by the operating system; emulations are possible. Furthermore, more than one method may be used by a single application, depending on its needs and the desires of its programmer(s). Many operating systems provide more than one of these mechanisms, and it

Is small and relatively stable. It also assumes that the individual processes can operate independently, apart from processing each other's I/O; if they need to communicate in other ways, coordinating them can become difficult. An extension of this approach is dataflow programming, which allows more complicated networks than just the chains that pipes support. Variations: Polling provides a non-blocking synchronous API which may be used to implement some asynchronous API. Available in traditional Unix and Windows. Its major problem

Is still found in the documentation for email and Usenet software. Peripheral devices have always been much slower than core processing units. This was an especially severe problem for early mainframes. For example, a job which read punched cards or generated printed output directly was forced to run at the speed of the slow mechanical devices. The first spooling programs, such as IBM's "SPOOL System" (7070-IO-076), copied data from punched cards to magnetic tape, and from tape back to punched cards and printers. Hard disks, which offered faster I/O speeds and support for random access, started to replace

Is that it can waste CPU time polling repeatedly when there is nothing else for the issuing process to do, reducing the time available for other processes. Also, because a polling application is essentially single-threaded, it may be unable to fully exploit the I/O parallelism that the hardware is capable of. Available in BSD Unix, and almost anything else with a TCP/IP protocol stack that either utilizes or

Is the problem with using polling in any form to synthesize a different form of asynchronous I/O. Every CPU cycle that is a poll is wasted, lost to overhead rather than accomplishing a desired task. Every CPU cycle that is not a poll represents an increase in latency of reaction to pending I/O. Striking an acceptable balance between these two opposing forces is difficult. (This is why hardware interrupt systems were invented in



Is the same example with Async/await: Here is the example with Reactor pattern:

Spooling

In computing, spooling is a specialized form of multi-programming for the purpose of copying data between different devices. In contemporary systems, it is usually used for mediating between a computer application and a slow peripheral, such as a printer. Spooling allows programs to "hand off" work to be done by

Is written mostly using blocking synchronous I/O. This way the high performance of asynchronous I/O is merged with the simplicity of normal I/O (cf. the Actor model). Many I/O problems in Erlang are mapped to message passing, which can be easily processed using built-in selective receive. Fibers/coroutines can be viewed as a similarly lightweight approach to doing asynchronous I/O outside of the Erlang runtime system, although they do not provide exactly

The signal method, as it is fundamentally the same thing, though rarely recognized as such. The difference is that each I/O request usually can have its own completion function, whereas the signal system has a single callback. On the other hand, a potential problem of using callbacks is that stack depth can grow unmanageably, as an extremely common thing to do when one I/O is finished is to schedule another. If this should be satisfied immediately,

The windowing system and a few for open files, but it becomes more of a problem as the number of potential event sources grows, and can hinder development of many-client server applications, as in the C10k problem; other asynchronous methods may be noticeably more efficient in such cases. Some Unixes provide system-specific calls with better scaling; for example, epoll in Linux (that fills

The I/O having completed (this includes both using the input values and critical operations that claim to assure that a write operation has been completed) still needs to wait for the I/O operation to complete, and thus is still blocked, but other processing that does not have a dependency on the I/O operation can continue. Many operating system functions exist to implement asynchronous I/O at many levels. In fact, one of

The I/O operation has finished. A name used for asynchronous I/O in the Windows API is overlapped I/O. Input and output (I/O) operations on a computer can be extremely slow compared to the processing of data. An I/O device can incorporate mechanical devices that must physically move, such as a hard drive seeking a track to read or write; this is often orders of magnitude slower than the switching of electric current. For example, during

The JVM. The JVM may poll (or take an interrupt) periodically to institute an internal flow-of-control change, effecting the appearance of multiple simultaneous processes, at least some of which presumably exist in order to perform asynchronous I/O. (Of course, at the microscopic level the parallelism may be rather coarse and exhibit some non-ideal characteristics, but on the surface it will appear to be as desired.) That, in fact,

The OS, brings to the application program the unwelcome baggage associated with writing an operating system's kernel interrupt system. Its worst characteristic is that every blocking (synchronous) system call is potentially interruptible; the programmer must usually incorporate retry code at each call. Available in the classic Mac OS, VMS and Windows. Bears many of the characteristics of

The Overflow bit outside of the device driver!) Using only these two tools (polling and interrupts), all the other forms of asynchronous I/O discussed above may be (and in fact are) synthesized. In an environment such as a Java virtual machine (JVM), asynchronous I/O can be synthesized even though the environment the JVM is running in may not offer it at all. This is due to the interpreted nature of

The application to run at the speed of the CPU while operating peripheral devices at their full rated speed. A batch processing system uses spooling to maintain a queue of ready-to-run tasks, which can be started as soon as the system has the resources to process them. Some store-and-forward messaging systems, such as uucp, used "spool" to refer to their inbound and outbound message queues, and this terminology


The appropriate code. Often, for ease of use, the select loop is implemented as an event loop, perhaps using callback functions; the situation lends itself particularly well to event-driven programming. While this method is reliable and relatively efficient, it depends heavily on the Unix paradigm that "everything is a file"; any blocking I/O that does not involve a file descriptor will block
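The event-loop-with-callbacks shape (the reactor pattern) can be sketched as follows in Python; the dispatch table, pipe, and names are invented for the illustration. Handlers are registered per descriptor, and one turn of the loop waits in select and then dispatches to whichever handlers have pending events:

```python
import os
import select

callbacks = {}                 # fd -> handler: a tiny reactor's dispatch table
log = []

def on_readable(fd, handler):
    """Register a callback to run when fd becomes readable."""
    callbacks[fd] = handler

r, w = os.pipe()
on_readable(r, lambda fd: log.append(os.read(fd, 4096)))
os.write(w, b"event")

# One turn of the event loop: sleep until something is ready, then dispatch.
readable, _, _ = select.select(list(callbacks), [], [], 1.0)
for fd in readable:
    callbacks[fd](fd)

print(log)
```

A real event loop would run this turn repeatedly and support deregistration and timers; the sketch shows only the central wait-then-dispatch step.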

The available buffer, or in other ways unsuitable to the recipient. The select loop does not reach the ultimate system efficiency possible with, say, the completion queues method, because the semantics of the select call, allowing as it does for per-call tuning of the acceptable event set, consumes some amount of time per invocation traversing the selection array. This creates little overhead for user applications that might have one file descriptor open for

The balance depends heavily upon the design of the hardware and its required performance characteristics. (DMA is not itself another independent method; it is merely a means by which more work can be done per poll or interrupt.) Pure polling systems are entirely possible: small microcontrollers (such as systems using the PIC) are often built this way. CP/M systems could also be built this way (though rarely were), with or without DMA. Also, when

The complexity of interrupt handling from the user. Spooling was one of the first forms of multitasking designed to exploit asynchronous I/O. Finally, multithreading and explicit asynchronous I/O APIs within user processes can exploit asynchronous I/O further, at the cost of extra software complexity. Asynchronous I/O is used to improve energy efficiency and, in some cases, throughput. However, it can have negative effects on latency and throughput in some cases. Forms of I/O and examples of POSIX functions: All forms of asynchronous I/O open applications up to potential resource conflicts and associated failure. Careful programming (often using mutual exclusion, semaphores, etc.)

The first callback is not 'unwound' off the stack before the next one is invoked. Systems to prevent this (like 'mid-ground' scheduling of new work) add complexity and reduce performance. In practice, however, this is generally not a problem, because the call issuing the new I/O will itself usually return as soon as that I/O is started, allowing the stack to be 'unwound'. The problem can also be prevented by avoiding any further callbacks, by means of

The first place.) The trick to maximize efficiency is to minimize the amount of work that has to be done upon reception of an interrupt in order to awaken the appropriate application. Secondarily (but perhaps no less important) is the method the application itself uses to determine what it needs to do. Particularly problematic (for application efficiency) are the exposed polling methods, including

The fold, which would be visible along the edge of a stack of printed output, allowing the operator to easily separate the jobs. Some systems would also print a banner page at the end of each job, assuring users that they had collected all of their printout. Spooling is also used to mediate access to punched card readers and punches, magnetic tape drives, and other slow, sequential I/O devices. It allows

The general increase in hardware performance has greatly mitigated this.) Hybrid approaches are also possible, wherein an interrupt can trigger the beginning of some burst of asynchronous I/O, and polling is used within the burst itself. This technique is common in high-speed device drivers, such as network or disk, where the time lost in returning to the pre-interrupt task is greater than the time until

The main functions of all but the most rudimentary of operating systems is to perform at least some form of basic asynchronous I/O, though this may not be particularly apparent to the user or the programmer. In the simplest software solution, the hardware device status is polled at intervals to detect whether the device is ready for its next operation. (For example, the CP/M operating system

The next required servicing. (Common I/O hardware in use these days relies heavily upon DMA and large data buffers to make up for a relatively poorly performing interrupt system. These characteristically use polling inside the driver loops, and can exhibit tremendous throughput. Ideally the per-datum polls are always successful, or at most repeated a small number of times.) At one time this sort of hybrid approach


The payroll program printed cheques directly, it would be unable to proceed to other computations until all the cheques were printed. Similarly, before spooling was added to PC operating systems, word processors were unable to do anything else, including interact with the user, while printing. Spooler or print management software often includes a variety of related features, such as allowing priorities to be assigned to print jobs, notifying users when their documents have been printed, distributing print jobs among several printers, selecting appropriate paper for each document, etc. A print server applies spooling techniques to allow many computers to share

The peripheral and then proceed to other tasks, or to not begin until input has been transcribed. A dedicated program, the spooler, maintains an orderly sequence of jobs for the peripheral and feeds it data at its own rate. Conversely, for slow input peripherals, such as a card reader, a spooler can maintain a sequence of computational jobs waiting for data, starting each job when all of the relevant input

The process. The select loop also relies on being able to involve all I/O in the central select call; libraries that conduct their own I/O are particularly problematic in this respect. An additional potential problem is that the select and the I/O operations are still sufficiently decoupled that select's result may effectively be a lie: if two processes are reading from a single file descriptor (arguably bad design),

The processor's fetch or store hardware and reducing the programmed loop to two operations. (In effect using the processor itself as a DMA engine.) The 6502 processor offered an unusual means to provide a three-element per-datum loop, as it had a hardware pin that, when asserted, would cause the processor's Overflow bit to be set directly. (Obviously one would have to take great care in the hardware design to avoid overriding

The return selection array with only those event sources on which an event has occurred), kqueue in FreeBSD, and event ports (and /dev/poll) in Solaris. SVR3 Unix provided the poll system call. Arguably better named than select, for the purposes of this discussion it is essentially the same thing. SVR4 Unixes (and thus POSIX) offer both calls. Available in BSD and POSIX Unix. I/O
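As a concrete illustration of these scalable, system-specific mechanisms, Python's standard selectors module picks the best one available on the host (epoll on Linux, kqueue on FreeBSD/macOS, otherwise poll or select) behind one interface; the pipe and the "my-pipe" label are invented for the example:

```python
import os
import selectors

sel = selectors.DefaultSelector()   # epoll/kqueue/poll/select, whichever exists

r, w = os.pipe()
sel.register(r, selectors.EVENT_READ, data="my-pipe")
os.write(w, b"ready")

# Unlike raw select, only sources with a pending event are returned.
events = sel.select(timeout=1.0)
messages = []
for key, _mask in events:
    messages.append((key.data, os.read(key.fd, 4096)))

sel.unregister(r)
sel.close()
print(messages)
```

Registration is done once up front rather than on every call, which is precisely the scaling advantage epoll and kqueue hold over select's per-invocation array traversal.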

The same guarantees as Erlang processes. Available in Microsoft Windows, Solaris, AmigaOS, DNIX and Linux (using io_uring, available on 5.1 and above). I/O requests are issued asynchronously, but notifications of completion are provided via a synchronizing queue mechanism in the order they are completed. Usually associated with a state-machine structuring of the main process (event-driven programming), which can bear little resemblance to
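The completion-queue shape described here — requests issued asynchronously, completions consumed from a queue in the order they finish — can be mimicked in Python with threads and a queue. This is only a sketch of the pattern, not io_uring or any OS facility; the names and sleep durations are invented:

```python
import queue
import threading
import time

completions = queue.Queue()     # the completion queue

def issue(request_id, duration):
    """Issue an 'I/O request' asynchronously; post to the queue on completion."""
    def work():
        time.sleep(duration)    # stand-in for the actual I/O taking place
        completions.put(request_id)
    threading.Thread(target=work).start()

issue("slow", 0.05)
issue("fast", 0.01)

# Main flow: drain completions as they arrive, in completion order.
finished = [completions.get() for _ in range(2)]
print(finished)                 # completion order, not issue order
```

The main flow never learns about a request until its completion is dequeued, which is why this style pairs naturally with a state machine rather than with straight-line code.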

The same printer or group of printers. Print spoolers can be configured to add a banner page, also called a burst page, job sheet, or printer separator, to the beginning and end of each document and job. These separate documents from each other, identify each document (e.g. with its title) and often also state who printed it (e.g. by username or job name). Banner pages are valuable in office environments where many people share

The select may indicate the availability of read data that has disappeared by the time that the read is issued, thus resulting in blocking; if two processes are writing to a single file descriptor (not that uncommon), the select may indicate immediate writability yet the write may still block, because a buffer has been filled by the other process in the interim, or due to the write being too large for

The select/poll mechanisms. Though the underlying I/O events they are interested in are in all likelihood interrupt-driven, the interaction with this mechanism is polled and can consume a large amount of time in the poll. This is particularly true of the potentially large-scale polling possible through select (and poll). Interrupts map very well to signals, callback functions, completion queues, and event flags; such systems can be very efficient. The following examples show three approaches to reading I/O. The objects and functions are abstract.

1. Blocking, synchronous:
2. Blocking and non-blocking, synchronous: (here IO.poll() blocks for up to 5 seconds, but device.read() doesn't)
3. Non-blocking, asynchronous: Here
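The three abstract approaches can be sketched concretely in Python. All names, the pipe, and the timings are invented for the illustration; the abstract IO.poll() is stood in for by select with a timeout, and the asynchronous case by asyncio coroutines in place of real device reads:

```python
import asyncio
import os
import select

r, w = os.pipe()

# 1. Blocking, synchronous: os.read suspends the caller until data arrives.
os.write(w, b"first")
blocking_result = os.read(r, 4096)

# 2. Blocking and non-blocking, synchronous: block up to 5 seconds waiting
#    for readiness; the read itself then returns immediately.
os.write(w, b"second")
readable, _, _ = select.select([r], [], [], 5.0)   # the "IO.poll()" step
poll_result = os.read(r, 4096) if readable else None

# 3. Non-blocking, asynchronous: awaiting suspends only this coroutine,
#    leaving the event loop free to run other tasks until the "I/O" completes.
async def fake_read(label, delay):
    await asyncio.sleep(delay)      # stand-in for an asynchronous device read
    return label

async def main():
    return await asyncio.gather(fake_read(b"a", 0.02), fake_read(b"b", 0.01))

async_result = asyncio.run(main())
print(blocking_result, poll_result, async_result)
```

In case 3 both "reads" are in flight concurrently; gather returns their results in argument order regardless of which finished first.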

The use of magnetic tape for spooling in the mid-1960s, and by the 1970s had largely replaced it altogether. Because the unit record equipment on IBM mainframes of the early 1960s was slow, it was common for larger systems to use a small offline computer such as an IBM 1401 instead of spooling. The term "spool" may originate with the Simultaneous Peripheral Operations On-Line (SPOOL) software; this derivation



The utmost performance is necessary for only a few tasks, at the expense of any other potential tasks, polling may also be appropriate, as the overhead of taking interrupts may be unwelcome. (Servicing an interrupt requires time [and space] to save at least part of the processor state, along with the time required to resume the interrupted task.) Most general-purpose computing systems rely heavily upon interrupts. A pure interrupt system may be possible, though usually some component of polling

Was built this way. Its system call semantics did not require any more elaborate I/O structure than this, though most implementations were more complex, and thereby more efficient.) Direct memory access (DMA) can greatly increase the efficiency of a polling-based system, and hardware interrupts can eliminate the need for polling entirely. Multitasking operating systems can exploit the functionality provided by hardware interrupts, whilst hiding

Was common in disk and network drivers where there was no DMA or significant buffering available. Because the desired transfer speeds were faster even than could tolerate the minimum four-operation per-datum loop (bit test, conditional branch-to-self, fetch, and store), the hardware would often be built with automatic wait-state generation on the I/O device, pushing the data-ready poll out of software and onto
