Speedup - Misplaced Pages

In computer architecture , speedup is a number that measures the relative performance of two systems processing the same problem. More technically, it is the improvement in speed of execution of a task executed on two similar architectures with different resources. The notion of speedup was established by Amdahl's law , which was particularly focused on parallel processing . However, speedup can be used more generally to show the effect on performance after any resource enhancement.

#642357

59-429: Speedup can be defined for two different types of quantities: latency and throughput . Latency of an architecture is the reciprocal of the execution speed of a task: where Throughput of an architecture is the execution rate of a task: where Latency is often measured in seconds per unit of execution workload. Throughput is often measured in units of execution workload per second. Another unit of throughput

118-424: A real-time operating system . Note that in software systems , benchmarking against "average" and "median" latency can be misleading because few outlier numbers can distort them. Instead, software architects and software developers should use "99th percentile". In simulation applications, latency refers to the time delay, often measured in milliseconds , between initial input and output clearly discernible to

177-456: A complex and variable network latency profile. Latency limits total throughput in reliable two-way communication systems as described by the bandwidth-delay product . Latency in optical fiber is largely a function of the speed of light . This would equate to a latency of 3.33 μs for every kilometer of path length. The index of refraction of most fiber optic cables is about 1.5, meaning that light travels about 1.5 times as fast in

236-476: A day making the trip or 10000, the latency of the trip would remain the same. From the point of view of flight operations personnel, latency can be entirely different. Consider the staff at the London and New York airports. Only a limited number of planes are able to make the transatlantic journey, so when one lands they must prepare it for the return trip as quickly as possible. It might take, for example: Assuming

295-456: A definite conclusion, it should return false . An incorrect true result may cause the backtrack procedure to miss some valid solutions. The procedure may assume that reject ( P , t ) returned false for every ancestor t of c in the search tree. On the other hand, the efficiency of the backtracking algorithm depends on reject returning true for candidates that are as close to the root as possible. If reject always returns false ,

354-423: A game session are rewarded while slow response times may carry penalties. Due to a delay in transmission of game events, a player with a high latency internet connection may show slow responses in spite of appropriate reaction time . This gives players with low-latency connections a technical advantage. Joel Hasbrouck and Gideon Saar (2011) measure latency to execute financial transactions based on three components:

413-514: A gateway receives multiple packets from different sources heading toward the same destination. Since typically only one packet can be transmitted at a time, some of the packets must queue for transmission, incurring additional delay. Processing delays are incurred while a gateway determines what to do with a newly received packet. Bufferbloat can also cause increased latency that is an order of magnitude or more. The combination of propagation, serialization, queuing, and processing delays often produces

472-421: A greater impact on subsequent choices). One could also allow the next function to choose which variable should be assigned when extending a partial candidate, based on the values of the variables already assigned by it. Further improvements can be obtained by the technique of constraint propagation . In addition to retaining minimal recovery values used in backing up, backtracking implementations commonly keep

531-475: A list of integers x = ( x [1], x [2], …, x [ n ]) , each in some range {1, 2, …, m }, that satisfies some arbitrary constraint (Boolean function) F . For this class of problems, the instance data P would be the integers m and n , and the predicate F . In a typical backtracking solution to this problem, one could define a partial candidate as a list of integers c = ( c [1], c [2], …, c [k]) , for any k between 0 and n , that are to be assigned to

590-514: A relatively quick test of whether it can possibly be completed to a valid solution. It is useless, for example, for locating a given value in an unordered table. When it is applicable, however, backtracking is often much faster than brute-force enumeration of all complete candidates, since it can eliminate many candidates with a single test. Backtracking is an important tool for solving constraint satisfaction problems , such as crosswords , verbal arithmetic , Sudoku , and many other puzzles. It

649-425: A set of partial candidates that, in principle, could be completed in various ways to give all the possible solutions to the given problem. The completion is done incrementally, by a sequence of candidate extension steps. Conceptually, the partial candidates are represented as the nodes of a tree structure , the potential search tree. Each partial candidate is the parent of the candidates that differ from it by

SECTION 10

#1732858944643

708-399: A single extension step; the leaves of the tree are the partial candidates that cannot be extended any further. The backtracking algorithm traverses this search tree recursively , from the root down, in depth-first order . At each node c , the algorithm checks whether c can be completed to a valid solution. If it cannot, the whole sub-tree rooted at c is skipped ( pruned ). Otherwise,

767-500: A solution to the given instance P . The algorithm can be modified to stop after finding the first solution, or a specified number of solutions; or after testing a specified number of partial candidates, or after spending a given amount of CPU time. Examples where backtracking can be used to solve puzzles or problems include: The following is an example where backtracking is used for the constraint satisfaction problem : The general constraint satisfaction problem consists in finding

826-416: A standard chessboard so that no queen attacks any other. In the common backtracking approach, the partial candidates are arrangements of k queens in the first k rows of the board, all in different rows and columns. Any partial solution that contains two mutually attacking queens can be abandoned. Backtracking can be applied only for problems which admit the concept of a "partial candidate solution" and

885-504: A vacuum as it does in the cable. This works out to about 5.0 μs of latency for every kilometer. In shorter metro networks, higher latency can be experienced due to extra distance in building risers and cross-connects. To calculate the latency of a connection, one has to know the distance traveled by the fiber, which is rarely a straight line, since it has to traverse geographic contours and obstacles, such as roads and railway tracks, as well as other rights-of-way. Due to imperfections in

944-415: A variable trail, to record value change history. An efficient implementation will avoid creating a variable trail entry between two successive changes when there is no choice point, as the backtracking will erase all of the changes as a single operation. An alternative to the variable trail is to keep a timestamp of when the last change was made to the variable. The timestamp is compared to the timestamp of

1003-572: A way to detect this situation, at least for some candidates c , without enumerating all those m n -tuples. For example, if F is the conjunction of several Boolean predicates, F = F [1] ∧ F [2] ∧ … ∧ F [ p ] , and each F [ i ] depends only on a small subset of the variables x [1], …, x [ n ] , then the reject procedure could simply check the terms F [ i ] that depend only on variables x [1], …, x [ k ] , and return true if any of those terms returns false . In fact, reject needs only check those terms that do depend on x [ k ], since

1062-398: Is instructions per cycle (IPC) and its reciprocal, cycles per instruction (CPI), is another unit of latency. Speedup is dimensionless and defined differently for each type of quantity so that it is a consistent metric. Speedup in latency is defined by the following formula: where Speedup in latency can be predicted from Amdahl's law or Gustafson's law . Speedup in throughput

1121-464: Is a class of algorithms for finding solutions to some computational problems , notably constraint satisfaction problems , that incrementally builds candidates to the solutions, and abandons a candidate ("backtracks") as soon as it determines that the candidate cannot possibly be completed to a valid solution. The classic textbook example of the use of backtracking is the eight queens puzzle , that asks for all arrangements of eight chess queens on

1180-601: Is best illustrated by the following two examples involving air travel . From the point of view of a passenger, latency can be described as follows. Suppose John Doe flies from London to New York . The latency of his trip is the time it takes him to go from his house in England to the hotel he is staying at in New York. This is independent of the throughput of the London-New York air link – whether there were 100 passengers

1239-513: Is better to use specific software, for example: hping , Netperf or Iperf . However, in a non-trivial network, a typical packet will be forwarded over multiple links and gateways, each of which will not begin to forward the packet until it has been completely received. In such a network, the minimal latency is the sum of the transmission delay of each link, plus the forwarding latency of each gateway. In practice, minimal latency also includes queuing and processing delays. Queuing delay occurs when

SECTION 20

#1732858944643

1298-431: Is defined by the formula: where We are testing the effectiveness of a branch predictor on the execution of a program. First, we execute the program with the standard branch predictor on the processor, which yields an execution time of 2.25 seconds. Next, we execute the program with our modified (and hopefully improved) branch predictor on the same processor, which produces an execution time of 1.50 seconds. In both cases

1357-432: Is essentially one of change of perspective or displacement of objects such as the horizon, which takes some time to build up to discernible amounts after the initial acceleration which caused the displacement. A simulator should, therefore, reflect the real-world situation by ensuring that the motion latency is equal to or less than that of the visual system and not the other way round. Backtracking Backtracking

1416-433: Is often the most convenient technique for parsing , for the knapsack problem and other combinatorial optimization problems. It is also the program execution strategy used in the programming languages Icon , Planner and Prolog . Backtracking depends on user-given " black box procedures " that define the problem to be solved, the nature of the partial candidates, and how they are extended into complete candidates. It

1475-423: Is particularly important that the latency of the motion system not be greater than of the visual system, or symptoms of simulator sickness may result. This is because, in the real world, motion cues are those of acceleration and are quickly transmitted to the brain, typically in less than 50 milliseconds; this is followed some milliseconds later by a perception of change in the visual scene. The visual scene change

1534-467: Is possible to reduce the latency to the length of the longest task. If some steps have prerequisites, it becomes more difficult to perform all steps in parallel. In the example above, the requirement to clean the plane before loading passengers results in a minimum latency longer than any single task. Any mechanical process encounters limitations modeled by Newtonian physics . The behavior of disk drives provides an example of mechanical latency. Here, it

1593-435: Is the cache effect resulting from the different memory hierarchies of a modern computer: in parallel computing, not only do the numbers of processors change, but so does the size of accumulated caches from different processors. With the larger accumulated cache size, more or even all of the working set can fit into caches and the memory access time reduces dramatically, which causes the extra speedup in addition to that from

1652-404: Is the delay between the events generated by the hardware clock and the actual transitions of voltage from high to low or low to high. Many desktop operating systems have performance limitations that create additional latency. The problem may be mitigated with real-time extensions and patches such as PREEMPT RT . On embedded systems, the real-time execution of instructions is often supported by

1711-412: Is the delay between when an audio signal enters and when it emerges from a system. Potential contributors to latency in an audio system include analog-to-digital conversion , buffering , digital signal processing , transmission time , digital-to-analog conversion and the speed of sound in air. Video latency refers to the degree of delay between the time a transfer of a video stream is requested and

1770-446: Is the time seek time for the actuator arm to be positioned above the appropriate track and then rotational latency for the data encoded on a platter to rotate from its current position to a position under the disk read-and-write head . Computers run instructions in the context of a process . In the context of computer multitasking , the execution of the process can be postponed if other processes are also executing. In addition,

1829-465: Is therefore a metaheuristic rather than a specific algorithm – although, unlike many other meta-heuristics, it is guaranteed to find all solutions to a finite problem in a bounded amount of time. The term "backtrack" was coined by American mathematician D. H. Lehmer in the 1950s. The pioneer string-processing language SNOBOL (1962) may have been the first to provide a built-in general backtracking facility. The backtracking algorithm enumerates

Speedup - Misplaced Pages Continue

1888-472: Is typically between 0 and 1. Programs with linear speedup and programs running on a single processor have an efficiency of 1, while many difficult-to-parallelize programs have efficiency such as 1/ln( s ) that approaches 0 as the number of processors A = s increases. In engineering contexts, efficiency curves are more often used for graphs than speedup curves, since In marketing contexts, speedup curves are more often used, largely because they go up and to

1947-510: The medium being used to transfer information. In reliable two-way communication systems, latency limits the maximum rate at which information can be transmitted, as there is often a limit on the amount of information that is in-flight at any given moment. Perceptible latency has a strong effect on user satisfaction and usability in the field of human–machine interaction . Online games are sensitive to latency ( lag ), since fast response times to new events occurring during

2006-402: The speed of light . Therefore, every physical system with any physical separation (distance) between cause and effect will experience some sort of latency, regardless of the nature of the stimulation to which it has been exposed. The precise definition of latency depends on the system being observed or the nature of the simulation. In communications , the lower limit of latency is determined by

2065-401: The system being observed. Lag , as it is known in gaming circles , refers to the latency between the input to a simulation and the visual or auditory response, often occurring because of network delay in online games. Latency is physically a consequence of the limited velocity at which any physical interaction can propagate. The magnitude of this velocity is always less than or equal to

2124-412: The above are done consecutively, minimum plane turnaround time is: However, cleaning, refueling and loading the cargo can be done at the same time. Passengers can only be loaded after cleaning is complete. The reduced latency, then, is: The people involved in the turnaround are interested only in the time it takes for their individual tasks. When all of the tasks are done at the same time, however, it

2183-515: The actual computation. An analogous situation occurs when searching large datasets, such as the genomic data searched by BLAST implementations. There the accumulated RAM from each of the nodes in a cluster enables the dataset to move from disk into RAM thereby drastically reducing the time required by e.g. mpiBLAST to search it. Super-linear speedups can also occur when performing backtracking in parallel: an exception in one thread can cause several other threads to backtrack early, before they reach

2242-448: The actual time that transfer begins. Networks that exhibit relatively small delays are known as low-latency networks, while their counterparts are known as high-latency networks. Any individual workflow within a system of workflows can be subject to some type of operational latency. It may even be the case that an individual system may have more than one type of latency, depending on the type of participant or goal-seeking behavior. This

2301-459: The actual tree times the cost of obtaining and processing each node. This fact should be considered when choosing the potential search tree and implementing the pruning test. In order to apply backtracking to a specific class of problems, one must provide the data P for the particular instance of the problem that is to be solved, and six procedural parameters , root , reject , accept , first , next , and output . These procedures should take

2360-405: The algorithm (1) checks whether c itself is a valid solution, and if so reports it to the user; and (2) recursively enumerates all sub-trees of c . The two tests and the children of each node are defined by user-given procedures. Therefore, the actual search tree that is traversed by the algorithm is only a part of the potential tree. The total cost of the algorithm is the number of nodes of

2419-439: The algorithm will still find all solutions, but it will be equivalent to a brute-force search. The accept procedure should return true if c is a complete and valid solution for the problem instance P , and false otherwise. It may assume that the partial candidate c and all its ancestors in the tree have passed the reject test. The general pseudo-code above does not assume that the valid solutions are always leaves of

Speedup - Misplaced Pages Continue

2478-585: The call next ( P , s ) should return the next sibling of node s , in that order. Both functions should return a distinctive "NULL" candidate, if the requested child does not exist. Together, the root , first , and next functions define the set of partial candidates and the potential search tree. They should be chosen so that every solution of P occurs somewhere in the tree, and no partial candidate occurs more than once. Moreover, they should admit an efficient and effective reject predicate. The pseudo-code above will call output for all candidates that are

2537-533: The destination receiving it), or round-trip delay time (the one-way latency from the source to the destination plus the one-way latency from the destination back to the source). Round-trip latency is more often quoted, because it can be measured from a single point. Many software platforms provide a service called ping that can be used to measure round-trip latency. Ping uses the Internet Control Message Protocol (ICMP) echo request which causes

2596-409: The entry of the order (at the vendor's computer) to the transmission of an acknowledgment (from the vendor's computer). Trading using computers has developed to the point where millisecond improvements in network speeds offer a competitive advantage for financial institutions. Network latency in a packet-switched network is measured as either one-way (the time from the source sending a packet to

2655-399: The exception themselves. Super-linear speedups can also occur in parallel implementations of branch-and-bound for optimization: the processing of one node by one processor may affect the work other processors need to do for the other nodes. Latency (engineering) Latency , from a general point of view, is a time delay between the cause and the effect of some physical change in

2714-425: The execution workload is the same and both architectures are not pipelined nor parallel. Using the speedup formula gives We can also measure speedup in instructions per cycle ( IPC ), which is a throughput and the inverse of CPI. Using the speedup formula gives We achieve the same 1.5x speedup, though we measured different quantities. Let S be the speedup of execution of a task and s the speedup of execution of

2773-421: The execution workload is the same. Using our speedup formula, we know Our new branch predictor has provided a 1.5x speedup over the original. We can also measure speedup in cycles per instruction (CPI) which is a latency. First, we execute the program with the standard branch predictor, which yields a CPI of 3. Next, we execute the program with our modified branch predictor, which yields a CPI of 2. In both cases

2832-415: The fiber, light degrades as it is transmitted through it. For distances of greater than 100 kilometers, amplifiers or regenerators are deployed. Latency introduced by these components needs to be taken into account. Satellites in geostationary orbits are far enough away from Earth that communication latency becomes significant – about a quarter of a second for a trip from one ground-based transmitter to

2891-429: The first k variables x [1], x [2], …, x [ k ] . The root candidate would then be the empty list (). The first and next procedures would then be Here length ( c ) is the number of elements in the list c . The call reject ( P , c ) should return true if the constraint F cannot be satisfied by any list of n integers that begins with the k elements of c . For backtracking to be effective, there must be

2950-412: The instance data P as a parameter and should do the following: The backtracking algorithm reduces the problem to the call backtrack ( P , root ( P )), where backtrack is the following recursive procedure: The reject procedure should be a Boolean-valued function that returns true only if it is certain that no possible extension of c is a valid solution for P . If the procedure cannot reach

3009-471: The operating system can schedule when to perform the action that the process is commanding. For example, suppose a process commands that a computer card's voltage output be set high-low-high-low and so on at a rate of 1000 Hz. The operating system schedules the process for each transition (high-low or low-high) based on a hardware clock such as the High Precision Event Timer . The latency

SECTION 50

#1732858944643

3068-424: The part of the task that benefits from the improvement of the resources of an architecture. Linear speedup or ideal speedup is obtained when S = s . When running a task with linear speedup, doubling the local speedup doubles the overall speedup. As this is ideal, it is considered very good scalability . Efficiency is a metric of the utilization of the resources of the improved system defined as Its value

3127-440: The potential search tree. In other words, it admits the possibility that a valid solution for P can be further extended to yield other valid solutions. The first and next procedures are used by the backtracking algorithm to enumerate the children of a node c of the tree, that is, the candidates that differ from c by a single extension step. The call first ( P , c ) should yield the first child of c , in some order; and

3186-480: The recipient to send the received packet as an immediate response, thus it provides a rough way of measuring round-trip delay time. Ping cannot perform accurate measurements, principally because ICMP is intended only for diagnostic or control purposes, and differs from real communication protocols such as TCP . Furthermore, routers and internet service providers might apply different traffic shaping policies to different protocols. For more accurate measurements it

3245-423: The right and thus appear better to the less-informed. Sometimes a speedup of more than A when using A processors is observed in parallel computing , which is called super-linear speedup . Super-linear speedup rarely happens and often confuses beginners, who believe the theoretical maximum speedup should be A when A processors are used. One possible reason for super-linear speedup in low-level computations

3304-403: The satellite and back to another ground-based transmitter; close to half a second for two-way communication from one Earth station to another and then back to the first. Low Earth orbit is sometimes used to cut this delay, at the expense of more complicated satellite tracking on the ground and requiring more satellites in the satellite constellation to ensure continuous coverage. Audio latency

3363-449: The simulator trainee or simulator subject. Latency is sometimes also called transport delay . Some authorities distinguish between latency and transport delay by using the term latency in the sense of the extra time delay of a system over and above the reaction time of the vehicle being simulated, but this requires detailed knowledge of the vehicle dynamics and can be controversial. In simulators with both visual and motion systems, it

3422-420: The terms that depend only on x [1], …, x [ k − 1] will have been tested further up in the search tree. Assuming that reject is implemented as above, then accept ( P , c ) needs only check whether c is complete, that is, whether it has n elements. It is generally better to order the list of variables so that it begins with the most critical ones (i.e. the ones with fewest value options, or which have

3481-400: The time it takes for information to reach the trader, execution of the trader's algorithms to analyze the information and decide a course of action, and the generated action to reach the exchange and get implemented. Hasbrouck and Saar contrast this with the way in which latencies are measured by many trading venues that use much more narrow definitions, such as the processing delay measured from

#642357