Loop Tiling for Parallelism by Jingling Xue (auth.)

Posted by

By Jingling Xue (auth.)

Loop tiling, as some of the most very important compiler optimizations, is helpful for either parallel machines and uniprocessors with a reminiscence hierarchy. This ebook explores using loop tiling for lowering communique price and enhancing parallelism for disbursed reminiscence machines. the writer presents mathematical foundations, investigates loop permutability within the framework of nonsingular loop adjustments, discusses the mandatory machineries required, and offers state of the art effects for locating verbal exchange- and time-minimal tiling offerings. during the e-book, theorems and algorithms are illustrated with a number of examples and diagrams. The thoughts offered in Loop Tiling for Parallelism may be tailored to paintings for a cluster of workstations, and also are without delay appropriate to shared-memory machines as soon as the machines are modeled as BSP (Bulk Synchronous Parallel) machines.
beneficial properties and key themes:

  • targeted overview of the mathematical foundations, together with convex polyhedra and cones;
  • Self-contained therapy of nonsingular loop modifications, code iteration, and entire loop permutability;
  • Tiling loop nests by means of rectangles and parallelepipeds, together with their mathematical definition, dependence research, legality attempt, and code iteration;
  • a whole suite of recommendations for producing SPMD code for a tiled loop nest;
  • up to date effects on tile measurement and form choice for decreasing verbal exchange and enhancing parallelism;
  • End-of-chapter references for additional examining.

Researchers and practitioners interested in optimizing compilers and scholars in complicated desktop structure experiences will locate this a lucid and well-presented reference paintings with various citations to unique sources.

Show description

Read Online or Download Loop Tiling for Parallelism PDF

Best design & architecture books

Web caching and its applications

The decade has noticeable super development in utilization of the realm vast net. internet caching is a expertise aimed toward decreasing the transmission of redundant community site visitors and bettering entry to the internet. the foremost concept in net caching is to cache often- accessed content material in order that it can be used profitably later.

Quality of experience for multimedia : application to content delivery network architecture

According to a convergence of community applied sciences, the subsequent iteration community (NGN) is being deployed to hold prime quality video and voice facts. actually, the convergence of community applied sciences has been pushed through the converging wishes of end-users. The perceived end-to-end caliber is among the major objectives required via clients that has to be assured by means of the community operators and the net provider services, via producer apparatus.

Machine Learning Control – Taming Nonlinear Dynamics and Turbulence

This can be the 1st textbook on a often appropriate regulate procedure for turbulence and different advanced nonlinear platforms. The technique of the booklet employs strong tools of desktop studying for optimum nonlinear keep watch over legislation. This desktop studying keep watch over (MLC) is inspired and exact in Chapters 1 and a couple of.

Extra info for Loop Tiling for Parallelism

Example text

X ~ O} = If a cone is given as a polyhedral cone: C = {xIAx~O} its dual can be finitely generated as follows: C* = cone{ii\, ... , am} where aI, ... , am are the m rows of A. 9. 8. 3 Decomposition of Cones A cone has one vertex, which is the origin every cone C can be uniquely represented as: C = 0, iff it is pointed. L n C By construction, Q is pointed and can thus be expressed as: Q = cone{TI""'~} where TI,' .. ,~are the p extremal rays of Q, which are unique up to multiplication by a positive scalar.

The loops in the transformed loop nest are referred to as the transformed loops. The transformed loop nest of the form do i~ = L~ + 8~, UL step~ do i~ = L~ + 8~, CD =T-l S(i 1 , ... ,in) U~, step~ CD can be constructed using the following algorithm from (Xue, 1994): 1. Apply Fourier-Motzkin elimination to BT-1i' ~ b to generate the loop bounds for each loop variable. Set L~ as a maxima of all lower bounds of i~ and U~ as a minima of all upper bounds of i~. The ceiling and floor operations are introduced wherever appropriate to ensure that a loop bound always evaluates to integer values.

Fourier-Motzkin elimination is also exact in

Download PDF sample

Rated 4.51 of 5 – based on 37 votes