University of Nizhni Novgorod

Faculty of Computational Mathematics & Cybernetics

Teaching Course: CS338. Introduction to Parallel Programming


Exercises

Introduction to Parallel Computer Systems

1.        Give some additional examples of parallel computer systems.

2.        Consider additional methods of classifying computer systems.

3.        Consider the ways of providing cache coherence in shared-memory systems.

4.        Review the software libraries that provide data transmission operations for distributed-memory systems.

5.        Consider the binary tree data transmission network topology.

6.        For each type of data transmission network topology, identify problems that can be implemented on it efficiently.

Parallel Computation Modeling and Analysis

1.       Develop a model and evaluate the speedup and efficiency of parallel computations (a worked example follows the list):

·         For the problem of the scalar product of two vectors

·         For the problem of choosing the maximum and minimum values for the given set of numeric values

·         For the problem of finding the mean value for the given set of numeric values
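
A minimal worked example for this problem family, assuming the usual model in which every elementary operation takes time τ and partial results are combined by a cascade (tree) scheme: summing n values on p processors takes roughly T_p = (n/p + log2 p)·τ against the serial time T_1 ≈ n·τ, so

    S_p = T_1 / T_p ≈ n / (n/p + log2 p),    E_p = S_p / p.

For n = 1024 and p = 16 this gives S_p ≈ 1024 / (64 + 4) ≈ 15.1 and E_p ≈ 0.94. The scalar product, the min/max search and the mean value follow the same pattern, with the corresponding elementwise operation in place of addition.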

2.       Evaluate, according to Amdahl's law, the maximum attainable speedup for the problems given in Exercise 1.
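
For reference, Amdahl's law states that if a fraction f of a computation is inherently sequential, the speedup on p processors is bounded by S_p ≤ 1 / (f + (1 - f)/p) ≤ 1/f; for example, with f = 0.05 no number of processors can yield more than a 20-fold speedup.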

3.       Evaluate the scaled speedup for the problems in Exercise 1.

4.       Construct the isoefficiency function for the problems given in Exercise 1.

5.       Work out a model and make a complete analysis of parallel computation efficiency (speedup, efficiency, maximum attainable efficiency, scaled speedup, the isoefficiency function) for the problem of matrix-vector multiplication.

The Estimation of the Communication Complexity of Parallel Algorithms

1.        Develop algorithms for executing the basic data transmission operations on a network with a three-dimensional grid topology.

2.        Develop algorithms for executing the basic data transmission operations on a network with a binary tree topology.

3.        Use model B from subsection 3.4 for the estimation of the time complexity of data transmission operations. Compare the obtained results.

4.        Use model C from subsection 3.4 for the estimation of the time complexity of data transmission operations. Compare the obtained results.

5.        Develop algorithms for the logical representation of a binary tree on various physical network topologies.

Parallel Programming with MPI

1.        Develop a program for finding the minimum (maximum) value of the vector elements.

2.        Develop a program for computing the scalar product of two vectors.
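
A minimal sketch for Exercises 1 and 2, assuming the vector length is divisible by the number of processes and, for brevity, that each process fills its own block in place (a full solution would distribute real data with MPI_Scatter):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int rank, size;
        const int n = 1024;                  /* illustrative vector length */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int local_n = n / size;              /* assumes size divides n */
        double *x = malloc(local_n * sizeof(double));
        double *y = malloc(local_n * sizeof(double));
        for (int i = 0; i < local_n; i++) {  /* each process fills its block */
            x[i] = 1.0;
            y[i] = 2.0;
        }

        double local_dot = 0.0, dot;
        for (int i = 0; i < local_n; i++)
            local_dot += x[i] * y[i];

        /* combine the partial products; every process receives the result */
        MPI_Allreduce(&local_dot, &dot, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("dot = %f\n", dot);       /* expected 2 * n */
        free(x); free(y);
        MPI_Finalize();
        return 0;
    }

The minimum/maximum search in Exercise 1 is the same pattern with MPI_MIN or MPI_MAX as the reduction operation.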

3.        Develop a program in which two processes repeatedly exchange messages of length n bytes. Carry out the experiments and estimate the dependence of the data transmission time on the message length. Compare it to the theoretical estimations derived from the Hockney model.
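
One possible skeleton of the ping-pong experiment (the message length and the repetition count are illustrative and should be varied):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int rank;
        const int n = 1 << 16;           /* message length in bytes */
        const int reps = 100;
        char *buf = malloc(n);
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) { /* processes 0 and 1 ping-pong */
            if (rank == 0) {
                MPI_Send(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t = (MPI_Wtime() - t0) / (2.0 * reps);  /* one-way time */
        if (rank == 0)
            printf("n = %d bytes: t = %g s\n", n, t);
        free(buf);
        MPI_Finalize();
        return 0;
    }

The Hockney model predicts t(n) ≈ α + n/β, where α is the latency and β the bandwidth; fitting the measured times to this line for a range of n gives the comparison the exercise asks for.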

4.        Prepare variants of the previously developed programs that use different data transmission modes. Compare the execution times of the data transmission operations under the different modes.

5.        Prepare variants of the previously developed programs that use non-blocking data transmission operations. Estimate the amount of computation needed to completely overlap data transmission with computations. Develop a program that has no computation delays caused by waiting for transmitted data.
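
A minimal sketch of the non-blocking pattern, assuming an even number of processes paired as 0-1, 2-3, and so on, and some independent work to hide the transfer behind:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank;
        double send_buf = 1.0, recv_buf = 0.0, local = 0.0;
        MPI_Request reqs[2];
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        int partner = rank ^ 1;          /* assumes an even process count */

        /* start the exchange, then compute while it is in flight */
        MPI_Isend(&send_buf, 1, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&recv_buf, 1, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD, &reqs[1]);

        for (int i = 0; i < 1000000; i++)   /* work independent of recv_buf */
            local += 1e-6;

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  /* recv_buf now usable */
        printf("rank %d: recv = %f, local = %f\n", rank, recv_buf, local);
        MPI_Finalize();
        return 0;
    }

For the last part of the exercise, the amount of independent work should be tuned so that MPI_Waitall returns immediately; measuring the time spent inside it shows whether the overlap is complete.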

6.        Repeat Exercise 3 using the operation of simultaneous data transmission and reception (MPI_Sendrecv). Compare the results of the computational experiments.

7.        Develop a sample program for each collective operation available in MPI.

8.        Develop implementations of the collective operations using point-to-point exchanges among processes. Carry out the computational experiments and compare the execution time of the developed programs to that of the corresponding MPI collective functions.
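
As one illustration for Exercise 8, a broadcast from process 0 can be assembled from point-to-point exchanges along a binomial tree (a sketch; the same idea extends to the other collectives):

    #include <mpi.h>
    #include <stdio.h>

    /* tree broadcast: in round k, every process that already holds the
       data sends it to the process whose rank differs in bit k */
    void my_bcast(double *data, int count, MPI_Comm comm) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        for (int mask = 1; mask < size; mask <<= 1) {
            if (rank < mask) {
                int dest = rank | mask;      /* partner one level down */
                if (dest < size)
                    MPI_Send(data, count, MPI_DOUBLE, dest, 0, comm);
            } else if (rank < (mask << 1)) {
                MPI_Recv(data, count, MPI_DOUBLE, rank ^ mask, 0, comm,
                         MPI_STATUS_IGNORE);
            }
        }
    }

    int main(int argc, char *argv[]) {
        int rank;
        double value = 0.0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) value = 3.14;         /* the root initializes the data */
        my_bcast(&value, 1, MPI_COMM_WORLD);
        printf("rank %d received %f\n", rank, value);
        MPI_Finalize();
        return 0;
    }

Timing my_bcast against MPI_Bcast for different message sizes yields the comparison required by the exercise.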

9.        Develop a program, carry out the experiments and compare the results for different algorithms of data gathering, data processing and data broadcasting (the function MPI_Allreduce).

10.     Develop a sample program for each method of constructing derived data types available in MPI.

11.     Develop a sample program using the data packing and unpacking functions. Carry out the experiments and compare the results to those obtained when derived data types are used.

12.     Develop the derived data types for the rows, columns and diagonals of matrices.
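
A sketch of the datatype constructions for Exercise 12, assuming an N×N matrix of doubles stored rowwise in one contiguous array:

    #include <mpi.h>

    #define N 8   /* illustrative matrix order */

    int main(int argc, char *argv[]) {
        MPI_Datatype row_t, col_t, diag_t;
        MPI_Init(&argc, &argv);

        /* a row: N contiguous doubles */
        MPI_Type_contiguous(N, MPI_DOUBLE, &row_t);
        /* a column: N doubles with stride N */
        MPI_Type_vector(N, 1, N, MPI_DOUBLE, &col_t);
        /* the main diagonal: N doubles with stride N + 1 */
        MPI_Type_vector(N, 1, N + 1, MPI_DOUBLE, &diag_t);
        MPI_Type_commit(&row_t);
        MPI_Type_commit(&col_t);
        MPI_Type_commit(&diag_t);

        /* the types are then used in ordinary transfers; for example,
           MPI_Send(&a[0][2], 1, col_t, dest, 0, MPI_COMM_WORLD) would
           transmit the third column of a matrix double a[N][N] */

        MPI_Type_free(&row_t);
        MPI_Type_free(&col_t);
        MPI_Type_free(&diag_t);
        MPI_Finalize();
        return 0;
    }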

13.     Develop a sample program for each of the discussed functions of managing communicators and processes.

14.     Develop a program for presenting a set of processes as a rectangular grid. Create communicators for each row and each column of processes. Perform a collective operation both over all the processes and over one of the created communicators. Compare the operation execution times.
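
A possible skeleton for Exercise 14 (MPI_Dims_create picks a balanced grid for any process count):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, size, dims[2] = {0, 0}, periods[2] = {0, 0};
        MPI_Comm grid_comm, row_comm, col_comm;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        MPI_Dims_create(size, 2, dims);          /* balanced p1 x p2 grid */
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &grid_comm);

        int keep[2];
        keep[0] = 0; keep[1] = 1;                /* vary the column index */
        MPI_Cart_sub(grid_comm, keep, &row_comm);   /* one comm per row */
        keep[0] = 1; keep[1] = 0;                /* vary the row index */
        MPI_Cart_sub(grid_comm, keep, &col_comm);   /* one comm per column */

        /* time the same reduction over all processes and over one row */
        double v = rank, s, t0, t_all, t_row;
        t0 = MPI_Wtime();
        MPI_Allreduce(&v, &s, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        t_all = MPI_Wtime() - t0;
        t0 = MPI_Wtime();
        MPI_Allreduce(&v, &s, 1, MPI_DOUBLE, MPI_SUM, row_comm);
        t_row = MPI_Wtime() - t0;
        if (rank == 0)
            printf("grid %dx%d: all %g s, row %g s\n",
                   dims[0], dims[1], t_all, t_row);
        MPI_Finalize();
        return 0;
    }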

15.     Study independently how data is transmitted among the processes of different communicators, and develop sample programs for it.

16.     Develop a sample program for the Cartesian topology.

17.     Develop a sample program for a graph topology.

18.     Develop subprograms for creating a set of additional virtual topologies (a star, a tree etc.).

Parallel Method Development

1.        Perform the implementation of a cascade scheme for computing a sum of numerical values (see Section 2) and compare the execution time of this program with that of the MPI function MPI_Bcast.
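
A sketch of the cascade summation over the processes, assuming each process starts with a single partial value (the scheme works for any number of processes):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double sum = rank + 1.0;     /* each process contributes one value */

        /* cascade: in round k, every process with bit k set sends its
           partial sum to the partner with that bit cleared and drops out */
        for (int mask = 1; mask < size; mask <<= 1) {
            if (rank & mask) {
                MPI_Send(&sum, 1, MPI_DOUBLE, rank ^ mask, 0, MPI_COMM_WORLD);
                break;               /* this process is finished */
            } else if ((rank | mask) < size) {
                double partial;
                MPI_Recv(&partial, 1, MPI_DOUBLE, rank | mask, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                sum += partial;
            }
        }
        if (rank == 0)               /* the root ends up with the total */
            printf("sum = %f (expected %f)\n", sum, size * (size + 1) / 2.0);
        MPI_Finalize();
        return 0;
    }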

2.        Implement the discussed methods of the multi-node gather operation and compare their execution times. Relate the obtained time characteristics to the available theoretical estimations. Compare them with the execution time of the MPI function MPI_Allgather.

3.        Design a scheme of parallel computations, using the methodology of designing and developing parallel methods described in this section:

·         For the problem of finding the maximum value among the minimal elements of the matrix rows, i.e. computing max_i min_j a[i][j], 1 ≤ i, j ≤ N (such calculations occur in solving matrix game problems); pay special attention to the situation when the number of processors exceeds the matrix order, i.e. p > N.

·         For the problem of computing a definite integral using the method of rectangles (a description of this numerical integration method is given, for instance, in Kahaner, Moler and Nash (1988)); a minimal program sketch follows this list.
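
A minimal sketch of the integration problem once the computation scheme has been designed (the integrand, the interval and the number of rectangles are illustrative; 4/(1 + x²) on [0, 1] integrates to π, which is convenient for checking):

    #include <mpi.h>
    #include <stdio.h>
    #include <math.h>

    static double f(double x) { return 4.0 / (1.0 + x * x); }

    int main(int argc, char *argv[]) {
        int rank, size;
        const long n = 1000000;          /* number of rectangles */
        const double a = 0.0, b = 1.0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double h = (b - a) / n, local = 0.0, total;
        /* midpoint rectangles, interleaved over the processes */
        for (long i = rank; i < n; i += size)
            local += f(a + (i + 0.5) * h);
        local *= h;

        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("integral = %.10f (error %.2e)\n",
                   total, fabs(total - 3.14159265358979));
        MPI_Finalize();
        return 0;
    }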

4.        Develop parallel programs for the parallel methods proposed in Exercise 3.

5.        Design a scheme of parallel computations for the problem of matrix-vector multiplication, using the methodology of designing and developing parallel methods described in this section.

Parallel Methods for Matrix-Vector Multiplication

1.        Develop the implementation of the parallel algorithm based on the columnwise striped matrix decomposition. Estimate theoretically the algorithm execution time. Carry out the computational experiments. Compare the results of the experiments with the theoretical estimations.
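
A compact sketch of the columnwise scheme, assuming the matrix order is a multiple of the number of processes and each process generates its own column block in place (a full solution would distribute a real matrix):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int rank, size;
        const int n = 8;                 /* illustrative matrix order */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int local_n = n / size;          /* columns per process */

        /* A_cols: the n x local_n block of columns owned by this process,
           stored rowwise; x_local: the matching entries of x */
        double *A_cols = malloc(n * local_n * sizeof(double));
        double *x_local = malloc(local_n * sizeof(double));
        double *partial = malloc(n * sizeof(double));
        double *y = malloc(n * sizeof(double));
        for (int j = 0; j < local_n; j++) {
            x_local[j] = 1.0;
            for (int i = 0; i < n; i++)
                A_cols[i * local_n + j] = i + 1.0;   /* simple test data */
        }

        /* each process forms a full-length partial product from its columns */
        for (int i = 0; i < n; i++) {
            partial[i] = 0.0;
            for (int j = 0; j < local_n; j++)
                partial[i] += A_cols[i * local_n + j] * x_local[j];
        }
        /* summing the partial vectors yields y = A x on every process */
        MPI_Allreduce(partial, y, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)                   /* expected y[i] = n * (i + 1) */
            printf("y[0] = %f\n", y[0]);
        free(A_cols); free(x_local); free(partial); free(y);
        MPI_Finalize();
        return 0;
    }

MPI_Reduce_scatter can replace MPI_Allreduce when each process needs only its own block of y, which lowers the communication volume.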

2.        Develop the implementation of the parallel algorithm based on the checkerboard block decomposition. Estimate theoretically the algorithm execution time. Carry out the computational experiments. Compare the results of the experiments with the theoretical estimations.

Parallel Methods for Matrix Multiplication

1.        Develop the implementation of two block-striped algorithms of matrix multiplication. Compare the execution time of these algorithms.

2.        Develop the Cannon algorithm implementation. Formulate the theoretical estimation of the algorithm execution time. Execute the computational experiments. Compare the experimental results with the theoretical estimations.

3.        Develop the implementation of the checkerboard block algorithm of matrix multiplication, which may be carried out for rectangular processor grids.

4.        Develop the implementation of matrix multiplication using the previously developed programs of matrix-vector multiplication.

Parallel Methods for Solving Linear Equation Systems

1.        Analyze the efficiency of the parallel computations for the Gaussian elimination and the back substitution stages of the Gauss method separately. At which stage is the decrease in parallel efficiency greater?

2.        Develop the parallel variant of the Gauss method based on the columnwise block-striped matrix decomposition. Estimate theoretically the execution time of the algorithm. Carry out the computational experiments. Compare the experimental results with the theoretical estimations obtained previously.

3.        Develop the implementation of the parallel conjugate gradient method. Estimate theoretically the execution time of the algorithm. Carry out the computational experiments. Compare the experimental results with the theoretical estimations obtained previously.

4.        Develop the parallel variants of the Jacobi and Gauss-Seidel methods for solving linear equation systems (see, for instance, Kumar et al. (1994) or Quinn (2004)). Estimate theoretically the execution time of the algorithms. Carry out the computational experiments. Compare the experimental results with the theoretical estimations obtained previously.
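
A minimal sketch of one possible parallel Jacobi scheme, assuming a rowwise block distribution of the matrix, a replicated solution vector, and a diagonally dominant test system (sizes and iteration count are illustrative; a real implementation would test convergence):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int rank, size;
        const int n = 8, iters = 50;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int local_n = n / size;          /* rows per process */

        double *A = malloc(local_n * n * sizeof(double));
        double *b = malloc(local_n * sizeof(double));
        double *x = calloc(n, sizeof(double));
        double *x_loc = malloc(local_n * sizeof(double));

        /* diagonally dominant test system */
        for (int i = 0; i < local_n; i++) {
            int gi = rank * local_n + i; /* global row index */
            for (int j = 0; j < n; j++)
                A[i * n + j] = (j == gi) ? n + 1.0 : 1.0;
            b[i] = gi + 1.0;
        }

        for (int it = 0; it < iters; it++) {
            /* Jacobi sweep over the local rows using the old global x */
            for (int i = 0; i < local_n; i++) {
                int gi = rank * local_n + i;
                double s = b[i];
                for (int j = 0; j < n; j++)
                    if (j != gi)
                        s -= A[i * n + j] * x[j];
                x_loc[i] = s / A[i * n + gi];
            }
            /* rebuild the replicated x from all the local pieces */
            MPI_Allgather(x_loc, local_n, MPI_DOUBLE,
                          x, local_n, MPI_DOUBLE, MPI_COMM_WORLD);
        }
        if (rank == 0)
            printf("x[0] = %f\n", x[0]);
        MPI_Finalize();
        return 0;
    }

The Gauss-Seidel variant uses the freshest available values within a sweep, which serializes the updates; its parallel versions therefore rely on orderings such as red-black colouring.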

Parallel Methods for Data Sorting

1.        Develop the implementation of the parallel bubble sort algorithm. Carry out the necessary experiments. Formulate the theoretical estimations of parallel computation efficiency. Compare the obtained theoretical estimations with the results of the experiments.
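
The parallel bubble sort is commonly realized as the odd-even transposition scheme; a minimal sketch with a single key per process (a practical version keeps a sorted block per process and exchanges whole blocks in a compare-split step):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int key = (rank * 7919) % 1000;  /* arbitrary unsorted input */

        for (int phase = 0; phase < size; phase++) {
            /* even phases pair (0,1),(2,3),...; odd phases pair (1,2),(3,4),... */
            int partner = (phase % 2 == rank % 2) ? rank + 1 : rank - 1;
            if (partner < 0 || partner >= size)
                continue;                /* no partner at the boundary */
            int other;
            MPI_Sendrecv(&key, 1, MPI_INT, partner, 0,
                         &other, 1, MPI_INT, partner, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            /* the lower rank keeps the smaller key, the higher the larger */
            if ((rank < partner && other < key) ||
                (rank > partner && other > key))
                key = other;
        }
        printf("rank %d: key %d\n", rank, key);  /* keys ascend with rank */
        MPI_Finalize();
        return 0;
    }

After size phases the keys are in nondecreasing order of rank, mirroring the O(n) phases of the sequential bubble sort.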

2.        Develop the implementation of the parallel quick sort algorithm using one of the described schemes. Determine the values of the latency, the network bandwidth and the execution time of the basic sorting operation. Derive estimations of the speedup and efficiency of the method. Compare the obtained theoretical estimations with the results of the experiments.

3.        Design a parallel computation scheme for the well-known merge sort algorithm (a detailed description of the method may be found, for instance, in Knuth (1997) or Cormen et al. (2001)). Develop the implementation of the developed algorithm and formulate the necessary theoretical estimations of the method complexity.

Parallel Methods for Graph Calculations

1.        Develop the Floyd parallel algorithm using the given program code. Execute the computational experiments. Formulate the theoretical estimations. Compare the theoretical estimations with the experimental data.
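
For orientation, a compact sketch of the rowwise-striped Floyd scheme (a small path graph generated in place stands in for real input; the number of vertices is assumed to be a multiple of the process count):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define INF 1.0e9

    int main(int argc, char *argv[]) {
        int rank, size;
        const int n = 8;                 /* illustrative number of vertices */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int local_n = n / size;          /* rows of the distance matrix per process */

        double *dist = malloc(local_n * n * sizeof(double));
        double *row_k = malloc(n * sizeof(double));

        /* test graph: a path with unit edge weights i -> i+1 */
        for (int i = 0; i < local_n; i++) {
            int gi = rank * local_n + i;
            for (int j = 0; j < n; j++)
                dist[i * n + j] = (j == gi) ? 0.0 : (j == gi + 1 ? 1.0 : INF);
        }

        for (int k = 0; k < n; k++) {
            int owner = k / local_n;     /* process holding row k */
            if (rank == owner)
                memcpy(row_k, &dist[(k % local_n) * n], n * sizeof(double));
            MPI_Bcast(row_k, n, MPI_DOUBLE, owner, MPI_COMM_WORLD);
            for (int i = 0; i < local_n; i++)      /* relax the local rows */
                for (int j = 0; j < n; j++)
                    if (dist[i * n + k] + row_k[j] < dist[i * n + j])
                        dist[i * n + j] = dist[i * n + k] + row_k[j];
        }
        if (rank == 0)                   /* path graph: dist(0, n-1) = n - 1 */
            printf("dist(0, %d) = %f\n", n - 1, dist[n - 1]);
        MPI_Finalize();
        return 0;
    }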

2.        Develop software implementation for the Prim parallel algorithm. Execute the computational experiments. Formulate the theoretical estimations. Compare the theoretical estimations with the experimental data.

3.        Develop a software implementation of the Kernighan-Lin algorithm. Assess the possibilities for parallelizing the algorithm.

Parallel Methods for Partial Differential Equations

1.        Develop the first and the second variants of the Gauss-Seidel parallel algorithm. Carry out the computational experiments and compare the execution time.
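
For reference, assuming the Dirichlet problem for the Poisson equation on a uniform grid with step h, both variants are built around the standard five-point update

    u[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1] - h*h*f[i][j]),

iterated until the largest change of u over a sweep falls below a given tolerance; the two parallel variants differ in how the grid sweeps are ordered and distributed among the processors.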

2.        Implement the parallel algorithm based on the wavefront computation scheme. Develop the implementation of the parallel algorithm, in which the block-oriented approach to the wavefront processing method is applied. Carry out the computational experiments and compare the execution time.

3.        Develop an implementation of a parallel computation job queue for shared-memory systems. Ensure that neighboring blocks are processed on the same processors.