University of Nizhni Novgorod
Faculty of Computational Mathematics & Cybernetics
Teaching Course: CS338. Introduction to Parallel Programming
Exercises

Introduction to Parallel Computer Systems
1. Give some additional examples of parallel computer systems.
2. Consider some additional methods of classifying computer systems.
3. Consider the ways of providing cache coherence in shared memory systems.
4. Review the software libraries that provide data transmission operations for distributed memory systems.
5. Consider the binary tree data transmission network topology.
6. For each type of data transmission network topology, choose problems that can be implemented efficiently on it.
Parallel Computation Modeling and Analysis
1. Develop a model and evaluate the speedup and efficiency of the parallel computations:
· For the problem of computing the scalar product of two vectors,
· For the problem of finding the maximum and minimum values in a given set of numeric values,
· For the problem of finding the mean value of a given set of numeric values.
2. Evaluate, according to Amdahl's law, the maximum attainable speedup for the problems given in Exercise 1.
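Amdahl's law bounds the speedup in terms of the fraction f of the computation that must be performed sequentially:

$$S_p \;\le\; \frac{1}{f + (1-f)/p} \;\le\; \frac{1}{f}.$$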
3. Evaluate the scaled speedup for the problems given in Exercise 1.
4. Construct the isoefficiency function for the problems given in Exercise 1.
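As a reminder of the construction: the isoefficiency function is obtained by fixing the efficiency and solving for the problem size as a function of p. The line below sketches it for the cascade summation model used above (the overhead expression is an assumption of that model):

$$E = \frac{T_1}{p\,T_p} = \mathrm{const} \;\Rightarrow\; T_1 = K\,(p\,T_p - T_1) \approx K\,p\log_2 p \;\Rightarrow\; n = \Theta(p\log_2 p).$$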
5. Work out a model and make a complete analysis of the parallel computation efficiency (speedup, efficiency, maximum attainable speedup, scaled speedup, isoefficiency function) for the problem of matrix-vector multiplication.
The Estimation of the Communication Complexity of Parallel Algorithms
1. Develop algorithms for executing the basic data transmission operations on a network topology in the form of a three-dimensional grid.
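A quick sanity check for such algorithms is the diameter bound. For a three-dimensional grid with $\sqrt[3]{p}$ nodes per dimension and no wraparound links, a message crosses at most

$$d = 3\left(\sqrt[3]{p} - 1\right)$$

links, so under a store-and-forward model with latency $t_s$ and per-word transfer time $t_w$ (notation assumed here, not taken from Subsection 3.4) a point-to-point transfer of an m-word message takes roughly $d\,(t_s + m\,t_w)$ in the worst case.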
2. Develop algorithms for executing the basic data transmission operations on a network topology in the form of a binary tree.
3. Use model B from Subsection 3.4 to estimate the time complexity of the data transmission operations. Compare the obtained results.
4. Use model C from Subsection 3.4 to estimate the time complexity of the data transmission operations. Compare the obtained results.
5. Develop algorithms for the logical representation of the binary tree on various physical network topologies.
Parallel Programming with MPI
1. Develop a program for finding the minimum (maximum) value of the vector elements.
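A possible starting point for this exercise is sketched below: the vector is scattered from process 0 and the block minima are combined with MPI_Reduce. The vector size N and its initialization are placeholder choices made here for illustration.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000000  /* illustrative vector size */

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local_n = N / size;                 /* assume N % size == 0 for simplicity */
    double *data = NULL, *local = malloc(local_n * sizeof(double));
    if (rank == 0) {                        /* initialize the full vector on the root */
        data = malloc(N * sizeof(double));
        for (int i = 0; i < N; i++) data[i] = rand() / (double)RAND_MAX;
    }
    MPI_Scatter(data, local_n, MPI_DOUBLE, local, local_n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    double local_min = local[0];            /* minimum of this process's block */
    for (int i = 1; i < local_n; i++)
        if (local[i] < local_min) local_min = local[i];

    double global_min;                      /* combine the block minima on the root */
    MPI_Reduce(&local_min, &global_min, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("min = %f\n", global_min);

    free(local); free(data);
    MPI_Finalize();
    return 0;
}
```

The maximum is obtained by replacing MPI_MIN with MPI_MAX.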
2. Develop a program for computing the scalar product of two vectors.
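The same reduction pattern applies here; a minimal sketch, with each process generating its own blocks of the two vectors (placeholder data) and MPI_Allreduce delivering the result to every process:

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000000  /* illustrative total vector length */

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local_n = N / size;   /* each process owns a contiguous block of both vectors */
    double *x = malloc(local_n * sizeof(double));
    double *y = malloc(local_n * sizeof(double));
    for (int i = 0; i < local_n; i++) { x[i] = 1.0; y[i] = 2.0; }  /* placeholder data */

    double local_dot = 0.0;   /* partial scalar product over the local block */
    for (int i = 0; i < local_n; i++) local_dot += x[i] * y[i];

    double dot;               /* every process obtains the full scalar product */
    MPI_Allreduce(&local_dot, &dot, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) printf("(x, y) = %f\n", dot);

    free(x); free(y);
    MPI_Finalize();
    return 0;
}
```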
3. Develop a program in which two processes repeatedly exchange messages of length n bytes. Carry out the experiments and estimate the dependence of the execution time of the data transmission operation on the message length. Compare it to the theoretical estimations obtained according to the Hockney model.
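As a starting point for the measurements, the sketch below times a ping-pong exchange with MPI_Wtime; under the Hockney model the one-way transfer time is usually written as $t(m) = \alpha + m/\beta$ with latency $\alpha$ and bandwidth $\beta$. The message lengths and repetition count are illustrative choices; run with exactly two processes.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define REPS 1000  /* repetitions per message length */

int main(int argc, char *argv[]) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int n = 1; n <= 1 << 20; n <<= 1) {   /* message lengths 1 B .. 1 MB */
        char *buf = malloc(n);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {   /* ping-pong: send, then wait for the echo */
                MPI_Send(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t = (MPI_Wtime() - t0) / (2.0 * REPS);  /* one-way time estimate */
        if (rank == 0) printf("%8d bytes: %e s\n", n, t);
        free(buf);
    }
    MPI_Finalize();
    return 0;
}
```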
4. Prepare variants of the previously developed programs using the different modes of data transmission. Compare the execution time of the data transmission operations under the different modes.
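The modes in question are the standard, synchronous, buffered and ready sends. The sketch below (intended for a run with exactly two processes) shows the corresponding calls, including the user buffer that MPI_Bsend requires and the pre-posted receive that MPI_Rsend requires:

```c
#include <mpi.h>
#include <stdlib.h>

/* Demonstrates the four point-to-point send modes; run with two processes. */
int main(int argc, char *argv[]) {
    int rank, n = 1024;                      /* illustrative message length */
    char *buf = malloc(n);
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send (buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);  /* standard mode */
        MPI_Ssend(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);  /* synchronous mode */

        int bsize = n + MPI_BSEND_OVERHEAD;   /* buffered mode needs attached storage */
        char *attached = malloc(bsize);
        MPI_Buffer_attach(attached, bsize);
        MPI_Bsend(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        MPI_Buffer_detach(&attached, &bsize);
        free(attached);

        MPI_Barrier(MPI_COMM_WORLD);          /* ensure the matching receive is posted */
        MPI_Rsend(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);  /* ready mode */
    } else if (rank == 1) {
        for (int i = 0; i < 3; i++)
            MPI_Recv(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Request req;                      /* post the receive before MPI_Rsend starts */
        MPI_Irecv(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}
```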
5. Prepare variants of the previously developed programs using the non-blocking data transmission operations. Estimate the amount of computational work needed to completely overlap data transmission and computations. Develop a program that has no computation delays caused by waiting for the transmitted data.
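The usual overlap pattern is: start the exchange with MPI_Isend/MPI_Irecv, perform computation that does not depend on the incoming data, then complete the exchange with MPI_Waitall. A minimal sketch (ring exchange with placeholder work):

```c
#include <mpi.h>
#include <stdlib.h>

#define N 100000  /* illustrative buffer length */

/* Placeholder for computation that does not depend on the incoming message. */
static double independent_work(int iters) {
    double s = 0.0;
    for (int i = 0; i < iters; i++) s += 1.0 / (i + 1.0);
    return s;
}

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *sendbuf = malloc(N * sizeof(double));
    double *recvbuf = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) sendbuf[i] = rank;

    int next = (rank + 1) % size, prev = (rank + size - 1) % size;
    MPI_Request reqs[2];
    /* 1. Start the ring exchange: both calls return immediately. */
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, N, MPI_DOUBLE, next, 0, MPI_COMM_WORLD, &reqs[1]);
    /* 2. Overlap: compute while the messages are in flight. */
    double s = independent_work(1000000);
    /* 3. Complete the exchange before touching recvbuf. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    (void)s;
    free(sendbuf); free(recvbuf);
    MPI_Finalize();
    return 0;
}
```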
6. Do Exercise 3 using the operation of simultaneous data transmission and reception (the function MPI_Sendrecv). Compare the results of the computational experiments.
7. Develop a sample program for each collective operation available in MPI.
8. Develop implementations of the collective operations using point-to-point exchanges among the processes. Carry out the computational experiments and compare the execution time of the developed programs to that of the MPI functions for collective operations.
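As one possible shape for such an implementation, the sketch below re-implements the broadcast with point-to-point messages over a binomial tree, using the usual relative-rank construction; this is an illustration, not the MPI library's actual algorithm.

```c
#include <mpi.h>
#include <stdio.h>

/* Broadcast from `root` with point-to-point messages over a binomial tree. */
static void tree_bcast(void *buf, int count, MPI_Datatype type,
                       int root, MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int vrank = (rank - root + size) % size;   /* relative rank: root becomes 0 */

    int mask;
    /* receive from the parent (the process that differs in the lowest set bit) */
    for (mask = 1; mask < size; mask <<= 1) {
        if (vrank & mask) {
            int src = (vrank - mask + root) % size;
            MPI_Recv(buf, count, type, src, 0, comm, MPI_STATUS_IGNORE);
            break;
        }
    }
    /* forward to the children */
    for (mask >>= 1; mask >= 1; mask >>= 1) {
        if (vrank + mask < size) {
            int dst = (vrank + mask + root) % size;
            MPI_Send(buf, count, type, dst, 0, comm);
        }
    }
}

int main(int argc, char *argv[]) {
    int rank, x = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) x = 42;                     /* value to broadcast */
    tree_bcast(&x, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d: x = %d\n", rank, x);
    MPI_Finalize();
    return 0;
}
```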
9. Develop a program, carry out the experiments and compare the results for different algorithms of data gathering, processing and broadcasting (the function MPI_Allreduce).
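Besides the obvious composition of a reduction with a broadcast (MPI_Reduce followed by MPI_Bcast), a classical alternative is recursive doubling. The sketch below reduces a single double by summation and assumes the number of processes is a power of two:

```c
#include <mpi.h>
#include <stdio.h>

/* All-reduce (sum of one double) by recursive doubling: at step k every
   process exchanges its partial result with partner rank ^ 2^k.
   Sketch; assumes the number of processes is a power of two. */
static double allreduce_sum(double value, MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    for (int mask = 1; mask < size; mask <<= 1) {
        int partner = rank ^ mask;
        double recvd;
        MPI_Sendrecv(&value, 1, MPI_DOUBLE, partner, 0,
                     &recvd, 1, MPI_DOUBLE, partner, 0,
                     comm, MPI_STATUS_IGNORE);
        value += recvd;   /* both partners now hold the combined sum */
    }
    return value;
}

int main(int argc, char *argv[]) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double total = allreduce_sum((double)rank, MPI_COMM_WORLD);
    printf("rank %d: total = %f\n", rank, total);
    MPI_Finalize();
    return 0;
}
```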
10. Develop a sample program for each method of
constructing derived data types available in MPI.
11. Develop a sample program using the data packing and unpacking functions. Carry out the experiments and compare the results with those obtained using the derived data types.
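A minimal sketch of the packing functions: an int and an array of doubles travel in one MPI_PACKED message. The payload is a placeholder; intended for a two-process run.

```c
#include <mpi.h>
#include <stdio.h>

/* Sends an int and three doubles in one message using MPI_Pack/MPI_Unpack. */
int main(int argc, char *argv[]) {
    int rank;
    char buffer[256];           /* packing buffer; large enough for this data */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int n = 3;
        double v[3] = {1.0, 2.0, 3.0};
        int pos = 0;            /* running offset inside the packing buffer */
        MPI_Pack(&n, 1, MPI_INT, buffer, sizeof(buffer), &pos, MPI_COMM_WORLD);
        MPI_Pack(v, 3, MPI_DOUBLE, buffer, sizeof(buffer), &pos, MPI_COMM_WORLD);
        MPI_Send(buffer, pos, MPI_PACKED, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int n, pos = 0;
        double v[3];
        MPI_Recv(buffer, sizeof(buffer), MPI_PACKED, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Unpack(buffer, sizeof(buffer), &pos, &n, 1, MPI_INT, MPI_COMM_WORLD);
        MPI_Unpack(buffer, sizeof(buffer), &pos, v, n, MPI_DOUBLE, MPI_COMM_WORLD);
        printf("received n = %d, v[0] = %f\n", n, v[0]);
    }
    MPI_Finalize();
    return 0;
}
```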
12. Develop the derived data types for the rows, columns
and diagonals of matrices.
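For a square matrix of order N stored row-major, a row is simply N contiguous elements, while a column and the main diagonal are strided patterns that MPI_Type_vector describes with strides N and N+1 respectively. A sketch (N is an illustrative constant; two-process run):

```c
#include <mpi.h>
#include <stdio.h>

#define N 4   /* illustrative matrix order; the matrix is stored row-major */

int main(int argc, char *argv[]) {
    int rank;
    double a[N][N];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Datatype column, diagonal;
    /* N blocks of 1 element with stride N: one matrix column */
    MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);
    /* N blocks of 1 element with stride N+1: the main diagonal */
    MPI_Type_vector(N, 1, N + 1, MPI_DOUBLE, &diagonal);
    MPI_Type_commit(&diagonal);

    if (rank == 0) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) a[i][j] = 10 * i + j;
        MPI_Send(&a[0][0], 1, column, 1, 0, MPI_COMM_WORLD);   /* column 0 */
        MPI_Send(&a[0][0], 1, diagonal, 1, 1, MPI_COMM_WORLD); /* main diagonal */
    } else if (rank == 1) {
        double col[N], diag[N];   /* received as plain contiguous arrays */
        MPI_Recv(col, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(diag, N, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("col[1] = %f, diag[1] = %f\n", col[1], diag[1]);
    }

    MPI_Type_free(&column);
    MPI_Type_free(&diagonal);
    MPI_Finalize();
    return 0;
}
```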
13. Develop a sample program for each of the discussed
functions of managing communicators and processes.
14. Develop a program that presents a set of processes as a rectangular grid. Create communicators for each row and each column of processes. Perform a collective operation over all the processes and over one of the created communicators. Compare the operation execution times.
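A sketch of the decomposition using MPI_Comm_split, assuming for simplicity that the number of processes is a perfect square; the same MPI_Allreduce is then executed over the whole grid and over one row (link with -lm for sqrt):

```c
#include <mpi.h>
#include <math.h>
#include <stdio.h>

/* Splits MPI_COMM_WORLD into row and column communicators of a q x q grid. */
int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int q = (int)(sqrt((double)size) + 0.5);  /* grid side */
    int row = rank / q, col = rank % q;       /* grid coordinates of this process */

    MPI_Comm row_comm, col_comm;
    MPI_Comm_split(MPI_COMM_WORLD, row, col, &row_comm);  /* same color = same row */
    MPI_Comm_split(MPI_COMM_WORLD, col, row, &col_comm);  /* same color = same column */

    /* The same collective over the whole grid and over one row. */
    double x = rank, sum_all, sum_row;
    MPI_Allreduce(&x, &sum_all, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(&x, &sum_row, 1, MPI_DOUBLE, MPI_SUM, row_comm);
    if (rank == 0) printf("sum_all = %f, sum_row = %f\n", sum_all, sum_row);

    MPI_Comm_free(&row_comm);
    MPI_Comm_free(&col_comm);
    MPI_Finalize();
    return 0;
}
```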
15. Study independently how data is transmitted among the processes of different communicators and develop the corresponding sample programs.
16. Develop a sample program for the Cartesian topology.
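A minimal sketch of the Cartesian topology functions: MPI_Dims_create chooses the grid dimensions, MPI_Cart_create builds a periodic 2D topology, and MPI_Cart_coords/MPI_Cart_shift query the coordinates and neighbors of each process:

```c
#include <mpi.h>
#include <stdio.h>

/* Creates a periodic 2D Cartesian topology and queries the neighbors
   of each process along dimension 0. */
int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int dims[2] = {0, 0};            /* let MPI choose a balanced grid */
    MPI_Dims_create(size, 2, dims);
    int periods[2] = {1, 1};         /* wraparound links in both dimensions */

    MPI_Comm grid;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid);
    MPI_Comm_rank(grid, &rank);      /* ranks may be reordered in the new communicator */

    int coords[2], up, down;
    MPI_Cart_coords(grid, rank, 2, coords);
    MPI_Cart_shift(grid, 0, 1, &up, &down);  /* neighbors along dimension 0 */
    printf("rank %d at (%d,%d): up=%d down=%d\n",
           rank, coords[0], coords[1], up, down);

    MPI_Comm_free(&grid);
    MPI_Finalize();
    return 0;
}
```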
17. Develop a sample program for a graph topology.
18. Develop subprograms for creating a set of additional virtual topologies (a star, a tree, etc.).
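For Exercises 17 and 18, the sketch below builds a star as an MPI graph topology: node 0 is connected to every other node, and the index/edges arrays follow the MPI_Graph_create conventions:

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Builds a star-shaped graph topology: node 0 is linked to all other nodes. */
int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* index[i] = total number of edges of nodes 0..i:
       node 0 has size-1 edges, every other node exactly 1. */
    int *index = malloc(size * sizeof(int));
    int *edges = malloc(2 * (size - 1) * sizeof(int));
    index[0] = size - 1;
    for (int i = 1; i < size; i++) index[i] = index[i - 1] + 1;
    for (int i = 0; i < size - 1; i++) edges[i] = i + 1;    /* edges of the center */
    for (int i = 1; i < size; i++) edges[size - 2 + i] = 0; /* each leaf -> center */

    MPI_Comm star;
    MPI_Graph_create(MPI_COMM_WORLD, size, index, edges, 0, &star);
    MPI_Comm_rank(star, &rank);

    int nneighbors;
    MPI_Graph_neighbors_count(star, rank, &nneighbors);
    printf("rank %d has %d neighbor(s)\n", rank, nneighbors);

    MPI_Comm_free(&star);
    free(index); free(edges);
    MPI_Finalize();
    return 0;
}
```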
Parallel Method Development
1. Implement the cascade scheme of computing a sum of numerical values (see Section 2) and compare the execution time of this program with that of the function MPI_Bcast of MPI.
2. Implement the discussed methods of the multinode gather operation and compare their execution times. Relate the obtained time characteristics to the available theoretical estimations. Compare them with the execution time of the function MPI_Allgather of MPI.
3. Design a scheme of parallel computations, using the methodology of designing and developing parallel methods described in this section:
· For the problem of searching for the maximum value among the minimal elements of the matrix rows, i.e. computing $y = \max_{1 \le i \le N} \min_{1 \le j \le N} a_{ij}$ (such calculations occur in solving the problems of matrix games); pay special attention to the situation when the number of processors exceeds the matrix order, i.e. p > N,
· For the problem of computing a definite integral $\int_a^b f(x)\,dx$ using the method of rectangles (the description of this numerical integration method is given, for instance, in Kahaner, Moler and Nash (1988)).
4. Develop parallel programs for the parallel methods proposed in Exercise 3 (see the sketch below for the integration problem).
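For the integration problem, a possible starting point: rectangles are distributed cyclically over the processes and the partial sums are combined with MPI_Reduce. The integrand, the limits and the number of rectangles are placeholders (this particular integrand evaluates to π on [0, 1], which makes checking easy):

```c
#include <mpi.h>
#include <stdio.h>

/* Parallel rectangle (midpoint) rule for a definite integral. */
static double f(double x) { return 4.0 / (1.0 + x * x); }  /* integrates to pi on [0,1] */

int main(int argc, char *argv[]) {
    int rank, size;
    const double a = 0.0, b = 1.0;
    const long n = 100000000;          /* number of rectangles */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double h = (b - a) / n;
    double local = 0.0;
    /* cyclic distribution: process r takes rectangles r, r+p, r+2p, ... */
    for (long i = rank; i < n; i += size)
        local += f(a + (i + 0.5) * h);  /* midpoint of the i-th rectangle */
    local *= h;

    double integral;
    MPI_Reduce(&local, &integral, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("integral = %.12f\n", integral);

    MPI_Finalize();
    return 0;
}
```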
5. Design a scheme of parallel computations for the problem of matrix-vector multiplication, using the methodology of designing and developing parallel methods described in this section.
Parallel Methods for Matrix-Vector Multiplication
1. Develop an implementation of the parallel algorithm based on the columnwise striped matrix decomposition. Estimate the algorithm execution time theoretically. Carry out the computational experiments. Compare the results of the experiments with the theoretical estimations.
2. Develop an implementation of the parallel algorithm based on the checkerboard block decomposition. Estimate the algorithm execution time theoretically. Carry out the computational experiments. Compare the results of the experiments with the theoretical estimations.
Parallel Methods for Matrix Multiplication
1. Develop implementations of the two block-striped algorithms of matrix multiplication. Compare the execution times of these algorithms.
2. Develop an implementation of the Cannon algorithm. Formulate the theoretical estimation of the algorithm execution time. Carry out the computational experiments. Compare the experimental results with the theoretical estimations.
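A skeleton of the communication structure of the Cannon algorithm is sketched below: a periodic q×q Cartesian grid, the initial alignment of the blocks, and q multiply-and-shift steps with MPI_Sendrecv_replace. The block contents and their distribution are placeholders, and the number of processes is assumed to be a perfect square (link with -lm):

```c
#include <mpi.h>
#include <stdlib.h>
#include <math.h>

#define NB 64  /* illustrative block order: each process holds an NB x NB block */

/* Local block multiply-accumulate: c += a * b. */
static void block_mult(const double *a, const double *b, double *c) {
    for (int i = 0; i < NB; i++)
        for (int k = 0; k < NB; k++)
            for (int j = 0; j < NB; j++)
                c[i * NB + j] += a[i * NB + k] * b[k * NB + j];
}

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int q = (int)(sqrt((double)size) + 0.5);

    int dims[2] = {q, q}, periods[2] = {1, 1}, coords[2];
    MPI_Comm grid;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid);
    MPI_Comm_rank(grid, &rank);
    MPI_Cart_coords(grid, rank, 2, coords);

    double *a = malloc(NB * NB * sizeof(double));
    double *b = malloc(NB * NB * sizeof(double));
    double *c = calloc(NB * NB, sizeof(double));
    for (int i = 0; i < NB * NB; i++) { a[i] = 1.0; b[i] = 1.0; }  /* placeholders */

    int left, right, up, down, src, dst;
    MPI_Cart_shift(grid, 1, -1, &right, &left);  /* A moves leftwards */
    MPI_Cart_shift(grid, 0, -1, &down, &up);     /* B moves upwards */

    /* Initial alignment: A(i,j) moves i positions left, B(i,j) j positions up. */
    MPI_Cart_shift(grid, 1, -coords[0], &src, &dst);
    MPI_Sendrecv_replace(a, NB * NB, MPI_DOUBLE, dst, 0, src, 0, grid, MPI_STATUS_IGNORE);
    MPI_Cart_shift(grid, 0, -coords[1], &src, &dst);
    MPI_Sendrecv_replace(b, NB * NB, MPI_DOUBLE, dst, 0, src, 0, grid, MPI_STATUS_IGNORE);

    for (int step = 0; step < q; step++) {       /* q multiply-and-shift steps */
        block_mult(a, b, c);
        MPI_Sendrecv_replace(a, NB * NB, MPI_DOUBLE, left, 0, right, 0, grid, MPI_STATUS_IGNORE);
        MPI_Sendrecv_replace(b, NB * NB, MPI_DOUBLE, up, 0, down, 0, grid, MPI_STATUS_IGNORE);
    }

    free(a); free(b); free(c);
    MPI_Comm_free(&grid);
    MPI_Finalize();
    return 0;
}
```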
3. Develop an implementation of the checkerboard block algorithm of matrix multiplication that may be carried out on rectangular processor grids.
4. Develop an implementation of matrix multiplication using the previously developed programs for matrix-vector multiplication.
Parallel Methods for Solving Linear Equation Systems
1. Analyze the efficiency of the parallel computations separately for the Gaussian elimination stage and for the back substitution stage of the Gauss method. At which stage is the decrease in parallel efficiency greater?
2. Develop a parallel variant of the Gauss method based on the columnwise block-striped matrix decomposition. Estimate the execution time of the algorithm theoretically. Carry out the computational experiments. Compare the experimental results with the theoretical estimations obtained previously.
3. Develop an implementation of the parallel conjugate gradient method. Estimate the execution time of the algorithm theoretically. Carry out the computational experiments. Compare the experimental results with the theoretical estimations obtained previously.
4. Develop parallel variants of the Jacobi and Seidel methods for solving linear equation systems (see, for instance, Kumar et al. (1994), Quinn (2004)). Estimate the execution time of the algorithms theoretically. Carry out the computational experiments. Compare the experimental results with the theoretical estimations obtained previously.
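For the Jacobi method, a natural parallel scheme distributes the matrix rowwise and re-assembles the full approximation after every iteration with MPI_Allgather. The sketch below uses a diagonally dominant placeholder system and a fixed iteration count instead of a convergence test:

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 256      /* illustrative system order, divisible by the process count */
#define ITERS 100  /* fixed iteration count instead of a convergence test */

/* Jacobi iteration for Ax = b with rowwise block distribution. */
int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local_n = N / size, first = rank * local_n;
    double *a = malloc(local_n * N * sizeof(double));  /* local rows of A */
    double *b = malloc(local_n * sizeof(double));
    double *x = calloc(N, sizeof(double));             /* current approximation */
    double *x_new = malloc(local_n * sizeof(double));

    for (int i = 0; i < local_n; i++) {       /* placeholder system */
        for (int j = 0; j < N; j++) a[i * N + j] = 1.0;
        a[i * N + (first + i)] = 2.0 * N;     /* strong diagonal for convergence */
        b[i] = 1.0;
    }

    for (int it = 0; it < ITERS; it++) {
        /* x_new_i = (b_i - sum_{j != i} a_ij x_j) / a_ii for the local rows */
        for (int i = 0; i < local_n; i++) {
            double s = b[i];
            for (int j = 0; j < N; j++)
                if (j != first + i) s -= a[i * N + j] * x[j];
            x_new[i] = s / a[i * N + (first + i)];
        }
        /* assemble the full vector on every process for the next iteration */
        MPI_Allgather(x_new, local_n, MPI_DOUBLE, x, local_n, MPI_DOUBLE, MPI_COMM_WORLD);
    }

    if (rank == 0) printf("x[0] = %f\n", x[0]);
    free(a); free(b); free(x); free(x_new);
    MPI_Finalize();
    return 0;
}
```

The Seidel variant is harder to parallelize in this form, since each update uses the values already computed within the same iteration.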
Parallel Methods for Data Sorting
1. Develop an implementation of the parallel bubble sort algorithm. Carry out the necessary experiments. Formulate the theoretical estimations of parallel computation efficiency. Compare the obtained theoretical estimations with the results of the experiments.
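The parallel bubble sort is usually realized as the odd-even transposition scheme. The sketch below keeps, for brevity, one element per process and performs the exchanges with MPI_Sendrecv; a block-per-process version would replace the exchange by a compare-split of sorted blocks.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Odd-even transposition sort, one value per process: in alternating phases,
   neighboring pairs exchange values and keep the smaller one on the lower
   rank. After `size` phases the values are sorted by rank. */
int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    srand(rank + 1);
    int value = rand() % 1000;   /* this process's single element */

    for (int phase = 0; phase < size; phase++) {
        /* partner of this process in the current phase */
        int partner = (phase % 2 == rank % 2) ? rank + 1 : rank - 1;
        if (partner < 0 || partner >= size) continue;  /* no partner at the ends */

        int other;
        MPI_Sendrecv(&value, 1, MPI_INT, partner, 0,
                     &other, 1, MPI_INT, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* lower rank keeps the minimum, higher rank the maximum */
        if (rank < partner) value = (other < value) ? other : value;
        else                value = (other > value) ? other : value;
    }

    printf("rank %d: %d\n", rank, value);
    MPI_Finalize();
    return 0;
}
```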
2. Develop an implementation of the parallel quick sort algorithm using one of the described schemes. Determine the values of the latency, the network bandwidth and the execution time of the basic sorting operation. Determine the estimations of the speedup and efficiency characteristics for the method. Compare the obtained theoretical estimations with the results of the experiments.
3. Design a parallel computation scheme for the well-known merge sort algorithm (a detailed description of the method may be found, for instance, in Knuth (1997) or Cormen et al. (2001)). Implement the designed algorithm and formulate the necessary theoretical estimations of the method complexity.
Parallel Methods for Graph Calculations
1. Develop the Floyd parallel algorithm using the given program code. Execute the computational experiments. Formulate the theoretical estimations. Compare the theoretical estimations with the experimental data.
2. Develop a software implementation of the Prim parallel algorithm. Execute the computational experiments. Formulate the theoretical estimations. Compare the theoretical estimations with the experimental data.
3. Develop a software implementation of the Kernighan-Lin algorithm. Assess the potential for parallelizing the algorithm.
Parallel Methods for Partial Differential Equations
1. Develop the first and the second variants of the Gauss-Seidel parallel algorithm. Carry out the computational experiments and compare the execution times.
2. Implement the parallel algorithm based on the wavefront computation scheme. Develop an implementation of the parallel algorithm in which the block-oriented approach to wavefront processing is applied. Carry out the computational experiments and compare the execution times.
3. Develop an implementation of a parallel computation job queue for shared memory systems. Ensure that neighboring blocks are processed on the same processors.