University of Nizhni Novgorod

Faculty of Computational Mathematics & Cybernetics

Teaching Course: CS338. Introduction to Parallel Programming


Discussions

Introduction to Parallel Computer Systems

1.       What are the basic ways to achieve parallelism?

2.       In what ways may parallel computing systems differ?

3.       What is Flynn’s classification based on?

4.       What is the essence of subdividing multiprocessor systems into multiprocessors and multicomputers?

5.       What types of systems are known as multiprocessors?

6.       What are the advantages and disadvantages of symmetric multiprocessors?

7.       What system types are known as multicomputers?

8.       What are the advantages and disadvantages of cluster systems?

9.       What data transmission network topologies are widely used in multiprocessor systems development?

10.    What are the peculiarities of data transmission networks for clusters?

11.    What are the basic characteristics of data transmission networks?

12.    What system platforms can be used for cluster design?

Parallel Computation Modeling and Analysis

1.        How is the “operations-operands” model defined?

2.        How is the schedule for the distribution of computations among processors defined?

3.        How is the time of parallel algorithm execution defined?

4.        What schedule is optimal?

5.        How can the minimum possible time of problem solving be defined?

6.        What is a paracomputer? What can this concept be useful for?

7.        What estimates should be used as the characteristics of the sequential problem solving time?

8.        How can the minimum possible time of parallel problem solving be defined according to the “operations-operands” graph?

9.        What dependencies may be obtained for the parallel problem solving time if the number of processors being used is increased or decreased?

10.    What number of processors corresponds to parallel algorithm execution times comparable in order of magnitude to the estimates of the minimum possible time of problem solving?

11.     How are the concepts “speedup” and “efficiency” defined?

12.     Is it possible to attain superlinear speedup?

13.     What is the trade-off between the speedup and efficiency characteristics?

14.     How is the concept of computation cost defined?

15.     What is the concept of a cost-optimal algorithm?

16.     What does the problem of parallelizing the sequential algorithm for summing numeric values consist in?

17.     What is the essence of the summation cascade scheme? What is the aim of considering the modified version of the scheme?

18.     What is the difference between the speedup and efficiency characteristics for the discussed versions of the summation cascade scheme?

19.     What is the parallel algorithm for computing all the partial sums of a sequence of numeric values?

20.     How is Amdahl’s law formulated? Which aspect of parallel computation does it allow us to take into account?

21.     What suppositions are used to justify the Gustafson-Barsis law?

22.     How is the isoefficiency function defined?

23.     What does it mean for an algorithm to be scalable? Give examples of methods with different levels of scalability.
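
In preparing answers to questions 16-21, the cascade summation scheme and Amdahl’s law can be checked numerically. The following is an illustrative Python sketch (the course itself targets MPI; all function names here are chosen for this sketch only):

```python
def cascade_sum(values):
    """Pairwise (cascade) summation: n values are reduced in about
    log2(n) parallel steps by adding adjacent pairs at each level."""
    level = list(values)
    steps = 0
    while len(level) > 1:
        # One parallel step: all adjacent pairs are added independently;
        # an unpaired last element is carried over to the next level.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        steps += 1
    return level[0], steps

def amdahl_speedup(f, p):
    """Amdahl's law: speedup on p processors when a fraction f of the
    computation is inherently sequential."""
    return 1.0 / (f + (1.0 - f) / p)

total, steps = cascade_sum(range(1, 17))   # 16 values: sum = 136, 4 steps
# The sequential algorithm needs n - 1 = 15 additions, so:
speedup = 15 / steps                        # 3.75
efficiency = speedup / 8                    # with p = n/2 = 8 processors
```

Note how the cascade scheme trades efficiency for speedup: the speedup of 3.75 is obtained at an efficiency well below 1, which is exactly the tension raised in question 13.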

The Estimation of the Communication Complexity of Parallel Algorithms

1.        What basic characteristics are used for the estimation of the data transmission network topology? Give the values of these characteristics for the following types of communication structures: a complete graph, a linear array, a grid, etc.

2.        What basic methods are applied to routing the data transmitted in the network?

3.        What are the basic methods of data transmission? Give the analytical estimations of the execution time for these methods.

4.        What data transmission operations may be selected as the basic ones?

5.        What are the execution algorithms of one-to-all broadcast for the ring, the grid and the hypercube topologies? Give the estimations of the time complexity for these algorithms.

6.        What are the execution algorithms of all-to-all broadcast for the ring, the grid and the hypercube topologies? Give the estimations of the time complexity for these algorithms.

7.        What are the possible execution algorithms of reduction? Which of them is the best as far as the execution time is concerned?

8.        What does the execution algorithm of the circular shift consist in?

9.        Why is it efficient to use logical topologies? Give the examples of the algorithms for logical presentation of communication network structure.

10.     How do the models for estimating the execution time of data transmission in cluster computer systems differ from one another? Which model is the most accurate? Which of them may be used for the preliminary analysis of the time complexity of the communication operations?
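
As an aid to questions 5-6, the one-to-all broadcast on a hypercube can be simulated in a few lines. This Python sketch only models which nodes hold the message after each step; it is not an implementation of an actual communication operation:

```python
def hypercube_broadcast(d, root=0):
    """Simulate one-to-all broadcast on a d-dimensional hypercube with
    2**d nodes. At step i, every node that already holds the message
    sends it to its neighbor across dimension i (rank XOR 2**i), so the
    set of holders doubles and all nodes are reached in d steps."""
    holders = {root}
    steps = 0
    for i in range(d):
        holders |= {rank ^ (1 << i) for rank in holders}
        steps += 1
    return holders, steps

nodes, steps = hypercube_broadcast(4)  # 16 nodes reached in 4 steps
```

The d-step count illustrates the logarithmic time complexity asked about in question 5; a ring of the same size would need on the order of N/2 steps.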

Parallel Programming with MPI

1.        What minimum set of operations is sufficient for the organization of parallel computations in distributed memory systems?

2.        Why is it important to standardize message passing?

3.        How can a parallel program be defined?

4.        What is the difference between the concepts of “process” and “processor”?

5.        What minimum set of MPI functions makes it possible to start developing parallel programs?

6.        How are the messages being passed described?

7.        How do we organize the reception of messages from specific processes?

8.        How do we determine the execution time of an MPI based program?

9.        What is the difference between point-to-point and collective data transmission operations?

10.     Which MPI function provides transmitting data from a process to all the processes?

11.     What is the data reduction operation?

12.     In what cases should we apply barrier synchronization?

13.     What data transmission modes are supported in MPI?

14.     In what way is the non-blocking data exchange organized in MPI?

15.     What is a deadlock? In what cases does the function of the simultaneous transmission/reception guarantee the absence of deadlock situations?

16.     What collective data transmission operations are supported in MPI?

17.     What is the derived data type in MPI?

18.     What methods of constructing types are available in MPI?

19.     In what situations may data packing and unpacking be useful?

20.     What is a communicator in MPI?

21.     What can new communicators be created for?

22.     What is a virtual topology in MPI?

23.     What types of virtual topologies are supported in MPI?

24.     What may virtual topologies be useful for?

25.     What are the peculiarities of developing MPI based parallel programs in Fortran?

26.     What are the basic additional features of the standard MPI-2?
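
For question 11, the semantics of the data reduction operation can be stated in plain Python. This sketch shows what MPI_Reduce with MPI_SUM delivers at the root process; it is a model of the result, not an MPI program (the function name is invented for this illustration):

```python
from functools import reduce
import operator

def mpi_reduce_semantics(per_process_buffers, op=operator.add):
    """Combine the send buffers of all processes element-wise with the
    given operation: element j of the result is op applied across the
    j-th elements of every process's buffer (MPI_SUM by default)."""
    return [reduce(op, column) for column in zip(*per_process_buffers)]

# Four "processes", each contributing a buffer of three values.
buffers = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
result = mpi_reduce_semantics(buffers)  # element-wise sums at the root
```

Replacing `operator.add` with `max` models MPI_MAX, which is why the choice of the reduction operation is part of the operation's definition.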

Parallel Method Development

1.        What are the initial assumptions for the methodology of parallel algorithm development discussed in the Section?

2.        What are the basic stages of the methodology of parallel computation design and development?

3.        How is the “subtasks-messages” model defined?

4.        How is the “processors-channels” model defined?

5.        What basic requirements should be met in parallel algorithm development?

6.        What are the basic operations at the stage of subtask selection?

7.        What are the basic operations at the stage of analyzing information dependencies?

8.        What are the main operations at the stage of scaling the available subtask set?

9.        What are the main operations at the stage of distributing subtasks among the processors of a computer system?

10.     How does the “manager-worker” scheme provide dynamic management of the computational load?

11.     Which parallel computation method was developed for solving the gravitational problem of N bodies?

12.     Which method of multi-node gather operation execution is the most efficient?
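
The dynamic load balancing of the “manager-worker” scheme (question 10) can be sketched with a shared task queue. This thread-based Python illustration stands in for the message-passing version; the squaring task and all names are invented for the sketch:

```python
import queue
import threading

def manager_worker(tasks, n_workers=4):
    """The manager fills a shared queue with tasks; each worker repeatedly
    pulls the next available task, so faster workers automatically take on
    more work and the load balances dynamically."""
    todo = queue.Queue()
    for t in tasks:
        todo.put(t)
    results = queue.Queue()

    def worker():
        while True:
            try:
                t = todo.get_nowait()
            except queue.Empty:
                return                 # no tasks left: the worker finishes
            results.put(t * t)         # stand-in for the real computation
            todo.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    out = []
    while not results.empty():
        out.append(results.get())
    return sorted(out)

out = manager_worker(range(10))
```

In a distributed memory setting the queue is replaced by the manager process answering work requests with messages, but the balancing principle is the same.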

Parallel Methods for Matrix-Vector Multiplication

1.        What are the main methods of distributing matrix elements among processors?

2.        What is the statement of the matrix-vector multiplication problem?

3.        What is the computational complexity of the sequential matrix-vector multiplication?

4.        Why is it admissible to duplicate the vector operand on all the processors when developing a parallel algorithm of matrix-vector multiplication?

5.        What approaches to the development of parallel algorithms may be suggested?

6.        Describe the general schemes of the parallel algorithms discussed in the Section.

7.        Evaluate the efficiency characteristics for one of the algorithms discussed in the Section.

8.        Which of the algorithms has the best speedup and efficiency?

9.        Can the use of the cyclic data distribution scheme influence the execution time of each of the algorithms?

10.     What information communications are carried out for the algorithms in case of block-striped data distribution scheme? What is the difference between the data communications in case of rowwise matrix distribution and those required in case of columnwise distribution?

11.     What information communications are performed for the checkerboard block matrix-vector multiplication algorithm?

12.     What kind of communication network topology is adequate for each algorithm discussed in the Section?

13.     Analyze the software implementation of matrix-vector algorithm in case of rowwise data distribution. How may software implementations of the other algorithms differ from each other?

14.     What functions of the library MPI appeared to be necessary in the software implementation of the algorithms?
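
The rowwise block-striped algorithm referred to in questions 10 and 13 can be simulated sequentially: each loop iteration below plays the role of one process. A hedged Python sketch (names are illustrative; a real implementation would use MPI scatter/gather operations):

```python
def rowwise_matvec(A, x, p):
    """Block-striped matrix-vector multiplication: the rows of A are split
    into p horizontal stripes, each "process" multiplies its stripe by the
    duplicated vector x, and the partial results are concatenated (the
    gather step of the algorithm)."""
    n = len(A)
    block = (n + p - 1) // p
    stripes = [A[i:i + block] for i in range(0, n, block)]
    # Local computation of every process: one stripe times the full vector.
    partial = [[sum(a * b for a, b in zip(row, x)) for row in stripe]
               for stripe in stripes]
    # Gather: concatenate the partial result blocks into the full vector y.
    return [y for part in partial for y in part]

A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
y = rowwise_matvec(A, x, p=2)
```

The sketch makes visible why duplicating x is admissible (question 4): the vector is only of size n, while each stripe of the matrix has n*n/p elements.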

Parallel Methods for Matrix Multiplication

1.        What is the statement of the matrix multiplication problem?

2.        Give the examples of the problems, which make use of the matrix multiplication operations.

3.        Give the examples of various sequential algorithms of matrix multiplication operations. Is the complexity various in case of different algorithms?

4.        What methods of data distribution are used in developing parallel algorithms of matrix multiplication?

5.        Describe the general schemes of the parallel algorithms considered in the Section.

6.        Analyze and compute the efficiency of the block-striped algorithm for horizontal partitioning of the multiplied matrices.

7.        What information communications are carried out for the algorithms in case of the block-striped data decomposition?

8.        What information interactions are carried out in case of the checkerboard block matrix multiplication algorithms?

9.        What communication network topology is efficient for each of the algorithms discussed?

10.     Which of the discussed algorithms requires the least memory, and which requires the most?

11.     Which of the discussed algorithms has the best speedup and efficiency?

12.     Estimate the possibility to carry out matrix multiplication as a sequence of matrix-vector operations.

13.     Give the general description of the software implementation for the Fox algorithm. In what ways may the software implementations of the other algorithms differ?

14.     What functions of the library MPI appear to be necessary in the software implementation of the algorithms?
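
The block-striped scheme of question 6 can be sketched in its simplest form, where each "process" holds one horizontal stripe of A and, as a simplifying assumption of this sketch, the whole matrix B. A Python illustration (the real algorithms circulate stripes or blocks instead of duplicating B):

```python
def striped_matmul(A, B, p):
    """Block-striped matrix multiplication, simplest variant: A is split
    into p horizontal stripes; each "process" (one outer-loop iteration
    here) computes its stripe of C = A * B independently, assuming B is
    available everywhere."""
    n = len(A)
    cols = list(zip(*B))                      # columns of B
    block = (n + p - 1) // p
    C = []
    for start in range(0, n, block):          # one iteration per process
        for row in A[start:start + block]:
            C.append([sum(a * b for a, b in zip(row, col)) for col in cols])
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = striped_matmul(A, B, p=2)
```

Comparing this variant's memory use (a full copy of B per process) with the checkerboard algorithms, which keep only blocks of both operands, is the substance of question 10.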

Parallel Methods for Solving Linear Equation Systems

1.        What is a system of linear equations? What types of systems do you know? What methods may be used for solving systems of various types?

2.        What is the problem statement for solving linear equation systems?

3.        What is the essence of the parallel Gauss method implementation?

4.        What information communications are there among the basic subtasks in the parallel variant of Gauss method?

5.        What are the efficiency characteristics for the parallel variant of Gauss method?

6.        What is the scheme of program implementation of the parallel variant of Gauss method?

7.        What is the concept of the parallel implementation of the conjugate gradient method?

8.        Which of the algorithms has the greater communication complexity?
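
For questions 3-6, it helps to have the sequential Gauss method in front of you; in the parallel variant the rows of the matrix are distributed among the processes and the pivot row is broadcast at every iteration. The following Python sketch shows only the sequential computation (names are illustrative):

```python
def gauss_solve(A, b):
    """Sequential Gauss method: forward elimination with partial pivoting,
    then back substitution. In the parallel variant, the elimination of
    the rows below the pivot is what each process performs on its own
    subset of rows after receiving the broadcast pivot row."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]   # augmented matrix
    for k in range(n):
        # Partial pivoting: move the largest |element| of column k up.
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):                    # eliminate below pivot
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):                   # back substitution
        x[k] = (M[k][n]
                - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x

x = gauss_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])  # x = [0.8, 1.4]
```

Note that back substitution is inherently more sequential than elimination, which is one source of the communication complexity compared in question 8.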

Parallel Methods for Data Sorting

1.        What is the statement of the data sorting problem?

2.        Give several examples of data sorting algorithms. What is the computational complexity for each of the algorithms?

3.        What is the basic operation for the problem of data sorting?

4.        What is the essence of the parallel generalization of this basic operation?

5.        What is the essence of the odd-even transposition algorithm?

6.        Describe the parallel variant of the Shell algorithm. In what way does it differ from the odd-even transposition method?

7.        What is the essence of the parallel variant of the quick sort method?

8.        What does the choice of the pivot element influence in the parallel quick sort algorithm?

9.        What methods may be suggested for choosing the pivot element?

10.     To what topologies can the described algorithms be applied?

11.     What is the essence of the parallel sorting by regular sampling?
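
The odd-even transposition algorithm of question 5 can be sketched directly; within each phase all compare-exchange operations are independent, which is what the parallel variant exploits (neighboring processes exchange blocks instead of single elements). An illustrative Python version:

```python
def odd_even_sort(a):
    """Odd-even transposition sort: n phases that alternate between
    compare-exchanging pairs (0,1), (2,3), ... and pairs (1,2), (3,4), ...
    All exchanges within one phase touch disjoint pairs, so they can run
    simultaneously on neighboring processes."""
    a = list(a)
    n = len(a)
    for phase in range(n):
        start = phase % 2               # even phase, then odd phase, ...
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

out = odd_even_sort([5, 1, 4, 2, 8, 0, 3])
```

The n-phase bound is what the Shell-based variant of question 6 improves on by first exchanging between distant processes.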

Parallel Methods for Graph Calculations

1.        Give the definition of a graph. What are the main methods of graph representation on a computer?

2.        What does the problem of searching all the shortest paths consist in?

3.        Give the general scheme of the Floyd algorithm. What is the time complexity of the algorithm?

4.        What approach can be applied to parallelize the Floyd algorithm?

5.        What is the essence of the problem of searching the minimum spanning tree? Give an example illustrating how the problem can be used in practice.

6.        Give the general scheme of the Prim algorithm. What is the time complexity of the algorithm?

7.        What approach can be applied to parallelize the Prim algorithm?

8.        What is the difference between the geometric and the combinatorial methods of graph partition? Which of them are preferable and why?

9.        Describe the coordinate nested dissection method and the levelized nested dissection algorithm. Which of them is easier to implement?
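
The general scheme of the Floyd algorithm (questions 3-4) fits in a few lines. A Python sketch over an adjacency matrix; in the parallel variant the rows of the matrix are distributed among the processes and row k is broadcast at iteration k:

```python
INF = float("inf")

def floyd(dist):
    """Floyd algorithm for all shortest paths over an adjacency matrix of
    edge weights (INF marks a missing edge): at iteration k, every path
    i -> j is allowed to pass through intermediate vertex k. Time
    complexity is O(n^3)."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

g = [[0, 3, INF],
     [3, 0, 1],
     [INF, 1, 0]]
d = floyd(g)   # d[0][2] becomes 3 + 1 = 4 via vertex 1
```

The update of row i at iteration k needs only row i itself and row k, which is exactly why a rowwise distribution plus a broadcast of row k parallelizes the method.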

Parallel methods for partial differential equations

1.        How is the Dirichlet problem for the Poisson equation defined?

2.        How is the method of finite differences applied for solving the Dirichlet problem?

3.        What approaches determine the parallel computations for the grid methods on shared memory multiprocessor systems?

4.        What is the essence of parallel computation synchronization?

5.        What are the characteristics of the race condition of the parallel program threads?

6.        What is the essence of the deadlock problem?

7.        Which method guarantees the grid method determinacy but requires a great additional memory for implementation?

8.        How does the computation amount change in case of using the wavefront processing methods?

9.        How is chunking applied to reduce computation synchronization?

10.     What are the ways to increase the efficiency of wavefront data processing methods?

11.     In which way does the job queue provide for computational load balancing?

12.     What are the problems to be solved in the course of parallel computations on distributed memory systems?

13.     What mechanisms can be involved into the process of data transmission?

14.     In which way can the multiple wave calculations be applied for increasing the wave computation efficiency in distributed memory systems?
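
For questions 1-3, a minimal finite-difference sketch of the Dirichlet problem: the Jacobi-style sweep below replaces every interior point by the average of its four neighbors while the boundary stays fixed. All updates within one sweep are independent, which is the property the parallel grid methods build on (a Python illustration with an invented test grid; the Laplace case, i.e. zero right-hand side, is used for brevity):

```python
def jacobi_dirichlet(u, iters):
    """Jacobi iteration of the finite-difference method on a square grid:
    interior points take the average of their four neighbors, boundary
    values are the fixed Dirichlet data. Using a separate 'nxt' grid makes
    every update of a sweep independent of the others."""
    n = len(u)
    for _ in range(iters):
        nxt = [row[:] for row in u]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                nxt[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                    + u[i][j - 1] + u[i][j + 1])
        u = nxt
    return u

# 5x5 grid: boundary value 1 on the top edge, 0 elsewhere.
n = 5
grid = [[0.0] * n for _ in range(n)]
grid[0] = [1.0] * n
u = jacobi_dirichlet(grid, 200)
```

Gauss-Seidel-type sweeps update in place and converge faster, but they introduce the data dependencies whose handling (synchronization, wavefront ordering, chunking) is the subject of questions 4-10.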

Parallel Methods for Global Optimization

1.        What does the problem of searching for optimum solution consist in?

2.        What may be the objective of such a problem?

3.        What does the complexity of finding the optimum in multiextremal optimization problems consist in?

4.        What approach to the estimation of the optimum solution is used in the Section? What other approaches are known for global optimization?

5.        Why is it necessary to develop numerical methods of searching for the global optimum that are, for multidimensional problems, more efficient than the grid method over the search domain?

6.        Why is an asynchronous computation scheme necessary in the parallel modification of the index method?

7.        What does the general implementation scheme of a global optimization software system consist in?
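
As a baseline for question 5, the grid method over the search domain can be sketched in one dimension. Its function evaluations are independent and trivially parallelizable, but its cost grows exponentially with dimension, which is what motivates methods such as the index method. A Python illustration (the test function is an arbitrary multiextremal example chosen for this sketch):

```python
import math

def grid_search(f, a, b, n):
    """Grid method for the global minimum of f on [a, b]: evaluate f at
    n + 1 equally spaced points and keep the best one. Each evaluation is
    independent, so the points can be distributed among processes; in d
    dimensions the same scheme needs (n + 1)**d evaluations."""
    best_x, best_f = a, f(a)
    for i in range(1, n + 1):
        x = a + (b - a) * i / n
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# A multiextremal objective on [0, 10] (illustrative choice).
f = lambda x: math.sin(x) + math.sin(10.0 * x / 3.0)
x_star, f_star = grid_search(f, 0.0, 10.0, 10000)
```

Adaptive methods like the index method reach comparable accuracy with far fewer evaluations by concentrating trials where the current estimates make the optimum likely.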