Topics
|
Core Bloom Level
|
Learning Outcomes and Teaching Suggestions (core)
|
Advanced Bloom Level
|
Learning Outcome (advanced)
|
Where Covered
|
Parallel Programming Paradigms and Notations
|
|
|
|
|
|
|
By the target machine model
|
|
|
|
|
|
|
|
Concurrency and Parallelism
|
C
|
Understand that concurrency is an algorithmic property; it exposes the potential for parallelization. If concurrency is present in an algorithm, the algorithm can be parallelized; without concurrency there is no scope for parallelization. Concurrency can be present even in a sequential program; parallelization takes advantage of concurrency to increase performance.
|
|
|
CS1; CS2; DS/A
|
|
|
SIMD
|
K
|
Understand common vector operations including element-by-element operations and reductions.
|
|
|
CS2; Systems
|
|
|
- Processor vector extensions
|
K
|
Know examples - e.g., Intel AVX or Power VSX macros
|
C
|
Understand examples of Intel/Power vector instructions (see the sketch after this row)
|
Systems; Arch2
|
|
|
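A minimal sketch of an element-by-element vector addition with Intel AVX intrinsics, assuming an x86 processor with AVX and a compiler flag such as -mavx; the arrays and their length are illustrative only:

/* Element-wise add of 8 floats with one AVX instruction. */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[8];

    __m256 va = _mm256_loadu_ps(a);    /* load 8 floats into a vector register */
    __m256 vb = _mm256_loadu_ps(b);
    __m256 vc = _mm256_add_ps(va, vb); /* 8 additions in a single instruction */
    _mm256_storeu_ps(c, vc);

    for (int i = 0; i < 8; i++) printf("%.0f ", c[i]);
    printf("\n");
    return 0;
}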
- Array language extensions
|
N
|
|
A
|
Know how to write parallel array code in some language (e.g., Fortran 95, Intel's C/C++ Array Extension [CEAN])
|
ParProg
|
|
|
Shared memory
|
A
|
Be able to write correct thread-based programs (protecting shared data) and understand how to obtain speedup (see the sketch after this row).
|
|
|
CS2; DS/A
|
|
|
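A minimal sketch of a thread-based program that protects shared data with a mutex, assuming POSIX threads (compiled with -pthread); the thread and iteration counts are illustrative only:

/* Several threads increment a shared counter under a mutex. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define NITERS   100000

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        pthread_mutex_lock(&lock);   /* critical region protects counter */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    printf("counter = %ld (expected %d)\n", counter, NTHREADS * NITERS);
    return 0;
}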
- Language parts or extensions
|
K
|
Know about language extensions for parallel programming. Illustrate with examples from Cilk (spawn/join), Java (Java threads), or other languages.
|
|
|
|
|
|
- Compiler directives/pragmas
|
C
|
Understand what simple directives, such as those of OpenMP, mean (parallel for, parallel sections); show examples (see the sketch after this row)
|
|
|
|
|
|
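A minimal sketch of OpenMP directives, assuming a compiler with OpenMP support (e.g., -fopenmp); it shows a parallel loop and parallel sections over illustrative work:

/* Parallel loop and independent parallel sections with OpenMP pragmas. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    int a[100];

    #pragma omp parallel for          /* loop iterations divided among threads */
    for (int i = 0; i < 100; i++)
        a[i] = i * i;

    #pragma omp parallel sections     /* independent sections run concurrently */
    {
        #pragma omp section
        printf("section 1 on thread %d\n", omp_get_thread_num());
        #pragma omp section
        printf("section 2 on thread %d\n", omp_get_thread_num());
    }
    return 0;
}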
|
C
|
Know one in detail, and know of the existence of some other example libraries such as Pthreads, Pfunc, Intel's TBB (Threading Building Blocks), Microsoft's TPL (Task Parallel Library), C++ threads, etc.
|
|
|
|
|
|
Distributed memory
|
K
|
Know basic notions of messaging among processes, different ways of message passing, collective operations
|
|
|
Systems; DS/A
|
|
|
|
N
|
|
C
|
Know about the overall organization of a message passing program as well as point-to-point and collective communication primitives (e.g., MPI); see the sketch after this row
|
ParProg
|
|
|
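A minimal sketch of a message-passing program, assuming an MPI installation (compiled with mpicc, launched with mpirun); it shows one point-to-point exchange and one collective reduction:

/* SPMD program: every rank runs the same code; behavior branches on rank. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int token = rank;
    if (rank == 0 && size > 1) {
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);           /* point-to-point */
    } else if (rank == 1) {
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    int sum;
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);  /* collective */
    if (rank == 0)
        printf("sum of ranks = %d\n", sum);

    MPI_Finalize();
    return 0;
}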
|
N
|
|
C
|
Know about partitioned address spaces, other parallel constructs (e.g., UPC, CoArray Fortran, Chapel)
|
ParProg
|
|
|
Client Server and Peer-to-Peer models
|
C
|
Know notions of invoking and providing services (e.g., RPC, RMI, web services) and understand these as concurrent processes; know about the network model of distributed computing (e.g., sockets); know that in distributed systems such handshaking interaction is crucial for efficient communication between asynchronous processes
|
A
|
Be able to program a basic client/server and/or P2P interface
|
Systems; Networking
|
|
|
Big Data Technology Stack
|
N
|
|
A
|
Understand the Big Data technology stack and its layered architecture. Be able to write code (e.g., in Python, R) using some of the tools that facilitate data storage, organization, management, and/or analysis within a Big Data stack
|
ParProg; DistSystems
|
|
|
Hybrid
|
K
|
Know the notion of programming over multiple classes of machines simultaneously (CPU, GPU, TPU, etc.)
|
A
|
Be able to write correct programs using two programming paradigms, e.g., shared memory and GPU (OpenMP+CUDA), distributed and shared memory (MPI+OpenMP), or distributed memory and GPU (MPI+CUDA)
|
Systems; ParProg
|
|
By the control statement
|
|
|
|
|
|
|
|
Task/thread spawning
|
A
|
Be able to write correct programs with threads and synchronization (fork-join, producer/consumer, master/worker, etc.) and to use dynamic thread creation (varying in number and possibly recursive), e.g., with Pthreads or Java threads; builds on the shared memory topic above
|
|
|
CS2; DS/A
|
|
|
Event-Driven Execution
|
K
|
Know about the need for event-driven execution and possible approaches to implementing it. Know about the notion of causality among events (e.g., remote file access, GUI). These effects may be easier to discuss in the context of distributed systems.
|
A
|
|
CS2; DistSystems
|
|
|
SPMD
|
C
|
Understand how an SPMD program is written and how it executes
|
|
Be able to write an SPMD program and understand how it executes
|
|
|
|
|
C
|
Know of the existence of highly threaded data parallel notations (e.g., CUDA, OpenCL), message passing (e.g., MPI), and some others (e.g., Global Arrays, BSP library)
|
A
|
|
CS2; DS/A; ParProg
|
|
|
Data parallel
|
A
|
Be able to write a correct data-parallel program for shared-memory machines and get speedup; students should do an exercise.
|
|
Understand the relation between different data-parallel notations: array notations, SPMD, and parallel loops. Builds on the shared memory topic above.
|
|
|
|
- Parallel loops for shared memory
|
A
|
Know, through an example, one way to implement parallel loops, and understand collisions/dependencies across iterations (e.g., OpenMP, Intel's TBB); see the sketch after this row
|
A
|
|
CS2; DS/A; Lang
|
|
|
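A minimal sketch of a shared-memory parallel loop with a cross-iteration collision on an accumulator, resolved with an OpenMP reduction clause; OpenMP support is assumed and the loop body is illustrative:

/* Without reduction(+:sum), every thread would race on the shared sum. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)  /* each thread gets a private copy */
    for (int i = 1; i <= 1000000; i++)
        sum += 1.0 / i;
    printf("harmonic sum = %f\n", sum);
    return 0;
}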
- Data parallel for distributed memory
|
N
|
|
K
|
Know data parallel notations for distributed memory (e.g., UPC, Chapel, Co-Array Fortran)
|
ParProg
|
|
|
|
K
|
Understand how problems can be solved by MapReduce, and how algorithms can be written using map and reduce
|
A
|
Solve problems using MapReduce, and write algorithms using map and reduce (see the sketch after this row)
|
CS2; Lang; ParProg
|
|
|
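A minimal sequential sketch of a map-and-reduce decomposition in C; the map_square and reduce_add helpers are hypothetical names used only to mark the two phases, either of which could be parallelized:

/* Map phase is independent per element; reduce combines with an associative operator. */
#include <stdio.h>

#define N 8

static int map_square(int x)        { return x * x; }  /* map: no cross-element dependence */
static int reduce_add(int a, int b) { return a + b; }  /* reduce: associative combine */

int main(void) {
    int in[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    int mapped[N];

    for (int i = 0; i < N; i++)   /* map phase */
        mapped[i] = map_square(in[i]);

    int result = 0;
    for (int i = 0; i < N; i++)   /* reduce phase: tree-combinable in parallel */
        result = reduce_add(result, mapped[i]);

    printf("sum of squares = %d\n", result);
    return 0;
}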
Offloading to accelerators
|
K
|
Know about running parts of applications on accelerators (e.g., GPU, TPU, FPGA)
|
|
|
CS2; Systems
|
|
|
|
N
|
|
A
|
Be able to write an accelerator program that takes advantage of the hardware (e.g., CUDA, OpenACC, OpenMP 4.5 or above, OpenCL, TensorFlow); see the sketch after this row
|
ParProg
|
|
|
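A minimal sketch of offloading a loop to an accelerator with OpenMP target directives (OpenMP 4.5 or later); it assumes a compiler and device that support offload, and the computation is illustrative:

/* Copy x and y to the device, run the loop there, copy y back. */
#include <stdio.h>

#define N 1024

int main(void) {
    float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = i; y[i] = 2 * i; }

    #pragma omp target teams distribute parallel for map(to: x) map(tofrom: y)
    for (int i = 0; i < N; i++)
        y[i] = 2.0f * x[i] + y[i];

    printf("y[N-1] = %f\n", y[N - 1]);
    return 0;
}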
Functional/logic languages
|
N
|
|
K
|
Understand the advantages and disadvantages of very different programming styles (e.g., Parallel Haskell, Parlog, Erlang)
|
ParProg
|
Semantics and correctness issues
|
|
|
|
|
|
|
Tasks and threads
|
A
|
Be able to write parallel programs that create and assign work to threads/processes, in at least one parallel environment (e.g., OpenMP, Intel TBB, pthreads, etc.)
|
A
|
|
CS2; DS/A; Systems; Lang
|
|
Synchronization
|
A
|
Be able to write shared memory programs with critical regions and producer-consumer communication, and get speedup; know the notions of mechanisms for concurrency (monitors, semaphores, etc.)
|
|
|
CS2; DS/A; Systems
|
|
|
Critical regions
|
A
|
Be able to write shared memory programs that use critical regions for synchronization
|
|
|
|
|
|
Producer-consumer
|
A
|
Be able to write shared memory programs that use the producer-consumer pattern to share data and synchronize threads (see the sketch after this row)
|
|
|
|
|
|
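A minimal sketch of the producer-consumer pattern with a one-slot buffer, assuming POSIX threads (compiled with -pthread); the item count is illustrative:

/* The mutex protects the buffer; condition variables signal "not full" / "not empty". */
#include <pthread.h>
#include <stdio.h>

static int buffer;
static int full = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < 5; i++) {
        pthread_mutex_lock(&lock);
        while (full) pthread_cond_wait(&not_full, &lock);   /* wait for space */
        buffer = i;
        full = 1;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    for (int i = 0; i < 5; i++) {
        pthread_mutex_lock(&lock);
        while (!full) pthread_cond_wait(&not_empty, &lock); /* wait for data */
        printf("consumed %d\n", buffer);
        full = 0;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}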
Handshaking
|
N
|
|
A
|
Analyze handshaking protocols and derive performance bounds (e.g., response time in sliding window, TCP connection management)
|
Networking; DistSystems
|
|
Concurrency issues
|
C
|
Understand the notions of deadlock (detection, prevention), race conditions (definition), and determinacy/non-determinacy in parallel programs (e.g., if there is a race condition, the correctness of the output may depend on the order of execution)
|
|
|
DS/A; Systems
|
|
|
Deadlock/Livelock
|
C
|
Understand what deadlock and livelock are, and methods for detecting and preventing them; also cast in terms of distributed systems
|
|
|
|
|
|
Starvation
|
C
|
Understand how starvation (of a thread or process) can occur, in context of an example (e.g., dining philosophers)
|
|
|
|
|
|
Race condition
|
C
|
Know what a race condition is, and how to use synchronization to prevent it
|
|
|
|
|
|
Distributed Data Structures and Applications
|
N
|
|
C
|
Understand synchronization in the context of data structures; correctness in a concurrent context
|
ParProg; DistSystems
|
|
|
Tools to detect concurrency defects
|
N
|
|
K
|
Know of the existence of tools to detect race conditions (e.g., Eraser) and to debug parallel programs (e.g., Intel Parallel Toolkit)
|
ParProg
|
|
Memory models
|
N
|
|
C
|
Know what a memory model is, and the implications of the difference between strict and relaxed models (performance vs. ease of use)
|
ParProg
|
|
|
Sequential consistency
|
N
|
|
|
Understand semantics of sequential consistency for shared memory programs
|
|
|
|
Relaxed consistency
|
N
|
|
|
Understand semantics of one relaxed consistency model (e.g., release consistency) for shared memory programs
|
|
|
|
Consistency in distributed transactions
|
N
|
|
C
|
Recognize consistency problems. Know that consistency is an issue in transactions issued concurrently by multiple agents. Be able to implement transaction commit protocols in databases.
|
DB; DistSystems
|
Performance and Energy issues
|
|
|
|
|
|
|
Computation
|
C
|
Understand the basic notions of static and dynamic scheduling and mapping, and the impact of load balancing on performance
|
|
|
|
|
|
Computation decomposition strategies
|
C
|
Understand different ways to assign computations to threads or processes
|
|
|
CS2; DS/A
|
|
|
|
C
|
Understand how to assign loop iterations to threads based on which thread/process owns the data element(s) written in an iteration
|
|
|
|
|
|
- Decomposition into atomic tasks
|
C
|
Understand how to decompose computations into tasks with communication only at the beginning and end of each task, and assign them to threads/processes
|
|
|
|
|
|
- Decomposition into Map and Reduce tasks
|
K
|
Know that divide and conquer can be expressed and programmed as a Map and Reduce decomposition
|
A
|
Understand how to decompose computation using the MapReduce paradigm. Be able to reason about computation speedups from this decomposition and the communication tradeoffs resulting from reduction(s). Be able to produce code expressing this decomposition
|
CS2; ParProg
|
|
|
|
N
|
|
C
|
Understand one way to do dynamic assignment of computations
|
ParProg
|
|
|
- Offloading onto an accelerator
|
N
|
|
C
|
Understand when it is worthwhile to decompose onto an accelerator (e.g., GPU, TPU, FPGA)
|
ParProg
|
|
|
Program transformations
|
N
|
|
C
|
Be able to perform simple loop transformations by hand, and understand how that impacts performance of the resulting code (e.g., loop fusion, fission, skewing, blocking)
|
Compilers; ParProg
|
|
|
Load balancing
|
C
|
Understand the effects of load imbalances on performance, and ways to balance load across threads or processes
|
|
|
DS/A; Systems
|
|
|
Scheduling and mapping
|
C
|
Understand the importance of a programmer, compiler and/or runtime system mapping and scheduling computations to threads/processes, both statically and dynamically
|
A
|
Be able to apply a static and a dynamic strategy for mapping the processes of large-scale parallel programs onto processors, optimizing performance by reducing communication delays
|
DS/A; Systems; ParProg
|
|
|
Effect of timing failures/delay in distributed systems
|
K
|
Understand that a failure in one node can cause a global failure in a distributed system. For example, one could use waiting on a non-terminating program to illustrate a failure scenario in distributed systems (e.g., in the context of consensus).
|
|
|
CS2
|
|
|
- Protocol timeout protections
|
N
|
|
A
|
Understand the use of timeouts in situations with a high probability of error
|
Networking
|
|
Data
|
|
Understand the impact of data distribution, layout, and locality on performance; know that a data transfer has a fixed startup cost plus a cost proportional to its size, irrespective of whether the transfer is from memory or between processors; know false sharing and its impact on performance (e.g., with a cyclic mapping in a parallel loop; covered in ParProg)
|
C
|
|
DS/A; Lang; ParProg
|
|
|
Data distribution
|
N
|
|
C
|
Understand what block, cyclic, and block-cyclic data distributions are, and what it means to distribute data across multiple threads/processes (see the sketch after this row)
|
ParProg
|
|
|
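A minimal sketch contrasting block and cyclic ownership of N elements across P processes; N and P are illustrative, and N is assumed divisible by P:

/* Prints which process owns each element under block and cyclic distributions. */
#include <stdio.h>

#define N 12
#define P 3

int main(void) {
    int block = N / P;
    for (int i = 0; i < N; i++) {
        int owner_block  = i / block;  /* contiguous chunks of N/P elements */
        int owner_cyclic = i % P;      /* elements dealt out round-robin */
        printf("element %2d: block owner %d, cyclic owner %d\n",
               i, owner_block, owner_cyclic);
    }
    return 0;
}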
Data layout
|
C
|
Know how to lay out data in memory to get improved performance and energy (memory hierarchy in shared memory parallel system)
|
|
|
DS/A; Systems
|
|
|
|
K
|
Know that for cache-coherent shared memory systems, data is kept coherent in blocks, not individual words, and know how to avoid false sharing of a block's data across threads
|
C
|
Be aware of false sharing, be able to give examples where it occurs, and understand why it happens (see the sketch after this row)
|
Systems; ParProg
|
|
|
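A minimal sketch of avoiding false sharing, assuming OpenMP, C11, and 64-byte cache lines; each thread updates its own padded counter so the counters do not share a cache line:

/* Without the padding/alignment, the per-thread counters would sit in one
   cache line and coherence traffic would slow every update down. */
#include <omp.h>
#include <stdio.h>

#define NTHREADS 4

struct padded { _Alignas(64) long value; };  /* one cache line per counter */

int main(void) {
    struct padded count[NTHREADS] = {{0}};

    #pragma omp parallel num_threads(NTHREADS)
    {
        int id = omp_get_thread_num();
        for (long i = 0; i < 10000000; i++)
            count[id].value++;               /* private counter, no shared line */
    }

    long total = 0;
    for (int i = 0; i < NTHREADS; i++) total += count[i].value;
    printf("total = %ld\n", total);
    return 0;
}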
|
K
|
Know the energy cost of loading data into memory from secondary storage (and writing out modified data to secondary storage)
|
|
|
Systems
|
|
|
Data Representation
|
|
|
|
|
|
|
|
- Floating point and integer precision (64-bit, 32-bit, and 16-bit or less)
|
K
|
Know the power savings of using smaller data representations (64-bit vs. 32-bit vs. 16-bit floating-point precision; 32-bit vs. 16-bit integers). For example, machine learning on GPUs is driving lower (16-bit) floating-point precision.
|
|
|
Systems
|
|
|
Data locality
|
K
|
Know what spatial and temporal locality are, and how to organize data to take advantage of them
|
|
|
DS/A; Systems
|
|
|
- Performance impact of data movement
|
K
|
Know the performance cost of moving data to secondary storage for big data analysis, and of distributed vs. centralized analysis. Be aware of specific mechanisms that take advantage of locality (e.g., in-situ processing)
|
C
|
Understand performance costs of data locality with respect to various metrics, and be able to contrast mechanisms like in-situ vs. in-transit vs. offline processing
|
Systems; ParProg; DistSystems
|
|
|
- Structured vs unstructured data
|
K
|
Know the differences and tradeoffs between these data representations
|
A
|
Be able to build solutions for the different types of data
|
DS/A; Databases
|
|
|
- Graph representations and databases
|
K
|
Know of graph representations of data to facilitate graph analysis.
|
C
|
Understand performance gains due to graph-specific mechanisms versus other more general data representations
|
DS/A; Databases
|
|
|
Data handling and manipulation
|
N
|
|
C
|
Understand the differences and performance impact of data formatting and storage mechanisms. Be able to implement adequate solutions for the different types of storage
|
DistSystems; Databases
|
|
|
|
N
|
|
C
|
Comprehend the principles behind distributed databases and the motivation of tradeoffs to support scalability
|
DistSystems; Databases
|
|
|
|
N
|
|
A
|
Comprehend how NoSQL databases enable scalable data manipulation; include exemplars to become familiar with some of them (e.g., MongoDB, Hive)
|
Databases
|
|
|
- Eventual consistency vs ACID
|
N
|
|
C
|
Comprehend how replication enables scalability in databases, but at the cost of transactions losing their ACID properties
|
Databases
|
|
|
Distributed file systems
|
K
|
Be aware of existence of distributed file systems and common examples of where they are used and why
|
K
|
Comprehend the basic principles of how distributed file systems work, their scalability benefits, and performance and reliability problems
|
Systems; Operating Systems; DistSystems; ParProg
|
|
|
|
N
|
|
C
|
Know of distributed file systems such as HDFS and their replication and fault tolerance mechanisms. In ParProg, be able to use a DFS.
|
DistSystems; ParProg
|
|
|
- Key-value storage systems
|
N
|
|
K
|
Understand the concept of key-value storage. In ParProg, be able to use an appropriate API.
|
DistSystems; ParProg
|
|
Tools and metrics
|
|
|
|
|
|
|
|
Performance monitoring tools
|
K
|
Know of tools for runtime monitoring (e.g., gprof, perf, Intel Performance Toolkit, TAU)
|
|
|
DS/A; Systems
|
|
|
Performance metrics
|
C
|
Know the basic definitions of performance metrics (speedup, efficiency, work, cost) and Amdahl's law; know the notion of scalability (see the worked sketch after this group of rows)
|
|
|
CS2; DS/A
|
|
|
|
C
|
Understand how to compute speedup, and what it means
|
|
|
|
|
|
|
C
|
Understand how to compute efficiency, and why it matters
|
|
|
|
|
|
|
C
|
Understand that speedup and efficiency are single-point measures for a particular problem size and number of processes/threads; these metrics change as the problem size and/or the number of processes/threads vary. Understand that scalability is a metric that measures how speedup varies as problem size and/or number of processes/threads vary.
|
|
|
|
|
|
|
C
|
Understand that speedup is limited by the sequential portion of a parallel program, if problem size is kept fixed
|
|
|
|
|
|
|
K
|
Know the idea of weak scaling, where problem size increases as the number of processes/threads increases
|
|
|
|
|
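A minimal worked sketch of speedup, efficiency, and the Amdahl's law bound; all timings and the serial fraction are made-up illustrative numbers:

/* Speedup S(p) = T1/Tp, efficiency E(p) = S(p)/p, Amdahl bound = 1/(f + (1-f)/p). */
#include <stdio.h>

int main(void) {
    double t1 = 100.0;   /* sequential time (seconds), hypothetical */
    double tp = 20.0;    /* time on p processors, hypothetical */
    int    p  = 8;
    double f  = 0.05;    /* serial fraction of the program, hypothetical */

    double speedup    = t1 / tp;                    /* 5.00 */
    double efficiency = speedup / p;                /* 0.625 */
    double amdahl     = 1.0 / (f + (1.0 - f) / p);  /* ~5.93: limit as Tp shrinks */

    printf("speedup = %.2f, efficiency = %.2f, Amdahl bound = %.2f\n",
           speedup, efficiency, amdahl);
    return 0;
}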
Power/Energy efficiency
|
|
|
|
|
|
|
|
Power-latency tradeoff
|
K
|
Be familiar with the notion that problem decompositions (including their granularity) and active/idle states (e.g., modulation of CPU frequency) may be exploited to adjust the balance among throughput, latency, and energy consumption.
|
|
|
Systems
|
|
|
Energy efficiency vs. load balancing
|
K
|
Be aware that unbalanced work decomposition and communication congestion can prolong computation and reduce energy efficiency.
|
|
|
Systems
|
|
|
Active power management methods
|
N
|
|
A
|
Be aware that systems expose various execution parameters (e.g., P-states on Intel processors) and examine the effect of modulating at least one of them to optimize performance or energy consumption.
|
ParProg; Arch2
|
|
|
Idle power management methods
|
N
|
|
A
|
Be aware that architectures and OSs provide various interfaces that enable computational units to be passivated (e.g., C-states), and examine the tradeoffs (e.g., reduced subsystem power consumption vs. increased latency) involved in exploiting at least one of them.
|
ParProg; Arch2
|
|
|
Power consumption of parallel programs
|
N
|
|
K
|
Be aware that optimal energy efficiency may not be achieved through aggressive reduction of CPU clock frequencies and exploitation of sleep modes, due to increased execution time and static components of system power consumption.
|
ParProg
|
Security
|
|
|
|
|
|
|
|
Security protocols
|
N
|
|
A
|
Understand IPsec basics
|
Networking
|