
NSF/IEEE-TCPP Curriculum Initiative on Parallel and Distributed Computing - Core Topics for Undergraduates

4. Pervasive PDC Concepts

There are a few broad themes that form the conceptual underpinnings of parallel and distributed computing. These themes are fundamental in nature, cut across all areas of PDC (algorithms, architecture, and programming), and in some cases transcend PDC itself. They appear repeatedly, at different depths and levels of complexity, throughout the curriculum. Many PDC topics are related to and stem from these pervasive themes, and some of the concepts are interrelated. Because of their pervasive nature, we recommend that coverage of these concepts be made an explicit and essential part of a PDC curriculum. These themes should be introduced very early in the curriculum, often through unplugged activities. We also recommend that these pervasive topics be reinforced in later classes in multiple contexts and at greater depth and complexity, as appropriate, in the architecture, programming, and algorithms areas. At the basic level, these concepts should be taught at least at the “C” level of Bloom’s taxonomy. While we have identified four such pervasive topics, we recognize that this list is a work in progress.

Concurrency: Algorithms are generally specified as sequential programs in languages that explicitly specify a set of variables and a sequence of operations to be performed upon them. To reduce the time required to execute an algorithm through parallel speedup, its sequential specification can be relaxed to permit individual operations, or groups of operations, to be executed concurrently by multiple operational units (e.g., threads or CPUs). Concurrency is thus a property of an algorithm: it exposes the potential for parallelization. If concurrency is present in an algorithm, the concurrent operations can be executed in parallel (simultaneously) by multiple operational units (CPUs) when available; without concurrency, there is no scope for parallelization. Concurrency can be present even in a sequential program; parallelization takes advantage of concurrency to increase performance. When there is a sequential dependency among operations, mechanisms must be provided to forward data among operational units. When the result of one early operation affects the sequence of future operations, speculative early execution of multiple potential future computational paths may yield additional speedup. This decomposition of complex tasks in a manner that exploits and manages available parallelism at multiple scales occurs in other contexts that are accessible to students. For example, a kitchen staff in a restaurant will concurrently prepare multiple meals, and may speculatively begin to prepare some dishes prior to the arrival of customer orders. Optimizing such a kitchen’s operations involves nuanced analysis of available concurrency, speculation, and coordination needs, such as using warming stations to enable concurrency.
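The distinction between concurrency (a property of the algorithm) and parallelism (how the hardware executes it) can be illustrated with a minimal sketch, not taken from the curriculum itself: the iterations of a sum have no dependencies on one another, so the work can be split into independent chunks that an executor may run concurrently. (Note that for CPU-bound pure-Python code a thread pool exposes the concurrency without necessarily yielding parallel speedup.)

```python
# Sketch: a sum has no inter-iteration dependency, so its chunks are
# concurrent; an executor may run them in parallel if hardware allows.
from concurrent.futures import ThreadPoolExecutor

def chunk_sum(chunk):
    # Each chunk is independent of every other chunk.
    return sum(chunk)

def parallel_sum(data, workers=4):
    # Splitting the data exposes the concurrency already present in
    # the sequential loop; the executor decides how to schedule it.
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_sum, chunks))

print(parallel_sum(list(range(1000))))  # same result as sum(range(1000))
```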

Asynchrony: Both Turing machines and beginning programmers assume that when an operation is invoked, it completes immediately. Modern computers, however, are increasingly asynchronous for reasons of performance and efficiency. This characteristic is similar across CPU cache lines, I/O buffers, networks, and database entries. Asynchrony is primarily locality in time, where operations are not instantaneous, but it can also manifest as locality in space (e.g., a cache and main memory holding different values for the same data). This phenomenon can be observed even in entry-level computer science classes (e.g., all students open a shared Google Docs spreadsheet and everyone enters their name in cell A1). This sets the stage for discussions in later courses on how to design mitigations. It is also important to convey the generality of the concept across scales of architecture, from CPUs to distributed networks, as manifestations of the same basic fact: operations cannot truly be instantaneous, and clocks are not perfectly synchronized. In short, in computer science, time is messy. Of course, this issue of time goes beyond computer science: clocks became widespread with the introduction of trains, and the increasing accuracy of clocks was driven by ocean navigation. Races become possible because of asynchrony. Much of the complexity of modern programming lies in trying to impose the perception of synchronous behavior where it does not really exist. The difficulty of this task can be gauged by the continuing issue of thread-safe libraries decades after threading was introduced, and the issues of asynchrony extend far beyond threads.
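A minimal sketch, not from the curriculum text, of the race made possible by asynchrony and its standard mitigation: several threads perform a read-modify-write on a shared counter, and a lock serializes the three steps so every increment is observed.

```python
# Sketch: uncoordinated threads interleave read-modify-write steps
# unpredictably; a lock restores the illusion of synchronous updates.
import threading

def increment(counter, lock, n):
    for _ in range(n):
        with lock:                  # serialize read-modify-write
            counter["value"] += 1

def run(threads=4, n=10_000):
    counter = {"value": 0}
    lock = threading.Lock()
    workers = [threading.Thread(target=increment, args=(counter, lock, n))
               for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter["value"]

assert run() == 4 * 10_000   # with the lock, no increment is lost
```

Removing the `with lock:` line reintroduces the race; whether the final count actually falls short then depends on the interpreter and timing, which is exactly the point about asynchrony being hard to observe deterministically.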

Locality: One of the overarching concepts in computing is locality, in both time and space. Like concurrency, locality predates computer science. Armies are recursively divided into regiments and groups of individual soldiers that coordinate at various levels of granularity. Caches of supplies such as ammunition must be actively managed and coordinated. Many animals benefit from social pack structures that productively exploit parallelism, including some level of local autonomy within a larger organization. Computer architecture is similarly simplified through localized control. Each core executes its own instruction stream through prefetching and the concurrent scheduling of independent operations onto multiple (pipelined) functional units, and those functional units (mostly) manage their own internal resources. Each computational unit (a CPU in a shared-memory machine or a node in a cluster) may have its own clock and its own notion of time. Memory subsystems proactively predict and cache future memory references based upon recent memory reference patterns. A challenge with localized control is the detection and management of conflict. For example, to provide the illusion of a single shared low-latency memory, each core of a CPU is provisioned with an independent local cache that implements a coherence protocol with the other caches. Effective memory access times are highly dependent upon access patterns: a program whose memory access patterns trigger frequent transfers of data among caches will consume more energy and have much higher access latency and lower memory throughput than one that does not. There is tension between global and local control in the management of sub-tasks at every scale.
Broad-scale coordination can require substantial allocation of resources for monitoring and micro-management that might otherwise be used to speed up computation, and the cost or delay of computing and deploying optimized allocations may be onerous. Localized control trades global knowledge for simplicity and speed. However, myopic localized control can miss opportunities for global optimizations, such as resource reallocation in response to asynchrony or localized faults.
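The sensitivity of effective memory access time to access patterns can be sketched with a classic exercise, not taken from the curriculum text: traversing a matrix in the order its rows are laid out touches nearby elements consecutively, while column-order traversal strides across rows. On cache-based machines the row-order walk is typically faster, though the effect is far less pronounced in Python than it would be in C.

```python
# Sketch: same computation, two access patterns with different locality.
import time

N = 500
matrix = [[1] * N for _ in range(N)]

def sum_row_major(m):
    # visits m[i][0], m[i][1], ...: consecutive elements of each row
    return sum(m[i][j] for i in range(N) for j in range(N))

def sum_col_major(m):
    # visits m[0][j], m[1][j], ...: one element from each row in turn
    return sum(m[i][j] for j in range(N) for i in range(N))

for f in (sum_row_major, sum_col_major):
    t0 = time.perf_counter()
    total = f(matrix)
    print(f"{f.__name__}: {total} in {time.perf_counter() - t0:.3f}s")
```

Both traversals compute the same result; only the presentation of addresses to the memory subsystem differs, which is the architectural artifact of locality the learning outcomes below refer to.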

Performance: Another overarching concept that spans all areas is performance. No matter what computing artifact (program, algorithm, hardware) is being designed, studied, or analyzed, one should be aware of how good the artifact is and strive to make it better. Students should be introduced to what to measure, how to measure it, and what metric to use to judge the goodness of an artifact. Space, time, and energy are the basic commodities to measure, and the metrics for these commodities may differ depending on whether the context is sequential or parallel. For example, in a sequential program run time may serve as a measure of goodness, but in the parallel version of the same program there is an additional variable, the number of cores, so run time alone does not capture the goodness. Different metrics are needed for parallel and distributed programs (e.g., speedup, efficiency). Similarly, asymptotic notations such as big O are used for sequential algorithms, but for parallel algorithms scalability is a more appropriate metric of goodness.
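The standard parallel metrics mentioned above reduce to two small formulas, sketched here with hypothetical timings (the numbers are illustrative, not from the curriculum): speedup S(p) = T(1)/T(p) and efficiency E(p) = S(p)/p.

```python
# Sketch of the parallel-performance metrics: speedup and efficiency.
def speedup(t_serial, t_parallel):
    # S(p) = T(1) / T(p): how many times faster the parallel run is
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, cores):
    # E(p) = S(p) / p: fraction of ideal per-core utilization
    return speedup(t_serial, t_parallel) / cores

# Hypothetical timings: 100 s on 1 core, 30 s on 4 cores.
s = speedup(100.0, 30.0)        # about 3.33, short of the ideal 4
e = efficiency(100.0, 30.0, 4)  # about 0.83
print(f"speedup {s:.2f}, efficiency {e:.2f}")
```

Noting how efficiency typically drops as cores are added is a natural entry point to scalability discussions.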

Table 1. Pervasive Concepts

Pervasive Concept: Asynchrony
Where Covered: CS1; CS2; DS/A; A (Arch 2, OS)
Learning Outcomes: 1) Understand the cause and effect of asynchrony and how to ensure computational correctness. 2) Understand that asynchrony is a characteristic of modern systems, and understand asynchrony in the PDC context. 3) Can utilize a standard coordination strategy to prevent incorrect operation due to uncontrolled concurrency (race conditions).

Pervasive Concept: Concurrency and Dependency
Where Covered: CS1; CS2; DS/A; Systems
Learning Outcomes: 1) Understand that concurrency is an algorithmic property, and the difference between concurrency and parallelism. 2) Understand that sequential dependencies limit the degree of concurrency and hence parallelism. 3) Identify sequential dependencies in algorithms.

Pervasive Concept: Locality
Where Covered: CS1; CS2; DS/A; Systems
Learning Outcomes: 1) Understand locality of space and time. 2) Can describe one case of locality affecting behavior (e.g., a web cache). 3) Know of some algorithm/program sensitive to some artifact of architectural locality. 4) Implement some programs that are optimized around some form of locality. 5) Be aware of at least two forms of locality.

Pervasive Concept: Performance
Where Covered: CS1; CS2; DS/A; Systems
Learning Outcomes: 1) Understand what to measure and how. 2) Understand that space, time, and energy are fundamental commodities to measure. 3) Be able to implement/tune some algorithms around a performance metric. 4) Comprehend, explore, and analyze speedup, efficiency, and scalability.