1. Introduction
1.1 Motivation
Parallel and Distributed Computing (PDC) now permeates most computing activities. Multiple cores and general-purpose graphics processing units (GPUs) are common even on laptops and handheld devices, and many productivity tools depend on distributed services. PDC is not just an integral part of the work of computing professionals who explicitly design systems that exploit concurrency to achieve performance; it is also relevant to developers and users of applications that hide much of the complexity of harnessing PDC technology. These implicit consumers of PDC may include developers with applications that interface with everyday tools or libraries such as collaborative environments, productivity tools, and multimedia applications that utilize local and/or remote PDC technology implemented below their visibility threshold. The penetration of PDC into the daily lives of both “explicit” and “implicit” users has made it imperative that all computing professionals be able to understand its scope, effectiveness, efficiency, and reliability.
The preceding trends point to the need for imparting a broad-based skill set in PDC technology at various levels in the educational fabric of Computer Science (CS) and Computer Engineering (CE) programs as well as related computational disciplines. Thus, it is imperative that all computing professionals, including those whose interaction with PDC is only implicit, are familiar with its basic concepts and the interactions and implications of these concepts upon the semantics of systems they exploit.
The Center for Parallel and Distributed Computing Curriculum Development and Educational Resources (CDER), in conjunction with the IEEE Technical Committee on Parallel Processing (TCPP), developed the first version of this curriculum, the undergraduate TCPP curriculum guidelines for PDC, during 2010-12 [1]. Its premise was that every computing undergraduate should achieve a specified skill level regarding PDC-related topics as a result of required coursework. The TCPP curriculum informed the ACM/IEEE 2013 Computer Science Curricula [2] and has been broadly adopted. To keep up with the dynamic landscape of computing research and practice, the 2012 TCPP curriculum has now been revised, in particular to incorporate aspects of Big Data, Energy, and Distributed Computing.
1.2 Our Vision and Intended Use
We recognize that any revision of a core curriculum is a long-term community effort. Our vision has been one of stakeholder experts working together and periodically providing guidance on restructuring standard curricula across various courses to incorporate parallel and distributed computing. The primary beneficiaries are CS/CE students and their instructors, who receive up-to-date guidelines identifying the aspects of PDC that are important to cover and suggesting specific courses in which their coverage might find an appropriate context. Various programs at colleges, nationally and internationally, can receive guidance in setting up courses and/or integrating parallelism within the Computer Science, Computer Engineering, or Computational Science curriculum. Employers can have a better sense of what they can expect from students in the area of parallel and distributed computing skills. Curriculum guidelines can similarly help inform retraining and certification for existing professionals, and can prepare the ground for periodic updates by the professional societies that set curricula.
Our larger vision in proposing and updating this curriculum has been to enable students to be fully prepared for their future careers in light of technological shifts and mass adoption of PDC through multicores, GPUs, cloud computing, big data, IoT and corresponding software environments, and to make a real impact with respect to all stakeholders, including employers, authors, and educators. These curricular guidelines, along with periodic feedback and other evaluation data on their adoption and use, will also help to steer companies hiring students and interns, hardware and software vendors, and, of course, authors, instructors, and researchers.
This updated curriculum proposal continues to seek adoption and use in a manner that is flexible and broad, always allowing for local variations in emphasis. The field of PDC is changing too rapidly for a proposal with any rigidity to remain valuable to the community for a useful length of time. However, it is essential that curricula begin the process of incorporating parallel thinking into the core courses. Therefore, this curriculum attempts to identify basic concepts and learning goals that are likely to retain their relevance for the foreseeable future. We see PDC topics as being most appropriately sprinkled throughout a CS/CE curriculum in a way that enhances what is already taught and that melds parallel and distributed computing with existing material in whatever ways are most natural for a given institution/program. The document also includes additional elective topics for upper level courses. While advocating the thesis that relegating the most basic PDC subjects to a separate course is not the best means to shift the mindset of students away from purely sequential thinking, we recognize that the separate-course route may work better for some programs, particularly for the coverage of advanced/elective topics.
1.3 The Curriculum and its Update
Architecture, Programming, and Algorithms Areas: For the three main PDC areas of Architecture, Programming, and Algorithms, starting from the previous area tables, the working group reconsidered various topics and subtopics and their level of coverage, identified the current lower level core courses in which these could be introduced, and provided examples of how they might be taught. For topics that are either not suitable for coverage in lower level core courses or for which deeper treatment can benefit an undergraduate student, the level of coverage and learning outcomes for advanced or elective courses have also been identified. For each topic/subtopic, the process involved the following steps.
- Assign a learning level based on Bloom’s classification [3], using the following notation.
K = Know the term
C = Comprehend so as to paraphrase/illustrate
A = Apply it in some way
- Write learning outcomes and teaching suggestions for core courses, and if warranted, for advanced/elective courses.
- Identify core and advanced/elective courses where the topic could be covered.
A fourth table, for emerging topics that could not be accommodated within the three areas, has also been identified and updated. These topics are of significant current and emerging interest and are still evolving. They are in general better suited for upper division classes but may be introduced in the early core courses in a limited way. A few previous topics have been eliminated from these four tables, and the tables have been reorganized.
Pervasive Concepts: A set of pervasive concepts that percolate across area boundaries is also identified, and most core topics in the three areas support building students’ knowledge and comprehension of the pervasive concepts and their various manifestations. It is desirable for educational programs to enable students to comprehend PDC’s pervasive concepts in multiple contexts. Four key concepts that impact the performance, correctness, and semantics of PDC systems in a pervasive manner are the following (while recognizing that this list is a work in progress):
- Concurrency: The availability and exploitation of simultaneous or overlapping actions.
- Asynchrony: How concurrency can enable actions to occur in multiple orderings, and how this affects the design of programs that achieve high performance and correctness.
- Locality: Data or computational subsystems are local only to the system that actually contains them; remote access imposes delays, costs, and potential inconsistencies. Locality is relevant at multiple levels, and high performance and correctness require that it be effectively managed.
- Performance: Metrics for characterizing throughput, speedup, efficiency, scalability, etc., at various levels of abstraction.
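As a minimal illustration of the Performance concept above, the following Python sketch computes speedup and efficiency from measured run times. It assumes the standard textbook definitions (speedup = serial time / parallel time; efficiency = speedup / number of workers); the function names and the timings are hypothetical and are not taken from the curriculum tables.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_1 / T_p: how many times faster the parallel run is."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Efficiency E = S / p: the fraction of ideal linear speedup achieved."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup(t_serial, t_parallel) / workers


if __name__ == "__main__":
    # Hypothetical timings: 120 s on one core, 20 s on eight cores.
    s = speedup(120.0, 20.0)
    e = efficiency(120.0, 20.0, 8)
    print(f"speedup = {s:.2f}, efficiency = {e:.2f}")
```

An efficiency well below 1 on a given number of workers is one simple signal that overheads such as communication, synchronization, or load imbalance limit scalability.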
Big Data, Energy, and Distributed Computing: The majority of updates to the curriculum have focused on enhancing coverage of Big Data, Energy, and Distributed Computing. Topics from these aspects have been integrated into the three area tables and the table for emerging topics. The Big Data aspect was introduced in response to the increasing need for PDC in data intensive problems, the growing demand for a skilled workforce in data science and machine learning, and the rise of academic programs in data science, data analytics [4], and machine learning. While other guidelines [5, 6, 7] addressing data science in undergraduate programs focus on computational and statistical thinking in general, our goal is to address data science challenges in the PDC context. Topics of relevance include hardware and software support for data collection, storage, organization, and processing; constraints imposed by I/O and memory hierarchies; performance bottlenecks due to data movement; and parallel algorithmic approaches useful for massive data analyses.
Energy and power have emerged as key concerns in computing in the last 15 years or so, and are usually not part of traditional coverage in undergraduate CS or CE curricula. However, with power consumption becoming the primary roadblock to increasing single thread performance, and a key stimulus for increased parallelism and heterogeneity, it is imperative that energy and power be discussed in the undergraduate PDC curriculum. Topics of relevance include the basics of dynamic and static power management for individual cores and sockets, and how related techniques can be exploited in the PDC context.
Traditionally, elements of distributed computing (such as mutual exclusion and synchronization) have been discussed in undergraduate curricula. However, the proliferation of independent and untethered computing devices and applications such as collaborative environments and cloud services has made the early discussion of distributed computing concepts vital. These new applications also present an opportunity to introduce PDC ideas against a backdrop that most students would readily identify with. The revised curriculum has distilled some of the core ideas of distributed computing, often weaving them through several conventional settings before fully exploring them in advanced courses on distributed systems or parallel programming.
Organization: The rest of the report is organized as follows. Section 2 describes the background work on developing the PDC curriculum, its impact so far, and companion activities and future work. Section 3 then explains how to read the curriculum in a manner consistent with its underlying intent. Section 4 describes the pervasive concepts and their rationale. Sections 5, 6, and 7, respectively, provide the rationale and tables for each of the major topic areas in the curriculum: architecture, programming, and algorithms. Finally, Section 8 provides the table for the emerging topics.
[1] NSF/IEEE-TCPP Curriculum Initiative on Parallel and Distributed Computing - Core Topics for Undergraduates. http://tcpp.cs.gsu.edu/curriculum/?q=system/files/NSF-TCPP-curriculum-version1.pdf
[2] ACM/IEEE-CS 2013 Computer Science Curricula. https://www.acm.org/binaries/content/assets/education/cs2013_web_final.pdf
[3] (i) Anderson, L.W., & Krathwohl (Eds.). (2001). A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives. New York: Longman. (ii) Huitt, W. (2009). Bloom et al.'s taxonomy of the cognitive domain. Educational Psychology Interactive. Valdosta, GA: Valdosta State University. http://www.edpsycinteractive.org/topics/cogsys/bloom.html
[4] 50 Best Big Data Degrees, collegechoice.net. https://www.collegechoice.net/rankings/best-big-data-degrees/
[5] A. Danyluk, P. Leidig, L. Cassel, and C. Servin. 2019. ACM Task Force on Data Science Education: Draft Report and Opportunity for Feedback. In Proceedings of the 50th ACM Technical Symposium on Computer Science Education (SIGCSE '19). ACM, New York, NY, USA, 496-497.
[6] Curriculum Guidelines for Undergraduate Programs in Data Science, PCMI 2016. https://www.stat.berkeley.edu/~nolan/Papers/Data.Science.Guidelines.16.9.25.pdf
[7] Curriculum Guidelines for Undergraduate Programs in Statistical Science (ASA), [ASA 2015]. http://www.amstat.org/education/pdfs/guidelines2014-11-15.pdf