Reducing dark bandwidth through data reduction near memory

Date and Time:

Wednesday, April 4, 2018

Location:

CG Auditorium

Speaker:

Jonathan Beard

Abstract

Data movement is a key bottleneck to the scalability of modern computer systems. The race to provide higher usable bandwidth to the compute system (processors) with less energy is garnering increasing interest. As systems scale out, the problem of providing usable bandwidth at a reasonable power level only multiplies. Despite the realization that data movement is a key bottleneck, programmers still waste a sizeable portion of bandwidth. Across several benchmarks, average L1-D cache utilization ranges from approximately 45% to a high of 91%. Improving utilization can greatly improve energy efficiency through reduced data movement, reduced processor effort during execution (effort spent managing memory), and improved cache-line reuse (when lines are fully utilized).
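To make the utilization figures concrete, here is a toy model that shows how access patterns translate into wasted line bytes. It is a sketch under stated assumptions, not the measurement method behind the numbers above. It assumes 64-byte cache lines and no eviction, so every line touched is fetched exactly once. Under those assumptions, utilization is defined as the unique bytes actually read divided by the bytes of all lines fetched. The function name `line_utilization` is hypothetical.

```python
LINE_BYTES = 64  # assumed cache-line size in bytes

def line_utilization(accesses, line_bytes=LINE_BYTES):
    """accesses: iterable of (byte_address, size_in_bytes).
    Toy model: ignores eviction, so each touched line is counted as
    fetched exactly once. Returns used bytes / fetched bytes."""
    touched = set()  # unique bytes the program actually used
    lines = set()    # unique cache lines that had to be brought in
    for addr, size in accesses:
        for b in range(addr, addr + size):
            touched.add(b)
            lines.add(b // line_bytes)
    if not lines:
        return 0.0
    return len(touched) / (len(lines) * line_bytes)

# Unit-stride read of 16 doubles: both 64-byte lines are fully used.
dense = [(8 * i, 8) for i in range(16)]
# One double per line, as with x[col[j]] when column indices are scattered.
sparse = [(64 * i, 8) for i in range(16)]

print(line_utilization(dense))   # 1.0
print(line_utilization(sparse))  # 0.125
```

In the sparse case, 7/8 of every fetched line is moved but never used.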

Gather-scatter broadly refers to any technique that takes a non-contiguous section of memory and makes it contiguous (also called packing) so that it can be fetched from memory together. Some of these techniques can pack data from sparse representations so that it fully utilizes both memory bursts and cache lines. The end result is that less data moves between the levels of the memory and cache hierarchy. This talk will cover some causes of dark bandwidth, two use cases for gather/scatter near memory, and the potential outcomes of turning poorly utilized data access patterns into either compacted, high-reuse data or streaming data, depending on the use case.
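As an illustration of packing, the sketch below gathers the scattered `x[col_idx[k]]` operands of a CSR sparse matrix-vector multiply (SpMV) into one contiguous buffer. After packing, the compute loop reads `vals` and the packed buffer with unit stride. Both `gather` and `spmv_csr_packed` are hypothetical names. The gather simply stands in for a near-memory engine and is not SPiDRE's actual interface. In pure Python it only illustrates the change in data layout, not any bandwidth saving.

```python
def gather(x, indices):
    """Pack x[indices] into one contiguous list: the role a near-memory
    gather unit would play before data is sent to the core."""
    return [x[i] for i in indices]

def spmv_csr_packed(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for A in CSR form, reading x through a packed buffer."""
    packed_x = gather(x, col_idx)  # dense stream aligned element-for-element with vals
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[k] * packed_x[k]  # unit-stride over both arrays
        y.append(acc)
    return y

# A = [[4, 0, 1], [0, 2, 0], [3, 0, 5]] in CSR form
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [4.0, 1.0, 2.0, 3.0, 5.0]
print(spmv_csr_packed(row_ptr, col_idx, vals, [1.0, 2.0, 3.0]))  # [7.0, 4.0, 18.0]
```

The packed buffer repeats `x[0]` and `x[2]` because each nonzero gets its own copy. That is the streaming use case: no reuse, but every fetched byte is used.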

Talk outline

Introduce the data movement problem and “Dark Bandwidth” in general
Describe what is really happening at the architecture and microarchitecture levels (using HPCG SpMV as a running example)
Re-introduce the problem at another level, now that the basics are established
Describe some current solutions, past solutions, and what can be done for the future (again, using SpMV as an example)
Show results for our current implementation of SPiDRE (the Sparse Data Reduction Engine)
Discuss the trade-offs in interfacing

Turning sparse, high-reuse-distance accesses into dense streams
Compacting data for high reuse and less data movement

Impact on computing models, programming models, and runtimes, and future work
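The compaction use case can be read as packing each distinct value only once, so that reuse is preserved instead of unrolled into a stream. The sketch below shows one plausible interpretation, not SPiDRE's actual mechanism. It remaps the columns referenced by a block of rows onto a small dense buffer and rewrites the indices to point into it. The name `compact` is hypothetical. It assumes 8-byte doubles and 64-byte lines. Under those assumptions, `x[0]`, `x[512]`, and `x[999]` sit in three different lines, but after compaction the three values occupy 24 contiguous bytes and can fit in a single line.

```python
def compact(col_idx, x):
    """Map each distinct column in col_idx to a slot in a small dense buffer.
    Returns (compact_x, local_idx) with compact_x[local_idx[k]] == x[col_idx[k]]."""
    slot = {}        # original column index -> position in compact_x
    compact_x = []   # each referenced value stored exactly once
    local_idx = []   # col_idx rewritten to index into compact_x
    for c in col_idx:
        if c not in slot:
            slot[c] = len(compact_x)
            compact_x.append(x[c])
        local_idx.append(slot[c])
    return compact_x, local_idx

x = [float(i) for i in range(1000)]
col_idx = [0, 512, 999, 512, 0, 999]
compact_x, local_idx = compact(col_idx, x)
print(compact_x)  # [0.0, 512.0, 999.0]
print(local_idx)  # [0, 1, 2, 1, 0, 2]
```

Unlike the streaming gather, repeated references hit the same slot, so a small, hot buffer serves all reuses.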

References:

Beard, J. C., & Randall, J. (2017). Eliminating dark bandwidth: a data-centric view of scalable efficient performance, post-Moore. In Proceedings of High Performance Computing Post-Moore (HCPM’17).
Beard, J. C. (2017, October). The sparse data reduction engine: chopping sparse data one byte at a time. In Proceedings of the International Symposium on Memory Systems (pp. 34-48). ACM.

Speaker Description:

Jonathan is an experienced leader, manager, and researcher. His current research focuses on hardware/software system architectures that reduce data movement in post-exascale and post-Moore systems.

Jonathan is currently a staff research engineer at ARM in Austin, Texas, an advisor to FastData.io, and the owner of Arkhesoft LLC. He is a graduate of Louisiana State University, The Johns Hopkins University, and Washington University in St. Louis. Jonathan holds baccalaureate degrees in Biology and International Studies, a master's degree in Bioinformatics, and a doctorate in Computer Science, completed under the research direction of Dr. Roger Chamberlain.

Jonathan is a U.S. Army veteran. He served in multiple countries, in roles ranging from platoon leader and aide-de-camp to a general to deputy director within a large, successful multinational organization. Jonathan has successfully led companies ranging in size from 50 to 250 people, including managing and directing the start-up operations of an organization with a multi-million USD budget.

Slides: http://www.jonathanbeard.io/slides/beard_ucar_sea_darkBandwidth.pdf

Event Category:

conference-talk