Lectures
Lecture Slides
- Lecture 01: Introduction and Logistics
- Lecture 02: Performance Metrics
- Lecture 03: Amdahl's Law
- Lecture 04: Introduction to MIPS
- Lecture 05: Cache Introduction
- Lecture 06: Cache Optimizations
- Lecture 07: Virtual Memory
- Lecture 08: Pipelining
- Lecture 09: Handling Branches
- Lecture 10: Out-of-Order Execution
- Lecture 11: Main Memory
- Lecture 12: SIMD
- Lecture 13: Multiprocessors
- Lecture 14: Consistency and Coherence
Note
I have heard reports that some parts of these PDFs don’t render properly in Adobe Reader. I recommend that you view these PDFs in your browser or via an alternative PDF viewer such as Zathura or Evince (Linux) or Foxit Reader (Windows).
Worksheets
- Worksheet Lecture 02: Performance (Solutions)
- Worksheet Lecture 03: Amdahl's Law (Solutions)
- Worksheet Lecture 05: Cache Introduction (Solutions)
- Worksheet Lecture 06: Cache Optimizations (Solutions)
- Worksheet Lecture 08: Pipelining Hazards (Solutions)
- Worksheet Lecture 09: Handling Branches (Solutions)
- Worksheet Lecture 10: Out-of-Order Execution (Solutions)
- Worksheet Lecture 11: Main Memory (Solutions)
Turning in Worksheets
You should turn in your worksheets on Gradescope. They are due the Friday after we finish covering them in class. You need to upload the worksheet as a PDF. There are many apps that can do this. On Android the Google Drive app has scan-to-PDF functionality. On iOS, you can scan using the Notes app and export to a PDF.
Gradescope requires that you submit the same number of pages as the blank worksheet template. If you don’t want to print out the worksheet and instead do the problems on a blank sheet of paper, you may need to add blank pages so that your submission's page count matches the template's.
Supplemental Links
- Lecture 02: Performance Metrics
  - AWS takes advantage of the bandwidth of trucks carrying hard drives with AWS Snowmobile
  - CPU Bandwidth - The Worrisome 2020 Trend
- Lecture 05: Cache Introduction
- Lecture 06: Cache Optimizations
- Lecture 07: Virtual Memory
  - Notes from Lecture
  - Virtual Memory – Translation-Lookaside Buffer (TLB)
  - Why realloc is actually efficient due to virtual memory and being able to manipulate the page table: A story of Realloc (and Laziness)
- Lecture 09: Branch Prediction
  - Why is processing a sorted array faster than processing an unsorted array - Stack Overflow
  - A Stack Overflow answer on why predicated code isn’t always the best idea: gcc optimization flag -O3 makes code slower than -O2
  - Linus Torvalds to the LKML on why CMOV (conditional move) is not always that great
- Lecture 11: Main Memory
  - Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors (PDF, DOI 10.1145/2678373.2665726)
- Lecture 13: Multiprocessors
- Lecture 14: Consistency and Coherence
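The Lecture 02 link about AWS Snowmobile turns on a simple bandwidth calculation: a truck full of drives has terrible latency but very high throughput. The sketch below works through that arithmetic. The numbers are assumptions chosen for illustration, not figures from the lecture: a 100 PB payload, a hypothetical 10-day door-to-door transit time, and a hypothetical 10 Gb/s network link for comparison.

```python
PETABYTE = 10**15  # bytes, decimal (SI) petabyte


def effective_bandwidth_gbps(payload_bytes: float, transfer_seconds: float) -> float:
    """Average bandwidth in gigabits per second for moving a payload in a given time."""
    return payload_bytes * 8 / transfer_seconds / 1e9


# Assumed values for illustration only.
truck_payload_bytes = 100 * PETABYTE    # assumed truck capacity
truck_transit_seconds = 10 * 24 * 3600  # hypothetical 10-day trip, including loading
link_gbps = 10.0                        # hypothetical dedicated network link

truck_gbps = effective_bandwidth_gbps(truck_payload_bytes, truck_transit_seconds)

# Time the network link would need to move the same payload.
link_seconds = truck_payload_bytes * 8 / (link_gbps * 1e9)
link_days = link_seconds / (24 * 3600)

print(f"Truck: {truck_gbps:.1f} Gb/s average over the trip")
print(f"10 Gb/s link: {link_days:.0f} days to move the same data")
```

Under these assumptions the truck averages several hundred Gb/s, even though the first byte takes days to arrive. That gap between latency and bandwidth is the distinction the performance-metrics lecture draws.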
Lecture Recordings
Click here for the lecture recording YouTube playlist.