CUDA Programming The GPU Universe
💡
Exercise 3

Why Parallelism? 15 XP Easy

Ctrl+Enter Run Ctrl+S Save

Chapter 1, Part 3: The Power of Doing Everything at Once

💡 Story: Imagine counting every grain of sand on a beach. Alone it takes 1,000 years. But if you split the beach into 10,000 sections and assign one person to each section — it takes less than a day. That's parallelism!

Two types of problems:

  • 🔗 Sequential — Step B needs the result of Step A. You can't skip ahead. (Example: Following a recipe where each step depends on the last)
  • 🔀 Parallel (Data Parallel) — You can do the same operation on thousands of pieces of data independently. (Example: Adding 1 to every pixel in an image)
  • 🏆 GPU loves data-parallel work — The SAME operation on DIFFERENT data = perfect GPU job!
// Sequential image brightness: Loop through each pixel void brightenCPU(unsigned char* image, int size) { for (int i = 0; i < size; i++) { // 8 million iterations for a 4K image! image[i] = min(255, image[i] + 50); } } // Parallel image brightness: ALL pixels at the SAME time! __global__ void brightenGPU(unsigned char* image) { int i = threadIdx.x + blockIdx.x * blockDim.x; image[i] = min(255, image[i] + 50); // Thread i handles pixel i } // 8 million threads at once!

🧮 Amdahl's Law — The theoretical speedup from parallelization:

If 90% of your program is parallelizable and runs on 1,000 GPU cores vs 1 CPU core, your speedup is: 1 / (0.1 + 0.9/1000) ≈ 9.9x — Even with 1,000 workers, the 10% sequential part bottlenecks you. This is why optimizing the parallelizable code matters!

  • 🖼️ Great for GPU — Image processing, matrix math, neural networks, physics simulations
  • 🧵 Bad for GPU — Parsing complex files, recursive algorithms, tasks with many data dependencies
  • 💡 The rule — If you can write it as 'do this to every element in a list', the GPU will LOVE it
📋 Instructions
Calculate and print the theoretical speedups using Amdahl's Law. Formula: `Speedup = 1 / (serial_fraction + parallel_fraction / num_processors)` Print the following: ``` Amdahl's Law Speedup Examples: 10% serial, 1000 cores: 9.90x 1% serial, 1000 cores: 90.99x 0% serial, 1000 cores: 1000.00x 50% serial, 1000 cores: 2.00x ``` Use `float` division and print to 2 decimal places.
Implement the formula: return 1.0f / (serial_frac + (1.0f - serial_frac) / (float)cores); — Remember to cast cores to float for division.
⚠️ Try solving it yourself first — you'll learn more!
#include <stdio.h>

float amdahl(float serial_frac, int cores) {
    return 1.0f / (serial_frac + (1.0f - serial_frac) / (float)cores);
}

int main() {
    printf("Amdahl's Law Speedup Examples:\n");
    printf("10%% serial, 1000 cores: %.2fx\n", amdahl(0.10f, 1000));
    printf("1%% serial, 1000 cores: %.2fx\n",  amdahl(0.01f, 1000));
    printf("0%% serial, 1000 cores: %.2fx\n",  amdahl(0.00f, 1000));
    printf("50%% serial, 1000 cores: %.2fx\n", amdahl(0.50f, 1000));
    return 0;
}
🧪 Test Cases
Input
amdahl(0.10, 1000)
Expected
9.90
10% serial
Input
amdahl(0.01, 1000)
Expected
90.99
1% serial
Input
amdahl(0.50, 1000)
Expected
2.00
50% serial
main.py
Hi! I'm Rex 👋
Output
Ready. Press ▶ Run or Ctrl+Enter.