CUDA Programming Sync or Chaos!
Exercise 21

Race Conditions 10 XP Medium


⚠️ Chapter 5: Sync or Chaos! — The Danger of Parallel Execution

💡 Story: Imagine 1,000 soldiers all trying to write their name in the same logbook at the exact same time. Scribbles, overwritten entries, chaos! That's a race condition: multiple threads access the same memory location without coordination, and at least one of them writes to it.

🔴 The Classic Race Condition:

// BUG: Multiple threads incrementing a counter
__global__ void countBuggy(int* counter) {
    // All threads do this simultaneously:
    //   1. Read counter value (say it's 5)
    //   2. Add 1 to local copy (6)
    //   3. Write back to memory (6)
    *counter = *counter + 1;  // THIS IS A RACE CONDITION!

    // Problem: If 4 threads read '5' before ANY of them writes '6',
    // they ALL compute '6' and write '6'. Net result: counter = 6
    // WRONG! Should be 9 (was 5 + 4 increments)
}

🧠 Why race conditions happen:

  • 📖 Read-Modify-Write: an increment is 3 separate steps that are not performed as a single atomic unit
  • ⏱️ Thread interleaving: Thread A reads a value, Thread B reads the same value before A writes back, and both then write the same result, so one update is lost
  • 💀 Non-deterministic: the outcome depends on timing, so the bug may not appear on every run, which makes it hard to reproduce and debug
  • 🔀 Common cases: shared counters, histograms, sums, and any write from multiple threads to the same location
// Simulating the race condition (sequential host code that replays the bad interleaving):
void simulateRace() {
    int counter = 0;

    // 4 threads all read counter = 0 before any of them writes
    int t0_reads = counter;  // 0 (Thread 0 reads)
    int t1_reads = counter;  // 0 (Thread 1 reads at the same time!)
    int t2_reads = counter;  // 0
    int t3_reads = counter;  // 0

    // Each thread adds 1 to its stale copy and writes the result back:
    counter = t0_reads + 1;  // Thread 0 writes 1
    counter = t1_reads + 1;  // Thread 1 writes 1 (NOT 2! It read the old value 0)
    counter = t2_reads + 1;  // Thread 2 writes 1
    counter = t3_reads + 1;  // Thread 3 writes 1

    // Final counter = 1 (WRONG! Should be 4)
}

// Solutions: 1) atomicAdd  2) __syncthreads + reduction  3) Avoid sharing
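To see why the result depends on timing, the sketch below replays every possible ordering of the read and write steps of two threads. It is a host-side Python illustration, not GPU code, and the helper names `run_schedule` and `all_schedules` are made up for this example.

```python
from itertools import permutations


def run_schedule(schedule, start=0):
    """Replay one interleaving of (operation, thread_id) steps."""
    counter = start
    local = {}  # each thread's private copy of the counter
    for op, tid in schedule:
        if op == "read":
            local[tid] = counter
        else:  # "write": store the (possibly stale) local copy plus one
            counter = local[tid] + 1
    return counter


def all_schedules(num_threads):
    """Yield every ordering in which each thread reads before it writes."""
    steps = [(op, tid) for tid in range(num_threads) for op in ("read", "write")]
    for order in permutations(steps):
        if all(order.index(("read", t)) < order.index(("write", t))
               for t in range(num_threads)):
            yield order


# Count how often each final counter value occurs across all interleavings.
outcomes = {}
for schedule in all_schedules(2):
    result = run_schedule(schedule)
    outcomes[result] = outcomes.get(result, 0) + 1

print(sorted(outcomes.items()))  # [(1, 4), (2, 2)]
```

Only 2 of the 6 interleavings give the correct total of 2. In the other 4 the second read happens before the first write, so one increment is lost.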

🛡️ Three ways to fix race conditions:

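  • Atomic operations: atomicAdd(), atomicMax(), etc. The hardware performs the whole read-modify-write as one indivisible operation
  • 🔄 Parallel reduction: restructure the computation so that no two threads write to the same location at the same time
  • 🔒 Mutex / lock: only one thread proceeds at a time, which serializes the work (usually the slowest option)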
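Two of these fixes can be sketched on the CPU in Python. Python has no `atomicAdd`, so this example assumes a `threading.Lock` as a stand-in for the mutex approach, and a sequential loop stands in for the threads of a reduction. The function names `locked_count` and `tree_reduce` are hypothetical, chosen only for this sketch.

```python
import threading

NUM_THREADS = 4


def locked_count(increments_per_thread=1000):
    """Mutex fix: the lock makes each read-modify-write indivisible."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments_per_thread):
            with lock:  # only one thread at a time may update counter
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


def tree_reduce(partials):
    """Reduction fix: pairwise sums, so no two workers write the same slot."""
    values = list(partials)
    stride = 1
    while stride < len(values):
        # In a CUDA kernel each index i would be handled by its own thread,
        # with a barrier such as __syncthreads() between strides.
        for i in range(0, len(values) - stride, 2 * stride):
            values[i] += values[i + stride]
        stride *= 2
    return values[0]


locked_total = locked_count()
reduced_total = tree_reduce([1] * NUM_THREADS)  # each thread contributes 1
print(locked_total, reduced_total)  # 4000 4
```

The lock version is correct but forces the threads to take turns. The reduction gives every worker its own slot and combines slots in log2(n) rounds, which is why it tends to scale better on a GPU.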
📋 Instructions
Simulate a race condition counter. Run 4 threads without synchronization vs. with the correct sequential approach and show the difference:
```
=== Race Condition Demo ===
--- WITHOUT atomic (simulated race) ---
Thread 0 reads counter=0, writes 1
Thread 1 reads counter=0, writes 1
Thread 2 reads counter=0, writes 1
Thread 3 reads counter=0, writes 1
Final counter (WRONG): 1
--- WITH atomic (correct) ---
counter after 4 increments: 4
```
The code is already written! Run it to see how a race condition causes wrong results. The key insight: in the worst-case interleaving, all threads read the original value before any thread writes back, so they all compute the same new value and all but one increment is lost.
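For reference, here is a minimal sketch of a simulation that produces output in that shape. It is not the exercise's actual code. The function names `simulate_race` and `simulate_atomic` are assumptions, and the "threads" are replayed sequentially rather than run in parallel.

```python
NUM_THREADS = 4


def simulate_race(num_threads=NUM_THREADS):
    """Worst case: every thread reads before any thread writes."""
    counter = 0
    reads = [counter for _ in range(num_threads)]  # all see the old value
    lines = []
    for tid, seen in enumerate(reads):
        counter = seen + 1  # write back a value computed from a stale read
        lines.append(f"Thread {tid} reads counter={seen}, writes {counter}")
    return counter, lines


def simulate_atomic(num_threads=NUM_THREADS):
    """Each read-modify-write completes before the next one starts."""
    counter = 0
    for _ in range(num_threads):
        counter += 1
    return counter


def main():
    print("=== Race Condition Demo ===")
    print("--- WITHOUT atomic (simulated race) ---")
    racy, lines = simulate_race()
    for line in lines:
        print(line)
    print(f"Final counter (WRONG): {racy}")
    print("--- WITH atomic (correct) ---")
    print(f"counter after {NUM_THREADS} increments: {simulate_atomic()}")


main()
```

The racy version takes all of its reads up front, which is what forces every write to be based on the same stale value.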