🎉 Chapter 2: Your First CUDA Kernel — The Day You Command the GPU!
Every programmer's journey starts with Hello World. Yours starts with Hello, GPU! — but instead of one "Hello", you'll print it from multiple parallel threads simultaneously. Welcome to the parallel universe!
💡 Story: You are the General. You speak the magic words <<<1, 5>>> — and 5 GPU soldiers wake up and ALL shout 'Hello!' at the exact same moment. That's your first CUDA kernel.
📤 Expected output (order may vary — they're PARALLEL!):
The 3 magic lines explained:
__global__: Declares a function (a kernel) that runs on the GPU and is called from the CPU.
<<<1, 5>>>: The launch configuration, here 1 block with 5 threads per block.
cudaDeviceSynchronize(): Makes the CPU wait until all GPU threads are done. Without it, the program can exit before the GPU's printf output appears.
⚠️ Important: Notice that in the output, the threads might not print in the order 0, 1, 2, 3, 4. They run in parallel, and CUDA does not guarantee the order in which their output appears. This is the nature of parallel computing!
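To see the three pieces working together, here is a minimal sketch of the kind of program the story describes. The kernel name helloKernel and the message text are assumptions made for illustration, not necessarily the chapter's original listing.

```cuda
#include <cstdio>

// Hypothetical kernel name and message, chosen for this sketch.
// __global__ marks a function that runs on the GPU and is launched from the CPU.
__global__ void helloKernel() {
    // threadIdx.x is this thread's index within its block (0 to 4 here).
    printf("Hello from GPU thread %d!\n", threadIdx.x);
}

int main() {
    // Launch configuration: 1 block, 5 threads per block.
    helloKernel<<<1, 5>>>();

    // The launch returns immediately, so wait here until the GPU is done.
    cudaDeviceSynchronize();
    return 0;
}
```

Saved as a .cu file (for example hello.cu, a name assumed here), this should compile with nvcc hello.cu -o hello. Each of the 5 threads prints one line, and the order of those lines is not guaranteed.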
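#include <stdio.h>

// Kernel: runs on the GPU, once per launched thread.
__global__ void reportKernel() {
    // threadIdx.x is this thread's index within its block (0 to 7 here).
    printf("Thread %d reporting for duty!\n", threadIdx.x);
}

int main() {
    reportKernel<<<1, 8>>>();   // 1 block of 8 threads
    cudaDeviceSynchronize();    // wait for the GPU to finish and flush its printf output
    printf("All 8 threads reported!\n");
    return 0;
}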
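A kernel launch does not report failures on its own, so the final message can print even if no thread ever ran. The sketch below is one possible way to check for that, using the CUDA runtime calls cudaGetLastError() and cudaGetErrorString(). The helper checkCuda is a hypothetical name introduced for this example and is not part of the CUDA API.

```cuda
#include <cstdio>
#include <cstdlib>

// Hypothetical helper (not a CUDA API function): print the error and exit on failure.
static void checkCuda(cudaError_t status, const char *what) {
    if (status != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(status));
        exit(EXIT_FAILURE);
    }
}

__global__ void reportKernel() {
    printf("Thread %d reporting for duty!\n", threadIdx.x);
}

int main() {
    reportKernel<<<1, 8>>>();

    // Launch problems (such as an invalid configuration) show up in cudaGetLastError().
    checkCuda(cudaGetLastError(), "kernel launch");

    // Errors raised while the kernel runs surface when the CPU synchronizes.
    checkCuda(cudaDeviceSynchronize(), "cudaDeviceSynchronize");

    printf("All 8 threads reported!\n");
    return 0;
}
```

With these checks in place, the closing message is only printed after both the launch and the synchronization have reported success.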