Hello, GPU!

🎉 Chapter 2: Your First CUDA Kernel — The Day You Command the GPU!

Every programmer's journey starts with Hello World. Yours starts with Hello, GPU! Instead of printing one "Hello", you'll print it from several GPU threads running in parallel. Welcome to the parallel universe!

💡 Story: You are the General. You speak the magic words <<<1, 5>>>, and 5 GPU soldiers wake up and all shout 'Hello!' in parallel. That's your first CUDA kernel.

#include <stdio.h>

// Step 1: Define the kernel (GPU function)
// __global__ = "This runs on the GPU!"
__global__ void helloKernel() {
    // Each thread runs this code
    // threadIdx.x tells each thread its ID (0, 1, 2, ...)
    printf("Hello from GPU Thread %d!\n", threadIdx.x);
}

int main() {
    // Step 2: Launch the kernel
    // <<<1, 5>>> means: 1 block, 5 threads
    helloKernel<<<1, 5>>>();
    
    // Step 3: Wait for GPU to finish before CPU continues
    cudaDeviceSynchronize();
    
    printf("Hello from the CPU!\n");
    return 0;
}

📤 Expected output (the order of the GPU lines is not guaranteed; the CPU line always comes last):

Hello from GPU Thread 0!
Hello from GPU Thread 1!
Hello from GPU Thread 2!
Hello from GPU Thread 3!
Hello from GPU Thread 4!
Hello from the CPU!

The 3 magic lines explained:

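🔑 __global__: Declares a kernel, a function that runs on the GPU and is launched from CPU code
🚀 <<<1, 5>>>: Launch configuration: 1 block, 5 threads per block
⏳ cudaDeviceSynchronize(): Makes the CPU wait until all GPU threads are done. It also flushes the GPU's printf output, so without it the program could exit before any GPU line appears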
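
A kernel launch returns nothing, so a failed launch can pass silently. Here is a minimal sketch of the same program with basic error checking. The helper name `checkCuda` is hypothetical and not part of the lesson. The CUDA runtime calls it relies on (`cudaGetLastError`, `cudaGetErrorString`, and the return value of `cudaDeviceSynchronize`) are standard.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the lesson): reports a CUDA error, if any.
// Returns true when the call succeeded.
static bool checkCuda(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

__global__ void helloKernel() {
    printf("Hello from GPU Thread %d!\n", threadIdx.x);
}

int main() {
    helloKernel<<<1, 5>>>();

    // A kernel launch returns nothing, so launch problems are read back
    // afterwards with cudaGetLastError().
    if (!checkCuda(cudaGetLastError(), "kernel launch")) return 1;

    // cudaDeviceSynchronize() blocks the CPU until the GPU is done and
    // returns an error code if something went wrong while the kernel ran.
    if (!checkCuda(cudaDeviceSynchronize(), "cudaDeviceSynchronize")) return 1;

    printf("Hello from the CPU!\n");
    return 0;
}
```

If the program prints nothing at all from the GPU, a check like this is usually the quickest way to find out whether the launch itself failed.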

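⚠️ Important: The GPU lines in the output above might not appear in the order 0, 1, 2, 3, 4. The threads run in parallel, and CUDA makes no promise about which thread's output appears first. A launch this small often does print in order, but never write code that depends on it. This is the nature of parallel computing!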
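
The ordering note above is easier to see with more than one block. What follows is a minimal sketch, assuming a two-block launch. The names `whoAmI` and `totalThreads` and the `<<<2, 3>>>` configuration are illustrative and not part of the lesson. `blockIdx.x` is CUDA's built-in block index, which this lesson has not used so far.

```cuda
#include <stdio.h>

// Host-side helper for illustration: total threads in a 1D launch.
static int totalThreads(int blocks, int threadsPerBlock) {
    return blocks * threadsPerBlock;
}

__global__ void whoAmI() {
    // blockIdx.x: which block this thread belongs to
    // threadIdx.x: the thread's index inside that block
    printf("Block %d, Thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    const int blocks = 2;
    const int threadsPerBlock = 3;

    whoAmI<<<blocks, threadsPerBlock>>>();
    cudaDeviceSynchronize();

    // Printed after the GPU lines because of the synchronize call above.
    printf("CPU: launched %d threads\n", totalThreads(blocks, threadsPerBlock));
    return 0;
}
```

Run it a few times: the lines from block 0 and block 1 may swap places between runs, while the CPU line stays at the end.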

📋 Instructions

Write a CUDA program where each GPU thread introduces itself with its thread number. Launch the kernel with **1 block and 8 threads**. Each thread should print:

```
Thread X reporting for duty!
```

where X is the thread's ID (0 through 7). After all threads finish, the CPU prints:

```
All 8 threads reported!
```

Inside the kernel, call printf("Thread %d reporting for duty!\n", threadIdx.x); where threadIdx.x gives each thread its ID within the block (unique here, because there is only one block). Launch with reportKernel<<<1, 8>>>(); and then call cudaDeviceSynchronize();

⚠️ Try solving it yourself first — you'll learn more!

#include <stdio.h>

__global__ void reportKernel() {
    printf("Thread %d reporting for duty!\n", threadIdx.x);
}

int main() {
    reportKernel<<<1, 8>>>();
    cudaDeviceSynchronize();
    printf("All 8 threads reported!\n");
    return 0;
}
