Exercise 6: Hello, GPU! (Easy)


🎉 Chapter 2: Your First CUDA Kernel — The Day You Command the GPU!

Every programmer's journey starts with Hello World. Yours starts with Hello, GPU! Instead of a single "Hello", you'll print one from each of several GPU threads running in parallel. Welcome to the parallel universe!

💡 Story: You are the General. You speak the magic words <<<1, 5>>>, and 5 GPU soldiers wake up and each shouts 'Hello!' in parallel. That's your first CUDA kernel.

#include <stdio.h>

// Step 1: Define the kernel (GPU function)
// __global__ = "This runs on the GPU!"
__global__ void helloKernel() {
    // Each thread runs this code
    // threadIdx.x tells each thread its ID (0, 1, 2, ...)
    printf("Hello from GPU Thread %d!\n", threadIdx.x);
}

int main() {
    // Step 2: Launch the kernel
    // <<<1, 5>>> means: 1 block, 5 threads
    helloKernel<<<1, 5>>>();

    // Step 3: Wait for GPU to finish before CPU continues
    cudaDeviceSynchronize();

    printf("Hello from the CPU!\n");
    return 0;
}

📤 Expected output (order may vary — they're PARALLEL!):

Hello from GPU Thread 0!
Hello from GPU Thread 1!
Hello from GPU Thread 2!
Hello from GPU Thread 3!
Hello from GPU Thread 4!
Hello from the CPU!

The 3 magic lines explained:

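  • 🔑 __global__: Marks the function as a kernel, which runs on the GPU and is launched from the CPU
  • 🚀 <<<1, 5>>>: The launch configuration, here 1 block of 5 threads
  • ⏳ cudaDeviceSynchronize(): Makes the CPU WAIT until all GPU threads are done. Kernel launches are asynchronous, so without this call main() could return before the GPU finishes and its printf output is flushed.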
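A kernel launch returns nothing, and the cudaDeviceSynchronize() call in the example above quietly discards the cudaError_t it returns. Below is a minimal sketch of checking both, assuming a hypothetical helper named launchHello (not part of the CUDA API) that wraps the launch and returns the first error it sees:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void helloKernel() {
    printf("Hello from GPU Thread %d!\n", threadIdx.x);
}

// Hypothetical helper: launch helloKernel with `threads` threads in one block
// and return the first error reported, or cudaSuccess.
cudaError_t launchHello(int threads) {
    helloKernel<<<1, threads>>>();
    // The launch itself returns nothing; a bad configuration (for example,
    // more threads per block than the GPU allows) is reported here.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        return err;
    }
    // Errors raised while the kernel runs surface once the CPU waits for it.
    return cudaDeviceSynchronize();
}

int main() {
    cudaError_t err = launchHello(5);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Hello from the CPU!\n");
    return 0;
}
```

Calling launchHello(2048) instead should fail with an invalid-configuration error, because current GPUs allow at most 1024 threads per block.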

⚠️ Important: When you run it, the threads might not print in the order 0, 1, 2, 3, 4. They run in parallel, and CUDA makes no guarantee about which thread executes first, so never write code that depends on thread order. This is the nature of parallel computing!
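Because the order isn't guaranteed, a common way to get results in a fixed order is to have each thread write to its own slot in memory and let the CPU print them afterwards. A minimal sketch, assuming a hypothetical kernel recordIds and helper collectIds (names chosen for illustration) and a single block of at most 1024 threads:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 5

// Each thread writes its own ID into its own slot, so no two threads
// touch the same memory and the execution order no longer matters.
__global__ void recordIds(int *ids) {
    ids[threadIdx.x] = threadIdx.x;
}

// Hypothetical helper: run recordIds with n threads in one block and copy
// the results into host_ids. Returns the first CUDA error, or cudaSuccess.
cudaError_t collectIds(int *host_ids, int n) {
    int *dev_ids = NULL;
    cudaError_t err = cudaMalloc(&dev_ids, n * sizeof(int));
    if (err != cudaSuccess) return err;
    recordIds<<<1, n>>>(dev_ids);
    err = cudaGetLastError();
    if (err == cudaSuccess) {
        // cudaMemcpy on the default stream waits for the kernel to finish first.
        err = cudaMemcpy(host_ids, dev_ids, n * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_ids);
    return err;
}

int main() {
    int ids[N];
    if (collectIds(ids, N) != cudaSuccess) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    // The CPU prints sequentially, so the order is always 0, 1, 2, ...
    for (int i = 0; i < N; i++) {
        printf("Hello from GPU Thread %d!\n", ids[i]);
    }
    return 0;
}
```

The sketch uses a single block, so threadIdx.x alone is a valid array index.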

📋 Instructions
Write a CUDA program where each GPU thread introduces itself with its thread number. Launch the kernel with **1 block and 8 threads**. Each thread should print:
```
Thread X reporting for duty!
```
where X is the thread's ID (0 through 7). After all threads finish, the CPU prints:
```
All 8 threads reported!
```
💡 Hint: Inside the kernel, call printf("Thread %d reporting for duty!\n", threadIdx.x); where threadIdx.x gives each thread its ID within the block. Launch with reportKernel<<<1, 8>>>(); and then call cudaDeviceSynchronize();
⚠️ Try solving it yourself first — you'll learn more!
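#include <stdio.h>

// Runs on the GPU: every thread prints its own ID
__global__ void reportKernel() {
    printf("Thread %d reporting for duty!\n", threadIdx.x);
}

int main() {
    reportKernel<<<1, 8>>>();   // 1 block of 8 threads
    cudaDeviceSynchronize();    // wait for all 8 threads to finish
    printf("All 8 threads reported!\n");
    return 0;
}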
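The solution above prints "All 8 threads reported!" on trust. As an optional extension, here is a minimal sketch of having the CPU actually count the reports, assuming a hypothetical kernel reportAndCountKernel and helper countReports that share a device counter incremented with atomicAdd:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Each thread prints its line, then atomically increments a shared counter.
// atomicAdd makes concurrent increments safe; a plain (*count)++ could lose updates.
__global__ void reportAndCountKernel(int *count) {
    printf("Thread %d reporting for duty!\n", threadIdx.x);
    atomicAdd(count, 1);
}

// Hypothetical helper: launch n threads in one block and return how many
// reported, or -1 if a CUDA call failed.
int countReports(int n) {
    int *dev_count = NULL;
    int host_count = -1;
    if (cudaMalloc(&dev_count, sizeof(int)) != cudaSuccess) return -1;
    // Start the counter at zero before any thread increments it.
    if (cudaMemset(dev_count, 0, sizeof(int)) != cudaSuccess) {
        cudaFree(dev_count);
        return -1;
    }
    reportAndCountKernel<<<1, n>>>(dev_count);
    // cudaMemcpy waits for the kernel before copying the final count back.
    if (cudaGetLastError() != cudaSuccess ||
        cudaMemcpy(&host_count, dev_count, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        host_count = -1;
    }
    cudaFree(dev_count);
    return host_count;
}

int main() {
    int reported = countReports(8);
    if (reported < 0) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("All %d threads reported!\n", reported);
    return 0;
}
```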