CUDA Programming Your First CUDA Kernel
💡
Exercise 7

Kernel Syntax 15 XP Easy

Ctrl+Enter Run Ctrl+S Save

⚙️ Chapter 2, Part 2: The Three CUDA Qualifiers

💡 Story: Think of your CUDA code as a two-army operation. The CPU army is at base camp (Host). The GPU army is deployed on the battlefield (Device). Soldiers can only be given orders that match where they're stationed!

CUDA has 3 function qualifiers that tell the compiler WHERE a function runs and WHO can call it:

// 1. __global__ — Runs on GPU, called from CPU (or from other __global__ in new CUDA) // This is what you use for kernels! __global__ void myKernel(int* data, int n) { int i = threadIdx.x; if (i < n) data[i] *= 2; } // 2. __device__ — Runs on GPU, called from GPU // Helper functions used inside kernels __device__ int square(int x) { return x * x; } // 3. __host__ — Runs on CPU, called from CPU (default — same as normal C) __host__ void prepareData() { // This is just regular CPU code } // You can combine __host__ and __device__ for functions that work on BOTH: __host__ __device__ float clamp(float val, float lo, float hi) { return val < lo ? lo : (val > hi ? hi : val); }

Function Call Rules:

  • ✅ CPU can call __global__ (with <<<>>> syntax)
  • ✅ CPU can call __host__ (normal function call)
  • ✅ GPU can call __device__
  • ✅ GPU kernel can call __device__ functions
  • ❌ CPU CANNOT call __device__ directly
  • ❌ GPU CANNOT call __host__ directly
// Complete example: CPU calls kernel, kernel calls __device__ helper #include <stdio.h> __device__ int doubled(int x) { // GPU helper function return x * 2; } __global__ void processKernel(int* arr, int n) { // GPU kernel int i = threadIdx.x; if (i < n) { arr[i] = doubled(arr[i]); // Kernel calls __device__ function } } int main() { // CPU code // ... allocate memory, copy data ... processKernel<<<1, 5>>>(/* args */); // CPU calls kernel cudaDeviceSynchronize(); return 0; }

⚠️ Return types: __global__ kernels MUST return void. They can't return values — instead, they write results into memory that the CPU reads back.

📋 Instructions
Fill in the CUDA qualifiers to make this program work correctly. Print a table of the qualifier rules: ``` === CUDA Function Qualifiers === __global__ : Runs on GPU, called from CPU __device__ : Runs on GPU, called from GPU __host__ : Runs on CPU, called from CPU Return type of __global__: void Can CPU call __device__? NO Can GPU call __host__? NO ```
Just add printf() for each remaining rule. The pattern is printf("qualifier : description\n");
main.py
Hi! I'm Rex 👋
Output
Ready. Press ▶ Run or Ctrl+Enter.