💾 Chapter 4: GPU Memory — The Treasure Map
💡 Story: Your GPU has different types of memory — like a kingdom with different storage systems! There's the Capital's main warehouse (Global Memory), a neighborhood pantry (Shared Memory), individual soldiers' pockets (Registers), and a royal broadcast (Constant Memory). Using the right storage at the right time is the difference between a fast GPU and a slow one!
🌐 Global Memory — The Main Warehouse
```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void doubleArray(int* d_arr, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) {
        d_arr[i] *= 2;  // Reading & writing global memory
    }
}

int main() {
    int n = 5;
    int h_arr[] = {1, 2, 3, 4, 5};  // Host (CPU) array
    int* d_arr;                     // Device (GPU) pointer

    // Step 1: Allocate GPU memory
    cudaMalloc(&d_arr, n * sizeof(int));  // Like malloc() but on GPU

    // Step 2: Copy CPU → GPU
    cudaMemcpy(d_arr, h_arr, n * sizeof(int), cudaMemcpyHostToDevice);

    // Step 3: Launch kernel
    doubleArray<<<1, n>>>(d_arr, n);
    cudaDeviceSynchronize();

    // Step 4: Copy GPU → CPU
    cudaMemcpy(h_arr, d_arr, n * sizeof(int), cudaMemcpyDeviceToHost);

    // Step 5: Free GPU memory
    cudaFree(d_arr);

    // Print results
    for (int i = 0; i < n; i++) printf("%d ", h_arr[i]);
    printf("\n");
    // Output: 2 4 6 8 10
    return 0;
}
```
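📋 The CUDA Memory Workflow (ALWAYS in this order):
1. Allocate: cudaMalloc reserves memory on the device (GPU)
2. Upload: cudaMemcpy with cudaMemcpyHostToDevice copies CPU → GPU
3. Process: launch the kernel, which reads and writes the device copy
4. Download: cudaMemcpy with cudaMemcpyDeviceToHost copies GPU → CPU
5. Free: cudaFree releases the device memory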
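⚡ Performance tip: Global memory accesses are expensive! An access that misses the caches and goes all the way out to device DRAM can cost several hundred clock cycles (600-800 is a commonly quoted range; the exact figure depends on the GPU). This is why shared memory (next exercise) is so valuable: it sits on-chip, and its latency is often quoted as roughly 100× lower than an uncached global memory access.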
📋 Instructions
The program for this exercise simulates the full CUDA memory workflow. Since it runs in a regular C environment with no GPU, it uses host arrays in place of device memory and prints a message at each step:
```
=== CUDA Global Memory Workflow ===
Step 1: Allocate GPU memory (5 ints)
Step 2: Copy to GPU: 1 2 3 4 5
Step 3: Kernel runs: doubles each element
Step 4: Copy from GPU: 2 4 6 8 10
Step 5: GPU memory freed
Done!
```
This program is already complete, so there is nothing to write! Read through the code structure, which follows the exact CUDA workflow: Allocate → Upload → Process → Download → Free. Then run it and check that the output matches the listing above.