- Is the GPU faster at addition or multiplication?
- This puzzle presents three different ways to add the elements of a tensor. Can you figure out the fastest implementation?
- The order of operations matters on the GPU. Can you find the faster ordering?
- When is matrix multiplication compute-bound, and when is it memory-bandwidth-bound, on a GPU?
- What is the optimal way to perform a matrix transpose on a GPU?
- Can GPUs communicate and compute at the same time?
- Can the arithmetic intensity of a program be increased?
- Data can be transmitted in many ways, but can you find the most efficient one?