c++ - Is it possible to speed up matrix multiplication with texture memory?
I am learning CUDA.
Would it be possible to speed up a simple matrix multiplication with texture memory? Spatial locality is a nice property in addition to tiling, but would the overhead of using texture memory outweigh it?
I can't seem to find any implementations of matrix multiplication that use texture memory.
Matrix multiply can be implemented in a variety of ways.
Compared to a naive implementation of matrix multiply that uses only global memory, yes, it's possible to speed it up using texture memory.
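To make the naive case concrete, here is a minimal sketch of a one-thread-per-output-element kernel that reads both inputs through texture objects instead of plain global loads. It assumes square, row-major `n x n` float matrices that fit within the 1D linear texture size limit. The names `matmulTex`, `makeLinearTexture` and `matmulTexture` are made up for illustration, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <cstring>
#include <vector>

// Naive kernel: one thread per element of C, inputs fetched through the texture path.
__global__ void matmulTex(cudaTextureObject_t texA, cudaTextureObject_t texB,
                          float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n || col >= n) return;

    float acc = 0.0f;
    for (int k = 0; k < n; ++k) {
        // tex1Dfetch reads element i of a linear buffer bound to a texture object.
        acc += tex1Dfetch<float>(texA, row * n + k) *
               tex1Dfetch<float>(texB, k * n + col);
    }
    C[row * n + col] = acc;
}

// Hypothetical helper: wrap a linear device buffer in a texture object.
static cudaTextureObject_t makeLinearTexture(float* devPtr, size_t bytes) {
    cudaResourceDesc res;
    std::memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = devPtr;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = bytes;

    cudaTextureDesc desc;
    std::memset(&desc, 0, sizeof(desc));
    desc.readMode = cudaReadModeElementType;  // return raw floats, no normalization

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &desc, nullptr);
    return tex;
}

// Host wrapper (illustrative): computes C = A * B for row-major n x n matrices.
std::vector<float> matmulTexture(const std::vector<float>& A,
                                 const std::vector<float>& B, int n) {
    size_t bytes = sizeof(float) * n * n;
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc((void**)&dA, bytes);
    cudaMalloc((void**)&dB, bytes);
    cudaMalloc((void**)&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    cudaTextureObject_t texA = makeLinearTexture(dA, bytes);
    cudaTextureObject_t texB = makeLinearTexture(dB, bytes);

    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    matmulTex<<<grid, block>>>(texA, texB, dC, n);

    std::vector<float> C(static_cast<size_t>(n) * n);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);  // also waits for the kernel

    cudaDestroyTextureObject(texA);
    cudaDestroyTextureObject(texB);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Whether this actually beats plain global loads depends on the GPU generation, so it is worth timing both variants on the target hardware.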
Compared to a better-written version of matrix multiply that uses shared memory, it's not likely that texture memory will give much or any benefit.
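For reference, the shared-memory version being compared against usually looks something like the sketch below: each block stages a tile of A and a tile of B in shared memory, so every global element is loaded once per tile rather than once per multiply. This assumes square, row-major `n x n` float matrices. The tile size of 16 and the names `matmulTiled` and `matmulShared` are arbitrary choices for illustration, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 16;  // tile edge length; an illustrative choice, not a tuned value

// Tiled kernel: each block computes one TILE x TILE tile of C.
__global__ void matmulTiled(const float* A, const float* B, float* C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    int numTiles = (n + TILE - 1) / TILE;
    for (int t = 0; t < numTiles; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;

        // Stage one tile of A and B; pad with zeros past the matrix edge.
        As[threadIdx.y][threadIdx.x] = (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();  // tile fully loaded before anyone reads it

        for (int k = 0; k < TILE; ++k) {
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        }
        __syncthreads();  // everyone done reading before the next tile overwrites it
    }

    if (row < n && col < n) {
        C[row * n + col] = acc;
    }
}

// Host wrapper (illustrative): computes C = A * B for row-major n x n matrices.
std::vector<float> matmulShared(const std::vector<float>& A,
                                const std::vector<float>& B, int n) {
    size_t bytes = sizeof(float) * n * n;
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc((void**)&dA, bytes);
    cudaMalloc((void**)&dB, bytes);
    cudaMalloc((void**)&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    matmulTiled<<<grid, block>>>(dA, dB, dC, n);

    std::vector<float> C(static_cast<size_t>(n) * n);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);  // also waits for the kernel

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Because the tiles already capture the data reuse explicitly, there is little locality left over for a texture cache to exploit.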
If you want the best performance for a CUDA matrix multiply, you should use cuBLAS. Don't write your own matrix multiply code.
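A minimal sketch of the cuBLAS route, assuming square, row-major `n x n` float matrices and linking against `-lcublas`. cuBLAS expects column-major storage, so the operands are passed in swapped order: computing B^T * A^T in column-major terms leaves A * B in row-major order in the output buffer. The wrapper name `matmulCublas` is made up for illustration, and error checking is omitted.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

// Host wrapper (illustrative): computes C = A * B for row-major n x n matrices.
std::vector<float> matmulCublas(const std::vector<float>& A,
                                const std::vector<float>& B, int n) {
    size_t bytes = sizeof(float) * n * n;
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc((void**)&dA, bytes);
    cudaMalloc((void**)&dB, bytes);
    cudaMalloc((void**)&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f;
    const float beta = 0.0f;
    // Row-major trick: a row-major matrix is its own transpose when read as
    // column-major, so passing (B, A) yields (A * B)^T in column-major,
    // which is exactly A * B in row-major.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n,
                &alpha,
                dB, n,
                dA, n,
                &beta,
                dC, n);

    std::vector<float> C(static_cast<size_t>(n) * n);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);  // also waits for the GEMM

    cublasDestroy(handle);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

In real code the handle would normally be created once and reused across calls, since creating it is relatively expensive.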