CUDA program running slower on Tesla K20 than GTX 965
I'm doing a project where I have to compare the performance of various GPU cards.
I ran the same CUDA code (Canny edge detection) on both GPUs and found the GTX 965 faster (200%) than the Tesla K20. I also observed that the Tesla C2075 runs the same code about as fast as the Tesla K20.
As far as I know, the K20 has 2496 cores, the 965 has 1024 cores and the C2075 has 448 cores. The K20 is based on NVIDIA's Kepler architecture, the C2075 on the older Fermi architecture, and the 965 on Maxwell.
What am I doing wrong, or is there a difference in the hardware that causes this?
Also, can I check the power consumed by the graphics card, either with a program or through theoretical calculations?
Having many cores does not necessarily mean shorter execution times. If your CUDA app is utilizing only a single thread and you run it on:
- the K20, which has lots of cores running at 706 MHz,
- as opposed to the GTX 965, which has less than half as many of them running at 944 MHz,
... the GTX 965 can work faster. In theory, as long as your app utilizes fewer than 1024 cores, the GTX can outperform the K20, provided memory is not the bottleneck, since the K20 has:
- higher memory bandwidth,
- much more memory in general,
- a slightly higher memory clock.
So, to sum up, it is quite easy to "tailor" a CUDA app to suit one GPU better than others by taking hardware limitations into account. Take into consideration even simple things such as kernel launch parameters, i.e. grid size and block size.
Also, the same goes for the C2075: according to its spec, its shader (processor) clock is 1.15 GHz, higher than that of both the K20 and the GTX 965.