Full-text preview

Deep Learning for Big Data Analytics — Chinese-English Translation of a Foreign-Language Paper

Uploader: 读书之乐 | Format: doc | Pages: 18 | Size: 26 KB

Document introduction
… Finally, the GPU approach presented here is far from optimized. The first concern is how much memory is sent from the CPU to the GPU. GPUs have limited memory, and a worker may be able to store more data in main memory than fits on the device. A common approach in this scenario is to split the GPU memory into chunks. While the GPU is processing one chunk, the CUDA driver asynchronously sends the remaining chunks. By synchronizing data copies with kernel executions, it is possible to process more data than actually fits in GPU memory. The overhead of this process can be lower than expected, since copies can be performed in parallel with computations that do not target the same data. The organization of work-groups and work-items can also be improved: our solution left many GPU cores idle while trying to maximize cache …
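The chunked transfer pattern described above is commonly implemented with two alternating device buffers and two CUDA streams, so that the copy of one chunk overlaps the kernel working on the previous one. The sketch below is illustrative only: the kernel `process_chunk`, the buffer count, and the chunk size are assumptions, not details from the original paper.

```cuda
// Hedged sketch of double-buffered chunked processing with CUDA streams.
// Assumes `host` points to pinned memory (cudaMallocHost); without pinning,
// cudaMemcpyAsync falls back to synchronous behavior and nothing overlaps.
#include <cuda_runtime.h>

__global__ void process_chunk(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;          // placeholder computation
}

void run_chunked(const float *host, float *out, size_t total, size_t chunk) {
    float *dev[2];
    cudaStream_t stream[2];
    for (int b = 0; b < 2; ++b) {     // two buffers: copy one, compute the other
        cudaMalloc(&dev[b], chunk * sizeof(float));
        cudaStreamCreate(&stream[b]);
    }
    for (size_t off = 0, c = 0; off < total; off += chunk, ++c) {
        int b = c & 1;                // alternate between the two buffers
        size_t n = (off + chunk <= total) ? chunk : total - off;
        // This copy runs while the kernel in the *other* stream is still
        // processing the previous chunk.
        cudaMemcpyAsync(dev[b], host + off, n * sizeof(float),
                        cudaMemcpyHostToDevice, stream[b]);
        process_chunk<<<(int)((n + 255) / 256), 256, 0, stream[b]>>>(dev[b], (int)n);
        cudaMemcpyAsync(out + off, dev[b], n * sizeof(float),
                        cudaMemcpyDeviceToHost, stream[b]);
    }
    for (int b = 0; b < 2; ++b) {
        cudaStreamSynchronize(stream[b]);
        cudaStreamDestroy(stream[b]);
        cudaFree(dev[b]);
    }
}
```

Because each chunk's copy-in, kernel, and copy-out are enqueued on the same stream, ordering within a chunk is preserved automatically, while work on different streams may overlap on hardware with a dedicated copy engine.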
