Tensorflow

[Code Analysis] TensorFlow Session Execution

December 23, 2022

6 minutes

Tensorflow GPU Antman

The TensorFlow kernel launch process

This post walks through how a session executes and how Antman modifies that execution path.

Call chain: Run() -> RunInternal() -> RunAsync() -> ScheduleReady() -> Process()
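The tail of this chain, ScheduleReady() feeding Process(), follows a ready-queue pattern: a node runs once all of its inputs have finished, and finishing a node may make its successors ready. The sketch below illustrates that pattern only. `Node`, `pending`, and `RunGraph` are hypothetical names, and the real executor is asynchronous and multi-threaded, which this single-threaded loop does not attempt to model.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical graph node: `pending` counts inputs that have not finished yet.
struct Node {
  std::vector<int> successors;
  int pending;
};

// Runs every node in dependency order and returns the order in which they ran.
std::vector<int> RunGraph(std::vector<Node> graph) {
  std::deque<int> ready;   // nodes whose inputs are all available
  std::vector<int> order;  // stand-in for "kernel launched"

  // Seed the queue with nodes that have no inputs.
  for (std::size_t i = 0; i < graph.size(); ++i) {
    if (graph[i].pending == 0) ready.push_back(static_cast<int>(i));
  }

  while (!ready.empty()) {
    int id = ready.front();
    ready.pop_front();
    order.push_back(id);  // "process" the node

    // Finishing this node may make its successors ready.
    for (int next : graph[id].successors) {
      assert(graph[next].pending > 0);
      if (--graph[next].pending == 0) ready.push_back(next);
    }
  }
  return order;
}
```

For a diamond-shaped graph (0 feeds 1 and 2, both of which feed 3), node 3 is queued only after both 1 and 2 have run.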

Antman modifies direct_session.cc so that a middleware framework runs before and after each session execution.
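A minimal sketch of the before/after hook idea, assuming a registry that holds actions and wraps each run. `RunAction`, `ActionRegistry`, `TraceAction`, and `DemoTrace` are hypothetical names chosen for illustration; they are not the interfaces Antman actually adds to direct_session.cc.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical hook interface: one callback before the run, one after.
class RunAction {
 public:
  virtual ~RunAction() = default;
  virtual void BeforeRun() = 0;
  virtual void AfterRun() = 0;
};

// Hypothetical registry: holds non-owning pointers to registered actions.
class ActionRegistry {
 public:
  void Register(RunAction* action) {
    assert(action != nullptr);
    actions_.push_back(action);
  }

  // Wraps one session run with every registered action.
  void Run(const std::function<void()>& session_run) {
    for (RunAction* a : actions_) a->BeforeRun();
    session_run();
    for (RunAction* a : actions_) a->AfterRun();
  }

 private:
  std::vector<RunAction*> actions_;
};

// Example action that records when it was called.
class TraceAction : public RunAction {
 public:
  explicit TraceAction(std::vector<std::string>* trace) : trace_(trace) {}
  void BeforeRun() override { trace_->push_back("before"); }
  void AfterRun() override { trace_->push_back("after"); }

 private:
  std::vector<std::string>* trace_;
};

// Registers one action, performs one fake run, and returns the call trace.
std::vector<std::string> DemoTrace() {
  std::vector<std::string> trace;
  TraceAction action(&trace);
  ActionRegistry registry;
  registry.Register(&action);
  registry.Run([&trace] { trace.push_back("run"); });
  return trace;
}
```

Keeping the hooks behind a registry means the session code only needs one call site, and new actions (memory adjustment, statistics dumping) can be added without touching it again.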

[Code Analysis] Antman's Modifications to TensorFlow

December 4, 2022

8 minutes

Tensorflow GPU Antman

Antman's code changes to TensorFlow

The overall relationship diagram covers two main implementations: GPUResourceManagement on the memory side and GpuOpManager on the compute side.

graph TD
  A>gpu_resource_manage_file]
  B[SessionRunRegistry]
  C[SessionRunAction]
  D[Executor]
  E[GPUResourceManagement]
  F[GPU Statistic]
  G[GpuOpManager]
  H[GpuUsageAdjustment]
  I(dump gpu statistic)
  J[GPU Process State]
  K[GPUVMemAllocator]
  L[GPUAdjustableAllocator]
  A -->|FileListener| E
  B -->|Register| E
  E -->|need_to_adjust_memory_| H
  H -->|new| L
  H -->|get| K
  C -->|Derive| E
  C -->|Derive| F
  B -->|Register| F
  F -->|need_to_dump_statistics_| I
  B -->|Run| C
  J -->|maybe_create_gpu_vmem_allocator| K
  D -->|run thread| G
  E -->|GetEstimatedIdleTime| G

GPUVMemAllocator

GPUVMemAllocator can allocate host memory as a fallback for GPU memory, which avoids out-of-memory (OOM) errors.
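The fallback idea can be sketched as an allocator that tracks a device budget and places a request on the host when the budget is exhausted. This is an illustration under stated assumptions, not GPUVMemAllocator's implementation: `FallbackAllocator` and `Placement` are hypothetical names, and both pools are simulated with `std::malloc` so the example runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

enum class Placement { kDevice, kHost };

// Hypothetical allocator: prefers the "device" pool, falls back to "host".
class FallbackAllocator {
 public:
  explicit FallbackAllocator(std::size_t device_capacity)
      : device_free_(device_capacity) {}
  FallbackAllocator(const FallbackAllocator&) = delete;
  FallbackAllocator& operator=(const FallbackAllocator&) = delete;
  ~FallbackAllocator() {
    for (auto& entry : blocks_) std::free(entry.first);
  }

  // Places the block on the device if it fits, otherwise on the host.
  void* Allocate(std::size_t bytes) {
    assert(bytes > 0);
    Placement where =
        bytes <= device_free_ ? Placement::kDevice : Placement::kHost;
    void* ptr = std::malloc(bytes);  // simulated backing store for both pools
    if (ptr == nullptr) return nullptr;
    if (where == Placement::kDevice) device_free_ -= bytes;
    blocks_[ptr] = Block{where, bytes};
    return ptr;
  }

  // Frees the block and returns device budget if it lived on the device.
  void Deallocate(void* ptr) {
    auto it = blocks_.find(ptr);
    if (it == blocks_.end()) return;
    if (it->second.where == Placement::kDevice) {
      device_free_ += it->second.bytes;
    }
    std::free(ptr);
    blocks_.erase(it);
  }

  Placement PlacementOf(void* ptr) const { return blocks_.at(ptr).where; }
  std::size_t device_free() const { return device_free_; }

 private:
  struct Block {
    Placement where;
    std::size_t bytes;
  };
  std::size_t device_free_;
  std::unordered_map<void*, Block> blocks_;
};
```

With a 100-byte device budget, an 80-byte request lands on the device and a following 50-byte request falls back to the host instead of failing.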