[AI Compiler] Operator Fusion with TVM

Posted Dec 6, 2025

By jouhy 2 min read

Operator Fusion

An optimization technique that merges several small operators into one larger kernel function.

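Before fusion: A + B -> Temp -> ReLU(Temp) -> C
1. The GPU reads A and B from memory
2. It performs the addition
3. It writes the result (Temp) to VRAM
4. It reads Temp back from VRAM
5. It applies ReLU and writes the result C to VRAM
After fusion: Fused_Func(A, B) -> C
1. The GPU reads A and B
2. It performs the addition and keeps the result briefly in a register (or cache)
3. It applies ReLU directly to that stored value
4. It writes the result C to VRAM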

=> Less memory I/O (Temp never makes a round trip through VRAM) and less kernel launch overhead (one launch instead of two)
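The two step lists above can be sketched in plain Python. This is only a toy model, not GPU code: each function stands in for the kernels, a local variable stands in for a register, and the `traffic` counters and function names are hypothetical, introduced here just to count the reads, writes, and launches described in the steps.

```python
def unfused_add_relu(a, b):
    """Two separate 'kernels': the add writes Temp, then ReLU reads it back."""
    traffic = {"reads": 0, "writes": 0, "launches": 0}

    # Kernel 1: Temp = A + B (Temp is written out to memory)
    traffic["launches"] += 1
    temp = []
    for x, y in zip(a, b):
        traffic["reads"] += 2      # read one element of A and one of B
        temp.append(x + y)
        traffic["writes"] += 1     # write Temp

    # Kernel 2: C = ReLU(Temp) (Temp is read back from memory)
    traffic["launches"] += 1
    c = []
    for t in temp:
        traffic["reads"] += 1      # read Temp again
        c.append(max(t, 0.0))
        traffic["writes"] += 1     # write C
    return c, traffic


def fused_add_relu(a, b):
    """One 'kernel': the sum stays in a local variable and feeds ReLU directly."""
    traffic = {"reads": 0, "writes": 0, "launches": 0}

    traffic["launches"] += 1
    c = []
    for x, y in zip(a, b):
        traffic["reads"] += 2      # read one element of A and one of B
        t = x + y                  # stand-in for a register; never written out
        c.append(max(t, 0.0))
        traffic["writes"] += 1     # write C only
    return c, traffic


a = [1.0, -2.0, 3.0]
b = [0.5, 1.0, -4.0]
c_unfused, traffic_unfused = unfused_add_relu(a, b)
c_fused, traffic_fused = fused_add_relu(a, b)
```

Both versions compute the same C, but in this model the fused version skips the write and the read of Temp for every element and launches once instead of twice.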

Injective (one-to-one mapping): element-wise operations (addition, subtraction, scale, ReLU)
-> injective operations fuse easily with one another
Reduction: operations that combine multiple elements (e.g., sum)
-> can be fused at the end of a chain of injective operators
Complex-out-fusable: complex operations (e.g., Conv2D)
-> injective operators that follow a complex-out-fusable operator can be fused onto its output
Opaque: operations that cannot be fused (e.g., sort)
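The four categories can be read as a small set of pairwise rules. The sketch below encodes only the rules listed above, under the assumption that fusion is decided one producer/consumer pair at a time. It is not TVM's actual fusion pass, and `OP_CATEGORY` and `can_fuse` are hypothetical names used for illustration.

```python
INJECTIVE = "injective"
REDUCTION = "reduction"
COMPLEX_OUT_FUSABLE = "complex-out-fusable"
OPAQUE = "opaque"

# Example operators from the list above, mapped to their category.
OP_CATEGORY = {
    "add": INJECTIVE,
    "subtract": INJECTIVE,
    "scale": INJECTIVE,
    "relu": INJECTIVE,
    "sum": REDUCTION,
    "conv2d": COMPLEX_OUT_FUSABLE,
    "sort": OPAQUE,
}


def can_fuse(producer, consumer):
    """Return True if `consumer` may be fused onto the output of `producer`."""
    p = OP_CATEGORY[producer]
    c = OP_CATEGORY[consumer]
    if OPAQUE in (p, c):
        return False               # opaque operators are never fused
    if p == INJECTIVE and c == INJECTIVE:
        return True                # element-wise chains fuse freely
    if p == INJECTIVE and c == REDUCTION:
        return True                # a reduction can end an injective chain
    if p == COMPLEX_OUT_FUSABLE and c == INJECTIVE:
        return True                # element-wise ops attach to e.g. a Conv2D output
    return False                   # anything else is left unfused in this sketch


fusable_pairs = [
    (p, c) for p in OP_CATEGORY for c in OP_CATEGORY if can_fuse(p, c)
]
```

For example, `can_fuse("conv2d", "relu")` is allowed by the third rule, while any pair that involves `sort` is rejected.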

Batch normalization (BN) formula:

y = gamma * (x - mean) / sqrt(var + eps) + beta

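=> Conv + BN: substituting the Conv output x = W * input + b into the BN formula gives a single Conv with new parameters (gamma, beta, mean, and var are applied per output channel)
New weight: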

W' = W * gamma / sqrt(var + eps)

New bias:

b' = beta + (b - mean) * gamma / sqrt(var + eps)
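A quick numerical check of the two formulas above, as a minimal sketch. It assumes the convolution can be modeled at a single pixel as a 1x1 Conv, i.e. one dot product per output channel, which is enough because the folding only relies on the Conv being linear. All function names and sample values here are made up for illustration.

```python
import math


def conv1x1(x, W, b):
    """1x1 Conv at one pixel: one dot product plus bias per output channel."""
    return [sum(w * xi for w, xi in zip(row, x)) + bo for row, bo in zip(W, b)]


def batch_norm(y, gamma, beta, mean, var, eps):
    """y = gamma * (x - mean) / sqrt(var + eps) + beta, per channel."""
    return [
        g * (v - m) / math.sqrt(s + eps) + bt
        for v, g, bt, m, s in zip(y, gamma, beta, mean, var)
    ]


def fold_bn(W, b, gamma, beta, mean, var, eps):
    """Fold the BN parameters into the Conv weight and bias."""
    W_new, b_new = [], []
    for row, bo, g, bt, m, s in zip(W, b, gamma, beta, mean, var):
        scale = g / math.sqrt(s + eps)
        W_new.append([w * scale for w in row])   # W' = W * gamma / sqrt(var + eps)
        b_new.append(bt + (bo - m) * scale)      # b' = beta + (b - mean) * gamma / sqrt(var + eps)
    return W_new, b_new


# Made-up example: 3 input channels, 2 output channels.
x = [0.5, -1.0, 2.0]
W = [[0.2, -0.4, 0.1], [1.5, 0.3, -0.7]]
b = [0.05, -0.2]
gamma, beta = [1.2, 0.8], [0.1, -0.3]
mean, var, eps = [0.4, -0.1], [0.9, 2.5], 1e-5

reference = batch_norm(conv1x1(x, W, b), gamma, beta, mean, var, eps)
W_folded, b_folded = fold_bn(W, b, gamma, beta, mean, var, eps)
folded = conv1x1(x, W_folded, b_folded)

matches = all(
    math.isclose(r, f, rel_tol=1e-9, abs_tol=1e-12)
    for r, f in zip(reference, folded)
)
```

If the formulas are right, `matches` is true: running Conv followed by BN gives the same output as a single Conv with the folded weight and bias, so the BN step disappears at inference time.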

Sources

https://operatingsystems.tistory.com/entry/TVM-An-Automated-End-to-End-Optimizing-Compiler-for-Deep-Learning
https://computing-jhson.tistory.com/45#google_vignette
https://github.com/andersy005/tvm-in-action?tab=readme-ov-file

This post is licensed under CC BY 4.0 by the author.