Model Parallelism

홍돌 2024. 8. 23. 09:48

Model Parallel

https://pytorch.org/tutorials/intermediate/model_parallel_tutorial.html

Split the model and spread the pieces across multiple GPUs. If the pipelining is done well, it can of course get faster.

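What matters in pipelining is split_size, i.e. the size of the chunk of the data batch that each GPU device processes at a time. A small split_size reduces each GPU's idle time, but it can be inefficient because it causes too many CUDA kernel launches (I'm not sure exactly what this means; presumably each launch has a fixed overhead that tiny kernels cannot amortize). A split_size that is too large increases each GPU's idle time.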

"Intuitively speaking, using small split_size leads to many tiny CUDA kernel launch, while using large split_size results to relatively long idle times during the first and last splits. Neither are optimal. There might be an optimal split_size configuration for this specific experiment."

There is no general solution; the best split_size has to be found per experiment.
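
To get a feel for the split_size trade-off, here is a toy cost model in plain Python. It is only a sketch under assumptions that do not come from the tutorial: every stage takes the same time per split, each split costs a fixed launch overhead plus time proportional to split_size, and the numbers (launch_overhead=5.0, per_sample_cost=1.0) are made up. pipeline_time and best_split_size are hypothetical helper names, not PyTorch functions.

```python
import math


def pipeline_time(batch_size, split_size, num_stages=2,
                  per_sample_cost=1.0, launch_overhead=5.0):
    """Estimated time of one pipelined forward pass under a toy cost model.

    Assumptions (hypothetical, not measured): all stages are equally fast,
    and one split costs a fixed launch overhead plus a term proportional
    to split_size. A short final split is charged as a full one.
    """
    num_splits = math.ceil(batch_size / split_size)
    step_time = launch_overhead + per_sample_cost * split_size
    # The last split leaves the last stage after
    # (num_splits + num_stages - 1) steps: fill, steady state, drain.
    total_steps = num_splits + num_stages - 1
    return total_steps * step_time


def best_split_size(batch_size, candidates, **kwargs):
    """Pick the candidate split_size with the lowest estimated time."""
    return min(candidates, key=lambda s: pipeline_time(batch_size, s, **kwargs))


for s in (1, 12, 24, 60, 120):
    print(s, pipeline_time(120, s))
```

With these made-up numbers the smallest split pays the overhead 120 times, the largest split leaves one stage idle for half of the run, and a mid-sized split comes out ahead. On real hardware the curve has to be measured rather than modeled.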

I wonder whether running multiple processes asynchronously could eliminate the idle time everywhere except at the very start? DDP does not work that way, though:

Recall from the prior tutorial that if your model is too large to fit on a single GPU, you must use model parallel to split it across multiple GPUs. DistributedDataParallel works with model parallel; DataParallel does not at this time. When DDP is combined with model parallel, each DDP process would use model parallel, and all processes collectively would use data parallel.
https://pytorch.org/tutorials/intermediate/ddp_tutorial.html?highlight=distributeddataparallel
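
The combination described in the quote can be pictured as a device layout: each DDP process owns a few GPUs and runs its own model-parallel copy across them. Below is a minimal sketch in plain Python, assuming each process takes a contiguous block of GPU indices. devices_for_rank and layout are hypothetical helper names for illustration, not PyTorch APIs.

```python
def devices_for_rank(rank, gpus_per_process=2):
    """GPU indices that one DDP process would use for its model-parallel copy.

    Assumption: process `rank` owns a contiguous block of GPUs.
    """
    first = rank * gpus_per_process
    return list(range(first, first + gpus_per_process))


def layout(total_gpus, gpus_per_process=2):
    """Map every DDP rank to its GPUs; data parallelism is across the ranks."""
    if total_gpus % gpus_per_process != 0:
        raise ValueError("total_gpus must be a multiple of gpus_per_process")
    world_size = total_gpus // gpus_per_process
    return {rank: devices_for_rank(rank, gpus_per_process)
            for rank in range(world_size)}


for rank, devices in layout(8).items():
    print(f"rank {rank}: model split across GPUs {devices}")
```

Within one rank the model is split across that rank's GPUs (model parallel), while gradients are synchronized between ranks (data parallel).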

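Maybe FSDP does something like that, but I'm not sure. As far as I understand, FSDP shards parameters, gradients, and optimizer states across data-parallel workers rather than pipelining stages.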

https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html
