'2024/11/25 글 목록

What is KV caching? KV caching is specifically related to the auto-regressive approach of a transformer decoder. In a transformer decoder, it attends to the past and current tokens, but not to future tokens. At each time step, the transformer repeatedly calculates the attention scores between the query and the key, and computes the values by multiplying the scores with the previously computed va..