In the MLP attention section, does W_k k + W_q q mean concatenation in PyTorch? The section says:

"Intuitively, you can imagine W_k k + W_q q as concatenating the query and the key in the feature dimension."

So are we adding or concatenating?
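For reference, the scoring function being discussed (10.1.7) is, as far as I can tell:

$$a(\mathbf{k}, \mathbf{q}) = \mathbf{v}^\top \tanh(\mathbf{W}_k \mathbf{k} + \mathbf{W}_q \mathbf{q})$$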
Hi @pyzeus, great question. Here, the sentence is interpreting the W_k k + W_q q in formula 10.1.7 intuitively, rather than as pure addition in the traditional mathematical sense. As you can see in the PyTorch implementation, we broadcast the query and key in the forward function. Let me know if this helps!
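Here is a minimal sketch of what that broadcast looks like (the class and shape names are my own, loosely following the section's implementation, not a verbatim copy):

```python
import torch
from torch import nn

class AdditiveAttention(nn.Module):
    """Minimal sketch of additive (MLP) attention scoring.

    Assumed shapes: queries (batch, num_q, q_dim), keys (batch, num_k, k_dim).
    """
    def __init__(self, q_dim, k_dim, hidden_dim):
        super().__init__()
        self.W_q = nn.Linear(q_dim, hidden_dim, bias=False)
        self.W_k = nn.Linear(k_dim, hidden_dim, bias=False)
        self.w_v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, queries, keys):
        q = self.W_q(queries)   # (batch, num_q, hidden)
        k = self.W_k(keys)      # (batch, num_k, hidden)
        # Broadcast-sum: unsqueeze so every query is paired with every key.
        features = q.unsqueeze(2) + k.unsqueeze(1)  # (batch, num_q, num_k, hidden)
        # Score each query-key pair with a single hidden-layer MLP.
        scores = self.w_v(torch.tanh(features)).squeeze(-1)  # (batch, num_q, num_k)
        return scores
```

The unsqueeze calls pair every query with every key before the tanh, so no torch.cat is needed anywhere.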
Sorry for the late reply. I understand that we are not simply adding the query and key. But if we want to concatenate along the feature dimension, why don't we do torch.cat([query, key], dim=-1)? That is how it would be done, right? Or did I misconstrue the statement?
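For what it's worth, I tried to check numerically whether the sum is equivalent to one linear layer applied to the concatenation, and it seems to hold: applying the block matrix [W_q | W_k] to the concatenation [q; k] gives exactly W_q q + W_k k (dimensions below are arbitrary placeholders):

```python
import torch

# Arbitrary placeholder dimensions.
q_dim, k_dim, hidden = 4, 5, 3
q, k = torch.randn(q_dim), torch.randn(k_dim)
W_q, W_k = torch.randn(hidden, q_dim), torch.randn(hidden, k_dim)

# Sum of two separate projections, as in the formula...
summed = W_q @ q + W_k @ k
# ...equals one projection of the concatenated features,
# using the block weight matrix [W_q | W_k].
concatenated = torch.cat([W_q, W_k], dim=-1) @ torch.cat([q, k], dim=-1)

print(torch.allclose(summed, concatenated, atol=1e-6))  # True
```

So the book's wording seems to mean the sum is mathematically the same as concatenating and then applying one combined weight matrix, which is why the code never calls torch.cat explicitly.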