
What you mean by LLM is a transformer. Whether a multimodal transformer is sufficient is an entirely different question. My current thinking is that it is not.
