Portable LLMs with llamafile

Posted May 16, 2024 10:00 UTC (Thu) by yaap (subscriber, #71398)
Parent article: Portable LLMs with llamafile

> The difference is attributable to the fact that during prompt evaluation, the model can use matrix-matrix multiplications, instead of matrix-vector multiplications.

As I understand it, that's not the only factor. Prompt evaluation processes the whole input buffer (all of the input tokens) in a single forward pass through the model, whereas generation is iterative: every newly generated token requires another pass through the model.
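To illustrate the difference in pass counts, here is a toy sketch. `ToyModel`, `prefill`, and `generate` are hypothetical names, and the "model" is a single weight matrix rather than a real transformer. Prompt evaluation pushes all prompt rows through in one matrix-matrix product, while generation calls the model once per new token on a single row (effectively a matrix-vector product). As a simplification, each output row is fed back directly as the next input; a real model would sample a token and look up its embedding instead.

```python
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


class ToyModel:
    """Hypothetical stand-in for an LLM: one d x d weight matrix plus a pass counter."""

    def __init__(self, d, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
        self.passes = 0

    def forward(self, x):
        # x is a (rows x d) matrix: many rows during prompt evaluation,
        # a single row during generation.
        self.passes += 1
        return matmul(x, self.w)


def prefill(model, prompt):
    # All prompt tokens go through together in one pass (matrix-matrix product).
    return model.forward(prompt)


def generate(model, last, n_new):
    # Each new token needs its own pass, and each pass depends on the previous
    # output, so the passes cannot be batched together.
    out = []
    for _ in range(n_new):
        # Simplification: the output row is reused directly as the next input.
        last = model.forward([last])[0]  # 1 x d input: effectively matrix-vector
        out.append(last)
    return out


if __name__ == "__main__":
    model = ToyModel(d=4)
    prompt = [[1.0] * 4 for _ in range(8)]  # 8 "tokens", each a 4-dim vector
    hidden = prefill(model, prompt)
    print("passes after prompt:", model.passes)
    generate(model, hidden[-1], 5)
    print("passes after generating 5 tokens:", model.passes)
```

With an 8-token prompt, the toy makes 1 pass for the whole prompt and then 5 more to generate 5 tokens, one per token. That sequential dependency, and not only the shape of the multiplications, is why generation runs slower per token than prompt evaluation.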


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds