In this section, we present MagicPIG's algorithms, briefly introducing importance sampling and locality-sensitive hashing (LSH) and how they relate to the attention approximation problem.
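To make this connection concrete, below is a minimal sketch, not the paper's implementation, of how a self-normalized importance-sampling estimator can approximate the attention output from a sampled subset of keys; the uniform proposal distribution, the sample size, and the function names are illustrative assumptions.

```python
import numpy as np

def attention_exact(q, K, V):
    """Full softmax attention output for one query (reference)."""
    scores = K @ q                      # (n,)
    w = np.exp(scores - scores.max())   # unnormalized softmax weights
    return (w[:, None] * V).sum(0) / w.sum()

def attention_is_estimate(q, K, V, proposal_probs, num_samples, rng):
    """Self-normalized importance-sampling estimate of the attention output.

    proposal_probs: sampling distribution over the n keys (assumed given here;
    in practice it should be close to the true softmax weights).
    """
    n = K.shape[0]
    idx = rng.choice(n, size=num_samples, p=proposal_probs)   # sampled key indices
    scores = K[idx] @ q
    w = np.exp(scores - scores.max()) / proposal_probs[idx]   # importance weights
    return (w[:, None] * V[idx]).sum(0) / w.sum()

# toy usage with a uniform proposal (illustrative only)
rng = np.random.default_rng(0)
n, d = 4096, 64
q, K, V = rng.normal(size=d), rng.normal(size=(n, d)), rng.normal(size=(n, d))
uniform = np.ones(n) / n
approx = attention_is_estimate(q, K, V, uniform, num_samples=256, rng=rng)
exact = attention_exact(q, K, V)
print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```

The estimator is unbiased up to self-normalization; its variance depends on how well the proposal distribution matches the true softmax weights, which is where LSH enters.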
To apply sampling and estimation to attention approximation, several problems must be addressed.
Pre-processing: build hash tables over the keys.
Runtime: retrieve candidate keys for the query.
Our algorithm applies to a single attention head; see the left figure.
The details of Encode, Query, and the hash table construction are described in prior work.
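As a rough illustration of these two phases only, and not of the paper's Encode and Query procedures, the sketch below builds random-hyperplane (SimHash) hash tables over the keys during pre-processing and, at runtime, retrieves the keys that collide with a query in at least one table; the function names, signatures, and the parameters num_tables and num_bits are assumptions made for this sketch.

```python
import numpy as np
from collections import defaultdict

def build_hash_tables(keys, num_tables, num_bits, rng):
    """Pre-processing: hash every key into `num_tables` tables.

    Each table uses `num_bits` random hyperplanes (SimHash); a key's bucket
    is the sign pattern of its projections.
    """
    d = keys.shape[1]
    planes = rng.normal(size=(num_tables, num_bits, d))   # random hyperplanes
    tables = [defaultdict(list) for _ in range(num_tables)]
    for t in range(num_tables):
        codes = (keys @ planes[t].T > 0)                   # (n, num_bits) sign bits
        for i, code in enumerate(codes):
            tables[t][code.tobytes()].append(i)            # bucket -> key indices
    return planes, tables

def retrieve(query, planes, tables):
    """Runtime: hash the query with the same hyperplanes and return the
    indices of keys that collide with it in at least one table."""
    candidates = set()
    for t, table in enumerate(tables):
        code = (planes[t] @ query > 0)
        candidates.update(table.get(code.tobytes(), []))
    return candidates

# toy usage
rng = np.random.default_rng(0)
keys = rng.normal(size=(4096, 64))
planes, tables = build_hash_tables(keys, num_tables=8, num_bits=10, rng=rng)
print(len(retrieve(keys[0], planes, tables)))              # key 0 collides with itself
```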
Note that most of the derivations in this section are classical and can even be found in textbooks. Our goal is to leverage them to motivate the design of MagicPIG and to demonstrate the power of a theoretically sound algorithm, combined with system co-design, in deep generative models.