That is a benchmark developed by our lab. Stay tuned.
Not yet. By approximating oracle sampling and estimation, we show that MagicPIG has the potential to outperform TopK attention. MagicPIG performs strongly on tasks such as common and frequent word extraction, as well as on our upcoming reasoning benchmark; however, on most retrieval tasks, TopK attention still holds the advantage. Our paper highlights another oracle, oracle sampling, which consistently outperforms TopK attention (see the sketch below).
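For intuition, here is a minimal NumPy sketch (not MagicPIG's actual implementation; all function names and parameters are illustrative) contrasting the two oracles on a single attention head: TopK keeps the k largest attention scores and renormalizes, while oracle sampling draws keys from the true attention distribution and averages the sampled values, giving an unbiased estimate of the attention output.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def topk_attention(q, K, V, k):
    """TopK oracle: keep the k largest attention scores, renormalize them."""
    p = softmax(K @ q / np.sqrt(q.shape[0]))
    idx = np.argpartition(p, -k)[-k:]
    w = p[idx] / p[idx].sum()
    return w @ V[idx]

def oracle_sampling_attention(q, K, V, budget, rng):
    """Oracle sampling: draw keys i.i.d. from the true attention distribution p
    and average the sampled values. Since the proposal equals p, the importance
    weights are all 1, so the estimate is unbiased for sum_i p_i * v_i; TopK,
    by contrast, is biased toward the largest scores."""
    p = softmax(K @ q / np.sqrt(q.shape[0]))
    idx = rng.choice(len(p), size=budget, replace=True, p=p)
    return V[idx].mean(axis=0)

# Toy comparison on random data (illustrative only).
rng = np.random.default_rng(0)
d, n, budget = 64, 4096, 128
q, K, V = rng.standard_normal(d), rng.standard_normal((n, d)), rng.standard_normal((n, d))
exact = softmax(K @ q / np.sqrt(d)) @ V
print("TopK error:    ", np.linalg.norm(topk_attention(q, K, V, budget) - exact))
print("Sampling error:", np.linalg.norm(oracle_sampling_attention(q, K, V, budget, rng) - exact))
```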
Yes. Many other KV cache compression algorithms can produce a "probability" similar to the one generated by LSH in MagicPIG. For example, KV cache quantization can first produce approximate attention scores via low-precision computation and then follow the sampling-and-estimation procedure described in Importance Sampling, Locality Sensitive Hashing, and MagicPIG, as sketched below. We leave finding more efficient, robust, and accurate sampling algorithms as future work.
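As a rough illustration of that idea (a sketch under assumed details such as the int8 scheme; it is not an implementation from the paper or this repo), the snippet below builds a proposal distribution from a cheap quantized pass over all keys, samples a small budget of keys from it, and corrects the estimate with self-normalized importance weights computed from full-precision scores on the sampled keys only. Because the estimator is self-normalized, the normalization constants of both the proposal and the exact scores cancel, so only unnormalized scores are needed.

```python
import numpy as np

def quantize_int8(K):
    """Per-row symmetric int8 quantization (illustrative scheme)."""
    scale = np.abs(K).max(axis=1, keepdims=True) / 127.0
    return np.round(K / scale).astype(np.int8), scale

def sampled_attention_from_quantized(q, K, V, budget, rng):
    d = q.shape[0]
    Kq, scale = quantize_int8(K)
    # Cheap low-precision pass over all keys -> proposal distribution.
    approx = (Kq.astype(np.float32) * scale) @ q / np.sqrt(d)
    prop = np.exp(approx - approx.max())
    prop /= prop.sum()
    idx = rng.choice(len(prop), size=budget, replace=True, p=prop)
    # Full-precision scores only for the sampled keys.
    exact = K[idx] @ q / np.sqrt(d)
    w = np.exp(exact - exact.max()) / prop[idx]  # importance weights
    w /= w.sum()                                 # self-normalize
    return w @ V[idx]
```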
Support advanced hashing algorithms
Support AVX2 CPUs (currently only AVX512 is supported)
Support CPU-GPU pipeline
Support Tensor/pipeline parallelism
Develop GPU-friendly hashing algorithms to support H200/B200 and unified memory devices
If you are interested in collaborating, email us at infiniailab@gmail.com.