That is a benchmark developed by our lab. Stay tuned.
Not yet. By approximating oracle sampling and estimation, we show that MagicPIG has the potential to outperform TopK attention. MagicPIG performs strongly on tasks such as common and frequent word extraction, as well as on our upcoming reasoning benchmark; however, on most retrieval tasks, TopK attention still holds the advantage. Our paper highlights another oracle, oracle sampling, which consistently outperforms TopK attention (see the sketch below).
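For intuition, here is a minimal NumPy sketch (not MagicPIG's actual implementation; all function names and parameters are illustrative) contrasting the two oracles on a single attention head: TopK keeps the k largest attention scores and renormalizes, while oracle sampling draws keys from the true attention distribution and averages the sampled values, giving an unbiased estimate of the attention output.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def topk_attention(q, K, V, k):
    """TopK oracle: keep the k largest attention scores, renormalize them."""
    p = softmax(K @ q / np.sqrt(q.shape[0]))
    idx = np.argpartition(p, -k)[-k:]
    w = p[idx] / p[idx].sum()
    return w @ V[idx]

def oracle_sampling_attention(q, K, V, budget, rng):
    """Oracle sampling: draw keys i.i.d. from the true attention distribution p
    and average the sampled values. Since the proposal equals p, the importance
    weights are all 1, so the estimate is unbiased for sum_i p_i * v_i; TopK,
    by contrast, is biased toward the largest scores."""
    p = softmax(K @ q / np.sqrt(q.shape[0]))
    idx = rng.choice(len(p), size=budget, replace=True, p=p)
    return V[idx].mean(axis=0)

# Toy comparison on random data (illustrative only).
rng = np.random.default_rng(0)
d, n, budget = 64, 4096, 128
q, K, V = rng.standard_normal(d), rng.standard_normal((n, d)), rng.standard_normal((n, d))
exact = softmax(K @ q / np.sqrt(d)) @ V
print("TopK error:    ", np.linalg.norm(topk_attention(q, K, V, budget) - exact))
print("Sampling error:", np.linalg.norm(oracle_sampling_attention(q, K, V, budget, rng) - exact))
```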
Yes. Many other KV cache compression algorithms can produce a "probability" similar to the one generated by LSH in MagicPIG. For example, KV cache quantization can first produce approximate attention scores via low-precision computation and then follow the sampling-and-estimation procedure described in Importance Sampling, Locality Sensitive Hashing, and MagicPIG, as sketched below. We leave finding more efficient, robust, and accurate sampling algorithms as future work.
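As a rough illustration of that idea (a sketch under assumed details such as the int8 scheme; it is not an implementation from the paper or this repo), the snippet below builds a proposal distribution from a cheap quantized pass over all keys, samples a small budget of keys from it, and corrects the estimate with self-normalized importance weights computed from full-precision scores on the sampled keys only. Because the estimator is self-normalized, the normalization constants of both the proposal and the exact scores cancel, so only unnormalized scores are needed.

```python
import numpy as np

def quantize_int8(K):
    """Per-row symmetric int8 quantization (illustrative scheme)."""
    scale = np.abs(K).max(axis=1, keepdims=True) / 127.0
    return np.round(K / scale).astype(np.int8), scale

def sampled_attention_from_quantized(q, K, V, budget, rng):
    d = q.shape[0]
    Kq, scale = quantize_int8(K)
    # Cheap low-precision pass over all keys -> proposal distribution.
    approx = (Kq.astype(np.float32) * scale) @ q / np.sqrt(d)
    prop = np.exp(approx - approx.max())
    prop /= prop.sum()
    idx = rng.choice(len(prop), size=budget, replace=True, p=prop)
    # Full-precision scores only for the sampled keys.
    exact = K[idx] @ q / np.sqrt(d)
    w = np.exp(exact - exact.max()) / prop[idx]  # importance weights
    w /= w.sum()                                 # self-normalize
    return w @ V[idx]
```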
Support advanced hashing algorithms
Support AVX2 CPUs (currently only AVX512 is supported)
Support CPU-GPU pipeline
Support Tensor/pipeline parallelism
Develop GPU-friendly hashing algorithms to support H200/B200 and unified memory devices
If you are interested in collaborating, email us at infiniailab@gmail.com.