💾 trunk/0849195a965637d4c674b80ae7d60692b1a84283: Stabilize efficient attention checkpoint metadata (#184166)
News section: 💾 Downloads
🔗 Source: github.com
Use query-device dummy Philox seed and offset tensors when CUDA efficient attention runs without dropout, so activation checkpoint recomputation sees stable metadata across CUDA graph capture state... [Read more]
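To make the idea in the summary concrete, here is a toy model in plain Python. It is not PyTorch code: `TensorMeta`, `rng_state_meta`, and the `stabilized` flag are hypothetical names, and the "before" behavior (placement that follows capture state) is an assumption made only for illustration. It shows why returning placeholders on the query's device keeps the metadata identical whether or not a CUDA graph is being captured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorMeta:
    """Hypothetical stand-in for the metadata a recomputation check compares."""
    shape: tuple
    dtype: str
    device: str


def rng_state_meta(query_device, dropout_p, capturing, stabilized):
    """Return (seed, offset) metadata for a toy attention forward pass.

    All behavior here is illustrative, not taken from the PyTorch source.
    """
    if dropout_p > 0.0 or stabilized:
        # Real RNG state, or a dummy placeholder on the query's device.
        meta = TensorMeta((), "int64", query_device)
        return meta, meta
    # Assumed unstable path: placement depends on graph capture state.
    device = query_device if capturing else "cpu"
    meta = TensorMeta((), "int64", device)
    return meta, meta


# Forward runs outside capture, recomputation runs during capture (or vice versa).
unstable_fwd = rng_state_meta("cuda:0", 0.0, capturing=False, stabilized=False)
unstable_rec = rng_state_meta("cuda:0", 0.0, capturing=True, stabilized=False)
stable_fwd = rng_state_meta("cuda:0", 0.0, capturing=False, stabilized=True)
stable_rec = rng_state_meta("cuda:0", 0.0, capturing=True, stabilized=True)

print("unstable metadata matches:", unstable_fwd == unstable_rec)
print("stable metadata matches:", stable_fwd == stable_rec)
```

In this sketch only the stabilized variant yields matching metadata between the original forward pass and the recomputation, which is the property the commit summary describes.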
🔧 Project goals update — April 2026 (end of 2025H2)
📈 201.48 points
🔧 Programming
🔧 Flash Attention: what it does and why it matters
📈 194.41 points
🔧 Programming
🔧 The Day Transformers Stared Back at Me😂
📈 163.62 points
🔧 Programming
🔧 Microsoft SQL Server: Architecture
📈 131.05 points
🔧 Programming
🔧 Multi-Head Latent Attention (MLA)
📈 116.43 points
🔧 Programming
🔧 Vision Transform
📈 116.43 points
🔧 Programming
🔧 Understanding the KV Cache (feat. Self-Attention)
📈 102.11 points
🔧 Programming