💾 trunk/1a6e4c556bd4e5e00234719aa8e3e4ca4ac3a205: [ATen] SM carveout for cublas (#183330) (#183330)


News section: 💾 Downloads
🔗 Source: github.com

Summary:
The SM carveout reported by torch._C._get_sm_carveout_experimental() currently affects only the cublaslt path, not the standard cublas path. This PR adds support for the default cublas path as well.
Test Plan:...

🔧 How GPU-Powered Coding Agents Can Assist in Development of GPU-Accelerated Software


📈 194.22 points
🔧 Programming

💾 viable/strict/1778356993: Fix mkldnn_rnn_layer_backward meta dtype and GRU bias shape (#179367)


📈 177.35 points
💾 Downloads

🔧 NVIDIA CUTLASS: High-Performance CUDA Templates for AI Linear Algebra


📈 172.64 points
🔧 Programming

💾 viable/strict/1780891878: Preserve aten.hardtanh meta semantics for export (#185298)


📈 159.62 points
💾 Downloads

💾 trunk/f75a3b132520d11656ceb1703c0ab8a423dd55fe: Preserve aten.hardtanh meta semantics for export (#185298)


📈 159.62 points
💾 Downloads

🔧 Gemma4 Tool Calling Fixes in llama.cpp, RTX cuBLAS MatMul Bug, & Local Ollama + Whisper UI


📈 151.06 points
🔧 Programming

🔧 Part 9: Generating Simba Network with Rust


📈 151.06 points
🔧 Programming

🐧 Ryzen igpu UMA carveout, VRAM allocation on linux, finally found how to change it


📈 138.7 points
🐧 Linux Tips

🔧 His AI Said 'Swap the PSU.' He Said 'One More Test.'


📈 129.48 points
🔧 Programming

💾 trunk/78949058ba4dcda51dfd1f05be67e98bf5628a99: [ATen] SM carveout for cublas (#183330) (#183330)


📈 110.22 points
💾 Downloads

🔧 96% of cuBLAS, no `unsafe`: what cuTile Rust proves


📈 107.9 points
🔧 Programming

🔧 I wrote a custom CUDA inference engine to run Qwen3.5-27B on $130 mining cards


📈 86.32 points
🔧 Programming

💾 viable/strict/1780997283: Add oneDNN LSTM primitive support for XPU inference (#185531)


📈 70.94 points
💾 Downloads

💾 ciflow/trunk/186653: Make topk deterministic under deterministic algorithms


📈 70.94 points
💾 Downloads

💾 ciflow/torchtitan/186653: Make topk deterministic under deterministic algorithms


📈 70.94 points
💾 Downloads

💾 trunk/5626fdca7af8043cf2095aaec6a12e7853031131: Fix FakeTensor embedding with meta indices (#185060)


📈 70.94 points
💾 Downloads

💾 trunk/277d9b355a190b3e495f82b3f151d1ea1f4de338: [ROCm] Enable Python 3.15 wheel builds (#185409)


📈 70.94 points
💾 Downloads

🔧 I Exported HT-Demucs FT to ONNX in 2026 (4 Blockers Everyone Else Gave Up On)


📈 70.94 points
🔧 Programming

🔧 How to Read GPU Profiling Logs: A Ground-Up Guide


📈 64.74 points
🔧 Programming

🕵️ CVE-2026-9774 | ATEN Unizon 2.7.262.002 updateLicense path traversal (ZDI-26-378)


📈 53.21 points
🕵️ Vulnerabilities

💾 trunk/1b392e4a7dde975322e568643c639e7f71df6594: [xpu][test] Fix XPU CI failure (#185895)


📈 53.21 points
💾 Downloads

💾 trunk/9c6b1aa07d56296da76db2b6c429d7b32efbe2eb: Triton backward convolution kernel (#178945)


📈 53.21 points
💾 Downloads

🔧 The Ghost in the Batch: How vLLM Silently Switches Algorithms


📈 43.16 points
🔧 Programming

🔧 Building a CUDA-Accelerated Neural Network Library in Rust


📈 43.16 points
🔧 Programming

🔧 Profiling a CUDA Python Program with GPUFlight


📈 43.16 points
🔧 Programming