- formatting
- images
- links
- math
- code
- blockquotes
- external-services
- Can Mamba Learn, Unlearn, and Retain Noise?
  Extending the SLM Noise Study to State Space Models: Mamba 1.4B vs. Transformers across 4 noise types
- Multi-Hop Reasoning in Transformers
  A Journey From Confidence to Confusion to Clarity: How RoPE Position Embeddings Enable Length Generalization in Transformer Reasoning
- ReLU vs GELU
  Impact of Activation Curvature on Transformer Stability: GELU vs. ReLU
- hello blog
  initial blog post