- Manifold-Constrained Hyper-Connections
- Lecture - Alignment in NLP
- Lecture - Attention and Transformers
- Why Work on Misinformation Detection and Generalization?
- Comparing Machine-Learning Models using Ratios
- Efficient Vocabulary Generation for Very Large Corpora
- Better and Faster Non-Autoregressive Transformers via the Policy Gradient
- Machine Learning II Notes
- Sampling Maximally Diverse Subsets
- The Beta Distribution