Local Models Related Papers

/lmg/	Abstracts Search (Current as end of 06/2025)Links
	Google Papers Blog
12/2017	Attention Is All You Need (Transformers)
10/2018	BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
10/2019	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5)
11/2019	Fast Transformer Decoding: One Write-Head is All You Need
02/2020	GLU Variants Improve Transformer
03/2020	Talking-Heads Attention
05/2020	Conformer: Convolution-augmented Transformer for Speech Recognition
09/2020	Efficient Transformers: A Survey
12/2020	RealFormer: Transformer Likes Residual Attention
01/2021	Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
09/2021	Finetuned Language Models Are Zero-Shot Learners (Flan)
09/2021	Primer: Searching for Efficient Transformers for Language Modeling
11/2021	Sparse is Enough in Scaling Transformers
12/2021	GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
01/2022	LaMDA: Language Models for Dialog Applications
01/2022	Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
04/2022	PaLM: Scaling Language Modeling with Pathways
07/2022	Confident Adaptive Language Modeling
10/2022	Scaling Instruction-Finetuned Language Models (Flan-Palm)
10/2022	Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models
10/2022	Large Language Models Can Self-Improve
11/2022	Efficiently Scaling Transformer Inference
11/2022	Fast Inference from Transformers via Speculative Decoding
02/2023	Symbolic Discovery of Optimization Algorithms (Lion)
03/2023	PaLM-E: An Embodied Multimodal Language Model
04/2023	Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
05/2023	Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
05/2023	FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
05/2023	PaLM 2 Technical Report
05/2023	Symbol tuning improves in-context learning in language models
05/2023	Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for Large Language Models
05/2023	Towards Expert-Level Medical Question Answering with Large Language Models (Med-Palm 2)
05/2023	DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
05/2023	How Does Generative Retrieval Scale to Millions of Passages?
05/2023	GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoint
05/2023	Small Language Models Improve Giants by Rewriting Their Outputs
06/2023	StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
06/2023	AudioPaLM: A Large Language Model That Can Speak and Listen
06/2023	Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
07/2023	HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
09/2023	Uncovering mesa-optimization algorithms in Transformers
10/2023	Think before you speak: Training Language Models With Pause Tokens
10/2023	SpecTr: Fast Speculative Decoding via Optimal Transport
11/2023	UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
11/2023	Automatic Engineering of Long Prompts
12/2023	Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses
12/2023	Style Aligned Image Generation via Shared Attention
01/2024	A Minimaximalist Approach to Reinforcement Learning from Human Feedback (SPO)
02/2024	Time-, Memory- and Parameter-Efficient Visual Adaptation (LoSA)
02/2024	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
03/2024	PERL: Parameter Efficient Reinforcement Learning from Human Feedback
04/2024	TransformerFAM: Feedback attention is working memory
05/2024	eXmY: A Data Type and Technique for Arbitrary Bit Precision Quantization
05/2024	Faster Cascades via Speculative Decoding
06/2024	Proofread: Fixes All Errors with One Tap
08/2024	Natural Language Outlines for Code: Literate Programming in the LLM Era
08/2024	Diffusion Models Are Real-Time Game Engines
11/2024	LAUREL: Learned Augmented Residual Layer
11/2024	Active Data Curation Effectively Distills Large-Scale Multimodal Models
12/2024	Zero-Shot Mono-to-Binaural Speech Synthesis
03/2025	Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
04/2025	It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization

	Deepmind (Google Deepmind as of 4/2023) Papers Blog
10/2019	Stabilizing Transformers for Reinforcement Learning
12/2021	Scaling Language Models: Methods, Analysis & Insights from Training Gopher
12/2021	Improving language models by retrieving from trillions of tokens (RETRO)
02/2022	Competition-Level Code Generation with AlphaCode
02/2022	Unified Scaling Laws for Routed Language Models
03/2022	Training Compute-Optimal Large Language Models (Chinchilla)
04/2022	Flamingo: a Visual Language Model for Few-Shot Learning
05/2022	A Generalist Agent (GATO)
07/2022	Formal Algorithms for Transformers
02/2023	Accelerating Large Language Model Decoding with Speculative Sampling
05/2023	Tree of Thoughts: Deliberate Problem Solving with Large Language Models
05/2023	Block-State Transformer
05/2023	Randomized Positional Encodings Boost Length Generalization of Transformers
08/2023	From Sparse to Soft Mixtures of Experts
09/2023	Large Language Models as Optimizers
09/2023	MADLAD-400: A Multilingual And Document-Level Large Audited Dataset (MT Model)
09/2023	Scaling Laws for Sparsely-Connected Foundation Models
09/2023	Language Modeling Is Compression
09/2023	Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
10/2023	Large Language Models as Analogical Reasoners
10/2023	Controlled Decoding from Language Models
10/2023	A General Theoretical Paradigm to Understand Learning from Human Preferences
11/2023	DiLoCo: Distributed Low-Communication Training of Language Models
12/2023	Gemini: A Family of Highly Capable Multimodal Models
12/2023	AlphaCode 2 Technical Report
12/2023	Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
12/2023	Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
12/2023	Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
01/2024	Solving olympiad geometry without human demonstrations
02/2024	LiPO: Listwise Preference Optimization through Learning-to-Rank
02/2024	Grandmaster-Level Chess Without Search
02/2024	How to Train Data-Efficient LLMs
02/2024	A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
02/2024	Gemma: Open Models Based on Gemini Research and Technology
02/2024	Genie: Generative Interactive Environments
02/2024	Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
03/2024	DiPaCo: Distributed Path Composition
04/2024	Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
05/2024	Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities
06/2024	Transformers meet Neural Algorithmic Reasoners
06/2024	Gemma 2: Improving Open Language Models at a Practical Size
06/2024	Data curation via joint example selection further accelerates multimodal learning
07/2024	PaliGemma: A versatile 3B VLM for transfer
07/2024	LookupViT: Compressing visual information to a limited number of tokens
07/2024	Mixture of Nested Experts: Adaptive Processing of Visual Tokens
08/2024	Generative Verifiers: Reward Modeling as Next-Token Prediction
09/2024	Imitating Language via Scalable Inverse Reinforcement Learning
10/2024	Preference Optimization as Probabilistic Inference
10/2024	Round and Round We Go! What makes Rotary Positional Encodings useful?
10/2024	Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
12/2024	Language-Guided Image Tokenization for Generation (TexTok)
12/2024	TRecViT: A Recurrent Video Transformer
12/2024	Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models
12/2024	Deliberation in Latent Space via Differentiable Cache Augmentation
02/2025	Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch
02/2025	GiGL: Large-Scale Graph Neural Networks at Snapchat
03/2025	Gemma 3 Technical Report
05/2025	AlphaEvolve: A coding agent for scientific and algorithmic discovery
06/2025	Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

	Meta (Facebook AI Research) Papers Blog
04/2019	fairseq: A Fast, Extensible Toolkit for Sequence Modeling
07/2019	Augmenting Self-attention with Persistent Memory
11/2019	Improving Transformer Models by Reordering their Sublayers
08/2021	Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
03/2022	Training Logbook for OPT-175B
05/2022	OPT: Open Pre-trained Transformer Language Models
07/2022	Beyond neural scaling laws: beating power law scaling via data pruning
11/2022	Galactica: A Large Language Model for Science
01/2023	Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (I-JEPA)
02/2023	LLaMA: Open and Efficient Foundation Language Models
02/2023	Toolformer: Language Models Can Teach Themselves to Use Tools
03/2023	Scaling Expert Language Models with Unsupervised Domain Discovery
03/2023	SemDeDup: Data-efficient learning at web-scale through semantic deduplication
04/2023	Segment Anything (SAM)
04/2023	A Cookbook of Self-Supervised Learning
05/2023	Learning to Reason and Memorize with Self-Notes
05/2023	ImageBind: One Embedding Space To Bind Them All
05/2023	MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
05/2023	LIMA: Less Is More for Alignment
05/2023	Scaling Speech Technology to 1,000+ Languages
05/2023	READ: Recurrent Adaptation of Large Transformers
05/2023	LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
05/2023	Physics of Language Models: Part 1, Learning Hierarchical Language Structures
06/2023	Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
06/2023	Simple and Controllable Music Generation (MusicGen)
06/2023	Improving Open Language Models by Learning from Organic Interactions (BlenderBot 3x)
06/2023	Extending Context Window of Large Language Models via Positional Interpolation
06/2023	Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
07/2023	Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning (CM3leon)
07/2023	Llama 2: Open Foundation and Fine-Tuned Chat Models
08/2023	SeamlessM4T—Massively Multilingual & Multimodal Machine Translation
08/2023	D4: Improving LLM Pretraining via Document De-Duplication and Diversification
08/2023	Code Llama: Open Foundation Models for Code
08/2023	Nougat: Neural Optical Understanding for Academic Documents
09/2023	Contrastive Decoding Improves Reasoning in Large Language Models
09/2023	Effective Long-Context Scaling of Foundation Models
09/2023	AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
09/2023	Vision Transformers Need Registers
09/2023	Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
09/2023	Physics of Language Models: Part 3.2, Knowledge Manipulation
10/2023	RA-DIT: Retrieval-Augmented Dual Instruction Tuning
10/2023	Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
10/2023	Generative Pre-training for Speech with Flow Matching
11/2023	Emu Edit: Precise Image Editing via Recognition and Generation Tasks
12/2023	Audiobox: Unified Audio Generation with Natural Language Prompts
12/2023	Universal Pyramid Adversarial Training for Improved ViT Performance
01/2024	Self-Rewarding Language Models
02/2024	Revisiting Feature Prediction for Learning Visual Representations from Video (V-JEPA)
02/2024	MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
03/2024	Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
03/2024	Reverse Training to Nurse the Reversal Curse
04/2024	Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
04/2024	Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
04/2024	TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
04/2024	Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding
04/2024	MoDE: CLIP Data Experts via Clustering
04/2024	Iterative Reasoning Preference Optimization
04/2024	Better & Faster Large Language Models via Multi-token Prediction
05/2024	Modeling Caption Diversity in Contrastive Vision-Language Pretraining (LLIP)
05/2024	Chameleon: Mixed-Modal Early-Fusion Foundation Models
05/2024	SpinQuant -- LLM quantization with learned rotations
05/2024	Contextual Position Encoding: Learning to Count What's Important
06/2024	The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
06/2024	Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcemen
07/2024	The Llama 3 Herd of Models
07/2024	SAM 2: Segment Anything in Images and Videos
07/2024	Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
07/2024	MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
08/2024	Self-Taught Evaluators
08/2024	Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
08/2024	Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
10/2024	The Perfect Blend: Redefining RLHF with Mixture of Judges (CGPO)
10/2024	Movie Gen: A Cast of Media Foundation Models
10/2024	Thinking LLMs: General Instruction Following with Thought Generation
11/2024	Context Parallelism for Scalable Million-Token Inference
11/2024	Adaptive Decoding via Latent Preference Optimization
11/2024	Self-Generated Critiques Boost Reward Modeling for Language Models
11/2024	Efficient Track Anything
12/2024	CompCap: Improving Multimodal Large Language Models with Composite Captions
12/2024	Training Large Language Models to Reason in a Continuous Latent Space
12/2024	Byte Latent Transformer: Patches Scale Better Than Tokens
01/2025	Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
01/2025	LLMs can see and hear without any training
03/2025	Transformers without Normalization
04/2024	Multi-Token Attention
04/2025	GaLore 2: Large-Scale LLM Pre-Training by Gradient Low-Rank Projection

	Microsoft Papers Blog
12/2015	Deep Residual Learning for Image Recognition
05/2021	EL-Attention: Memory Efficient Lossless Attention for Generation
01/2022	DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
03/2022	DeepNet: Scaling Transformers to 1,000 Layers
12/2022	A Length-Extrapolatable Transformer
01/2023	Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
02/2023	Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1)
03/2023	Sparks of Artificial General Intelligence: Early experiments with GPT-4
03/2023	TaskMatrix. AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
04/2023	Instruction Tuning with GPT-4
04/2023	Inference with Reference: Lossless Acceleration of Large Language Models
04/2023	Low-code LLM: Visual Programming over LLMs
04/2023	WizardLM: Empowering Large Language Models to Follow Complex Instructions
04/2023	MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks
04/2023	ResiDual: Transformer with Dual Residual Connections
05/2023	Code Execution with Pre-trained Language Models
05/2023	Small Models are Valuable Plug-ins for Large Language Models
05/2023	CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
06/2023	Orca: Progressive Learning from Complex Explanation Traces of GPT-4
06/2023	Augmenting Language Models with Long-Term Memory
06/2023	WizardCoder: Empowering Code Large Language Models with Evol-Instruct
06/2023	Textbooks Are All You Need (phi-1)
07/2023	In-context Autoencoder for Context Compression in a Large Language Model
07/2023	Retentive Network: A Successor to Transformer for Large Language Models
08/2023	Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
09/2023	Efficient RLHF: Reducing the Memory Usage of PPO
09/2023	DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
09/2023	Textbooks Are All You Need II (phi-1.5)
09/2023	PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
09/2023	A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
09/2023	Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
10/2023	Sparse Backpropagation for MoE Training
10/2023	Nugget 2D: Dynamic Contextual Compression for Scaling Decoder-only Language Models
10/2023	Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness
10/2023	Augmented Embeddings for Custom Retrievals
10/2023	Guiding Language Model Reasoning with Planning Tokens
10/2023	Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
10/2023	CodeFusion: A Pre-trained Diffusion Model for Code Generation
10/2023	LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
10/2023	FP8-LM: Training FP8 Large Language Models
11/2023	Orca 2: Teaching Small Language Models How to Reason
12/2023	ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
12/2023	The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
01/2024	SliceGPT: Compress Large Language Models by Deleting Rows and Columns
01/2024	RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
02/2024	LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
02/2024	The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (BitNet)
02/2024	ResLoRA: Identity Residual Mapping in Low-Rank Adaption
03/2024	LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
03/2024	SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
04/2024	LongEmbed: Extending Embedding Models for Long Context Retrieval
04/2024	Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
05/2024	You Only Cache Once: Decoder-Decoder Architectures for Language Models (YOCO)
06/2024	Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
06/2024	E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
06/2024	Automatic Instruction Evolving for Large Language Models
07/2024	Arena Learning : Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena
07/2024	Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
09/2024	VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
10/2024	Differential Transformer
11/2024	BitNet a4.8: 4-bit Activations for 1-bit LLMs
12/2024	Multimodal Latent Language Modeling with Next-Token Diffusion
12/2024	Phi-4 Technical Report
01/2025	Imagine while Reasoning in Space: Multimodal Visualization-of-Thought
02/2025	Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
04/2025	Muon Optimizer Accelerates Grokking
04/2025	BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
06/2025	Reinforcement Pre-Training

	OpenAI Papers Blog
07/2017	Proximal Policy Optimization Algorithms
04/2019	Generating Long Sequences with Sparse Transformers
01/2020	Scaling Laws for Neural Language Models
05/2020	Language Models are Few-Shot Learners (GPT-3)
01/2022	Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
03/2022	Training language models to follow instructions with human feedback (InstructGPT)
07/2022	Efficient Training of Language Models to Fill in the Middle
03/2023	GPT-4 Technical Report
04/2023	Consistency Models
05/2023	Let's Verify Step by Step
10/2023	Improving Image Generation with Better Captions (DALL·E 3)
10/2024	MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

	Hazy Research (Stanford) Papers Blog
10/2021	Efficiently Modeling Long Sequences with Structured State Spaces (S4)
04/2022	Monarch: Expressive Structured Matrices for Efficient and Accurate Training
05/2022	FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
12/2022	Hungry Hungry Hippos: Towards Language Modeling with State Space Models
02/2023	Simple Hardware-Efficient Long Convolutions for Sequence Modeling
02/2023	Hyena Hierarchy: Towards Larger Convolutional Language Models
06/2023	TART: A plug-and-play Transformer module for task-agnostic reasoning
07/2023	FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
11/2023	FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores

	DeepSeek Github
01/2024	DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
01/2024	DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
02/2024	DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
03/2024	DeepSeek-VL: Towards Real-World Vision-Language Understanding
05/2024	DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
06/2024	DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
07/2024	Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
08/2024	DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
08/2024	Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning
08/2024	Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts
10/2024	Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
12/2024	DeepSeek-V3 Technical Report
01/2025	DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
01/2025	Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
02/2025	Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
04/2025	Inference-Time Scaling for Generalist Reward Modeling
05/2025	Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

	THUDM (Tsinghua University) Papers Github
10/2022	GLM-130B: An Open Bilingual Pre-Trained Model
03/2023	CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X
04/2023	DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task
06/2023	WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences
09/2023	GPT Can Solve Mathematical Problems Without a Calculator (MathGLM)
10/2023	AgentTuning: Enabling Generalized Agent Abilities for LLMs (AgentLM)
11/2023	CogVLM: Visual Expert for Pretrained Language Models
12/2023	CogAgent: A Visual Language Model for GUI Agents
01/2024	APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
01/2024	LongAlign: A Recipe for Long Context Alignment of Large Language Models
06/2024	ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
08/2024	LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
11/2024	AutoGLM: Autonomous Foundation Agents for GUIs
01/2025	Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling (T1)
07/2025	GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

	Articles
05/2015	Andrej Karpathy - The Unreasonable Effectiveness of Recurrent Neural Networks
03/2019	Rich Sutton - The Bitter Lesson
06/2022	Yann LeCun - A Path Towards Autonomous Machine Intelligence
01/2023	Lilian Weng - The Transformer Family Version 2.0
01/2023	Lilian Weng - Large Transformer Model Inference Optimization
03/2023	Stanford - Alpaca: A Strong, Replicable Instruction-Following Model
05/2023	OpenAI - Language models can explain neurons in language models
05/2023	Alex Turner - Steering GPT-2-XL by adding an activation vector
06/2023	YyWang - Do We Really Need the KVCache for All Large Language Models
06/2023	kaiokendev - Extending Context is Hard…but not Impossible
06/2023	bloc97 - NTK-Aware Scaled RoPE
07/2023	oobabooga - A direct comparison between llama.cpp, AutoGPTQ, ExLlama, and transformers perplexities
07/2023	Jianlin Su - Carrying the beta position to the end (better NTK RoPe method)
08/2023	Charles Goddard - On Frankenllama
10/2023	Tri Dao - Flash-Decoding for Long-Context Inference
10/2023	Evan Armstrong - Human-Sourced, AI-Augmented: a promising solution for open source conversational data
12/2023	Anthropic - Long context prompting for Claude 2.1
12/2023	Andrej Karpathy - On the "hallucination problem" (tweet.jpg)
12/2023	HuggingFace - Mixture of Experts Explained
01/2024	Vgel - Representation Engineering
01/2024	Alex Alemi - KL is All You Need
02/2024	Lilian Weng - Thinking about High-Quality Human Data
03/2024	rayliuca - T-Ragx Project Write Up (Translation RAG)
04/2024	Answer.Ai - Efficient finetuning of Llama 3 with FSDP QDoRA
04/2024	Sam Paech - Creating MAGI: A hard subset of MMLU and AGIEval
05/2024	LLaVA Team - LLaVA-NeXT: Stronger LLMs Supercharge Multimodal Capabilities in the Wild
05/2024	Hazy Research - GPUs Go Brrr (ThunderKittens)
05/2024	Anthropic - Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
06/2024	CharacterAI - Optimizing AI Inference
07/2024	Lilian Weng - Extrinsic Hallucinations in LLMs
07/2024	Andrej Karpathy - Let's reproduce GPT-2 (1.6B)
07/2024	Pierre-Carl Langlais - Announcing Finance Commons and the Bad Data Toolbox
07/2024	Zeyuan Allen-Zhu - Physics of Language Models ICML Talk (Video)
11/2024	Lilian Weng - Reward Hacking in Reinforcement Learning
12/2024	Keller Jordan - Muon: An optimizer for hidden layers in neural networks
01/2025	Ganqu Cui + Lifan Yuan - Process Reinforcement through Implicit Rewards
03/2025	Jeremy Bernste - Deriving Muon
03/2025	Stochasm - Contextualization Machines
05/2025	Lilian Weng - Why We Think
06/2025	Yiding Jiang - The Era of Exploration

	Open Models
06/2021	GPT-J-6B: 6B JAX-Based Transformer
09/2021	Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning
03/2022	CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
04/2022	GPT-NeoX-20B: An Open-Source Autoregressive Language Model
11/2022	BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
12/2022	DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders
04/2023	Visual Instruction Tuning (LLaVA)
05/2023	StarCoder: May the source be with you!
05/2023	CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
05/2023	Otter: A Multi-Modal Model with In-Context Instruction Tuning
05/2023	InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
05/2023	CodeT5+: Open Code Large Language Models for Code Understanding and Generation
05/2023	ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
05/2023	RWKV: Reinventing RNNs for the Transformer Era
05/2023	Lion: Adversarial Distillation of Closed-Source Large Language Model
05/2023	MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
06/2023	Segment Anything in High Quality
06/2023	Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
06/2023	High-Fidelity Audio Compression with Improved RVQGAN (DAC)
06/2023	StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
06/2023	Anticipatory Music Transformer
06/2023	RepoFusion: Training Code Models to Understand Your Repository
06/2023	MPT-30B: Raising the bar for open-source foundation models
06/2023	Vec2Vec: A Compact Neural Network Approach for Transforming Text Embeddings with High Fidelity
06/2023	ViNT: A Foundation Model for Visual Navigation
06/2023	How Long Can Open-Source LLMs Truly Promise on Context Length? (LongChat)
07/2023	Hierarchical Open-vocabulary Universal Image Segmentation
07/2023	Focused Transformer: Contrastive Training for Context Scaling (LongLLaMA
07/2023	Rhythm Modeling for Voice Conversion (Urhythmic)
07/2023	Scaling TransNormer to 175 Billion Parameters
08/2023	Separate Anything You Describe
08/2023	StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data
09/2023	RADIO: Reference-Agnostic Dubbing Video Synthesis
09/2023	Matcha-TTS: A fast TTS architecture with conditional flow matching
09/2023	DreamLLM: Synergistic Multimodal Comprehension and Creation
09/2023	Baichuan 2: Open Large-scale Language Models
09/2023	Qwen Technical Report
09/2023	Mistral 7B
10/2023	MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
10/2023	Improved Baselines with Visual Instruction Tuning (LLaVA 1.5)
10/2023	LLark: A Multimodal Foundation Model for Music
10/2023	SALMONN: Towards Generic Hearing Abilities for Large Language Models
10/2023	Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents
11/2023	Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
11/2023	UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
11/2023	YUAN 2.0: A Large Language Model with Localized Filtering-based Attention
12/2023	Making Large Multimodal Models Understand Arbitrary Visual Prompts (ViP-LLaVA)
12/2023	Mamba: Linear-Time Sequence Modeling with Selective State Spaces
12/2023	OpenVoice: Versatile Instant Voice Cloning
12/2023	Sequential Modeling Enables Scalable Learning for Large Vision Models (LVM)
12/2023	Magicoder: Source Code Is All You Need
12/2023	StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers
12/2023	MMM: Generative Masked Motion Model
12/2023	4M: Massively Multimodal Masked Modeling
12/2023	LLM360: Towards Fully Transparent Open-Source LLMs
12/2023	SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
01/2024	Mixtral of Experts
01/2024	EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
01/2024	Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
01/2024	Scalable Pre-training of Large Autoregressive Image Models
01/2024	Orion-14B: Open-source Multilingual Large Language Models
01/2024	Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
01/2024	VMamba: Visual State Space Model
01/2024	MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
01/2024	LLaVA-1.6: Improved reasoning, OCR, and world knowledge
01/2024	MiniCPM: Unveiling the Potential of End-side Large Language Models
01/2024	Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
02/2024	Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces
02/2024	Introducing Qwen1.5
02/2024	BlackMamba: Mixture of Experts for State-Space Models
02/2024	EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
02/2024	GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators
02/2024	Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion
02/2024	Brant-2: Foundation Model for Brain Signals
02/2024	CLLMs: Consistency Large Language Models
03/2024	Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (SD3)
03/2024	TripoSR: Fast 3D Object Reconstruction from a Single Image
03/2024	Yi: Open Foundation Models by 01.AI
03/2024	VideoMamba: State Space Model for Efficient Video Understanding
03/2024	VOICECRAFT: Zero-Shot Speech Editing and Text-to-Speech in the Wild
03/2024	GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
03/2024	DBRX: A New State-of-the-Art Open LLM
03/2024	AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
03/2024	Jamba: A Hybrid Transformer-Mamba Language Model
04/2024	Advancing LLM Reasoning Generalists with Preference Trees (Eurus)
04/2024	Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction (VAR)
04/2024	Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
04/2024	Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
05/2024	Language-Image Models with 3D Understanding (Cube-LLM)
05/2024	AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding
05/2024	Pandora : Towards General World Model with Natural Language Actions and Video State
05/2024	TerDiT: Ternary Diffusion Models with Transformers
05/2024	NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
05/2024	Phased Consistency Model
05/2024	MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
05/2024	YOLOv10: Real-Time End-to-End Object Detection
05/2024	MegActor: Harness the Power of Raw Video for Vivid Portrait Animation
06/2024	Bootstrap3D: Improving 3D Content Creation with Synthetic Data
06/2024	EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
06/2024	ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
06/2024	GrootVL: Tree Topology is All You Need in State Space Model
06/2024	An Independence-promoting Loss for Music Generation with Language Models (MusicGen-MMD)
06/2024	Matching Anything by Segmenting Anything
06/2024	Nemotron-4 340B Technical Report
06/2024	TroL: Traversal of Layers for Large Language and Vision Models
06/2024	Depth Anything V2
06/2024	HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
06/2024	Network Bending of Diffusion Models for Audio-Visual Generation
06/2024	Less is More: Accurate Speech Recognition & Translation without Web-Scale Data (Canary)
07/2024	LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
07/2024	Qwen2 Technical Report
07/2024	Qwen2-Audio Technical Report
07/2024	ColPali: Efficient Document Retrieval with Vision Language Models
07/2024	Compact Language Models via Pruning and Knowledge Distillation (Minitron)
08/2024	mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
08/2024	Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
08/2024	SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection
09/2024	OLMoE: Open Mixture-of-Experts Language Models
09/2024	Sample-Efficient Diffusion for Text-To-Speech Synthesis (SESD)
09/2024	Multi-Source Music Generation with Latent Diffusion (MSLDM)
09/2024	Prithvi WxC: Foundation Model for Weather and Climate
09/2024	DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency
09/2024	Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
09/2024	MIO: A Foundation Model on Multimodal Tokens
09/2024	Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
10/2024	UniMuMo: Unified Text, Music and Motion Generation
10/2024	RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
10/2024	Aria: An Open Multimodal Native Mixture-of-Experts Model
10/2024	Taipan: Efficient and Expressive State Space Language Models with Selective Attention
10/2024	DreamCraft3D++: Efficient Hierarchical 3D Generation with Multi-Plane Reconstruction Model
10/2024	EMMA: End-to-End Multimodal Model for Autonomous Driving
11/2024	Hymba: A Hybrid-head Architecture for Small Language Models
11/2024	Multimodal Autoregressive Pre-training of Large Vision Encoders (AIMv2)
11/2024	High-Resolution Image Synthesis via Next-Token Prediction (D-JEPA⋅T2I)
11/2024	MambaIRv2: Attentive State Space Restoration
11/2024	CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
12/2024	Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks
12/2024	INTELLECT-1 Technical Report
12/2024	HunyuanVideo: A Systematic Framework For Large Video Generation Model Training
12/2024	LatentSpeech: Latent Diffusion for Text-To-Speech Generation
12/2024	CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models
12/2024	AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
12/2024	Qwen2.5 Technical Report
12/2024	1.58-bit FLUX
01/2025	MiniMax-01: Scaling Foundation Models with Lightning Attention
02/2025	Metis: A Foundation Speech Generation Model with Masked Generative Pre-training
02/2025	Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
02/2025	Large Language Diffusion Models
02/2025	High-Fidelity Music Vocoder using Neural Audio Codecs
03/2025	Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
03/2025	Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
03/2025	Atlas: Multi-Scale Attention Improves Long Context Image Modeling
03/2025	TULIP: Towards Unified Language-Image Pretraining
03/2025	Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
03/2025	Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM (Ling Coder Lite)
04/2025	Kimi-VL Technical Report
04/2025	Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning
04/2025	M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
04/2025	AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset
04/2025	MAGI-1: Autoregressive Video Generation at Scale
04/2025	Kimi-Audio Technical Report
05/2025	Llama-Nemotron: Efficient Reasoning Models
05/2025	UniVLA: Learning to Act Anywhere with Task-centric Latent Actions
05/2025	INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning
05/2025	MMaDA: Multimodal Large Diffusion Language Models
05/2025	AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
05/2025	Music Source Restoration
06/2025	AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
06/2025	MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
07/2025	Kimi K2: Open Agentic Intelligence
07/2025	Voxtral

	Various
09/2014	Neural Machine Translation by Jointly Learning to Align and Translate
06/2019	Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
10/2019	Root Mean Square Layer Normalization
10/2019	Transformers without Tears: Improving the Normalization of Self-Attention
12/2019	Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
02/2020	On Layer Normalization in the Transformer Architecture
04/2020	Longformer: The Long-Document Transformer
04/2020	Improved Natural Language Generation via Loss Truncation
06/2020	Memory Transformer
07/2020	Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity
12/2020	ERNIE-Doc: A Retrospective Long-Document Modeling Transformer
01/2021	Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
03/2021	The Low-Rank Simplicity Bias in Deep Networks
04/2021	RoFormer: Enhanced Transformer with Rotary Position Embedding
06/2021	LoRA: Low-Rank Adaptation of Large Language Models
07/2023	CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
03/2022	Memorizing Transformers
04/2022	UL2: Unifying Language Learning Paradigms
05/2022	Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning (IA3)
06/2022	nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models
07/2022	Language Models (Mostly) Know What They Know
08/2022	LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
09/2022	Petals: Collaborative Inference and Fine-tuning of Large Models
10/2022	GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
10/2022	Recurrent Memory Transformer
10/2022	Truncation Sampling as Language Model Desmoothing
10/2022	DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
11/2022	An Algorithm for Routing Vectors in Sequences
11/2022	MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
12/2022	Self-Instruct: Aligning Language Model with Self Generated Instructions
12/2022	Parallel Context Windows Improve In-Context Learning of Large Language Models
12/2022	Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
12/2022	Pretraining Without Attention
12/2022	The case for 4-bit precision: k-bit Inference Scaling Laws
12/2022	Prompting Is Programming: A Query Language for Large Language Models
01/2023	SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
01/2023	SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
01/2023	Memory Augmented Large Language Models are Computationally Universal
01/2023	Progress measures for grokking via mechanistic interpretability
01/2023	Adaptive Computation with Elastic Input Sequence
02/2023	Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models
02/2023	The Wisdom of Hindsight Makes Language Models Better Instruction Followers
02/2023	The Stable Entropy Hypothesis and Entropy-Aware Decoding: An Analysis and Algorithm for Robust Natural Language Generation
03/2023	COLT5: Faster Long-Range Transformers with Conditional Computation
03/2023	High-throughput Generative Inference of Large Language Models with a Single GPU
03/2023	Meet in the Middle: A New Pre-training Paradigm
03/2023	Reflexion: an autonomous agent with dynamic memory and self-reflection
03/2023	Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
03/2023	FP8 versus INT8 for efficient deep learning inference
03/2023	Self-Refine: Iterative Refinement with Self-Feedback
04/2023	RPTQ: Reorder-based Post-training Quantization for Large Language Models
04/2023	REFINER: Reasoning Feedback on Intermediate Representations
04/2023	Generative Agents: Interactive Simulacra of Human Behavior
04/2023	Compressed Regression over Adaptive Networks
04/2023	A Cheaper and Better Diffusion Language Model with Soft-Masked Noise
04/2023	RRHF: Rank Responses to Align Language Models with Human Feedback without tears
04/2023	CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society
04/2023	Automatic Gradient Descent: Deep Learning without Hyperparameters
04/2023	SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models
04/2023	Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study
04/2023	Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
04/2023	Scaling Transformer to 1M tokens and beyond with RMT
04/2023	Answering Questions by Meta-Reasoning over Multiple Chains of Thought
04/2023	Towards Multi-Modal DBMSs for Seamless Querying of Texts and Tables
04/2023	We're Afraid Language Models Aren't Modeling Ambiguity
04/2023	The Internal State of an LLM Knows When its Lying
04/2023	Search-in-the-Chain: Towards the Accurate, Credible and Traceable Content Generation for Complex Knowledge-intensive Tasks
05/2023	Towards Unbiased Training in Federated Open-world Semi-supervised Learning
05/2023	Unlimiformer: Long-Range Transformers with Unlimited Length Input
05/2023	FreeLM: Fine-Tuning-Free Language Model
05/2023	Cuttlefish: Low-rank Model Training without All The Tuning
05/2023	AttentionViz: A Global View of Transformer Attention
05/2023	Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
05/2023	A Frustratingly Easy Improvement for Position Embeddings via Random Padding
05/2023	Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
05/2023	Explanation-based Finetuning Makes Models More Robust to Spurious Cues
05/2023	An automatically discovered chain-of-thought prompt generalizes to novel models and datasets
05/2023	Recommender Systems with Generative Retrieval
05/2023	Fast Distributed Inference Serving for Large Language Models
05/2023	Chain-of-Dictionary Prompting Elicits Translation in Large Language Models
05/2023	Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach
05/2023	Active Retrieval Augmented Generation
05/2023	Scalable Coupling of Deep Learning with Logical Reasoning
05/2023	Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
05/2023	StructGPT: A General Framework for Large Language Model to Reason over Structured Data
05/2023	Pre-Training to Learn in Context
05/2023	ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
05/2023	Accelerating Transformer Inference for Translation via Parallel Decoding
05/2023	Cooperation Is All You Need
05/2023	PTQD: Accurate Post-Training Quantization for Diffusion Models
05/2023	LLM-Pruner: On the Structural Pruning of Large Language Models
05/2023	SelfzCoT: a Self-Prompt Zero-shot CoT from Semantic-level to Code-level for a Better Utilization of LLMs
05/2023	QLoRA: Efficient Finetuning of Quantized LLMs
05/2023	"According to ..." Prompting Language Models Improves Quoting from Pre-Training Data
05/2023	Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
05/2023	Landmark Attention: Random-Access Infinite Context Length for Transformers
05/2023	Scaling Data-Constrained Language Models
05/2023	Fine-Tuning Language Models with Just Forward Passes
05/2023	Intriguing Properties of Quantization at Scale
05/2023	Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
05/2023	Blockwise Parallel Transformer for Long Context Large Models
05/2023	The Impact of Positional Encoding on Length Generalization in Transformers
05/2023	Adapting Language Models to Compress Contexts
05/2023	Direct Preference Optimization: Your Language Model is Secretly a Reward Model
06/2023	AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
06/2023	Faster Causal Attention Over Large Sequences Through Sparse Flash Attention
06/2023	Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
06/2023	SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
06/2023	Fine-Tuning Language Models with Advantage-Induced Policy Alignment
06/2023	Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
06/2023	Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
06/2023	Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories
06/2023	Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion
06/2023	Word sense extension
06/2023	Mitigating Transformer Overconfidence via Lipschitz Regularization
06/2023	Recurrent Attention Networks for Long-text Modeling
06/2023	One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
06/2023	SqueezeLLM: Dense-and-Sparse Quantization
06/2023	Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training
06/2023	Propagating Knowledge Updates to LMs Through Distillation
06/2023	Full Parameter Fine-tuning for Large Language Models with Limited Resources
06/2023	A Simple and Effective Pruning Approach for Large Language Models
06/2023	InRank: Incremental Low-Rank Learning
06/2023	Evaluating the Zero-shot Robustness of Instruction-tuned Language Models
06/2023	Learning to Generate Better Than Your LLM (RLGF)
06/2023	Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
06/2023	H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Model
06/2023	FLuRKA: Fast fused Low-Rank & Kernel Attention
06/2023	Stay on topic with Classifier-Free Guidance
07/2023	AutoST: Training-free Neural Architecture Search for Spiking Transformers
07/2023	Single Sequence Prediction over Reasoning Graphs for Multi-hop QA
07/2023	Shifting Attention to Relevance: Towards the Uncertainty Estimation of Large Language Models
07/2023	Facing off World Model Backbones: RNNs, Transformers, and S4
07/2023	Improving Retrieval-Augmented Large Language Models via Data Importance Learning
07/2023	Teaching Arithmetic to Small Transformers
07/2023	QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models
07/2023	Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
07/2023	Copy Is All You Need (CoG)
07/2023	Multi-Method Self-Training: Improving Code Generation With Text, And Vice Versa
07/2023	Divide & Bind Your Attention for Improved Generative Semantic Nursing
07/2023	Challenges and Applications of Large Language Models
07/2023	Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models
07/2023	QuIP: 2-Bit Quantization of Large Language Models With Guarantees
07/2023	CoRe Optimizer: An All-in-One Solution for Machine Learning
07/2023	Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
08/2023	ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation
08/2023	EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models
08/2023	Activation Addition: Steering Language Models Without Optimization
08/2023	OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
08/2023	Accelerating LLM Inference with Staged Speculative Decoding
08/2023	YaRN: Efficient Context Window Extension of Large Language Models
08/2023	LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
09/2023	Making Large Language Models Better Reasoners with Alignment
09/2023	Data-Juicer: A One-Stop Data Processing System for Large Language Models
09/2023	Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices
09/2023	SLiMe: Segment Like Me
09/2023	Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
09/2023	When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
09/2023	Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
09/2023	Efficient Memory Management for Large Language Model Serving with PagedAttention
09/2023	Cure the headache of Transformers via Collinear Constrained Attention
09/2023	Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
09/2023	LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
09/2023	MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
09/2023	Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models
09/2023	Improving Code Generation by Dynamic Temperature Sampling
09/2023	Efficient Streaming Language Models with Attention Sinks
10/2023	DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models
10/2023	GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
10/2023	Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models
10/2023	Elephant Neural Networks: Born to Be a Continual Learner
10/2023	Ring Attention with Blockwise Transformers for Near-Infinite Context
10/2023	Retrieval meets Long Context Large Language Models
10/2023	DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
10/2023	LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
10/2023	Amortizing intractable inference in large language models (GFlowNet Tuning)
10/2023	SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
10/2023	Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
10/2023	Let Models Speak Ciphers: Multiagent Debate through Embeddings
10/2023	InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
10/2023	CacheGen: Fast Context Loading for Language Model Applications
10/2023	MatFormer: Nested Transformer for Elastic Inference
10/2023	LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
10/2023	Towards End-to-end 4-Bit Inference on Generative Large Language Models (QUIK)
10/2023	Microscaling Data Formats for Deep Learning
10/2023	xVal: A Continuous Number Encoding for Large Language Models
10/2023	An Emulator for Fine-Tuning Large Language Models using Small Language Models
10/2023	Frozen Transformers in Language Models Are Effective Visual Encoder Layers
10/2023	LoBaSS: Gauging Learnability in Supervised Fine-tuning Data
10/2023	Quality-Diversity through AI Feedback
10/2023	Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (SEDD)
10/2023	DoGE: Domain Reweighting with Generalization Estimation
10/2023	E-Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity
10/2023	Mixture of Tokens: Efficient LLMs through Cross-Example Aggregation
10/2023	Personas as a Way to Model Truthfulness in Language Models
10/2023	Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
10/2023	QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
11/2023	AWEQ: Post-Training Quantization with Activation-Weight Equalization for Large Language Models
11/2023	FlashDecoding++: Faster Large Language Model Inference on GPUs
11/2023	Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization
11/2023	Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
11/2023	REST: Retrieval-Based Speculative Decoding
11/2023	DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
11/2023	Token-level Adaptation of LoRA Adapters for Downstream Task Generalization
11/2023	Exponentially Faster Language Modelling
11/2023	MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
11/2023	LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning
11/2023	Token Recycling for Efficient Sequential Inference with Vision Transformers
11/2023	Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization
12/2023	GIFT: Generative Interpretable Fine-Tuning Transformers
12/2023	PEFA: Parameter-Free Adapters for Large-scale Embedding-based Retrieval Models
12/2023	Improving Activation Steering in Language Models with Mean-Centring
12/2023	A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA
12/2023	SparQ Attention: Bandwidth-Efficient LLM Inference
12/2023	ESPN: Memory-Efficient Multi-Vector Information Retrieval
12/2023	Aligner: One Global Token is Worth Millions of Parameters When Aligning Large Language Models
12/2023	CBQ: Cross-Block Quantization for Large Language Models
12/2023	SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
12/2023	Weight subcloning: direct initialization of transformers using larger pretrained ones
12/2023	Cascade Speculative Drafting for Even Faster LLM Inference
12/2023	ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
12/2023	Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy
12/2023	A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties
12/2023	Algebraic Positional Encodings
12/2023	Preference as Reward, Maximum Preference Optimization with Importance Sampling
01/2024	LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
01/2024	Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
01/2024	LLaMA Pro: Progressive LLaMA with Block Expansion
01/2024	Fast and Optimal Weight Update for Pruned Large Language Models
01/2024	Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
01/2024	MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
01/2024	Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning
01/2024	RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation
01/2024	Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
01/2024	AUTOACT: Automatic Agent Learning from Scratch via Self-Planning
01/2024	Extreme Compression of Large Language Models via Additive Quantization (AQLM)
01/2024	Knowledge Translation: A New Pathway for Model Compression
01/2024	Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
01/2024	Transformers are Multi-State RNNs
01/2024	Extending LLMs' Context Window with 100 Samples (Entropy-ABF)
01/2024	ChatQA: Building GPT-4 Level Conversational QA Models
01/2024	AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference
01/2024	Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
01/2024	Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
01/2024	BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
01/2024	Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
01/2024	Dynamic Layer Tying for Parameter-Efficient Transformers
01/2024	MambaByte: Token-free Selective State Space Model
01/2024	FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
01/2024	Accelerating Retrieval-Augmented Language Model Serving with Speculation
01/2024	Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
01/2024	EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
01/2024	With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation (Temp LoRA)
01/2024	YODA: Teacher-Student Progressive Learning for Language Models
01/2024	KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
01/2024	LOCOST: State-Space Models for Long Document Abstractive Summarization
01/2024	Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model
01/2024	RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
02/2024	EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models
02/2024	MoDE: A Mixture-of-Experts Model with Mutual Distillation among the Experts
02/2024	Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
02/2024	Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
02/2024	HiQA: A Hierarchical Contextual Augmentation RAG for Massive Documents QA
02/2024	KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
02/2024	DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing
02/2024	QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
02/2024	Hydragen: High-Throughput LLM Inference with Shared Prefixes
02/2024	Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding
02/2024	LESS: Selecting Influential Data for Targeted Instruction Tuning
02/2024	Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
02/2024	AttnLRP: Attention-Aware Layer-wise Relevance Propagation for Transformers
02/2024	X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics
02/2024	BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
02/2024	Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance
02/2024	Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
02/2024	Uncertainty Decomposition and Quantification for In-Context Learning of Large Language Models
02/2024	RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models
02/2024	BitDelta: Your Fine-Tune May Only Be Worth One Bit
02/2024	DoRA: Weight-Decomposed Low-Rank Adaptation
02/2024	In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
02/2024	Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
02/2024	Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding
02/2024	Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts
02/2024	WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More
02/2024	DB-LLM: Accurate Dual-Binarization for Efficient LLMs
02/2024	Data Engineering for Scaling Language Models to 128K Context
02/2024	EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs
02/2024	HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts
02/2024	Turn Waste into Worth: Rectifying Top
02/2024	Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
02/2024	Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
02/2024	Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization
02/2024	MemoryPrompt: A Light Wrapper to Improve Context Tracking in Pre-trained Language Models
02/2024	Fine-tuning CLIP Text Encoders with Two-step Paraphrasing
02/2024	BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
02/2024	No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization
02/2024	DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation
02/2024	CoDream: Exchanging dreams instead of models for federated aggregation with heterogeneous models
02/2024	Humanoid Locomotion as Next Token Prediction
02/2024	KTO: Model Alignment as Prospect Theoretic Optimization
02/2024	Noise Contrastive Alignment of Language Models with Explicit Rewards (NCA)
02/2024	ReLU2 Wins: Discovering Efficient Activation Functions for Sparse LLMs
02/2024	Training-Free Long-Context Scaling of Large Language Models (DCA)
03/2024	Not all Layers of LLMs are Necessary during Inference
03/2024	Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models
03/2024	DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
03/2024	GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
03/2024	Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding
03/2024	Scattered Mixture-of-Experts Implementation
03/2024	AutoLoRA: Automatically Tuning Matrix Ranks in Low-Rank Adaptation Based on Meta Learning
03/2024	BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
03/2024	Bifurcated Attention for Single-Context Large-Batch Sampling
03/2024	Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference
03/2024	Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
03/2024	Recurrent Drafter for Fast Speculative Decoding in Large Language Models
03/2024	Arcee's MergeKit: A Toolkit for Merging Large Language Models
03/2024	Rotary Position Embedding for Vision Transformer
03/2024	BiLoRA: A Bi-level Optimization Framework for Overfitting-Resilient Low-Rank Adaptation of Large Pre-trained Models
03/2024	Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
03/2024	DreamReward: Text-to-3D Generation with Human Preference
03/2024	Evolutionary Optimization of Model Merging Recipes
03/2024	Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance
03/2024	When Do We Not Need Larger Vision Models?
03/2024	FeatUp: A Model-Agnostic Framework for Features at Any Resolution
03/2024	ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching
03/2024	The Unreasonable Ineffectiveness of the Deeper Layers
03/2024	QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
04/2024	LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models
04/2024	Prompt-prompted Mixture of Experts for Efficient LLM Generation (GRIFFIN)
04/2024	BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models
04/2024	SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget
04/2024	CodecLM: Aligning Language Models with Tailored Synthetic Data
04/2024	Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation
04/2024	Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs
04/2024	Continuous Language Model Interpolation for Dynamic and Controllable Text Generation
04/2024	RULER: What's the Real Context Size of Your Long-Context Language Models?
04/2024	Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models
04/2024	On Speculative Decoding for Multimodal Large Language Models
04/2024	CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
04/2024	Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs
04/2024	Fewer Truncations Improve Language Modeling
04/2024	When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes
04/2024	Learn2Talk: 3D Talking Face Learns from 2D Talking Face
04/2024	Weak-to-Strong Extrapolation Expedites Alignment (EXPO)
04/2024	decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points
04/2024	RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
04/2024	Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding
04/2024	Mixture of LoRA Experts
04/2024	MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning
04/2024	XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
04/2024	Retrieval Head Mechanistically Explains Long-Context Factuality
04/2024	Let's Think Dot by Dot: Hidden Computation in Transformer Language Models
04/2024	Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
05/2024	When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively
05/2024	A Careful Examination of Large Language Model Performance on Grade School Arithmetic
05/2024	Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge
05/2024	Parameter-Efficient Fine-Tuning with Discrete Fourier Transform
05/2024	COPAL: Continual Pruning in Large Language Generative Models
05/2024	Revisiting a Pain in the Neck: Semantic Phrase Processing Benchmark for Language Models
05/2024	AlphaMath Almost Zero: process Supervision without process
05/2024	QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
05/2024	xLSTM: Extended Long Short-Term Memory
05/2024	FlashBack:Efficient Retrieval-Augmented Language Modeling for Long Context Inference
05/2024	SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
05/2024	HMT: Hierarchical Memory Transformer for Long Context Language Processing
05/2024	The Future of Large Language Model Pre-training is Federated
05/2024	Layer-Condensed KV Cache for Efficient Inference of Large Language Models
05/2024	MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
05/2024	SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model
05/2024	Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
05/2024	Bagging Improves Generalization Exponentially
05/2024	Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models
05/2024	Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast
05/2024	Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
05/2024	T2 of Thoughts: Temperature Tree Elicits Reasoning in Large Language Models
05/2024	ReALLM: A general framework for LLM compression and fine-tuning
05/2024	SimPO: Simple Preference Optimization with a Reference-Free Reward
05/2024	PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression
05/2024	Removing Bias from Maximum Likelihood Estimation with Model Autophagy
05/2024	RE-Adapt: Reverse Engineered Adaptation of Large Language Models
05/2024	MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence
05/2024	Data Mixing Made Efficient: A Bivariate Scaling Law for Language Model Pretraining
05/2024	Accelerating Transformers with Spectrum-Preserving Token Merging
05/2024	A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
05/2024	MoEUT: Mixture-of-Experts Universal Transformers
05/2024	Exploring Context Window of Large Language Models via Decomposed Positional Vectors
05/2024	Transformers Can Do Arithmetic with the Right Embeddings
05/2024	OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning
05/2024	MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification
05/2024	Self-Play Preference Optimization for Language Model Alignment
05/2024	The Road Less Scheduled(Schedule-Free)
06/2024	FineWeb: decanting the web for the finest text data at scale
06/2024	Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality (Mamba-2)
06/2024	Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
06/2024	DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection
06/2024	MultiMax: Sparse and Multi-Modal Attention Learning
06/2024	MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization
06/2024	Mix-of-Granularity: Optimize the Chunking Granularity for Retrieval-Augmented Generation
06/2024	QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation
06/2024	SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining
06/2024	Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
06/2024	VCR: Visual Caption Restoration
06/2024	LoCoCo: Dropping In Convolutions for Long Context Compression
06/2024	Low-Rank Quantization-Aware Training for LLMs
06/2024	Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters
06/2024	DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
06/2024	TernaryLLM: Ternarized Large Language Model
06/2024	Image and Video Tokenization with Binary Spherical Quantization
06/2024	Discovering Preference Optimization Algorithms with and for Large Language Models
06/2024	ProTrain: Efficient LLM Training via Memory-Aware Techniques
06/2024	PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
06/2024	Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
06/2024	Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
06/2024	HiP Attention: Sparse Sub-Quadratic Attention with Hierarchical Attention Pruning
06/2024	LieRE: Generalizing Rotary Position Encodings
06/2024	DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer
06/2024	Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
06/2024	Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies
06/2024	mDPO: Conditional Preference Optimization for Multimodal Large Language Models
06/2024	QTIP: Quantization with Trellises and Incoherence Processing
06/2024	Mixture-of-Subspaces in Low-Rank Adaptation (MoSLoRA)
06/2024	Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization
06/2024	Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models
06/2024	DeciMamba: Exploring the Length Extrapolation Potential of Mamba
06/2024	Optimised Grouped-Query Attention Mechanism for Transformers
06/2024	MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
06/2024	Unsupervised Morphological Tree Tokenizer
06/2024	Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation
06/2024	What Matters in Transformers? Not All Attention is Needed
06/2024	Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
06/2024	ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models
06/2024	Adam-mini: Use Fewer Learning Rates To Gain More
06/2024	Large Language Models are Interpretable Learners
06/2024	Selective Prompting Tuning for Personalized Conversations with LLMs
06/2024	Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
06/2024	Unsupervised Morphological Tree Tokenizer (TreeTok)
07/2024	Eliminating Position Bias of Language Models: A Mechanistic Approach
07/2024	Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
07/2024	Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
07/2024	LoCo: Low-Bit Communication Adaptor for Large-scale Model Training
07/2024	Code Less, Align More: Efficient LLM Fine-tuning for Code Generation with Data Pruning
07/2024	Learning to (Learn at Test Time): RNNs with Expressive Hidden States (TTT)
07/2024	Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
07/2024	Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules
07/2024	OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
07/2024	Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
07/2024	Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients
07/2024	FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
07/2024	Lite-SAM Is Actually What You Need for Segment Everything
07/2024	BitNet b1.58 Reloaded: State-of-the-art Performance Also on Smaller Networks
07/2024	Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors
07/2024	Patch-Level Training for Large Language Models
07/2024	Correcting the Mythos of KL-Regularization: Direct Alignment without Overparameterization via Chi-squared Preference Optimization
07/2024	LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
07/2024	Hi-EF: Benchmarking Emotion Forecasting in Human-interaction
07/2024	RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
07/2024	MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
07/2024	Palu: Compressing KV-Cache with Low-Rank Projection
07/2024	AI-Assisted Generation of Difficult Math Questions (MATH^2)
07/2024	EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
07/2024	Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
08/2024	POA: Pre-training Once for Models of All Sizes
08/2024	An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion
08/2024	Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
08/2024	Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion
08/2024	Eigen Attention: Attention in Low-Rank Space for KV Cache Compression
08/2024	The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
08/2024	Post-Training Sparse Attention with Double Sparsity
08/2024	A Spitting Image: Modular Superpixel Tokenization in Vision Transformers (SPiT)
08/2024	JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
08/2024	SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models
08/2024	HMoE: Heterogeneous Mixture of Experts for Language Modeling
08/2024	First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models
08/2024	LLM Pruning and Distillation in Practice: The Minitron Approach
08/2024	FocusLLM: Scaling LLM's Context by Parallel Decoding
08/2024	Memory-Efficient LLM Training with Online Subspace Descent
08/2024	MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
08/2024	Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
09/2024	FedModule: A Modular Federated Learning Framework
09/2024	Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
09/2024	STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning
09/2024	Length Desensitization in Directed Preference Optimization (LD-DPO)
09/2024	CPL: Critical Planning Step Learning Boosts LLM Generalization in Reasoning Tasks
09/2024	RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
09/2024	SOAP: Improving and Stabilizing Shampoo using Adam
09/2024	A Controlled Study on Long Context Extension and Generalization in LLMs
09/2024	Scaling FP8 training to trillion-token LLMs
09/2024	INT-FlashAttention: Enabling Flash Attention for INT8 Quantization
09/2024	Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
09/2024	SATA: Spatial Autocorrelation Token Analysis for Enhancing the Robustness of Vision Transformers
10/2024	VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
10/2024	FlashMask: Efficient and Rich Mask Extension of FlashAttention
10/2024	OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
10/2024	Parameter Competition Balancing for Model Merging
10/2024	SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
10/2024	ARB-LLM: Alternating Refined Binarizations for Large Language Models
10/2024	Contextual Document Embeddings
10/2024	SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks
10/2024	Accelerating Diffusion Transformers with Token-wise Feature Caching
10/2024	Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling
10/2024	Restructuring Vector Quantization with the Rotation Trick
10/2024	Upcycling Large Language Models into Mixture of Experts
10/2024	Parameter-Efficient Fine-Tuning of State Space Models (SDLoRA)
10/2024	ElasticTok: Adaptive Tokenization for Image and Video
10/2024	LeanAgent: Lifelong Learning for Formal Theorem Proving
10/2024	LoLCATs: On Low-Rank Linearizing of Large Language Models
10/2024	DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
10/2024	SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction
10/2024	A Little Human Data Goes A Long Way
10/2024	SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training
10/2024	Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs
10/2024	FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs
10/2024	LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging
10/2024	AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning
10/2024	Stick-breaking Attention
10/2024	COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
10/2024	HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation
10/2024	UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function
10/2024	ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
10/2024	TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
11/2024	PatternBoost: Constructions in Mathematics with a Little Help from AI
11/2024	Inference Optimal VLMs Need Only One Visual Token but Larger Models
11/2024	LASER: Attention with Exponential Transformation
11/2024	LSHBloom: Memory-efficient, Extreme-scale Document Deduplication
11/2024	Aioli: A Unified Optimization Framework for Language Model Data Mixing
11/2024	Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning
11/2024	More Expressive Attention with Negative Weights (Cog Attention)
11/2024	The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
11/2024	Entropy Controllable Direct Preference Optimization
11/2024	Cut Your Losses in Large-Vocabulary Language Models
11/2024	Everything is a Video: Unifying Modalities through Next-Frame Prediction
11/2024	SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
11/2024	ComfyGI: Automatic Improvement of Image Generation Workflows
11/2024	DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
11/2024	Efficient Long Video Tokenization via Coordinated-based Patch Reconstruction
11/2024	Cautious Optimizers: Improving Training with One Line of Code
11/2024	Star Attention: Efficient LLM Inference over Long Sequences
11/2024	Pushing the Limits of Large Language Model Quantization via the Linearity Theorem (HIGGS)
11/2024	Reverse Thinking Makes LLMs Stronger Reasoners
11/2024	DeMo: Decoupled Momentum Optimization
12/2024	COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
12/2024	3D Face Reconstruction From Radar Images
12/2024	APOLLO: SGD-like Memory, AdamW-level Performance
12/2024	Flex Attention: A Programming Model for Generating Optimized Attention Kernels
12/2024	Mixture-of-PageRanks: Replacing Long-Context with Real-Time, Sparse GraphRAG
12/2024	No More Adam: Learning Rate Scaling at Initialization is All You Need
12/2024	Entropy-Regularized Process Reward Model
12/2024	Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
12/2024	Multi-matrix Factorization Attention
12/2024	Gradient Weight-normalized Low-rank Projection for Efficient LLM Training
01/2025	BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning
01/2025	HALO: Hadamard-Assisted Lossless Optimization for Efficient Low-Precision LLM Training and Fine-Tuning
01/2025	Scaling Laws for Floating Point Quantization Training
01/2025	Tensor-GaLore: Memory-Efficient Training via Gradient Tensor Decomposition
01/2025	Tensor Product Attention Is All You Need
01/2025	SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
01/2025	Transformer2: Self-adaptive LLMs
01/2025	Physics of Skill Learning
01/2025	OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting
01/2025	TimeHF: Billion-Scale Time Series Models Guided by Human Feedback
01/2025	FBQuant: FeedBack Quantization for Large Language Models
01/2025	Optimizing Large Language Model Training Using FP4 Quantization
01/2025	RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations
02/2025	s1: Simple test-time scaling
02/2025	Efficient Reasoning with Hidden Thinking
02/2025	Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models
02/2025	Beyond Limited Data: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving
02/2025	Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment
02/2025	FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation
02/2025	Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
02/2025	QuEST: Stable Training of LLMs with 1-Bit Weights and Activations
02/2025	LIMR: Less is More for RL Scaling
02/2025	CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation
02/2025	Diffusion Models without Classifier-free Guidance
02/2025	MERGE3: Efficient Evolutionary Merging on Consumer-grade GPUs
02/2025	S*: Test Time Scaling for Code Generation
02/2025	More for Keys, Less for Values: Adaptive KV Cache Quantization
02/2025	Muon is Scalable for LLM Training
02/2025	BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference
02/2025	Linear Attention for Efficient Bidirectional Sequence Modeling (LION)
02/2025	Slamming: Training a Speech Language Model on One GPU in a Day
02/2025	Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
02/2025	Fractal Generative Models
02/2025	COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs
02/2025	ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs
03/2025	FlowDec: A flow-based full-band general audio codec with high perceptual quality
03/2025	PowerAttention: Exponentially Scaling of Receptive Fields for Effective Sparse Attention
03/2025	HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
03/2025	DAPO: An Open-Source LLM Reinforcement Learning System at Scale
03/2025	Measuring AI Ability to Complete Long Tasks
03/2025	Enhancing Code LLM Training with Programmer Attention
03/2025	ϕ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation
03/2025	Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
04/2025	MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism
04/2025	Affordable AI Assistants with Knowledge Graph of Thoughts
04/2025	RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm
04/2025	Dion: A Communication-Efficient Optimizer for Large Models
04/2025	Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
04/2025	KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference
04/2025	70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
04/2025	EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models
04/2025	TTRL: Test-Time Reinforcement Learning
04/2025	Process Reward Models That Think
05/2025	NeMo-Inspector: A Visualization Tool for LLM Generation Analysis
05/2025	FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
05/2025	RM-R1: Reward Modeling as Reasoning
05/2025	Absolute Zero: Reinforced Self-play Reasoning with Zero Data
05/2025	Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
05/2025	HuB: Learning Extreme Humanoid Balance
05/2025	MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
05/2025	SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
05/2025	Model Merging in Pre-training of Large Language Models
05/2025	DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories
05/2025	Scaling Diffusion Transformers Efficiently via μP
05/2025	PaTH Attention: Position Encoding via Accumulating Householder Transformations
05/2025	FP4 All the Way: Fully Quantized Training of LLMs
05/2025	Accelerating AllReduce with a Persistent Straggler
05/2025	Hardware-Efficient Attention for Fast Decoding
06/2025	Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
06/2025	ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
06/2025	Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library
06/2025	dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching
06/2025	Recipes for Pre-training LLMs with MXFP8
06/2025	NoLoCo: No-all-reduce Low Communication Training Method for Large Models
06/2025	Farseer: A Refined Scaling Law in Large Language Models
06/2025	SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding
06/2025	Truncated Proximal Policy Optimization
06/2025	LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
06/2025	Prover Agent: An Agent-based Framework for Formal Mathematical Proofs
06/2025	DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster
06/2025	GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling
07/2025	Pre-Trained Policy Discriminators are General Reward Models
07/2025	Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
07/2025	AdaMuon: Adaptive Muon Optimizer
07/2025	Mixture of Raytraced Experts
07/2025	Group Sequence Policy Optimization
07/2025	TTS-1 Technical Report
07/2025

Local Models Related Papers

Warning