|
Google Papers Blog |
12/2017 |
Attention Is All You Need (Transformers) |
10/2018 |
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
10/2019 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5) |
11/2019 |
Fast Transformer Decoding: One Write-Head is All You Need |
02/2020 |
GLU Variants Improve Transformer |
03/2020 |
Talking-Heads Attention |
05/2020 |
Conformer: Convolution-augmented Transformer for Speech Recognition |
09/2020 |
Efficient Transformers: A Survey |
12/2020 |
RealFormer: Transformer Likes Residual Attention |
01/2021 |
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity |
09/2021 |
Finetuned Language Models Are Zero-Shot Learners (Flan) |
09/2021 |
Primer: Searching for Efficient Transformers for Language Modeling |
11/2021 |
Sparse is Enough in Scaling Transformers |
12/2021 |
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
01/2022 |
LaMDA: Language Models for Dialog Applications |
01/2022 |
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
04/2022 |
PaLM: Scaling Language Modeling with Pathways |
07/2022 |
Confident Adaptive Language Modeling |
10/2022 |
Scaling Instruction-Finetuned Language Models (Flan-PaLM) |
10/2022 |
Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models |
10/2022 |
Large Language Models Can Self-Improve |
11/2022 |
Efficiently Scaling Transformer Inference |
11/2022 |
Fast Inference from Transformers via Speculative Decoding |
02/2023 |
Symbolic Discovery of Optimization Algorithms (Lion) |
03/2023 |
PaLM-E: An Embodied Multimodal Language Model |
04/2023 |
Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference |
05/2023 |
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes |
05/2023 |
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction |
05/2023 |
PaLM 2 Technical Report |
05/2023 |
Symbol tuning improves in-context learning in language models |
05/2023 |
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models |
05/2023 |
Towards Expert-Level Medical Question Answering with Large Language Models (Med-PaLM 2) |
05/2023 |
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining |
05/2023 |
How Does Generative Retrieval Scale to Millions of Passages? |
05/2023 |
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints |
05/2023 |
Small Language Models Improve Giants by Rewriting Their Outputs |
06/2023 |
StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners |
06/2023 |
AudioPaLM: A Large Language Model That Can Speak and Listen |
06/2023 |
Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting |
07/2023 |
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models |
09/2023 |
Uncovering mesa-optimization algorithms in Transformers |
10/2023 |
Think before you speak: Training Language Models With Pause Tokens |
10/2023 |
SpecTr: Fast Speculative Decoding via Optimal Transport |
11/2023 |
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs |
11/2023 |
Automatic Engineering of Long Prompts |
12/2023 |
Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses |
12/2023 |
Style Aligned Image Generation via Shared Attention |
01/2024 |
A Minimaximalist Approach to Reinforcement Learning from Human Feedback (SPO) |
02/2024 |
Time-, Memory- and Parameter-Efficient Visual Adaptation (LoSA) |
02/2024 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context |
03/2024 |
PERL: Parameter Efficient Reinforcement Learning from Human Feedback |
04/2024 |
TransformerFAM: Feedback attention is working memory |
|
|
|
DeepMind (Google DeepMind as of 4/2023) Papers Blog |
10/2019 |
Stabilizing Transformers for Reinforcement Learning |
12/2021 |
Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
12/2021 |
Improving language models by retrieving from trillions of tokens (RETRO) |
02/2022 |
Competition-Level Code Generation with AlphaCode |
02/2022 |
Unified Scaling Laws for Routed Language Models |
03/2022 |
Training Compute-Optimal Large Language Models (Chinchilla) |
04/2022 |
Flamingo: a Visual Language Model for Few-Shot Learning |
05/2022 |
A Generalist Agent (GATO) |
07/2022 |
Formal Algorithms for Transformers |
02/2023 |
Accelerating Large Language Model Decoding with Speculative Sampling |
05/2023 |
Tree of Thoughts: Deliberate Problem Solving with Large Language Models |
05/2023 |
Block-State Transformer |
05/2023 |
Randomized Positional Encodings Boost Length Generalization of Transformers |
08/2023 |
From Sparse to Soft Mixtures of Experts |
09/2023 |
Large Language Models as Optimizers |
09/2023 |
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset (MT Model) |
09/2023 |
Scaling Laws for Sparsely-Connected Foundation Models |
09/2023 |
Language Modeling Is Compression |
09/2023 |
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution |
10/2023 |
Large Language Models as Analogical Reasoners |
10/2023 |
Controlled Decoding from Language Models |
10/2023 |
A General Theoretical Paradigm to Understand Learning from Human Preferences |
12/2023 |
Gemini: A Family of Highly Capable Multimodal Models |
12/2023 |
AlphaCode 2 Technical Report |
12/2023 |
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator |
12/2023 |
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models |
12/2023 |
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding |
01/2024 |
Solving olympiad geometry without human demonstrations |
02/2024 |
LiPO: Listwise Preference Optimization through Learning-to-Rank |
02/2024 |
Grandmaster-Level Chess Without Search |
02/2024 |
How to Train Data-Efficient LLMs |
02/2024 |
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts |
02/2024 |
Gemma: Open Models Based on Gemini Research and Technology |
02/2024 |
Genie: Generative Interactive Environments |
02/2024 |
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models |
03/2024 |
DiPaCo: Distributed Path Composition |
04/2024 |
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models |
|
|
|
Meta (Facebook AI Research) Papers Blog |
04/2019 |
fairseq: A Fast, Extensible Toolkit for Sequence Modeling |
07/2019 |
Augmenting Self-attention with Persistent Memory |
11/2019 |
Improving Transformer Models by Reordering their Sublayers |
08/2021 |
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation |
03/2022 |
Training Logbook for OPT-175B |
05/2022 |
OPT: Open Pre-trained Transformer Language Models |
07/2022 |
Beyond neural scaling laws: beating power law scaling via data pruning |
11/2022 |
Galactica: A Large Language Model for Science |
01/2023 |
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (I-JEPA) |
02/2023 |
LLaMA: Open and Efficient Foundation Language Models |
02/2023 |
Toolformer: Language Models Can Teach Themselves to Use Tools |
03/2023 |
Scaling Expert Language Models with Unsupervised Domain Discovery |
03/2023 |
SemDeDup: Data-efficient learning at web-scale through semantic deduplication |
04/2023 |
Segment Anything (SAM) |
04/2023 |
A Cookbook of Self-Supervised Learning |
05/2023 |
Learning to Reason and Memorize with Self-Notes |
05/2023 |
ImageBind: One Embedding Space To Bind Them All |
05/2023 |
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers |
05/2023 |
LIMA: Less Is More for Alignment |
05/2023 |
Scaling Speech Technology to 1,000+ Languages |
05/2023 |
READ: Recurrent Adaptation of Large Transformers |
05/2023 |
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models |
06/2023 |
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles |
06/2023 |
Simple and Controllable Music Generation (MusicGen) |
06/2023 |
Improving Open Language Models by Learning from Organic Interactions (BlenderBot 3x) |
06/2023 |
Extending Context Window of Large Language Models via Positional Interpolation |
06/2023 |
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale |
07/2023 |
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning (CM3leon) |
07/2023 |
Llama 2: Open Foundation and Fine-Tuned Chat Models |
08/2023 |
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation |
08/2023 |
D4: Improving LLM Pretraining via Document De-Duplication and Diversification |
08/2023 |
Code Llama: Open Foundation Models for Code |
08/2023 |
Nougat: Neural Optical Understanding for Academic Documents |
09/2023 |
Contrastive Decoding Improves Reasoning in Large Language Models |
09/2023 |
Effective Long-Context Scaling of Foundation Models |
09/2023 |
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model |
09/2023 |
Vision Transformers Need Registers |
10/2023 |
RA-DIT: Retrieval-Augmented Dual Instruction Tuning |
10/2023 |
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation |
10/2023 |
Generative Pre-training for Speech with Flow Matching |
11/2023 |
Emu Edit: Precise Image Editing via Recognition and Generation Tasks |
12/2023 |
Audiobox: Unified Audio Generation with Natural Language Prompts |
12/2023 |
Universal Pyramid Adversarial Training for Improved ViT Performance |
01/2024 |
Self-Rewarding Language Models |
02/2024 |
Revisiting Feature Prediction for Learning Visual Representations from Video (V-JEPA) |
03/2024 |
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM |
03/2024 |
Reverse Training to Nurse the Reversal Curse |
04/2024 |
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws |
04/2024 |
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length |
04/2024 |
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding |
04/2024 |
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding |
04/2024 |
MoDE: CLIP Data Experts via Clustering |
|
|
|
Microsoft Papers Blog |
12/2015 |
Deep Residual Learning for Image Recognition |
05/2021 |
EL-Attention: Memory Efficient Lossless Attention for Generation |
01/2022 |
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale |
03/2022 |
DeepNet: Scaling Transformers to 1,000 Layers |
12/2022 |
A Length-Extrapolatable Transformer |
01/2023 |
Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases |
02/2023 |
Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1) |
03/2023 |
Sparks of Artificial General Intelligence: Early experiments with GPT-4 |
03/2023 |
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs |
04/2023 |
Instruction Tuning with GPT-4 |
04/2023 |
Inference with Reference: Lossless Acceleration of Large Language Models |
04/2023 |
Low-code LLM: Visual Programming over LLMs |
04/2023 |
WizardLM: Empowering Large Language Models to Follow Complex Instructions |
04/2023 |
MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks |
04/2023 |
ResiDual: Transformer with Dual Residual Connections |
05/2023 |
Code Execution with Pre-trained Language Models |
05/2023 |
Small Models are Valuable Plug-ins for Large Language Models |
05/2023 |
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing |
06/2023 |
Orca: Progressive Learning from Complex Explanation Traces of GPT-4 |
06/2023 |
Augmenting Language Models with Long-Term Memory |
06/2023 |
WizardCoder: Empowering Code Large Language Models with Evol-Instruct |
06/2023 |
Textbooks Are All You Need (phi-1) |
07/2023 |
In-context Autoencoder for Context Compression in a Large Language Model |
07/2023 |
Retentive Network: A Successor to Transformer for Large Language Models |
08/2023 |
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference |
09/2023 |
Efficient RLHF: Reducing the Memory Usage of PPO |
09/2023 |
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models |
09/2023 |
Textbooks Are All You Need II (phi-1.5) |
09/2023 |
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training |
09/2023 |
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models |
09/2023 |
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models |
10/2023 |
Sparse Backpropagation for MoE Training |
10/2023 |
Nugget 2D: Dynamic Contextual Compression for Scaling Decoder-only Language Models |
10/2023 |
Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness |
10/2023 |
Augmented Embeddings for Custom Retrievals |
10/2023 |
Guiding Language Model Reasoning with Planning Tokens |
10/2023 |
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V |
10/2023 |
CodeFusion: A Pre-trained Diffusion Model for Code Generation |
10/2023 |
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery |
10/2023 |
FP8-LM: Training FP8 Large Language Models |
11/2023 |
Orca 2: Teaching Small Language Models How to Reason |
12/2023 |
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks |
12/2023 |
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction |
01/2024 |
SliceGPT: Compress Large Language Models by Deleting Rows and Columns |
01/2024 |
RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture |
02/2024 |
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens |
02/2024 |
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits |
02/2024 |
ResLoRA: Identity Residual Mapping in Low-Rank Adaption |
03/2024 |
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression |
03/2024 |
SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series |
04/2024 |
LongEmbed: Extending Embedding Models for Long Context Retrieval |
04/2024 |
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone |
|
|
|
OpenAI Papers Blog |
07/2017 |
Proximal Policy Optimization Algorithms |
04/2019 |
Generating Long Sequences with Sparse Transformers |
01/2020 |
Scaling Laws for Neural Language Models |
05/2020 |
Language Models are Few-Shot Learners (GPT-3) |
01/2022 |
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets |
03/2022 |
Training language models to follow instructions with human feedback (InstructGPT) |
07/2022 |
Efficient Training of Language Models to Fill in the Middle |
03/2023 |
GPT-4 Technical Report |
04/2023 |
Consistency Models |
05/2023 |
Let's Verify Step by Step |
10/2023 |
Improving Image Generation with Better Captions (DALL·E 3) |
|
|
|
Hazy Research (Stanford) Papers Blog |
10/2021 |
Efficiently Modeling Long Sequences with Structured State Spaces (S4) |
04/2022 |
Monarch: Expressive Structured Matrices for Efficient and Accurate Training |
05/2022 |
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness |
12/2022 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models |
02/2023 |
Simple Hardware-Efficient Long Convolutions for Sequence Modeling |
02/2023 |
Hyena Hierarchy: Towards Larger Convolutional Language Models |
06/2023 |
TART: A plug-and-play Transformer module for task-agnostic reasoning |
07/2023 |
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning |
11/2023 |
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores |
|
|
|
THUDM (Tsinghua University) Papers GitHub |
10/2022 |
GLM-130B: An Open Bilingual Pre-Trained Model |
03/2023 |
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X |
04/2023 |
DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task |
06/2023 |
WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences |
09/2023 |
GPT Can Solve Mathematical Problems Without a Calculator (MathGLM) |
10/2023 |
AgentTuning: Enabling Generalized Agent Abilities for LLMs (AgentLM) |
11/2023 |
CogVLM: Visual Expert for Pretrained Language Models |
12/2023 |
CogAgent: A Visual Language Model for GUI Agents |
01/2024 |
APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding |
01/2024 |
LongAlign: A Recipe for Long Context Alignment of Large Language Models |
|
|
|
Open Models |
06/2021 |
GPT-J-6B: 6B JAX-Based Transformer |
09/2021 |
Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning |
03/2022 |
CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
04/2022 |
GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
11/2022 |
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
12/2022 |
DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders |
04/2023 |
Visual Instruction Tuning (LLaVA) |
05/2023 |
StarCoder: May the source be with you! |
05/2023 |
CodeGen2: Lessons for Training LLMs on Programming and Natural Languages |
05/2023 |
Otter: A Multi-Modal Model with In-Context Instruction Tuning |
05/2023 |
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning |
05/2023 |
CodeT5+: Open Code Large Language Models for Code Understanding and Generation |
05/2023 |
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities |
05/2023 |
RWKV: Reinventing RNNs for the Transformer Era |
05/2023 |
Lion: Adversarial Distillation of Closed-Source Large Language Model |
05/2023 |
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training |
06/2023 |
Segment Anything in High Quality |
06/2023 |
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding |
06/2023 |
High-Fidelity Audio Compression with Improved RVQGAN |
06/2023 |
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models |
06/2023 |
Anticipatory Music Transformer |
06/2023 |
RepoFusion: Training Code Models to Understand Your Repository |
06/2023 |
MPT-30B: Raising the bar for open-source foundation models |
06/2023 |
Vec2Vec: A Compact Neural Network Approach for Transforming Text Embeddings with High Fidelity |
06/2023 |
ViNT: A Foundation Model for Visual Navigation |
06/2023 |
How Long Can Open-Source LLMs Truly Promise on Context Length? (LongChat) |
07/2023 |
Hierarchical Open-vocabulary Universal Image Segmentation |
07/2023 |
Focused Transformer: Contrastive Training for Context Scaling (LongLLaMA) |
07/2023 |
Rhythm Modeling for Voice Conversion (Urhythmic) |
07/2023 |
Scaling TransNormer to 175 Billion Parameters |
08/2023 |
Separate Anything You Describe |
08/2023 |
StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data |
09/2023 |
RADIO: Reference-Agnostic Dubbing Video Synthesis |
09/2023 |
Matcha-TTS: A fast TTS architecture with conditional flow matching |
09/2023 |
DreamLLM: Synergistic Multimodal Comprehension and Creation |
09/2023 |
Baichuan 2: Open Large-scale Language Models |
09/2023 |
Qwen Technical Report |
09/2023 |
Mistral 7B |
10/2023 |
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning |
10/2023 |
Improved Baselines with Visual Instruction Tuning (LLaVA 1.5) |
10/2023 |
LLark: A Multimodal Foundation Model for Music |
10/2023 |
SALMONN: Towards Generic Hearing Abilities for Large Language Models |
10/2023 |
Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents |
11/2023 |
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models |
11/2023 |
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition |
11/2023 |
YUAN 2.0: A Large Language Model with Localized Filtering-based Attention |
12/2023 |
Making Large Multimodal Models Understand Arbitrary Visual Prompts (ViP-LLaVA) |
12/2023 |
Mamba: Linear-Time Sequence Modeling with Selective State Spaces |
12/2023 |
OpenVoice: Versatile Instant Voice Cloning |
12/2023 |
Sequential Modeling Enables Scalable Learning for Large Vision Models (LVM) |
12/2023 |
Magicoder: Source Code Is All You Need |
12/2023 |
StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers |
12/2023 |
MMM: Generative Masked Motion Model |
12/2023 |
4M: Massively Multimodal Masked Modeling |
12/2023 |
LLM360: Towards Fully Transparent Open-Source LLMs |
12/2023 |
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling |
01/2024 |
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism |
01/2024 |
Mixtral of Experts |
01/2024 |
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer |
01/2024 |
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications |
01/2024 |
Scalable Pre-training of Large Autoregressive Image Models |
01/2024 |
Orion-14B: Open-source Multilingual Large Language Models |
01/2024 |
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data |
01/2024 |
VMamba: Visual State Space Model |
01/2024 |
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence |
01/2024 |
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models |
01/2024 |
LLaVA-1.6: Improved reasoning, OCR, and world knowledge |
01/2024 |
MiniCPM: Unveiling the Potential of End-side Large Language Models |
01/2024 |
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild |
02/2024 |
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces |
02/2024 |
Introducing Qwen1.5 |
02/2024 |
BlackMamba: Mixture of Experts for State-Space Models |
02/2024 |
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models |
02/2024 |
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss |
02/2024 |
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators |
02/2024 |
Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion |
02/2024 |
Brant-2: Foundation Model for Brain Signals |
03/2024 |
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (SD3) |
03/2024 |
TripoSR: Fast 3D Object Reconstruction from a Single Image |
03/2024 |
Yi: Open Foundation Models by 01.AI |
03/2024 |
DeepSeek-VL: Towards Real-World Vision-Language Understanding |
03/2024 |
VideoMamba: State Space Model for Efficient Video Understanding |
03/2024 |
VOICECRAFT: Zero-Shot Speech Editing and Text-to-Speech in the Wild |
03/2024 |
GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation |
03/2024 |
DBRX: A New State-of-the-Art Open LLM |
03/2024 |
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation |
03/2024 |
Jamba: A Hybrid Transformer-Mamba Language Model |
04/2024 |
Advancing LLM Reasoning Generalists with Preference Trees (Eurus) |
04/2024 |
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction (VAR) |
04/2024 |
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence |
04/2024 |
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models |
|
|
|
Various |
09/2014 |
Neural Machine Translation by Jointly Learning to Align and Translate |
06/2019 |
Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View |
10/2019 |
Root Mean Square Layer Normalization |
10/2019 |
Transformers without Tears: Improving the Normalization of Self-Attention |
12/2019 |
Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection |
02/2020 |
On Layer Normalization in the Transformer Architecture |
04/2020 |
Longformer: The Long-Document Transformer |
06/2020 |
Memory Transformer |
07/2020 |
Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity |
12/2020 |
ERNIE-Doc: A Retrospective Long-Document Modeling Transformer |
01/2021 |
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks |
03/2021 |
The Low-Rank Simplicity Bias in Deep Networks |
04/2021 |
RoFormer: Enhanced Transformer with Rotary Position Embedding |
06/2021 |
LoRA: Low-Rank Adaptation of Large Language Models |
07/2021 |
CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention |
03/2022 |
Memorizing Transformers |
04/2022 |
UL2: Unifying Language Learning Paradigms |
05/2022 |
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning (IA3) |
06/2022 |
nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models |
07/2022 |
Language Models (Mostly) Know What They Know |
08/2022 |
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale |
09/2022 |
Petals: Collaborative Inference and Fine-tuning of Large Models |
10/2022 |
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers |
10/2022 |
Recurrent Memory Transformer |
10/2022 |
Truncation Sampling as Language Model Desmoothing |
10/2022 |
DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation |
11/2022 |
An Algorithm for Routing Vectors in Sequences |
11/2022 |
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts |
12/2022 |
Self-Instruct: Aligning Language Model with Self Generated Instructions |
12/2022 |
Parallel Context Windows Improve In-Context Learning of Large Language Models |
12/2022 |
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor |
12/2022 |
Pretraining Without Attention |
12/2022 |
The case for 4-bit precision: k-bit Inference Scaling Laws |
12/2022 |
Prompting Is Programming: A Query Language for Large Language Models |
01/2023 |
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient |
01/2023 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot |
01/2023 |
Memory Augmented Large Language Models are Computationally Universal |
01/2023 |
Progress measures for grokking via mechanistic interpretability |
02/2023 |
Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models |
02/2023 |
The Wisdom of Hindsight Makes Language Models Better Instruction Followers |
02/2023 |
The Stable Entropy Hypothesis and Entropy-Aware Decoding: An Analysis and Algorithm for Robust Natural Language Generation |
03/2023 |
CoLT5: Faster Long-Range Transformers with Conditional Computation |
03/2023 |
High-throughput Generative Inference of Large Language Models with a Single GPU |
03/2023 |
Meet in the Middle: A New Pre-training Paradigm |
03/2023 |
Reflexion: an autonomous agent with dynamic memory and self-reflection |
03/2023 |
Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning |
03/2023 |
FP8 versus INT8 for efficient deep learning inference |
03/2023 |
Self-Refine: Iterative Refinement with Self-Feedback |
04/2023 |
RPTQ: Reorder-based Post-training Quantization for Large Language Models |
04/2023 |
REFINER: Reasoning Feedback on Intermediate Representations |
04/2023 |
Generative Agents: Interactive Simulacra of Human Behavior |
04/2023 |
Compressed Regression over Adaptive Networks |
04/2023 |
A Cheaper and Better Diffusion Language Model with Soft-Masked Noise |
04/2023 |
RRHF: Rank Responses to Align Language Models with Human Feedback without tears |
04/2023 |
CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society |
04/2023 |
Automatic Gradient Descent: Deep Learning without Hyperparameters |
04/2023 |
SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models |
04/2023 |
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study |
04/2023 |
Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling |
04/2023 |
Scaling Transformer to 1M tokens and beyond with RMT |
04/2023 |
Answering Questions by Meta-Reasoning over Multiple Chains of Thought |
04/2023 |
Towards Multi-Modal DBMSs for Seamless Querying of Texts and Tables |
04/2023 |
We're Afraid Language Models Aren't Modeling Ambiguity |
04/2023 |
The Internal State of an LLM Knows When It's Lying |
04/2023 |
Search-in-the-Chain: Towards the Accurate, Credible and Traceable Content Generation for Complex Knowledge-intensive Tasks |
05/2023 |
Towards Unbiased Training in Federated Open-world Semi-supervised Learning |
05/2023 |
Unlimiformer: Long-Range Transformers with Unlimited Length Input |
05/2023 |
FreeLM: Fine-Tuning-Free Language Model |
05/2023 |
Cuttlefish: Low-rank Model Training without All The Tuning |
05/2023 |
AttentionViz: A Global View of Transformer Attention |
05/2023 |
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models |
05/2023 |
A Frustratingly Easy Improvement for Position Embeddings via Random Padding |
05/2023 |
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision |
05/2023 |
Explanation-based Finetuning Makes Models More Robust to Spurious Cues |
05/2023 |
An automatically discovered chain-of-thought prompt generalizes to novel models and datasets |
05/2023 |
Recommender Systems with Generative Retrieval |
05/2023 |
Fast Distributed Inference Serving for Large Language Models |
05/2023 |
Chain-of-Dictionary Prompting Elicits Translation in Large Language Models |
05/2023 |
Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach |
05/2023 |
Active Retrieval Augmented Generation |
05/2023 |
Scalable Coupling of Deep Learning with Logical Reasoning |
05/2023 |
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca |
05/2023 |
StructGPT: A General Framework for Large Language Model to Reason over Structured Data |
05/2023 |
Pre-Training to Learn in Context |
05/2023 |
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings |
05/2023 |
Accelerating Transformer Inference for Translation via Parallel Decoding |
05/2023 |
Cooperation Is All You Need |
05/2023 |
PTQD: Accurate Post-Training Quantization for Diffusion Models |
05/2023 |
LLM-Pruner: On the Structural Pruning of Large Language Models |
05/2023 |
SelfzCoT: a Self-Prompt Zero-shot CoT from Semantic-level to Code-level for a Better Utilization of LLMs |
05/2023 |
QLoRA: Efficient Finetuning of Quantized LLMs |
05/2023 |
"According to ..." Prompting Language Models Improves Quoting from Pre-Training Data |
05/2023 |
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training |
05/2023 |
Landmark Attention: Random-Access Infinite Context Length for Transformers |
05/2023 |
Scaling Data-Constrained Language Models |
05/2023 |
Fine-Tuning Language Models with Just Forward Passes |
05/2023 |
Intriguing Properties of Quantization at Scale |
05/2023 |
Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time |
05/2023 |
Blockwise Parallel Transformer for Long Context Large Models |
05/2023 |
The Impact of Positional Encoding on Length Generalization in Transformers |
05/2023 |
Adapting Language Models to Compress Contexts |
05/2023 |
Direct Preference Optimization: Your Language Model is Secretly a Reward Model |
06/2023 |
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration |
06/2023 |
Faster Causal Attention Over Large Sequences Through Sparse Flash Attention |
06/2023 |
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training |
06/2023 |
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression |
06/2023 |
Fine-Tuning Language Models with Advantage-Induced Policy Alignment |
06/2023 |
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards |
06/2023 |
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model |
06/2023 |
Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories |
06/2023 |
Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion |
06/2023 |
Word sense extension |
06/2023 |
Mitigating Transformer Overconfidence via Lipschitz Regularization |
06/2023 |
Recurrent Attention Networks for Long-text Modeling |
06/2023 |
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning |
06/2023 |
SqueezeLLM: Dense-and-Sparse Quantization |
06/2023 |
Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training |
06/2023 |
Propagating Knowledge Updates to LMs Through Distillation |
06/2023 |
Full Parameter Fine-tuning for Large Language Models with Limited Resources |
06/2023 |
A Simple and Effective Pruning Approach for Large Language Models |
06/2023 |
InRank: Incremental Low-Rank Learning |
06/2023 |
Evaluating the Zero-shot Robustness of Instruction-tuned Language Models |
06/2023 |
Learning to Generate Better Than Your LLM (RLGF) |
06/2023 |
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing |
06/2023 |
H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models |
06/2023 |
FLuRKA: Fast fused Low-Rank & Kernel Attention |
06/2023 |
Stay on topic with Classifier-Free Guidance |
07/2023 |
AutoST: Training-free Neural Architecture Search for Spiking Transformers |
07/2023 |
Single Sequence Prediction over Reasoning Graphs for Multi-hop QA |
07/2023 |
Shifting Attention to Relevance: Towards the Uncertainty Estimation of Large Language Models |
07/2023 |
Facing off World Model Backbones: RNNs, Transformers, and S4 |
07/2023 |
Improving Retrieval-Augmented Large Language Models via Data Importance Learning |
07/2023 |
Teaching Arithmetic to Small Transformers |
07/2023 |
QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models |
07/2023 |
Stack More Layers Differently: High-Rank Training Through Low-Rank Updates |
07/2023 |
Copy Is All You Need (CoG) |
07/2023 |
Multi-Method Self-Training: Improving Code Generation With Text, And Vice Versa |
07/2023 |
Divide & Bind Your Attention for Improved Generative Semantic Nursing |
07/2023 |
Challenges and Applications of Large Language Models |
07/2023 |
Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models |
07/2023 |
QuIP: 2-Bit Quantization of Large Language Models With Guarantees |
07/2023 |
CoRe Optimizer: An All-in-One Solution for Machine Learning |
07/2023 |
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time |
08/2023 |
ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation |
08/2023 |
EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models |
08/2023 |
Activation Addition: Steering Language Models Without Optimization |
08/2023 |
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models |
08/2023 |
Accelerating LLM Inference with Staged Speculative Decoding |
08/2023 |
YaRN: Efficient Context Window Extension of Large Language Models |
08/2023 |
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models |
09/2023 |
Making Large Language Models Better Reasoners with Alignment |
09/2023 |
Data-Juicer: A One-Stop Data Processing System for Large Language Models |
09/2023 |
Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices |
09/2023 |
SLiMe: Segment Like Me |
09/2023 |
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models |
09/2023 |
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale |
09/2023 |
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs |
09/2023 |
Efficient Memory Management for Large Language Model Serving with PagedAttention |
09/2023 |
Cure the headache of Transformers via Collinear Constrained Attention |
09/2023 |
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity |
09/2023 |
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models |
09/2023 |
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation |
09/2023 |
Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models |
09/2023 |
Improving Code Generation by Dynamic Temperature Sampling |
09/2023 |
Efficient Streaming Language Models with Attention Sinks |
10/2023 |
DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models |
10/2023 |
GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length |
10/2023 |
Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models |
10/2023 |
Elephant Neural Networks: Born to Be a Continual Learner |
10/2023 |
Ring Attention with Blockwise Transformers for Near-Infinite Context |
10/2023 |
Retrieval meets Long Context Large Language Models |
10/2023 |
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines |
10/2023 |
LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers |
10/2023 |
Amortizing intractable inference in large language models (GFlowNet Tuning) |
10/2023 |
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF |
10/2023 |
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity |
10/2023 |
Let Models Speak Ciphers: Multiagent Debate through Embeddings |
10/2023 |
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining |
10/2023 |
CacheGen: Fast Context Loading for Language Model Applications |
10/2023 |
MatFormer: Nested Transformer for Elastic Inference |
10/2023 |
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models |
10/2023 |
Towards End-to-end 4-Bit Inference on Generative Large Language Models (QUIK) |
10/2023 |
Microscaling Data Formats for Deep Learning |
10/2023 |
xVal: A Continuous Number Encoding for Large Language Models |
10/2023 |
An Emulator for Fine-Tuning Large Language Models using Small Language Models |
10/2023 |
Frozen Transformers in Language Models Are Effective Visual Encoder Layers |
10/2023 |
LoBaSS: Gauging Learnability in Supervised Fine-tuning Data |
10/2023 |
Quality-Diversity through AI Feedback |
10/2023 |
DoGE: Domain Reweighting with Generalization Estimation |
10/2023 |
E-Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity |
10/2023 |
Mixture of Tokens: Efficient LLMs through Cross-Example Aggregation |
10/2023 |
Personas as a Way to Model Truthfulness in Language Models |
10/2023 |
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving |
10/2023 |
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models |
11/2023 |
AWEQ: Post-Training Quantization with Activation-Weight Equalization for Large Language Models |
11/2023 |
FlashDecoding++: Faster Large Language Model Inference on GPUs |
11/2023 |
Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization |
11/2023 |
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs |
11/2023 |
REST: Retrieval-Based Speculative Decoding |
11/2023 |
DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines |
11/2023 |
Token-level Adaptation of LoRA Adapters for Downstream Task Generalization |
11/2023 |
Exponentially Faster Language Modelling |
11/2023 |
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning |
11/2023 |
LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning |
11/2023 |
Token Recycling for Efficient Sequential Inference with Vision Transformers |
11/2023 |
Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization |
12/2023 |
GIFT: Generative Interpretable Fine-Tuning Transformers |
12/2023 |
PEFA: Parameter-Free Adapters for Large-scale Embedding-based Retrieval Models |
12/2023 |
Improving Activation Steering in Language Models with Mean-Centring |
12/2023 |
A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA |
12/2023 |
SparQ Attention: Bandwidth-Efficient LLM Inference |
12/2023 |
ESPN: Memory-Efficient Multi-Vector Information Retrieval |
12/2023 |
Aligner: One Global Token is Worth Millions of Parameters When Aligning Large Language Models |
12/2023 |
CBQ: Cross-Block Quantization for Large Language Models |
12/2023 |
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention |
12/2023 |
Weight subcloning: direct initialization of transformers using larger pretrained ones |
12/2023 |
Cascade Speculative Drafting for Even Faster LLM Inference |
12/2023 |
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU |
12/2023 |
ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference |
12/2023 |
Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy |
12/2023 |
A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties |
12/2023 |
Algebraic Positional Encodings |
12/2023 |
Preference as Reward, Maximum Preference Optimization with Importance Sampling |
01/2024 |
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning |
01/2024 |
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models |
01/2024 |
LLaMA Pro: Progressive LLaMA with Block Expansion |
01/2024 |
Fast and Optimal Weight Update for Pruned Large Language Models |
01/2024 |
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon |
01/2024 |
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts |
01/2024 |
Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning |
01/2024 |
RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation |
01/2024 |
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models |
01/2024 |
AUTOACT: Automatic Agent Learning from Scratch via Self-Planning |
01/2024 |
Extreme Compression of Large Language Models via Additive Quantization (AQLM) |
01/2024 |
Knowledge Translation: A New Pathway for Model Compression |
01/2024 |
Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks |
01/2024 |
Transformers are Multi-State RNNs |
01/2024 |
Extending LLMs' Context Window with 100 Samples (Entropy-ABF) |
01/2024 |
ChatQA: Building GPT-4 Level Conversational QA Models |
01/2024 |
AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference |
01/2024 |
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads |
01/2024 |
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation |
01/2024 |
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models |
01/2024 |
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment |
01/2024 |
Dynamic Layer Tying for Parameter-Efficient Transformers |
01/2024 |
MambaByte: Token-free Selective State Space Model |
01/2024 |
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design |
01/2024 |
Accelerating Retrieval-Augmented Language Model Serving with Speculation |
01/2024 |
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities |
01/2024 |
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty |
01/2024 |
With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation (Temp LoRA) |
01/2024 |
YODA: Teacher-Student Progressive Learning for Language Models |
01/2024 |
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization |
01/2024 |
LOCOST: State-Space Models for Long Document Abstractive Summarization |
01/2024 |
Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model |
01/2024 |
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval |
02/2024 |
EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models |
02/2024 |
MoDE: A Mixture-of-Experts Model with Mutual Distillation among the Experts |
02/2024 |
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding |
02/2024 |
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities |
02/2024 |
HiQA: A Hierarchical Contextual Augmentation RAG for Massive Documents QA |
02/2024 |
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache |
02/2024 |
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing |
02/2024 |
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks |
02/2024 |
Hydragen: High-Throughput LLM Inference with Shared Prefixes |
02/2024 |
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding |
02/2024 |
LESS: Selecting Influential Data for Targeted Instruction Tuning |
02/2024 |
Accurate LoRA-Finetuning Quantization of LLMs via Information Retention |
02/2024 |
AttnLRP: Attention-Aware Layer-wise Relevance Propagation for Transformers |
02/2024 |
X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics |
02/2024 |
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data |
02/2024 |
Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance |
02/2024 |
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference |
02/2024 |
Uncertainty Decomposition and Quantification for In-Context Learning of Large Language Models |
02/2024 |
RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models |
02/2024 |
BitDelta: Your Fine-Tune May Only Be Worth One Bit |
02/2024 |
DoRA: Weight-Decomposed Low-Rank Adaptation |
02/2024 |
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss |
02/2024 |
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning |
02/2024 |
Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding |
02/2024 |
Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts |
02/2024 |
WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More |
02/2024 |
DB-LLM: Accurate Dual-Binarization for Efficient LLMs |
02/2024 |
Data Engineering for Scaling Language Models to 128K Context |
02/2024 |
EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs |
02/2024 |
HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts |
02/2024 |
Turn Waste into Worth: Rectifying Top-k Router of MoE |
02/2024 |
Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive |
02/2024 |
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models |
02/2024 |
Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization |
02/2024 |
MemoryPrompt: A Light Wrapper to Improve Context Tracking in Pre-trained Language Models |
02/2024 |
Fine-tuning CLIP Text Encoders with Two-step Paraphrasing |
02/2024 |
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation |
02/2024 |
No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization |
02/2024 |
DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation |
02/2024 |
CoDream: Exchanging dreams instead of models for federated aggregation with heterogeneous models |
02/2024 |
Humanoid Locomotion as Next Token Prediction |
02/2024 |
KTO: Model Alignment as Prospect Theoretic Optimization |
02/2024 |
Noise Contrastive Alignment of Language Models with Explicit Rewards (NCA) |
03/2024 |
Not all Layers of LLMs are Necessary during Inference |
03/2024 |
Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models |
03/2024 |
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models |
03/2024 |
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection |
03/2024 |
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding |
03/2024 |
Scattered Mixture-of-Experts Implementation |
03/2024 |
AutoLoRA: Automatically Tuning Matrix Ranks in Low-Rank Adaptation Based on Meta Learning |
03/2024 |
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences |
03/2024 |
Bifurcated Attention for Single-Context Large-Batch Sampling |
03/2024 |
Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference |
03/2024 |
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering |
03/2024 |
Recurrent Drafter for Fast Speculative Decoding in Large Language Models |
03/2024 |
Arcee's MergeKit: A Toolkit for Merging Large Language Models |
03/2024 |
Rotary Position Embedding for Vision Transformer |
03/2024 |
BiLoRA: A Bi-level Optimization Framework for Overfitting-Resilient Low-Rank Adaptation of Large Pre-trained Models |
03/2024 |
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition |
03/2024 |
DreamReward: Text-to-3D Generation with Human Preference |
03/2024 |
Evolutionary Optimization of Model Merging Recipes |
03/2024 |
When Do We Not Need Larger Vision Models? |
03/2024 |
FeatUp: A Model-Agnostic Framework for Features at Any Resolution |
03/2024 |
ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching |
03/2024 |
The Unreasonable Ineffectiveness of the Deeper Layers |
03/2024 |
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs |
04/2024 |
LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models |
04/2024 |
Prompt-prompted Mixture of Experts for Efficient LLM Generation (GRIFFIN) |
04/2024 |
BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models |
04/2024 |
SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget |
04/2024 |
CodecLM: Aligning Language Models with Tailored Synthetic Data |
04/2024 |
Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation |
04/2024 |
Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs |
04/2024 |
Continuous Language Model Interpolation for Dynamic and Controllable Text Generation |
04/2024 |
RULER: What's the Real Context Size of Your Long-Context Language Models? |
04/2024 |
Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models |
04/2024 |
On Speculative Decoding for Multimodal Large Language Models |
04/2024 |
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models |
04/2024 |
Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs |
04/2024 |
Fewer Truncations Improve Language Modeling |
04/2024 |
When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes |
04/2024 |
Learn2Talk: 3D Talking Face Learns from 2D Talking Face |
04/2024 |
decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points |
04/2024 |
RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation |
04/2024 |
Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding |
04/2024 |
Mixture of LoRA Experts |
04/2024 |
MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning |
04/2024 |
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts |
04/2024 |
Retrieval Head Mechanistically Explains Long-Context Factuality |
|
|
|
Articles |
03/2019 |
Rich Sutton - The Bitter Lesson |
06/2022 |
Yann LeCun - A Path Towards Autonomous Machine Intelligence |
01/2023 |
Lilian Weng - The Transformer Family Version 2.0 |
01/2023 |
Lilian Weng - Large Transformer Model Inference Optimization |
03/2023 |
Stanford - Alpaca: A Strong, Replicable Instruction-Following Model |
05/2023 |
OpenAI - Language models can explain neurons in language models |
05/2023 |
Alex Turner - Steering GPT-2-XL by adding an activation vector |
06/2023 |
YyWang - Do We Really Need the KVCache for All Large Language Models |
06/2023 |
kaiokendev - Extending Context is Hard…but not Impossible |
06/2023 |
bloc97 - NTK-Aware Scaled RoPE |
07/2023 |
oobabooga - A direct comparison between llama.cpp, AutoGPTQ, ExLlama, and transformers perplexities |
07/2023 |
Jianlin Su - Carrying the beta position to the end (better NTK RoPE method) |
08/2023 |
Charles Goddard - On Frankenllama |
10/2023 |
Tri Dao - Flash-Decoding for Long-Context Inference |
10/2023 |
Evan Armstrong - Human-Sourced, AI-Augmented: a promising solution for open source conversational data |
12/2023 |
Anthropic - Long context prompting for Claude 2.1 |
12/2023 |
Andrej Karpathy - On the "hallucination problem" (tweet.jpg) |
12/2023 |
HuggingFace - Mixture of Experts Explained |
01/2024 |
Vgel - Representation Engineering |
02/2024 |
Lilian Weng - Thinking about High-Quality Human Data |
03/2024 |
rayliuca - T-Ragx Project Write Up (Translation RAG) |
04/2024 |
Answer.AI - Efficient finetuning of Llama 3 with FSDP QDoRA |