Local Models Related Papers

/lmg/ Abstracts Search (Current as of end of 09/2024) Links
Google Papers Blog
12/2017 Attention Is All You Need (Transformers)
10/2018 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
10/2019 Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5)
11/2019 Fast Transformer Decoding: One Write-Head is All You Need
02/2020 GLU Variants Improve Transformer
03/2020 Talking-Heads Attention
05/2020 Conformer: Convolution-augmented Transformer for Speech Recognition
09/2020 Efficient Transformers: A Survey
12/2020 RealFormer: Transformer Likes Residual Attention
01/2021 Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
09/2021 Finetuned Language Models Are Zero-Shot Learners (Flan)
09/2021 Primer: Searching for Efficient Transformers for Language Modeling
11/2021 Sparse is Enough in Scaling Transformers
12/2021 GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
01/2022 LaMDA: Language Models for Dialog Applications
01/2022 Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
04/2022 PaLM: Scaling Language Modeling with Pathways
07/2022 Confident Adaptive Language Modeling
10/2022 Scaling Instruction-Finetuned Language Models (Flan-Palm)
10/2022 Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models
10/2022 Large Language Models Can Self-Improve
11/2022 Efficiently Scaling Transformer Inference
11/2022 Fast Inference from Transformers via Speculative Decoding
02/2023 Symbolic Discovery of Optimization Algorithms (Lion)
03/2023 PaLM-E: An Embodied Multimodal Language Model
04/2023 Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
05/2023 Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
05/2023 FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
05/2023 PaLM 2 Technical Report
05/2023 Symbol tuning improves in-context learning in language models
05/2023 Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
05/2023 Towards Expert-Level Medical Question Answering with Large Language Models (Med-Palm 2)
05/2023 DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
05/2023 How Does Generative Retrieval Scale to Millions of Passages?
05/2023 GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
05/2023 Small Language Models Improve Giants by Rewriting Their Outputs
06/2023 StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
06/2023 AudioPaLM: A Large Language Model That Can Speak and Listen
06/2023 Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
07/2023 HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
09/2023 Uncovering mesa-optimization algorithms in Transformers
10/2023 Think before you speak: Training Language Models With Pause Tokens
10/2023 SpecTr: Fast Speculative Decoding via Optimal Transport
11/2023 UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
11/2023 Automatic Engineering of Long Prompts
12/2023 Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses
12/2023 Style Aligned Image Generation via Shared Attention
01/2024 A Minimaximalist Approach to Reinforcement Learning from Human Feedback (SPO)
02/2024 Time-, Memory- and Parameter-Efficient Visual Adaptation (LoSA)
02/2024 Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
03/2024 PERL: Parameter Efficient Reinforcement Learning from Human Feedback
04/2024 TransformerFAM: Feedback attention is working memory
05/2024 eXmY: A Data Type and Technique for Arbitrary Bit Precision Quantization
05/2024 Faster Cascades via Speculative Decoding
06/2024 Proofread: Fixes All Errors with One Tap
08/2024 Natural Language Outlines for Code: Literate Programming in the LLM Era
08/2024 Diffusion Models Are Real-Time Game Engines
DeepMind (Google DeepMind as of 04/2023) Papers Blog
10/2019 Stabilizing Transformers for Reinforcement Learning
12/2021 Scaling Language Models: Methods, Analysis & Insights from Training Gopher
12/2021 Improving language models by retrieving from trillions of tokens (RETRO)
02/2022 Competition-Level Code Generation with AlphaCode
02/2022 Unified Scaling Laws for Routed Language Models
03/2022 Training Compute-Optimal Large Language Models (Chinchilla)
04/2022 Flamingo: a Visual Language Model for Few-Shot Learning
05/2022 A Generalist Agent (GATO)
07/2022 Formal Algorithms for Transformers
02/2023 Accelerating Large Language Model Decoding with Speculative Sampling
05/2023 Tree of Thoughts: Deliberate Problem Solving with Large Language Models
05/2023 Block-State Transformer
05/2023 Randomized Positional Encodings Boost Length Generalization of Transformers
08/2023 From Sparse to Soft Mixtures of Experts
09/2023 Large Language Models as Optimizers
09/2023 MADLAD-400: A Multilingual And Document-Level Large Audited Dataset (MT Model)
09/2023 Scaling Laws for Sparsely-Connected Foundation Models
09/2023 Language Modeling Is Compression
09/2023 Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
10/2023 Large Language Models as Analogical Reasoners
10/2023 Controlled Decoding from Language Models
10/2023 A General Theoretical Paradigm to Understand Learning from Human Preferences
11/2023 DiLoCo: Distributed Low-Communication Training of Language Models
12/2023 Gemini: A Family of Highly Capable Multimodal Models
12/2023 AlphaCode 2 Technical Report
12/2023 Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
12/2023 Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
12/2023 Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
01/2024 Solving olympiad geometry without human demonstrations
02/2024 LiPO: Listwise Preference Optimization through Learning-to-Rank
02/2024 Grandmaster-Level Chess Without Search
02/2024 How to Train Data-Efficient LLMs
02/2024 A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
02/2024 Gemma: Open Models Based on Gemini Research and Technology
02/2024 Genie: Generative Interactive Environments
02/2024 Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
03/2024 DiPaCo: Distributed Path Composition
04/2024 Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
05/2024 Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities
06/2024 Transformers meet Neural Algorithmic Reasoners
06/2024 Gemma 2: Improving Open Language Models at a Practical Size
06/2024 Data curation via joint example selection further accelerates multimodal learning
07/2024 PaliGemma: A versatile 3B VLM for transfer
07/2024 LookupViT: Compressing visual information to a limited number of tokens
07/2024 Mixture of Nested Experts: Adaptive Processing of Visual Tokens
08/2024 Generative Verifiers: Reward Modeling as Next-Token Prediction
09/2024 Imitating Language via Scalable Inverse Reinforcement Learning
10/2024 Preference Optimization as Probabilistic Inference
10/2024 Round and Round We Go! What makes Rotary Positional Encodings useful?
Meta (Facebook AI Research) Papers Blog
04/2019 fairseq: A Fast, Extensible Toolkit for Sequence Modeling
07/2019 Augmenting Self-attention with Persistent Memory
11/2019 Improving Transformer Models by Reordering their Sublayers
08/2021 Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
03/2022 Training Logbook for OPT-175B
05/2022 OPT: Open Pre-trained Transformer Language Models
07/2022 Beyond neural scaling laws: beating power law scaling via data pruning
11/2022 Galactica: A Large Language Model for Science
01/2023 Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (I-JEPA)
02/2023 LLaMA: Open and Efficient Foundation Language Models
02/2023 Toolformer: Language Models Can Teach Themselves to Use Tools
03/2023 Scaling Expert Language Models with Unsupervised Domain Discovery
03/2023 SemDeDup: Data-efficient learning at web-scale through semantic deduplication
04/2023 Segment Anything (SAM)
04/2023 A Cookbook of Self-Supervised Learning
05/2023 Learning to Reason and Memorize with Self-Notes
05/2023 ImageBind: One Embedding Space To Bind Them All
05/2023 MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
05/2023 LIMA: Less Is More for Alignment
05/2023 Scaling Speech Technology to 1,000+ Languages
05/2023 READ: Recurrent Adaptation of Large Transformers
05/2023 LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
05/2023 Physics of Language Models: Part 1, Learning Hierarchical Language Structures
06/2023 Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
06/2023 Simple and Controllable Music Generation (MusicGen)
06/2023 Improving Open Language Models by Learning from Organic Interactions (BlenderBot 3x)
06/2023 Extending Context Window of Large Language Models via Positional Interpolation
06/2023 Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
07/2023 Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning (CM3leon)
07/2023 Llama 2: Open Foundation and Fine-Tuned Chat Models
08/2023 SeamlessM4T—Massively Multilingual & Multimodal Machine Translation
08/2023 D4: Improving LLM Pretraining via Document De-Duplication and Diversification
08/2023 Code Llama: Open Foundation Models for Code
08/2023 Nougat: Neural Optical Understanding for Academic Documents
09/2023 Contrastive Decoding Improves Reasoning in Large Language Models
09/2023 Effective Long-Context Scaling of Foundation Models
09/2023 AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
09/2023 Vision Transformers Need Registers
09/2023 Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
09/2023 Physics of Language Models: Part 3.2, Knowledge Manipulation
10/2023 RA-DIT: Retrieval-Augmented Dual Instruction Tuning
10/2023 Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
10/2023 Generative Pre-training for Speech with Flow Matching
11/2023 Emu Edit: Precise Image Editing via Recognition and Generation Tasks
12/2023 Audiobox: Unified Audio Generation with Natural Language Prompts
12/2023 Universal Pyramid Adversarial Training for Improved ViT Performance
01/2024 Self-Rewarding Language Models
02/2024 Revisiting Feature Prediction for Learning Visual Representations from Video (V-JEPA)
02/2024 MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
03/2024 Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
03/2024 Reverse Training to Nurse the Reversal Curse
04/2024 Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
04/2024 Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
04/2024 TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
04/2024 Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding
04/2024 MoDE: CLIP Data Experts via Clustering
04/2024 Iterative Reasoning Preference Optimization
04/2024 Better & Faster Large Language Models via Multi-token Prediction
05/2024 Modeling Caption Diversity in Contrastive Vision-Language Pretraining (LLIP)
05/2024 Chameleon: Mixed-Modal Early-Fusion Foundation Models
05/2024 SpinQuant -- LLM quantization with learned rotations
05/2024 Contextual Position Encoding: Learning to Count What's Important
06/2024 The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
06/2024 Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement
07/2024 The Llama 3 Herd of Models
07/2024 SAM 2: Segment Anything in Images and Videos
07/2024 Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
07/2024 MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
08/2024 Self-Taught Evaluators
08/2024 Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
08/2024 Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
10/2024 The Perfect Blend: Redefining RLHF with Mixture of Judges (CGPO)
10/2024 Movie Gen: A Cast of Media Foundation Models
Microsoft Papers Blog
12/2015 Deep Residual Learning for Image Recognition
05/2021 EL-Attention: Memory Efficient Lossless Attention for Generation
01/2022 DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
03/2022 DeepNet: Scaling Transformers to 1,000 Layers
12/2022 A Length-Extrapolatable Transformer
01/2023 Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
02/2023 Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1)
03/2023 Sparks of Artificial General Intelligence: Early experiments with GPT-4
03/2023 TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
04/2023 Instruction Tuning with GPT-4
04/2023 Inference with Reference: Lossless Acceleration of Large Language Models
04/2023 Low-code LLM: Visual Programming over LLMs
04/2023 WizardLM: Empowering Large Language Models to Follow Complex Instructions
04/2023 MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks
04/2023 ResiDual: Transformer with Dual Residual Connections
05/2023 Code Execution with Pre-trained Language Models
05/2023 Small Models are Valuable Plug-ins for Large Language Models
05/2023 CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
06/2023 Orca: Progressive Learning from Complex Explanation Traces of GPT-4
06/2023 Augmenting Language Models with Long-Term Memory
06/2023 WizardCoder: Empowering Code Large Language Models with Evol-Instruct
06/2023 Textbooks Are All You Need (phi-1)
07/2023 In-context Autoencoder for Context Compression in a Large Language Model
07/2023 Retentive Network: A Successor to Transformer for Large Language Models
08/2023 Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
09/2023 Efficient RLHF: Reducing the Memory Usage of PPO
09/2023 DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
09/2023 Textbooks Are All You Need II (phi-1.5)
09/2023 PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
09/2023 A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
09/2023 Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
10/2023 Sparse Backpropagation for MoE Training
10/2023 Nugget 2D: Dynamic Contextual Compression for Scaling Decoder-only Language Models
10/2023 Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness
10/2023 Augmented Embeddings for Custom Retrievals
10/2023 Guiding Language Model Reasoning with Planning Tokens
10/2023 Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
10/2023 CodeFusion: A Pre-trained Diffusion Model for Code Generation
10/2023 LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
10/2023 FP8-LM: Training FP8 Large Language Models
11/2023 Orca 2: Teaching Small Language Models How to Reason
12/2023 ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
12/2023 The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
01/2024 SliceGPT: Compress Large Language Models by Deleting Rows and Columns
01/2024 RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
02/2024 LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
02/2024 The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (BitNet)
02/2024 ResLoRA: Identity Residual Mapping in Low-Rank Adaption
03/2024 LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
03/2024 SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
04/2024 LongEmbed: Extending Embedding Models for Long Context Retrieval
04/2024 Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
05/2024 You Only Cache Once: Decoder-Decoder Architectures for Language Models (YOCO)
06/2024 Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
06/2024 E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
06/2024 Automatic Instruction Evolving for Large Language Models
07/2024 Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena
07/2024 Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
09/2024 VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
10/2024 Differential Transformer
OpenAI Papers Blog
07/2017 Proximal Policy Optimization Algorithms
04/2019 Generating Long Sequences with Sparse Transformers
01/2020 Scaling Laws for Neural Language Models
05/2020 Language Models are Few-Shot Learners (GPT-3)
01/2022 Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
03/2022 Training language models to follow instructions with human feedback (InstructGPT)
07/2022 Efficient Training of Language Models to Fill in the Middle
03/2023 GPT-4 Technical Report
04/2023 Consistency Models
05/2023 Let's Verify Step by Step
10/2023 Improving Image Generation with Better Captions (DALL·E 3)
10/2024 MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
Hazy Research (Stanford) Papers Blog
10/2021 Efficiently Modeling Long Sequences with Structured State Spaces (S4)
04/2022 Monarch: Expressive Structured Matrices for Efficient and Accurate Training
05/2022 FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
12/2022 Hungry Hungry Hippos: Towards Language Modeling with State Space Models
02/2023 Simple Hardware-Efficient Long Convolutions for Sequence Modeling
02/2023 Hyena Hierarchy: Towards Larger Convolutional Language Models
06/2023 TART: A plug-and-play Transformer module for task-agnostic reasoning
07/2023 FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
11/2023 FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
DeepSeek Github
01/2024 DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
01/2024 DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
02/2024 DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
03/2024 DeepSeek-VL: Towards Real-World Vision-Language Understanding
05/2024 DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
06/2024 DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
07/2024 Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
08/2024 DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
08/2024 Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning
THUDM (Tsinghua University) Papers Github
10/2022 GLM-130B: An Open Bilingual Pre-Trained Model
03/2023 CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X
04/2023 DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task
06/2023 WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences
09/2023 GPT Can Solve Mathematical Problems Without a Calculator (MathGLM)
10/2023 AgentTuning: Enabling Generalized Agent Abilities for LLMs (AgentLM)
11/2023 CogVLM: Visual Expert for Pretrained Language Models
12/2023 CogAgent: A Visual Language Model for GUI Agents
01/2024 APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
01/2024 LongAlign: A Recipe for Long Context Alignment of Large Language Models
06/2024 ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
08/2024 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Articles
03/2019 Rich Sutton - The Bitter Lesson
06/2022 Yann LeCun - A Path Towards Autonomous Machine Intelligence
01/2023 Lilian Weng - The Transformer Family Version 2.0
01/2023 Lilian Weng - Large Transformer Model Inference Optimization
03/2023 Stanford - Alpaca: A Strong, Replicable Instruction-Following Model
05/2023 OpenAI - Language models can explain neurons in language models
05/2023 Alex Turner - Steering GPT-2-XL by adding an activation vector
06/2023 YyWang - Do We Really Need the KVCache for All Large Language Models
06/2023 kaiokendev - Extending Context is Hard…but not Impossible
06/2023 bloc97 - NTK-Aware Scaled RoPE
07/2023 oobabooga - A direct comparison between llama.cpp, AutoGPTQ, ExLlama, and transformers perplexities
07/2023 Jianlin Su - Carrying the beta position to the end (better NTK RoPE method)
08/2023 Charles Goddard - On Frankenllama
10/2023 Tri Dao - Flash-Decoding for Long-Context Inference
10/2023 Evan Armstrong - Human-Sourced, AI-Augmented: a promising solution for open source conversational data
12/2023 Anthropic - Long context prompting for Claude 2.1
12/2023 Andrej Karpathy - On the "hallucination problem" (tweet.jpg)
12/2023 HuggingFace - Mixture of Experts Explained
01/2024 Vgel - Representation Engineering
01/2024 Alex Alemi - KL is All You Need
02/2024 Lilian Weng - Thinking about High-Quality Human Data
03/2024 rayliuca - T-Ragx Project Write Up (Translation RAG)
04/2024 Answer.AI - Efficient finetuning of Llama 3 with FSDP QDoRA
04/2024 Sam Paech - Creating MAGI: A hard subset of MMLU and AGIEval
05/2024 LLaVA Team - LLaVA-NeXT: Stronger LLMs Supercharge Multimodal Capabilities in the Wild
05/2024 Hazy Research - GPUs Go Brrr (ThunderKittens)
05/2024 Anthropic - Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
06/2024 CharacterAI - Optimizing AI Inference
07/2024 Lilian Weng - Extrinsic Hallucinations in LLMs
07/2024 Andrej Karpathy - Let's reproduce GPT-2 (1.6B)
07/2024 Pierre-Carl Langlais - Announcing Finance Commons and the Bad Data Toolbox
07/2024 Zeyuan Allen-Zhu - Physics of Language Models ICML Talk (Video)
Open Models
06/2021 GPT-J-6B: 6B JAX-Based Transformer
04/2023 Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
03/2022 CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
04/2022 GPT-NeoX-20B: An Open-Source Autoregressive Language Model
11/2022 BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
12/2022 DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders
04/2023 Visual Instruction Tuning (LLaVA)
05/2023 StarCoder: May the source be with you!
05/2023 CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
05/2023 Otter: A Multi-Modal Model with In-Context Instruction Tuning
05/2023 InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
05/2023 CodeT5+: Open Code Large Language Models for Code Understanding and Generation
05/2023 ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
05/2023 RWKV: Reinventing RNNs for the Transformer Era
05/2023 Lion: Adversarial Distillation of Closed-Source Large Language Model
05/2023 MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
06/2023 Segment Anything in High Quality
06/2023 Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
06/2023 High-Fidelity Audio Compression with Improved RVQGAN (DAC)
06/2023 StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
06/2023 Anticipatory Music Transformer
06/2023 RepoFusion: Training Code Models to Understand Your Repository
06/2023 MPT-30B: Raising the bar for open-source foundation models
06/2023 Vec2Vec: A Compact Neural Network Approach for Transforming Text Embeddings with High Fidelity
06/2023 ViNT: A Foundation Model for Visual Navigation
06/2023 How Long Can Open-Source LLMs Truly Promise on Context Length? (LongChat)
07/2023 Hierarchical Open-vocabulary Universal Image Segmentation
07/2023 Focused Transformer: Contrastive Training for Context Scaling (LongLLaMA)
07/2023 Rhythm Modeling for Voice Conversion (Urhythmic)
07/2023 Scaling TransNormer to 175 Billion Parameters
08/2023 Separate Anything You Describe
08/2023 StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data
09/2023 RADIO: Reference-Agnostic Dubbing Video Synthesis
09/2023 Matcha-TTS: A fast TTS architecture with conditional flow matching
09/2023 DreamLLM: Synergistic Multimodal Comprehension and Creation
09/2023 Baichuan 2: Open Large-scale Language Models
09/2023 Qwen Technical Report
09/2023 Mistral 7B
10/2023 MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
10/2023 Improved Baselines with Visual Instruction Tuning (LLaVA 1.5)
10/2023 LLark: A Multimodal Foundation Model for Music
10/2023 SALMONN: Towards Generic Hearing Abilities for Large Language Models
10/2023 Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents
11/2023 Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
11/2023 UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
11/2023 YUAN 2.0: A Large Language Model with Localized Filtering-based Attention
12/2023 Making Large Multimodal Models Understand Arbitrary Visual Prompts (ViP-LLaVA)
12/2023 Mamba: Linear-Time Sequence Modeling with Selective State Spaces
12/2023 OpenVoice: Versatile Instant Voice Cloning
12/2023 Sequential Modeling Enables Scalable Learning for Large Vision Models (LVM)
12/2023 Magicoder: Source Code Is All You Need
12/2023 StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers
12/2023 MMM: Generative Masked Motion Model
12/2023 4M: Massively Multimodal Masked Modeling
12/2023 LLM360: Towards Fully Transparent Open-Source LLMs
12/2023 SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
01/2024 Mixtral of Experts
01/2024 EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
01/2024 Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
01/2024 Scalable Pre-training of Large Autoregressive Image Models
01/2024 Orion-14B: Open-source Multilingual Large Language Models
01/2024 Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
01/2024 VMamba: Visual State Space Model
01/2024 MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
01/2024 LLaVA-1.6: Improved reasoning, OCR, and world knowledge
01/2024 MiniCPM: Unveiling the Potential of End-side Large Language Models
01/2024 Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
02/2024 Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces
02/2024 Introducing Qwen1.5
02/2024 BlackMamba: Mixture of Experts for State-Space Models
02/2024 EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
02/2024 GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators
02/2024 Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion
02/2024 Brant-2: Foundation Model for Brain Signals
02/2024 CLLMs: Consistency Large Language Models
03/2024 Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (SD3)
03/2024 TripoSR: Fast 3D Object Reconstruction from a Single Image
03/2024 Yi: Open Foundation Models by 01.AI
03/2024 VideoMamba: State Space Model for Efficient Video Understanding
03/2024 VOICECRAFT: Zero-Shot Speech Editing and Text-to-Speech in the Wild
03/2024 GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
03/2024 DBRX: A New State-of-the-Art Open LLM
03/2024 AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
03/2024 Jamba: A Hybrid Transformer-Mamba Language Model
04/2024 Advancing LLM Reasoning Generalists with Preference Trees (Eurus)
04/2024 Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction (VAR)
04/2024 Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
04/2024 Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
05/2024 Language-Image Models with 3D Understanding (Cube-LLM)
05/2024 AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding
05/2024 Pandora: Towards General World Model with Natural Language Actions and Video States
05/2024 TerDiT: Ternary Diffusion Models with Transformers
05/2024 NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
05/2024 Phased Consistency Model
05/2024 MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
05/2024 YOLOv10: Real-Time End-to-End Object Detection
05/2024 MegActor: Harness the Power of Raw Video for Vivid Portrait Animation
06/2024 Bootstrap3D: Improving 3D Content Creation with Synthetic Data
06/2024 EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
06/2024 ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
06/2024 GrootVL: Tree Topology is All You Need in State Space Model
06/2024 An Independence-promoting Loss for Music Generation with Language Models (MusicGen-MMD)
06/2024 Matching Anything by Segmenting Anything
06/2024 Nemotron-4 340B Technical Report
06/2024 TroL: Traversal of Layers for Large Language and Vision Models
06/2024 Depth Anything V2
06/2024 HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
06/2024 Network Bending of Diffusion Models for Audio-Visual Generation
06/2024 Less is More: Accurate Speech Recognition & Translation without Web-Scale Data (Canary)
07/2024 LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
07/2024 Qwen2 Technical Report
07/2024 Qwen2-Audio Technical Report
07/2024 ColPali: Efficient Document Retrieval with Vision Language Models
07/2024 Compact Language Models via Pruning and Knowledge Distillation (Minitron)
08/2024 mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
08/2024 Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
08/2024 SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection
09/2024 OLMoE: Open Mixture-of-Experts Language Models
09/2024 Sample-Efficient Diffusion for Text-To-Speech Synthesis (SESD)
09/2024 Multi-Source Music Generation with Latent Diffusion (MSLDM)
09/2024 Prithvi WxC: Foundation Model for Weather and Climate
09/2024 DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency
09/2024 Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
09/2024 MIO: A Foundation Model on Multimodal Tokens
10/2024 UniMuMo: Unified Text, Music and Motion Generation
10/2024 RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
10/2024 Aria: An Open Multimodal Native Mixture-of-Experts Model
Various
09/2014 Neural Machine Translation by Jointly Learning to Align and Translate
06/2019 Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
10/2019 Root Mean Square Layer Normalization
10/2019 Transformers without Tears: Improving the Normalization of Self-Attention
12/2019 Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
02/2020 On Layer Normalization in the Transformer Architecture
04/2020 Longformer: The Long-Document Transformer
04/2020 Improved Natural Language Generation via Loss Truncation
06/2020 Memory Transformer
07/2020 Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity
12/2020 ERNIE-Doc: A Retrospective Long-Document Modeling Transformer
01/2021 Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
03/2021 The Low-Rank Simplicity Bias in Deep Networks
04/2021 RoFormer: Enhanced Transformer with Rotary Position Embedding
06/2021 LoRA: Low-Rank Adaptation of Large Language Models
07/2021 CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
03/2022 Memorizing Transformers
04/2022 UL2: Unifying Language Learning Paradigms
05/2022 Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning (IA3)
06/2022 nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models
07/2022 Language Models (Mostly) Know What They Know
08/2022 LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
09/2022 Petals: Collaborative Inference and Fine-tuning of Large Models
10/2022 GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
10/2022 Recurrent Memory Transformer
10/2022 Truncation Sampling as Language Model Desmoothing
10/2022 DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
11/2022 An Algorithm for Routing Vectors in Sequences
11/2022 MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
12/2022 Self-Instruct: Aligning Language Model with Self Generated Instructions
12/2022 Parallel Context Windows Improve In-Context Learning of Large Language Models
12/2022 Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
12/2022 Pretraining Without Attention
12/2022 The case for 4-bit precision: k-bit Inference Scaling Laws
12/2022 Prompting Is Programming: A Query Language for Large Language Models
01/2023 SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
01/2023 SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
01/2023 Memory Augmented Large Language Models are Computationally Universal
01/2023 Progress measures for grokking via mechanistic interpretability
01/2023 Adaptive Computation with Elastic Input Sequence
02/2023 Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models
02/2023 The Wisdom of Hindsight Makes Language Models Better Instruction Followers
02/2023 The Stable Entropy Hypothesis and Entropy-Aware Decoding: An Analysis and Algorithm for Robust Natural Language Generation
03/2023 COLT5: Faster Long-Range Transformers with Conditional Computation
03/2023 High-throughput Generative Inference of Large Language Models with a Single GPU
03/2023 Meet in the Middle: A New Pre-training Paradigm
03/2023 Reflexion: an autonomous agent with dynamic memory and self-reflection
03/2023 Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
03/2023 FP8 versus INT8 for efficient deep learning inference
03/2023 Self-Refine: Iterative Refinement with Self-Feedback
04/2023 RPTQ: Reorder-based Post-training Quantization for Large Language Models
04/2023 REFINER: Reasoning Feedback on Intermediate Representations
04/2023 Generative Agents: Interactive Simulacra of Human Behavior
04/2023 Compressed Regression over Adaptive Networks
04/2023 A Cheaper and Better Diffusion Language Model with Soft-Masked Noise
04/2023 RRHF: Rank Responses to Align Language Models with Human Feedback without tears
04/2023 CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society
04/2023 Automatic Gradient Descent: Deep Learning without Hyperparameters
04/2023 SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models
04/2023 Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study
04/2023 Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
04/2023 Scaling Transformer to 1M tokens and beyond with RMT
04/2023 Answering Questions by Meta-Reasoning over Multiple Chains of Thought
04/2023 Towards Multi-Modal DBMSs for Seamless Querying of Texts and Tables
04/2023 We're Afraid Language Models Aren't Modeling Ambiguity
04/2023 The Internal State of an LLM Knows When It's Lying
04/2023 Search-in-the-Chain: Towards the Accurate, Credible and Traceable Content Generation for Complex Knowledge-intensive Tasks
05/2023 Towards Unbiased Training in Federated Open-world Semi-supervised Learning
05/2023 Unlimiformer: Long-Range Transformers with Unlimited Length Input
05/2023 FreeLM: Fine-Tuning-Free Language Model
05/2023 Cuttlefish: Low-rank Model Training without All The Tuning
05/2023 AttentionViz: A Global View of Transformer Attention
05/2023 Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
05/2023 A Frustratingly Easy Improvement for Position Embeddings via Random Padding
05/2023 Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
05/2023 Explanation-based Finetuning Makes Models More Robust to Spurious Cues
05/2023 An automatically discovered chain-of-thought prompt generalizes to novel models and datasets
05/2023 Recommender Systems with Generative Retrieval
05/2023 Fast Distributed Inference Serving for Large Language Models
05/2023 Chain-of-Dictionary Prompting Elicits Translation in Large Language Models
05/2023 Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach
05/2023 Active Retrieval Augmented Generation
05/2023 Scalable Coupling of Deep Learning with Logical Reasoning
05/2023 Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
05/2023 StructGPT: A General Framework for Large Language Model to Reason over Structured Data
05/2023 Pre-Training to Learn in Context
05/2023 ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
05/2023 Accelerating Transformer Inference for Translation via Parallel Decoding
05/2023 Cooperation Is All You Need
05/2023 PTQD: Accurate Post-Training Quantization for Diffusion Models
05/2023 LLM-Pruner: On the Structural Pruning of Large Language Models
05/2023 SelfzCoT: a Self-Prompt Zero-shot CoT from Semantic-level to Code-level for a Better Utilization of LLMs
05/2023 QLoRA: Efficient Finetuning of Quantized LLMs
05/2023 "According to ..." Prompting Language Models Improves Quoting from Pre-Training Data
05/2023 Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
05/2023 Landmark Attention: Random-Access Infinite Context Length for Transformers
05/2023 Scaling Data-Constrained Language Models
05/2023 Fine-Tuning Language Models with Just Forward Passes
05/2023 Intriguing Properties of Quantization at Scale
05/2023 Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
05/2023 Blockwise Parallel Transformer for Long Context Large Models
05/2023 The Impact of Positional Encoding on Length Generalization in Transformers
05/2023 Adapting Language Models to Compress Contexts
05/2023 Direct Preference Optimization: Your Language Model is Secretly a Reward Model
06/2023 AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
06/2023 Faster Causal Attention Over Large Sequences Through Sparse Flash Attention
06/2023 Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
06/2023 SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
06/2023 Fine-Tuning Language Models with Advantage-Induced Policy Alignment
06/2023 Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
06/2023 Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
06/2023 Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories
06/2023 Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion
06/2023 Word sense extension
06/2023 Mitigating Transformer Overconfidence via Lipschitz Regularization
06/2023 Recurrent Attention Networks for Long-text Modeling
06/2023 One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
06/2023 SqueezeLLM: Dense-and-Sparse Quantization
06/2023 Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training
06/2023 Propagating Knowledge Updates to LMs Through Distillation
06/2023 Full Parameter Fine-tuning for Large Language Models with Limited Resources
06/2023 A Simple and Effective Pruning Approach for Large Language Models
06/2023 InRank: Incremental Low-Rank Learning
06/2023 Evaluating the Zero-shot Robustness of Instruction-tuned Language Models
06/2023 Learning to Generate Better Than Your LLM (RLGF)
06/2023 Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
06/2023 H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
06/2023 FLuRKA: Fast fused Low-Rank & Kernel Attention
06/2023 Stay on topic with Classifier-Free Guidance
07/2023 AutoST: Training-free Neural Architecture Search for Spiking Transformers
07/2023 Single Sequence Prediction over Reasoning Graphs for Multi-hop QA
07/2023 Shifting Attention to Relevance: Towards the Uncertainty Estimation of Large Language Models
07/2023 Facing off World Model Backbones: RNNs, Transformers, and S4
07/2023 Improving Retrieval-Augmented Large Language Models via Data Importance Learning
07/2023 Teaching Arithmetic to Small Transformers
07/2023 QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models
07/2023 Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
07/2023 Copy Is All You Need (CoG)
07/2023 Multi-Method Self-Training: Improving Code Generation With Text, And Vice Versa
07/2023 Divide & Bind Your Attention for Improved Generative Semantic Nursing
07/2023 Challenges and Applications of Large Language Models
07/2023 Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models
07/2023 QuIP: 2-Bit Quantization of Large Language Models With Guarantees
07/2023 CoRe Optimizer: An All-in-One Solution for Machine Learning
07/2023 Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
08/2023 ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation
08/2023 EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models
08/2023 Activation Addition: Steering Language Models Without Optimization
08/2023 OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
08/2023 Accelerating LLM Inference with Staged Speculative Decoding
08/2023 YaRN: Efficient Context Window Extension of Large Language Models
08/2023 LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
09/2023 Making Large Language Models Better Reasoners with Alignment
09/2023 Data-Juicer: A One-Stop Data Processing System for Large Language Models
09/2023 Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices
09/2023 SLiMe: Segment Like Me
09/2023 Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
09/2023 When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
09/2023 Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
09/2023 Efficient Memory Management for Large Language Model Serving with PagedAttention
09/2023 Cure the headache of Transformers via Collinear Constrained Attention
09/2023 Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
09/2023 LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
09/2023 MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
09/2023 Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models
09/2023 Improving Code Generation by Dynamic Temperature Sampling
09/2023 Efficient Streaming Language Models with Attention Sinks
10/2023 DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models
10/2023 GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
10/2023 Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models
10/2023 Elephant Neural Networks: Born to Be a Continual Learner
10/2023 Ring Attention with Blockwise Transformers for Near-Infinite Context
10/2023 Retrieval meets Long Context Large Language Models
10/2023 DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
10/2023 LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
10/2023 Amortizing intractable inference in large language models (GFlowNet Tuning)
10/2023 SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
10/2023 Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
10/2023 Let Models Speak Ciphers: Multiagent Debate through Embeddings
10/2023 InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
10/2023 CacheGen: Fast Context Loading for Language Model Applications
10/2023 MatFormer: Nested Transformer for Elastic Inference
10/2023 LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
10/2023 Towards End-to-end 4-Bit Inference on Generative Large Language Models (QUIK)
10/2023 Microscaling Data Formats for Deep Learning
10/2023 xVal: A Continuous Number Encoding for Large Language Models
10/2023 An Emulator for Fine-Tuning Large Language Models using Small Language Models
10/2023 Frozen Transformers in Language Models Are Effective Visual Encoder Layers
10/2023 LoBaSS: Gauging Learnability in Supervised Fine-tuning Data
10/2023 Quality-Diversity through AI Feedback
10/2023 Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (SEDD)
10/2023 DoGE: Domain Reweighting with Generalization Estimation
10/2023 E-Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity
10/2023 Mixture of Tokens: Efficient LLMs through Cross-Example Aggregation
10/2023 Personas as a Way to Model Truthfulness in Language Models
10/2023 Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
10/2023 QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
11/2023 AWEQ: Post-Training Quantization with Activation-Weight Equalization for Large Language Models
11/2023 FlashDecoding++: Faster Large Language Model Inference on GPUs
11/2023 Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization
11/2023 Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
11/2023 REST: Retrieval-Based Speculative Decoding
11/2023 DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
11/2023 Token-level Adaptation of LoRA Adapters for Downstream Task Generalization
11/2023 Exponentially Faster Language Modelling
11/2023 MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
11/2023 LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning
11/2023 Token Recycling for Efficient Sequential Inference with Vision Transformers
11/2023 Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization
12/2023 GIFT: Generative Interpretable Fine-Tuning Transformers
12/2023 PEFA: Parameter-Free Adapters for Large-scale Embedding-based Retrieval Models
12/2023 Improving Activation Steering in Language Models with Mean-Centring
12/2023 A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA
12/2023 SparQ Attention: Bandwidth-Efficient LLM Inference
12/2023 ESPN: Memory-Efficient Multi-Vector Information Retrieval
12/2023 Aligner: One Global Token is Worth Millions of Parameters When Aligning Large Language Models
12/2023 CBQ: Cross-Block Quantization for Large Language Models
12/2023 SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
12/2023 Weight subcloning: direct initialization of transformers using larger pretrained ones
12/2023 Cascade Speculative Drafting for Even Faster LLM Inference
12/2023 ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
12/2023 Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy
12/2023 A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties
12/2023 Algebraic Positional Encodings
12/2023 Preference as Reward, Maximum Preference Optimization with Importance Sampling
01/2024 LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
01/2024 Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
01/2024 LLaMA Pro: Progressive LLaMA with Block Expansion
01/2024 Fast and Optimal Weight Update for Pruned Large Language Models
01/2024 Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
01/2024 MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
01/2024 Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning
01/2024 RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation
01/2024 Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
01/2024 AUTOACT: Automatic Agent Learning from Scratch via Self-Planning
01/2024 Extreme Compression of Large Language Models via Additive Quantization (AQLM)
01/2024 Knowledge Translation: A New Pathway for Model Compression
01/2024 Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
01/2024 Transformers are Multi-State RNNs
01/2024 Extending LLMs' Context Window with 100 Samples (Entropy-ABF)
01/2024 ChatQA: Building GPT-4 Level Conversational QA Models
01/2024 AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference
01/2024 Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
01/2024 Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
01/2024 BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
01/2024 Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
01/2024 Dynamic Layer Tying for Parameter-Efficient Transformers
01/2024 MambaByte: Token-free Selective State Space Model
01/2024 FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
01/2024 Accelerating Retrieval-Augmented Language Model Serving with Speculation
01/2024 Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
01/2024 EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
01/2024 With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation (Temp LoRA)
01/2024 YODA: Teacher-Student Progressive Learning for Language Models
01/2024 KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
01/2024 LOCOST: State-Space Models for Long Document Abstractive Summarization
01/2024 Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model
01/2024 RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
02/2024 EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models
02/2024 MoDE: A Mixture-of-Experts Model with Mutual Distillation among the Experts
02/2024 Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
02/2024 Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
02/2024 HiQA: A Hierarchical Contextual Augmentation RAG for Massive Documents QA
02/2024 KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
02/2024 DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing
02/2024 QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
02/2024 Hydragen: High-Throughput LLM Inference with Shared Prefixes
02/2024 Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding
02/2024 LESS: Selecting Influential Data for Targeted Instruction Tuning
02/2024 Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
02/2024 AttnLRP: Attention-Aware Layer-wise Relevance Propagation for Transformers
02/2024 X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics
02/2024 BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
02/2024 Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance
02/2024 Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
02/2024 Uncertainty Decomposition and Quantification for In-Context Learning of Large Language Models
02/2024 RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models
02/2024 BitDelta: Your Fine-Tune May Only Be Worth One Bit
02/2024 DoRA: Weight-Decomposed Low-Rank Adaptation
02/2024 In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
02/2024 Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
02/2024 Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding
02/2024 Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts
02/2024 WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More
02/2024 DB-LLM: Accurate Dual-Binarization for Efficient LLMs
02/2024 Data Engineering for Scaling Language Models to 128K Context
02/2024 EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs
02/2024 HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts
02/2024 Turn Waste into Worth: Rectifying Top-k Router of MoE
02/2024 Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
02/2024 Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
02/2024 Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization
02/2024 MemoryPrompt: A Light Wrapper to Improve Context Tracking in Pre-trained Language Models
02/2024 Fine-tuning CLIP Text Encoders with Two-step Paraphrasing
02/2024 BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
02/2024 No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization
02/2024 DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation
02/2024 CoDream: Exchanging dreams instead of models for federated aggregation with heterogeneous models
02/2024 Humanoid Locomotion as Next Token Prediction
02/2024 KTO: Model Alignment as Prospect Theoretic Optimization
02/2024 Noise Contrastive Alignment of Language Models with Explicit Rewards (NCA)
02/2024 ReLU2 Wins: Discovering Efficient Activation Functions for Sparse LLMs
02/2024 Training-Free Long-Context Scaling of Large Language Models (DCA)
03/2024 Not all Layers of LLMs are Necessary during Inference
03/2024 Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models
03/2024 DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
03/2024 GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
03/2024 Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding
03/2024 Scattered Mixture-of-Experts Implementation
03/2024 AutoLoRA: Automatically Tuning Matrix Ranks in Low-Rank Adaptation Based on Meta Learning
03/2024 BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
03/2024 Bifurcated Attention for Single-Context Large-Batch Sampling
03/2024 Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference
03/2024 Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
03/2024 Recurrent Drafter for Fast Speculative Decoding in Large Language Models
03/2024 Arcee's MergeKit: A Toolkit for Merging Large Language Models
03/2024 Rotary Position Embedding for Vision Transformer
03/2024 BiLoRA: A Bi-level Optimization Framework for Overfitting-Resilient Low-Rank Adaptation of Large Pre-trained Models
03/2024 Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
03/2024 DreamReward: Text-to-3D Generation with Human Preference
03/2024 Evolutionary Optimization of Model Merging Recipes
03/2024 Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance
03/2024 When Do We Not Need Larger Vision Models?
03/2024 FeatUp: A Model-Agnostic Framework for Features at Any Resolution
03/2024 ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching
03/2024 The Unreasonable Ineffectiveness of the Deeper Layers
03/2024 QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
04/2024 LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models
04/2024 Prompt-prompted Mixture of Experts for Efficient LLM Generation (GRIFFIN)
04/2024 BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models
04/2024 SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget
04/2024 CodecLM: Aligning Language Models with Tailored Synthetic Data
04/2024 Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation
04/2024 Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs
04/2024 Continuous Language Model Interpolation for Dynamic and Controllable Text Generation
04/2024 RULER: What's the Real Context Size of Your Long-Context Language Models?
04/2024 Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models
04/2024 On Speculative Decoding for Multimodal Large Language Models
04/2024 CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
04/2024 Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs
04/2024 Fewer Truncations Improve Language Modeling
04/2024 When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes
04/2024 Learn2Talk: 3D Talking Face Learns from 2D Talking Face
04/2024 Weak-to-Strong Extrapolation Expedites Alignment (EXPO)
04/2024 decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points
04/2024 RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
04/2024 Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding
04/2024 Mixture of LoRA Experts
04/2024 MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning
04/2024 XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
04/2024 Retrieval Head Mechanistically Explains Long-Context Factuality
04/2024 Let's Think Dot by Dot: Hidden Computation in Transformer Language Models
04/2024 Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
05/2024 When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively
05/2024 A Careful Examination of Large Language Model Performance on Grade School Arithmetic
05/2024 Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge
05/2024 Parameter-Efficient Fine-Tuning with Discrete Fourier Transform
05/2024 COPAL: Continual Pruning in Large Language Generative Models
05/2024 Revisiting a Pain in the Neck: Semantic Phrase Processing Benchmark for Language Models
05/2024 AlphaMath Almost Zero: process Supervision without process
05/2024 QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
05/2024 xLSTM: Extended Long Short-Term Memory
05/2024 FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference
05/2024 SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
05/2024 HMT: Hierarchical Memory Transformer for Long Context Language Processing
05/2024 The Future of Large Language Model Pre-training is Federated
05/2024 Layer-Condensed KV Cache for Efficient Inference of Large Language Models
05/2024 MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
05/2024 SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model
05/2024 Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
05/2024 Bagging Improves Generalization Exponentially
05/2024 Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models
05/2024 Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast
05/2024 Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
05/2024 T^2 of Thoughts: Temperature Tree Elicits Reasoning in Large Language Models
05/2024 ReALLM: A general framework for LLM compression and fine-tuning
05/2024 SimPO: Simple Preference Optimization with a Reference-Free Reward
05/2024 PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression
05/2024 Removing Bias from Maximum Likelihood Estimation with Model Autophagy
05/2024 RE-Adapt: Reverse Engineered Adaptation of Large Language Models
05/2024 MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence
05/2024 Data Mixing Made Efficient: A Bivariate Scaling Law for Language Model Pretraining
05/2024 Accelerating Transformers with Spectrum-Preserving Token Merging
05/2024 A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
05/2024 MoEUT: Mixture-of-Experts Universal Transformers
05/2024 Exploring Context Window of Large Language Models via Decomposed Positional Vectors
05/2024 Transformers Can Do Arithmetic with the Right Embeddings
05/2024 OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning
05/2024 MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification
05/2024 Self-Play Preference Optimization for Language Model Alignment
05/2024 The Road Less Scheduled (Schedule-Free)
06/2024 FineWeb: decanting the web for the finest text data at scale
06/2024 Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality (Mamba-2)
06/2024 Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
06/2024 DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection
06/2024 MultiMax: Sparse and Multi-Modal Attention Learning
06/2024 MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization
06/2024 Mix-of-Granularity: Optimize the Chunking Granularity for Retrieval-Augmented Generation
06/2024 QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation
06/2024 SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining
06/2024 Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
06/2024 VCR: Visual Caption Restoration
06/2024 LoCoCo: Dropping In Convolutions for Long Context Compression
06/2024 Low-Rank Quantization-Aware Training for LLMs
06/2024 Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters
06/2024 DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
06/2024 TernaryLLM: Ternarized Large Language Model
06/2024 Image and Video Tokenization with Binary Spherical Quantization
06/2024 Discovering Preference Optimization Algorithms with and for Large Language Models
06/2024 ProTrain: Efficient LLM Training via Memory-Aware Techniques
06/2024 PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
06/2024 Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
06/2024 Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
06/2024 HiP Attention: Sparse Sub-Quadratic Attention with Hierarchical Attention Pruning
06/2024 LieRE: Generalizing Rotary Position Encodings
06/2024 DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer
06/2024 Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
06/2024 Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies
06/2024 mDPO: Conditional Preference Optimization for Multimodal Large Language Models
06/2024 QTIP: Quantization with Trellises and Incoherence Processing
06/2024 Mixture-of-Subspaces in Low-Rank Adaptation (MoSLoRA)
06/2024 Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization
06/2024 Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models
06/2024 DeciMamba: Exploring the Length Extrapolation Potential of Mamba
06/2024 Optimised Grouped-Query Attention Mechanism for Transformers
06/2024 MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
06/2024 Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation
06/2024 What Matters in Transformers? Not All Attention is Needed
06/2024 Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
06/2024 ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models
06/2024 Adam-mini: Use Fewer Learning Rates To Gain More
06/2024 Large Language Models are Interpretable Learners
06/2024 Selective Prompting Tuning for Personalized Conversations with LLMs
06/2024 Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
06/2024 Unsupervised Morphological Tree Tokenizer (TreeTok)
07/2024 Eliminating Position Bias of Language Models: A Mechanistic Approach
07/2024 Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
07/2024 Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
07/2024 LoCo: Low-Bit Communication Adaptor for Large-scale Model Training
07/2024 Code Less, Align More: Efficient LLM Fine-tuning for Code Generation with Data Pruning
07/2024 Learning to (Learn at Test Time): RNNs with Expressive Hidden States (TTT)
07/2024 Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
07/2024 Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules
07/2024 OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
07/2024 Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
07/2024 Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients
07/2024 FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
07/2024 Lite-SAM Is Actually What You Need for Segment Everything
07/2024 BitNet b1.58 Reloaded: State-of-the-art Performance Also on Smaller Networks
07/2024 Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors
07/2024 Patch-Level Training for Large Language Models
07/2024 Correcting the Mythos of KL-Regularization: Direct Alignment without Overparameterization via Chi-squared Preference Optimization
07/2024 LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
07/2024 Hi-EF: Benchmarking Emotion Forecasting in Human-interaction
07/2024 RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
07/2024 MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
07/2024 Palu: Compressing KV-Cache with Low-Rank Projection
07/2024 AI-Assisted Generation of Difficult Math Questions (MATH^2)
07/2024 EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
07/2024 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
08/2024 POA: Pre-training Once for Models of All Sizes
08/2024 An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion
08/2024 Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
08/2024 Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion
08/2024 Eigen Attention: Attention in Low-Rank Space for KV Cache Compression
08/2024 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
08/2024 Post-Training Sparse Attention with Double Sparsity
08/2024 A Spitting Image: Modular Superpixel Tokenization in Vision Transformers (SPiT)
08/2024 JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
08/2024 SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models
08/2024 HMoE: Heterogeneous Mixture of Experts for Language Modeling
08/2024 First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models
08/2024 LLM Pruning and Distillation in Practice: The Minitron Approach
08/2024 FocusLLM: Scaling LLM's Context by Parallel Decoding
08/2024 Memory-Efficient LLM Training with Online Subspace Descent
08/2024 MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
08/2024 Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
09/2024 FedModule: A Modular Federated Learning Framework
09/2024 Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
09/2024 STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning
09/2024 Length Desensitization in Direct Preference Optimization (LD-DPO)
09/2024 CPL: Critical Planning Step Learning Boosts LLM Generalization in Reasoning Tasks
09/2024 RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
09/2024 SOAP: Improving and Stabilizing Shampoo using Adam
09/2024 A Controlled Study on Long Context Extension and Generalization in LLMs
09/2024 Scaling FP8 training to trillion-token LLMs
09/2024 Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
09/2024 INT-FlashAttention: Enabling Flash Attention for INT8 Quantization
09/2024 Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
09/2024 SATA: Spatial Autocorrelation Token Analysis for Enhancing the Robustness of Vision Transformers
10/2024 VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
10/2024 FlashMask: Efficient and Rich Mask Extension of FlashAttention
10/2024 OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
10/2024 Parameter Competition Balancing for Model Merging
10/2024 SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
10/2024 ARB-LLM: Alternating Refined Binarizations for Large Language Models
10/2024 Contextual Document Embeddings
10/2024 SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks
10/2024 Accelerating Diffusion Transformers with Token-wise Feature Caching
10/2024 Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling
10/2024 Restructuring Vector Quantization with the Rotation Trick
10/2024 Upcycling Large Language Models into Mixture of Experts
Pub: 21 Mar 2023 02:18 UTC
Edit: 11 Oct 2024 04:43 UTC