Local Models Related Papers

/lmg/ Accelerate
Google Papers Blog
12/2017 Attention Is All You Need (Transformers)
10/2018 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
10/2019 Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5)
11/2019 Fast Transformer Decoding: One Write-Head is All You Need
02/2020 GLU Variants Improve Transformer
03/2020 Talking-Heads Attention
05/2020 Conformer: Convolution-augmented Transformer for Speech Recognition
09/2020 Efficient Transformers: A Survey
12/2020 RealFormer: Transformer Likes Residual Attention
01/2021 Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
09/2021 Finetuned Language Models Are Zero-Shot Learners (Flan)
09/2021 Primer: Searching for Efficient Transformers for Language Modeling
11/2021 Sparse is Enough in Scaling Transformers
12/2021 GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
01/2022 LaMDA: Language Models for Dialog Applications
01/2022 Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
04/2022 PaLM: Scaling Language Modeling with Pathways
10/2022 Scaling Instruction-Finetuned Language Models (Flan-PaLM)
10/2022 Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models
10/2022 Large Language Models Can Self-Improve
11/2022 Efficiently Scaling Transformer Inference
11/2022 Fast Inference from Transformers via Speculative Decoding
02/2023 Symbolic Discovery of Optimization Algorithms (Lion)
03/2023 PaLM-E: An Embodied Multimodal Language Model
04/2023 Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
05/2023 Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
05/2023 FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
05/2023 PaLM 2 Technical Report
05/2023 Symbol tuning improves in-context learning in language models
05/2023 Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
05/2023 Towards Expert-Level Medical Question Answering with Large Language Models (Med-PaLM 2)
05/2023 DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
05/2023 How Does Generative Retrieval Scale to Millions of Passages?
05/2023 GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
05/2023 Small Language Models Improve Giants by Rewriting Their Outputs
06/2023 StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
06/2023 AudioPaLM: A Large Language Model That Can Speak and Listen
06/2023 Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
07/2023 HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
09/2023 Uncovering mesa-optimization algorithms in Transformers
10/2023 Think before you speak: Training Language Models With Pause Tokens
10/2023 SpecTr: Fast Speculative Decoding via Optimal Transport
11/2023 UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
11/2023 Automatic Engineering of Long Prompts
OpenAI Papers Blog
07/2017 Proximal Policy Optimization Algorithms
04/2019 Generating Long Sequences with Sparse Transformers
01/2020 Scaling Laws for Neural Language Models
05/2020 Language Models are Few-Shot Learners (GPT-3)
01/2022 Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
03/2022 Training language models to follow instructions with human feedback (InstructGPT)
07/2022 Efficient Training of Language Models to Fill in the Middle
03/2023 GPT-4 Technical Report
04/2023 Consistency Models
05/2023 Let's Verify Step by Step
10/2023 Improving Image Generation with Better Captions (DALL·E 3)
DeepMind (Google DeepMind as of 4/2023) Papers Blog
10/2019 Stabilizing Transformers for Reinforcement Learning
12/2021 Scaling Language Models: Methods, Analysis & Insights from Training Gopher
12/2021 Improving language models by retrieving from trillions of tokens (RETRO)
02/2022 Competition-Level Code Generation with AlphaCode
02/2022 Unified Scaling Laws for Routed Language Models
03/2022 Training Compute-Optimal Large Language Models (Chinchilla)
04/2022 Flamingo: a Visual Language Model for Few-Shot Learning
05/2022 A Generalist Agent (GATO)
07/2022 Formal Algorithms for Transformers
02/2023 Accelerating Large Language Model Decoding with Speculative Sampling
05/2023 Tree of Thoughts: Deliberate Problem Solving with Large Language Models
05/2023 Block-State Transformer
05/2023 Randomized Positional Encodings Boost Length Generalization of Transformers
08/2023 From Sparse to Soft Mixtures of Experts
09/2023 Large Language Models as Optimizers
09/2023 MADLAD-400: A Multilingual And Document-Level Large Audited Dataset (MT Model)
09/2023 Scaling Laws for Sparsely-Connected Foundation Models
09/2023 Language Modeling Is Compression
09/2023 Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
10/2023 Large Language Models as Analogical Reasoners
10/2023 Controlled Decoding from Language Models
10/2023 A General Theoretical Paradigm to Understand Learning from Human Preferences
Meta (Facebook AI Research) Papers Blog
04/2019 fairseq: A Fast, Extensible Toolkit for Sequence Modeling
07/2019 Augmenting Self-attention with Persistent Memory
11/2019 Improving Transformer Models by Reordering their Sublayers
08/2021 Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
05/2022 OPT: Open Pre-trained Transformer Language Models
07/2022 Beyond neural scaling laws: beating power law scaling via data pruning
11/2022 Galactica: A Large Language Model for Science
01/2023 Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (I-JEPA)
02/2023 LLaMA: Open and Efficient Foundation Language Models
02/2023 Toolformer: Language Models Can Teach Themselves to Use Tools
03/2023 Scaling Expert Language Models with Unsupervised Domain Discovery
03/2023 SemDeDup: Data-efficient learning at web-scale through semantic deduplication
04/2023 Segment Anything (SAM)
04/2023 A Cookbook of Self-Supervised Learning
05/2023 Learning to Reason and Memorize with Self-Notes
05/2023 ImageBind: One Embedding Space To Bind Them All
05/2023 MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
05/2023 LIMA: Less Is More for Alignment
05/2023 Scaling Speech Technology to 1,000+ Languages
05/2023 READ: Recurrent Adaptation of Large Transformers
05/2023 LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
06/2023 Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
06/2023 Simple and Controllable Music Generation (MusicGen)
06/2023 Improving Open Language Models by Learning from Organic Interactions (BlenderBot 3x)
06/2023 Extending Context Window of Large Language Models via Positional Interpolation
06/2023 Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
07/2023 Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning (CM3leon)
07/2023 Llama 2: Open Foundation and Fine-Tuned Chat Models
08/2023 SeamlessM4T—Massively Multilingual & Multimodal Machine Translation
08/2023 D4: Improving LLM Pretraining via Document De-Duplication and Diversification
08/2023 Code Llama: Open Foundation Models for Code
08/2023 Nougat: Neural Optical Understanding for Academic Documents
09/2023 Contrastive Decoding Improves Reasoning in Large Language Models
09/2023 Effective Long-Context Scaling of Foundation Models
09/2023 AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
09/2023 Vision Transformers Need Registers
10/2023 RA-DIT: Retrieval-Augmented Dual Instruction Tuning
10/2023 Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
10/2023 Generative Pre-training for Speech with Flow Matching
11/2023 Emu Edit: Precise Image Editing via Recognition and Generation Tasks
Microsoft Papers Blog
12/2015 Deep Residual Learning for Image Recognition
05/2021 EL-Attention: Memory Efficient Lossless Attention for Generation
01/2022 DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
03/2022 DeepNet: Scaling Transformers to 1,000 Layers
12/2022 A Length-Extrapolatable Transformer
01/2023 Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
02/2023 Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1)
03/2023 Sparks of Artificial General Intelligence: Early experiments with GPT-4
03/2023 TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
04/2023 Instruction Tuning with GPT-4
04/2023 Inference with Reference: Lossless Acceleration of Large Language Models
04/2023 Low-code LLM: Visual Programming over LLMs
04/2023 WizardLM: Empowering Large Language Models to Follow Complex Instructions
04/2023 MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks
04/2023 ResiDual: Transformer with Dual Residual Connections
05/2023 Code Execution with Pre-trained Language Models
05/2023 Small Models are Valuable Plug-ins for Large Language Models
05/2023 CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
06/2023 Orca: Progressive Learning from Complex Explanation Traces of GPT-4
06/2023 Augmenting Language Models with Long-Term Memory
06/2023 WizardCoder: Empowering Code Large Language Models with Evol-Instruct
06/2023 Textbooks Are All You Need (phi-1)
07/2023 In-context Autoencoder for Context Compression in a Large Language Model
07/2023 Retentive Network: A Successor to Transformer for Large Language Models
08/2023 Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
09/2023 Efficient RLHF: Reducing the Memory Usage of PPO
09/2023 DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
09/2023 Textbooks Are All You Need II (phi-1.5)
09/2023 PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
09/2023 A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
09/2023 Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
10/2023 Sparse Backpropagation for MoE Training
10/2023 Nugget 2D: Dynamic Contextual Compression for Scaling Decoder-only Language Models
10/2023 Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness
10/2023 Augmented Embeddings for Custom Retrievals
10/2023 Guiding Language Model Reasoning with Planning Tokens
10/2023 Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
10/2023 CodeFusion: A Pre-trained Diffusion Model for Code Generation
10/2023 LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
10/2023 FP8-LM: Training FP8 Large Language Models
11/2023 Orca 2: Teaching Small Language Models How to Reason
Hazy Research (Stanford) Papers Blog
10/2021 Efficiently Modeling Long Sequences with Structured State Spaces (S4)
04/2022 Monarch: Expressive Structured Matrices for Efficient and Accurate Training
05/2022 FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
12/2022 Hungry Hungry Hippos: Towards Language Modeling with State Space Models
02/2023 Simple Hardware-Efficient Long Convolutions for Sequence Modeling
02/2023 Hyena Hierarchy: Towards Larger Convolutional Language Models
06/2023 TART: A plug-and-play Transformer module for task-agnostic reasoning
07/2023 FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
11/2023 FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
THUDM (Tsinghua University) Papers Github
10/2022 GLM-130B: An Open Bilingual Pre-Trained Model
03/2023 CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X
04/2023 DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task
06/2023 WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences
09/2023 GPT Can Solve Mathematical Problems Without a Calculator (MathGLM)
10/2023 AgentTuning: Enabling Generalized Agent Abilities for LLMs (AgentLM)
11/2023 CogVLM: Visual Expert for Pretrained Language Models
Open Models
06/2021 GPT-J-6B: 6B JAX-Based Transformer
09/2021 Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning
03/2022 CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
04/2022 GPT-NeoX-20B: An Open-Source Autoregressive Language Model
11/2022 BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
04/2023 Visual Instruction Tuning (LLaVA)
05/2023 StarCoder: May the source be with you!
05/2023 CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
05/2023 Otter: A Multi-Modal Model with In-Context Instruction Tuning
05/2023 InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
05/2023 CodeT5+: Open Code Large Language Models for Code Understanding and Generation
05/2023 ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
05/2023 RWKV: Reinventing RNNs for the Transformer Era
05/2023 Lion: Adversarial Distillation of Closed-Source Large Language Model
05/2023 MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
06/2023 Segment Anything in High Quality
06/2023 Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
06/2023 High-Fidelity Audio Compression with Improved RVQGAN
06/2023 StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
06/2023 Anticipatory Music Transformer
06/2023 RepoFusion: Training Code Models to Understand Your Repository
06/2023 MPT-30B: Raising the bar for open-source foundation models
06/2023 Vec2Vec: A Compact Neural Network Approach for Transforming Text Embeddings with High Fidelity
06/2023 ViNT: A Foundation Model for Visual Navigation
06/2023 How Long Can Open-Source LLMs Truly Promise on Context Length? (LongChat)
07/2023 Hierarchical Open-vocabulary Universal Image Segmentation
07/2023 Focused Transformer: Contrastive Training for Context Scaling (LongLLaMA)
07/2023 Rhythm Modeling for Voice Conversion (Urhythmic)
07/2023 Scaling TransNormer to 175 Billion Parameters
08/2023 Separate Anything You Describe
08/2023 StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data
09/2023 RADIO: Reference-Agnostic Dubbing Video Synthesis
09/2023 Matcha-TTS: A fast TTS architecture with conditional flow matching
09/2023 DreamLLM: Synergistic Multimodal Comprehension and Creation
09/2023 Baichuan 2: Open Large-scale Language Models
09/2023 Qwen Technical Report
09/2023 Mistral 7B
10/2023 MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
10/2023 Improved Baselines with Visual Instruction Tuning (LLaVA 1.5)
10/2023 LLark: A Multimodal Foundation Model for Music
10/2023 SALMONN: Towards Generic Hearing Abilities for Large Language Models
10/2023 Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents
11/2023 Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
11/2023 UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
11/2023 YUAN 2.0: A Large Language Model with Localized Filtering-based Attention
09/2014 Neural Machine Translation by Jointly Learning to Align and Translate
06/2019 Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
10/2019 Root Mean Square Layer Normalization
10/2019 Transformers without Tears: Improving the Normalization of Self-Attention
12/2019 Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
02/2020 On Layer Normalization in the Transformer Architecture
04/2020 Longformer: The Long-Document Transformer
06/2020 Memory Transformer
07/2020 Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity
12/2020 ERNIE-Doc: A Retrospective Long-Document Modeling Transformer
01/2021 Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
03/2021 The Low-Rank Simplicity Bias in Deep Networks
04/2021 RoFormer: Enhanced Transformer with Rotary Position Embedding
06/2021 LoRA: Low-Rank Adaptation of Large Language Models
07/2021 CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
03/2022 Memorizing Transformers
04/2022 UL2: Unifying Language Learning Paradigms
05/2022 Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning (IA3)
06/2022 nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models
07/2022 Language Models (Mostly) Know What They Know
08/2022 LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
09/2022 Petals: Collaborative Inference and Fine-tuning of Large Models
10/2022 GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
10/2022 Truncation Sampling as Language Model Desmoothing
10/2022 DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
11/2022 An Algorithm for Routing Vectors in Sequences
12/2022 Self-Instruct: Aligning Language Model with Self Generated Instructions
12/2022 Parallel Context Windows Improve In-Context Learning of Large Language Models
12/2022 Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
12/2022 Pretraining Without Attention
12/2022 The case for 4-bit precision: k-bit Inference Scaling Laws
12/2022 Prompting Is Programming: A Query Language for Large Language Models
01/2023 SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
01/2023 SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
01/2023 Memory Augmented Large Language Models are Computationally Universal
01/2023 Progress measures for grokking via mechanistic interpretability
02/2023 Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models
02/2023 The Wisdom of Hindsight Makes Language Models Better Instruction Followers
02/2023 End-to-End Deep Learning Framework for Real-Time Inertial Attitude Estimation using 6DoF IMU
02/2023 The Stable Entropy Hypothesis and Entropy-Aware Decoding: An Analysis and Algorithm for Robust Natural Language Generation
03/2023 CoLT5: Faster Long-Range Transformers with Conditional Computation
03/2023 High-throughput Generative Inference of Large Language Models with a Single GPU
03/2023 Meet in the Middle: A New Pre-training Paradigm
03/2023 Reflexion: an autonomous agent with dynamic memory and self-reflection
03/2023 Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
03/2023 FP8 versus INT8 for efficient deep learning inference
03/2023 Self-Refine: Iterative Refinement with Self-Feedback
04/2023 RPTQ: Reorder-based Post-training Quantization for Large Language Models
04/2023 REFINER: Reasoning Feedback on Intermediate Representations
04/2023 Generative Agents: Interactive Simulacra of Human Behavior
04/2023 Compressed Regression over Adaptive Networks
04/2023 A Cheaper and Better Diffusion Language Model with Soft-Masked Noise
04/2023 RRHF: Rank Responses to Align Language Models with Human Feedback without tears
04/2023 CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society
04/2023 Automatic Gradient Descent: Deep Learning without Hyperparameters
04/2023 SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models
04/2023 Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study
04/2023 Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
04/2023 Scaling Transformer to 1M tokens and beyond with RMT
04/2023 Answering Questions by Meta-Reasoning over Multiple Chains of Thought
04/2023 Towards Multi-Modal DBMSs for Seamless Querying of Texts and Tables
04/2023 We're Afraid Language Models Aren't Modeling Ambiguity
04/2023 The Internal State of an LLM Knows When It's Lying
04/2023 Search-in-the-Chain: Towards the Accurate, Credible and Traceable Content Generation for Complex Knowledge-intensive Tasks
05/2023 Towards Unbiased Training in Federated Open-world Semi-supervised Learning
05/2023 Unlimiformer: Long-Range Transformers with Unlimited Length Input
05/2023 FreeLM: Fine-Tuning-Free Language Model
05/2023 Cuttlefish: Low-rank Model Training without All The Tuning
05/2023 AttentionViz: A Global View of Transformer Attention
05/2023 Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
05/2023 A Frustratingly Easy Improvement for Position Embeddings via Random Padding
05/2023 Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
05/2023 Explanation-based Finetuning Makes Models More Robust to Spurious Cues
05/2023 An automatically discovered chain-of-thought prompt generalizes to novel models and datasets
05/2023 Recommender Systems with Generative Retrieval
05/2023 Fast Distributed Inference Serving for Large Language Models
05/2023 Chain-of-Dictionary Prompting Elicits Translation in Large Language Models
05/2023 Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach
05/2023 Active Retrieval Augmented Generation
05/2023 Scalable Coupling of Deep Learning with Logical Reasoning
05/2023 Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
05/2023 StructGPT: A General Framework for Large Language Model to Reason over Structured Data
05/2023 Pre-Training to Learn in Context
05/2023 ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
05/2023 Accelerating Transformer Inference for Translation via Parallel Decoding
05/2023 Cooperation Is All You Need
05/2023 PTQD: Accurate Post-Training Quantization for Diffusion Models
05/2023 LLM-Pruner: On the Structural Pruning of Large Language Models
05/2023 SelfzCoT: a Self-Prompt Zero-shot CoT from Semantic-level to Code-level for a Better Utilization of LLMs
05/2023 QLoRA: Efficient Finetuning of Quantized LLMs
05/2023 "According to ..." Prompting Language Models Improves Quoting from Pre-Training Data
05/2023 Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
05/2023 Landmark Attention: Random-Access Infinite Context Length for Transformers
05/2023 Scaling Data-Constrained Language Models
05/2023 Fine-Tuning Language Models with Just Forward Passes
05/2023 Intriguing Properties of Quantization at Scale
05/2023 Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
05/2023 Blockwise Parallel Transformer for Long Context Large Models
05/2023 The Impact of Positional Encoding on Length Generalization in Transformers
05/2023 Adapting Language Models to Compress Contexts
05/2023 Direct Preference Optimization: Your Language Model is Secretly a Reward Model
06/2023 AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
06/2023 Faster Causal Attention Over Large Sequences Through Sparse Flash Attention
06/2023 Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
06/2023 SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
06/2023 Fine-Tuning Language Models with Advantage-Induced Policy Alignment
06/2023 Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
06/2023 Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
06/2023 Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories
06/2023 Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion
06/2023 Word sense extension
06/2023 Mitigating Transformer Overconfidence via Lipschitz Regularization
06/2023 Recurrent Attention Networks for Long-text Modeling
06/2023 One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
06/2023 SqueezeLLM: Dense-and-Sparse Quantization
06/2023 Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training
06/2023 Propagating Knowledge Updates to LMs Through Distillation
06/2023 Full Parameter Fine-tuning for Large Language Models with Limited Resources
06/2023 A Simple and Effective Pruning Approach for Large Language Models
06/2023 InRank: Incremental Low-Rank Learning
06/2023 Evaluating the Zero-shot Robustness of Instruction-tuned Language Models
06/2023 Learning to Generate Better Than Your LLM (RLGF)
06/2023 Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
06/2023 H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
06/2023 FLuRKA: Fast fused Low-Rank & Kernel Attention
06/2023 Stay on topic with Classifier-Free Guidance
07/2023 AutoST: Training-free Neural Architecture Search for Spiking Transformers
07/2023 Single Sequence Prediction over Reasoning Graphs for Multi-hop QA
07/2023 Shifting Attention to Relevance: Towards the Uncertainty Estimation of Large Language Models
07/2023 Facing off World Model Backbones: RNNs, Transformers, and S4
07/2023 Improving Retrieval-Augmented Large Language Models via Data Importance Learning
07/2023 Teaching Arithmetic to Small Transformers
07/2023 QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models
07/2023 Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
07/2023 Copy Is All You Need (CoG)
07/2023 Multi-Method Self-Training: Improving Code Generation With Text, And Vice Versa
07/2023 Divide & Bind Your Attention for Improved Generative Semantic Nursing
07/2023 Challenges and Applications of Large Language Models
07/2023 Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models
07/2023 QuIP: 2-Bit Quantization of Large Language Models With Guarantees
07/2023 CoRe Optimizer: An All-in-One Solution for Machine Learning
07/2023 Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
08/2023 ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation
08/2023 EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models
08/2023 Activation Addition: Steering Language Models Without Optimization
08/2023 OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
08/2023 Accelerating LLM Inference with Staged Speculative Decoding
08/2023 YaRN: Efficient Context Window Extension of Large Language Models
08/2023 LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
09/2023 Making Large Language Models Better Reasoners with Alignment
09/2023 Data-Juicer: A One-Stop Data Processing System for Large Language Models
09/2023 Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices
09/2023 SLiMe: Segment Like Me
09/2023 Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
09/2023 When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
09/2023 Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
09/2023 Efficient Memory Management for Large Language Model Serving with PagedAttention
09/2023 Cure the headache of Transformers via Collinear Constrained Attention
09/2023 Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
09/2023 LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
09/2023 MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
09/2023 Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models
09/2023 Improving Code Generation by Dynamic Temperature Sampling
09/2023 Efficient Streaming Language Models with Attention Sinks
10/2023 DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models
10/2023 GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
10/2023 Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models
10/2023 Elephant Neural Networks: Born to Be a Continual Learner
10/2023 Ring Attention with Blockwise Transformers for Near-Infinite Context
10/2023 Retrieval meets Long Context Large Language Models
10/2023 DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
10/2023 LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
10/2023 Amortizing intractable inference in large language models (GFlowNet Tuning)
10/2023 SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
10/2023 Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
10/2023 Let Models Speak Ciphers: Multiagent Debate through Embeddings
10/2023 InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
10/2023 CacheGen: Fast Context Loading for Language Model Applications
10/2023 MatFormer: Nested Transformer for Elastic Inference
10/2023 LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
10/2023 Towards End-to-end 4-Bit Inference on Generative Large Language Models (QUIK)
10/2023 Microscaling Data Formats for Deep Learning
10/2023 xVal: A Continuous Number Encoding for Large Language Models
10/2023 An Emulator for Fine-Tuning Large Language Models using Small Language Models
10/2023 Frozen Transformers in Language Models Are Effective Visual Encoder Layers
10/2023 LoBaSS: Gauging Learnability in Supervised Fine-tuning Data
10/2023 Quality-Diversity through AI Feedback
10/2023 DoGE: Domain Reweighting with Generalization Estimation
10/2023 E-Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity
10/2023 Mixture of Tokens: Efficient LLMs through Cross-Example Aggregation
10/2023 Personas as a Way to Model Truthfulness in Language Models
10/2023 Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
11/2023 AWEQ: Post-Training Quantization with Activation-Weight Equalization for Large Language Models
11/2023 FlashDecoding++: Faster Large Language Model Inference on GPUs
11/2023 Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization
11/2023 Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
11/2023 REST: Retrieval-Based Speculative Decoding
11/2023 DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
11/2023 Token-level Adaptation of LoRA Adapters for Downstream Task Generalization
11/2023 Exponentially Faster Language Modelling
11/2023 MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
11/2023 LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning
11/2023 Token Recycling for Efficient Sequential Inference with Vision Transformers
11/2023 Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization
03/2019 Rich Sutton - The Bitter Lesson
06/2022 Yann LeCun - A Path Towards Autonomous Machine Intelligence
01/2023 Lilian Weng - The Transformer Family Version 2.0
01/2023 Lilian Weng - Large Transformer Model Inference Optimization
03/2023 Stanford - Alpaca: A Strong, Replicable Instruction-Following Model
05/2023 OpenAI - Language models can explain neurons in language models
05/2023 Alex Turner - Steering GPT-2-XL by adding an activation vector
06/2023 YyWang - Do We Really Need the KVCache for All Large Language Models
06/2023 kaiokendev - Extending Context is Hard…but not Impossible
06/2023 bloc97 - NTK-Aware Scaled RoPE
07/2023 oobabooga - A direct comparison between llama.cpp, AutoGPTQ, ExLlama, and transformers perplexities
07/2023 Jianlin Su - Carrying the beta position to the end (better NTK RoPE method)
09/2023 FasterDecoding - Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
10/2023 Tri Dao - Flash-Decoding for Long-Context Inference
10/2023 Evan Armstrong - Human-Sourced, AI-Augmented: a promising solution for open source conversational data
11/2023 LMSYS - Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Pub: 21 Mar 2023 02:18 UTC
Edit: 30 Nov 2023 13:07 UTC