This is a backup of localmodelslinks, based on this post: https://boards.4channel.org/g/thread/93323225#p93328754 It will not be updated. Update: the anon who made the original Local Models Related Papers Rentry has since brought it back up.

Local Models Related Papers

/lmg/ Accelerate
Google Papers Blog
12/2017 Attention Is All You Need (Transformers)
10/2018 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
11/2019 Fast Transformer Decoding: One Write-Head is All You Need
02/2020 GLU Variants Improve Transformer
09/2020 Efficient Transformers: A Survey
01/2021 Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
09/2021 Finetuned Language Models Are Zero-Shot Learners (Flan)
11/2021 Sparse is Enough in Scaling Transformers
12/2021 GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
01/2022 LaMDA: Language Models for Dialog Applications
01/2022 Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
04/2022 PaLM: Scaling Language Modeling with Pathways
10/2022 Scaling Instruction-Finetuned Language Models (Flan-Palm)
10/2022 Large Language Models Can Self-Improve
11/2022 Efficiently Scaling Transformer Inference
03/2023 PaLM-E: An Embodied Multimodal Language Model
04/2023 Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
05/2023 Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
05/2023 FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
OpenAI Papers Blog
04/2019 Generating Long Sequences with Sparse Transformers
01/2020 Scaling Laws for Neural Language Models
05/2020 Language Models are Few-Shot Learners (GPT-3)
01/2022 Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
03/2022 Training language models to follow instructions with human feedback (InstructGPT)
07/2022 Efficient Training of Language Models to Fill in the Middle
03/2023 GPT-4 Technical Report
04/2023 Consistency Models
DeepMind Papers Blog
12/2021 Scaling Language Models: Methods, Analysis & Insights from Training Gopher
12/2021 Improving language models by retrieving from trillions of tokens (RETRO)
02/2022 Competition-Level Code Generation with AlphaCode
02/2022 Unified Scaling Laws for Routed Language Models
03/2022 Training Compute-Optimal Large Language Models (Chinchilla)
04/2022 Flamingo: a Visual Language Model for Few-Shot Learning
05/2022 A Generalist Agent (GATO)
07/2022 Formal Algorithms for Transformers
Meta Papers Blog
04/2019 fairseq: A Fast, Extensible Toolkit for Sequence Modeling
08/2021 Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
05/2022 OPT: Open Pre-trained Transformer Language Models
11/2022 Galactica: A Large Language Model for Science
02/2023 LLaMA: Open and Efficient Foundation Language Models
02/2023 Toolformer: Language Models Can Teach Themselves to Use Tools
03/2023 Scaling Expert Language Models with Unsupervised Domain Discovery
03/2023 SemDeDup: Data-efficient learning at web-scale through semantic deduplication
04/2023 Segment Anything
04/2023 A Cookbook of Self-Supervised Learning
05/2023 Learning to Reason and Memorize with Self-Notes
Microsoft Papers Blog
01/2022 DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
03/2022 DeepNet: Scaling Transformers to 1,000 Layers
01/2023 Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
02/2023 Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1)
03/2023 Sparks of Artificial General Intelligence: Early experiments with GPT-4
03/2023 TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
04/2023 Instruction Tuning with GPT-4
04/2023 Inference with Reference: Lossless Acceleration of Large Language Models
04/2023 Low-code LLM: Visual Programming over LLMs
04/2023 WizardLM: Empowering Large Language Models to Follow Complex Instructions
04/2023 MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks
04/2023 ResiDual: Transformer with Dual Residual Connections
Anthropic Papers Blog
06/2022 Softmax Linear Units
07/2022 Language Models (Mostly) Know What They Know
12/2022 Constitutional AI: Harmlessness from AI Feedback (Claude)
Hazy Research (Stanford) Papers Blog
10/2021 Efficiently Modeling Long Sequences with Structured State Spaces (S4)
04/2022 Monarch: Expressive Structured Matrices for Efficient and Accurate Training
05/2022 FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
12/2022 Hungry Hungry Hippos: Towards Language Modeling with State Space Models
02/2023 Simple Hardware-Efficient Long Convolutions for Sequence Modeling
02/2023 Hyena Hierarchy: Towards Larger Convolutional Language Models
THUDM (Tsinghua University) Papers GitHub
10/2022 GLM-130B: An Open Bilingual Pre-Trained Model
03/2023 CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X
04/2023 DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task
Open Models
06/2021 GPT-J-6B: 6B JAX-Based Transformer
03/2022 CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
04/2022 GPT-NeoX-20B: An Open-Source Autoregressive Language Model
11/2022 BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
04/2023 Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
04/2023 Visual Instruction Tuning (LLaVA)
05/2023 StarCoder: May the source be with you!
05/2023 CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
05/2023 MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs
05/2023 Otter: A Multi-Modal Model with In-Context Instruction Tuning
Surveys
02/2023 A Survey on Efficient Training of Transformers
02/2023 Transformer models: an introduction and catalog
02/2023 A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT
03/2023 A Survey of Large Language Models
04/2023 On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Various
09/2014 Neural Machine Translation by Jointly Learning to Align and Translate
10/2019 Root Mean Square Layer Normalization
01/2021 Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
03/2021 The Low-Rank Simplicity Bias in Deep Networks
06/2021 LoRA: Low-Rank Adaptation of Large Language Models
03/2022 Memorizing Transformers
04/2022 UL2: Unifying Language Learning Paradigms
06/2022 nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models
08/2022 LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
09/2022 Petals: Collaborative Inference and Fine-tuning of Large Models
10/2022 GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
10/2022 DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
11/2022 An Algorithm for Routing Vectors in Sequences
12/2022 Self-Instruct: Aligning Language Model with Self Generated Instructions
12/2022 Parallel Context Windows Improve In-Context Learning of Large Language Models
12/2022 Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
12/2022 Pretraining Without Attention
12/2022 The case for 4-bit precision: k-bit Inference Scaling Laws
12/2022 Prompting Is Programming: A Query Language for Large Language Models
01/2023 SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
01/2023 SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
01/2023 Memory Augmented Large Language Models are Computationally Universal
02/2023 Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models
02/2023 The Wisdom of Hindsight Makes Language Models Better Instruction Followers
03/2023 COLT5: Faster Long-Range Transformers with Conditional Computation
03/2023 High-throughput Generative Inference of Large Language Models with a Single GPU
03/2023 Meet in the Middle: A New Pre-training Paradigm
03/2023 Reflexion: an autonomous agent with dynamic memory and self-reflection
03/2023 Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
03/2023 FP8 versus INT8 for efficient deep learning inference
03/2023 Self-Refine: Iterative Refinement with Self-Feedback
04/2023 RPTQ: Reorder-based Post-training Quantization for Large Language Models
04/2023 REFINER: Reasoning Feedback on Intermediate Representations
04/2023 Generative Agents: Interactive Simulacra of Human Behavior
04/2023 Compressed Regression over Adaptive Networks
04/2023 A Cheaper and Better Diffusion Language Model with Soft-Masked Noise
04/2023 RRHF: Rank Responses to Align Language Models with Human Feedback without tears
04/2023 CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society
04/2023 Automatic Gradient Descent: Deep Learning without Hyperparameters
04/2023 SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models
04/2023 Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study
04/2023 Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
04/2023 Scaling Transformer to 1M tokens and beyond with RMT
04/2023 Answering Questions by Meta-Reasoning over Multiple Chains of Thought
04/2023 Towards Multi-Modal DBMSs for Seamless Querying of Texts and Tables
04/2023 We're Afraid Language Models Aren't Modeling Ambiguity
04/2023 The Internal State of an LLM Knows When its Lying
04/2023 Search-in-the-Chain: Towards the Accurate, Credible and Traceable Content Generation for Complex Knowledge-intensive Tasks
05/2023 Towards Unbiased Training in Federated Open-world Semi-supervised Learning
05/2023 Unlimiformer: Long-Range Transformers with Unlimited Length Input
05/2023 FreeLM: Fine-Tuning-Free Language Model
05/2023 Cuttlefish: Low-rank Model Training without All The Tuning
05/2023 AttentionViz: A Global View of Transformer Attention
Articles
03/2019 Rich Sutton - The Bitter Lesson
04/2021 EleutherAI - Rotary Embeddings: A Relative Revolution
01/2023 Lilian Weng - The Transformer Family Version 2.0
01/2023 Lilian Weng - Large Transformer Model Inference Optimization
01/2023 Semianalysis - Overview of OpenAI Triton and PyTorch 2.0
03/2023 Stanford - Alpaca: A Strong, Replicable Instruction-Following Model
04/2023 Yohei Nakajima - AsymmeTrix: Asymmetric Vector Embeddings for Directional Similarity Search
Pub: 12 May 2023 17:39 UTC
Edit: 13 May 2023 02:54 UTC