| Google | Papers · Blog |
| --- | --- |
| 12/2017 | Attention Is All You Need (Transformers) |
| 10/2018 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
| 11/2019 | Fast Transformer Decoding: One Write-Head is All You Need |
| 02/2020 | GLU Variants Improve Transformer |
| 09/2020 | Efficient Transformers: A Survey |
| 01/2021 | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity |
| 09/2021 | Finetuned Language Models Are Zero-Shot Learners (Flan) |
| 11/2021 | Sparse is Enough in Scaling Transformers |
| 12/2021 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| 01/2022 | LaMDA: Language Models for Dialog Applications |
| 01/2022 | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
| 04/2022 | PaLM: Scaling Language Modeling with Pathways |
| 10/2022 | Scaling Instruction-Finetuned Language Models (Flan-PaLM) |
| 10/2022 | Large Language Models Can Self-Improve |
| 11/2022 | Efficiently Scaling Transformer Inference |
| 03/2023 | PaLM-E: An Embodied Multimodal Language Model |
| 04/2023 | Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference |
| 05/2023 | Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes |
| 05/2023 | FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction |
|
|
| OpenAI | Papers · Blog |
| --- | --- |
| 04/2019 | Generating Long Sequences with Sparse Transformers |
| 01/2020 | Scaling Laws for Neural Language Models |
| 05/2020 | Language Models are Few-Shot Learners (GPT-3) |
| 01/2022 | Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets |
| 03/2022 | Training language models to follow instructions with human feedback (InstructGPT) |
| 07/2022 | Efficient Training of Language Models to Fill in the Middle |
| 03/2023 | GPT-4 Technical Report |
| 04/2023 | Consistency Models |
|
|
| DeepMind | Papers · Blog |
| --- | --- |
| 12/2021 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| 12/2021 | Improving language models by retrieving from trillions of tokens (RETRO) |
| 02/2022 | Competition-Level Code Generation with AlphaCode |
| 02/2022 | Unified Scaling Laws for Routed Language Models |
| 03/2022 | Training Compute-Optimal Large Language Models (Chinchilla) |
| 04/2022 | Flamingo: a Visual Language Model for Few-Shot Learning |
| 05/2022 | A Generalist Agent (Gato) |
| 07/2022 | Formal Algorithms for Transformers |
|
|
| Meta | Papers · Blog |
| --- | --- |
| 04/2019 | fairseq: A Fast, Extensible Toolkit for Sequence Modeling |
| 08/2021 | Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation |
| 05/2022 | OPT: Open Pre-trained Transformer Language Models |
| 11/2022 | Galactica: A Large Language Model for Science |
| 02/2023 | LLaMA: Open and Efficient Foundation Language Models |
| 02/2023 | Toolformer: Language Models Can Teach Themselves to Use Tools |
| 03/2023 | Scaling Expert Language Models with Unsupervised Domain Discovery |
| 03/2023 | SemDeDup: Data-efficient learning at web-scale through semantic deduplication |
| 04/2023 | Segment Anything |
| 04/2023 | A Cookbook of Self-Supervised Learning |
| 05/2023 | Learning to Reason and Memorize with Self-Notes |
|
|
| Microsoft | Papers · Blog |
| --- | --- |
| 01/2022 | DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale |
| 03/2022 | DeepNet: Scaling Transformers to 1,000 Layers |
| 01/2023 | Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases |
| 02/2023 | Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1) |
| 03/2023 | Sparks of Artificial General Intelligence: Early experiments with GPT-4 |
| 03/2023 | TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs |
| 04/2023 | Instruction Tuning with GPT-4 |
| 04/2023 | Inference with Reference: Lossless Acceleration of Large Language Models |
| 04/2023 | Low-code LLM: Visual Programming over LLMs |
| 04/2023 | WizardLM: Empowering Large Language Models to Follow Complex Instructions |
| 04/2023 | MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks |
| 04/2023 | ResiDual: Transformer with Dual Residual Connections |
|
|
| Anthropic | Papers · Blog |
| --- | --- |
| 06/2022 | Softmax Linear Units |
| 07/2022 | Language Models (Mostly) Know What They Know |
| 12/2022 | Constitutional AI: Harmlessness from AI Feedback (Claude) |
|
|
| Hazy Research (Stanford) | Papers · Blog |
| --- | --- |
| 10/2021 | Efficiently Modeling Long Sequences with Structured State Spaces (S4) |
| 04/2022 | Monarch: Expressive Structured Matrices for Efficient and Accurate Training |
| 05/2022 | FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness |
| 12/2022 | Hungry Hungry Hippos: Towards Language Modeling with State Space Models |
| 02/2023 | Simple Hardware-Efficient Long Convolutions for Sequence Modeling |
| 02/2023 | Hyena Hierarchy: Towards Larger Convolutional Language Models |
|
|
| THUDM (Tsinghua University) | Papers · GitHub |
| --- | --- |
| 10/2022 | GLM-130B: An Open Bilingual Pre-Trained Model |
| 03/2023 | CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X |
| 04/2023 | DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task |
|
|
| Open Models | |
| --- | --- |
| 06/2021 | GPT-J-6B: 6B JAX-Based Transformer |
| 03/2022 | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
| 04/2022 | GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
| 11/2022 | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
| 04/2023 | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling |
| 04/2023 | Visual Instruction Tuning (LLaVA) |
| 05/2023 | StarCoder: May the source be with you! |
| 05/2023 | CodeGen2: Lessons for Training LLMs on Programming and Natural Languages |
| 05/2023 | MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs |
| 05/2023 | Otter: A Multi-Modal Model with In-Context Instruction Tuning |
|
|
| Surveys | |
| --- | --- |
| 02/2023 | A Survey on Efficient Training of Transformers |
| 02/2023 | Transformer models: an introduction and catalog |
| 02/2023 | A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT |
| 03/2023 | A Survey of Large Language Models |
| 04/2023 | On Efficient Training of Large-Scale Deep Learning Models: A Literature Review |
|
|
| Various | |
| --- | --- |
| 09/2014 | Neural Machine Translation by Jointly Learning to Align and Translate |
| 10/2019 | Root Mean Square Layer Normalization |
| 01/2021 | Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks |
| 03/2021 | The Low-Rank Simplicity Bias in Deep Networks |
| 06/2021 | LoRA: Low-Rank Adaptation of Large Language Models |
| 03/2022 | Memorizing Transformers |
| 04/2022 | UL2: Unifying Language Learning Paradigms |
| 06/2022 | nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models |
| 08/2022 | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale |
| 09/2022 | Petals: Collaborative Inference and Fine-tuning of Large Models |
| 10/2022 | GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers |
| 10/2022 | DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation |
| 11/2022 | An Algorithm for Routing Vectors in Sequences |
| 12/2022 | Self-Instruct: Aligning Language Model with Self Generated Instructions |
| 12/2022 | Parallel Context Windows Improve In-Context Learning of Large Language Models |
| 12/2022 | Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor |
| 12/2022 | Pretraining Without Attention |
| 12/2022 | The case for 4-bit precision: k-bit Inference Scaling Laws |
| 12/2022 | Prompting Is Programming: A Query Language for Large Language Models |
| 01/2023 | SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient |
| 01/2023 | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot |
| 01/2023 | Memory Augmented Large Language Models are Computationally Universal |
| 02/2023 | Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models |
| 02/2023 | The Wisdom of Hindsight Makes Language Models Better Instruction Followers |
| 03/2023 | CoLT5: Faster Long-Range Transformers with Conditional Computation |
| 03/2023 | High-throughput Generative Inference of Large Language Models with a Single GPU |
| 03/2023 | Meet in the Middle: A New Pre-training Paradigm |
| 03/2023 | Reflexion: an autonomous agent with dynamic memory and self-reflection |
| 03/2023 | Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning |
| 03/2023 | FP8 versus INT8 for efficient deep learning inference |
| 03/2023 | Self-Refine: Iterative Refinement with Self-Feedback |
| 04/2023 | RPTQ: Reorder-based Post-training Quantization for Large Language Models |
| 04/2023 | REFINER: Reasoning Feedback on Intermediate Representations |
| 04/2023 | Generative Agents: Interactive Simulacra of Human Behavior |
| 04/2023 | Compressed Regression over Adaptive Networks |
| 04/2023 | A Cheaper and Better Diffusion Language Model with Soft-Masked Noise |
| 04/2023 | RRHF: Rank Responses to Align Language Models with Human Feedback without tears |
| 04/2023 | CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society |
| 04/2023 | Automatic Gradient Descent: Deep Learning without Hyperparameters |
| 04/2023 | SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models |
| 04/2023 | Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study |
| 04/2023 | Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling |
| 04/2023 | Scaling Transformer to 1M tokens and beyond with RMT |
| 04/2023 | Answering Questions by Meta-Reasoning over Multiple Chains of Thought |
| 04/2023 | Towards Multi-Modal DBMSs for Seamless Querying of Texts and Tables |
| 04/2023 | We're Afraid Language Models Aren't Modeling Ambiguity |
| 04/2023 | The Internal State of an LLM Knows When its Lying |
| 04/2023 | Search-in-the-Chain: Towards the Accurate, Credible and Traceable Content Generation for Complex Knowledge-intensive Tasks |
| 05/2023 | Towards Unbiased Training in Federated Open-world Semi-supervised Learning |
| 05/2023 | Unlimiformer: Long-Range Transformers with Unlimited Length Input |
| 05/2023 | FreeLM: Fine-Tuning-Free Language Model |
| 05/2023 | Cuttlefish: Low-rank Model Training without All The Tuning |
| 05/2023 | AttentionViz: A Global View of Transformer Attention |
|
|
| Articles | |
| --- | --- |
| 03/2019 | Rich Sutton - The Bitter Lesson |
| 04/2021 | EleutherAI - Rotary Embeddings: A Relative Revolution |
| 01/2023 | Lilian Weng - The Transformer Family Version 2.0 |
| 01/2023 | Lilian Weng - Large Transformer Model Inference Optimization |
| 01/2023 | SemiAnalysis - Overview of OpenAI Triton and PyTorch 2.0 |
| 03/2023 | Stanford - Alpaca: A Strong, Replicable Instruction-Following Model |
| 04/2023 | Yohei Nakajima - AsymmeTrix: Asymmetric Vector Embeddings for Directional Similarity Search |