Archived News:

Date (MM/DD/YYYY): Description
02/17/2025 Step-Audio: 130B bidirectional speech model & 3B TTS: https://github.com/stepfun-ai/Step-Audio
02/17/2025 Step-Video-T2V, 30B text-to-video model up to 204 frames: https://github.com/stepfun-ai/Step-Video-T2V
02/14/2025 Inference-time scaling of Flux: https://github.com/sayakpaul/tt-scale-flux
02/13/2025 Bakeneko: Qwen2.5 models continually pre-trained on Japanese-specific corpora: https://hf.co/collections/rinna/qwen25-bakeneko-67aa2ef444910bbc55a21222
02/11/2025 DeepScaleR: Training script & dataset reproducing R1's RL: https://github.com/agentica-project/deepscaler
02/10/2025 Huginn: 3.5B latent recurrent-depth proof-of-concept model: https://hf.co/tomg-group-umd/huginn-0125
02/10/2025 Zonos: TTS with voice cloning, emotion control, and audio prefixes: https://github.com/Zyphra/Zonos
02/10/2025 QuEST: Stable Training of LLMs with 1-Bit Weights and Activations: https://github.com/IST-DASLab/QuEST
02/10/2025 KTransformers adds DeepSeek-R1 and V3 support, up to 3~28x speedup: https://github.com/kvcache-ai/ktransformers/releases/tag/v0.2.0
02/04/2025 Physical Intelligence open sources pi0 robotics foundation model: https://pi.website/blog/openpi
01/30/2025 YuE for full-song generation, now under Apache 2.0: https://map-yue.github.io
01/30/2025 Mistral Small 3 base & instruct 24B released: https://mistral.ai/news/mistral-small-3
01/30/2025 Tülu 3 405B released: https://allenai.org/blog/tulu-3-405B
01/28/2025 32B distilled from 5B+ tokens worth of DeepSeek-V3 logits: https://hf.co/arcee-ai/Virtuoso-Medium-v2
01/27/2025 Japanese finetune of R1-Distill-Qwen-32B: https://hf.co/cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
01/27/2025 YuE for full-song generation: https://map-yue.github.io
01/27/2025 Psyche for decentralized model training: https://github.com/PsycheFoundation/psyche
01/27/2025 Qwen2.5 VL released: https://qwenlm.github.io/blog/qwen2.5-vl
01/27/2025 DeepSeek releases Janus-Pro-7B: https://hf.co/deepseek-ai/Janus-Pro-7B
01/26/2025 Alibaba releases MnnLlmApp for Android: https://github.com/alibaba/MNN/blob/master/project/android/apps/MnnLlmApp
01/26/2025 Qwen2.5-1M, with context length up to 1M tokens: https://qwenlm.github.io/blog/qwen2.5-1m
01/25/2025 In-progress reproduction of DeepSeek-R1: https://github.com/huggingface/open-r1
01/25/2025 32B reasoner trained to reduce generation lengths: https://hf.co/NovaSky-AI/Sky-T1-32B-Flash
01/24/2025 TinyZero: Reproduction of DeepSeek R1 Zero: https://github.com/Jiayi-Pan/TinyZero
01/24/2025 Hunyuan-7B-Instruct released: https://hf.co/tencent/Hunyuan-7B-Instruct
01/22/2025 VideoLLaMA3, based on Qwen2.5, released: https://github.com/DAMO-NLP-SG/VideoLLaMA3
01/22/2025 MiniCPM-Omni image understanding support merged: https://github.com/ggerganov/llama.cpp/pull/11289
01/22/2025 UI-TARS: 8B & 72B VLM GUI agent models: https://github.com/bytedance/UI-TARS
01/22/2025 Hunyuan3D-2.0GP runs with less than 6 GB of VRAM: https://github.com/deepbeepmeep/Hunyuan3D-2GP
01/21/2025 BSC-LT, funded by the EU, releases 2B, 7B & 40B models: https://hf.co/collections/BSC-LT/salamandra-66fc171485944df79469043a
01/21/2025 Hunyuan3D 2.0 released: https://hf.co/tencent/Hunyuan3D-2
01/20/2025 DeepSeek releases R1, R1 Zero, & finetuned Qwen and Llama models: https://hf.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d
01/17/2025 Nvidia AceInstruct, finetuned on Qwen2.5-Base: https://hf.co/nvidia/AceInstruct-72B
01/16/2025 OuteTTS-0.3 released with voice cloning & punctuation support: https://hf.co/collections/OuteAI/outetts-03-6786b1ebc7aeb757bc17a2fa
01/15/2025 InternLM3-8B-Instruct released with deep thinking capability: https://hf.co/internlm/internlm3-8b-instruct
01/14/2025 MiniMax-Text-01 released with 456B-A45.9B & hybrid-lightning attention: https://hf.co/MiniMaxAI/MiniMax-Text-01
01/14/2025 MiniCPM-o 2.6 released with multi-image and video understanding, realtime speech conversation, voice cloning, and multimodal live streaming: https://hf.co/openbmb/MiniCPM-o-2_6
01/08/2025 Phi-4 weights released: https://hf.co/microsoft/phi-4
01/06/2025 NVIDIA Project DIGITS announced, capable of running 200B models: https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwell-on-every-desk-and-at-every-ai-developers-fingertips
01/06/2025 Nvidia releases Cosmos world foundation models: https://github.com/NVIDIA/Cosmos
01/04/2025 DeepSeek V3 support merged: https://github.com/ggerganov/llama.cpp/pull/11049
12/26/2024 CogAgent-9B updated version released: https://hf.co/THUDM/cogagent-9b-20241220
12/26/2024 DeepSeek-V3 instruct released: https://hf.co/deepseek-ai/DeepSeek-V3
12/25/2024 DeepSeek-V3-Base 671B-A37B released: https://hf.co/deepseek-ai/DeepSeek-V3-Base
12/24/2024 QVQ: 72B visual reasoning model released: https://qwenlm.github.io/blog/qvq-72b-preview
12/24/2024 Infinity 2B, bitwise autoregressive text-to-image model: https://hf.co/FoundationVision/Infinity
12/20/2024 RWKV-7 released: https://hf.co/BlinkDL/rwkv-7-world
12/19/2024 ModernBERT: finally, a replacement for BERT: https://hf.co/blog/modernbert
12/18/2024 Bamba-9B, hybrid model trained by IBM, Princeton, CMU, and UIUC on open data: https://hf.co/blog/bamba
12/18/2024 Apollo unreleased: https://github.com/Apollo-LMMs/Apollo
12/18/2024 Granite 3.1 released: https://hf.co/ibm-granite/granite-3.1-8b-instruct
12/17/2024 Falcon3 models released, including b1.58 quants: https://hf.co/blog/falcon3
12/16/2024 Apollo: Qwen2.5 models finetuned by Meta GenAI for video understanding: https://hf.co/Apollo-LMMs/Apollo-7B-t32
12/15/2024 CosyVoice2-0.5B released: https://funaudiollm.github.io/cosyvoice2
12/14/2024 Qwen2VL support merged: https://github.com/ggerganov/llama.cpp/pull/10361
12/13/2024 Sberbank releases Russian model based on DeepseekForCausalLM: https://hf.co/ai-sage/GigaChat-20B-A3B-instruct
12/13/2024 DeepSeek-VL2/-Small/-Tiny release. MoE vision models with 4.5B/2.8B/1.0B active parameters: https://hf.co/deepseek-ai/deepseek-vl2
12/13/2024 Cohere releases Command-R7B: https://cohere.com/blog/command-r7b
12/12/2024 QRWKV6-32B-Instruct preview released, a linear-attention model converted from Qwen2.5-32B-Instruct: https://hf.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1
12/12/2024 LoRA training for HunyuanVideo: https://github.com/tdrussell/diffusion-pipe
12/10/2024 HF decides not to limit public storage: https://hf.co/posts/julien-c/388331843225875
12/10/2024 Upgraded version of DeepSeek-V2.5: https://hf.co/deepseek-ai/DeepSeek-V2.5-1210
12/09/2024 LG releases EXAONE-3.5: https://hf.co/LGAI-EXAONE/EXAONE-3.5-32B-Instruct
12/06/2024 Microsoft releases TRELLIS, a large 3D asset generation model: https://github.com/Microsoft/TRELLIS
12/06/2024 Qwen2-VL released: https://hf.co/Qwen/Qwen2-VL-72B
12/06/2024 InternVL2.5 released: https://hf.co/OpenGVLab/InternVL2_5-78B
12/06/2024 Meta releases Llama-3.3-70B-Instruct: https://hf.co/meta-llama/Llama-3.3-70B-Instruct
12/05/2024 PaliGemma 2: https://hf.co/collections/google/paligemma-2-release-67500e1e1dbfdd4dee27ba48
12/04/2024 Fish Speech V1.5 released: https://hf.co/fishaudio/fish-speech-1.5
12/03/2024 HunyuanVideo: 13B large video generation model released: https://hf.co/tencent/HunyuanVideo
12/02/2024 Nous trains a 15B model using DisTrO: https://distro.nousresearch.com
11/29/2024 INTELLECT-1 released: https://hf.co/PrimeIntellect/INTELLECT-1-Instruct
11/27/2024 QwQ-32B-Preview, reasoning finetune of Qwen2.5-32B-Instruct: https://qwenlm.github.io/blog/qwq-32b-preview
11/26/2024 OLMo 2 released: https://hf.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc
11/26/2024 Anon re-implements Sparse Matrix Tuning paper: https://github.com/HeroMines/SMFT
11/25/2024 Qwen2VL integrated with Flux: https://github.com/erwold/qwen2vl-flux
11/25/2024 Speculative decoding added to llama-server: https://github.com/ggerganov/llama.cpp/pull/10455
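For reference, the idea behind that feature: a small draft model proposes a few tokens, and the big target model verifies them in a single forward pass, accepting the longest agreeing prefix. A toy greedy-verification sketch in PyTorch follows; this is not llama.cpp's implementation, and `target`/`draft` are assumed to be HF-style callables returning `.logits`:

```python
import torch

@torch.no_grad()
def speculative_decode(target, draft, ids, k: int = 4, max_new: int = 64):
    """ids: (1, seq) prompt tokens; returns the extended sequence."""
    produced = 0
    while produced < max_new:
        start = ids.shape[1]
        # 1) Draft model proposes k tokens autoregressively (cheap).
        proposal = ids
        for _ in range(k):
            logits = draft(proposal).logits[:, -1, :]
            proposal = torch.cat([proposal, logits.argmax(-1, keepdim=True)], dim=-1)
        # 2) Target scores the whole proposal in one pass (one big-model call).
        tlogits = target(proposal).logits
        tpred = tlogits[:, start - 1:-1, :].argmax(-1)  # target's greedy picks
        drafted = proposal[:, start:]
        # 3) Accept the longest agreeing prefix, plus one token from the target
        #    (either its correction or a free extra token).
        agree = (tpred == drafted)[0].int()
        n_ok = int(agree.cumprod(0).sum())
        bonus = tlogits[:, start - 1 + n_ok, :].argmax(-1, keepdim=True)
        ids = torch.cat([ids, drafted[:, :n_ok], bonus], dim=-1)
        produced += n_ok + 1
    return ids
```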
11/22/2024 LTX-Video: Real-time video generation on a single 4090: https://github.com/Lightricks/LTX-Video
11/21/2024 Tülu3: Instruct finetunes on top of Llama 3.1 base: https://hf.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5
11/20/2024 LLaMA-Mesh weights released: https://hf.co/Zhengyi/LLaMA-Mesh
11/18/2024 Mistral and Pixtral Large Instruct 2411 released: https://mistral.ai/news/pixtral-large
11/12/2024 Qwen2.5-Coder series released: https://qwenlm.github.io/blog/qwen2.5-coder-family
11/08/2024 Sarashina2-8x70B, a Japan-trained LLM model: https://hf.co/sbintuitions/sarashina2-8x70b
11/05/2024 Hunyuan-Large released with 389B and 52B active: https://hf.co/tencent/Tencent-Hunyuan-Large
10/31/2024 QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
10/31/2024 Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b
10/31/2024 Transluce open-sources AI investigation toolkit: https://github.com/TransluceAI/observatory
10/30/2024 TokenFormer models with fully attention-based architecture: https://hf.co/Haiyang-W/TokenFormer-1-5B
10/30/2024 MaskGCT: Zero-Shot TTS with Masked Generative Codec Transformer: https://hf.co/amphion/MaskGCT
10/25/2024 GLM-4-Voice: End-to-end speech and text model based on GLM-4-9B: https://hf.co/THUDM/glm-4-voice-9b
10/24/2024 Aya Expanse released with 23 supported languages: https://hf.co/CohereForAI/aya-expanse-32b
10/22/2024 genmoai-smol allows video inference on 24 GB VRAM: https://github.com/victorchall/genmoai-smol
10/22/2024 Mochi-1: 10B Asymmetric Diffusion Transformer text-to-video model: https://hf.co/genmo/mochi-1-preview
10/22/2024 Pangea: Open-source multilingual multimodal LLM supporting 39 languages: https://neulab.github.io/Pangea
10/21/2024 IBM releases Granite 3.0: https://hf.co/collections/ibm-granite/granite-30-models-66fdb59bbb54785c3512114f
10/18/2024 New research, models, and datasets from Meta FAIR: https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-lingua
10/18/2024 bitnet.cpp: Official inference framework for 1-bit LLMs: https://github.com/microsoft/BitNet
10/18/2024 DeepSeek releases Janus-1.3B with multimodal understanding and generation: https://hf.co/deepseek-ai/Janus-1.3B
10/16/2024 Ministral 8B instruct model released: https://mistral.ai/news/ministraux
10/15/2024 PLaMo-100B: English and Japanese base model: https://hf.co/pfnet/plamo-100b
10/15/2024 Llama-3.1-70B-Instruct customized by NVIDIA: https://hf.co/nvidia/Llama-3.1-Nemotron-70B-Instruct
10/14/2024 Llama 3.1 linearized: https://hf.co/collections/hazyresearch/lolcats-670ca4341699355b61238c37
10/14/2024 Zamba2-7B released: https://www.zyphra.com/post/zamba2-7b
10/14/2024 Ichigo, voice-to-voice model based on Llama 3.1, released: https://homebrew.ltd/blog/llama-learns-to-talk
10/12/2024 Fast multilingual TTS with voice cloning, based on flow matching with DiT: https://github.com/SWivid/F5-TTS
10/11/2024 14B cross-architecture distillation model: https://hf.co/arcee-ai/SuperNova-Medius
10/10/2024 Aria: 25.3B, 3.9B active, multimodal native MoE model with 64k context: https://hf.co/rhymes-ai/Aria
09/27/2024 Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
09/25/2024 Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
09/25/2024 Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
09/24/2024 Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
09/18/2024 Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5
09/18/2024 Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
09/17/2024 Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release
09/12/2024 DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm
09/12/2024 LLaMA-Omni: Multimodal LLM with seamless speech interaction: https://hf.co/ICTNLP/Llama-3.1-8B-Omni
09/11/2024 Fish Speech multilingual TTS with voice replication: https://hf.co/fishaudio/fish-speech-1.4
09/11/2024 Pixtral: 12B with image input vision adapter: https://xcancel.com/mistralai/status/1833758285167722836
09/11/2024 Solar Pro Preview, Phi-3-medium upscaled to 22B: https://hf.co/upstage/solar-pro-preview-instruct
09/06/2024 DeepSeek-V2.5 released, combines Chat and Instruct: https://hf.co/deepseek-ai/DeepSeek-V2.5
09/05/2024 FluxMusic: Text-to-Music Generation with Rectified Flow Transformer: https://github.com/feizc/fluxmusic
09/04/2024 Yi-Coder: 1.5B & 9B with 128K context and 52 programming languages: https://hf.co/blog/lorinma/yi-coder
09/04/2024 OLMoE 7x1B fully open source model release: https://hf.co/allenai/OLMoE-1B-7B-0924-Instruct
08/30/2024 Command models get an August refresh: https://docs.cohere.com/changelog/command-gets-refreshed
08/29/2024 Qwen2-VL 2B & 7B image+video models released: https://qwenlm.github.io/blog/qwen2-vl/
08/27/2024 CogVideoX-5B, diffusion transformer text-to-video model: https://hf.co/THUDM/CogVideoX-5b
08/22/2024 Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
08/20/2024 Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct
08/16/2024 MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
08/15/2024 Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
08/12/2024 Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b
08/09/2024 Qwen large audio-input language models: https://hf.co/Qwen/Qwen2-Audio-7B-Instruct
08/07/2024 LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct
08/05/2024 vLLM GGUF loading support merged: https://github.com/vllm-project/vllm/pull/5191
07/31/2024 Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
07/27/2024 Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
07/25/2024 BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
07/24/2024 Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
07/23/2024 Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
07/22/2024 llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
07/18/2024 Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
07/18/2024 Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/
07/16/2024 Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1
07/16/2024 MathΣtral Instruct based on Mistral 7B: https://hf.co/mistralai/mathstral-7B-v0.1
07/13/2024 Llama 3 405B coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271
07/09/2024 Anole, based on Chameleon, for interleaved image-text generation: https://hf.co/GAIR/Anole-7b-v0.1
07/07/2024 Support for glm3 and glm4 merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/8031
07/02/2024 Japanese LLaMA-based model pre-trained on 2T tokens: https://hf.co/cyberagent/calm3-22b-chat
06/28/2024 Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
06/27/2024 Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
06/27/2024 Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
06/25/2024 Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io
06/23/2024 Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931
06/18/2024 Meta Research releases multimodal 34B, audio, and multi-token prediction models: https://ai.meta.com/blog/meta-fair-research-new-releases
06/17/2024 DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
06/14/2024 Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct
06/14/2024 Nvidia collection of Mamba-2-based research models: https://hf.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c
06/11/2024 Google releases RecurrentGemma, based on a hybrid RNN architecture: https://hf.co/google/recurrentgemma-9b-it
06/06/2024 Qwen2 released, with better benchmarks than Llama 3: https://qwenlm.github.io/blog/qwen2/
06/01/2024 KV cache quantization support merged: https://github.com/ggerganov/llama.cpp/pull/7527
05/31/2024 K2: Fully-reproducible model outperforming Llama 2 70B using 35% less compute: https://hf.co/LLM360/K2
05/29/2024 Mistral releases Codestral-22B: https://mistral.ai/news/codestral/
05/28/2024 DeepSeek-V2 support officially merged: https://github.com/ggerganov/llama.cpp/pull/7519
05/24/2024 Draft PR adds support for Jamba: https://github.com/ggerganov/llama.cpp/pull/7531
05/23/2024 Cohere releases 8B & 35B Aya 23 with multilingual capabilities: https://hf.co/collections/CohereForAI/c4ai-aya-23-664f4cda3fa1a30553b221dc
05/22/2024 Mistral v0.3 models with function calling and extended vocab: https://github.com/mistralai/mistral-inference#model-download
05/21/2024 Fork of llama.cpp adds DeepSeek-V2 support: https://hf.co/leafspark/DeepSeek-V2-Chat-GGUF
05/21/2024 Microsoft launches Phi-3 small (7B) and medium (14B) under MIT: https://aka.ms/phi3-hf
05/16/2024 DeepSeek AI releases 16B V2-Lite: https://hf.co/deepseek-ai/DeepSeek-V2-Lite-Chat
05/14/2024 PaliGemma, Gemma 2, and LLM Comparator: https://developers.googleblog.com/gemma-family-and-toolkit-expansion-io-2024
05/12/2024 Yi-1.5 Released with Improved Coding, Math, and Reasoning Capabilities: https://hf.co/collections/01-ai/yi-15-2024-05-663f3ecab5f815a3eaca7ca8
05/11/2024 Japanese 13B model trained on CPU supercomputer: https://hf.co/Fugaku-LLM/Fugaku-LLM-13B
05/11/2024 OneBit: Towards Extremely Low-bit LLMs: https://github.com/xuyuzhuang11/OneBit
05/10/2024 Gemma 2B - 10M Context: https://hf.co/mustafaaljadery/gemma-2B-10M
05/08/2024 Refuel LLM-2 for data labeling, enrichment, and cleaning: https://hf.co/refuelai/Llama-3-Refueled
05/08/2024 OpenAI releases its Model Spec: https://cdn.openai.com/spec/model-spec-2024-05-08.html
05/06/2024 IBM releases Granite Code Models: https://github.com/ibm-granite/granite-code-models
05/02/2024 Nvidia releases Llama3-ChatQA-1.5, excels at QA & RAG: https://chatqa-project.github.io/
05/01/2024 KAN: Kolmogorov-Arnold Networks: https://arxiv.org/abs/2404.19756
05/01/2024 Orthogonalized Llama-3-8b: https://hf.co/hjhj3168/Llama-3-8b-Orthogonalized-exl2
04/27/2024 Refusal in LLMs is mediated by a single direction: https://alignmentforum.org/posts/jGuXSZgv6qfdhMCuJ
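For context on the two entries above: the post derives a single "refusal direction" as the difference of mean residual-stream activations on harmful vs. harmless prompts, then removes it; the orthogonalized Llama-3-8b applies the same trick at the weight level. A minimal PyTorch sketch of the idea, with illustrative names and toy stand-in activations (the post itself works in TransformerLens):

```python
import torch

def refusal_direction(harmful_acts: torch.Tensor,
                      harmless_acts: torch.Tensor) -> torch.Tensor:
    """Unit-norm difference-of-means direction, shape (d_model,)."""
    r = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return r / r.norm()

def ablate(x: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
    """Remove the component of activations x along direction r."""
    return x - (x @ r).unsqueeze(-1) * r

# Toy stand-ins for activations captured at one layer: (n_prompts, d_model).
harmful = torch.randn(128, 4096) + 0.5
harmless = torch.randn(128, 4096)
r = refusal_direction(harmful, harmless)
x = torch.randn(8, 4096)
print((ablate(x, r) @ r).abs().max())  # effectively zero: component along r is gone
```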
04/24/2024 Snowflake Arctic Instruct 128x3B MoE released: https://hf.co/Snowflake/snowflake-arctic-instruct
04/23/2024 Phi-3 Mini model released: https://hf.co/microsoft/Phi-3-mini-128k-instruct-onnx
04/21/2024 Llama3 70B pruned to 42B parameters: https://hf.co/chargoddard/llama3-42b-v0
04/18/2024 Llama3 8B, 70B pretrained and instruction-tuned models released: https://llama.meta.com/llama3/
04/17/2024 Mixtral-8x22B-Instruct-v0.1 released: https://mistral.ai/news/mixtral-8x22b/
04/15/2024 Microsoft AI unreleases WizardLM 2: https://web.archive.org/web/20240415221214/https://wizardlm.github.io/WizardLM2/
04/15/2024 Microsoft AI releases WizardLM 2, including Mixtral 8x22B finetune: https://wizardlm.github.io/WizardLM2/
04/09/2024 Mistral releases Mixtral-8x22B: https://twitter.com/MistralAI/status/1777869263778291896
04/09/2024 Llama 3 coming in the next month: https://techcrunch.com/2024/04/09/meta-confirms-that-its-llama-3-open-source-llm-is-coming-in-the-next-month/
04/08/2024 StableLM 2 12B released: https://huggingface.co/stabilityai/stablelm-2-12b
04/05/2024 Qwen1.5-32B released with GQA: https://huggingface.co/Qwen/Qwen1.5-32B
04/04/2024 Command R+ released with GQA, 104B, 128K context: https://huggingface.co/CohereForAI/c4ai-command-r-plus
03/28/2024 MiniGemini: Dense and MoE vision models: https://github.com/dvlab-research/MiniGemini
03/28/2024 Jamba 52B MoE released with 256k context: https://huggingface.co/ai21labs/Jamba-v0.1
03/27/2024 Databricks releases 132B MoE model: https://huggingface.co/collections/databricks/dbrx-6601c0852a0cdd3c59f71962
03/23/2024 Mistral releases 7B v0.2 base model with 32k context: https://models.mistralcdn.com/mistral-7b-v0-2/mistral-7B-v0.2.tar
03/23/2024 Grok support merged: https://github.com/ggerganov/llama.cpp/pull/6204
03/17/2024 xAI open sources Grok: https://github.com/xai-org/grok
03/15/2024 Control vector support in llama.cpp: https://github.com/ggerganov/llama.cpp/pull/5970
03/11/2024 New 35B RAG model with 128K context: https://huggingface.co/CohereForAI/c4ai-command-r-v01
03/11/2024 This week, xAI will open source Grok: https://twitter.com/elonmusk/status/1767108624038449405
02/28/2024 The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits: https://arxiv.org/abs/2402.17764
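The recipe at the core of that paper, for reference: weights are quantized to ternary {-1, 0, +1} with a per-tensor absmean scale. A minimal PyTorch sketch (the function name is mine, not from an official codebase):

```python
import torch

def absmean_quantize(w: torch.Tensor, eps: float = 1e-5):
    """BitNet b1.58-style ternary quantization: w ~= w_q * gamma."""
    gamma = w.abs().mean()                          # per-tensor absmean scale
    w_q = (w / (gamma + eps)).round().clamp(-1, 1)  # values in {-1, 0, +1}
    return w_q, gamma

w = torch.randn(4096, 4096)
w_q, gamma = absmean_quantize(w)
print(w_q.unique())  # tensor([-1., 0., 1.])
```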
02/27/2024 Mistral re-adds the notice to its website: https://www.reddit.com/r/LocalLLaMA/comments/1b18817/mistral_changing_and_then_reversing_website/
02/26/2024 Mistral partners with Microsoft, removes mentions of open models from website: https://siliconangle.com/2024/02/26/now-microsoft-partner-mistral-ai-challenges-openai-three-new-llms/
02/21/2024 Google releases two open models, Gemma: https://blog.google/technology/developers/gemma-open-models/
02/18/2024 1.5-bit quant for lcpp merged: https://github.com/ggerganov/llama.cpp/pull/5453
02/17/2024 Kobold.cpp-1.58 prebuilt released: https://github.com/LostRuins/koboldcpp/releases/tag/v1.58
02/16/2024 Exl2 adds Qwen support: https://github.com/turboderp/exllamav2/issues/334
02/16/2024 ZLUDA for lcpp merged, though the outlook is questionable at best: https://github.com/vosen/ZLUDA/pull/102
06/23/2023 Ooba's preset arena results and SuperHOT 16k prototype releases
06/22/2023 Vicuna 33B (preview), OpenLLaMA 7B scaled and MPT 30B released
06/20/2023 SuperHOT Prototype 2 w/ 8K context released >>94191797
06/18/2023 Minotaur 15B 8K, WizardLM 7B Uncensored v1.0 and Vicuna 1.3 released
06/17/2023 exllama support merged into ooba; API server rewrite merged into llama.cpp
06/16/2023 OpenLlama 13B released
06/16/2023 Airoboros GPT-4 v1.2 released
06/16/2023 Robin-33B-V2 released
06/16/2023 Dan's 30B Personality Engine LoRA released
06/14/2023 WizardCoder 15B Released
06/14/2023 CUDA full GPU acceleration merged in llama.cpp
06/10/2023 First Landmark Attention models released >>93993800
06/08/2023 Openllama 3B and 7B released
06/07/2023 StarCoderPlus / StarChat-β released
06/07/2023 chronos-33b released
06/06/2023 RedPajama 7B released + Instruct&Chat
06/06/2023 WizardLM 30B v1.0 released
06/05/2023 k-quantization released for llama.cpp
06/03/2023 Nous-Hermes-13b released
06/03/2023 WizardLM-Uncensored-Falcon-40b released
05/27/2023 FalconLM releases Falcon-7B & 40B, new foundation models
05/26/2023 BluemoonRP 30B 4K released
05/25/2023 QLoRA and 4bit bitsandbytes released
05/23/2023 exllama transformer rewrite offers around 2x t/s increase for GPU models
05/22/2023 SuperHOT 13B prototype & WizardLM Uncensored 30B released
05/19/2023 RTX 30 series 15% performance gains, quantization breaking changes again >>93536523
05/19/2023 PygmalionAI release 13B Pyg & Meth
05/18/2023 VicunaUnlocked-30B released
05/14/2023 llama.cpp quantization change breaks existing Q4 & Q5 models; they must be re-quantized
05/13/2023 llama.cpp GPU acceleration merged into master >>93403996 >>93404319
05/10/2023 GPU-accelerated token generation >>93334002
05/06/2023 MPT 7B, 65k context model trained on 1T tokens: https://huggingface.co/mosaicml/mpt-7b-storywriter
05/05/2023 GPT4-x-AlpacaDente2-30b released: https://huggingface.co/Aeala/GPT4-x-AlpacaDente2-30b
05/04/2023 Allegedly leaked Google document fretting over open-source LLMs: https://www.semianalysis.com/p/google-we-have-no-moat-and-neither
05/04/2023 StarCoder, a 15.5B parameter model trained on 80+ programming languages: https://huggingface.co/bigcode/starcoderbase
04/30/2023 Uncucked Vicuna 13B released: https://huggingface.co/reeducator/vicuna-13b-free
04/30/2023 PygmalionAI release two 7B LLaMA-based models: https://huggingface.co/PygmalionAI
04/29/2023 GPT4 X Alpasta 30B Merge: https://huggingface.co/MetaIX/GPT4-X-Alpasta-30b-4bit
04/25/2023 Proxy script for Tavern via Kobold/webui, increases LLaMA output quality: https://github.com/anon998/simple-proxy-for-tavern
04/23/2023 OpenAssistant 30B released & quantized: https://huggingface.co/MetaIX/OpenAssistant-Llama-30b-4bit
04/22/2023 SuperCOT LoRA (by kaiokendev), merged by helpful anons: https://huggingface.co/tsumeone/llama-30b-supercot-4bit-128g-cuda https://huggingface.co/ausboss/llama-13b-supercot-4bit-128g
04/22/2023 OpenAssistant "releases" XORs again, deletes them soon after... again
04/21/2023 StableLM models performing terribly, are apparently broken: https://github.com/Stability-AI/StableLM/issues/30