Archived News:

Date (MM/DD/YYYY): Description
02/17/2025 Step-Audio: 130B bidirectional speech model & 3B TTS: https://github.com/stepfun-ai/Step-Audio
02/17/2025 Step-Video-T2V, 30B text-to-video model up to 204 frames: https://github.com/stepfun-ai/Step-Video-T2V
02/14/2025 Inference-time scaling of Flux: https://github.com/sayakpaul/tt-scale-flux
02/13/2025 Bakeneko: Qwen2.5 models continually pre-trained on Japanese-specific corpora: https://hf.co/collections/rinna/qwen25-bakeneko-67aa2ef444910bbc55a21222
02/11/2025 DeepScaleR: Training script & dataset reproducing R1's RL: https://github.com/agentica-project/deepscaler
02/10/2025 Huginn: 3.5B latent recurrent-depth proof-of-concept model: https://hf.co/tomg-group-umd/huginn-0125
02/10/2025 Zonos: TTS with voice cloning, emotion control, and audio prefixes: https://github.com/Zyphra/Zonos
02/10/2025 QuEST: Stable Training of LLMs with 1-Bit Weights and Activations: https://github.com/IST-DASLab/QuEST
02/10/2025 KTransformers adds DeepSeek-R1 and V3 support, up to 3~28x speedup: https://github.com/kvcache-ai/ktransformers/releases/tag/v0.2.0
02/04/2025 Physical Intelligence open sources pi0 robotics foundation model: https://pi.website/blog/openpi
01/30/2025 YuE for full-song generation, now under Apache 2.0: https://map-yue.github.io
01/30/2025 Mistral Small 3 base & instruct 24B released: https://mistral.ai/news/mistral-small-3
01/30/2025 Tülu 3 405B released: https://allenai.org/blog/tulu-3-405B
01/28/2025 32B distilled from 5B+ tokens worth of DeepSeek-V3 logits: https://hf.co/arcee-ai/Virtuoso-Medium-v2
01/27/2025 Japanese finetune of R1-Distill-Qwen-32B: https://hf.co/cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
01/27/2025 YuE for full-song generation: https://map-yue.github.io
01/27/2025 Psyche for decentralized model training: https://github.com/PsycheFoundation/psyche
01/27/2025 Qwen2.5 VL released: https://qwenlm.github.io/blog/qwen2.5-vl
01/27/2025 DeepSeek releases Janus-Pro-7B: https://hf.co/deepseek-ai/Janus-Pro-7B
01/26/2025 Alibaba releases MnnLlmApp for Android: https://github.com/alibaba/MNN/blob/master/project/android/apps/MnnLlmApp
01/26/2025 Qwen2.5-1M, with context length up to 1M tokens: https://qwenlm.github.io/blog/qwen2.5-1m
01/25/2025 In-progress reproduction of DeepSeek-R1: https://github.com/huggingface/open-r1
01/25/2025 32B reasoner trained to reduce generation lengths: https://hf.co/NovaSky-AI/Sky-T1-32B-Flash
01/24/2025 TinyZero: Reproduction of DeepSeek R1 Zero: https://github.com/Jiayi-Pan/TinyZero
01/24/2025 Hunyuan-7B-Instruct released: https://hf.co/tencent/Hunyuan-7B-Instruct
01/22/2025 VideoLLaMA3, based on Qwen2.5, released: https://github.com/DAMO-NLP-SG/VideoLLaMA3
01/22/2025 MiniCPM-Omni image understanding support merged: https://github.com/ggerganov/llama.cpp/pull/11289
01/22/2025 UI-TARS: 8B & 72B VLM GUI agent models: https://github.com/bytedance/UI-TARS
01/22/2025 Hunyuan3D-2.0GP runs with less than 6 GB of VRAM: https://github.com/deepbeepmeep/Hunyuan3D-2GP
01/21/2025 BSC-LT, funded by the EU, releases 2B, 7B & 40B models: https://hf.co/collections/BSC-LT/salamandra-66fc171485944df79469043a
01/21/2025 Hunyuan3D 2.0 released: https://hf.co/tencent/Hunyuan3D-2
01/20/2025 DeepSeek releases R1, R1 Zero, & finetuned Qwen and Llama models: https://hf.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d
01/17/2025 Nvidia AceInstruct, finetuned on Qwen2.5-Base: https://hf.co/nvidia/AceInstruct-72B
01/16/2025 OuteTTS-0.3 released with voice cloning & punctuation support: https://hf.co/collections/OuteAI/outetts-03-6786b1ebc7aeb757bc17a2fa
01/15/2025 InternLM3-8B-Instruct released with deep thinking capability: https://hf.co/internlm/internlm3-8b-instruct
01/14/2025 MiniMax-Text-01 released with 456B-A45.9B & hybrid-lightning attention: https://hf.co/MiniMaxAI/MiniMax-Text-01
01/14/2025 MiniCPM-o 2.6 released with multi-image and video understanding, realtime speech conversation, voice cloning, and multimodal live streaming: https://hf.co/openbmb/MiniCPM-o-2_6
01/08/2025 Phi-4 weights released: https://hf.co/microsoft/phi-4
01/06/2025 NVIDIA Project DIGITS announced, capable of running 200B models: https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwell-on-every-desk-and-at-every-ai-developers-fingertips
01/06/2025 Nvidia releases Cosmos world foundation models: https://github.com/NVIDIA/Cosmos
01/04/2025 DeepSeek V3 support merged: https://github.com/ggerganov/llama.cpp/pull/11049
12/26/2024 CogAgent-9B updated version released: https://hf.co/THUDM/cogagent-9b-20241220
12/26/2024 DeepSeek-V3 instruct released: https://hf.co/deepseek-ai/DeepSeek-V3
12/25/2024 DeepSeek-V3-Base 671B-A37B released: https://hf.co/deepseek-ai/DeepSeek-V3-Base
12/24/2024 QVQ: 72B visual reasoning model released: https://qwenlm.github.io/blog/qvq-72b-preview
12/24/2024 Infinity 2B, bitwise autoregressive text-to-image model: https://hf.co/FoundationVision/Infinity
12/20/2024 RWKV-7 released: https://hf.co/BlinkDL/rwkv-7-world
12/19/2024 ModernBERT: finally, a replacement for BERT: https://hf.co/blog/modernbert
12/18/2024 Bamba-9B, hybrid model trained by IBM, Princeton, CMU, and UIUC on open data: https://hf.co/blog/bamba
12/18/2024 Apollo unreleased: https://github.com/Apollo-LMMs/Apollo
12/18/2024 Granite 3.1 released: https://hf.co/ibm-granite/granite-3.1-8b-instruct
12/17/2024 Falcon3 models released, including b1.58 quants: https://hf.co/blog/falcon3
12/16/2024 Apollo: Qwen2.5 models finetuned by Meta GenAI for video understanding: https://hf.co/Apollo-LMMs/Apollo-7B-t32
12/15/2024 CosyVoice2-0.5B released: https://funaudiollm.github.io/cosyvoice2
12/14/2024 Qwen2VL support merged: https://github.com/ggerganov/llama.cpp/pull/10361
12/13/2024 Sberbank releases Russian model based on DeepseekForCausalLM: https://hf.co/ai-sage/GigaChat-20B-A3B-instruct
12/13/2024 DeepSeek-VL2/-Small/-Tiny release. MoE vision models with 4.5B/2.8B/1.0B active parameters: https://hf.co/deepseek-ai/deepseek-vl2
12/13/2024 Cohere releases Command-R7B: https://cohere.com/blog/command-r7b
12/12/2024 QRWKV6-32B-Instruct preview released, a linear-attention model converted from Qwen2.5-32B-Instruct: https://hf.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1
12/12/2024 LoRA training for HunyuanVideo: https://github.com/tdrussell/diffusion-pipe
12/10/2024 HF decides not to limit public storage: https://hf.co/posts/julien-c/388331843225875
12/10/2024 Upgraded version of DeepSeek-V2.5: https://hf.co/deepseek-ai/DeepSeek-V2.5-1210
12/09/2024 LG releases EXAONE-3.5: https://hf.co/LGAI-EXAONE/EXAONE-3.5-32B-Instruct
12/06/2024 Microsoft releases TRELLIS, a large 3D asset generation model: https://github.com/Microsoft/TRELLIS
12/06/2024 Qwen2-VL released: https://hf.co/Qwen/Qwen2-VL-72B
12/06/2024 InternVL2.5 released: https://hf.co/OpenGVLab/InternVL2_5-78B
12/06/2024 Meta releases Llama-3.3-70B-Instruct: https://hf.co/meta-llama/Llama-3.3-70B-Instruct
12/05/2024 PaliGemma 2: https://hf.co/collections/google/paligemma-2-release-67500e1e1dbfdd4dee27ba48
12/04/2024 Fish Speech V1.5 released: https://hf.co/fishaudio/fish-speech-1.5
12/03/2024 HunyuanVideo: 13B large video generation model released: https://hf.co/tencent/HunyuanVideo
12/02/2024 Nous trains a 15B model using DisTrO: https://distro.nousresearch.com
11/29/2024 INTELLECT-1 released: https://hf.co/PrimeIntellect/INTELLECT-1-Instruct
11/27/2024 QwQ-32B-Preview, reasoning finetune of Qwen2.5-32B-Instruct: https://qwenlm.github.io/blog/qwq-32b-preview
11/26/2024 OLMo 2 released: https://hf.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc
11/26/2024 Anon re-implements Sparse Matrix Tuning paper: https://github.com/HeroMines/SMFT
11/25/2024 Qwen2VL integrated with Flux: https://github.com/erwold/qwen2vl-flux
11/25/2024 Speculative decoding added to llama-server: https://github.com/ggerganov/llama.cpp/pull/10455
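For reference, the idea behind that feature: a small draft model proposes a few tokens, and the big target model verifies them in a single forward pass, accepting the longest agreeing prefix. A toy greedy-verification sketch in PyTorch follows; this is not llama.cpp's implementation, and `target`/`draft` are assumed to be HF-style callables returning `.logits`:

```python
import torch

@torch.no_grad()
def speculative_decode(target, draft, ids, k: int = 4, max_new: int = 64):
    """ids: (1, seq) prompt tokens; returns the extended sequence."""
    produced = 0
    while produced < max_new:
        start = ids.shape[1]
        # 1) Draft model proposes k tokens autoregressively (cheap).
        proposal = ids
        for _ in range(k):
            logits = draft(proposal).logits[:, -1, :]
            proposal = torch.cat([proposal, logits.argmax(-1, keepdim=True)], dim=-1)
        # 2) Target scores the whole proposal in one pass (one big-model call).
        tlogits = target(proposal).logits
        tpred = tlogits[:, start - 1:-1, :].argmax(-1)  # target's greedy picks
        drafted = proposal[:, start:]
        # 3) Accept the longest agreeing prefix, plus one token from the target
        #    (either its correction or a free extra token).
        agree = (tpred == drafted)[0].int()
        n_ok = int(agree.cumprod(0).sum())
        bonus = tlogits[:, start - 1 + n_ok, :].argmax(-1, keepdim=True)
        ids = torch.cat([ids, drafted[:, :n_ok], bonus], dim=-1)
        produced += n_ok + 1
    return ids
```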
11/22/2024 LTX-Video: Real-time video generation on a single 4090: https://github.com/Lightricks/LTX-Video
11/21/2024 Tülu3: Instruct finetunes on top of Llama 3.1 base: https://hf.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5
11/20/2024 LLaMA-Mesh weights released: https://hf.co/Zhengyi/LLaMA-Mesh
11/18/2024 Mistral and Pixtral Large Instruct 2411 released: https://mistral.ai/news/pixtral-large
11/12/2024 Qwen2.5-Coder series released: https://qwenlm.github.io/blog/qwen2.5-coder-family
11/08/2024 Sarashina2-8x70B, a Japan-trained LLM model: https://hf.co/sbintuitions/sarashina2-8x70b
11/05/2024 Hunyuan-Large released with 389B and 52B active: https://hf.co/tencent/Tencent-Hunyuan-Large
10/31/2024 QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
10/31/2024 Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b
10/31/2024 Transluce open-sources AI investigation toolkit: https://github.com/TransluceAI/observatory
10/30/2024 TokenFormer models with fully attention-based architecture: https://hf.co/Haiyang-W/TokenFormer-1-5B
10/30/2024 MaskGCT: Zero-Shot TTS with Masked Generative Codec Transformer: https://hf.co/amphion/MaskGCT
10/25/2024 GLM-4-Voice: End-to-end speech and text model based on GLM-4-9B: https://hf.co/THUDM/glm-4-voice-9b
10/24/2024 Aya Expanse released with 23 supported languages: https://hf.co/CohereForAI/aya-expanse-32b
10/22/2024 genmoai-smol allows video inference on 24 GB VRAM: https://github.com/victorchall/genmoai-smol
10/22/2024 Mochi-1: 10B Asymmetric Diffusion Transformer text-to-video model: https://hf.co/genmo/mochi-1-preview
10/22/2024 Pangea: Open-source multilingual multimodal LLM supporting 39 languages: https://neulab.github.io/Pangea
10/21/2024 IBM releases Granite 3.0: https://hf.co/collections/ibm-granite/granite-30-models-66fdb59bbb54785c3512114f
10/18/2024 New research, models, and datasets from Meta FAIR: https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-lingua
10/18/2024 bitnet.cpp: Official inference framework for 1-bit LLMs: https://github.com/microsoft/BitNet
10/18/2024 DeepSeek releases Janus-1.3B with multimodal understanding and generation: https://hf.co/deepseek-ai/Janus-1.3B
10/16/2024 Ministral 8B instruct model released: https://mistral.ai/news/ministraux
10/15/2024 PLaMo-100B: English and Japanese base model: https://hf.co/pfnet/plamo-100b
10/15/2024 Llama-3.1-70B-Instruct customized by NVIDIA: https://hf.co/nvidia/Llama-3.1-Nemotron-70B-Instruct
10/14/2024 Llama 3.1 linearized: https://hf.co/collections/hazyresearch/lolcats-670ca4341699355b61238c37
10/14/2024 Zamba2-7B released: https://www.zyphra.com/post/zamba2-7b
10/14/2024 Ichigo, voice-to-voice model based on Llama 3.1, released: https://homebrew.ltd/blog/llama-learns-to-talk
10/12/2024 Fast multilingual TTS with voice cloning, based on flow matching with DiT: https://github.com/SWivid/F5-TTS
10/11/2024 14B cross-architecture distillation model: https://hf.co/arcee-ai/SuperNova-Medius
10/10/2024 Aria: 25.3B, 3.9B active, multimodal native MoE model with 64k context: https://hf.co/rhymes-ai/Aria
09/27/2024 Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
09/25/2024 Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
09/25/2024 Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
09/24/2024 Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
09/18/2024 Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5
09/18/2024 Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
09/17/2024 Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release
09/12/2024 DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm
09/12/2024 LLaMA-Omni: Multimodal LLM with seamless speech interaction: https://hf.co/ICTNLP/Llama-3.1-8B-Omni
09/11/2024 Fish Speech multilingual TTS with voice replication: https://hf.co/fishaudio/fish-speech-1.4
09/11/2024 Pixtral: 12B with image input vision adapter: https://xcancel.com/mistralai/status/1833758285167722836
09/11/2024 Solar Pro Preview, Phi-3-medium upscaled to 22B: https://hf.co/upstage/solar-pro-preview-instruct
09/06/2024 DeepSeek-V2.5 released, combines Chat and Instruct: https://hf.co/deepseek-ai/DeepSeek-V2.5
09/05/2024 FluxMusic: Text-to-Music Generation with Rectified Flow Transformer: https://github.com/feizc/fluxmusic
09/04/2024 Yi-Coder: 1.5B & 9B with 128K context and 52 programming languages: https://hf.co/blog/lorinma/yi-coder
09/04/2024 OLMoE 7x1B fully open source model release: https://hf.co/allenai/OLMoE-1B-7B-0924-Instruct
08/30/2024 Command models get an August refresh: https://docs.cohere.com/changelog/command-gets-refreshed
08/29/2024 Qwen2-VL 2B & 7B image+video models released: https://qwenlm.github.io/blog/qwen2-vl/
08/27/2024 CogVideoX-5B, diffusion transformer text-to-video model: https://hf.co/THUDM/CogVideoX-5b
08/22/2024 Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
08/20/2024 Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct
08/16/2024 MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
08/15/2024 Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
08/12/2024 Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b
08/09/2024 Qwen large audio-input language models: https://hf.co/Qwen/Qwen2-Audio-7B-Instruct
08/07/2024 LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct
08/05/2024 vLLM GGUF loading support merged: https://github.com/vllm-project/vllm/pull/5191
07/31/2024 Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
07/27/2024 Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
07/25/2024 BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
07/24/2024 Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
07/23/2024 Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
07/22/2024 llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
07/18/2024 Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
07/18/2024 Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/
07/16/2024 Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1
07/16/2024 MathΣtral Instruct based on Mistral 7B: https://hf.co/mistralai/mathstral-7B-v0.1
07/13/2024 Llama 3 405B coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271
07/09/2024 Anole, based on Chameleon, for interleaved image-text generation: https://hf.co/GAIR/Anole-7b-v0.1
07/07/2024 Support for glm3 and glm4 merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/8031
07/02/2024 Japanese LLaMA-based model pre-trained on 2T tokens: https://hf.co/cyberagent/calm3-22b-chat
06/28/2024 Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
06/27/2024 Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
06/27/2024 Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
06/25/2024 Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io
06/23/2024 Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931
06/18/2024 Meta Research releases multimodal 34B, audio, and multi-token prediction models: https://ai.meta.com/blog/meta-fair-research-new-releases
06/17/2024 DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
06/14/2024 Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct
06/14/2024 Nvidia collection of Mamba-2-based research models: https://hf.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c
06/11/2024 Google releases RecurrentGemma, based on a hybrid RNN architecture: https://hf.co/google/recurrentgemma-9b-it
06/06/2024 Qwen2 released, with better benchmarks than Llama 3: https://qwenlm.github.io/blog/qwen2/
06/01/2024 KV cache quantization support merged: https://github.com/ggerganov/llama.cpp/pull/7527
05/31/2024 K2: Fully-reproducible model outperforming Llama 2 70B using 35% less compute: https://hf.co/LLM360/K2
05/29/2024 Mistral releases Codestral-22B: https://mistral.ai/news/codestral/
05/28/2024 DeepSeek-V2 support officially merged: https://github.com/ggerganov/llama.cpp/pull/7519
05/24/2024 Draft PR adds support for Jamba: https://github.com/ggerganov/llama.cpp/pull/7531
05/23/2024 Cohere releases 8B & 35B Aya 23 with multilingual capabilities: https://hf.co/collections/CohereForAI/c4ai-aya-23-664f4cda3fa1a30553b221dc
05/22/2024 Mistral v0.3 models with function calling and extended vocab: https://github.com/mistralai/mistral-inference#model-download
05/21/2024 Fork of llama.cpp adds DeepSeek-V2 support: https://hf.co/leafspark/DeepSeek-V2-Chat-GGUF
05/21/2024 Microsoft launches Phi-3 small (7B) and medium (14B) under MIT: https://aka.ms/phi3-hf
05/16/2024 DeepSeek AI releases 16B V2-Lite: https://hf.co/deepseek-ai/DeepSeek-V2-Lite-Chat
05/14/2024 PaliGemma, Gemma 2, and LLM Comparator: https://developers.googleblog.com/gemma-family-and-toolkit-expansion-io-2024
05/12/2024 Yi-1.5 Released with Improved Coding, Math, and Reasoning Capabilities: https://hf.co/collections/01-ai/yi-15-2024-05-663f3ecab5f815a3eaca7ca8
05/11/2024 Japanese 13B model trained on CPU supercomputer: https://hf.co/Fugaku-LLM/Fugaku-LLM-13B
05/11/2024 OneBit: Towards Extremely Low-bit LLMs: https://github.com/xuyuzhuang11/OneBit
05/10/2024 Gemma 2B - 10M Context: https://hf.co/mustafaaljadery/gemma-2B-10M
05/08/2024 Refuel LLM-2 for data labeling, enrichment, and cleaning: https://hf.co/refuelai/Llama-3-Refueled
05/08/2024 OpenAI releases its Model Spec: https://cdn.openai.com/spec/model-spec-2024-05-08.html
05/06/2024 IBM releases Granite Code Models: https://github.com/ibm-granite/granite-code-models
05/02/2024 Nvidia releases Llama3-ChatQA-1.5, excels at QA & RAG: https://chatqa-project.github.io/
05/01/2024 KAN: Kolmogorov-Arnold Networks: https://arxiv.org/abs/2404.19756
05/01/2024 Orthogonalized Llama-3-8b: https://hf.co/hjhj3168/Llama-3-8b-Orthogonalized-exl2
04/27/2024 Refusal in LLMs is mediated by a single direction: https://alignmentforum.org/posts/jGuXSZgv6qfdhMCuJ
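For context on the two entries above: the post derives a single "refusal direction" as the difference of mean residual-stream activations on harmful vs. harmless prompts, then removes it; the orthogonalized Llama-3-8b applies the same trick at the weight level. A minimal PyTorch sketch of the idea, with illustrative names and toy stand-in activations (the post itself works in TransformerLens):

```python
import torch

def refusal_direction(harmful_acts: torch.Tensor,
                      harmless_acts: torch.Tensor) -> torch.Tensor:
    """Unit-norm difference-of-means direction, shape (d_model,)."""
    r = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return r / r.norm()

def ablate(x: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
    """Remove the component of activations x along direction r."""
    return x - (x @ r).unsqueeze(-1) * r

# Toy stand-ins for activations captured at one layer: (n_prompts, d_model).
harmful = torch.randn(128, 4096) + 0.5
harmless = torch.randn(128, 4096)
r = refusal_direction(harmful, harmless)
x = torch.randn(8, 4096)
print((ablate(x, r) @ r).abs().max())  # effectively zero: component along r is gone
```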
04/24/2024 Snowflake Arctic Instruct 128x3B MoE released: https://hf.co/Snowflake/snowflake-arctic-instruct
04/23/2024 Phi-3 Mini model released: https://hf.co/microsoft/Phi-3-mini-128k-instruct-onnx
04/21/2024 Llama3 70B pruned to 42B parameters: https://hf.co/chargoddard/llama3-42b-v0
04/18/2024 Llama3 8B, 70B pretrained and instruction-tuned models released: https://llama.meta.com/llama3/
04/17/2024 Mixtral-8x22B-Instruct-v0.1 released: https://mistral.ai/news/mixtral-8x22b/
04/15/2024 Microsoft AI unreleases WizardLM 2: https://web.archive.org/web/20240415221214/https://wizardlm.github.io/WizardLM2/
04/15/2024 Microsoft AI releases WizardLM 2, including Mixtral 8x22B finetune: https://wizardlm.github.io/WizardLM2/
04/09/2024 Mistral releases Mixtral-8x22B: https://twitter.com/MistralAI/status/1777869263778291896
04/09/2024 Llama 3 coming in the next month: https://techcrunch.com/2024/04/09/meta-confirms-that-its-llama-3-open-source-llm-is-coming-in-the-next-month/
04/08/2024 StableLM 2 12B released: https://huggingface.co/stabilityai/stablelm-2-12b
04/05/2024 Qwen1.5-32B released with GQA: https://huggingface.co/Qwen/Qwen1.5-32B
04/04/2024 Command R+ released with GQA, 104B, 128K context: https://huggingface.co/CohereForAI/c4ai-command-r-plus
03/28/2024 MiniGemini: Dense and MoE vision models: https://github.com/dvlab-research/MiniGemini
03/28/2024 Jamba 52B MoE released with 256k context: https://huggingface.co/ai21labs/Jamba-v0.1
03/27/2024 Databricks releases 132B MoE model: https://huggingface.co/collections/databricks/dbrx-6601c0852a0cdd3c59f71962
03/23/2024 Mistral releases 7B v0.2 base model with 32k context: https://models.mistralcdn.com/mistral-7b-v0-2/mistral-7B-v0.2.tar
03/23/2024 Grok support merged: https://github.com/ggerganov/llama.cpp/pull/6204
03/17/2024 xAI open sources Grok: https://github.com/xai-org/grok
03/15/2024 Control vector support in llama.cpp: https://github.com/ggerganov/llama.cpp/pull/5970
03/11/2024 New 35B RAG model with 128K context: https://huggingface.co/CohereForAI/c4ai-command-r-v01
03/11/2024 This week, xAI will open source Grok: https://twitter.com/elonmusk/status/1767108624038449405
02/28/2024 The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits: https://arxiv.org/abs/2402.17764
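The recipe at the core of that paper, for reference: weights are quantized to ternary {-1, 0, +1} with a per-tensor absmean scale. A minimal PyTorch sketch (the function name is mine, not from an official codebase):

```python
import torch

def absmean_quantize(w: torch.Tensor, eps: float = 1e-5):
    """BitNet b1.58-style ternary quantization: w ~= w_q * gamma."""
    gamma = w.abs().mean()                          # per-tensor absmean scale
    w_q = (w / (gamma + eps)).round().clamp(-1, 1)  # values in {-1, 0, +1}
    return w_q, gamma

w = torch.randn(4096, 4096)
w_q, gamma = absmean_quantize(w)
print(w_q.unique())  # tensor([-1., 0., 1.])
```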
02/27/2024 Mistral re-adds the notice to its website: https://www.reddit.com/r/LocalLLaMA/comments/1b18817/mistral_changing_and_then_reversing_website/
02/26/2024 Mistral partners with Microsoft, removes mentions of open models from website: https://siliconangle.com/2024/02/26/now-microsoft-partner-mistral-ai-challenges-openai-three-new-llms/
02/21/2024 Google releases two open models, Gemma: https://blog.google/technology/developers/gemma-open-models/
02/18/2024 1.5-bit quant for lcpp merged: https://github.com/ggerganov/llama.cpp/pull/5453
02/17/2024 Kobold.cpp-1.58 prebuilt released: https://github.com/LostRuins/koboldcpp/releases/tag/v1.58
02/16/2024 Exl2 adds Qwen support: https://github.com/turboderp/exllamav2/issues/334
02/16/2024 ZLUDA for lcpp merged, though the outlook is questionable at best: https://github.com/vosen/ZLUDA/pull/102
06/23/2023 Ooba's preset arena results and SuperHOT 16k prototype releases
06/22/2023 Vicuna 33B (preview), OpenLLaMA 7B scaled and MPT 30B released
06/20/2023 SuperHOT Prototype 2 w/ 8K context released >>94191797
06/18/2023 Minotaur 15B 8K, WizardLM 7B Uncensored v1.0 and Vicuna 1.3 released
06/17/2023 exllama support merged into ooba; API server rewrite merged into llama.cpp
06/16/2023 OpenLlama 13B released
06/16/2023 Airoboros GPT-4 v1.2 released
06/16/2023 Robin-33B-V2 released
06/16/2023 Dan's 30B Personality Engine LoRA released
06/14/2023 WizardCoder 15B Released
06/14/2023 CUDA full GPU acceleration merged in llama.cpp
06/10/2023 First Landmark Attention models released >>93993800
06/08/2023 Openllama 3B and 7B released
06/07/2023 StarCoderPlus / StarChat-β released
06/07/2023 chronos-33b released
06/06/2023 RedPajama 7B released + Instruct&Chat
06/06/2023 WizardLM 30B v1.0 released
06/05/2023 k-quantization released for llama.cpp
06/03/2023 Nous-Hermes-13b released
06/03/2023 WizardLM-Uncensored-Falcon-40b released
05/27/2023 FalconLM releases Falcon-7B & 40B, new foundation models
05/26/2023 BluemoonRP 30B 4K released
05/25/2023 QLoRA and 4bit bitsandbytes released
05/23/2023 exllama transformer rewrite offers around 2x t/s increase for GPU models
05/22/2023 SuperHOT 13B prototype & WizardLM Uncensored 30B released
05/19/2023 RTX 30 series 15% performance gains, quantization breaking changes again >>93536523
05/19/2023 PygmalionAI release 13B Pyg & Meth
05/18/2023 VicunaUnlocked-30B released
05/14/2023 llama.cpp quantization change breaks existing Q4 & Q5 models; they must be re-quantized
05/13/2023 llama.cpp GPU acceleration merged into master >>93403996 >>93404319
05/10/2023 GPU-accelerated token generation >>93334002
05/06/2023 MPT 7B, 65k context model trained on 1T tokens: https://huggingface.co/mosaicml/mpt-7b-storywriter
05/05/2023 GPT4-x-AlpacaDente2-30b released: https://huggingface.co/Aeala/GPT4-x-AlpacaDente2-30b
05/04/2023 Allegedly leaked Google document fretting over open-source LLMs: https://www.semianalysis.com/p/google-we-have-no-moat-and-neither
05/04/2023 StarCoder, a 15.5B parameter model trained on 80+ programming languages: https://huggingface.co/bigcode/starcoderbase
04/30/2023 Uncucked Vicuna 13B released: https://huggingface.co/reeducator/vicuna-13b-free
04/30/2023 PygmalionAI release two 7B LLaMA-based models: https://huggingface.co/PygmalionAI
04/29/2023 GPT4 X Alpasta 30B Merge: https://huggingface.co/MetaIX/GPT4-X-Alpasta-30b-4bit
04/25/2023 Proxy script for Tavern via Kobold/webui, increases LLaMA output quality: https://github.com/anon998/simple-proxy-for-tavern
04/23/2023 OpenAssistant 30B released & quantized: https://huggingface.co/MetaIX/OpenAssistant-Llama-30b-4bit
04/22/2023 SuperCOT LoRA (by kaiokendev), merged by helpful anons: https://huggingface.co/tsumeone/llama-30b-supercot-4bit-128g-cuda https://huggingface.co/ausboss/llama-13b-supercot-4bit-128g
04/22/2023 OpenAssistant "releases" XORs again, deletes them soon after... again
04/21/2023 StableLM models performing terribly, are apparently broken: https://github.com/Stability-AI/StableLM/issues/30