Open Source LLMs
Introduction
Open source Large Language Models (LLMs) have democratized access to advanced AI capabilities, enabling researchers, developers, and organizations to use, customize, and deploy state-of-the-art language models without the licensing and access restrictions of proprietary alternatives. This document provides an overview of key open source LLMs, their capabilities, and deployment considerations.
Major Open Source Models
LLaMA Family (Meta AI)
LLaMA 2
Released: July 2023
Sizes: 7B, 13B, 70B parameters
Context: 4K tokens (extended versions available)
License: Llama 2 Community License (free for research and commercial use, with additional terms for services above roughly 700 million monthly active users)
Notable features: Chat-tuned variants, improved safety compared to LLaMA 1
LLaMA 3
Released: April 2024
Sizes: 8B, 70B parameters (as of initial release)
Context: 8K tokens standard
License: Llama 3 Community License, similar in spirit to the LLaMA 2 terms
Notable features: Improved reasoning, reduced hallucinations, stronger multilingual abilities
Mistral AI Models
Mistral 7B
Released: September 2023
Size: 7B parameters
Context: 8K tokens
License: Apache 2.0
Notable features: Strong performance despite its small size; sliding window attention for efficient handling of long sequences (a toy mask sketch follows this entry)
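To make the mechanism concrete, here is a toy sketch of a sliding-window attention mask in PyTorch. It illustrates the idea only and is not Mistral's implementation; the window size of 3 is chosen so the printed mask stays readable (Mistral 7B uses a 4,096-token window).

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: position i may attend to j if j <= i and i - j < window."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions (column)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions (row)
    return (j <= i) & (i - j < window)

# Each token attends only to itself and the previous `window - 1` tokens,
# so attention cost grows linearly with sequence length.
print(sliding_window_mask(6, 3).int())
```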
Mixtral 8x7B
Released: December 2023
Architecture: Sparse Mixture of Experts (8 experts per layer; about 47B total parameters, with roughly 13B active per token)
Context: 32K tokens
License: Apache 2.0
Notable features: Sparse MoE architecture, multilingual capabilities, strong reasoning (a toy routing sketch follows this entry)
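The following is a toy PyTorch sketch of sparse top-2 routing, the core idea behind Mixtral's MoE layers. It is illustrative only: the dimensions, the linear "experts", and the per-expert loop are simplifications, not Mistral's implementation.

```python
import torch
import torch.nn as nn

class ToySparseMoE(nn.Module):
    def __init__(self, dim: int = 64, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim). The gate scores every expert for every token,
        # but only the top-k experts are actually evaluated per token.
        weights, idx = self.gate(x).topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = idx[:, k] == e            # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

moe = ToySparseMoE()
print(moe(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```

Because only the top-k experts execute for each token, per-token compute stays close to that of a much smaller dense model, which is why Mixtral activates roughly 13B of its ~47B parameters per token.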
Falcon (Technology Innovation Institute)
Released: May 2023 (7B/40B); September 2023 (180B)
Sizes: 1B, 7B, 40B, 180B parameters
Context: 2K tokens (original releases)
License: Apache 2.0 for the 7B/40B models; 180B under the custom TII Falcon-180B license
Training: Trained largely on the RefinedWeb dataset (roughly 1-1.5 trillion tokens for the 7B/40B models; 3.5 trillion for 180B)
BLOOM (BigScience)
Released: July 2022
Sizes: 560M to 176B parameters
Context: 2K tokens
License: RAIL (Responsible AI License)
Notable features: Built by the BigScience collaborative open-science effort; supports 46 natural languages and 13 programming languages
Specialized Open Source Models
Code Models
Code Llama: Meta's LLaMA 2-based family specialized for code generation, infilling, and understanding
StarCoder/StarCoderBase: Trained on permissively licensed source code
DeepSeek Coder: Optimized for code completion and generation
Multimodal Models
LLaVA: Vision-language model connecting a CLIP visual encoder to a LLaMA-based language model
BLIP-2: Bootstrapping Language-Image Pre-training for unified vision-language understanding
Small Efficient Models
TinyLlama: 1.1B parameter model trained on 3 trillion tokens
Phi-2: Microsoft's 2.7B parameter model with strong reasoning capabilities
Gemma: Google's lightweight open models (2B and 7B variants)
Deployment Considerations
Quantization
Many open source models can be quantized to reduce their memory footprint (a loading example follows this list):
GGUF Format: Used by llama.cpp, supports various quantization levels (4-bit, 5-bit, 8-bit)
GPTQ: Post-training quantization method for transformer models
AWQ: Activation-aware weight quantization
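As a minimal sketch of the GGUF route, the snippet below runs a 4-bit quantized model through llama-cpp-python, the Python bindings for llama.cpp. The file path is hypothetical; any GGUF file produced by llama.cpp's conversion and quantization tools should work.

```python
from llama_cpp import Llama

# Hypothetical local path to a 4-bit (Q4_K_M) GGUF quantization.
llm = Llama(
    model_path="./mistral-7b-instruct.Q4_K_M.gguf",
    n_ctx=4096,  # context window to allocate
)

out = llm("Q: Why quantize a model? A:", max_tokens=48)
print(out["choices"][0]["text"])
```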
Frameworks and Tools
Popular frameworks for deploying open source LLMs (a usage sketch follows this list):
Hugging Face Transformers: Python library for working with pretrained models
llama.cpp: C/C++ inference engine for efficient deployment
vLLM: High-throughput and memory-efficient inference engine
text-generation-inference (TGI): Hugging Face's optimized serving toolkit for LLMs
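A minimal sketch of the Transformers route: load an instruct model and generate text. The model ID is illustrative, and device_map="auto" assumes the accelerate package is installed.

```python
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.1",  # any causal LM on the Hub
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spread layers across available GPUs/CPU
)

result = generator("Open source LLMs are", max_new_tokens=40, do_sample=False)
print(result[0]["generated_text"])
```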
Fine-tuning and Adaptation
Techniques
LoRA: Low-Rank Adaptation for efficient fine-tuning
QLoRA: Quantized Low-Rank Adaptation for memory-efficient fine-tuning
PEFT: Hugging Face's library of Parameter-Efficient Fine-Tuning methods, including LoRA (a minimal sketch follows this list)
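Below is a minimal LoRA sketch with the PEFT library. The rank, scaling factor, and target modules are illustrative defaults, and the model ID points at a gated Hub repository, so substitute any causal LM you can access.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative model ID (gated on the Hub); any causal LM works here.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

config = LoraConfig(
    r=8,                                  # rank of the low-rank update
    lora_alpha=16,                        # scaling applied to the update
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

Only the small adapter matrices are trained while the base weights stay frozen, which is what makes LoRA (and, with a 4-bit base model, QLoRA) feasible on a single GPU.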
Datasets
Popular datasets for fine-tuning open source models (a loading example follows this list):
OpenOrca: FLAN prompts augmented with GPT-4 and GPT-3.5 completions, following the approach of the Orca paper
Alpaca: Stanford's 52K-example instruction-following dataset, generated with OpenAI's text-davinci-003
LIMA: the roughly 1,000 curated prompt-response pairs from Meta's "Less Is More for Alignment" paper
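As a small usage sketch, the snippet below loads Alpaca with the Hugging Face datasets library; "tatsu-lab/alpaca" is the Hub ID under which Stanford's data is commonly mirrored.

```python
from datasets import load_dataset

# Download the Alpaca instruction-tuning data from the Hugging Face Hub.
ds = load_dataset("tatsu-lab/alpaca", split="train")

example = ds[0]
print(example["instruction"])
print(example["output"])
```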
References
Meta AI. (2023). LLaMA 2: Open Foundation and Fine-Tuned Chat Models. https://ai.meta.com/llama/
Mistral AI. (2023). Mistral 7B. https://mistral.ai/news/announcing-mistral-7b/
TII. (2023). Falcon LLM. https://falconllm.tii.ae/
BigScience Workshop. (2022). BLOOM. https://bigscience.huggingface.co/blog/bloom