ROSE
LLM API for local experimentation
Features
- OpenAI-Compatible API - Core endpoints for chat, embeddings, and file management
- Local Model Inference - Hugging Face Transformers + PyTorch, GPU-accelerated
- Fine-Tuning - LoRA-based pipeline with checkpointing and monitoring
- Vector Storage - Integrated ChromaDB for embeddings
- Embeddings - Multi-model support with caching
- Assistants API - Basic thread/message support with function calling
- Responses API - Stateless chat endpoint with optional storage
- Streaming Support - SSE for real-time completions
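Since the API is OpenAI-compatible and streams completions over SSE, a client can consume the stream by parsing `data:` event lines into content deltas. The sketch below is a minimal, stdlib-only example of that parsing step; the chunk shape (`choices[0].delta.content`, terminated by `data: [DONE]`) is assumed to follow the standard OpenAI chat-completions streaming format the README says ROSE mirrors, and the sample lines are illustrative, not actual server output.

```python
import json

def iter_sse_deltas(lines):
    """Yield text deltas from OpenAI-style SSE chat-completion chunks.

    Assumes each event arrives as a `data: {...}` line and the stream
    ends with `data: [DONE]` (standard OpenAI streaming format).
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alive lines, etc.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# Hypothetical stream, as it might arrive from POST /v1/chat/completions
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    'data: [DONE]',
]
print("".join(iter_sse_deltas(sample)))  # → Hello, world
```

Because the endpoints follow the OpenAI wire format, an off-the-shelf OpenAI client pointed at the local base URL should also work without custom parsing.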
GitHub