ROSE
LLM API for local experimentation
Features
- OpenAI-Compatible API - Core endpoints for chat, embeddings, and file management
- Local Model Inference - Hugging Face Transformers + PyTorch, GPU-accelerated
- Fine-Tuning - LoRA-based pipeline with checkpointing and monitoring
- Vector Storage - Integrated ChromaDB for embeddings
- Embeddings - Multi-model support with caching
- Assistants API - Basic thread/message support with function calling
- Responses API - Stateless chat endpoint with optional storage
- Streaming Support - SSE for real-time completions
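Since the API is OpenAI-compatible and streams completions over SSE, a client can consume the stream by parsing `data:` event lines into content deltas. The sketch below is a minimal, stdlib-only example of that parsing step; the chunk shape (`choices[0].delta.content`, terminated by `data: [DONE]`) is assumed to follow the standard OpenAI chat-completions streaming format the README says ROSE mirrors, and the sample lines are illustrative, not actual server output.

```python
import json

def iter_sse_deltas(lines):
    """Yield text deltas from OpenAI-style SSE chat-completion chunks.

    Assumes each event arrives as a `data: {...}` line and the stream
    ends with `data: [DONE]` (standard OpenAI streaming format).
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alive lines, etc.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# Hypothetical stream, as it might arrive from POST /v1/chat/completions
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    'data: [DONE]',
]
print("".join(iter_sse_deltas(sample)))  # → Hello, world
```

Because the endpoints follow the OpenAI wire format, an off-the-shelf OpenAI client pointed at the local base URL should also work without custom parsing.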
GitHub