Image generation, editing, upscaling, and creative production tools.
An enterprise-grade Next.js boilerplate for high-performance, maintainable apps. Packed with features like Tailwind CSS, TypeScript, ESLint, Prettier, testing.
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop - all.
AI Image
Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit.
Trench: Open-Source Analytics Infrastructure. A single production-ready Docker image built on ClickHouse, Kafka, and Node.js for tracking events. Easily build product.
Open-source LLM/VLM load balancer and serving platform for self-hosting LLMs (and VLMs) at scale. Alternative to projects like llm-d, Docker Model Runner, etc but.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and.
AI-agent Skill for generating polished HTML slide decks: editorial magazine and Swiss layouts, image prompts, social covers, and a WebGL/low-power presentation runtime.
🔥 Java enterprise application development framework for full scenario: Restrained, Efficient, Open, Ecological! 700% higher concurrency, 50% memory savings. Startup.
StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture,.
Data-Juicer: The Data Operating System for the Foundation Model Era. Multimodal | Cloud-Native | AI-Ready | Large-Scale. Data-Juicer (DJ) transforms raw data chaos into.
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language,.
LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
True on-device AI for Kotlin Multiplatform (Android, iOS, Desktop, JVM, WASM). LLM, Speech-to-Text and Image Generation, powered by llama.cpp, whisper.cpp and.
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o.
AI Research assistant plugin for Zotero 9. Chat with your library, run federated scholarly search, RAG, OCR, systematic reviews, and manage cloud storage. Includes.
Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud.
Deeplake is an AI Data Runtime for Agents. It provides serverless Postgres with a multimodal datalake, enabling scalable retrieval and training.
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.
Type-safe, provider-agnostic TypeScript AI SDK for streaming chat, tool calling, agents, and multimodal apps across OpenAI, Anthropic, Gemini, React, Vue, Svelte,.
MineContext is your proactive context-aware AI partner (Context-Engineering + ChatGPT Pulse)
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
SimpleMem: Efficient Lifelong Memory for LLM Agents - Text & Multimodal
[CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs, Vision Foundation Models, etc.
A fast multimodal LLM for real-time voice
Structured data extraction, instruction calling and agentic workflows with ML, LLM and Vision LLM
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning.
ConardLi's open-source Skills collection, featuring web design, knowledge retrieval, image generation, and more.
Official Repo for "TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding" [ACL 2025 oral]
clawdcursor compiles whatever's on screen into one UI map: accessibility tree and OCR fused into stable, addressable elements, with a screenshot only when needed.
ChatGPT+ AI | One click access to your own ChatGPT+Many AI web services
RESTai is an AIaaS (AI as a Service) open-source platform. Supports many public and local LLMs supported by Ollama, vLLM, etc. Precise embeddings usage, tuning, analytics.
Fine-tune LLMs on your Mac with Apple Silicon. SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning, natively on MLX. Unsloth-compatible API.
Janus-Series: Unified Multimodal Understanding and Generation Models
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
80+ free AI services for chat, image, video, voice & APIs (may sometimes include access to lead gen ai models for free)
SGLang is a high-performance serving framework for large language models and multimodal models.
This repository contains examples for customers to get started using the Amazon Bedrock Service. This contains examples for all available foundational models
One delightful Ruby framework for every major AI provider. Build AI agents, chatbots, RAG apps, and multimodal workflows in beautiful, expressive code.
Turn your PC, Mac, or Linux box into an AI server. LLM inference, chat UI, voice, agents, workflows, RAG, and image generation.
On-device AI SDK for Flutter: LLM inference, vision, STT, TTS, image generation, embeddings, RAG, and function calling. Metal GPU on iOS/macOS.
Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases.
Official Model Studio CLI, built for AI Agent frameworks, exposing models, search, multimodal, and workflow capabilities as structured tool calls.
TeleMem is a high-performance drop-in replacement for Mem0, featuring semantic deduplication, long-term dialogue memory, and multimodal video reasoning.
A web-based tool for visualizing and exploring artifacts from Microsoft's GraphRAG.
A collection of the best ML and AI news every week (research, news, resources)
Snappy's unique approach unifies vision-language late interaction with structured OCR for region-level knowledge retrieval. Like the project? Drop a star!
A multimodal AI agent for geospatial data analysis and interactive visualization