Decoding How AI Works
We are an open AI research lab advancing AI systems. Our vision is not just to optimize these systems, but to remove the critical bottlenecks that prevent their widespread adoption.
Our researchers and contributors come from leading companies and universities worldwide, bringing diverse expertise in AI research.
SK Group
Samsung
LG Group
Hyundai
Lotte Corp
Trillion Labs
Naver Corp
MIT
IIT Kanpur
KAIST
Seoul National University
Yonsei University
Korea University
POSTECH
What We Do
We work on AI advancement across the full stack—from GPU orchestration to vertical AI applications. Our research spans inference optimization, generative AI systems, and specialized deep learning projects across multiple domains.
GPU Orchestration
Efficient resource management and scheduling for distributed GPU workloads and multi-tenant environments.
Inference Optimization
High-throughput serving infrastructure, KV cache optimization, quantization, and efficient attention mechanisms.
Generative AI
LLM serving, multimodal systems, and production-grade GenAI infrastructure for real-world applications.
Agentic AI
Multi-agent systems, workflow orchestration, and efficient serving for complex autonomous AI applications.
Vertical AI Solutions
Domain-specific AI applications in quantitative finance, cybersecurity, and computer vision systems.
Deep Learning Research
Novel architectures, training techniques, and fundamental research advancing the state of deep learning.
Current Projects - v0 Batch
PyTorch native INT8 quantization API for TorchAO
Active
A quantized tensor subclass enabling INT8 inference for neural networks through seamless PyTorch integration. Supports dynamic activation quantization (INT8×INT8) and weight-only quantization (FP16/BF16×INT8) with optimized kernels for CPU and CUDA. Reduces memory footprint by up to 4× while maintaining model accuracy. Custom CUDA/Triton kernel development and comprehensive benchmarking against Hugging Face and vLLM baselines are in progress.
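For context, here is a minimal sketch of what INT8 quantization looks like from the user's side with torchao's public API, assuming a recent torchao release; the toy model and tensor shapes below are illustrative placeholders, not part of the project:

```python
# Minimal sketch of INT8 quantization via torchao's quantize_ API.
# Assumes a recent torchao release; the toy MLP stands in for a real model.
import torch
import torch.nn as nn
from torchao.quantization import (
    quantize_,
    int8_weight_only,                     # FP16/BF16 x INT8 (weight-only)
    int8_dynamic_activation_int8_weight,  # INT8 x INT8 (dynamic activations)
)

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
model = model.to(torch.bfloat16).eval()

# Weight-only quantization: weights are stored as INT8 tensor subclasses,
# activations stay in BF16.
quantize_(model, int8_weight_only())

# Alternatively, quantize activations dynamically as well (INT8 x INT8 matmuls):
# quantize_(model, int8_dynamic_activation_int8_weight())

with torch.inference_mode():
    out = model(torch.randn(8, 1024, dtype=torch.bfloat16))
print(out.shape)
```

Weight-only INT8 halves memory versus a BF16 baseline and gives the "up to 4×" reduction versus FP32 weights, which is where that figure comes from.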
Comparative Analysis of LLM Serving Frameworks
Active
A comprehensive benchmarking study comparing vLLM, SGLang, and Hugging Face TGI across diverse workload patterns, including agentic workflows, long-context processing, and high-throughput scenarios. The research investigates how architectural differences, such as SGLang's RadixAttention versus vLLM's PagedAttention, impact performance metrics (TTFT, TPOT, throughput) under varying conditions.
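As an illustration of how TTFT and TPOT can be measured, here is a hedged sketch that streams from an OpenAI-compatible endpoint, which vLLM, SGLang, and TGI all expose; the base URL, model name, and prompt are placeholders, not the study's configuration:

```python
# Sketch: measuring TTFT (time to first token) and TPOT (time per output
# token) by streaming from an OpenAI-compatible server. The base_url,
# model id, and prompt are placeholders for illustration only.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

start = time.perf_counter()
first_token_time = None
n_chunks = 0

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Explain KV caching briefly."}],
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_time is None:
            first_token_time = time.perf_counter()
        n_chunks += 1  # approximation: one streamed chunk ~ one token
end = time.perf_counter()

ttft = first_token_time - start
# TPOT: average inter-token latency over the tokens after the first.
tpot = (end - first_token_time) / max(n_chunks - 1, 1)
print(f"TTFT: {ttft * 1000:.1f} ms, TPOT: {tpot * 1000:.1f} ms/token")
```

A real study would pin decoding parameters, warm up the server, and count tokens with the model's tokenizer rather than assuming one token per streamed chunk.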
First Batch Started
We're looking for AI researchers, developers, and explorers to join our first batch. Work fully remotely on open source projects that push the boundaries of AI.
Whether you're working on your own project, building a startup, or want to contribute to our research, we connect you with like-minded people and contributors who share your passion.
Apply Now →
Join a global community where ideas meet execution.
Our Labs
AI Inference Lab - ModuLabs South Korea (35+ Members)
Bringing together a world-class team of researchers and engineers from Samsung, Google, Trillion Labs, ETRI, Seoul National University, KAIST, and other leading technology firms and universities, AI Inference Lab combines deep expertise in large language models, system optimization, and production infrastructure.
We decode how large language models think and make them smarter. We're not chasing AGI; we're enabling AI in daily life. We focus on solving the real-world challenges of deploying AI at scale, making this technology accessible to every enterprise, whether on cloud, on-premises, or at the edge.
Want to join? Send your profile to daniel@aerlabs.tech
Learn More →
AI Inference Lab - India
We're building a community for AI innovators, researchers, and builders in India.
Perfect for:
- Students and researchers working on AI projects or research papers
- Developers contributing to open source AI projects
- Entrepreneurs building AI startups who need technical guidance or support
- Industry professionals exploring AI research alongside their current role
You don't need to join full-time or leave your current position. We collaborate flexibly, whether you need mentorship, want to co-research, or are looking for technical partners for your startup.
Reach out to shubham@aerlabs.tech
Knowledge Sharing
We host regular technical discussions and deep-dive sessions on AI inference, system optimization, and LLM deployment. Our community brings together researchers and engineers to share knowledge and advance the field together.
Speaker Series
Join our speaker series featuring industry experts from NVIDIA, AMD, ByteDance, and leading research institutions.
View upcoming events →
Technical Blog
All sessions are documented as in-depth technical articles covering LLM inference, optimization techniques, and system design.
Explore our blog →
Interested in our research? Want to collaborate with our researchers to optimize your AI systems? Connect with us at daniel@aerlabs.tech or shubham@aerlabs.tech