Decoding How AI Works
Our vision is not just to optimize AI systems but to remove the critical bottlenecks that prevent their widespread adoption.
What We Do
Our work is fundamentally about enabling efficient AI inference at scale. We are tackling the critical computational and memory bottlenecks that currently limit the widespread deployment of large language models.
Our approach is built on two core pillars:
Full-Stack Co-Design
We don't just look at the model; we analyze the entire system stack. By diving deep into the mechanics of operations like attention, we're building frameworks to co-optimize the algorithm and the underlying hardware pipeline. The goal is to move beyond the 'black box' and engineer for peak performance and minimal resource consumption.
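As a back-of-envelope illustration of what this kind of analysis looks like (our own sketch with assumed defaults of 32 heads, head dimension 128, and fp16 KV caches, not the lab's framework), consider why decode-time attention is memory-bound: for each generated token, every attention layer streams the entire KV cache from memory while performing only a few floating-point operations per byte.

    # Arithmetic intensity of single-token (decode) attention.
    # Shapes are illustrative defaults, not measured from any particular model.
    def decode_attention_intensity(seq_len, n_heads=32, head_dim=128, dtype_bytes=2):
        d = n_heads * head_dim
        flops = 4 * seq_len * d                      # QK^T and attn@V, ~2*seq_len*d each
        bytes_moved = 2 * seq_len * d * dtype_bytes  # stream the K and V caches once
        return flops / bytes_moved                   # FLOPs per byte

    print(decode_attention_intensity(8192))  # ~1.0 FLOP/byte

At roughly 1 FLOP per byte, decode attention sits far below the ridge point of modern accelerators (typically hundreds of FLOPs per byte), which is exactly the kind of imbalance that algorithm/hardware co-design targets.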
Democratizing Deployment
Our ultimate vision is to make powerful AI accessible beyond large-scale cloud data centers. We are engineering solutions that allow these complex models to run efficiently across diverse hardware platforms: on-premises, in the cloud, and, critically, at the edge. This is about reducing the cost and latency of inference to make AI a practical tool for every enterprise.
Current Projects
AER-Q: Hardware-Aware Foundation Model (Active)
A 20B-parameter foundation model co-designed for ultra-efficient inference. Using gradient-based sensitivity analysis, we integrate quantization awareness directly into pre-training, achieving a 2-3x reduction in latency and memory footprint while maintaining SOTA performance.
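To make the idea concrete, here is a minimal PyTorch sketch of gradient-based sensitivity scoring combined with quantization-aware training. It illustrates the general technique only; the scoring rule and the straight-through estimator shown here are our assumptions, not AER-Q's actual method.

    import torch
    import torch.nn as nn

    def layer_sensitivity(model: nn.Module, loss: torch.Tensor) -> dict:
        """First-order score |grad * weight| per layer: how much the loss is
        expected to move if that layer's weights are perturbed by quantization."""
        loss.backward()
        return {
            name: (p.grad * p).abs().sum().item()
            for name, p in model.named_parameters()
            if p.grad is not None and p.dim() >= 2  # weight matrices only
        }

    def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
        """Symmetric uniform fake-quantization with a straight-through estimator,
        so quantization noise is visible to the optimizer during pre-training."""
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        w_q = (w / scale).round().clamp(-qmax, qmax) * scale
        return w + (w_q - w).detach()  # forward uses w_q, backward sees w

Layers with low sensitivity scores can then be trained at lower bit-widths (e.g. 4-bit) while sensitive layers keep higher precision.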
AI Hardware Router (Active)
A self-contained edge-computing architecture for on-device AI with complete data sovereignty. It features hierarchical inference: an on-device SLM serves low-latency queries, while complex tasks are offloaded to the cloud with PII masking, pioneering privacy-first AI deployment.
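A minimal sketch of the hierarchical routing idea follows. The names (local_slm, cloud_llm, is_complex) are hypothetical placeholders, and the regex patterns are illustrative rather than production-grade PII detection.

    import re

    # Illustrative patterns only; a real deployment would use a trained PII detector.
    PII_PATTERNS = {
        "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
        "PHONE": re.compile(r"\b\d{2,4}[-.\s]?\d{3,4}[-.\s]?\d{4}\b"),
    }

    def mask_pii(text: str) -> str:
        """Replace detected PII with typed placeholders before any cloud offload."""
        for label, pattern in PII_PATTERNS.items():
            text = pattern.sub(f"[{label}]", text)
        return text

    def route(query: str, is_complex, local_slm, cloud_llm) -> str:
        """Answer simple queries on-device; offload complex ones with PII masked."""
        if not is_complex(query):
            return local_slm(query)        # low-latency, fully private path
        return cloud_llm(mask_pii(query))  # raw PII never leaves the device

The key property is that the raw query is only ever seen by the on-device model; anything sent off-device passes through the masking step first.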
Multi-Model Agentic Platform (Active)
High-throughput serving infrastructure for heterogeneous LLM/VLM mixtures. Its hardware-aware architecture combines tensor parallelism with speculative decoding, targeting sub-2s TTFT and 1000+ tokens/s throughput for complex multi-hop agentic workflows.
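For readers unfamiliar with speculative decoding, here is a toy greedy-decoding version of the propose/verify loop (our simplified illustration, not the platform's implementation; real systems verify all drafted tokens in one batched target forward pass and handle sampling, not just argmax).

    from typing import Callable, List

    def speculative_decode(
        target_next: Callable[[List[int]], int],  # argmax next token, large model
        draft_next: Callable[[List[int]], int],   # argmax next token, small model
        prompt: List[int],
        k: int = 4,
        max_new: int = 16,
    ) -> List[int]:
        seq = list(prompt)
        while len(seq) < len(prompt) + max_new:
            # 1) The cheap draft model proposes k tokens autoregressively.
            draft_seq = list(seq)
            for _ in range(k):
                draft_seq.append(draft_next(draft_seq))
            proposals = draft_seq[len(seq):]
            # 2) The target verifies the proposals and keeps the agreeing prefix
            #    (token by token here; batched in a real serving stack).
            accepted = 0
            for i, tok in enumerate(proposals):
                if target_next(seq + proposals[:i]) != tok:
                    break
                accepted += 1
            seq += proposals[:accepted]
            # 3) The target contributes one token itself: a correction on
            #    mismatch, or a bonus token when everything was accepted.
            seq.append(target_next(seq))
        return seq[: len(prompt) + max_new]

When the draft model agrees with the target often, each expensive target pass yields several tokens, which is how throughput targets like 1000+ tokens/s become reachable.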
Our Labs
AI Inference Lab - ModuLabs South Korea (30+ Members)
AI Inference Lab brings together a world-class team of researchers and engineers from Samsung, Google, Trillion Labs, ETRI, Seoul National University, KAIST, and other leading technology firms and universities, combining deep expertise in large language models, system optimization, and production infrastructure.
We decode how large language models think in order to make them smarter. We are not chasing AGI; we are enabling AI in daily life. We focus on solving the real-world challenges of deploying AI at scale, making this technology achievable for every enterprise, whether in the cloud, on-premises, or at the edge.
Want to join? Send your profile to daniel@aerlabs.tech.
Learn More →
AI Inference Lab - India
We're building a community of researchers passionate about open source AI research in India.
If you're interested in working on cutting-edge open source AI research and contributing to the future of AI inference optimization, we'd love to hear from you.
Want to join? Send your profile to shubham@aerlabs.tech.
Knowledge Sharing
We host regular technical discussions and deep-dive sessions on AI inference, system optimization, and LLM deployment. Our community brings together researchers and engineers to share knowledge and advance the field together.
These sessions are documented as in-depth technical articles on our blog. Explore our blog →
Interested in our research? Connect with us at daniel@aerlabs.tech or shubham@aerlabs.tech.