Research Projects

Our v0 batch research initiatives focused on advancing AI inference optimization, quantization techniques, and efficient serving infrastructure.

Our First Research Cohort

The v0 batch represents our inaugural research cohort, bringing together talented researchers to tackle fundamental challenges in AI systems. Our focus areas include quantization techniques for efficient inference, comparative analysis of serving frameworks, and optimization strategies for production deployment. Each project combines rigorous experimentation with practical implementation, contributing both to academic understanding and open-source tooling.

Research Areas

  • Quantization & Compression: INT8 quantization techniques, memory optimization, and hardware-aware model compression
  • Serving Infrastructure: Comparative analysis of vLLM, SGLang, HuggingFace TGI across diverse workload patterns
  • Production Optimization: Kernel development, profiling methodologies, and real-world deployment strategies

Active Projects

PyTorch native INT8 quantization API for TorchAO

Status: Active

A quantized tensor subclass enabling INT8 inference for neural networks through seamless PyTorch integration. Supports dynamic activation quantization (INT8×INT8) and weight-only quantization (FP16/BF16×INT8) with optimized kernels for CPU and CUDA, reducing weight memory footprint by up to 4× relative to FP32 while maintaining model accuracy. Custom CUDA/Triton kernel development and comprehensive benchmarking against Hugging Face and vLLM baselines are in progress.
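The core idea behind the weight-only path is to store weights as INT8 with a per-channel scale and dequantize them during the matmul. A minimal NumPy sketch of symmetric per-output-channel INT8 quantization, assuming hypothetical helper names (this is not TorchAO's actual API):

```python
import numpy as np

def quantize_int8_per_channel(w: np.ndarray):
    """Symmetric per-output-channel INT8 quantization of a weight matrix.

    Each row (output channel) gets its own scale so that its largest
    magnitude maps to 127. Storing INT8 instead of FP32 cuts weight
    memory by 4x.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero
    w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return w_q, scale

def linear_int8_weight_only(x: np.ndarray, w_q: np.ndarray, scale: np.ndarray):
    """y = x @ W^T with INT8 weights dequantized on the fly."""
    return x @ (w_q.astype(np.float32) * scale).T

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16)).astype(np.float32)   # toy weight matrix
x = rng.standard_normal((4, 16)).astype(np.float32)   # toy activations

w_q, scale = quantize_int8_per_channel(w)
y_ref = x @ w.T                                       # full-precision reference
y_q = linear_int8_weight_only(x, w_q, scale)          # quantized path
print(np.abs(y_ref - y_q).max())                      # small quantization error
```

In the dynamic-activation (INT8×INT8) variant, activations are additionally quantized per batch at runtime so the inner matmul itself can run in integer arithmetic; the sketch above shows only the weight-only case.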

Tags: Quantization · INT8 · PyTorch · TorchAO · Inference Optimization

Comparative Analysis of LLM Serving Frameworks

Status: Active

A comprehensive benchmarking study comparing vLLM, SGLang, and HuggingFace TGI across diverse workload patterns, including agentic workflows, long-context processing, and high-throughput scenarios. The research investigates how architectural differences, such as SGLang's RadixAttention versus vLLM's PagedAttention, impact performance metrics (time to first token, TTFT; time per output token, TPOT; and throughput) under varying conditions.

Tags: Benchmarking · vLLM · SGLang · TGI · Serving Infrastructure
By: Hyoseop Song
Detailed Report Coming Soon

Interested in collaborating on research? Contact us at daniel@aerlabs.tech or shubham@aerlabs.tech