Portfolio · Vol. 01

Zhongquan
Cheng.

Software engineer and researcher building reliable backends and applied ML systems. Currently at Washington University in St. Louis — previously Syracuse, Shanghai, and a handful of hackathons in between.

§ Focus / An index
Nine areas · 2026
i. Backend
ii. Applied ML
iii. Computer Vision
iv. Multi-Agent LLM
v. Distributed Systems
vi. Sensor Fusion
vii. Data Pipelines
viii. Research
ix. Full-Stack
§ 00 / Index
About

A habit of picking up the hard part of the stack — and then finishing it.


Fig. 01 — St. Louis, 2025

M.S. in Computer Science at Washington University in St. Louis, 3.7 GPA. I build LLM-powered backends and computer-vision pipelines; I'm currently shipping an agentic chatbot at Undergraduation.com and leading a multi-agent surgical-action-recognition system at the AI for Health Institute. Previously: sensor-based human-activity-recognition (HAR) research at Syracuse and sales forecasting at Asia Pulp & Paper.

Location
St. Louis, Missouri
Phone
+1 (315) 317-5618
§ 01 / Work
Experience & Research

Positions held, in chronological order.

01 · Professional

Backend & AI Engineering Intern

Undergraduation.com · Remote

  • Designed a distributed request pipeline on Server-Sent Events + Vercel serverless functions, supporting 50+ concurrent users with zero blocking on long-running tool calls.
  • Built an agentic chatbot with LLM tool-calling that orchestrates multi-step workflows — database lookups, form filling, and third-party API calls — end-to-end.
  • Shipped a fault-tolerant LLM gateway with latency-aware A/B provider selection and automated failover, holding 99.9% uptime and optimizing token spend even during provider outages.
  • Cut redundant Postgres queries by 40% with a composite-key cache; enforced defense-in-depth with Row-Level Security plus API middleware that scopes Pinecone searches to counselor_id, with zero cross-tenant leakage.

Node.js / Supabase / PostgreSQL / Vercel / Pinecone / Redis / LangChain / LLM Tool-Calling

Sep 2025 — Present
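The composite-key cache in the fourth bullet can be illustrated as a read-through cache whose key includes the tenant. This is a minimal sketch under stated assumptions: an in-process dict stands in for Redis, `fetch` is a hypothetical callable standing in for the Postgres round trip, and the `(counselor_id, query_name, params)` key layout is illustrative rather than the production schema.

```python
from typing import Any, Callable, Dict, Hashable, Tuple

# Hypothetical stand-in for a Postgres round trip.
# Assumed signature: fetch(counselor_id, query_name, params) -> rows
Fetch = Callable[[str, str, tuple], Any]


class CompositeKeyCache:
    """Read-through cache keyed on (counselor_id, query_name, params).

    Because the tenant id is part of the key, identical queries issued by
    different counselors never share a cache entry.
    """

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._store: Dict[Tuple[Hashable, ...], Any] = {}
        self.hits = 0
        self.misses = 0

    def get(self, counselor_id: str, query_name: str, params: tuple) -> Any:
        key = (counselor_id, query_name, params)
        if key in self._store:
            self.hits += 1  # redundant query avoided
            return self._store[key]
        self.misses += 1
        rows = self._fetch(counselor_id, query_name, params)
        self._store[key] = rows
        return rows

    def invalidate_tenant(self, counselor_id: str) -> None:
        """Drop every cached entry for one tenant, e.g. after it writes data."""
        for key in [k for k in self._store if k[0] == counselor_id]:
            del self._store[key]
```

A second `get` with the same three key parts is served from memory; a write path would call `invalidate_tenant` so that tenant's stale rows are not returned.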
02 · Research

Computer Vision Researcher

AI for Health Institute · WashU · St. Louis, MO

  • Built a medical-imaging privacy pipeline on Meta Sapiens + OpenCV, running 16-class semantic body-part segmentation across 10,000+ frames, with dynamic exposure adjustment and temporal quality analysis that raised the low-light detection rate by 80%.
  • Designed a LangGraph multi-agent sequence-error-detection system (Perception / Validator / State-Tracker / Supervisor agents) with a dual loop, a fast path (Qwen3-VL 4B on vLLM) and a slow path (Qwen3-VL 8B on vLLM), for real-time surgical-action-compliance monitoring.
  • Served Qwen3-VL as a dual-model cascade on vLLM with FP8 quantization, PagedAttention, and prefix caching; a 16-thread parallel pipeline cut per-clip latency from 200 ms to 120 ms.
  • Adapted a Video-STaR-inspired staged training recipe (SFT → hard-negative mining → temporal extension → tool augmentation) on Qwen 27B + LoRA, lifting HAR accuracy from ~75% to 85.47% on 79,600 balanced samples.
  • Proposed DeAR (Decomposing Attention Head Roles), which classifies ViT attention heads by entropy and injects learnable attribute tokens plus role-specific attention masks to strengthen weak-class recognition.
  • Added a YOLO ROI crop stage upstream of the VLM, removing ~70% of background noise and sharpening hand–object interaction recognition.
  • Accelerated video ingestion with FFmpeg-based high-speed frame extraction, cutting preprocessing latency by 60% across 50+ hours of clinical footage.
  • Shipped a real-time FastAPI dashboard with SSE streaming, a human-in-the-loop (HITL) review queue, and ThreadPoolExecutor concurrency, processing long videos of 12,000+ frames end-to-end.

Python / PyTorch / vLLM / Qwen3-VL (4B / 8B / 27B) / LangGraph / LoRA / YOLO / Meta Sapiens / OpenCV / FFmpeg / FastAPI / Docker

Jan 2025 — Present
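The fast-path / slow-path dual loop can be read as a confidence-gated cascade: the smaller model answers first and only uncertain clips escalate to the larger one. A minimal sketch, assuming hypothetical `fast` and `slow` callables that wrap requests to a vLLM server and an illustrative 0.8 threshold; it does not use the LangGraph or vLLM APIs. A thread pool fits here on the assumption that each model call is an I/O-bound HTTP request.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Prediction:
    label: str
    confidence: float  # assumed to lie in [0, 1]
    source: str = ""   # which path produced the answer


# Hypothetical model interface: clip id -> Prediction.
Model = Callable[[str], Prediction]


def cascade(clip_id: str, fast: Model, slow: Model, threshold: float = 0.8) -> Prediction:
    """Answer with the small model; escalate low-confidence clips to the large one."""
    pred = fast(clip_id)
    if pred.confidence >= threshold:
        pred.source = "fast"
        return pred
    pred = slow(clip_id)
    pred.source = "slow"
    return pred


def run_batch(clip_ids: List[str], fast: Model, slow: Model,
              threshold: float = 0.8, max_workers: int = 16) -> List[Prediction]:
    """Run the cascade over many clips in a thread pool; output keeps input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: cascade(c, fast, slow, threshold), clip_ids))
```

`pool.map` returns results in input order, so downstream state tracking sees clips in sequence even though they are processed concurrently.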
03 · Research

Research Assistant

Syracuse University · Salekin Lab · Syracuse, NY

  • Built a scalable Android data-collection framework using the Factory design pattern to decouple sensor implementations, so IMU + audio streams can be added without touching the collection core.
  • Containerized the ML training pipeline with Docker Compose, cutting environment setup from 2 hours to 5 minutes and making runs reproducible across compute nodes.
  • Designed a hybrid CNN-LSTM network over high-frequency IMU time-series for human-activity / depression-biomarker detection, reaching 87% classification accuracy.

Android (Kotlin) / Docker Compose / CNN-LSTM / IMU Sensor Fusion / Signal Processing

Feb 2024 — Present
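The Factory-pattern decoupling in the first bullet can be sketched as a registry that maps a sensor kind to a constructor, so the collection core depends only on a shared interface. The production framework is Kotlin on Android; this Python sketch is language-neutral, and every name in it (`SensorSource`, `register`, `create_sensor`, `collect`) is hypothetical, with placeholder readings.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class SensorSource(ABC):
    """Common interface the collection core depends on."""

    @abstractmethod
    def read(self) -> List[float]:
        ...


_REGISTRY: Dict[str, Callable[[], SensorSource]] = {}


def register(kind: str):
    """Class decorator that adds a sensor implementation to the factory."""
    def wrap(cls):
        _REGISTRY[kind] = cls
        return cls
    return wrap


def create_sensor(kind: str) -> SensorSource:
    """Factory: build a sensor by name without the caller knowing its class."""
    factory = _REGISTRY.get(kind)
    if factory is None:
        raise ValueError(f"unknown sensor kind: {kind!r}")
    return factory()


@register("imu")
class ImuSource(SensorSource):
    def read(self) -> List[float]:
        return [0.0, 0.0, 9.81]  # placeholder accelerometer x, y, z


@register("audio")
class AudioSource(SensorSource):
    def read(self) -> List[float]:
        return [0.0] * 4  # placeholder PCM frame


def collect(kinds: List[str]) -> Dict[str, List[float]]:
    """Collection core: uses only the interface, never concrete classes."""
    return {k: create_sensor(k).read() for k in kinds}
```

Adding a new stream means writing one decorated class; `collect` stays unchanged.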
04 · Professional

Data Science Intern

Asia Pulp & Paper · Shanghai, China

  • Built a time-series sales forecasting engine with XGBoost, engineering lag features to capture seasonality — 92% prediction accuracy drove inventory planning for the Shanghai region.
  • Automated ETL against the data warehouse with CTE-rewritten, index-tuned SQL, cutting extraction latency by 40% for downstream reporting.
  • Replaced per-row loops with Pandas vectorization over high-dimensional transaction data to surface sales trends that informed production strategy.

Python / SQL / XGBoost / Scikit-learn / Pandas / ETL

May 2023 — Jun 2023
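The lag-feature engineering behind the forecasting bullet can be sketched as a function that turns a sales series into supervised training rows. A minimal sketch, assuming a single monthly series and illustrative lags of 1 and 12; the production feature set is not described here.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[Dict[str, float], float]


def make_lag_rows(series: Sequence[float], lags: Sequence[int] = (1, 12)) -> List[Row]:
    """Turn a univariate sales series into (features, target) training rows.

    Feature lag_k is the value k steps before the target. With monthly data,
    lag_12 lets a tree model compare against the same month last year. The
    first max(lags) points are skipped because they lack full history.
    """
    if not lags or min(lags) < 1:
        raise ValueError("lags must be positive integers")
    start = max(lags)
    rows: List[Row] = []
    for t in range(start, len(series)):
        features = {f"lag_{k}": series[t - k] for k in lags}
        rows.append((features, series[t]))
    return rows
```

The feature dicts can then be stacked into a matrix and fitted with a regressor such as `xgboost.XGBRegressor`; that step is left out to keep the sketch dependency-free.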
§ 02 / Selected
Projects

A short reading list of what I've built.

01

DeAR — Decomposing Attention Head Roles

ViT attention-head specialization · WashU

Classifies ViT attention heads by entropy and injects learnable attribute tokens plus role-specific attention masks, targeting weak-class recognition in surgical-action datasets. Core contribution of the HAR research line at the AI for Health Institute.

PyTorch · ViT · LoRA · HAR · Research
2025
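The entropy-based head classification step can be sketched in plain Python. This is an illustrative sketch, not the DeAR implementation: the three role names, the 0.3 / 0.7 thresholds, and the input layout are assumptions, and the attribute-token and role-mask injection steps are not shown.

```python
import math
from typing import Dict, List


def attention_entropy(weights: List[float]) -> float:
    """Shannon entropy (in nats) of one row of attention weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention row must have positive mass")
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)


def classify_heads(head_attn: Dict[str, List[List[float]]],
                   low: float = 0.3, high: float = 0.7) -> Dict[str, str]:
    """Assign each head a coarse role from its mean normalized entropy.

    head_attn maps a head name to its attention rows (one row per query token).
    Dividing by log(n_keys) puts the score in [0, 1]: near 0 means the head
    attends to a few tokens, near 1 means it spreads attention uniformly.
    """
    roles = {}
    for name, rows in head_attn.items():
        n_keys = len(rows[0])
        mean_h = sum(attention_entropy(r) for r in rows) / len(rows)
        score = mean_h / math.log(n_keys) if n_keys > 1 else 0.0
        if score < low:
            roles[name] = "focused"
        elif score > high:
            roles[name] = "diffuse"
        else:
            roles[name] = "mixed"
    return roles
```

Normalizing by the maximum entropy makes the thresholds comparable across layers with different token counts.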
02

LangGraph Multi-Agent Error Detection

Fast-path / slow-path vLLM dual-loop

Perception / Validator / State-Tracker / Supervisor agents on LangGraph, running a fast loop (Qwen3-VL 4B) and a slow loop (Qwen3-VL 8B) in parallel on vLLM for real-time surgical-compliance monitoring, with a live FastAPI + SSE dashboard.

LangGraph · vLLM · Qwen3-VL · FastAPI · Multi-Agent
2025
03

Agentic Chatbot & LLM Gateway

Tool-calling backend · Undergraduation.com

Production SSE-based request pipeline with tool-calling agents for counselor workflows, a latency-aware A/B LLM gateway with failover, and multi-tenant isolation via Postgres Row-Level Security plus Pinecone retrieval scoped to each counselor.

Node.js · Supabase · Pinecone · LLM Tool-Calling
2025
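The latency-aware A/B selection with failover can be sketched as an exponentially weighted moving average (EWMA) of each provider's latency, a small exploration probability that occasionally routes a request to a non-preferred provider (an epsilon-greedy scheme swapped in here for illustration), and ordered failover on errors. The provider callables, `ProviderError`, and all parameter defaults are hypothetical assumptions, not the production gateway.

```python
import random
import time
from typing import Callable, Dict, List, Optional, Tuple


class ProviderError(Exception):
    """Raised by a provider call that failed (timeout, 5xx, rate limit)."""


# Hypothetical provider interface: prompt -> completion text.
Provider = Callable[[str], str]


class LatencyAwareGateway:
    """Rank providers by smoothed latency, explore occasionally, fail over on error."""

    def __init__(self, providers: Dict[str, Provider], alpha: float = 0.2,
                 explore: float = 0.1, failure_penalty: float = 5.0,
                 seed: Optional[int] = None) -> None:
        self.providers = providers
        self.alpha = alpha                      # EWMA smoothing factor
        self.explore = explore                  # probability of exploring
        self.failure_penalty = failure_penalty  # seconds charged for a failure
        self.latency = {name: 0.0 for name in providers}
        self.rng = random.Random(seed)

    def _record(self, name: str, seconds: float) -> None:
        old = self.latency[name]
        self.latency[name] = (1 - self.alpha) * old + self.alpha * seconds

    def _order(self) -> List[str]:
        ranked = sorted(self.providers, key=lambda n: self.latency[n])
        if len(ranked) > 1 and self.rng.random() < self.explore:
            # Promote a non-preferred provider so its latency estimate stays fresh.
            ranked.insert(0, ranked.pop(self.rng.randrange(1, len(ranked))))
        return ranked

    def complete(self, prompt: str) -> Tuple[str, str]:
        """Return (provider_name, reply), trying providers in ranked order."""
        errors = []
        for name in self._order():
            start = time.perf_counter()
            try:
                reply = self.providers[name](prompt)
            except ProviderError as exc:
                errors.append(f"{name}: {exc}")
                self._record(name, self.failure_penalty)  # push it down the ranking
                continue
            self._record(name, time.perf_counter() - start)
            return name, reply
        raise ProviderError("all providers failed: " + "; ".join(errors))
```

Charging a fixed penalty for a failure moves an unhealthy provider down the ranking without removing it, so exploration can bring it back once it recovers.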
04

IMU Human-Activity Recognition

Android collection + CNN-LSTM · Syracuse

Factory-pattern Android framework streaming multimodal IMU + audio data for a depression-biomarker HAR study. A hybrid CNN-LSTM over high-frequency sensor windows reaches 87% classification accuracy on the labeled cohort.

Android · Kotlin · CNN-LSTM · HAR · IMU
2024
05

Library VR Navigation

Unity · C# · 3-person team lead

A VR onboarding experience teaching students how to navigate library resources. Led a 3-person team; implemented shader effects, gesture recognition, and the interactive scene graph. Piloted with undergraduate library tours.

Unity · C# · VR · Gesture Recognition
2023
06

LLM Fine-Tuning — F-Lab

PEFT / LoRA on library corpus

Domain-adapted an LLM on a university-library corpus using PEFT/LoRA, getting useful iteration speed out of a constrained GPU budget.

Python · LoRA · PEFT · LLM
2023
§ 03 / Studies
Education & Competencies

Where I studied,
and what I've learned to use.

Education

Washington University in St. Louis

M.S. in Computer Science

St. Louis, MO · Aug 2025 – May 2027

GPA 3.7 / 4.0 · Advanced Algorithms, Distributed Systems, Applied ML, Computer Vision

Syracuse University

B.S. in Computer Science

Syracuse, NY · Sep 2021 – May 2025

Competencies
Languages
Python · JavaScript / TypeScript · Kotlin · SQL · Java · C / C++ · C# · R · Haskell
ML & Data
PyTorch · vLLM · LangGraph · LangChain · Qwen3-VL · LoRA / PEFT · YOLO · XGBoost · Scikit-learn · Pandas / NumPy · Meta Sapiens · DPO / SFT
Frameworks
Node.js · React / Next.js · FastAPI · OpenCV · FFmpeg · Docker · Unity · Git
Platforms
PostgreSQL · Supabase · Redis · Pinecone · Vercel Serverless · AWS · Linux
Awards & Leadership

Senior Workshop Lead

CuseHacks — Syracuse's annual hackathon · 2023 – 2025

Photography Division

Chinese Student Union · Syracuse · 2021 – 2025

Languages
English · Fluent
Mandarin Chinese · Native
§ 04 / Contact
Get in touch

A good email beats a bad meeting.
Say hello.

Replies usually within 24h.