
projects

Things I have built

Personal website — Next.js 14 frontend, FastAPI backend, Postgres on Neon. Production-grade architecture used as a startup practice run.

GenLM (completed)

Genomic language model pretrained from scratch on 13 Drosophila species genomes — 12-layer dilated CNN with a custom character-level tokenizer, trained to predict variant effects across the genome.

Async data curation tool that scrapes every NeurIPS accepted paper since 1987 — metadata, abstracts, and PDFs downloaded concurrently using asyncio and aiohttp.
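A minimal sketch of the bounded-concurrency download pattern that a scraper like this might use. To stay self-contained it replaces the real aiohttp request with a hypothetical `fetch` stub that only sleeps; the URLs, the `fetch_all` helper, and the `max_concurrency` value are illustrative assumptions, not taken from the project.

```python
import asyncio


async def fetch(url):
    # Hypothetical stand-in for a network request. A real scraper would
    # issue the request through an aiohttp.ClientSession here.
    await asyncio.sleep(0.01)  # simulate network latency
    return f"content of {url}"


async def fetch_all(urls, max_concurrency=5):
    # The semaphore caps how many fetches are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url):
        async with semaphore:
            return await fetch(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


# Placeholder URLs, for illustration only.
urls = [f"https://example.org/paper/{i}" for i in range(20)]
pages = asyncio.run(fetch_all(urls))
```

Capping concurrency with a semaphore keeps the scraper from opening hundreds of connections to the same host at once, which matters when downloading many PDFs.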

Local RAG (completed)

Barebones local retrieval-augmented generation pipeline — PDFs in, answers out, with no API calls leaving the machine. Uses Ollama, ChromaDB, and LangChain.

ChottaLLM (completed)

Small-scale LLM built from scratch in PyTorch — character-level tokenizer, transformer architecture, trained on custom text corpora.
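For readers unfamiliar with character-level tokenization, here is a small sketch of the idea: every distinct character in the corpus gets an integer id. The `CharTokenizer` class and its method names are hypothetical and are not the project's actual code.

```python
class CharTokenizer:
    """Maps each distinct character in a corpus to an integer id."""

    def __init__(self, text):
        # Sorting makes the vocabulary order deterministic.
        self.chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(self.chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    def encode(self, s):
        # Raises KeyError for characters that were not in the corpus.
        return [self.stoi[c] for c in s]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)


tok = CharTokenizer("hello world")
ids = tok.encode("hello")   # [3, 2, 4, 4, 5]
text = tok.decode(ids)      # "hello"
```

The vocabulary stays tiny (8 symbols for this toy corpus), at the cost of longer sequences than a subword tokenizer would produce.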

GPT-style autoregressive language model implemented in PyTorch — full training loop, attention, positional encoding, and character-level generation.

Distributed Data Parallel training implemented from first principles in PyTorch — gradient synchronization, process groups, and multi-GPU scaling without using the DDP wrapper.
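The gradient synchronization step can be illustrated with a toy simulation in plain Python. This is a sketch under stated assumptions: it models workers as lists in a single process rather than using `torch.distributed`, and the `all_reduce_mean` and `sgd_step` helpers are hypothetical names. In PyTorch, the equivalent step would typically be an all-reduce sum over each gradient tensor followed by division by the world size.

```python
def all_reduce_mean(worker_grads):
    """Average gradients elementwise across workers.

    Returns one copy of the averaged gradient per worker, mimicking
    the fact that every rank ends up with the same result.
    """
    world_size = len(worker_grads)
    n = len(worker_grads[0])
    mean = [sum(g[i] for g in worker_grads) / world_size for i in range(n)]
    return [list(mean) for _ in range(world_size)]


def sgd_step(params, grads, lr):
    # Plain SGD update applied independently on each worker.
    return [p - lr * g for p, g in zip(params, grads)]


# Two simulated workers start from identical parameters but compute
# different gradients on their own shards of the batch.
params = [[1.0, 2.0], [1.0, 2.0]]
local_grads = [[0.5, 1.0], [1.5, 3.0]]

synced = all_reduce_mean(local_grads)   # [[1.0, 2.0], [1.0, 2.0]]
params = [sgd_step(p, g, lr=0.5) for p, g in zip(params, synced)]
# Both workers now hold [0.5, 1.0], so the replicas stay in sync.
```

Because every worker applies the same averaged gradient to the same starting parameters, the model replicas remain identical after each step without any parameter broadcast.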