CMU Course Reviews - 2nd Semester
Here are my thoughts on the courses I took or audited during my second semester at CMU.
On Grading at CMU
I received straight A’s again this semester. However, I’ve noticed some peculiar aspects of CMU’s grading system. Graduate students in the Carnegie Institute of Technology (CIT, the College of Engineering) cannot receive A+ grades on their transcripts, even if they earned an A+ in a class offered outside CIT, which caps their GPA at 4.0. Since an A+ is worth 4.33 on the GPA scale, graduate students in other colleges can offset an A- with an A+ to maintain a 4.0, whereas engineering students cannot. This policy likely stems from CIT being CMU’s oldest and original college, preserving a traditional grading system without A+ grades. ...
Parallel Binomial Option Pricing
This project implements a high-performance parallel pricing engine for American options using the Binomial Options Pricing Model (BOPM), engineered to scale efficiently from multi-core CPUs to GPU-accelerated clusters. By leveraging OpenMP, CUDA and MPI, it addresses the algorithm’s sequential bottlenecks through a diverse set of optimization strategies. More details at: https://github.com/l1-ca0/parallel-binomial-option-pricing
Last Day of Fall Semester
Today is the last — and coldest — day of the semester. I ended the term by giving a recitation for 10‑703 on applying reinforcement learning to diffusion models.
15618 Project Proposal - Parallel Option Pricing
Lock-free Programming is Hard
Lock-free programming has this magical aura around it. If you’ve ever heard of lock-free programming, you’ve probably seen those neat little Compare-And-Swap (CAS) loops that seem to solve everything. I found a bug in a CAS loop that had been sitting quietly in the lecture slides of CMU’s 15-418 Parallel Computer Architecture for years.
The “Simple” Example
Here’s what the example in the lecture slides looked like:

```c
// atomic compare and swap
int atomicCAS(int* addr, int compare, int val) {
  int old = *addr;
  *addr = (old == compare) ? val : old;
  return old;
}

// build atomic max using CAS
void atomic_max(int* addr, int x) {
  int old = *addr;
  int new = max(old, x);
  while (atomicCAS(addr, old, new) != old) {
    old = *addr;
    new = max(old, x);
  }
}
```

The idea is: ...
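For comparison, here is a minimal sketch of an atomic max written against standard C11 atomics instead of the slide's pseudocode `atomicCAS`. The name `atomic_max_c11` is hypothetical, and this is not necessarily the fix the post arrives at; it only illustrates one common way to write the retry loop so that the comparison always uses the value the CAS actually observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: raise *addr to at least x using a CAS retry loop. */
void atomic_max_c11(_Atomic int *addr, int x)
{
    assert(addr != NULL);

    int old = atomic_load(addr);

    /* On failure, atomic_compare_exchange_weak stores the value it
     * observed into `old`, so no separate re-read of *addr is needed.
     * The loop stops once the stored value is already >= x or the
     * exchange succeeds. */
    while (old < x && !atomic_compare_exchange_weak(addr, &old, x)) {
        /* retry with the refreshed `old` */
    }
}
```

The weak variant may fail spuriously, which is acceptable here because the call already sits in a loop.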
Birds can fly -- LLM Edition
Why LLMs Give Confusing True/False Answers
Ask an AI “Birds can fly, true or false?” and it might initially say “True,” only to concede “False” after a bit more probing. What’s happening here? LLMs don’t “know” facts like humans do. They’re pattern-matching systems that predict the most statistically probable response based on their training data. When they see “birds can fly,” they recognize that this phrase appears far more often than “birds cannot fly” in human text, so they lean toward “True.” ...
Dive into FlashAttention-3
The rise of large language models has made the “attention” mechanism a cornerstone of modern AI. But attention is computationally and memory-intensive, often becoming a bottleneck. Enter FlashAttention, a groundbreaking algorithm designed to accelerate this crucial step. While newer variants (e.g., FlashAttention-4 targeting Nvidia’s Blackwell architecture) are appearing, FlashAttention-3 on the Hopper (H100) platform represents an important step in GPU‑aware attention kernels. This post dissects the combination of algorithmic and hardware‑aware techniques reported by the authors (fused kernels, tiling, and hardware-assisted data movement). ...
Three Ingredients of Happiness
Naval Ravikant once distilled happiness down to its essence: “Happiness is a choice. It’s a choice between three things: where you live, who you are with, and what you do.” It took me years to realize how profound this insight was. When I think about the times I’ve felt most fulfilled versus completely stuck, it almost always comes down to these three elements.
Where You Live
Environment shapes us more than we realize. It’s not just physical space, but the entire ecosystem we inhabit daily. ...
What is the Full Stack Equivalent of Systems Programming?
Web development has popularized the concept of a “full stack developer”: someone who is comfortable working on every part of an application, from the user-facing front-end to the server-side back-end and the database it connects to, with a holistic view of the entire web stack. But what is the equivalent of “full stack” in the world of systems programming?
Redefining the “Stack”
To answer this, we need to first identify what constitutes the “stack” in systems programming: ...
Finetuning LLM for Text-to-SQL generation
I just completed a project that lets people ask database questions in plain English and get back proper SQL queries using a fine-tuned large language model. For the base model, I chose Mistral-7B-v3 and fine-tuned it specifically for SQL generation. Using QLoRA for efficient training, I was able to train the 7-billion parameter model on a single GPU (Nvidia Tesla P100) in around 3 hours. The resulting model performs well on common SQL patterns like filtering, joins, and aggregations, which cover many everyday database queries. That said, it can be less accurate for complex subqueries or deeply nested queries due to the limitations of the Mistral-7B model. A larger model would likely handle these cases better, but this was a tradeoff between accuracy and computational requirements. ...
CMU Course Reviews - 1st Semester
Looking back on my first semester at CMU, I wanted to share my thoughts and experiences with the courses I took. This might be helpful for future students planning their schedules. This semester I took 4 courses. The workload was intense but manageable with good time management. Here’s my breakdown:
18-613: Introduction to Computer Systems
While the famous CSAPP course has been extensively reviewed and is taught at many universities, here’s my personal perspective on the CMU experience. ...
Curiosity is (Almost) All You Need
The landscape of learning has been fundamentally transformed. In an era where Large Language Models can generate code and explain complex concepts, the traditional barriers to learning have largely disappeared. What remains, and what has become more important than ever, is curiosity.
The Great Democratization
Not too long ago, learning new technologies or skills required:
- Access to expensive courses or textbooks
- Mentorship from experienced practitioners
- Trial and error through countless hours of debugging
- Physical presence in classrooms or labs
Today, anyone with internet access can have a conversation with an AI that knows more about programming, mathematics, science, and virtually any field than most human experts. The means of learning are no longer the bottleneck; curiosity and the drive to learn are. ...