The 80-Year-Old Architecture Holding Back AI

I came across an IBM Research post recently (https://research.ibm.com/blog/why-von-neumann-architecture-is-impeding-the-power-of-ai-computing). It turns out one of the biggest things holding back AI computing isn't some exotic new problem; it's a design choice from 1945. The culprit? The von Neumann architecture that's powered nearly every computer.

The Problem: A Traffic Jam Inside Your Computer

Picture a brilliant chef (the processor) and a massive pantry (memory) separated by a narrow hallway. For each step of a recipe, the chef must walk down the hallway, grab a single ingredient, and walk back. This round trip is repeated thousands of times. That's essentially what happens when AI models run on traditional computers. ...

August 6, 2025 · 4 min · 682 words · Li Cao

Finetuning LLM for Text-to-SQL generation

I just completed a project that lets people ask database questions in plain English and get back proper SQL queries using a fine-tuned large language model. For the base model, I chose Mistral-7B-v3 and fine-tuned it specifically for SQL generation. Using QLoRA for efficient training, I was able to train the 7-billion-parameter model on a single GPU (an Nvidia Tesla P100) in around 3 hours. The resulting model performs well on common SQL patterns like filtering, joins, and aggregations, effectively handling the majority of real-world database queries. That said, it can be less accurate on complex subqueries or deeply nested queries due to the limitations of the Mistral-7B model; a larger model would handle these cases better, but this was a tradeoff between accuracy and computational cost. ...
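As a rough illustration of the kind of QLoRA setup described above (a sketch only: the Hugging Face model ID, LoRA rank, and target modules here are assumptions, not details taken from the post), the base model is loaded in 4-bit and small trainable adapters are attached on top:

```python
# Sketch of a QLoRA setup: 4-bit quantized base model + LoRA adapters.
# Hyperparameters and the model ID below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantize the frozen base model to 4-bit NF4 so it fits on one GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.3",   # assumed model ID
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach low-rank adapters; only these small matrices are trained.
lora_config = LoraConfig(
    r=16,                          # assumed rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed target layers
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a tiny fraction of the 7B weights
```

This configuration fragment requires a GPU and model download to actually run; it is shown only to make the "7B model on one GPU" claim concrete, since the adapters and 4-bit weights are what keep memory use within a single card's capacity.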

July 24, 2025 · 2 min · 235 words · Li Cao

Curiosity is (Almost) All You Need

The landscape of learning has been fundamentally transformed. In an era where Large Language Models can generate code and explain complex concepts, the traditional barriers to learning have largely disappeared. What remains, and what has become more important than ever, is curiosity.

The Great Democratization

Not too long ago, learning new technologies or skills required:

- Access to expensive courses or textbooks
- Mentorship from experienced practitioners
- Trial and error through countless hours of debugging
- Physical presence in classrooms or labs

Today, anyone with internet access can have a conversation with an AI that knows more about programming, mathematics, science, and virtually any field than most human experts. The means of learning are no longer the bottleneck—curiosity and the drive to learn are. ...

June 15, 2025 · 4 min · 825 words · Li Cao