We at Positron set out to build a cost-effective alternative to NVIDIA for LLM inference, and after 12 months, our Florida-based head of sales made our first sale. He taught us the value of chasing our largest competitive advantages, across industries and around the globe. We also managed to build an FPGA-based hardware-and-software inference platform capable of serving monolithic and mixture-of-experts models at very competitive token rates. It wasn't easy, because the LLM landscape changes meaningfully every two weeks. Yet today we have customers both evaluating and in production, with both our physical servers and our hosted cloud service. We'll share a few of the hairy workarounds and engineering heroics that achieved equivalence with NVIDIA so quickly, and tamed the complexity of building a dedicated LLM computer from FPGAs.
Barrett Woodside
In developer-oriented, marketing, and product roles, Barrett has spent the past decade of his career working on AI inference, first at NVIDIA, running and profiling computer vision workloads on Jetson. After three years shoehorning models onto embedded systems powering drones, robots, and surveillance systems, he joined Google Cloud, where he experienced first-hand the incredible power of Transformer models running accurate translation workloads on third-generation TPUs. He helped launch Cloud AutoML Vision with Fei-Fei Li and announced the TPU Pod's first entry into the MLPerf benchmark. Most recently, he spent two years at Scale AI working on product strategy and go-to-market for Scale Spellbook, its first LLM inference and fine-tuning product. Today, he is Positron's co-founder and VP of Product.