Memory Con 2025

Why Should AI Vendors Attend MemCon 2024?

We attract AI vendors from the likes of Aerospike, OpenAI, Microsoft Azure, Shopify, RunAI and more as they come together to:

Forge partnerships with systems vendors and technology partners.
Connect with cloud vendors, end-user enterprises (banks, pharmas etc.) and investors.
Learn the implementations behind emerging technology.

If you'd like to find out more about attending as an AI vendor, register your interest here.

Featured Speakers Include

Author:

Zaid Kahn

VP, Cloud AI & Advanced Systems Engineering

Microsoft

Zaid is currently a VP in Microsoft’s Silicon, Cloud Hardware, and Infrastructure Engineering organization where he leads systems engineering and hardware development for Azure including AI systems and infrastructure. Zaid is part of the technical leadership team across Microsoft that sets AI hardware strategy for training and inference. Zaid's teams are also responsible for software and hardware engineering efforts developing specialized compute systems, FPGA network products and ASIC hardware accelerators.

Prior to Microsoft, Zaid was head of infrastructure at LinkedIn, where he was responsible for all aspects of architecture and engineering for datacenters, networking, compute, storage and hardware. Zaid also led several software development teams focused on building and managing infrastructure as code. This included zero-touch provisioning, software-defined networking, network operating systems (SONiC, OpenSwitch), self-healing networks, backbone controller, software-defined storage and distributed host-based firewalls. The network teams Zaid led built the global network for LinkedIn, including PoPs, peering for edge services, IPv6 implementation, DWDM infrastructure and datacenter network fabric. The hardware and datacenter engineering teams Zaid led were responsible for water cooling to the racks, optical fiber infrastructure and open hardware development, which was contributed to the Open Compute Project Foundation (OCP).

Zaid holds several patents in networking and is a sought-after keynote speaker at top tier conferences and events. Zaid is currently the chairperson for the OCP Foundation Board. He is also currently on the EECS External Advisory Board (EAB) at UC Berkeley and a board member of Internet Ecosystem Innovation Committee (IEIC), a global internet think tank promoting internet diversity. Zaid has a Bachelor of Science in Computer Science and Physics from the University of the South Pacific.

Author:

Petr Lapukhov

Network Engineer

NVIDIA

Petr Lapukhov is a Network Engineer at Meta. He has 20+ years in the networking industry, designing and operating large scale networks. He has a depth of experience in developing and operating software for network control and monitoring. His past experience includes CCIE/CCDE training and UNIX system administration.

Author:

Sandeep Singh

Director - Applied DL & Computer Vision

Beans.ai

Agenda Highlights

Opening Keynote: How Data and Workloads are Changing the Design of Systems, Clusters and Datacenters

Author:

Zaid Kahn

VP, Cloud AI & Advanced Systems Engineering

Microsoft

Memory Optimizations for Large Language Models: From Training to Inference

Large Language Models (LLMs) have revolutionized natural language processing but have posed significant challenges in training and inference due to their enormous memory requirements. In this talk, we delve into techniques and optimizations to mitigate memory constraints across the entire lifecycle of LLMs.

The first segment explores memory-optimized LLM training. We discuss training challenges and cover different techniques under Parameter-Efficient Fine-Tuning (PEFT), such as prompt tuning with LoRA, and adapters.
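As a rough illustration of why PEFT methods such as LoRA reduce training memory, the sketch below counts trainable parameters for a single linear layer. It assumes the standard LoRA formulation (two low-rank factors of shapes `rank x d_in` and `d_out x rank`); the layer size and rank are hypothetical values chosen for illustration, not figures from the talk.

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for LoRA: factor A (rank x d_in) plus factor B (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical layer: a 4096 x 4096 projection adapted with rank 8.
D_IN, D_OUT, RANK = 4096, 4096, 8

full = full_finetune_params(D_IN, D_OUT)
lora = lora_trainable_params(D_IN, D_OUT, RANK)

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {RANK}): {lora:,} trainable parameters")
print(f"fraction trained: {lora / full:.4%}")
```

Fewer trainable parameters also means fewer gradients and optimizer states to hold in memory, which is where most of the training-time saving comes from.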

LLM inference is more memory-bound than compute-bound. In this section, we explore inference optimizations, mostly for transformer architectures, such as Paged Key-Value (KV) Cache, Speculative Decoding, Quantization, In-flight Batching strategies, and Flash Attention, each contributing to enhanced inference speed and efficiency.
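To give a sense of the memory pressure behind these inference optimizations, here is a back-of-the-envelope estimate of KV cache size for a decoder-only transformer. The formula (keys plus values, per layer, per KV head, per token) is the usual accounting; the model dimensions below are hypothetical and not taken from the talk.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int,
    bytes_per_element: int,
) -> int:
    """Bytes needed to cache keys and values for every layer and token."""
    keys_and_values = 2  # one tensor for K, one for V
    return (
        keys_and_values
        * num_layers
        * num_kv_heads
        * head_dim
        * seq_len
        * batch_size
        * bytes_per_element
    )


# Hypothetical model: 32 layers, 32 KV heads of dimension 128,
# a 4096-token context, batch size 1, 16-bit (2-byte) cache entries.
size = kv_cache_bytes(
    num_layers=32,
    num_kv_heads=32,
    head_dim=128,
    seq_len=4096,
    batch_size=1,
    bytes_per_element=2,
)
print(f"KV cache: {size / 2**30:.1f} GiB per sequence")
```

Because the cache grows linearly with both sequence length and batch size, techniques such as paging, quantizing, or offloading the KV cache target this term directly.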

Finally, we explore the concept of Coherent Memory and how it helps inference optimization through KV Cache offloading and LoRA weight re-computation.

By illuminating these advancements, this talk aims to provide a comprehensive understanding of state-of-the-art memory optimization techniques for LLMs, empowering practitioners to push the boundaries of natural language processing further.

Author:

Arun Raman

Deep Learning Solutions Architect

NVIDIA

Arun Raman is an AI solution architect at NVIDIA, adept at navigating the intricate challenges of deploying AI applications across edge, cloud, and on-premises environments within the consumer Internet industry. In his current role, he works on the design of end-to-end accelerated AI pipelines for consumer internet customers, addressing preprocessing, training, and inference optimizations. His experience extends beyond AI, having worked with distributed systems and multi-cloud infrastructure. He shares practical strategies and real-world experiences, empowering organizations to leverage AI effectively.

How Deep Learning & Computer Vision Infrastructure Requires Application-Specific Infrastructure

Author:

Sandeep Singh

Director - Applied DL & Computer Vision

Beans.ai

March 2025

Silicon Valley, CA

1 March 2025
