[AI Hardware & Systems Design Track]: Cloud Resiliency in the Age of High-Performance Computing | Kisaco Research

As high-performance computing (HPC) and artificial intelligence (AI) drive unprecedented advances, a sound cloud strategy becomes vital. With cloud infrastructure increasingly integral to demanding computational workloads, keeping these systems available and robust is paramount.

This panel will delve into the critical intersection of HPC/AI and cloud technology, spotlighting strategies for ensuring uninterrupted operations in the face of emerging challenges. The session brings together leading experts to examine architectural design paradigms that foster robustness, along with redundancy trade-offs, load balancing, intelligent fault detection, and predictive monitoring. They will share best practices for optimizing resource allocation, orchestrating seamless workload migrations, and deploying resilient cloud-native solutions. Drawing on real-world cases and emerging trends, this discussion aims to equip data center and cloud professionals with practical guidance to strengthen their resiliency strategies amid evolving computational demands.

Sponsor(s): 
proteanTecs
Speaker(s): 
Moderator

Alam Akbar

Director, Product Marketing
proteanTecs

Alam Akbar is a veteran of the semiconductor industry with experience spanning multiple engineering, product management, and product marketing roles. He holds a Bachelor of Science degree in Electrical Engineering from Texas A&M and an MBA from Santa Clara University.

Alam began his career at Synopsys as an Application Consultant, where he helped grow the company's market share in the signoff domain. He then joined the business management team at Cadence, where he helped launch a new physical verification solution. After Cadence, Alam joined Intel Foundry Services as a design kit program manager, and then moved into the client compute group as director of product marketing. There, he helped scale Intel's storage business and developed product strategy for new memory solutions for the PC market.

At proteanTecs, he's part of a team that’s bringing greater insight into the health and performance of semiconductors across the value chain, from the design stage to in-field operation, and all the steps in the middle.

Panellists

Venkat Ramesh

Hardware Systems Engineer
Meta

Venkat Ramesh is a Hardware Systems Engineer in Meta's Infrastructure Org.

As a Technical Lead in the Release-to-Production team, Venkat has been at the helm of pivotal initiatives aimed at bringing various AI/ML Accelerator, Compute and Storage platforms into the Meta fleet. His multifaceted technical background spans roles in software development, performance engineering, NPI and hardware health telemetry at hyperscalers and hardware providers.

Deeply passionate about AI hardware resiliency, Venkat currently focuses on building tools and methodologies to enhance hardware reliability, performance and efficiency for rapidly evolving AI workloads and technologies.

Yun Jin

Engineering Director
Meta

Yun Jin is Engineering Director of Infrastructure at Meta, where he leads Meta's strategy for private cloud capacity and efficiency. Before Meta, Yun held engineering leadership roles at PPLive, Alibaba Cloud, and Microsoft. He has worked on large-scale distributed systems, cloud, and big data for 20 years.

Paolo Faraboschi

Vice President and HPE Fellow; Director, AI Research Lab
Hewlett Packard Labs, HPE

Paolo Faraboschi is a Vice President and HPE Fellow and directs the Artificial Intelligence Research Lab at Hewlett Packard Labs. Paolo has been at HP/HPE for three decades, and worked on a broad range of technologies, from embedded printer processors to exascale supercomputers. He previously led exascale computing research (2017-2020), and the hardware architecture of “The Machine” project (2014-2016), pioneered low-energy servers with HP’s project Moonshot (2010-2014), drove scalable system-level simulation research (2004-2009), and was the principal architect of a family of embedded VLIW cores (1994-2003), widely used in video SoCs and HP’s printers. Paolo is an IEEE Fellow (2014) for “contributions to embedded processor architecture and system-on-chip technology”, author of over 100 publications, 70 granted patents, and the book “Embedded Computing: a VLIW approach”. He received a Ph.D. in EECS from the University of Genoa, Italy.
