[Efficient ML Inference Track]: Scaling Recommendation Systems Infrastructure: Overcoming Memory & Latency Challenges | Kisaco Research

Real-time personalized recommendations (RTPRec) have become increasingly prevalent since the Covid-19 pandemic, as more users have grown accustomed to using mobile apps, consuming larger amounts of digital data and video, and shopping online.

Deep learning (DL)-based recommenders are known for their superior accuracy in handling unstructured data, which they typically represent as embeddings. This makes them ideal candidates for personalized recommendations. However, DL-based models can have billions or even trillions of parameters, which poses significant challenges when real-time processing is crucial.

To address this challenge, various strategies such as inference optimization, model compression, and the utilization of hardware accelerators have been introduced to enhance performance and meet the stringent latency requirements of real-time applications. Additionally, this session will delve into accelerator-based distributed systems, offering insights into memory management and performance scalability from an infrastructure perspective.

Sponsor(s): 
Neuchips
Speaker(s): 

Author:

CL Chen

COO
NEUCHIPS

CL Chen is an accomplished leader in the IC design industry with a career spanning over 27 years, including roles as CTO at AMTC Corp., Programme Manager at TSMC, and Director at Global Unichip Corp., a publicly listed TSMC subsidiary specializing in SoC design services. His experience and expertise have contributed significantly to the growth and innovation of the field.

As the Chief Operating Officer of NEUCHIPS, he continues to drive excellence and foster partnerships within the industry. He has extensive experience with domain-specific inference accelerators, particularly in the growing field of e-commerce. His insights and contributions have enhanced the customer experience within Taiwan's e-commerce sector, underscoring his commitment to leveraging technology for real-world impact. At NEUCHIPS, CL's role as COO reflects his dedication to advancing the company's operations, growth, and strategic partnerships. His extensive industry network, coupled with his proven track record of connecting ecosystem partners, has been instrumental in the company's growth.

Author:

Puja Das

Senior Director, Personalization
Warner Bros. Entertainment

Dr. Puja Das leads the Personalization team at Warner Bros. Discovery (WBD), which includes offerings on Max, HBO, Discovery+ and many more.

Prior to WBD, she led a team of Applied ML researchers at Apple who focused on building large-scale recommendation systems to serve personalized content on the App Store, Arcade and Apple Books. Her areas of expertise include user modeling, content modeling, recommendation systems, multi-task learning, sequential learning and online convex optimization. She also led the Ads prediction team at Twitter (now X), where she focused on relevance modeling to improve App Ads personalization and monetization across all Twitter surfaces.

She obtained her Ph.D. in Machine Learning from the University of Minnesota, where her dissertation focused on online learning algorithms, which work on streaming data. Her dissertation received the prestigious IBM Ph.D. Fellowship Award.

She is active in the research community and serves on the program committees of ML and recommendation system conferences. She has mentored several undergraduate and graduate students and participated in various round table discussions through the Grace Hopper Conference, the Women in Machine Learning Program co-located with NeurIPS, AAAI, and the Computing Research Association's Women's chapter.

Author:

Xinghai Hu

Head of US Algorithm
TikTok

Xinghai Hu is currently the head of the TikTok US recommendation team. His team works on responsible recommendation systems, improving the general safety and trustworthiness of content recommendations.

Author:

Anlu Xing

Senior Data Scientist
Meta

Anlu Xing is a Senior Research Scientist/Machine Learning Engineer at Meta, working on LLM applications for business products (GenAI for Monetization) and leading projects on the company's top-priority product: short-form video (Reels) recommendation, ranking and creator relevance.
