AI chatbot services have been opening up the mainstream market for AI services, but they come with considerably higher operating costs and substantially longer service latency. As generative AI model sizes continue to grow, memory-intensive functions take up most of the service operation time, which is why even the latest GPU systems do not provide sufficient performance and energy efficiency. To resolve this, we introduce a generative AI accelerator with shorter latency and lower operating cost using AiM, SK hynix's PIM (Processing-in-Memory). We will explain how AiM reduces service latency and energy consumption, and describe the architecture of AiMX, an accelerator built with AiM. Please come and see for yourself that AiM is no longer a future technology, but can be deployed in existing systems right now.
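The memory-bound behavior described above can be made concrete with a back-of-the-envelope arithmetic-intensity calculation. This is a sketch with assumed, representative numbers (hidden size, fp16 weights, GPU peak figures), not figures from the talk:

```python
# Illustrative sketch (assumed figures): why LLM token generation is
# memory-bound. In the decode phase, each new token multiplies a 1 x d
# activation vector by d x d weight matrices (a GEMV), so every weight
# byte loaded from memory is used for only one multiply-accumulate.

def arithmetic_intensity_gemv(d: int, bytes_per_weight: int = 2) -> float:
    """FLOPs per byte of weight traffic for a d x d matrix-vector multiply."""
    flops = 2 * d * d                       # one multiply and one add per weight
    bytes_moved = d * d * bytes_per_weight  # each fp16 weight is read once
    return flops / bytes_moved

# Hypothetical GPU balance point: peak FLOP rate / memory bandwidth,
# e.g. ~1000 TFLOP/s fp16 over ~3 TB/s of HBM bandwidth.
gpu_balance = 1000e12 / 3e12

ai = arithmetic_intensity_gemv(4096)
print(f"GEMV intensity: {ai} FLOP/byte; GPU balance point: ~{gpu_balance:.0f}")
# The GEMV sits far below the balance point, so the GPU stalls waiting
# on memory -- the gap that processing-in-memory (PIM) aims to close.
```

Because the decode-phase GEMV delivers only about 1 FLOP per byte while a modern GPU needs hundreds of FLOPs per byte to stay busy, the compute units sit idle most of the time; moving the multiply-accumulate logic next to the DRAM banks, as AiM does, attacks that bottleneck directly.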
Euicheol Lim
Eui-cheol Lim is a Research Fellow and leader of the Solution Advanced Technology team at SK hynix. He received the B.S. and M.S. degrees from Yonsei University, Seoul, Korea, in 1993 and 1995, and the Ph.D. degree from Sungkyunkwan University, Suwon, Korea, in 2006. Dr. Lim joined SK hynix in 2016 as a system architect in memory system R&D. Before joining SK hynix, he worked as an SoC architect at Samsung Electronics, leading the architecture of most Exynos mobile SoCs. His recent research interests include memory and storage system architecture with new media memories and new memory solutions such as CXL memory and Processing in Memory. In particular, he is proposing a new computing architecture based on PIM, more efficient and flexible than existing AI accelerators, to process the generative AI and LLMs (Large Language Models) that are currently causing a sensation.