A small artificial intelligence laboratory in China, DeepSeek, made headlines recently by disclosing the technical details of its advanced language model, turning its founder, Liang Wenfeng, into a prominent figure in China’s quest for high-tech progress amid U.S. restrictions.
DeepSeek was founded by Liang, who previously managed a hedge fund. With the release of its R1 model, the lab published a comprehensive guide to building, on a limited budget, a large language model that can learn and improve itself autonomously.
While U.S. companies such as OpenAI and Google DeepMind have been at the forefront of AI advancements in reasoning models, DeepSeek’s release generated significant interest and debate among U.S. tech firms regarding their ability to maintain their competitive edge.
Liang has emerged as a symbol of national pride in China, being the sole AI leader invited to a significant meeting with Li Qiang, the country’s second-highest official. During this gathering, entrepreneurs were urged to focus on breaking through key technological barriers.
His venture into AI began in 2021 when he procured numerous Nvidia graphics processing units for his AI project while managing a quantitative trading firm. Initially, industry observers viewed his actions as a billionaire’s hobby.
A business partner recounted their initial impressions of Liang, describing him as an unassuming figure with ambitious plans that lacked clarity. There was skepticism about his ability to realize what seemed possible only for larger corporations like ByteDance and Alibaba.
Liang’s outsider perspective in the AI sector became a unique asset. His work at High-Flyer involved leveraging AI and algorithms to predict stock movements, training his team to optimize the use of Nvidia chips for trading. In 2023, he officially established DeepSeek with an ambition to create AI comparable to human intelligence.
One rival firm founder noted Liang’s ability to assemble a capable infrastructure team well-versed in chip functionalities, transferring top talent from his hedge fund to DeepSeek.
Amid U.S. bans on exporting advanced Nvidia chips to China, Liang’s team has tackled the challenge of maximizing the performance of the chips available to it locally, relying on optimization strategies of its own.
Industry experts highlight DeepSeek’s focus on research, which could position it as a formidable competitor. The company has opted not to pursue external funding or to monetize its models aggressively. Observers liken its operations to the early days of DeepMind, citing its pure focus on research and engineering.
Liang actively participates in DeepSeek’s research endeavors and compensates his talent competitively. The company has gained a reputation for offering some of the highest salaries to AI engineers in China, attracting PhDs from leading Chinese universities.
DeepSeek says it trained its model on 2,048 Nvidia H800s with a budget of $5.6 million, significantly less than OpenAI and Google have invested in similar projects.
Experts note that DeepSeek’s advancements highlight a crucial aspect of AI development: later entrants can achieve comparable results with fewer resources. In DeepSeek’s case, that means drawing on China’s large pool of skilled engineers who excel at optimizing computing resources.
Although DeepSeek has demonstrated significant results with its limited resources, prospects for future competitiveness remain uncertain as the industry progresses. There are concerns regarding the performance of High-Flyer, DeepSeek’s financial backer, which experienced declining returns, potentially due to Liang’s predominant focus on DeepSeek.
U.S. competitors are rapidly advancing, establishing powerful computing clusters and forming partnerships to bolster their AI capabilities. Meanwhile, DeepSeek is recognized for its substantial computing infrastructure but may face challenges in sustaining this advantage in the future.
photo credit: www.ft.com