Everyone’s talking about DeepSeek and what their R1 model means for competitors.
DeepSeek’s R1 model outperformed OpenAI’s o1 on multiple reasoning benchmarks. The Chinese company, which develops open-source large language models, is challenging established AI companies like OpenAI and sparking price wars both at home and abroad by offering high-performance APIs at a much lower cost than competitors.
People are surprised that new architectural improvements reducing AI inference costs came out of China, where AI companies are more known for commercial applications than hardcore research. But DeepSeek isn’t like most Chinese companies.
While most Chinese companies are known for 1-to-10 applications as opposed to 0-to-1 innovation, DeepSeek is laser-focused on research, working on innovative architectures including Multi-head Latent Attention (MLA) and Mixture-of-Experts (DeepSeekMoE).
(MLA is a computational technique that dramatically reduces memory usage in AI models by compressing the attention keys and values into compact latent representations, shrinking the cache a model must keep around during inference. That lets larger models run faster and serve more requests on the same hardware. DeepSeekMoE is a specialized version of the Mixture-of-Experts approach, in which each input token is routed to a small number of specialized sub-networks, or "experts," instead of activating the entire network every time, making models cheaper to train and run.)
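To make those two ideas concrete, here is a minimal, illustrative PyTorch sketch. It is not DeepSeek's actual implementation (which includes further refinements this toy omits): the class names, dimensions, and routing loop are all invented for illustration. The first module caches only a small compressed latent and reconstructs keys and values from it (the MLA intuition); the second routes each token to just its top-k expert MLPs (the MoE intuition).

```python
# Toy sketch of the two ideas above -- illustrative only, not DeepSeek's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentKVAttention(nn.Module):
    """Single-head attention whose K/V are rebuilt from a compressed latent.
    Only the small latent would need caching at inference time."""
    def __init__(self, d_model=64, d_latent=16):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.down = nn.Linear(d_model, d_latent)  # compress to the cached latent
        self.up_k = nn.Linear(d_latent, d_model)  # reconstruct keys from latent
        self.up_v = nn.Linear(d_latent, d_model)  # reconstruct values from latent

    def forward(self, x):                          # x: (batch, seq, d_model)
        latent = self.down(x)                      # (batch, seq, d_latent)
        k, v = self.up_k(latent), self.up_v(latent)
        scores = self.q(x) @ k.transpose(-2, -1) / x.shape[-1] ** 0.5
        return F.softmax(scores, dim=-1) @ v

class TopKMoE(nn.Module):
    """Each token runs through only its top-k experts; the rest stay idle."""
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x):                          # x: (batch, seq, d_model)
        weights, idx = self.gate(x).softmax(-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                 # accumulate each token's k experts
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e         # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(2, 10, 64)
print(LatentKVAttention()(x).shape, TopKMoE()(x).shape)  # both: (2, 10, 64)
```

The payoff in both sketches is the same: the model keeps its capacity while touching less memory (only the small latent needs caching) or less compute (only k of the experts run per token), which is how architectural work like this drives down inference cost.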
Their mission isn’t some grandiose claim to transform humanity, but to “unravel the mystery of AGI with curiosity.” They care much less about making money than about building Artificial General Intelligence. (Ironically, unlike other large-model startups burning money, DeepSeek has been profitable since last year.) And their talent strategy is one reason they got to where they are today.
Founder and CEO Liang Wenfeng intentionally structured DeepSeek to succeed as an innovation contributor instead of just a follower.
In a recent interview, he shared: “What we lack in innovation is definitely not capital, but a lack of confidence and knowledge of how to organize high-density talent for effective innovation.”
Here are the talent principles he’s employed at DeepSeek to craft a culture of research innovation:
When recruiting, DeepSeek values curiosity and technical capability over traditional experience and an extensive work background.
This means they employ many younger folks from top universities, who have less industry experience but are eager to build and dive into their interests. They also emphasize recruiting from non-CS backgrounds, to help their models do things like generate poetry and answer questions on China’s notoriously difficult college entrance exams.
From CEO Liang: “Our hiring standard has always been passion and curiosity. Many of our team members have unusual experiences, and that is very interesting.”
“When ChatGPT came out, the tech community in China lacked confidence in frontier innovation. From investors to big tech, they all thought that the gap was too big and opted to focus on applications instead. But innovation starts with confidence, which we often see more from young people.”
One of the novel architectural designs underlying the R1 model actually stemmed from the personal interest of a young researcher. He proposed an alternative to mainstream attention mechanisms, and DeepSeek formed a team specifically around his idea, spending months developing it.
Liang believes disruptive technology will never be a long-term moat strategy. Competitors will eventually catch up — a true competitive advantage is a team that’s capable of innovation.
“Even OpenAI’s closed source approach can’t prevent others from catching up. So we anchor our value in our team — our colleagues grow through this process, accumulate know-how, and form an organization and culture capable of innovation. That’s our moat.”
At DeepSeek, building an innovation-first org starts with a loose, flexible management style that eschews hierarchies. Instead, the company focuses on empowering highly driven, curious people to execute their new ideas. While this organizational style is popular among early-stage Bay Area startups, it’s less common in East Asia — and even in the Bay, it’s rare to see bottom-up initiative paired with efficient top-down resource allocation.
“DeepSeek is still entirely bottom-up. We generally don’t predefine roles; instead, the division of labor occurs naturally. Everyone has their own unique journey, and they bring ideas with them, so there’s no need to push anyone. While we explore, if someone sees a problem, they will naturally discuss it with someone else. However, if an idea shows potential, we do allocate resources top-down.”
Their natural division-of-labor model fosters exploration and collaboration, supporting innovation by encouraging the cross-pollination of ideas.
“Anyone on the team can access GPUs or people at any time. If someone has an idea, they can access the training cluster cards anytime without approval. Similarly, since we don’t have hierarchies or separate departments, people can collaborate across teams, as long as there’s mutual interest.”
Liang acknowledges the “prevailing belief that Americans excel at 0-to-1 technical innovation, while Chinese excel at 1-to-10 application innovation.” 0-to-1 innovation is less profitable than developing practical applications of those innovations, because it carries significant time and economic costs with uncertain returns.
“[A]fter all, a new generation of models will inevitably emerge after a few months, so Chinese companies need only follow along and focus on downstream applications.”
But Liang and DeepSeek have focused on innovating at the architectural level, something that has rarely been done, even at larger, more established companies.
Unlike other major Chinese large-model startups, DeepSeek is only focused on research and technology, instead of monetizing commercial applications. Employees are encouraged to pursue technical innovation without worrying about revenue and profitability — Liang has said that his employees’ “desire to do research often comes before making money.”
Because DeepSeek is fully funded by its parent company, quant hedge fund High-Flyer, money is not a concern. DeepSeek doesn’t fundraise, and CEO Liang Wenfeng rarely speaks to the public. Still, their employer brand remains strong among top builders and researchers working on LLMs in China.
“Top talents are most drawn to solving the world’s toughest challenges. In fact, top talents in China are underestimated because there’s so little hardcore innovation happening at the societal level, leaving them unrecognized. We’re addressing the hardest problems, which makes us inherently attractive to them.”
While there may not have been as much “hardcore innovation” in China compared to places like Silicon Valley, Liang is bullish on what DeepSeek’s achievements mean for everyone else working on the most difficult problems.
“Once society allows people dedicated to hardcore innovation to achieve fame and fortune, then our collective mindset will adapt.”
DeepSeek's approach is already paying off, and not only with the R1 model. They're building something more valuable than any single innovation: an organization that can keep producing them.
The next wave of AI breakthroughs might not come from Silicon Valley – but they won't come from Chinese companies copying Silicon Valley either.
They'll come from organizations like DeepSeek, whose leaders are not only pioneering AI research but also changing the global AI talent landscape at large.
Speak with our team to learn more about how Paraform can help you fill your most difficult positions.