Chinese AI company DeepSeek on Wednesday unveiled DeepSeek-R1, which reportedly matches OpenAI’s latest model, o1.
It is a reasoning model, meaning it fact-checks itself before churning out a result. Consequently, it avoids some of the mistakes that AI models usually make.
R1 can analyze tasks, plan, and perform consecutive actions to arrive at an answer. However, the process can take 10 seconds or more to finish.
What are DeepSeek-R1’s features?
The available version at the time of writing is the DeepSeek-R1-Lite-Preview. Despite being a preview model, it matches o1’s performance on the AIME and MATH benchmarks.
According to TechCrunch, AIME uses other AI models to evaluate a model’s performance, while MATH is a collection of word problems.
Despite its performance, R1 shares flaws with other models. For example, some X (Twitter) commenters say it struggles with certain logic problems.
Moreover, people can easily jailbreak the system, meaning they can give specific commands to remove its limits.
For example, one X user tricked R1 into providing a detailed recipe for methamphetamine (meth).
DeepSeek-R1 also avoids questions that seem politically sensitive. TechCrunch found that it doesn’t answer questions regarding Chinese President Xi Jinping, Tiananmen Square, and the prospect of China invading Taiwan.
These limits are likely due to Chinese government regulation of AI models, which requires that responses “embody core socialist values.”
More companies are now focusing on reasoning models because the latest large language models aren’t improving as dramatically as before.
Reasoning models take a different approach: they spend extra processing time working through a task before answering.
“We are seeing the emergence of a new scaling law,” Microsoft CEO Satya Nadella said during a keynote at Microsoft’s Ignite conference.
TechCrunch says DeepSeek plans to release R1 as open source and make it available through an API.