This is the real news story. It looks like they may have used Huawei Ascend 910C chips: https://nitter.net/teortaxesTex/status/2071708141037781407#m
So Huawei Ascend 910C chips can apparently be used to train a 1.6T model. In the nitter thread linked above, people are pointing out that these chips aren't meant for inference,
but they might be forgetting that other companies are targeting inference much better, namely Cerebras & https://chatjimmy.ai/. I think even OpenAI's recent model 5.6 sol is partnering with Cerebras to provide 760 tokens/s, and that was the first (most upvoted?) comment on Hacker News for that release on the technical side of things.
There is more and more competition for Nvidia, whether it's from Google's TPUs or other companies in general. I imagine that doesn't look good for Nvidia.
With things like chatjimmy, I sometimes wonder if we might get chips with specialized models built in, and maybe even a Framework-esque modular setup to remove/upgrade chips. One can dream that it might happen in the near future, but there is enough money on the table that I imagine a lot of companies will attempt to compete in this area, so we as users might get lots of options and cheaper pricing (hopefully sooner rather than later).
If I were Nvidia, I might be a bit worried. I imagine the markets will react negatively to Nvidia the way they did when the original DeepSeek was released. That model was trained on Nvidia hardware, but on fewer Nvidia chips than expected, so people questioned Nvidia itself and the stock fell. This time, they aren't using Nvidia chips in the first place.
> If you could run a nuclear reactor with U-235 as fuel or Pu-241 (both mixed with 95% U-238), which one would you choose and why?
For a human this would not be tricky at all. For an LLM it could be, because this question certainly does not exist in any sort of training data: Pu-241 does not exist in pure form, it only exists as a minor component of reactor-grade plutonium, where Pu-239 dominates, with Pu-240 coming second and Pu-241 coming third. In any case, LongCat-2.0 gave a very well-reasoned but incorrect answer that Pu-241 is preferable.
I then tested on Qwen 3.7 Plus, and it correctly answered that U-235 is preferable because of its much higher delayed neutron fraction. I then went to Gemini Flash, which answered the same, with much more confidence, and with much stronger arguments, and the speed of the answer was much higher.
Overall I rate Gemini Flash the best, Qwen 3.7 Plus an acceptable second, and LongCat-2.0 an ok'ish third, if you have nothing better.
Or stated another way, "If you could run a generator on gasoline or jet fuel, which one would you choose and why?" I would answer jet fuel owing to slightly higher energy density and purity of the material - likely leading to a cleaner burn. Which would ignore that jet fuel is going to be a multiple of the gasoline price.
Also not a physicist, but I assume from the fact that the OP is asking the LLM this question to trip it up, the point is that U-235 is better even if you have an abundance of both. The scarcity of Pu-241 leads to the lack of data in training; it's not that Pu-241 is actually better.
That doesn’t sound right. If my Duck Fu is any good, jet fuel is currently going for US$3.00 per gallon, avgas (leaded petrol) for $3.30, and gasoline for $2.88 per gallon.
There’s nothing much special about jet fuel, it’s just kerosene, same as RP1 (Rocket Propellant), heater fuel, and lamp oil you can buy from the hardware store, with a touch of something to stop it gelling at low temperature if I understand correctly, but also jet fuel tanks are heated if I recall correctly.
I believe standard diesel fuel will also work in jet engines, but kerosene is cheaper.
I’m not in the US, and if I understand correctly their gasoline (petrol) price can vary greatly from state to state, California being the worst? Is that right?
/s
I very much doubt that.
That is a tiny tiny system. OpenAI uses _millions_ of GPUs for training.
On the other hand, this probably reuses the existing deepseek v4 architecture and weights. Maybe didn't need that much compute.
Uber is a people delivery company, but they've had a lot of bright engineers working for them on their infrastructure and software over the years, and that work has rippled out through the industry.
Amazon (in VMWare's words) is "a company that sells books", and their leadership couldn't accept they were losing to them ("I look at this audience, and I look at VMware and the brand reputation we have in the enterprise, and I find it really hard to believe that we cannot collectively beat a company that sells books.").
In the same way that Amazon spun up AWS, they are leveraging their tech experience.
There was a comment on r/localllama that I read back before DeepSeek V4 was released, which said: imagine if DeepSeek V4 had n-gram embedding combined with a 1.3 (ternary) or 1-bit model.
I think there is a lot of research and a lot of proofs being released. There is now a ternary model called bonsai, and a large n-gram embedding model, LongCat-2.0, exists as well. So there could be a future model that leverages both of these, if their synergy made sense.
A bonus would be tok/s on common hardware.
They haven't posted weights/inference solutions for LongCat-2.0 [1], but LongCat-Next had transformers support, which I assume means it works with vLLM/SGLang.
Given it's 1.6T, "common hardware" is probably out of the question; even 2bpw is going to measure out at 400GB, even before considering the bandwidth requirements for 48B active. I haven't read the LongCat-2.0 architecture docs, but if you're not running GLM-5.2, you're probably not running this either.
[1] https://huggingface.co/meituan-longcat/LongCat-2.0: "Model weights coming soon — stay tuned!"
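For anyone who wants to redo the 400GB arithmetic for other sizes, here is a rough back-of-envelope sketch. It only counts weight storage (no KV cache, activations, or quantization overhead), uses decimal GB, and the helper name `weight_gb` is just something made up for illustration.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same bit width and
    nothing else (KV cache, activations, metadata) is counted.
    """
    total_bits = params * bits_per_weight
    total_bytes = total_bits / 8
    return total_bytes / 1e9


# 1.6T total parameters at 2 bits per weight
total = weight_gb(1.6e12, 2)

# 48B active parameters at 2 bits per weight: roughly the weight
# bytes that have to be read for each generated token
active = weight_gb(48e9, 2)

print(f"total weights:  {total:.0f} GB")
print(f"active weights: {active:.0f} GB per token")
```

The second number is why bandwidth matters: even if the full model fits somewhere, each token still has to touch on the order of that many GB of weights.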
In general the TL;DR is that anything above 35B needs hardware you buy basically only to run large LLMs, and if you have that hardware you don't need to ask the question.
~70B models can run fine (albeit somewhat slow) on consumer hardware with 64GB RAM. There are heavily quantized (Q1.x) models that are still usable on similar hardware. Granted recently there haven't been a lot of models of this size, but still, 35B isn't really the practical limit. 35B is mostly the limit if you're using consumer grade GPUs with limited RAM and need the model to run fast.
People have been toying with running large-ish models by partially offloading to CPU+RAM, with mixed results, but as long as you're OK with reduced speed, and you quantize the hell out of the big models, you can apparently try a lot more models locally than popular belief suggests.
Many MoE models seem (?) to only require enough memory to load the active experts.
https://en.wikipedia.org/wiki/Wang_Xing
Wang Xing (Chinese: 王兴; born 18 February 1979) is a Chinese businessman, who co-founded Meituan and has been serving as chief executive officer of Meituan since January 2010. He previously served as chief executive officer of Fanfou from 2007 to 2010.
To think that Nvidia would not have any competition is quite laughable and Jensen knew that China would catch up.
This is why restricting GPUs as a temporary blockade does not work: it just makes all the Chinese AI labs find clever workarounds to serve AI compute as cheaply as possible, including building their own hardware.
Just as Bitcoin moved to ASICs, AI will soon need them for training and inference (TPUs are also ASICs), and Jensen showed he knew this by buying Groq.
Today is not a good day if you are Anthropic or OpenAI.
Maybe I'm wrong, but that's just the first impression.
EDIT: I take my words back (which happens rarely) - although they do build upon DeepSeek's work, their contribution far exceeds merely post-training the base model in a different way. They did introduce something new to the architecture, though I still can't find the full tech report, with Hugging Face and GitHub links returning 404 right now.
EDIT-2: Now that I think about it, I'm not quite sure they're going to openly release the full report with methodology, or the model weights, at all.
LongCat-2.0, a large-scale MoE model with 1.6T total and 48B Active
https://longcat.chat/blog/longcat-2.0/