rHXN - Claude Science

lebovic - 16 hours ago

I built one of the connected tools included in this launch (the Biomni HPC [1]), and I have spent an inordinate amount of my life working on this problem. (I also worked at Anthropic, but not on this product.)

As other comments have pointed out, this is for data science – but it's capable of more than making plots and writing papers [2]. It has integrations with many databases and computational tools, including a researcher's institutional cluster.

That alone is valuable. I founded a startup after struggling with this problem at a bio startup; integrating these tools and databases is hard and time consuming. If the only outcome of this product is that great APIs are built for LLMs, it will be a massive positive impact. Many databases used in computational genomics are still only accessible through FTP!

LLMs are particularly good at navigating these tools and databases. It's often very specialized, but straightforward, work that benefits from in-context skills. Seeing an early glimpse of my former customers – bioinformaticians – using LLMs to solve this problem is what led me to join Anthropic in 2024.

Also, this pattern isn't fundamentally constrained to data science: you can also integrate with a wet lab or a CRO for some kinds of science. This is what I'm spending my time on now.

This type of science doesn't solve everything, but it's useful in some niches. For example, progress on many rare diseases is bottlenecked by researcher attention rather than a fundamental breakthrough.

[1] https://x.com/phylo_bio/article/2029233694775624096

[2] In comparison, OpenAI's science product – Prism – was effectively a LaTeX editor they acquired with Crixet.

ricksunny - 21 minutes ago

Thank you for this summary. Especially interested in the wet lab & CRO tie-in. What is meant by a ‘researcher’s institutional cluster’?

SubiculumCode - 15 hours ago

Connecting AI directly to the data sources (instead of just asking it to provide code that I run locally for myself) can get quite complicated in terms of meeting institutional policy, applicable law, data access-storage requirements (e.g. NIH data repositories), and can require legal agreements between institutions and the AI provider.

I cannot touch it. At least not yet.

foft - 3 hours ago

If you put your data in Snowflake then there is a built in AI (ok it’s Claude) that can access the databases. This sidesteps a lot of the issues in that the data is clearly already with Snowflake.

aabhay - 15 hours ago

Can you speak to what makes this different from simply including or configuring various agent skills? Or is it simply the combination of lots of helpful defaults that makes this product useful?

lebovic - 6 hours ago

I can't speak for Claude Science, but I prefer using Biomni as an agent for bio over Claude Code with a custom setup because a) Biomni stays on the frontier for bio, b) it has a config that just works and skills I trust are correct, and c) it has better built-in abstractions for long-running sessions.

As a concrete example, computational biology jobs sometimes run for hours on the Biomni HPC. When they're done, the session needs to reawaken, process the results, iterate, etc. You can implement something like this with agent callbacks, but it's not as straightforward.

This repeats many times for many integrations, so it's just simpler for me to use an agent that's built for exploratory bio and already has all of this. Claude Science has some of these features, so I imagine they're aiming for something similar.
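To make the long-running job pattern above concrete, here is a minimal sketch of "submit, sleep, reawaken on completion" using plain asyncio. Everything in it is an assumption for illustration: `Session`, `fake_hpc_job`, and `submit_and_resume` are hypothetical names, not Biomni or Claude Science APIs, and the "jobs" are short sleeps standing in for hours of cluster time.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical agent session that stays idle while jobs run."""
    log: list = field(default_factory=list)

    def on_job_done(self, job_id: str, result: str) -> None:
        # "Reawaken": record the result so the agent can process it and iterate.
        self.log.append((job_id, result))


async def fake_hpc_job(job_id: str, seconds: float) -> str:
    # Stand-in for a cluster job; a real one might run for hours.
    await asyncio.sleep(seconds)
    return f"{job_id}: finished"


async def submit_and_resume(session: Session, job_id: str, seconds: float) -> None:
    # Wait for the job without blocking other work, then wake the session.
    result = await fake_hpc_job(job_id, seconds)
    session.on_job_done(job_id, result)


async def main() -> Session:
    session = Session()
    # Several jobs in flight at once; the session is notified as each completes.
    await asyncio.gather(
        submit_and_resume(session, "align", 0.02),
        submit_and_resume(session, "call-variants", 0.01),
    )
    return session


if __name__ == "__main__":
    print(asyncio.run(main()).log)
```

The part that gets tedious in a custom setup is everything around this loop (persisting the session, retrying failed jobs, re-loading context), which this sketch does not attempt.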

rramadass - 7 hours ago

The FAQs at the bottom of the page answer your question.

aabhay - 4 hours ago

The FAQ was exactly why I asked the question, since it made it seem like the answer is no.

nhinck2 - 5 hours ago

> https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-...

Did it produce this absolute clanger of a visualisation that still proudly sits on top of an Anthropic blog post?

jessetemp - 14 hours ago

How do you validate this kind of work to weed out any confabulating by the LLMs?

Edmond - 12 hours ago

Previously integrated Biomni into our intelligent workspace:

https://blog.codesolvent.com/2025/07/ai-assistant-with-biome...

Happy to chat if intrigued.

imperor - 12 hours ago

I'd really like to see much better visualization from Claude Science at some point. Educational-esque, with full threejs + shaders scenes over just these plots and protein/chemical structures. This for a lot of papers in the literature review would be awesome.

Melatonic - 15 hours ago

Sounds like the perfect use case for some kind of framework where you have a local LLM (that can run on lower spec hardware) collaborating with the main LLM to optimise latency and all the other niche and legacy use cases ?

packeted - 5 hours ago

I watched the announcement and gave it a spin as I'm a heavy user of cowork/code. So far I'm super impressed. I used it to analyze my whole genome sequencing data I have as my son has a rare genetic condition. I used it to answer a question I'd asked a few bioinformaticians to help me with but never got a satisfactory answer, it solved it in about a minute - whether his n-of-1 de novo, heterozygous single nucleotide mutation was likely passed down from mom or dad. It performed a read-backed phasing analysis on the data, identified a nearby SNP with overlapping coverage where mom was homozygous and dad was heterozygous. Identified my variant on his mutated allele so looks like it came from me..
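For anyone curious what the read-backed phasing logic boils down to, here is a rough sketch under simplifying assumptions. It is not what Claude Science actually ran: `phase_de_novo` is a hypothetical helper, and the reads are plain tuples rather than alignments pulled from a CRAM file. The idea is that at a nearby informative SNP the child's two alleles can be traced to one parent each, so reads that span both sites show which parental haplotype carries the de novo variant.

```python
from collections import Counter


def phase_de_novo(reads, paternal_allele, maternal_allele):
    """Assign a de novo variant to a parental haplotype.

    reads: iterable of (has_variant, snp_allele) for reads spanning both the
        de novo site and the informative SNP.
    paternal_allele / maternal_allele: the allele the child inherited from
        each parent at the informative SNP (they must differ).
    Returns (call, paternal_support, maternal_support).
    """
    if paternal_allele == maternal_allele:
        raise ValueError("SNP is not informative: parental alleles are identical")
    # Only reads that carry the de novo variant say anything about its phase.
    counts = Counter(allele for has_variant, allele in reads if has_variant)
    pat, mat = counts[paternal_allele], counts[maternal_allele]
    if pat == mat:
        return "undetermined", pat, mat
    return ("paternal" if pat > mat else "maternal"), pat, mat


if __name__ == "__main__":
    # Illustrative data only: mom A/A, dad A/G, child A/G, so G came from dad.
    reads = [(True, "G")] * 5 + [(False, "A")] * 4 + [(True, "A")]
    print(phase_de_novo(reads, paternal_allele="G", maternal_allele="A"))
```

A real analysis would also filter on base and mapping quality and want more than a handful of supporting reads before making a call.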

It also crosschecked my data against ACMG Secondary Finding genes and ClinVar likely pathogenic/pathogenic variants and came back with identical results to my Natera Horizon carrier screening results.

I'd previously tried and failed to do this all with some ChatGPT guidance and subsequently hired a couple of bioinformatician post-docs at top tier universities via Upwork who had failed to give me satisfactory results.

And this is just getting started!

letmetweakit - 3 hours ago

You're not worried your whole genome is being sent over to some commercial entity?

packeted - 3 hours ago

Marginally, but the data manipulation is actually being done locally as the genome CRAM files are like 24 GB each.

make3 - 2 hours ago

it's not, the genome is treated locally by tools called by the LLM, the LLM itself can't do much with the raw DNA sequence

yuppiepuppie - 3 hours ago

Not sure how to feel about this. I think it's super cool that you can dive into this, but it sucks that it's your son that has this condition for which you have to do this analysis. I hope it all turns out well.

Quick question: where did you get your genome read and get the raw files? As far as I know, a service like 23andMe does not give you back the raw files.

packeted - 3 hours ago

Thanks for the kind words. Actually we got the trio whole genome sequencing through our neurologist/geneticist a couple of years ago. It was performed by a company called GeneDx. They interpreted the data at the time, which is how we got to a diagnosis, but knowing I'd want to dive into it later I asked for the raw data. They provided it as the raw CRAM files and also the VCF (variant call format) files, which are a bit smaller. But each company has its own pipeline and, for example, uses different versions of the reference human genome, which made working with the data quite hard for me and the people I enlisted. Claude Science seemed to make very easy work of it.

Also to be clear, the question I was trying to answer was whether his mutation was likely passed down through my sperm or mom's egg - neither of us have the mutation in our own genomes. Turns out spontaneous (de novo) mutations are much more common in sperm because spermatogonial stem cells have undergone many more cell divisions over their lives. Everyone has de novo mutations (70+), one of his just happens to be in an unlucky location.

GeneDx aren't direct to consumer so you'd need to get it ordered through a physician but there are some DTC options for example, Dante Labs, Nebula Genomics, Sequencing.com but I can't speak to the quality of their testing.

23andMe doesn't do whole genome or whole exome sequencing. They use a microarray technology that tests for about 650,000 single nucleotide polymorphisms. You can actually download your raw data from 23andMe and do your own analysis or use a tool like Promethease.

I'm an MD so I'm quite comfortable exploring this data and whatever it uncovers. Tools like Claude Science are going to put a lot of power in the hands of every day people, potentially outside the guidance of genetic counseling/docs, which many organizations in the past (including the FDA) have been hesitant to allow.

Alexadar - 4 minutes ago

Interesting to test. I set up all scientific subroutines with Claude Code generated automation and visualization. Honestly, I think that this product would not be a fit for all, given the diversity of scientific tasks.

teekert - 4 hours ago

I'm a scientist, (biophysicist). Over time I have become a bioinformatician and a python dev.

I wrote articles and applications, and it always was a struggle. But now I can speed up, make it all go much faster. But I often feel like my mental models can't keep up.

Recently the AI has generated a comprehensive data model (in Django) and I find myself retracing its steps with long discussions and explanations (with/from the LLM) and searching for documentation. With scientific assignments I find myself searching literature on my own, read whole papers as I used to. Checking the LLM constantly but adapting to it and I don't like it, don't like how it steers me, just let me search, let me wander the scientific landscape on my own, let me read the words of the authors with opposing views. Then let me make 20 plots and only use 1, let me wrestle with the data. Let me make wrong visuals that by chance communicate something important about the data.

Because otherwise I feel uncomfortable, I need to understand, that is what I do. I can reason about so many things because my internal world model is comprehensive and mostly correct. That has taken 44 years so far. Hard work from time to time, but I've mostly enjoyed it.

I still don't know what to make of these models, I use them everyday, but sometimes I wonder if I was not just as fast with Stack Overflow, because what I crave is understanding, not "some finished app". Yes, I rarely finish things fully (that's how I feel), but in research I've often been told they like my ability to move very fast and creatively in phase one, the development is left to others anyway...

I crave an understanding of what these tools mean to me exactly. This comment is part of that. HN is part of that.

teekert - 4 hours ago

Perhaps it is true that the faster you can internalize knowledge (thoroughly, there is a quality aspect to it), the faster you are. Maybe I'm getting old and learning new skills is getting tougher. Maybe, as my world model grows I'm becoming a slow thinker, or a slow learner? New stuff has to be evaluated against a lot of knowledge. But when it clicks, it really feels like a click, it feels satisfying. Like when some new knowledge does not just explain the problem at hand, but also a lot of things that still lingered in the back of your mind.

Recently my wife said that my daughter (ill at the time) may have heatstroke, my response was: It looks like it but she also has a hefty fever (hot after being more than 24 hours out of the sun), I can't really imagine the immune system being involved in heat stroke, although it's possible... My mind went out to heat damaged proteins presenting neo-antigens triggering an immune reaction. I also labelled that as unlikely and more dangerous than what we were observing. I like that I can do that (of course I went on to verify these thoughts!). That reasoning, it's not exactly hundreds of tokens a sec, but I like the process and it has value.

I also recently observed some weirdness in a dataset, and I spent 3 days hunting it down. Long story short: I thought I understood how genes make transcripts but I was wrong and ended up adding a new transcript to the human reference genome annotation together with the GENCODE people. Now I understand my data better and can separate two different transcripts better in my data (a difference important to our research).

Things like that. The LLM doesn't speed that up, not really. I read a part of a book on gene expression and the function of transcription factors and their interaction with promoters, but I also used LLMs. In the end it was the book with the pictures and clear language that communicated the concepts most clearly. It was made for that of course, and I knew I could trust it (it's tiring to assign <100% confidence to LLM answers), although I know a real scientist also does that with books :)

Maybe I, we all (humanity), will really be faster in the future. Maybe when you grow up with these things you can build world models better and faster. Maybe I'm just too stuck in my ways, as my neuro-plasticity degrades over time. Or maybe it doesn't degrade, maybe I just need more evidence before changing my world models, they have been building on a heavy foundation for a while now.

gjuggler - 15 hours ago

The most interesting thing here is that Claude Science runs a local server and a web-based UI that connects to that server from your browser. This is very different from Claude Code and Cowork, where the UI is more tightly coupled to the host machine (which makes things like computer use possible).

I think I recognize the strategy: most pharma environments connected to interesting data are tightly locked down, to the point where you can't just connect your Macbook to the source data.

Similarly, access to large genomic biobank datasets like UK Biobank or NIH's All of Us program is granted only through a Trusted Research Environment (TRE), a remote data analysis platform usually quite restricted on internet access, etc. You can't easily run desktop apps, but these environments do usually support running JupyterLab or VS Code, tunneling the user interface through to the end user. (Source: I previously ran the team that built the All of Us TRE.)

Claude Science looks a lot more like something one could imagine spinning up in one of those highly-constrained data environments (with the "server" running within the TRE and the UI proxied to the end user's browser) than the does-everything Claude mega-app. That will be critical for traction within pharma R&D environments.

I will say that for moderately-computational scientists, who are daily driving RStudio, JupyterLab, or maybe VS Code, Claude Science will be quite an unfamiliar shaped product. I'll be curious to see whether something like this gains adoption (1) in place of, (2) alongside, or (3) eventually wrapping around the more traditional data science workbench tools out there.

annzabelle - 13 hours ago

Anecdotally, as someone with a lot of moderately computational sciencey tasks at work (part of my job is as a data analyst for a geology firm that has some interesting sensor data), combining Claude Code and standard python data libraries has been extremely powerful and sped up my workflows immensely. If I just need a quick analysis or visualization, Claude can write something for me in minutes that would take me an hour or so to sort out on my own. I know the relevant libraries well enough to read and verify the code, which is an important distinction from blindly using a black box AI.

I will note that Claude Code and Jupyter in VSCode don't play nicely together right now - it forces me to rerun the whole notebook from the start after every edit Claude makes. This has led to me stepping back from notebooks and having Claude write standalone scripts that I then spend time merging back into a pretty notebook.

IanCal - 4 hours ago

There’s a dead sibling comment but I’d also recommend looking at marimo, I just used it to do some analysis for my brother-in-law and had Claude write the whole thing. It tracks variables used across cells to see what needs re-running. It’s also got a built-in AI helper thing where you can put an API key but I’ve not tried that yet.
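A toy sketch of the "track variables across cells" idea, to show why it avoids rerunning the whole notebook. This is not marimo's actual implementation, just the general technique as I understand it: parse each cell, record which names it assigns and which it reads, and after an edit re-run only the cells downstream of the edited one. `defs_and_refs` and `stale_cells` are made-up helper names.

```python
import ast


def defs_and_refs(source):
    """Return (names assigned, names read from elsewhere) for one cell."""
    defs, refs = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            (defs if isinstance(node.ctx, ast.Store) else refs).add(node.id)
    # Names a cell both assigns and reads are treated as local to it
    # (a simplification; it ignores read-before-assign within a cell).
    return defs, refs - defs


def stale_cells(cells, edited):
    """Names of cells that must re-run after the cell `edited` changes."""
    info = {name: defs_and_refs(src) for name, src in cells.items()}
    stale, frontier = {edited}, [edited]
    while frontier:
        current_defs = info[frontier.pop()][0]
        for name, (_, refs) in info.items():
            # A cell is stale if it reads any name defined by a stale cell.
            if name not in stale and refs & current_defs:
                stale.add(name)
                frontier.append(name)
    return stale


if __name__ == "__main__":
    cells = {
        "load": "df = [1, 2, 3]",
        "scale": "scaled = [x * 2 for x in df]",
        "plot": "total = sum(scaled)",
        "note": "title = 'demo'",
    }
    print(sorted(stale_cells(cells, "load")))
```

Editing `load` marks `scale` and `plot` as stale but leaves `note` alone, which is the behaviour that makes agent edits to a notebook less painful.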

mssuraj - 8 hours ago

[dead]

gonzalohm - 14 hours ago

I agree that it's an interesting architecture, but I'm not sure how it would work in a highly controlled server.

If you can't connect from your Mac, then I doubt they will allow an agent to make requests from the server

gravelc - 13 hours ago

Tried this to see how it goes in my particular field - computational design of RNAi-based biopesticides. One-shotted a design for targeting the DvSnf7 transcript of western corn rootworm. It took a fairly naive approach (maybe how a 1st year PhD student would go about it), but got the job done. Also noted caveats with its approach (e.g. using mammalian design rules, limited off-target screening). Not bad really. But also not great. When its flaws were pointed out, the AI determined that it could have taken a more informed approach. Then Opus 4.8's safety system flagged the session.

solenoid0937 - 10 hours ago

> Then Opus 4.8's safety system flagged the session.

If you think you can use this to land real positive impact, you, your institution, or your company should apply for OpenAI and Anthropic's bio programs!

greenavocado - 13 hours ago

> Then Opus 4.8's safety system flagged the session.

The jokes write themselves these days.

I suggest collecting 10 seminal works on the subject matter including 10 textbooks in the general field, converting them to plain text via OCR or text extraction, then trying the same thing with a superior agentic harness, like omp.sh

/goal set create biopesticide targeting the DvSnf7 transcript of western corn rootworm

<sarcasm>make no mistakes</sarcasm>

minimaxir - 17 hours ago

When I saw "Science" I didn't think they meant Data Science, which is what the UIs full of pandas code and plots imply. Even if the focus is on the sciences, I suspect that's the less valuable part of the announcement particularly with the implication of Jupyter Notebook 2.0.

Image-understanding for data viz is a use case that has been ignored, and modern LLMs are getting better at proper EDA. But, uh, I may need to update my resume.

ritzaco - 17 hours ago

A lot of the soft and hard sciences use hacky matplotlib code to produce results and visualisation, without being necessarily data science

From the bits I've seen, I'd take claude-generated code any time over that written by maths, physics, biology, linguistics people. Even though I've seen Claude make some super-big mistakes while doing data analysis I'd guess it's already more reliable than most academics trying to code.

beardedwizard - 16 hours ago

This 100000x over. Nothing is worse than trying to productionize code coming from academics like this.

imperor - 12 hours ago

I think presentation via software just isn't a strong suit for a lot of them. A lot of researchers' personal or research lab sites too are usually way out of date or just really badly presented from what I've seen. They could all do with some more thinking about aesthetics and understandability.

__MatrixMan__ - 15 hours ago

Conveniently, you can use published results as tests of equivalence, provide the ugly code as context, and regenerate it to your liking. I think the odds of such a regeneration introducing a bug that's within the usage domain but that dodges the golden tests are quite low... so long as you resist the urge to add features along the way.
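A minimal sketch of what such a golden test could look like, assuming the published numbers have been transcribed by hand. The values and the names `PUBLISHED`, `regenerated_analysis`, and `check_against_published` are all made up for illustration; the real pipeline would replace the stand-in function.

```python
import math

# Values as they would be transcribed from the published paper
# (illustrative numbers, not from any real publication).
PUBLISHED = {"mean": 3.0, "n_hits": 2}


def regenerated_analysis(samples, threshold):
    """Stand-in for the regenerated, cleaned-up pipeline."""
    mean = sum(samples) / len(samples)
    n_hits = sum(1 for s in samples if s > threshold)
    return {"mean": mean, "n_hits": n_hits}


def check_against_published(result, published, rel_tol=1e-3):
    """Return the keys whose regenerated value differs from the published one."""
    return [
        key
        for key, expected in published.items()
        if not math.isclose(result[key], expected, rel_tol=rel_tol)
    ]


if __name__ == "__main__":
    result = regenerated_analysis([1.0, 2.0, 3.0, 6.0], threshold=2.5)
    mismatches = check_against_published(result, PUBLISHED)
    print("mismatches:", mismatches)
```

The tolerance matters: published figures are rounded, so an exact equality check would fail even for a faithful regeneration.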

eli_gottlieb - 10 hours ago

Matplotlib? Ha! There are loads of academic fields where you still write data analyses by hand, one at a time, in Matlab, without proper version-control or libraries.

__MatrixMan__ - 17 hours ago

My take based on the video is that they're thinking more about bioinformatics, which might technically fall under the "data science" umbrella depending how you define your terms, but which is not described that way in common usage.

It's the content that determines the sort of science, not the toolchain.

throwaway219450 - 11 hours ago

It's not obvious from the marketing if this is applicable to non-biosciences. If it is, they couldn't come up with a single example from another domain like astrophysics?

https://www.anthropic.com/news/claude-science-ai-workbench

EDIT: Installed the app, it has zero connectors for non biology, which is a shame. I assume they'll come later.

winwang - 16 hours ago

Honestly quite excited to see what can happen here, I think biology has generally had a lack of data science expertise.

inciampati - 13 hours ago

Tell us, what gives you that impression?

__MatrixMan__ - 8 hours ago

I don't hold that view exactly. But something related...

I once tried to replicate a bioinformatics result based on published data (for a class). I found that although the process did indeed yield plots A and B, as the authors claimed, they were typeset wrong in the PDF so plot A had B's caption and plot B had A's caption.

It would be an easy thing to provide assurances against, if you wanted to. You could repeatably build the pdf so that such a mistake was in plain view, as a bug in the pipeline, rather than something you had to do offline calculations to support or reject.

The situation as it is is not ideal. Instead of anything that would verify either side, it's my word against the author's until a third party bothers to repeat the analysis. That's the best we can do for scientific claims, but there are friendlier ways to make the computational claims verifiable.

The Claude Science video showed a little "provenance" button and talked about exactly this. Life sciences have their hands full with the actual science. They're not immature, but they are not in a great position to be proving the validity of the computational connective tissue that underlies their results. That's a whole thing on its own, independent of the underlying scientific reasoning being presented (though I wouldn't call it data science).

Plus, it's exactly the sort of thing we need AI to get better at: sourcing evidence that proves its claims and stitching it together so the proof is easily verifiable.

I too am excited.

imperor - 12 hours ago

They do mention things like protein and chemical structure visualization though

quijoteuniv - 16 hours ago

All of these new things are starting to look like soviet space program propaganda. Is there something really new?

dennis_jeeves2 - 15 hours ago

Old wine, new bottle...

PotatoFarmsKing - 15 hours ago

Before LLMs the tech groups I followed were ripping with discussions about this and that topic, what to use and when; I believe these discussions sparked the creation of many frameworks and tools out of "this seems like a good idea, wouldn't hurt to implement it". Unfortunately it all revolves around LLMs nowadays and how to make some LLM work some way or another, we don't even discuss the very topics the groups were created to discuss. I fear science is soon to taste the same thing - discussions about LLMs taking place instead of the actual topics that would be discussed otherwise.

3fffss - 10 hours ago

My friend, they have dumped hundreds of billions into LLMs.

The ROIC is not gonna look good if they do not somehow make use of the existing assets...

Not an argument for it btw, I'm just saying. Ultimately, management answers to shareholders who look at return measures such as that.

ai_fry_ur_brain - 14 hours ago

Well LLMs are largely useless and people are realizing that.

helloplanets - 2 hours ago

Doesn't make sense to fixate on LLMs and not the actual Transformer/attention foundation. The Transformer/attention architecture is the breakthrough, not LLMs. Especially the RLHF chat paradigm is 100% a byproduct. Which is easy to see when you look at how ChatGPT originally came about.

DeepMind has already had real impact on science with the same foundational architecture as LLMs, for protein folding. They won a Nobel prize for it.

foxyv - 14 hours ago

Raw dog Chat LLMs are pretty worthless. But run an agent with tool invocation and they get scary good. It's amazing how much reasoning is packed into the English language. Provide your model with enough information and it can pull some miracles out of thin air. It's not the "Replace humans" level yet, but you can automate a lot of stuff you wouldn't expect to be able to automate.

applicative - 10 hours ago

What you are saying, if I follow, is that LLMs are basically worthless: it turns out that coding is so simple that verifiable rewards can tune weights surprisingly well for that one peculiar task. ('agentic' is a fancy word for letting them run what they write - not to put too fine a point on it.)

You've made the most damning remark against Planet LLM I've read.

eli_gottlieb - 10 hours ago

Most of what's impressed me in working with LLMs is just how much "intelligence" you can get out of the agent iteratively refining something it looks back at with each turn, without its ever actually exhibiting human-level intelligence. I've always been an embodied-cognition guy, and it really seems to me like "agent harnesses" are basically task-specific pseudo-embodiments for LLMs.

ai_fry_ur_brain - 10 hours ago

No, it's not doing magic. I'm impressed when anyone can play a guitar, because I don't know anything about playing a guitar. Someone who's been playing the guitar for years isn't impressed by all guitar players.

This seems the case with many people using llms to write code. They think everything an llm does is magical.

It will never be able to replace humans with two brain cells.

protocolture - 6 hours ago

It's kinda both. It's quite underwhelming at the top, but at the bottom it's amazing.

Recursing - 17 hours ago

This seems to have unblocked Claude Desktop for Linux ( https://code.claude.com/docs/en/desktop-linux )

loufe - 16 hours ago

unfortunately no arch based distro support. I'm curious why it's not packaged as a flatpak.

arendtio - 16 hours ago

Well, for Arch Linux, there was the unofficial version from the official binary in the AUR already... (Not sure what you mean by 'no arch based distro support').

loufe - 16 hours ago

First party support would be nice since this is not a high-trust period for the AUR, but fair point, I'll probably use it, thank you!

Recursing - 16 hours ago

Many deb packages are easily repackaged for arch by the community

celltalk - 13 hours ago

I basically did the same thing almost one and a half years ago and not many people cared, but I still believe that this is the future for computational biology.

https://celvox.co/solutions/axon

keepupnow - 12 hours ago

Competition is healthy, yours looks cooler.

qwerty_clicks - 16 hours ago

Should be called Claude-bio-big-bucks.

What about earth science, physics, engineering? The connectors and skills are all just biology and pharma. Boo

eli_gottlieb - 10 hours ago

If I didn't want companies focused on making money to exclusively target the life sciences, I would simply fund literally anything or everything else commensurately with how much money is thrown at the life sciences for the sheer garbage they actually practice and produce. Don't like it?

NSF annual budget (pre-Trump): ~$6-8 billion

NIH annual budget (pre-Trump) ~$50 billion

There it is.

jkwang - 3 hours ago

Claude Science sounds like a useful shift toward reproducible agentic research. The built-in error recovery and tool orchestration could make it practical for real lab workflows, not just demos.

Sol- - 17 hours ago

So it's like Claude Cowork for Science, i.e. for less tech-savvy users? I would imagine scientists with some coding background might just prefer to use Claude Code normally and integrate it with their stack of choice, but perhaps the comfort and ease of use of Claude Science still wins out.

Abh1Works - 11 hours ago

lebovic answered this, but it isn't just Claude Cowork, especially with the connections and abilities related to HPC clusters. I could definitely see my former team at a national lab integrating this with their systems, and forgoing the use of Claude Code altogether.

kfse - 11 hours ago

I've worked with similar tools and while they're impressive, it's too often the case that the LLM literally makes up fake but realistic looking data and pretends that it's real. This includes pretty deep fakery like setting up mock database connectors so that it looks like you're fetching data from the right place, but it's just getting synthetic data

How does this guard against that?

dbcooper - 14 hours ago

A "standing review agent" seems to be one of the main differences beyond the new connectors and in place visualisation tools.

>A standing reviewer agent. This runs in the background during a session, checking citations against sources, flagging numbers it can't trace back to evidence, and catching figures that don't match the code that supposedly generated them. That's not something Code or Cowork do automatically — you'd have to ask Claude to double-check itself as a separate step.
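One of the checks in that quote, "checking citations against sources", can be sketched as a title comparison. This is only a guess at the shape of such a check, not how the reviewer agent works: `title_matches` is a hypothetical helper, and it assumes the resolved title has already been fetched from whatever metadata the DOI points to.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in title)
    return " ".join(cleaned.split())


def title_matches(cited_title, resolved_title, threshold=0.9):
    """True if the title in the bibliography plausibly matches the resolved one."""
    ratio = SequenceMatcher(
        None, normalize(cited_title), normalize(resolved_title)
    ).ratio()
    return ratio >= threshold


if __name__ == "__main__":
    # Placeholder titles, not real papers.
    print(title_matches("A Survey of Widget Phasing", "A survey of widget phasing."))
    print(title_matches("A Survey of Widget Phasing", "Thermal Models of Gadget Decay"))
```

A check like this would catch a DOI that resolves to a different paper, but it says nothing about whether the cited paper supports the claim attached to it.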

alpineman - 3 hours ago

So that's why Fable was refusing those biology questions

immmmmm - 15 hours ago

When I was doing my phd, around 2 decades ago, I was often going to the library’s compactus to fish for a Phys Rev from the 80s. Back then papers were sparse and expensive. But the quality!

The Higgs boson is 3 papers, 6 authors and 6 pages in total!

At the end of my phd, 30++ pages slop papers were the norm.

Nowadays, well..

The paper by Higgs was one page. The guy probably published less than a hundred pages in his career.

One reason that made me abandon a career was the disgust caused by the publishing frenzy.

And now tokens..

trollbridge - 15 hours ago

There is an obscure topic where I have read basically every single dissertation, study, etc on that topic (or even just articles that mention it). It is very noticeable how much briefer older publications were.

It would be impossible to do that today. I guess I could have an LLM just summarise all the papers…

Daishiman - 15 hours ago

What's the reason for this? Publish-or-perish? Papers have to be more thorough? Extra junk tacked on for the sake of showing lengthier papers?

raphman - 17 hours ago

tl;dr: Use this if you don't like doing science or doing things well. It hallucinates references.

Seems to be based on https://github.com/swaruplab/operon as evidenced by the authorization dialog and https://x.com/testingcatalog/status/2037684573161783373 .

Mostly targeted at life sciences - e.g. integration for FDA, PubMed, genomics databases but no ACM / IEEE as far as I can tell.

Edit: arXiv search seems to be supported - but not Google Scholar etc. So, this tool is of little use for most researchers outside life sciences.

Edit 2: Quick walkthrough: the AppImage starts a browser window with an onboarding wizard and a chat interface. It suggests a few things one might do at the start of a research project - e.g. do a quick literature review. When I chose that option, it wrote Python scripts that used MCP calls to do arXiv searches. It stayed seemingly stuck there for a few minutes, not returning anything. Then:

> The free-text search returned too much noise

Claude decided to choose a certain paper as a starting point for further research. Shortly afterwards:

> That DOI resolved to the wrong paper. Let me find the correct anchor papers by title/author search directly.

Then it meandered a few more minutes doing research and creating a citation graph (that it did not show to me).

> I have a complete picture. Let me verify the key DOIs resolve and then write the review.

Then:

> The lint flags em-dash overuse. Let me reduce them, then save.

Then: a nice but verbose literature overview of my chosen topic

<blink>BUT it includes at least one hallucinated reference!</blink>

P.S.: What does this mean?

  [reviewer] verifier_mode=default-on downgraded to off: pro subscription tier, autoReviewer withheld (frame=f2a81cb2)

Retr0id - 17 hours ago

> The lint flags em-dash overuse

An explicit text desloppification pass (i.e. LLM-use obfuscation) seems like outright scientific fraud.

sansseriff - 16 hours ago

It sure is! But ironically, because of the intention behind the obfuscation. Not the fact that AI was used in a research paper.

I have no issues with AI use in science. If Claude can explain my research better than me, then have at it. But I do NOT want to read a passage thinking it was written by a human when it wasn't. Science has no idea yet how such disclosures should work: what should be done by humans as a matter of principle, and what can't or shouldn't be done by humans.

epihelix - 11 hours ago

The thing that really scared me about the landing site for Claude Science was this promotional image of the software in action:

https://cdn.prod.website-files.com/6889473510b50328dbb70ae6/...

Very depressing. MDPI journals will be saturated with these slop papers (if they're not already). It shocks me that Anthropic thought that this was a good thing, and says a lot about their research integrity (or lack thereof).

> Science has no idea yet how such disclosures should work.

Technically, most journals have a policy that LLM use should be acknowledged, but I agree we're still very much in the weeds about this right now. Much firmer guidelines should have been established years ago.

(I also have no issues with LLM usage in research either, btw -- I use LLMs to fact-check / proofread / discuss / sanity-check my conceptual work, to background myself in other research, and to refactor and assist with analytical coding. They can be a game-changer for medical research, when used rationally and sensibly.)

dleeftink - 16 hours ago

Some authors may even choose to leave syntactical errors as a tell for those self-authored passages; long-term, some interesting language drifts may come of it.

Der_Einzige - 15 hours ago

We send our regards: https://arxiv.org/abs/2510.15061 (ICLR 2026)

sampo - 17 hours ago

Biosciences mostly don't use arXiv. They have their own https://www.biorxiv.org/ but its usage is not as common as arXiv's is in e.g. physics.

chazeon - 7 hours ago

Isn't this the company that makes its LLM become a degenerate when it comes to bioscience?

cmiles8 - 17 hours ago

Science isn’t suffering from a lack of papers. It’s suffering from a lack of good papers. Making it easier to just pump out paper-mill publications is about the last thing science needs right now.

dgfl - 16 hours ago

My hope is that the flood of AI articles pushes the academic publication system to its highly-anticipated breaking point.

The most absurd part is that everyone in academia knows that publish or perish is tremendously damaging to real research. Yet we’re all hostages to this system that we created in the name of “merit” and “efficiency”.

We need a different system to identify and reward talented hard-working people. Back in the day it all relied on actual interpersonal interaction and subjective judgment, but there were also far fewer researchers worldwide.

dag100 - 16 hours ago

> My hope is that the flood of AI articles pushes the academic publication system to its highly-anticipated breaking point.

This will just make research inaccessible to most researchers. There is no incentive to limit publishing, at all, other than at the highest echelons. Publish or perish will just become worse. Look at what is happening to programming and extrapolate that to research work.

And all for what? Just to keep up this facade of society until most of society can be excised, whether artificially or naturally through lack of reproduction.

breezybottom - 15 hours ago

Oh it's getting there. I've turned down several referee requests this year because the paper looks like AI slop. A lot of it seems to come from China.

godzillabrennus - 17 hours ago

Scientific research is suffering from a reproducibility crisis, not a publication crisis. LLMs aren't going to solve reproducibility issues.

CJefferson - 17 hours ago

They are going to make it a thousand times worse.

It wasn't perfect before, but it at least took some time to fake a paper. The problem is now people can produce a very plausible looking completely fake paper in minutes. Peer review is in the process of completely collapsing, in fact I think it's already basically done.

The only way this might fix things is if we require that all papers be completely reproducible (that doesn't help in subjects like biology, of course, but they can still provide all the experimental data in the rawest format possible that doesn't break any laws).

FeteCommuniste - 17 hours ago

The two feed into each other. "Publish or perish" ups the incentive to pump out shaky papers to pad resumes. LLMs make it easier to churn them out.

xpct - 16 hours ago

I'm actually quite excited for when (if) the models get good enough to start replicating compsci papers. I'd love it if there was a system which calculated a reproducibility score per-lab or per-researcher, which I could look up alongside their citation count.

I want to see who did the hard work properly, and who focused on publishing with concealed details.
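
As a rough illustration of the per-lab score described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `ReplicationAttempt` record, its field names, and the scoring rule (the fraction of a lab's papers with at least one successful replication). It is not an existing system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationAttempt:
    """One hypothetical replication attempt (all field names are assumptions)."""
    lab: str          # identifier of the lab that published the paper
    paper_id: str     # identifier of the paper being replicated
    reproduced: bool  # True if the attempt recovered the reported result


def reproducibility_scores(attempts):
    """Return {lab: fraction of its papers reproduced at least once}."""
    per_lab = defaultdict(dict)
    for attempt in attempts:
        papers = per_lab[attempt.lab]
        # A paper counts as reproduced if any single attempt succeeded.
        papers[attempt.paper_id] = papers.get(attempt.paper_id, False) or attempt.reproduced
    return {lab: sum(papers.values()) / len(papers) for lab, papers in per_lab.items()}


attempts = [
    ReplicationAttempt("lab_a", "p1", True),
    ReplicationAttempt("lab_a", "p2", False),
    ReplicationAttempt("lab_a", "p2", False),
    ReplicationAttempt("lab_b", "p3", False),
    ReplicationAttempt("lab_b", "p3", True),
]
scores = reproducibility_scores(attempts)
```

A per-researcher version would only need a different grouping key. The hard part, actually running the replications, is not addressed here.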

nok22kon - 17 hours ago

it's suffering from having 1 million researchers, when there aren't 1 million important easy problems to solve, yet you must publish something

virissimo - 16 hours ago

It seems to me that LLMs could massively improve reproducibility issues if journals would require that the papers be reproducible by model X using a standardized prompt in < N minutes, etc...

bulder - 3 hours ago

Sorry, how would that work for literally any non-computational science? You just can't submit papers that involve actual reality?

mobeets - 17 hours ago

Why not both? Scientific review times are up, it’s harder to find reviewers, and many reviews are AI generated anyway. Auto-generated research publications will arguably make the replication crisis worse, because there will be more slop to clog up the review system, and these papers will presumably be at least as irreproducible as human-written science.

rolph - 17 hours ago

it could also be said that scientific interpretation is suffering from a framework crisis. the scientific convention of experiment, is the test of an hypothesis, as a logical construct.

repetition of materials and methods toward reproducibility holds far less weight than multiple variants of process designed to test a common hypothesis resulting in agreement [null, or failure to null].

realityfactchex - 14 hours ago

Underlying reproducibility is integrity.

Underlying integrity is rigor.

Underlying rigor is education.

It goes deep, for sure, IMO.

messh - 17 hours ago

They're gonna worsen it

ianm218 - 17 hours ago

Isn't this just blanket cynicism?

In the long run, it's conceivable we could use AI to hold papers to a much higher standard, audit all the data and code that is associated, etc.

xpct - 16 hours ago

> audit all the data and code that is associated

For a while now there has been very little incentive for providing these alongside the paper, and I don't see why exactly 'AI' would change this. I could even see how making it vague to be harder to test with LLMs could be profitable for citation hackers.

ianm218 - 14 hours ago

You can imagine using AI agents to tag papers that don’t have code or similar work attached and just filtering them out.

The Chinese open source community has created a lot of incentive to make research reproducible, for example. The most reproducible works, e.g. from DeepSeek, get widely cited and adopted.

I don’t think we can just say “AI” and it’s fixed but with deliberate effort there’s reason to be optimistic.
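
A minimal sketch of the tag-and-filter idea, in Python. Note the substitution: the comment imagines AI agents doing the tagging, while this sketch swaps in a plain regex heuristic (a link to a code host counts as "has code"). The host list, the `Paper` type, and the function names are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Assumed heuristic: a paper "has code" if its text links to one of these hosts.
CODE_LINK = re.compile(
    r"https?://(?:www\.)?(?:github\.com|gitlab\.com|zenodo\.org|huggingface\.co)/\S+",
    re.IGNORECASE,
)


@dataclass
class Paper:
    """Hypothetical paper record: a title plus its extracted full text."""
    title: str
    text: str


def has_code(paper):
    """Tag step: True if the paper text contains a link to a known code host."""
    return bool(CODE_LINK.search(paper.text))


def filter_with_code(papers):
    """Filter step: keep only the papers tagged as having code attached."""
    return [paper for paper in papers if has_code(paper)]


papers = [
    Paper("With repo", "Code is available at https://github.com/example/repo ."),
    Paper("No repo", "Code is available upon reasonable request."),
]
kept = [paper.title for paper in filter_with_code(papers)]
```

A link is of course no guarantee that the code runs or reproduces anything; this only covers the cheapest first pass.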

dag100 - 16 hours ago

Unless reviewing becomes more profitable than publishing, anything that makes both easier will drive one up far more than the other. And it is difficult to conceive of something that would make reviewing much easier without making publishing much easier.

ianm218 - 14 hours ago

Just as a counterpoint ML and AI research has become much more reproducible over time. I feel like this is relevant because ML / AI researchers are huge power users of AI tools.

Between 2016 and 2021 the share of ML/robotics/AI research papers being reproducible (i.e. containing code and similar instructions to reproduce) doubled [1].

The major US labs have gone largely closed source (i.e. they no longer publish frontier research) but the Chinese ecosystem has incredibly reproducible code.

This is field dependent obviously but I think it at least gives reason to be optimistic.

Yes people will churn out fake slop research, but it feels like that can be categorized and then ignored.

[1] https://arxiv.org/pdf/2308.10008

dag100 - 14 hours ago

That's good to hear about ML and AI research, but most research is not based on computers and so would require laboratory setups to reproduce. Not only is trying to reproduce such findings (beyond what is effectively a sanity check) through simulations a lost cause, if AI can reproduce such research it would be capable of doing such research itself... in which case it would be far more fruitful to use AI to do further research.

ianm218 - 14 hours ago

This thread is about a product based on fully reproducible research though, so I feel like we should stay grounded. Claude Science is meant to be used in the context of reproducible science research, so there is a decent reason not to be cynical about future research being reproducible.

> if AI can reproduce such research it would be capable of doing such research itself

Well there is a big distinction between research validation and research generation, it is generally much easier to verify that a math proof is true or false than to find a truly novel proof.

But yes, in the long run I’d think AI will be doing tons of research and it will be reproducible by default. So maybe we’re aligned after all?

cma - 17 hours ago

In some fields like comp sci, when code isn't given but the paper describes the approach, LLMs do help with the reproducibility crisis: you can ask it to reproduce the result through reimplementation by reading the paper.

If it fails you may have to double check it did properly reimplement it, but if it succeeds you do get a reproduction.

jszymborski - 15 hours ago

Any other researchers paranoid about using LLMs for fear of them using your data and front-running your publications/work?

Or incorporating it in training data and then spitting it out to a competing lab?

malux85 - 15 hours ago

Pay for enterprise or use one of the guaranteed no data retention models (e.g. Bedrock)

hooloovoo_zoo - 9 hours ago

Doesn’t seem like much value-add beyond pointing Claude code at org mode.

zmmmmm - 3 hours ago

I can't decide if this will make science better or be the death of it. The potential wave of slop about to hit journals is frightening. Essentially what happened with GitHub code reviews is about to hit academic peer reviews and it isn't going to be pretty.

jerven - 13 hours ago

I work on the UniProt services that might be used by the connector; it would be nice to learn whether this uses public resources or whether there is a private Anthropic copy of certain UniProt data sets.

stanford_labrat - 17 hours ago

impressive to me, but sadly i feel a little misleading since this is only the data-science part of life sciences.

every few weeks though i test claude and chatgpt on their scientific reasoning and it has definitely improved over time. in my experience without specific instruction on what is known/unknown they typically are lagging behind the leading edge of the field (dev bio/pluripotency in my case). probably because scientific research articles are not open-source so they can't crawl them.

claude has definitely outperformed chatgpt in this regard, however; its scientific reasoning is impressive.

JoshGlazebrook - 18 hours ago

The fact that we are coming up on a month of Fable being unavailable with essentially zero actual signal from Anthropic around when it may be back is crazy to me. Yet still we have these random new products coming out?

striking - 17 hours ago

https://xcancel.com/AnthropicAI/status/2070665903440871779

> Anthropic @AnthropicAI Jun 27, 2026 · 12:29 AM UTC

> Since June 12, we’ve been working closely with the US government to restore access to Claude Mythos 5 and Fable 5. Today, the government notified us that Mythos 5, our strongest cybersecurity model, can be redeployed to a set of US organizations that operate and defend critical infrastructure.

> We’re restoring access for these organizations quickly, and we’re continuing to work with the government to expand access to Mythos 5 and make Fable 5 available for general use again.

shellfishgene - 13 hours ago

This thing is also surprising considering Fable was not allowed to answer any biology questions.

ianm218 - 17 hours ago

I mean the company has like 3k employees or more right? Lots of them are just working on more applied AI use cases that don't require frontier AI just the right integrations and structure etc.

Opus 4.8 / GPT 5.6 level models with the right workflows / data / access are still good enough to do huge amounts of economically valuable work.

imperor - 13 hours ago

This plus it's entirely plausible their employees have access to Fable or their own other pre-released models internally. Other than the perks you mentioned they've got excellent distribution too.

khurs - 17 hours ago

Big Pharma = Big Budgets.

So targeting them with a tailored product is understandable.

asdff - 15 hours ago

pharma is currently in a tailspin and not really spending money. they'd rather outsource everything possible to china or india right now.

imperor - 13 hours ago

Eli Lilly recently partnered with NVIDIA to spend a lot of money on a new research lab in the Bay Area, so not entirely.

asdff - 11 hours ago

Lilly is probably the one major pharma company doing alright these days. That being said, this thread has a more sober take:

https://old.reddit.com/r/biotech/comments/1rgjnrj/lilly_bets...

nmilo - 8 hours ago

> Inspect proteins, alignments, genomic tracks, chemical structures, and PDFs in their native form, with no extra installation required.

I like how this implies parsing PDFs is as hard as like protein folding

jvanderbot - 17 hours ago

Thought I'd give it a whirl - crashed immediately.

I was tickled they had a "Download for linux" button prominently shown, but nothing yet.

nickandbro - 17 hours ago

So I guess they released this instead of Sonnet 5?

domrdy - 17 hours ago

It has Sonnet 5 as a usable model. Interesting.

properbrew - 17 hours ago

Looks like they've just announced it - https://www.anthropic.com/news/claude-sonnet-5

andai - 17 hours ago

Just released!

Claude Sonnet 5

https://news.ycombinator.com/item?id=48736605

fastaguy88 - 16 hours ago

Download for mac. Find out I need a different subscription. Cannot quit program (must force quit).

Perhaps I need AI to use it.

theplumber - 16 hours ago

They forgot to include an example of prompt error on “cancer” with Fable in that “nice” video.

devilfileprong - 11 hours ago

Claude can Oppenheimer square as the Roadster.

evolighting - 9 hours ago

Around the time I graduated from the research institute in 2020, it seems my lab already had a similar infrastructure, just without LLMs and agents.

Back then, we had data repositories, databases, Jupyter Notebooks, Slurm batches, open computing platforms, and so on. It could do similar things, just by hand.

While adding an LLM agent can indeed drastically improve usability, it must be a massive headache for system administrators. It honestly sounds like introducing a huge, uncontrollable wildcard into the system.

woadwarrior01 - 13 hours ago

Looks like Cursor and JupyterLab had a baby.

imdsm - 17 hours ago

Weird that it runs as a local webserver rather than as an app

cowpig - 15 hours ago

I've always found that what science is really lacking is closed, proprietary ecosystems trying to build for-profit moats around research.

Thank our lords at Anthropic for stepping into this void

tripleee - 17 hours ago

maxed out on coding improvements so now they're trying to expand to other markets

cma - 17 hours ago

Why have they talked about this for a long time? They predicted date of code maxing out, and did so not from fitting a sigmoid or something but they predicted it would max out right during a steep part of the slope?

trallnag - 16 hours ago

"Pre-configured for your domain [...] cheminformatics" as in something like ChEMBL?

ChrisArchitect - 17 hours ago

Blog post: https://www.anthropic.com/news/claude-science-ai-workbench

ai_fry_ur_brain - 14 hours ago

Why would you people ever use this company's products? They're actually evil and are trying to scam you and/or make you unemployable/worthless. You people really gotta wake up.

dmezzetti - 15 hours ago

Why does HN let OpenAI and Anthropic basically advertise but it throws down the gauntlet at a small developer like myself when we do "self promotion"?

Top 3 posts as of this moment are all about Claude.

brcmthrowaway - 16 hours ago

DoA

game_the0ry - 17 hours ago

Disappointing that science came after cowork. Shows that their priorities are profitability first and helping humanity second.

uejfiweun - 17 hours ago

Now this... this is a hot take. How exactly do you expect these companies to "help humanity" if they're bleeding money?

bozdemir - 17 hours ago

Another overrated packaged workspace to drain more usage... No thank you.

cute_boi - 15 hours ago

whats up with all these samosa? Samosa Manuscript, Samosa Benchmarking?

CamperBob2 - 16 hours ago

Claude: "Not that science"

Retr0id - 17 hours ago

> every step from data wrangling to *publication*

Do they have no shame?

Edit: seems like no https://news.ycombinator.com/item?id=48736814

botfriendsarent - 13 hours ago

Dude! Give me some stolen science!

calldacopsidgaf - 17 hours ago

this is a great application for the sycophantic, non-deterministic lying machine!

thrill - 16 hours ago

It's called Claude Science, not Claude Politician.

calldacopsidgaf - 13 hours ago

Bill Maher ass joke

bigyabai - 18 hours ago

How about no?

AI brand identity has made the unfortunate pivot to "how much do you trust us", which is going to be a real race to the bottom. I don't want LLMs managing nuclear reactors or replacing junior lab technicians. I don't trust any of these LLMs to do the bare minimum, regardless of how good it is for your brand.

It's gross watching these stunts unfold. Next ChatGPT will fly a passenger jet, which Claude will one-up with an agentic surgery, which OpenAI will respond to by putting a humanoid robot on the moon. If this is what 21st century market competition looks like, we are all fucked.

torginus - 17 hours ago

Meanwhile in the real world, these Math Olympiad AIs can't even take your fast food order correctly.