Is this LLM psychosis? So much tending and conversing with the matmuls, but what was the outcome? Are people who get this deep into it more successful somehow? It reminds me of people who take drugs and get "revelations" but then are not particularly over represented in the group of successful people for all of their deep insights.
> It reminds me of people who take drugs and get "revelations" but then are not particularly over represented in the group of successful people for all of their deep insights.
This depends on where you're looking for "successful" people.
I generally agree with you - of those people who might report "revelations" through hallucinogenic drugs, the majority may misinterpret their drug-induced experience and hence be more confused / lost than before.
On the other hand, it can still be true that among those who eventually do have genuine spiritual insight, having used hallucinogenic substances is overrepresented compared to the general population.
Quoting from [1], where the author tried to find spiritually advanced individuals:
> Approximately 52% of participants had used hallucinogenic drugs at some point; none reported these as the trigger that led to PNSE.
PNSE = Persistent Non-Symbolic Experience.
My point is: while there are certainly people who go way overboard with the LLM stuff, that is not at odds with skillful use of LLMs being overrepresented in successful people.
I see now that you didn't make that point, but I already typed this all out and I'm gonna leave it.
You are absolutely right!
Kidding, but the analogy sits comfortably with me.
I wonder, though, whether this kind of behavior is potentially harmful; most likely less so than drugs, but nonetheless...
Same author whose idea of constrained decoding for structured outputs was to run a schema-begging API call in a loop 10 times and then throw an exception on failure.
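For readers unfamiliar with the distinction: the loop described here is a retry-and-validate pattern, which only checks output after it is generated. Constrained decoding instead restricts which tokens the model may emit at each step, so invalid output cannot be produced at all. A minimal sketch of the retry version, assuming a hypothetical `call_model` function that returns raw text:

```python
import json
from typing import Callable


class StructuredOutputError(Exception):
    """Raised when no attempt produced output that passed validation."""


def ask_with_retries(call_model: Callable[[str], str], prompt: str,
                     required_keys: set[str], max_attempts: int = 10) -> dict:
    """Retry-and-validate: generate freely, parse, check keys, try again on failure.

    Nothing here constrains generation; bad outputs are only filtered afterwards.
    """
    last_error = "no attempts made"
    for attempt in range(1, max_attempts + 1):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"attempt {attempt}: invalid JSON ({exc.msg})"
            continue
        if not isinstance(data, dict):
            last_error = f"attempt {attempt}: expected a JSON object"
            continue
        missing = required_keys - set(data)
        if missing:
            last_error = f"attempt {attempt}: missing keys {sorted(missing)}"
            continue
        return data
    raise StructuredOutputError(last_error)


# Fake model for illustration: rambles, then omits a key, then complies.
replies = iter(["Sure! Here is", '{"name": "x"}', '{"name": "x", "age": 3}'])
print(ask_with_retries(lambda _prompt: next(replies), "Describe x as JSON", {"name", "age"}))
# {'name': 'x', 'age': 3}
```

Each failed attempt costs a full model call, which is the main practical argument for constrained decoding when the serving stack supports it.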
OpenAI's per-employee valuation is $150M+ (almost 100x the per-employee valuation of our company). I think it may make sense to ponder a bit why a $150M engineer would have such an idea. Maybe it is the preferred way of doing things in the new AI world, a paradigm shift.
> Every 30 minutes, check Slack and Gmail for unanswered messages that need my attention...
> When I come back to Slack, replies are often already sitting in drafts. I still decide what gets sent, but the expensive part of gathering context is done.
This just feels so dystopian to me. I hope that I never work with you or someone else doing this.
I personally do use LLMs for work messaging but I'm extremely careful to state clearly like "here's a draft for that quotation request that Claude wrote:" or something like that. I would never present that as my own words.
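The quoted workflow (poll Slack and Gmail every 30 minutes, prepare drafts, let a human decide what gets sent) can be sketched roughly as follows. This is an illustration under assumptions, not the author's setup: `fetch`, `draft` and `save_drafts` are hypothetical callables standing in for the real Slack/Gmail integrations and the LLM call, and "unanswered" is assumed to mean "the last message in the thread is not mine".

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Message:
    thread_id: str
    sender: str
    text: str
    timestamp: float


def unanswered_threads(messages: Iterable[Message], me: str) -> list[str]:
    """Thread IDs whose most recent message came from someone other than `me`."""
    latest: dict[str, Message] = {}
    for msg in messages:
        current = latest.get(msg.thread_id)
        if current is None or msg.timestamp > current.timestamp:
            latest[msg.thread_id] = msg
    return sorted(tid for tid, m in latest.items() if m.sender != me)


def poll_once(fetch: Callable[[], list[Message]], draft: Callable[[str], str],
              me: str) -> dict[str, str]:
    """One pass: build a draft per unanswered thread. Nothing is sent."""
    return {tid: draft(tid) for tid in unanswered_threads(fetch(), me)}


def run_forever(fetch: Callable[[], list[Message]], draft: Callable[[str], str], me: str,
                save_drafts: Callable[[dict[str, str]], None],
                interval_s: float = 30 * 60) -> None:
    """Poll every `interval_s` seconds and store drafts for human review."""
    while True:
        save_drafts(poll_once(fetch, draft, me))
        time.sleep(interval_s)
```

Keeping `poll_once` free of any send call is what preserves the "I still decide what gets sent" property.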
If the other people in the org are using LLMs to a similar degree, any question to which an LLM can provide a good answer will never get sent. How useful are the draft replies then?
I guess the point might not be to be useful, but to ping-pong responsibility back to somebody else: "There, sent a response, now it's their problem again."
But seriously: It's a game. If that kind of "productivity" is seen as a positive measure of their worth, then in this game they're rewarded for optimizing it.
And the game is simply fucked up.
And that's not new. Ye olde corpo rat race has often revolved not around maximizing the things that are useful, but around maximizing the things that the boss-man perceives to be valuable.
Here in 2026, if the boss-man is himself boss-maxxing by using a bot to evaluate performance, this kind of automated charade would probably work very well. Champagne would fall from the heavens. Doors would open. Velvet ropes would part.
This game is quite clearly not sustainable and must ultimately collapse, but it's still a game with winners and losers. Historically, lots of unsustainable games have left winners standing around when the game ultimately collapsed.
(And, to be frank: It's perfectly OK to hate the game. It's also OK to hate the players and the mediators.)
An interesting piece of context about this guy: he writes about a serious hand injury that prevents him from typing much anymore. He says that adopting LLM workflows saved his hands (beyond just dictating everything).
I lost about 50% of the functionality in my hands in 2018-2019 and couldn't type more than an hour or so per day. What really saved me was dictation via Dragon NaturallySpeaking, and for coding I used Dragonfly to create programming grammars. I'm happy for this guy for finding a solution, but LLMs (in this shape) were late to the party.
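To illustrate what a "programming grammar" does, here is a toy sketch in plain Python that maps spoken phrases to code text. It is not Dragonfly's API (Dragonfly defines rules against a speech engine); the command words and the `transcribe` helper are hypothetical and only show the idea of turning dictation into identifiers and symbols.

```python
import re

# Hypothetical spoken words for symbols that are awkward to dictate literally.
SYMBOLS = {"open paren": "(", "close paren": ")", "colon": ":", "equals": " = ", "comma": ", "}


def format_words(style: str, words: list[str]) -> str:
    """Join dictated words into an identifier in the requested naming style."""
    if style == "snake":
        return "_".join(words)
    if style == "camel":
        return words[0] + "".join(w.capitalize() for w in words[1:])
    if style == "pascal":
        return "".join(w.capitalize() for w in words)
    raise ValueError(f"unknown style: {style}")


def transcribe(utterance: str) -> str:
    """Turn one spoken command into code text."""
    utterance = utterance.strip().lower()
    if utterance in SYMBOLS:
        return SYMBOLS[utterance]
    m = re.fullmatch(r"(snake|camel|pascal) (.+)", utterance)
    if m:
        return format_words(m.group(1), m.group(2).split())
    return utterance  # fall back to plain dictation


spoken = ["snake user count", "equals", "camel get user", "open paren", "close paren"]
print("".join(transcribe(u) for u in spoken))  # user_count = getUser()
```

The appeal of this approach, compared with free-form dictation, is that a small fixed vocabulary of commands is recognized far more reliably than arbitrary code read aloud.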
Still not useful enough for me and I really want this feature!
The problem I encounter is the inability of the LLM to look stuff up and respond to me. "What's that name of that database table?" "What are all the services that call this endpoint?" "Are there any open PRs for this repo right now?"
Once information can flow in both directions, not just one, it will be a game changer for me.
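The two-way flow being asked for is usually built as tool calling: the model emits a structured request, the host runs a lookup, and the result is fed back into the conversation. A minimal sketch of the host side, assuming the model's request arrives as JSON with `name` and `arguments` fields; the tool names and their canned data are hypothetical stand-ins for a real schema query or code search.

```python
import json
from typing import Any, Callable

TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def list_tables() -> list[str]:
    return ["orders", "users"]  # stand-in for a real database schema query


@tool
def find_callers(endpoint: str) -> list[str]:
    call_graph = {"/v1/orders": ["billing-service", "web-frontend"]}  # stand-in for code search
    return call_graph.get(endpoint, [])


def dispatch(tool_call_json: str) -> str:
    """Run one model-issued tool call and return a JSON result to feed back to it."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call.get("name", ""))
    if fn is None:
        return json.dumps({"error": f"unknown tool: {call.get('name')}"})
    try:
        return json.dumps({"result": fn(**call.get("arguments", {}))})
    except TypeError as exc:  # wrong or missing arguments
        return json.dumps({"error": str(exc)})


print(dispatch('{"name": "find_callers", "arguments": {"endpoint": "/v1/orders"}}'))
# {"result": ["billing-service", "web-frontend"]}
```

Returning errors as JSON rather than raising lets the model see what went wrong and retry with corrected arguments.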
These cloud LLMs are not the tool for you then I suppose. There are local models too, unless your point is why use LLMs, in which case, you don't need to.
Give each Codex an AgentName and ask them to mark their PRs/issues/comments with those. Have one or two "managers" that manage PRs and overall project direction. I write the project directions and make long-lasting issues. Each Codex session has an almost unachievable `/goal`, but they are asked to achieve the goal by landing changes in `main` via PRs.
I have been running about 14 Codex sessions on 4 machines for about two weeks now, ever since OpenAI 10x'ed my 20x account, and I simply cannot run out of tokens fast enough.
Side note: I have multiple Claude accounts too, but the new Claude Code `/goal` command is seriously broken. It pauses for a long time between iterations and sometimes stops prematurely.
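The agent-naming convention above can be sketched as a title prefix plus a grouping step that a "manager" session could use to see who owns which open PRs. The `[agent:<AgentName>]` prefix format and both helpers are hypothetical; the comment does not say how the names are actually embedded.

```python
import re
from collections import defaultdict

# Hypothetical convention: every PR/issue title starts with "[agent:<AgentName>]".
TAG = re.compile(r"\[agent:(?P<name>[A-Za-z0-9_-]+)\]\s*")


def tag_title(agent: str, title: str) -> str:
    """Prefix a title with the owning agent's name."""
    return f"[agent:{agent}] {title}"


def group_by_agent(titles: list[str]) -> dict[str, list[str]]:
    """Group titles by agent name; titles without a tag go under "untagged"."""
    groups: dict[str, list[str]] = defaultdict(list)
    for title in titles:
        m = TAG.match(title)  # match() anchors at the start of the title
        if m:
            groups[m.group("name")].append(title[m.end():])
        else:
            groups["untagged"].append(title)
    return dict(groups)


titles = [tag_title("Ada", "Fix parser"), tag_title("Bo", "Add tests"),
          tag_title("Ada", "Refactor lexer"), "Manual hotfix"]
print(group_by_agent(titles))
# {'Ada': ['Fix parser', 'Refactor lexer'], 'Bo': ['Add tests'], 'untagged': ['Manual hotfix']}
```

An "untagged" bucket makes human-authored or mislabeled PRs easy to spot during review.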
Yes, I'm aware of that. Perry is a different project. Folks at Oxi tools are also doing something similar (a TS checker), but that's also not 100% compatible with tsc. My goal with tsz is a superset of tsc that's stricter (for now it's called Sound Mode). But matching more than a decade of work that TypeScript has put in has been a long journey already. None of the existing TypeScript rewrites match the original tsc yet. tsgo is the closest, but it also has bugs that need to be addressed before TS 7 is released.
I think my architecture can be faster than tsgo, albeit with a much more painful codebase to work on. But I'm not claiming any sort of achievement yet.
Ultimately users have to decide, and I have to make a very strong case that someone should use an unofficial rewrite over Microsoft's own code.
Will tsz be a success? I am not sure. Am I learning and having fun? For sure!
Yeah, I didn't mean subsuming one project into another, just that maybe both could be integrated so that a user can type check strongly and soundly with tsz and then compile with Perry. Turning TypeScript into a truly sound, statically typed language with AOT compilation would be amazing.
The diff-as-review point is the one I keep coming back to.
The cost of memory-as-files isn't writing them. It's that the agent will cheerfully claim it updated something and not actually do it, or write a one-line stub that satisfies the spec but loses the original signal. Without a verification layer, the vault accumulates plausible-looking entries that quietly drift from reality.
What ended up working for me was treating the agent's self-reported summary as a wish, not a fact. A separate process diffs the actual file system against the claimed changes and flags mismatches.
After a few cycles, the agent gets calibrated and stops claiming things that don't survive a file check. That has the side benefit of making the diff review itself much higher signal: most of what shows up is real.
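The "self-report as a wish, not a fact" check can be sketched as a before/after snapshot of the vault compared with the list of files the agent claims to have changed. This is one possible shape under assumptions (content hashes per file, claims given as relative paths), not the commenter's actual process; `snapshot` and `verify_claims` are hypothetical names.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to a SHA-256 of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_claims(before: dict[str, str], after: dict[str, str],
                  claimed: set[str]) -> dict[str, set[str]]:
    """Compare what actually changed on disk with what the agent says it changed."""
    changed = {path for path in before.keys() | after.keys()
               if before.get(path) != after.get(path)}
    return {
        "claimed_but_unchanged": claimed - changed,  # the "cheerfully claimed" updates
        "changed_but_unclaimed": changed - claimed,  # silent edits worth reviewing
    }
```

Catching one-line stubs would need an extra signal, for example recording file sizes in the snapshot and flagging claimed files that shrank sharply.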
The split I'd make early is per-agent instructions vs. cross-thread shared notes.
They sound like the same artifact, but “what this agent should always do” and “what sibling work just learned” age very differently. Mixing them means the wisdom gets stale together.
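One way to keep the two artifacts from aging together is to store them separately and give only the shared notes an expiry. A minimal sketch, assuming a simple time-to-live for cross-thread notes; the one-week default and the class layout are hypothetical choices, not something from the thread.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Note:
    text: str
    created_at: float  # seconds since the epoch


@dataclass
class AgentMemory:
    # Durable per-agent rules ("what this agent should always do"); never expire.
    instructions: list[str] = field(default_factory=list)
    # Cross-thread findings ("what sibling work just learned"); expire after a TTL.
    shared_notes: list[Note] = field(default_factory=list)
    note_ttl_s: float = 7 * 24 * 3600  # hypothetical one-week lifetime

    def add_note(self, text: str, now: Optional[float] = None) -> None:
        self.shared_notes.append(Note(text, time.time() if now is None else now))

    def prune(self, now: Optional[float] = None) -> None:
        """Drop shared notes older than the TTL; instructions are left alone."""
        now = time.time() if now is None else now
        self.shared_notes = [n for n in self.shared_notes
                             if now - n.created_at <= self.note_ttl_s]

    def context(self, now: Optional[float] = None) -> str:
        """Render both stores into one prompt section, after pruning stale notes."""
        self.prune(now)
        parts = ["## Instructions", *self.instructions,
                 "## Recent shared notes", *(n.text for n in self.shared_notes)]
        return "\n".join(parts)
```

Because `prune` never touches `instructions`, stale sibling findings can expire without taking the agent's standing rules with them.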
inb4 "I got prompt injected and they stole my stuff". Now, real talk: there are some viable uses of Codex here, but nothing novel; it's the same "old" "MEMORY, VAULT, BG TASKS" setup that everyone is doing.
And about voice mode: I thought it was a good idea, but I seriously don't know how you guys use it. Whenever I use voice, my thoughts are "aaaaaaaaahhhhhh, uhmmm", and then I cancel it so that I can type and organize my thoughts. I don't really think those "brain dumps" are useful when you are thinking out loud like "We should really do X, oh wait, but actually Y is in the way and we have to take Z into consideration, but wait, Y was actually done" and so on; it turns out your assumptions are wrong and it becomes a mess. I am in favor of having the LLM work with facts and always verify them. To me this post is basically selling the Codex app and that's it, nothing new inside.
Something is happening with `codex`. At tamarillo.ai we ran a [little experiment](https://research.tamarillo.ai/coding-harness-inspection/) on 400K repos that have AI harnesses configured, and we observed some very interesting behavior:
- growing fast as fuck
- overrepresentation in starred repos (even though stars mean less these days, it is definitely something to look at)
[1] https://digitalcommons.ciis.edu/cgi/viewcontent.cgi?article=...
The further away you go, the more sensible takes you find.
"I'm breakfastmaxxing. I'm a cerealpilled bowlcel in my milk era. I'm a slicemoded breadchad." etc.
SNL Weekend Update: Chad Maxxington on the Art of Looksmaxxing:
https://www.youtube.com/watch?v=4XMPLdiXB1k
If you had googled instead of using an LLM, would you also say "Here are the CPU architectures PyTorch supports, that Google search returned"?
He must be a pleasure to work with
https://old.reddit.com/r/typescript/comments/1rjxo8z/what_if...
https://perryts.com/
- growing fast as fuck
- overrepresentation in starred repos (even though stars mean less these days, it is definitely something to look at)
- overrepresentation in `rust`
- in terms of aliveness, codex is first