

Best case scenario is Gary Marcus hangs around lw just long enough to develop even more contempt for them and he starts sneering even harder on his blog.
Gary Marcus has been a solid source of sneer material and debunking of LLM hype, but yeah, you're right. Gary Marcus has been taking victory laps over a bar set so, so low by promptfarmers and promptfondlers. Also, side note, his negativity towards LLM hype shouldn't be misinterpreted as general skepticism towards all AI… in particular, Gary Marcus is pretty optimistic about neurosymbolic hybrid approaches; it's just that his predictions and hypothesizing are pretty reasonable and grounded relative to the sheer insanity of LLM hypsters.
Also, new possible source of sneers in the near future: Gary Marcus has made a lesswrong account and started directly engaging with them: https://www.lesswrong.com/posts/Q2PdrjowtXkYQ5whW/the-best-simple-argument-for-pausing-ai
Predicting in advance: Gary Marcus will be dragged down by lesswrong, not lesswrong dragged up towards sanity. He'll start to use lesswrong lingo and terminology and start using P(some event) based on numbers pulled out of his ass. Maybe he'll even start being "charitable" to meet their norms and avoid downvotes (I hope not, his snark and contempt are both enjoyable and deserved, but I'm not optimistic based on how the skeptics and critics within lesswrong itself learn to temper and moderate their criticism within the site). Lesswrong will moderately upvote his posts when he is sufficiently deferential to their norms and window of acceptable ideas, but won't actually learn much from him.
Unlike with coding, there are no simple "tests" to try out whether an AI's answer is correct or not.
So for most actual practical software development, writing tests is in fact an entire job in and of itself, and it's a tricky one, because covering even a fraction of the use cases and complexity the software will actually face when deployed is really hard. So simply letting the LLMs brute-force trial-and-error their code through a bunch of tests won't actually get you good working code.
AlphaEvolve kind of did this, but it was testing very specific, well-defined, well-constrained algorithms that could have very specific evaluation functions written for them, and it was using an evolutionary algorithm to guide the trial-and-error process. They don't say exactly in their paper, but that probably meant generating code hundreds or thousands or even tens of thousands of times to produce relatively short sections of code.
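To be concrete about what that pattern looks like, here's a rough sketch in Python. The function names (llm_propose_variant, evaluate) are hypothetical stand-ins, not AlphaEvolve's actual API; the point is that the loop only works because evaluate is a precise, automatic scoring function for a narrow, well-defined problem, which most real software doesn't have.

```python
import random

def evolve(seed_program, evaluate, llm_propose_variant,
           generations=1000, pool_size=20):
    # Keep a pool of (score, program) pairs, best first.
    population = [(evaluate(seed_program), seed_program)]
    for _ in range(generations):
        # Tournament selection: take the best of a few random candidates.
        _, parent = max(random.sample(population, min(3, len(population))))
        child = llm_propose_variant(parent)           # LLM rewrites part of the code
        population.append((evaluate(child), child))   # score it against the fixed metric
        population.sort(reverse=True)
        del population[pool_size:]                    # keep only the best candidates
    return population[0]
```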
I've noticed a trend where people assume other fields have problems LLMs can handle, but the actually competent experts in that field know why LLMs fail at key pieces.
Exactly. I would almost give the AI 2027 authors credit for committing to a hard date… except they already have a subtly hidden asterisk in the original AI 2027 noting that some of the authors have longer timelines. And I've noticed lots of hand-wringing and but-achkshuallies in their lesswrong comments about the difference between mode and median and mean dates, and other excuses.
Like see this comment chain: https://www.lesswrong.com/posts/5c5krDqGC5eEPDqZS/analyzing-a-critique-of-the-ai-2027-timeline-forecasts?commentId=2r8va889CXJkCsrqY
My timelines moved up to median 2028 before we published AI 2027 actually, based on a variety of factors including iteratively updating our models. But it was too late to rewrite the whole thing to happen a year later, so we just published it anyway. I tweeted about this a while ago iirc.
…You got your AI 2027 reposted like a dozen times to /r/singularity, maybe many dozens of times total across Reddit. The fucking vice president has allegedly read your fiction project. And you couldn't be bothered to publish your best timeline?
So yeah, come 2028/2029, they already have a ready-made set of excuses to backpedal and move back the doomsday prophecy.
So two weeks ago I linked titotal's detailed breakdown of what is wrong with AI 2027's "model" (tldr; even accepting the line-goes-up premise of the whole thing, AI 2027's math was so bad that they made the line always asymptote to infinity in the near future regardless of inputs). Titotal went to pretty extreme lengths to meet the "charitability" norms of lesswrong, corresponding with one of the AI 2027 authors, carefully considering what they might have intended, responding to comments in detail and depth, and in general not simply mocking the entire exercise in intellectual masturbation and hype generation like it rightfully deserves.
But even with all that effort, someone still decided to make an entire (long, obviously) post with a section dedicated to tone-policing titotal: https://thezvi.substack.com/p/analyzing-a-critique-of-the-ai-2027?open=false#§the-headline-message-is-not-ideal (here is the lw link: https://www.lesswrong.com/posts/5c5krDqGC5eEPDqZS/analyzing-a-critique-of-the-ai-2027-timeline-forecasts)
Oh, and looking back at the comments on titotal's post… his detailed elaboration of some pretty egregious errors in AI 2027 didn't really change anyone's mind, at most moving them back a year to 2028.
So, moral of the story: lesswrongers and rationalists are in fact not worth the effort to talk to, and we are right to mock them. The numbers they claim to use are pulled out of their asses to fit vibes they already feel.
And my choice for most sneerable line out of all the comments:
And I therefore am left wondering what less shoddy toy models I should be basing my life decisions on.
Following up because the talk page keeps providing good material…
Hand of Lixue keeps trying to throw around the Wikipedia rules like the other editors haven't seen people try to weaponize the rules to push their views many times before.
Particularly for the unflattering descriptions I included, I made sure they reflect the general view in multiple sources, which is why they might have multiple citations attached. Unfortunately, that has now led to complaints about overcitation from @Hand of Lixue. You can't win with some people…
Looking back on the original lesswrong brigade organizing discussion of how to improve the Wikipedia article, someone tried explaining the rules to Habryka then and they were dismissive.
I don't think it counts as canvassing in the relevant sense, as I didn't express any specific opinion on how the article should be edited.
Yes Habryka, because you clearly have such a good understanding of the Wikipedia rules and norms…
Also, heavily downvoted on the lesswrong discussion is someone suggesting Wikipedia is irrelevant because LLMs will soon be the standard for "access to ground truth". I guess even lesswrong knows that is bullshit.
The Wikipedia talk page is some solid sneering material. It's like Habryka and HandofLixue can't imagine any legitimate reason why Wikipedia has the norms it does, and they can't imagine how a neutral Wikipedian could come to write that article about lesswrong.
Eigenbra accurately calling them out…
"I also didn't call for any particular edits". You literally pointed to two sentences that you wanted edited.
Your twitter post also goes against Wikipedia practices by casting WP:ASPERSIONS. I can't speak for any of the other editors, but I can say I have never read nor edited RationalWiki, so you might be a little paranoid in that regard.
As to your question:
Was it intentional to try to pick a fight with Wikipedians?
It seems to be ignorance on Habryka's part, but judging by the talk page, instead of acknowledging their ignorance of Wikipedia's reasonable policies, they seem to be doubling down.
Also lol at the 2027 guys believing anything about how Grok was created.
Judging by various comments the AI 2027 authors have made, sucking up to the techbro side of the alt-right was in fact a major goal of AI 2027, and, worryingly, they seem to have succeeded somewhat (allegedly JD Vance has read AI 2027). But lol at the notion they could ever talk any of the techbro billionaires into accepting any meaningful regulation. They still don't understand that their doomerism is free marketing hype for the techbros, not anything any of them are actually treating as meaningfully real.
Yeah, AI 2027's model fails back-of-the-envelope sketches as soon as you try working out any features of it, which really draws into question the competency of its authors and everyone that has signal-boosted it. Like, they could have easily generated the same crit-hype bullshit with "just" an exponential model, but for whatever reason they went with this model. (They had a target date they wanted to hit? They correctly realized adding in extraneous details would wow more of their audience? They are incapable of translating their intuitions into math? All three?)
We did make fun of titotal for the effort they put into meeting rationalists on their own terms and charitably addressing their arguments and, you know, being an EA themselves (albeit one of the saner ones)…
So us sneerclubbers correctly dismissed AI 2027 as bad scifi with a forecasting model basically amounting to "line goes up", but if you end up in any discussions with people that want more detail, titotal did a really detailed breakdown of why their model is bad, even given their assumptions and their attempt to model "line goes up": https://www.lesswrong.com/posts/PAYfmG2aRbdb74mEp/a-deep-critique-of-ai-2027-s-bad-timeline-models
tldr; the AI 2027 model, regardless of inputs and current state, has task time horizons going to infinity at some near-future date because of how they set it up. Also, the authors make a lot of other questionable choices and have a lot of other red flags in their modeling. And the task-time-horizon fit pictured on their fancy graphical interactive webpage is unrelated to the model they actually used and is missing some earlier data points that would make it look worse.
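If you want to see the blow-up for yourself, here's a toy version of the failure mode titotal describes. The numbers are made up for illustration, not AI 2027's actual fitted parameters, but any setup where each doubling of the horizon takes a fixed fraction less calendar time than the last behaves the same way: the total time to reach any horizon is bounded by a geometric series, so the curve hits a vertical asymptote at a fixed future date.

```python
# Toy illustration (made-up parameters): each doubling of the "task time
# horizon" takes 10% less calendar time than the previous one.
first_doubling_years = 0.5   # assumed for illustration, not AI 2027's value
shrink = 0.9                 # each doubling takes 90% as long as the last

elapsed = 0.0
for k in range(200):
    elapsed += first_doubling_years * shrink**k
print(f"after 200 doublings: {elapsed:.3f} years")
print(f"limit of the series: {first_doubling_years / (1 - shrink):.3f} years")
# Both print ~5.0 years: past that date the model says the horizon is infinite,
# no matter what horizon you started from.
```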
If you wire the LLM directly into a proof-checker (like with AlphaGeometry) or an evaluation function (like with AlphaEvolve), and the raw LLM outputs aren't allowed to do anything on their own, you can get reliability. So you can hope for better; it just requires a narrow domain and a much more thorough approach than slapping some extra-firm instructions in an unholy blend of markup languages into the prompt.
In this case, solving math problems is actually something Google search could previously do (before dumping AI into it) and Wolfram Alpha can do, so it really seems like Google should be able to offer a product that does math problems right. Of course, this solution would probably involve bypassing the LLM altogether through preprocessing and post-processing.
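Something like this sketch is what I mean by bypassing the LLM. sympy is just a stand-in for whatever real solver Google would use, the routing heuristic is deliberately crude and made up, and call_llm is a hypothetical fallback, not a real API:

```python
import re
from sympy import sympify, SympifyError

def answer(query: str) -> str:
    expr = query.strip().rstrip("=?").strip()
    if re.fullmatch(r"[0-9xy+\-*/^(). ]+", expr):         # looks like plain math
        try:
            return str(sympify(expr.replace("^", "**")))  # exact symbolic result
        except SympifyError:
            pass
    return call_llm(query)  # hypothetical fallback to the usual LLM pipeline

# answer("12*(3+4) = ?") -> "84", computed exactly, no LLM involved
```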
Also, btw, LLMs can be (technically speaking) deterministic if the temperature is set all the way down to zero; it's just that this doesn't actually improve their performance at math or anything else. And it would still be "random" in the sense that minor variations in the prompt or previous context can induce seemingly arbitrary changes in output.
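For the determinism point: temperature zero just means greedy decoding, always taking the argmax token. A conceptual sketch with a hypothetical model interface:

```python
# Greedy (temperature-0) decoding is deterministic because the argmax token is
# always the same for the same context. Change one token of the prompt and the
# whole continuation can still diverge, and none of this makes the answer correct.
def greedy_decode(next_token_logits, prompt_tokens, max_new_tokens=50, eos_id=0):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)                       # hypothetical model call
        best = max(range(len(logits)), key=lambda i: logits[i])  # argmax, no sampling
        tokens.append(best)
        if best == eos_id:
            break
    return tokens
```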
Have they fixed it as in it genuinely uses Python completely reliably, or "fixed" it, as in they tweaked the prompt and now it uses Python 95% of the time instead of 50/50? I'm betting on the latter.
We barely understand how LLMs actually work
I would be careful how you say this. Eliezer likes to go on about giant inscrutable matrices to fearmonger, and the promptfarmers use the (supposed) mysteriousness as another avenue for crit-hype.
It's true that reverse-engineering any specific output or task takes a lot of effort, requires access to the model's internal weights, and hasn't been done for most tasks, but the techniques for doing so exist. And in general there is a good high-level conceptual understanding of what makes LLMs work.
which means LLMs don't understand their own functioning (not that they "understand" anything strictly speaking).
This part is absolutely true. If you catch them in a mistake, most of their data about how to respond comes from how humans respond, or at best from fine-tuning on other LLM output; they don't have any way of checking their own internals, so the words they say in response to mistakes are just more BS unrelated to anything.
Example #"I've lost count" of LLMs ignoring instructions and operating like the bullshit-spewing machines they are.
Another thing that's been annoying me about responses to this paper… lots of promptfondlers are suddenly upset that we are judging LLMs by arbitrary puzzle-solving capabilities… as opposed to the arbitrary and artificial benchmarks they love to tout.
So, I've been spending too much time on subreddits with a heavy promptfondler presence, such as /r/singularity, and the reddit algorithm keeps recommending me subreddits with even more unhinged LLM hype. One annoying trend I've noted is that people constantly conflate LLM-hybrid approaches, such as AlphaGeometry or AlphaEvolve (or even approaches that don't involve LLMs at all, such as AlphaFold), with LLMs themselves. From there they act like of course LLMs can [insert things LLMs can't do: invent drugs, optimize networks, reliably solve geometry exercises, etc.].
Like, I saw multiple instances of commenters questioning/mocking/criticizing the recent Apple paper using AlphaGeometry as a counterexample. AlphaGeometry can actually solve most of the problems without an LLM at all; the LLM component replaces a set of heuristics that make suggestions on proof approaches, and the majority of the proof work is done by a symbolic AI working within a rigid formal proof system.
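Roughly, the division of labor looks like this (a caricature with hypothetical function names, not AlphaGeometry's actual code): the symbolic engine does the actual proving, and the LLM is only consulted for auxiliary constructions when the engine gets stuck, with every suggestion re-checked symbolically before it counts.

```python
def prove(problem, symbolic_engine, llm_suggest_construction, max_attempts=10):
    state = symbolic_engine.initialize(problem)
    for _ in range(max_attempts):
        proof = symbolic_engine.saturate(state)      # exhaustive rule-based deduction
        if proof is not None:
            return proof                             # solved without consulting the LLM again
        hint = llm_suggest_construction(state)       # e.g. "add the midpoint of AB"
        if not symbolic_engine.is_valid_construction(state, hint):
            continue                                 # invalid hints are simply discarded
        state = symbolic_engine.add_construction(state, hint)
    return None
```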
I don't really have anywhere I'm going with this, just something I noted that I don't want to waste the energy repeatedly re-explaining on reddit, so I'm letting a primal scream out here to get it out of my system.
Just one more training run bro. Just gotta make the model bigger, then it can do bigger puzzles, obviously!
Optimistically, he's merely giving in to the urge to try to argue with people: https://xkcd.com/386/
Pessimistically, he realized how much money is in the doomer and e/acc grifts and wants in on it.