Search tool that only returns content created before ChatGPT's public release (tegabrain.com)
442 points by dmitrygr 7 hours ago | 156 comments
  • shevy-java2 hours ago

    > This is a search tool that will only return content created before ChatGPT's first public release on November 30, 2022.

    The problem is that Google's search engine - and, oddly enough, ALL search engines - got worse before that already. I noticed that search engines got worse several years before 2022. So AI further decreased the quality, but the quality was already on a downward trend. There are some attempts to analyse this on youtube (also owned by Google - Google ruins our digital world); some explanations made sense to me, but even then I am not 100% certain why Google decided to ruin google search.

    One key observation I made was that the youtube search was copied onto Google's regular search, which makes no sense for google search. If I casually search for a video on youtube, I may be semi-interested in unrelated videos. But if I search on Google search for specific terms, I am not interested in crap such as "others also searched for xyz" - that is just ruining the UI with irrelevant information. This is not the only example, Google made the search results worse here and tries to confuse the user into clicking on things. Plus the placement of ads. The quality really worsened.

    • benterix5 minutes ago |parent

      > if I search on Google search for specific terms, I am not interested in crap such as "others also searched for xyz" - that is just ruining the UI with irrelevant information

      You assume the aim here is for you to find relevant information, not increase user retention time. (I just love the corporate speak for making people's lives worse in various ways.)

    • justinclift2 hours ago |parent

      Are you aware of Kagi (kagi.com)?

      With them, at least the AI stuff can be turned off.

      Membership is presently about 61k, and seems to be growing about 2k per month: https://kagi.com/stats

      • ameliusan hour ago |parent

        Be aware of:

        https://www.reddit.com/r/SearchKagi/comments/1gvlqhm/disappo...

        • super2563 minutes ago |parent

          [delayed]

        • smusamashahan hour ago |parent

          There are a few other powerful countries, with countless web services, that freely wage wars on other countries and support wars in many different ways. Is there a way to avoid their products?

          • jwr7 minutes ago |parent

            Whataboutism doesn't get us anywhere — saying "but what about X" (insert anything for X here) usually results in doing nothing.

            Some of us would rather take a stand, imperfect as it is, than just sit and do nothing. Especially in the very clear case of someone (Kagi) doing business with a country that invaded a neighboring country for no reason, and keeps killing people there.

        • eirini1an hour ago |parent

          I don't agree with this logic. It implies that people who use Google, Bing and a million other products made by US-based companies are supportive of the huge amount of atrocities committed or aided by the United States. Or other countries. It feels very odd to single out Russia's invasion of Ukraine but to minimize the Israeli genocide of Palestinians in Gaza, the multiple unjust wars waged by the United States all over the world, etc.

        • justincliftan hour ago |parent

          Damn. I didn't know that.

          Now we need a 2nd Kagi, so we can switch to that one instead. :(

    • groundzeros20156 minutes ago |parent

      Significant changes were made to Google and YouTube in 2016 and 2017 in response to the US election. The changes prioritized editorial and reputation-based filtering over best-content matching.

    • Maken2 hours ago |parent

      There is also the fact that automatically generated content predates ChatGPT by a lot. By around 2020 most Google searches already returned lots of SEO-optimized pages made from scraped content, or keyword soups made by rudimentary language models or Markov chains.

      • black3r32 minutes ago |parent

        Well there's also the fact that the GPT-3 API was released in June 2020 and its writing capabilities were essentially on par with ChatGPT's initial release. It was just a bit harder to use because it wasn't yet trained to follow instructions; it only worked as a very good "autocomplete" model, so prompting was a bit "different" and you couldn't do things like "rewrite this existing article in your own words" at all. But if you just wanted to write some bullshit SEO spam from scratch, it was already as good as ChatGPT would be two years later.

        • wongarsu26 minutes ago |parent

          Also the full release of GPT-2 in late 2019. While GPT-2 wasn't really "good" at writing, it was more than good enough to make SEO spam

    • robot-wrangleran hour ago |parent

      > Google made the search results worse here

      Did you mean:

      worse results near me

      are worse results worth it

      worse results net worth

      best worse results

      worse results reddit

    • master-lincolnan hour ago |parent

      I think this is about trustworthy content, not about a good search engine per se

      • trinix912an hour ago |parent

        But it's not necessarily trustworthy content, we had autogenerated listicles and keyword list sites before ChatGPT.

        • GTPan hour ago |parent

          Sure, but I think that the underlying assumption is that, after the public release of ChatGPT, the amount of autogenerated content on the web became significantly bigger. Plus, the auto-generated content was easier to spot before.

    • zipy124an hour ago |parent

      Honestly the biggest failing is just that SEO spam sites got too good at defeating the algorithm. The amount of bloody listicles, Quora nonsense, or backlink-farming websites that come up in search is crazy.

      • Nextgrid6 minutes ago |parent

        This is bullshit the search engines want you to believe. It's trivial to detect sites that "defeat" the algorithm; you simply detect their incentives (ads/affiliate links) instead.

        Problem is that no mainstream search engine will do it because they happen to also be in the ad business and wouldn't want to reduce their own revenue stream.
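
        As a rough illustration of what "detecting the incentives" could mean in practice, here is a minimal Python sketch that counts ad/affiliate markers in a page's HTML. The marker patterns and the threshold are illustrative assumptions on my part, not anything a real engine is known to use:

          import re, urllib.request

          # Illustrative markers only (assumption); a real ranker would need many more signals.
          AFFILIATE_PATTERNS = [
              r"amazon\.[a-z.]+/[^\"' ]*[?&]tag=",  # Amazon affiliate tag parameter
              r"shareasale\.com",
              r"awin1\.com",
              r"go\.skimresources\.com",
          ]
          AD_PATTERNS = [
              r"googlesyndication\.com",            # AdSense
              r"doubleclick\.net",
              r"taboola\.com",
              r"outbrain\.com",
          ]

          def incentive_score(url: str) -> int:
              """Crude heuristic: count ad/affiliate markers in a page's HTML."""
              html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
              return sum(len(re.findall(p, html)) for p in AFFILIATE_PATTERNS + AD_PATTERNS)

          # A search layer could then demote heavily monetized results, e.g.
          # if incentive_score(result_url) > 5: ...  (the threshold is made up)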

    • bratwurst30002 hours ago |parent

      The main theory is that with bad results you have to search more and engage more with ads, so more revenue for Google. It's enshittification.

  • Roritharr2 minutes ago

    I hope there's an uncensored version of the Internet Archive somewhere, I wish I could look at my website ca. 2001, but I think it got removed because of some fraudulent DMCA claim somewhere in the early 2010s.

  • keiferski42 minutes ago

    Projects like this remind me of a plot point in the Cyberpunk 2077 game universe. The "first internet" got too infected with dangerous AIs, so much so that a massive firewall needed to be built, and a "new" internet was built that specifically kept out the harmful AIs.

    (Or something like that: it's been awhile since I played the game, and I don't remember the specific details of the story.)

    It makes me wonder if a new human-only internet will need to be made at some point. It's mostly sci-fi speculation at this point, and you'd really need to hash out the details, but I am thinking of something like a meatspace-first network that continually verifies your humanity in order for you to retain access. That doesn't solve the copy-paste problem, or a thousand other ones, but I'm just thinking out loud here.

    • jascha_eng35 minutes ago |parent

      The problem really is that it is impossible to verify that the content someone uploads came from their mind and not a computer program. And at some point probably all content is at least influenced by AI. The real issue is also not that I used chatgpt to look up a synonym or asked a question before writing an article, the problem is when I copy paste the content and claim I wrote it.

    • lukebuehler12 minutes ago |parent

      Arguably this is already happening, with much human-to-human interaction moving to private groups on Signal, WhatsApp, Telegram, etc.

  • swyx6 hours ago

    somebody said once we are mining "low-background tokens" like we are mining low-background (radiation) steel post WW2 and i couldnt shake the concept out of my head

    (wrote up in https://www.latent.space/i/139368545/the-concept-of-low-back... - but ironically repeating something somebody else said online is kinda what i'm willingly participating in, and it's unclear why human-origin tokens should be that much higher signal than ai-origin ones)

    • alansabera minute ago |parent

      Since synthetic data for training is pretty ubiquitous, this seems like a novelty.

    • mwidell3 hours ago |parent

      Low background steel is no longer necessary.

      "...began to fall in 1963, when the Partial Nuclear Test Ban Treaty was enacted, and by 2008 it had decreased to only 0.005 mSv/yr above natural levels. This has made special low-background steel no longer necessary for most radiation-sensitive uses, as new steel now has a low enough radioactive signature."

      https://en.wikipedia.org/wiki/Low-background_steel

      • juvoly3 hours ago |parent

        Interesting. I guess that analogously, we might find that X years after some future AI content production ban, we could similarly start ignoring the low background token issue?

        • actionfromafar3 hours ago |parent

          We used a rather low number of atmospheric bombs, while we are carpet bombing the internet every day with AI marketing copy.

          • MadnessASAPan hour ago |parent

            The eternal September has finally ended. We've now entered the AI winter. It promises to be long, dark, and full of annoyances.

        • piker6 minutes ago |parent

          What’s the half-life of a viral meme?

      • doe882 hours ago |parent

        Can't wait, in fifty years we will have our data clean again.

    • jrjfjgkrj5 hours ago |parent

      every human generation built upon the slop of the previous one

      but we appreciated that, we called it "standing on the shoulders of giants"

      • bigiain4 hours ago |parent

        > we called it "standing on the shoulders of giants"

        We do not see nearly so far though.

        Because these days we are standing on the shoulders of giants that have been put into a blender and ground down into a slippery pink paste and levelled out to a statistically typical 7.3mm high layer of goo.

        • _kb3 hours ago |parent

          The secret is you then have to heat up that goo. When the temperature gets high enough things get interesting again.

          • pseidemannan hour ago |parent

            Just simulate some evolution here and there.

          • gilleainan hour ago |parent

            You get Flubber?

      • shevy-java2 hours ago |parent

        This sounds like an Alan Kay quote. He meant that in regard to useful inventions. AI-generated spam just decreases the quality. We'd need a real alternative to this garbage from Google, but all the other search engines are also bad. And their UI is also horrible - not as bad as Google's, but also bad. Qwant just tries to copy/paste Google, for instance (though interestingly enough, sometimes it has better results than Google - but also fewer in general, even ignoring false positive results).

      • groestl3 hours ago |parent

        We have two optimization mechanisms though which reduce noise with respect to their optimization functions: evolution and science. They are implicitly part of "standing on the shoulders of giants", you pick the giant to stand on (or it is picked for you).

        Whether or not the optimization functions align with human survival, and thus whether our whole existence is slop or not, we're about to find out.

      • ben_w2 hours ago |parent

        There's a reason this is comedy:

          Listen, lad. I built this kingdom up from nothing. When I started here, all there was was swamp. Other kings said I was daft to build a castle on a swamp, but I built it all the same, just to show 'em. It sank into the swamp. So, I built a second one. That sank into the swamp. So, I built a third one. That burned down, fell over, then sank into the swamp, but the fourth one... stayed up! And that's what you're gonna get, lad: the strongest castle in these islands.
        
        While this is religious:

          [24] “Everyone then who hears these words of mine and does them will be like a wise man who built his house on the rock. [25] And the rain fell, and the floods came, and the winds blew and beat on that house, but it did not fall, because it had been founded on the rock. [26] And everyone who hears these words of mine and does not do them will be like a foolish man who built his house on the sand. [27] And the rain fell, and the floods came, and the winds blew and beat against that house, and it fell, and great was the fall of it.”
        
        Humans build not on each other's slop, but on each other's success.

        Capitalism, freedom of expression, the marketplace of ideas, democracy: at their best these things are ways to bend the wisdom of the crowds (such as it is) to the benefit of all; and their failures are when crowds are not wise.

        The "slop" of capitalism is polluted skies, soil and water, are wage slaves and fast fashion that barely lasts one use, and are the reason why workplace health and safety rules are written in blood. The "slop" of freedom of expression includes dishonest marketing, libel, slander, and propaganda. The "slop" of democracy is populists promising everything to everyone with no way to deliver it all. The "slop" of the marketplace of ideas is every idiot demanding their own un-informed rambling be given the same weight as the considered opinions of experts.

        None of these things contributed to our social, technological, or economic advancement; they are simply things which happened at the same time.

        AI has stuff to contribute, but using it to make an endless feed of mediocrity is not it. As for the flood of low-effort GenAI stuff filling feeds and drowning signal with noise, as others have said: just give us your prompt.

      • rebuilder4 hours ago |parent

        That's because the things we built on weren't slop

      • pseidemannan hour ago |parent

        You may have one point.

        The industrial age was built on dinosaur slop, and they were giant.

      • hoppp4 hours ago |parent

        You can't build on slop because slop is a slippery slope

      • kgwgk4 hours ago |parent

        Nothing conveys better the idea of a solid foundation to build upon than the word ‘slop’.

      • walrusted4 hours ago |parent

        the only structure you can build with slop is a burial mound

      • teiferer4 hours ago |parent

        Because the pyramids, the theory of general relativity and the Linux kernel are all totally comparable to ChatGPT output. /s

        Why is anybody still surprised that the AI bubble made it that big?

        • jrjfjgkrj4 hours ago |parent

          for every theory of relativity there is the religious nonsense and superstitions of the medieval ages or today

          • JumpCrisscross3 hours ago |parent

            > for every theory of relativity there is the religious nonsense and superstitions of the medieval ages or today

            If Einstein came up with relativity by standing on "the religious non-sense and superstitions of the medieval ages," you'd have a point.

    • jeffchuber6 hours ago |parent

      that was me swyx

      • rollulus5 hours ago |parent

        Multiple people have coined the idea repeatedly, way before you. The oldest comment on HN I could find was in December 2022 by user spawarotti: https://news.ycombinator.com/item?id=33856172

        • threeducks2 hours ago |parent

          Here is an even older comment chain about it from 2020: https://news.ycombinator.com/item?id=23895706

          Apparently, comparing low-background steel to pre-LLM text is a rather obvious analogy.

          • pseidemann2 hours ago |parent

            As well as that people often do think alike.

            If you have a thought, it's likely it's not new.

          • rollulus2 hours ago |parent

            Oh wow, great find! That’s really early days.

  • tkgally6 hours ago

    Somewhat related, the leaderboard of em-dash users on HN before ChatGPT:

    https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo...

    • maplethorpe6 hours ago |parent

      They should include users who used a double hyphen, too -- not everyone has easy access to em dashes.

      • bigiain4 hours ago |parent

        That would false positive me. I have used double dashes to delimit quote attribution for decades.

        Like this:

        "You can't believe everything you read on the internet." -- Abraham Lincoln, personal correspondence, 1863

      • gblargg5 hours ago |parent

        Does AI use double hyphens? I thought the point was to find who wasn't AI that used proper em dashes.

        • jader2015 hours ago |parent

          Anytime I do this — and I did it long before AI did — they are always em dashes, because iOS/macOS translates double dashes to em dashes.

          I think there may be a way to disable this, but I don’t care enough to bother.

          If people want to think my posts are AI generated, oh well.

          • JumpCrisscross3 hours ago |parent

            > Anytime I do this — and I did it long before AI did — they are always em dashes

            It depends if you put the space before and after the dashes--that, to be clear, are meant to be there--or if you don't.

            • oniony3 hours ago |parent

              I cannot remember ever reading a book where there was a space around the dashes.

              • kuschku3 hours ago |parent

                That depends on the language — whereas German puts spaces around —, English afaik usually doesn’t.

                Similarly, French puts spaces before and after ? ! while English and German only put spaces afterwards.

                [EDIT: I originally wrote that French treats . , ! ? specially. In reality, french only treats ? and ! specially.]

                • greeniconan hour ago |parent

                  In German you use en-dashes with spaces, whereas in English it’s em-dashes without spaces. Some people dislike em-dashes in English though and use en-dashes with spaces as well.

                • iLoveOncall2 hours ago |parent

                  French doesn't put one before the period.

                • bratwurst3000an hour ago |parent

                  French does "," and "." like the British and Germans; the rest is space before, space after.

              • LoganDark3 hours ago |parent

                Technically, there are supposed to be hair spaces around the dashes, not regular spaces. They're small enough to be sometimes confused for kerning.

                • cachius2 hours ago |parent

                  Em dashes used as parenthetical dividers, and en dashes when used as word joiners, are usually set continuous with the text. However, such a dash can optionally be surrounded with a hair space, U+200A, or thin space, U+2009, or HTML named entities &hairsp; and &thinsp;. These spaces are much thinner than a normal space (except in a monospaced (non-proportional) font), with the hair space in particular being the thinnest of horizontal whitespace characters.

                  https://en.wikipedia.org/wiki/Whitespace_character#Hair_spac...

                  Typographers usually add space to the left side of the following marks:

                      : ; ” ’ ! ? / ) ] } * ¿ › » @ ® ™ ℓ ° ¡ ' " † + = ÷ - – —
                  
                  And they usually add space to the right of these:

                      “ ‘ / ( [ { > ≥ < ≤ £ $ ¢ € ‹ « √ μ # @ + = ÷ - – —
                  
                  https://www.smashingmagazine.com/2020/05/micro-typography-sp...

                  1. (letterpress typography) A piece of metal type used to create the narrowest space. 2. (typography, US) The narrowest space appearing between letters and punctuation.

                  https://en.wiktionary.org/wiki/hair_space

                  Now I'd like to see what the metal type looks like, but ehm... it's difficult to google. Also a whole collection of space types and what they're called in other languages.

            • fragmede3 hours ago |parent

              What, no love for our friend the en-dash?

              - vs – vs —

              • chickensong3 hours ago |parent

                I once spent a day debugging some data that came from an English doc written by someone in Japan that had been pasted into a system and caused problems. Turned out to be an en-dash issue that was basically invisible to the eye. No love for en-dash!

                • 171862744022 minutes ago |parent

                  This issue also exists with (so called) "smart" quotes.

                  • fragmede3 minutes ago |parent

                    Which, the iOS keyboard “helpfully” uses for you.

                • ben_w2 hours ago |parent

                  Similar.

                  Compiler error while working on some ObjC. Nothing obviously wrong. Copy-pasted the line, same thing on the copy. Typed it out again, no issue with the re-typed version. Put the error version and the ok version next to each other, apparently identical.

                  I ended up discovering I'd accidentally leant on the option key while pressing the "-"; monospace font, Xcode, the dash and minus looked identical.

          • teiferer4 hours ago |parent

            There is also the difference in using space around em-dashes.

      • venturecruelty6 hours ago |parent

        Oof, I feel like you'll accidentally capture a lot of getopt_long() fans. ;)

        • Kinrany5 hours ago |parent

          Excluding those with asymmetrical whitespace around the dashes might be enough.
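
          Roughly, in Python (the exact patterns below are my guess at the convention being described, not an established rule):

            import re

            # " -- " with whitespace on BOTH sides: treat as intentional punctuation.
            PUNCTUATION_DASH = re.compile(r"(?<=\s)--(?=\s)")
            # "--word" with whitespace only on the left looks like a CLI flag: exclude it.
            FLAG_STYLE_DASH = re.compile(r"(?<=\s)--(?=\w)")

            def double_dash_as_punctuation(comment: str) -> bool:
                return bool(PUNCTUATION_DASH.search(comment)) and not FLAG_STYLE_DASH.search(comment)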

    • a5c113 hours ago |parent

      Apparently, it's not only the em-dash that's distinctive. I went through the leader's comments and spotted that he also uses the curly apostrophe "’" instead of the straight apostrophe.

      • baiwl3 hours ago |parent

        Just to be clear this is done automatically by macOS or iOS browsers when configured properly.

      • kuschku3 hours ago |parent

        I (~100 in the leaderboard, regardless of how you sort) also frequently use ’ (unicode apostrophe) instead of ' :D

  • permo-w6 hours ago

    besides for training future models, is this really such a big deal? most of the AI-gened text content is just replacing content-farm SEO-spam anyway. the same stuff that any half-aware person wouldn't have read in the past is now slightly better written, using more em dashes and instances of the word "delve". if you're consistently being caught out by this stuff then likely you need to improve your search hygiene, nothing so drastic as this

    the only place I've ever had any issue with AI content is r/chess, where people love to ask ChatGPT a question and then post the answer as if they wrote it, half the time seemingly innocently, which, call me racist, but I suspect is mostly due to the influence of the large and young Indian contingent. otherwise I really don't understand where the issue lies. follow the exact same rules you do for avoiding SEO spam and you will be fine

    • Cadwhisker5 hours ago |parent

      In the past, I'd find one wrong answer and I could easily spot the copies. Now there's a dozen different sites with the same wrong answer, just with better formatting and nicer text.

      • finaard4 hours ago |parent

        The trick is to only search for topics where there are no answers, or only one answer leading to that blog post you wrote 10 years ago and forgot about.

    • never_inline2 hours ago |parent

      A colleague sent me a confident ChatGPT formatted bug report.

      It misidentified what the actual bug was.

      But the tone was so confident, and he replied to my later messages using chat gpt itself, which insisted I was wrong.

      I don't like this future.

      • blitzar2 hours ago |parent

        I have dozens of these over the years - many of the people responsible have "Head of ..." or "Chief ..." job titles now.

    • darkwater3 hours ago |parent

      > besides for training future models, is this really such a big deal? most of the AI-gened text content is just replacing content-farm SEO-spam anyway.

      Yes, it is, because of the other side of the coin. If you are writing human-generated, curated content, previously you would just do it in your small patch of the Internet, and probably SEs (Google...) would pick it up anyway because it was good quality content. You just didn't care about SEO-driven shit anyway. Now your nicely hand-written content is going to be fed into LLM training and it's going to be used - whether you want it or not - in the next generation of AI slop content.

    • pajamasam5 hours ago |parent

      SEO-spam was often at least somewhat factual and not complete generated garbage. Recipe sites, for example, usually have a button that lets you skip the SEO stuff and get to the actual recipe.

      Also, the AI slop is covering almost every sentence or phrase you can think of to search. Before, if I used more niche search phrases and exact searches, I was pretty much guaranteed to get specific results. Now, I have to wade through pages and pages of nonsense.

    • zwnow3 hours ago |parent

      Yes it is a big deal. I can't find new artists without fearing their art is AI generated, same for books and music. I also can't post my stuff to the internet anymore because I know it's going to be fed into LLM training data. The internet is mostly dead to me, and thankfully I've lost almost all interest in being on my computer as much as I used to be.

    • system26 hours ago |parent

      Yes indeed, it is a problem. Now the good old sites have turned into AI-slop sites because they can't fight the spammers by writing slowly with humans.

  • themanmaran6 hours ago

    The low-background steel of the internet

    https://en.wikipedia.org/wiki/Low-background_steel

    • HelloUsername4 hours ago |parent

      As mentioned half a year ago at https://news.ycombinator.com/item?id=44239481

      • thm2 hours ago |parent

        As mentioned 7 months ago https://news.ycombinator.com/item?id=43811732

  • tobr5 hours ago

    For images, https://same.energy is a nice option that, being abandoned but still functioning for a few years now, seems to naturally not have crawled any AI images. And it's all around a great product.

  • zkmon3 hours ago

    Most college courses and school books haven't changed in decades. Some reputed colleges keep courses in Pascal and Fortran instead of Python or Java, just because switching might affect their reputation for being classical or pure, or to match their campus buildings' style.

    • fastasucanan hour ago |parent

      Or because the core knowledge stays the same no matter how it is expressed.

  • dinkblam2 hours ago

    google results were already 90% SEO crap long before ChatGPT

    just use Kagi and block all SEO sites...

    • paweladamczuk2 hours ago |parent

      How do we (or Kagi) know which ones are "SEO sites"? Is there some filter list or other method to determine that?

  • anticensor6 hours ago

    You should call it Predecember, referring to the eternal December.

    • unfunco6 hours ago |parent

      September?

      • littlestymaar6 hours ago |parent

        ChatGPT was released exactly 3 years ago (on the 30th of November) so December it is in this context.

        • permo-w6 hours ago |parent

          surely that would be eternal November then

          • littlestymaar5 hours ago |parent

            No, being released on Nov 30th means November was still before the slop era.

            • retsibsi4 hours ago |parent

              In the end the analogy doesn't really work, because 'eternal September' referred to what used to be a regular, temporary thing (an influx of noobs disrupting the online culture, before eventually leaving or assimilating) becoming the new normal. 'Eternal {month associated with ChatGPT}' doesn't fit because LLM-generated content was never a periodic phenomenon.

            • permo-w4 hours ago |parent

              to be honest, GPT-3, which was pretty solid and extremely capable of producing webslop, had been out for a good while before ChatGPT, and even GPT-2 had been used for blogslop years before that. maybe ChatGPT was the beginning of when the public became aware of it, but it was going on well beforehand. and, as the sibling commenter points out, the analogy doesn't quite fit structurally either

            • AlecSchueler4 hours ago |parent

              Yes, and this site is for everything before the slop era, hence eternal November.

  • GaryBluto6 hours ago

    Why use this when you can use the before: syntax on most search engines?

    • aDyslecticCrow2 hours ago |parent

      Doesn't actually do anything anymore in Google or Bing.

  • ricardo815 hours ago

    FWIW Mojeek (an organic search engine in the classic sense) can do this with the before: operator.

    https://www.mojeek.com/search?q=britney+spears+before%3A2010...

  • defraudbah3 hours ago

    ChatGPT also only returns content created before ChatGPT's release, which is why I still have to google, damn it!

    • stinos2 hours ago |parent

      Is that still the case? And even if so how is it going to avoid keeping it like that in the future? Are they going to stop scraping new content, or are they going to filter it with a tool which recognizes their own content?

      • defraudbah31 minutes ago |parent

        It's a known problem in ML. I think Grok solved it partially, and ChatGPT uses another model on top to search the web, as suggested below. Hence the MLOps field appeared, to handle model management.

        I find it a bit annoying to navigate between hallucinations and outdated content. Too much invalid information to filter out.

    • fragmede3 hours ago |parent

      Click the globe icon below the input box to enable web searching by ChatGPT.

  • 1gn157 hours ago

    Does this filter out traditional SEO blogfarms?

    • JKCalhoun6 hours ago |parent

      Yeah, might prefer AI-slop to marketing-slop.

      • al_borland6 hours ago |parent

        They are the same. I was looking for something and tried AI. It gave me a list of stuff. When I asked for its sources, it linked me to some SEO/Amazon affiliate slop.

        All AI is doing is making it harder to know what is good information and what is slop, because it obscures the source, or people ignore the source links.

        • venturecruelty6 hours ago |parent

          I've started just going to more things in person, asking friends for recommendations, and reading more books (should've been doing all of these anyway). There are some niche communities online I still like, and the fediverse is really neat, but I'm not sure we can stem the Great Pacific Garbage Patch-levels of slop, at this point. It's really sad. The web, as we know and love it, is well and truly dead.

  • phplovesong3 hours ago

    The slop is getting worse: there is so much LLM-generated shit online that new models are now getting trained on the slop. Slop training slop, and slop. We have gone full circle in just a matter of a few years.

    • muixoozie2 hours ago |parent

      I was replaying Cyberpunk 2077 and trying to think of all the ways one might have dialed up the dystopia to 11 (beyond what the game does), and pervasive AI slop was never on my radar. Kinda reminds me of the foreword in Neuromancer bringing attention to the fact the book was written before cellphones became popular. It's already fucking with my mind. I recently watched Frankenstein 2025 and 100% thought gen AI had a role in the CGI, only to find out the director hates it so much he'd rather die than use it. I've been noticing little things in old movies and anime where I thought to myself: if I didn't know this was made before gen AI, I would have thought this was generated for sure. One example (https://www.youtube.com/watch?v=pGSNhVQFbOc&t=412): the cityscape background in this outro scene, with buildings built on top of buildings, gave me AI vibes (really the only thing in this whole anime), yet this came out ~1990. So I can already recognize a paranoia / bias in myself and really can't reliably tell what's real. Probably other people have this too, and it's why some non-zero number of people always think every blog post that comes out was written by gen AI.

  • RomanPushkin4 hours ago

    For that purpose I do not update my book about Ruby on LeanPub. I just know one day people are gonna read it more, because human-written content will be gold.

  • progman326 hours ago

    Not affiliated, but I've been using kagi's date range filter to similar effect. The difference in results for car maintenance subjects is astounding (and slightly infuriating).

  • voiper15 hours ago

    Of course my first thought was: Let's use this as a tool for AI searches (when I don't need recent news).

  • pknerd4 hours ago

    Something generated by humans does not mean high quality.

    • Krssst4 hours ago |parent

      Yes, but AI-generated is always low quality so it makes sense to filter it out.

      • IshKebab4 hours ago |parent

        I wouldn't say always... Especially because you probably only noticed the bad slop. Usually it is crap though.

    • a5c113 hours ago |parent

      At least when reading human-made material you can spot the author's uncertainty on some topics. Usually, when someone doesn't have knowledge of something, they don't try to describe it. AI, however, will try to convince you that pigs can fly.

  • theodrican hour ago

    This tool has no future. We have that in common with it, I fear.

    What we really need to do is build an AI tool to filter out the AI automatically. Anybody want to help me found this company?

  • EGreg2 hours ago

    Can't we just append "before:2021-01-01" to Google?

    I use this to find old news articles for instance.

  • ETH_start3 hours ago

    I'm grateful that I published a large body of content pre-ChatGPT so that I have proof that I'm not completely inarticulate without AI.

  • cryptozeus4 hours ago

    technically you can ask chatgpt to return the same result by asking it to filter by year

  • johng7 hours ago

    I don't know how this works under the hood but it seems like no matter how it works, it could be gamed quite easily.

    • qwertygnu6 hours ago |parent

      True, but there's probably many ways to do this and unless AI content starts falsifying tons of its metadata (which I'm sure would have other consequences), there's definitely a way.

      Plus other sites that link to the content could also give away its date of creation, which is out of the control of the AI content.

      • layman516 hours ago |parent

        I have heard of a forum (I believe it was Physics Forums) which was very popular in the older days of the internet where some of the older posts were actually edited so that they were completely rewritten with new content. I forget what the reasoning behind it was, but it did feel shady and unethical. If I remember correctly, the impetus behind it was that the website probably went under new ownership and the new owners felt that it was okay to take over the accounts of people who hadn't logged on in several years and to completely rewrite the content of their posts.

        I believe I learned about it through HN, and it was this blog post: https://hallofdreams.org/posts/physicsforums/

        It kind of reminds me of why some people really covet older accounts when they are trying to do a social engineering attack.

        • joshuaissac3 hours ago |parent

          > website probably went under new ownership

          According to the article, it was the founder himself who was doing this.

    • cryzinger6 hours ago |parent

      If it's just using Google search "before <x date>" filtering I don't think there's a way to game it... but I guess that depends on whether Google uses the date that it indexed a page versus the date that a page itself declares.

      • madars6 hours ago |parent

        Date displayed in Google Search results is often the self-described date from the document itself. Take a look at this "FOIA + before Jan 1, 1990" search: https://www.google.com/search?q=foia&tbs=cdr:1,cd_max:1/1/19...

        None of these documents were actually published on the web by then, including a Watergate PDF bearing a date of Nov 21, 1974 - almost 20 years before the PDF format was released. Of course, the WWW itself started in 1991.

        Google Search's date filter is useful for finding documents about historical topics, but unreliable for proving when information actually became publicly available online.
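
        For reference, the cdr filter in that URL can also be built programmatically. A small sketch; the M/D/YYYY tbs format is taken from the URL above, everything else is plain URL encoding:

          from urllib.parse import urlencode

          def google_date_range_url(query, max_date, min_date=None):
              """Build a Google search URL with the custom date range (cdr) filter.

              Caveat from this thread: the dates Google applies are often the pages'
              self-described dates, not when Google first indexed them.
              """
              tbs = "cdr:1"
              if min_date:
                  tbs += f",cd_min:{min_date}"   # dates in M/D/YYYY form
              tbs += f",cd_max:{max_date}"
              return "https://www.google.com/search?" + urlencode({"q": query, "tbs": tbs})

          # Everything Google dates before ChatGPT's public release:
          # google_date_range_url("foia", max_date="11/30/2022")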

        • littlestymaar6 hours ago |parent

          Are you sure it works the same way for documents that Google indexed at the time of publication? (Because obviously for things that existed before Google, they had to accept the publication date at face value).

          • madars4 hours ago |parent

            Yes, it works the same way even for content Google indexed at publication time. For example, here are chatgpt.com links that Google displays as being from 2010-2020, a period when Google existed but ChatGPT did not:

            https://www.google.com/search?q=site%3Achatgpt.com&tbs=cdr%3...

            So it looks like Google uses inferred dates over its own indexing timestamps, even for recently crawled pages from domains that didn't exist during the claimed date range.

    • CGamesPlay6 hours ago |parent

      "Gamed quite easily" seems like a stretch, given that the target is definitionally not moving. The search engine is fundamentally searching an immutable dataset that "just" needs to be cleaned.

      • johng4 hours ago |parent

        How? They have an index from a previous date and nothing new will be allowed since that date? A whole copy of the internet? I don't think so.... I'm guessing, like others, it's based on the date the user/website/blog lists in the post. Which they can change at any time.

        • fragmede4 hours ago |parent

          Yes they do. It's called common crawl, and is available from your chosen hyperscaler vendor.
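
          A rough sketch of what querying it can look like, using the public Common Crawl index API (the endpoint and field names are from memory, so treat them as assumptions and check the docs):

            import json, urllib.request
            from urllib.parse import urlencode

            INDEX = "https://index.commoncrawl.org"

            def pre_chatgpt_crawls():
                """Crawl IDs from 2021 or earlier (the late-2022 crawls straddle the Nov 30, 2022 cutoff)."""
                with urllib.request.urlopen(f"{INDEX}/collinfo.json") as r:
                    crawls = json.load(r)
                # IDs look like "CC-MAIN-2021-04"; the third dash-separated field is the year.
                return [c["id"] for c in crawls if int(c["id"].split("-")[2]) <= 2021]

            def lookup(url_pattern, crawl_id, limit=10):
                """Query one crawl's CDX index for captures matching a URL pattern."""
                q = urlencode({"url": url_pattern, "output": "json", "limit": limit})
                with urllib.request.urlopen(f"{INDEX}/{crawl_id}-index?{q}") as r:
                    return [json.loads(line) for line in r.read().decode().splitlines()]

            # e.g. lookup("example.com/*", pre_chatgpt_crawls()[0])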

  • k_roy6 hours ago

    You know what's almost worse than AI generated slop?

    Every corner of the Internet now screaming about AI generated slop, whenever a single pixel doesn't line up.

    It's just another generation of technology. And however much nobody might like it, it is here to stay. Same thing happened with airbrushing, and photoshop, and the Internet in general.

    • maplethorpe6 hours ago |parent

      Is it really here to stay? If the wheels fell off the investment train and ChatGPT etc. disappeared tomorrow, how many people would be running inference locally? I suspect most people either wouldn't meet the hardware requirements or would be too frustrated with the slow token generation to bother. My mom certainly wouldn't be talking to it anymore.

      Remember that a year or two ago, people were saying something similar about NFTs - that they were the future of sharing content online and we should all get used to it. Now, they still might exist, it's true, but they're much less pervasive and annoying than they once were.

      • Daz9124 hours ago |parent

        >that they were the future of sharing content online

        nobody was saying that

        • sethops13 hours ago |parent

          People right here on HN were adamant my next house would be purchased using an NFT. And similar absurd claims about blockchain before that.

      • fragmede4 hours ago |parent

        Maybe you don't love your mom enough to do this, but if ChatGPT disappeared tomorrow and it was something she really used and loved, I wouldn't think twice before buying her a rig powerful enough to run a quantized downloadable model on, though I'm not current on which model or software would be the best for her purposes. I get that your relationship with your mother, or your financial situation might be different though.

        • maplethorpean hour ago |parent

          > Maybe you don't love your mom enough to do this

          I actually love my mom enough not to do this.

        • Yeask2 hours ago |parent

          Maybe you should talk more to your mother so she does not need an imaginary friend.

        • never_inline2 hours ago |parent

          Please tell me this is satire.

          • Yeask2 hours ago |parent

            It's just your average AI user. Too much "you are right" makes them detached from reality.

        • exasperaitedan hour ago |parent

          > I get that your relationship with your mother, or your financial situation might be different though.

          Fucking hell

    • stinosan hour ago |parent

      I don't agree it is 'almost worse' than the slop, but it sure can be annoying. On one hand it seems even somewhat positive that some people have developed a more critical attitude and question things they see; on the other hand, they're not critical enough to realize their own criticism might be invalid. Plus I feel bad for all the resources (both human and machine) wasted on this. Like perfectly normal things being shown, but people not knowing anything about the subject chiming in to claim that it must be AI because they see something they do not fully understand.

    • rockskon6 hours ago |parent

      "You know what's almost worse than something bad? People complaining about something bad."

      • k_roy6 hours ago |parent

        Shrug. Sure.

        Point still stands. It’s not going anywhere. And the literal hate and pure vitriol I’ve seen towards people on social media, even when they say “oh yeah; this is AI”, is unbelievable.

        So many online groups have just become toxic shitholes because someone once or twice a week posts something AI generated

        • venturecruelty6 hours ago |parent

          The entire US GDP for the last few quarters is being propped up by GPU vendors and one singular chatbot company, all betting that they can make a trillion dollars on $20-per-month "it's not just X, it's Y" Markov chain generators. We have six to 12 more months of this before the first investor says "wait a minute, we're not making enough money", and the house of cards comes tumbling down.

          Also, maybe consider why people are upset about being consistently and sneakily lied to about whether or not an actual human wrote something. What's more likely: that everyone who's angry is wrong, or that you're misunderstanding why they're upset?

          • permo-w4 hours ago |parent

            I feel like this is the kind of dodgy take that'll be dispelled by half an hour's concerted use of the thing you're talking about

            short of massive technological regression, there's literally never going to be a situation where the use of what amounts to a second brain with access to all the world's public information is not going to be incredibly marketable

            I dare you to try building a project with Cursor or a better cousin and then come back and repeat this comment

            >What's more likely: that everyone who's angry is wrong, or that you're misunderstanding why they're upset?

            your patronising tone aside, GP didn't say everyone was wrong, did he? if he didn't, which he didn't, then it's a completely useless and fallacious rhetorical. what he actually said was that it's very common. and, factually, it is. I can't count the number of these type of instagram comments I've seen on obviously real videos. most people have next to no understanding of AI and its limitations and typical features, and "surprising visual occurrence in video" or "article with correct grammar and punctuation" are enough for them to think they've figured something out

            • kuschku2 hours ago |parent

              > I dare you to try building a project with Cursor or a better cousin and then come back and repeat this comment

              I always try every new technology, to understand how it works, and expand my perspective. I've written a few simple websites with Cursor (one mistake and it wiped everything, and I could never get it to produce any acceptable result again), tried writing the script for a YouTube video with ChatGPT and Claude (full of hallucinations, which – after a few rewrites – led to us writing a video about hallucinations), generated subtitles with Whisper (with every single sentence having at least some mistake) and finally used Suno and ChatGPT to generate some songs and images (both of which were massively improved once I just made them myself).

              Whether Android apps or websites, scripts, songs, or memes, so far AI is significantly worse at internet research and creation than a human. And cleaning up the work AI did always ended up taking longer than just doing it myself from scratch. AI certainly makes you feel more productive, and it seems like you're getting things done faster, even though you're not.

          • fragmede4 hours ago |parent

            Fascinatingly, as we found out from this HN post, Markov chains don't work when scaled up, for technical reasons, so that whole transformers thing is actually necessary for this current generation of AI.

            https://news.ycombinator.com/item?id=45958004

        • littlestymaar6 hours ago |parent

          This kind of pressure is good actually, because it helps fight against "lazy AI use" while letting people use AI in addition to their own brain.

          And that's a good thing, because as much as I like LLMs as a technology, I really don't want people blindly copy-pasting stuff from them without thinking.

        • rockskon6 hours ago |parent

          What isn't going anywhere? You're kidding yourself if you think every single place AI is used will withstand the test of time. You're also kidding yourself if you think consumer sentiment will play no part in determining which uses of AI will eventually die off.

          I don't think anyone seriously believes the technology will categorically stop being used anytime soon. But then again, we still keep using tech that's 50+ years old as it is.