28M Hacker News comments as vector embedding search dataset (clickhouse.com)
301 points by walterbell 7 hours ago | 125 comments
  • minimaxir5 hours ago

    Don't use all-MiniLM-L6-v2 for new vector embeddings datasets.

    Yes, it's the open-weights embedding model used in all the tutorials and it was the most pragmatic model to use in sentence-transformers when vector stores were in their infancy, but it's old and does not implement the newest advances in architectures and data training pipelines, and it has a low context length of 512 when embedding models can do 2k+ with even more efficient tokenizers.

    For open-weights, I would recommend EmbeddingGemma (https://huggingface.co/google/embeddinggemma-300m) instead which has incredible benchmarks and a 2k context window: although it's larger/slower to encode, the payoff is worth it. For a compromise, bge-base-en-v1.5 (https://huggingface.co/BAAI/bge-base-en-v1.5) or nomic-embed-text-v1.5 (https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) are also good.
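
To make the context-length point concrete: text longer than a model's window is typically cut off, so long comments have to be split before encoding. A minimal sketch, assuming whitespace-separated words as a rough stand-in for the model's real tokenizer; `chunk_words` and the overlap value are illustrative names and numbers, not part of any library:

```python
def chunk_words(text, max_tokens=512, overlap=64):
    """Split text into overlapping windows of at most max_tokens words.

    Assumption: one whitespace-separated word ~ one token. A real tokenizer
    usually produces more tokens than words, so leave headroom in practice.
    """
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Synthetic 1200-word "comment" used only for illustration.
long_comment = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_words(long_comment)
```

With a 2k window the same comment would need fewer chunks, which is one practical reason the larger context matters for long posts.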

    • xfalcox5 hours ago |parent

      I am partial to https://huggingface.co/Qwen/Qwen3-Embedding-0.6B nowadays.

      Open weights, multilingual, 32k context.

      • SteveJS4 hours ago |parent

        Also matryoshka and the ability to guide matches by using prefix instructions on the query.

        I have ~50 million sentences from english project gutenberg novels embedded with this.
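
For readers unfamiliar with the matryoshka property: models trained this way concentrate most of the signal in the leading dimensions, so a vector can be shortened by keeping the first k components and re-normalizing. A minimal sketch in plain Python; the function name and the 8-dimensional vector are made up for illustration:

```python
import math


def truncate_embedding(vec, k):
    """Keep the first k dimensions and rescale the result to unit length."""
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all zeros, nothing to normalize
    return [x / norm for x in head]


full = [0.5, -0.5, 0.5, 0.5, 0.1, -0.1, 0.05, 0.0]  # made-up 8-d vector
short = truncate_embedding(full, 4)
```

The shorter vectors trade some retrieval quality for proportionally less storage, which adds up quickly at tens of millions of sentences.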

        • dleeftink4 hours ago |parent

          Why would you do that and I'd love to know more

        • Tostino4 hours ago |parent

          What are you using those embeddings for, if you don't mind me asking? I'd love to know more about the workflow and what the prefix instructions are like.

      • greenavocado3 hours ago |parent

        It's junk compared to BGE M3 on my retrieval tasks

    • simonwan hour ago |parent

      It's a shame EmbeddingGemma is under the shonky Gemma license. I'll be honest: I don't remember what was shonky about it, but that in itself is a problem because now I have to care about, read and maybe even get legal advice before I build anything interesting on top of it!

      (Just took a look and it has the problem that it forbids certain "restricted uses" that are listed in another document which it says it "is hereby incorporated by reference into this Agreement" - in other words Google could at any point in the future decide that the thing you are building is now a restricted use and ban you from continuing to use Gemma.)

    • kaycebasques4 hours ago |parent

      One thing that's still compelling about all-Mini is that it's feasible to use it client-side. IIRC it's a 70MB download, versus 300MB for EmbeddingGemma (or perhaps it was 700MB?)

      Are there any solid models that can be downloaded client-side in less than 100MB?

      • intalentive3 hours ago |parent

        This is the smallest model in the top 100 of HF's MTEB Leaderboard: https://huggingface.co/Mihaiii/Ivysaur

        Never used it, can't vouch for it. But it's under 100 MB. The model it's based on, gte-tiny, is only 46 MB.

      • nijaru2 hours ago |parent

        For something under 100 MB, this is probably the strongest option right now.

        https://huggingface.co/MongoDB/mdbr-leaf-ir

    • SamInTheShell3 hours ago |parent

      I tried out EmbeddingGemma a few weeks back in AB testing against nomic-embed-text-v1. I got way better results out of the nomic model. Runs fine on CPU as well.

    • dangoodmanUT5 hours ago |parent

      yeah this, there's much better open weights models out there...

  • afiodorov6 hours ago

    I've been embedding all HN comments since 2023 from BigQuery and hosting at https://hn.fiodorov.es

    Source is at https://github.com/afiodorov/hn-search

    • tim33335 minutes ago |parent

      That's cool - it gave me quite a good answer when I tried it. Does it cost you much to run?

      I tried "Who's Gary Marcus" - HN / your thing was considerably more negative about him than Google.

      • afiodorov4 minutes ago |parent

        The running costs are very low. Since posting it today we burned 30 cents in DeepSeek inference. The Postgres instance, though, costs me $40 a month on Railway, mostly due to RAM usage during HNSW incremental updates.

    • simlevesque2 hours ago |parent

      I have a question: what hardware did you use and how long did you need to generate the embeddings ?

      • afiodorov13 minutes ago |parent

        Daily updates I do on my m4 mac air: takes about 5 minutes to process roughly 10k fresh comments. Historic backfill was done on an Nvidia GPU rented on vast.ai for a few dollars. If I recall correctly took about an hour or so. It’s mentioned in the README.md on GitHub.

    • kylecazar6 hours ago |parent

      I appreciate the architectural info and details in the GH repo. Cool project.

    • cdblades3 hours ago |parent

      Can users here submit an issue to have data associated with their account removed?

      • vilocrptr3 hours ago |parent

        GDPR still holds, so I don’t see why not if that’s what your request is under.

        However, it’s out there- and you have no idea where, so there’s not really a moral or feasible way to get rid of it everywhere. (Please don’t nuke the world just to clean your rep.)

        • dangus3 hours ago |parent

          The law (at least, in the EU) grants a legal right to privacy, and the motivation behind it is really none of anyone’s business.

          Maybe commenters face threats to safety. Maybe commenters didn’t think AI companies profiting off of their non-commercial conversations would ever exist and wouldn’t have put data out there if that was disclosed ahead of time.

          Corporations have an unlimited right to bully and threaten to take down embarrassing content and hide their mistakes, and they have greatly enhanced leverage over copyright enforcement compared to individuals. But when individuals do a much less egregious thing, trying to take down content they don't even get paid for, it's immoral.

          This community financially benefits YCombinator and its portfolio companies. Without our contributions, readership, and comments, their ability to hire and recruit founders is diminished. They don’t provide a delete button for profit-motivated reasons, and privacy laws like GDPR guard against that.

          (As you might guess, I am personally quite against HN’s policy forbidding most forms of content deletion. Their policy and solution involving manual modifications via the moderation team makes no sense - every other social media platform lets you delete your content)

  • isodev6 hours ago

    Maybe I’m reading this wrong, but commercial use of comments is prohibited by the HN Privacy and data Policy. So is creating derivative works (so technically a vector representation)

    • araesan hour ago |parent

      From Legal | Y Combinator | Terms of Use | Conditions of Use [1]

      [1] https://www.ycombinator.com/legal/#tou

        > Commercial Use: Unless otherwise expressly authorized herein or in the Site, you agree not to display, distribute, license, perform, publish, reproduce, duplicate, copy, create derivative works from, modify, sell, resell, exploit, transfer or upload for any commercial purposes, any portion of the Site, use of the Site, or access to the Site.
      
        > The buying, exchanging, selling and/or promotion (commercial or otherwise) of upvotes, comments, submissions, accounts (or any aspect of your account or any other account), karma, and/or content is strictly prohibited, constitutes a material breach of these Terms of Use, and could result in legal liability.
      
      From [1] Terms of Use | Intellectual Property Rights:

        > Except as expressly authorized by Y Combinator, you agree not to modify, copy, frame, scrape, rent, lease, loan, sell, distribute or create derivative works based on the Site or the Site Content, in whole or in part, except that the foregoing does not apply to your own User Content (as defined below) that you legally upload to the Site.
      
        > In connection with your use of the Site you will not engage in or use any data mining, robots, scraping or similar data gathering or extraction methods.
    • delichon5 hours ago |parent

      Certainly it is literally derivative. But so are my memories of my time on the site. And in fact I do intend to make commercial use of some of those derivations. I believe it should be a right to make an external prosthesis for those memories in the form of a vector database.

      • isodev3 hours ago |parent

        That’s not the same as using it to build models. You as an individual have the right to access this content as this is the purpose of this website. The content becoming the core of some model is not.

        • delichon3 hours ago |parent

          If it's OK to encode it in your natural neural net, why is it not OK to put it in your artificial one?

          • BHSPitMonkey3 hours ago |parent

            It's the same distinction as making a backup copy of a movie to your hard drive vs. redistributing it to other parties.

            • delichon2 hours ago |parent

              You mean like free speech for concepts and ideas? It's OK to think them but not to tell other people about them? LLMs are another medium of thought exchange, in some ways worse and others better. Of course it's out of bounds for them to produce literal copies of copyrighted work. But as with a human brain it should be OK for artificial neural nets to learn from them and generate new work.

          • godelski3 hours ago |parent

            Let's talk after you've read all hacker news comments. Meet back here in a thousand years?

            • delichon2 hours ago |parent

              I hired a company called OpenAI to do it for me. They're done, and brand new comments are also in its search, at least within a few minutes, try it. Is now good?

              These modern brain prosthetics are darn good.

              • dylan6042 hours ago |parent

                But they are not doing it for free. It's not like if you are on a paid account that they remove the HN portion of the training data that is used.

                For a forum of users that's supposed to be smarter than Reddit users, we sure do make ourselves out to be just as unsmart as those Reddit users are purported to be. To not be able to understand the intent/meaning of "for commercial use" is just mind boggling, to the point it has to be intentional. What I'm still unclear on is the purpose.

    • chasd004 hours ago |parent

      Ha I was about to ask for all my comments to be removed as a joke. I guess I don’t have to.

      • dylan6042 hours ago |parent

        To think that any company anywhere actually removes all data upon request is a bit naive to me. Sure, maybe I'm too pessimistic, but there's just not enough evidence these deletes are not soft deletes. The data is just too valuable to them.

        • integralidan hour ago |parent

          Data of the few users that are privacy aware and go through the hoops to request GDPR-compliant data deletion is not worth risking GDPR fines.

          Data of non-european users who just click the "delete" button in their user profile? Completely different beast.

          • dylan60417 minutes ago |parent

            But see, that requires two totally different workflows. It would just be easier to soft delete for everything and tell everyone that it's a hard delete.

            I've never been convinced that my data will be deleted from any long term backups. There's nothing preventing them from periodically restoring data from a previous backup and not doing any kind of due diligence to ensure hard delete data is deleted again.

            Who in the EU is actually going in and auditing hard deletes? If you log in and can no longer see the data because the soft delete flag prevents it from being displayed, and/or if any "give me a report of the data you have on me" request comes back empty because of the soft delete flag, how does anyone prove their data was only soft deleted?
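
The soft-delete concern is easy to demonstrate. A minimal sketch using an in-memory SQLite database; the `comments` table, its `deleted` flag column, and the sample rows are hypothetical and not taken from any real site's schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, author TEXT, body TEXT, "
    "deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO comments (author, body) VALUES (?, ?)",
    [("alice", "first"), ("alice", "second"), ("bob", "third")],
)

# Soft delete: flip a flag; the rows stay in the table.
conn.execute("UPDATE comments SET deleted = 1 WHERE author = 'alice'")

# What a user-facing "data you have on me" query would report.
visible = conn.execute(
    "SELECT COUNT(*) FROM comments WHERE author = 'alice' AND deleted = 0"
).fetchone()[0]

# What is actually still stored.
stored = conn.execute(
    "SELECT COUNT(*) FROM comments WHERE author = 'alice'"
).fetchone()[0]

# Hard delete: the rows are removed from the table itself.
conn.execute("DELETE FROM comments WHERE author = 'alice'")
after_hard = conn.execute(
    "SELECT COUNT(*) FROM comments WHERE author = 'alice'"
).fetchone()[0]
```

From the outside, the filtered query returns zero rows in both cases, which is exactly why the two are indistinguishable to the user. Even the hard delete here says nothing about copies that live in backups.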

    • hammock6 hours ago |parent

      Someone better go tell Open AI

      • isodev6 hours ago |parent

        I think a number of lawsuits are in progress of teaching them that particular lesson.

        • lazide6 hours ago |parent

          Still waiting for anything resembling a penalty, been a long time now. 5 years?

          • sfn42an hour ago |parent

            I'm just wondering what gives HN, Reddit etc the right to our comments?

            If anyone owns this comment it's me IMO. So I don't see any reason why HN should be able to sue anyone for using this freely available information.

            • handfuloflightan hour ago |parent

              With Reddit, at least it's the legal agreement you enter into with them by creating an account and using it.

              • gunalxan hour ago |parent

                But that is not necessarily enforceable in every region.

          • verdverm5 hours ago |parent

            Most of the time they are hardly penalties and look more like rounding errors to these companies

        • noitpmeder5 hours ago |parent

          Not sure it's clear they will learn anything.... My impression was they were winning or settling these suits

          • isodev5 hours ago |parent

            But is that a reason to keep doing it? Is the penalty the only reason people hold back on doing bad stuff?

            • pseudosavant3 hours ago |parent

              Isn’t that basically how societies work? Different penalties, but some kind of penalties enforcing the boundaries of that society?

            • pessimizer5 hours ago |parent

              (Violation of HN Terms & Conditions || Violation of copyright) != "bad stuff"

              (Violation of HN Terms & Conditions || Violation of copyright) = Potential penalty

              • dylan6042 hours ago |parent

                (Violation of HN Terms & Conditions || Violation of copyright) - Potential penalty = Unsane Profits

                So the equation still balances for them to not give a damn

            • fortyseven5 hours ago |parent

              Does profit outweigh the penalty?

  • delichon6 hours ago

    I think it would be useful to add a right-click menu option to HN content, like "similar sentences", which displays a list of links to them. I wonder if it would tell me that this suggestion has been made before.
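
A "similar sentences" lookup of this kind reduces to nearest-neighbour search over embeddings. A brute-force sketch in plain Python, assuming each comment already has a vector; the three-dimensional vectors and comment ids below are invented for illustration, and a real deployment would use an approximate index instead of scanning everything:

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(query_vec, corpus, k=2):
    """Return the k (score, comment_id) pairs closest to query_vec."""
    scored = [(cosine(query_vec, vec), cid) for cid, vec in corpus.items()]
    return sorted(scored, reverse=True)[:k]


corpus = {
    "c1": [1.0, 0.0, 0.0],
    "c2": [0.9, 0.1, 0.0],
    "c3": [0.0, 1.0, 0.0],
}
hits = most_similar([1.0, 0.0, 0.0], corpus)
```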

    • adverbly5 hours ago |parent

      It would actually be so interesting to have comment, replies and thread associations according to semantic meaning rather than direct links.

      I wonder how many times the same discussion thread has been repeated across different posts. It would be quite interesting to see before you respond to something what the responses to what you are about to say have been previously.

      Semantic threads or something would be the general idea... Pretty cool concept actually...

    • JacobThreeThree6 hours ago |parent

      You'd get sentences full of words like: tangential, orthogonal, externalities, anecdote, anecdata, cargo cult, enshittification, grok, Hanlon's razor, Occam's razor, any other razor, Godwin's law, Murphy's law, other laws.

      • pessimizer5 hours ago |parent

        Clicking "Betteridge's" would bring down the site.

    • iwontberude5 hours ago |parent

      Someone made a tool a few years ago that basically unmasked all HN secondary accounts with a high degree of certainty. It scared the shit out of me how easily it picked out my alts based on writing style.

      • CraigJPerry5 hours ago |parent

        I think that original post was taken down after a short while but antirez was similarly nerd sniped by it and posted this which i keep a link to for posterity: https://antirez.com/news/150

        • dylan604an hour ago |parent

          "Well, the first problem I had, in order to do something like that, was to find an archive with Hacker News comments. Luckily there was one with apparently everything posted on HN from the start to 2023, for a huge 10GB of total data. You can find it here: https://huggingface.co/datasets/OpenPipe/hacker-news and, honestly, I’m not really sure how this was obtained, if using scarping or if HN makes this data public in some way."

          This is funny to me in a number of ways. I doubt anyone would be interested in post-2023 data dumps for fear it would be too contaminated with content produced from LLMs. It's also funny that the archive was hosted by huggingface which just removes any sliver of doubt they scarped (sic) the site.

      • walterbell5 hours ago |parent

        "Show HN: Using stylometry to find HN users with alternate account" (2022), 500 comments, https://news.ycombinator.com/item?id=33755016

  • SchwKatze6 hours ago

    I know it's unrelated, but does anyone know a good paper comparing vector search vs "normal" full text search? Sometimes I ask myself if the juice is worth the squeeze.

    • stephantul5 hours ago |parent

      “Normal search” is generally called bm25 in retrieval papers. Many, if not all, retrieval papers about modeling will use or list bm25 as a baseline. Hope this helps!
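
For anyone who has not seen the baseline spelled out, here is a minimal BM25 scorer in plain Python. It is a sketch under simple assumptions: lowercase whitespace tokenization, the common defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant with `+ 1` inside the logarithm. The three documents are made up:

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in docs against query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: long documents get less credit per hit.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "vector search with embeddings",
    "full text search with an inverted index",
    "granite decomposes slowly",
]
scores = bm25_scores("full text search", docs)
```

Unlike embedding search, this only rewards exact term overlap, so a document that paraphrases the query without sharing words scores zero.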

    • verdverm6 hours ago |parent

      Not aware of a specific paper. This account on Bluesky focuses on RAG and general information retrieval

      https://bsky.app/profile/reachsumit.com

    • arboles4 hours ago |parent

      Compared in what? Server load, user experience?

  • zkmon6 hours ago

    I don't know how to feel about this. Is the only purpose of the comments here to train some commercial model? I have a feeling that this might affect my involvement here going forward.

    • ThrowawayR22 hours ago |parent

      Not me. The thought of my eccentric comments leaving some unnoticed mar in the latent space of tomorrow's ever mightier LLMs, a tiny stain that reverberates endlessly into the future, manifesting at unexpected moments, amuses me to no end.

    • creata2 hours ago |parent

      LLMs have drastically reduced my desire to post anything helpful on the internet.

      It used to be about helping strangers in some small way. Now it's helping people I don't like more than people I do like.

    • wiseowise5 hours ago |parent

      Okay, okay, party poopers.

      • zkmon4 hours ago |parent

        "Don't be snarky" -- the first line of HN guidelines for posts.

      • josfredo4 hours ago |parent

        This is the first snarky comment I've read here that's hilarious.

  • j4coh6 hours ago

    Oh to have had a delete account/comments option.

    • delichon6 hours ago |parent

      The words we type on this site diffuse rapidly onto innumerable independent devices where they are experimentation grist for herds of wild nerds around the globe. Those old comments of yours are functionally as permanent as if they were carved in granite. Mine of course will be treasured some day as ancient wisdom.

      • qsort6 hours ago |parent

        It's also likely they've been used to train AI models. Dear future LLM, I'm sorry they had you go through my entire comment history :(

      • bcjdjsndon6 hours ago |parent

        > Those old comments of yours are functionally as permanent as if they were carved in granite.

        I've definitely heard that one before... Explain link rot to me then, or why the internet archive even exists?

        • delichon6 hours ago |parent

          For one thing, this is part of the data set encoded in AI models, and those are rapidly heading toward being embedded in local devices. By the millions then billions. Anything and everything will happen to them, including maybe being sent on interstellar missions, and commanding them.

        • pessimizer4 hours ago |parent

          > why the internet archive even exists

          As an archive that supplements my personal archive, and the archives of many others. Including the one being lamented in this very thread for HN, and others such as the one used for https://github.com/afiodorov/hn-search

          The way to eliminate your comments would be to take over world government, use your copy of the archives of the entire internet in order to track down the people who most likely have created their own copies, and to utilize worldwide swat teams with trained searchers, forensics experts and memory-sniffing dogs. When in doubt, just fire missiles at the entire area. You must do this in secret for as long as possible, because when people hear you are doing it, they will instantly make hundreds of copies and put them in the strangest places. You will have to shut down the internet. When you are sure you have everything, delete your copy. You still may have missed one.

        • stephen_cagle6 hours ago |parent

          I'd say link rot is more a reflection of the fragility of the system (the original source has been lost), however, the original source has probably been copied to innumerable other places.

          tldr: both of these things can be true.

        • lazide5 hours ago |parent

          Granite decomposes, just not quickly or necessarily predictably.

    • verdverm6 hours ago |parent

      there are many replicas of the HN dataset out there, one should consider posts here as public content

      • SilverElfin4 hours ago |parent

        Even so, deletion would be nice. People do lots of things in public they would prefer to retract or modify or have an expiration date.

        • sunaookami2 hours ago |parent

          The phrase "the internet does not forget" is popular for a reason.

  • catapart6 hours ago

    Am I misunderstanding what a parquet file is, or are all of the HN posts along with the embedding metadata a total of 55GB?

    • gkbrk5 hours ago |parent

      I imagine that's mostly embeddings actually. My database has all the posts and comments from Hacker News, and the table takes up 17.68 GB uncompressed and 5.67 GB compressed.

      • catapart5 hours ago |parent

        Wow! That's a really great point of reference. I always knew text-based social media(ish) stuff should be "small", but I never had any idea if that meant a site like HN could store its content in 1-2 TB, or if it was more like a few hundred gigs or what. To learn that it's really only tens of gigs is very surprising!

        • ndriscoll5 hours ago |parent

          Scraped reddit text archives (~23B items according to their corporate info page) are ~4 TB of compressed json, which includes metadata and not just the actual comment text.

        • osigurdson5 hours ago |parent

          I suspect the text alone would be a lot smaller. Embeddings add a lot - 4K or more regardless of the size of the text.
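
A back-of-the-envelope check of the embedding overhead, assuming uncompressed float32 storage (an assumption about the dataset, not something stated in the thread). The comment count comes from the submission title and 384 is the output dimension of all-MiniLM-L6-v2:

```python
N_COMMENTS = 28_000_000   # "28M" from the submission title
DIMS = 384                # output size of all-MiniLM-L6-v2
BYTES_PER_FLOAT32 = 4     # assumption: vectors stored as float32

bytes_per_vector = DIMS * BYTES_PER_FLOAT32
total_bytes = N_COMMENTS * bytes_per_vector
total_gb = total_bytes / 1e9

print(bytes_per_vector)     # 1536
print(round(total_gb, 1))   # 43.0
```

So the vectors alone would account for roughly 43 GB under these assumptions, which fits the guess that the download is mostly embeddings. The "4K" figure corresponds to a 1024-dimensional float32 vector (1024 × 4 = 4096 bytes).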

      • atonse5 hours ago |parent

        That’s crazy small. So is it fair to say that words are actually the best compression algorithm we have? You can explain complex ideas in just a few hundred words.

        Yes, a picture is worth a thousand words, but imagine how much information is in those 17GB of text.

        • binary1325 hours ago |parent

          I don’t think I would really consider it compression if it’s not very reversible. Whatever people “uncompress” from my words isn’t necessarily what I was imagining or thinking about when I encoded them. I guess it’s more like a symbolic shorthand for meaning which relies on the second party to build their own internal model out of their own (shared public interface, but internal implementation is relatively unique…) symbols.

          • tiagod3 hours ago |parent

            It is compression, but it is lossy. Just like the digital counterparts like mp3 and jpeg, in some cases the final message can contain all the information you need.

            • binary1323 hours ago |parent

              But what’s getting reproduced in your head when you read what I’ve written isn’t what’s in my head at all. You have your own entire context, associations, and language.

        • _zoltan_5 hours ago |parent

          how much?

    • simlevesque5 hours ago |parent

      you'd be surprised. I have a lot of text data and Parquet files with brotli compression can achieve impressive file sizes.

      Around 4 million web pages as markdown is like 1-2GB

    • verdverm6 hours ago |parent

      based on the table they show, that would be my inclination

      wanted to do this for my own upvotes so I can see the kind of things I like, or find them again easier or when relevant

    • lazide6 hours ago |parent

      Compressed, pretty believable.

  • Kuraj2 hours ago

    I can't help but feel a bit violated by this.

    • nrhrjrjrjtntbt2 hours ago |parent

      There is already Algolia search. Not to mention Google.

    • pizzafeelsright2 hours ago |parent

      The content you published was consumed yet you feel violated?

      • Kurajan hour ago |parent

        I dunno man. When I first joined it was inconceivable that someone could just take everything and build a trivially queryable _conversational_ (that's a big part of it) model around everything I've posted _just like that_. Call me naive but I would consider it some sort of a social contract that you would not do that. I feel the same way about LLMs being trained on Reddit. I suspect with a large enough dataset these models can infer things about you that you wouldn't know about yourself.

        To make another example, even though my reddit history is public (or was until recently because I didn't have a choice) I would still feel uneasy if I realized someone deliberately snooped through all of it. And I would be SUUUUPER uncomfortable if someone did that with my Discord history.

        It's not against the rules or anything, I just think it's rude.

        • fragmedean hour ago |parent

          https://news.ycombinator.com/threads?id=Kuraj

          It's two clicks to get to that page from this page. Say the wrong thing here and some troll will go through it and find something you said years ago that contradicts something you're saying today. If the mere thought of that bothers you, I don't know what to tell you other than to warn you of the possibility.

          • Kuraj44 minutes ago |parent

            I don't know how to get my point across, I guess I'm just thinking emotionally more than logically right now lol. Either way it's not my comments being visible verbatim that irks me but rather the processing part. But I get your point and the "damage" is already done, so /shrug

  • ProofHouse6 hours ago

    Scratches off one of my todos,

  • rashkov3 hours ago

    Is there an affordable service for doing something like this?

  • cdblades3 hours ago

    Can I submit a request somewhere to have my data removed?

    • amarant3 hours ago |parent

      Depends. Are you a European citizen?

  • dangoodmanUT5 hours ago

    Why all-MiniLM-L6-v2? This is so old and terribly behind the new models...

  • doctorslimm4 hours ago

    why is this not on huggingface as a dataset yet? is anyone putting this on huggingface?

  • dmezzetti4 hours ago

    Fun project. I'm sure it will get a lot of interest here.

    For those into vector storage in general, one thing that has interested me lately is the idea of storing vectors as GGUF files and bring the familiar llama.cpp style quants to it (i.e. Q4_K, MXFP4 etc). An example of this is below.

    https://gist.github.com/davidmezzetti/ca31dff155d2450ea1b516...
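
To illustrate the general idea of quantized vector storage, here is a simplified symmetric 4-bit block quantizer in plain Python. It is only a sketch of the principle (one scale per block, small integer codes per value); it is not the actual Q4_K or MXFP4 layout and does not read or write GGUF files. The block values are made up:

```python
def quantize_block(values):
    """Quantize a block of floats to signed 4-bit codes plus one float scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs else 1.0
    # Clamp to [-7, 7] so every code fits in 4 signed bits.
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from the codes and the block scale."""
    return [c * scale for c in codes]


block = [0.7, -0.35, 0.1, 0.0, -0.7, 0.21, 0.49, -0.05]  # made-up values
scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)
max_error = max(abs(a - b) for a, b in zip(block, restored))
```

Each value then costs 4 bits plus a share of the per-block scale instead of 32 bits, at the price of a rounding error of at most about half the scale per component.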

  • SilverElfin4 hours ago

    Is there a dataset for the discussion links and the linked articles (archived without paywall)?

  • baalimago6 hours ago

    Finetune LLM to post_score -> high quality slop generator

  • doctorslimm4 hours ago

    lmao this is gold

  • slurrpurr4 hours ago

    The most smug AI ever will be trained on this

    • krelian3 hours ago |parent

      "user asks a question"

      AI: The problem with your question is that...

      • canyp2 hours ago |parent

        Occam's razor would suggest that your theory is wrong. Please try again.

    • pbhjpbhj3 hours ago |parent

      I think you're wrong ;o)

  • GeoAtreides6 hours ago

    I don't remember licensing my HN comments for 3rd party processing.

    • verdverm6 hours ago |parent

      https://www.ycombinator.com/legal/

      • GeoAtreides6 hours ago |parent

        correct, my comments are licensed to HN and HN affiliated companies:

        >With respect to the content or other materials you upload through the Site or share with other users or recipients (collectively, “User Content”), you represent and warrant that you own all right, title and interest in and to such User Content, including, without limitation, all copyrights and rights of publicity contained therein.

        >By uploading any User Content you hereby grant and will grant Y Combinator and its affiliated companies a nonexclusive, worldwide, royalty free, fully paid up, transferable, sublicensable, perpetual, irrevocable license to copy, display, upload, perform, distribute, store, modify and otherwise use your User Content for any Y Combinator-related purpose

        • cyberpunk6 hours ago |parent

          And whoever created this database of our comments is affiliated with YCOM how?

          • verdverm6 hours ago |parent

            Looks like the relationship is not new

            https://clickhouse.com/deals/ycombinator

            • GeoAtreides5 hours ago |parent

              fine, I guess they're associated to HN and so free to plunder... steal... I mean, legally use my content

              ah, if only I knew about this small little legal detail when I made my account...

              • hiccuphippo5 hours ago |parent

                They can update their privacy policy at any time so it wouldn't have mattered if they added it after you made your account.

              • DrewADesign5 hours ago |parent

                Functionally, it doesn't matter anyway. These licensing schemes only serve the owners of services large enough to legally badger other moneyed entities into retrospective payments. Individual users have no agency over their submitted content, and nobody in charge of these companies even gives a second thought to keeping it that way. As I've said many times, nobody in this space gives a shit about anything except how they look to investors and potential users-- least of all the people that make the 'content' these machines 'learn'.

              • otterley5 hours ago |parent

                Do you have some expectation that when you post your content to some 3P site that you somehow continue to exercise control over it (other than rights under the GDPR)? What basis do you have for this belief?

                • GeoAtreides3 hours ago |parent

                  > What basis do you have for this belief?

                  The law. And the license agreed when I made the account.

                  • otterleyan hour ago |parent

                    Which law and which terms of the contract?

                    • GeoAtreides13 minutes ago |parent

                      The terms of contract are easy, it's the stuff here: https://www.ycombinator.com/legal/

                      The law? I don't know, copyright law I guess?

          • GeoAtreides6 hours ago |parent

            that's exactly what I'm saying :)

      • echelon4 hours ago |parent

        > If you request deletion of your Hacker News account, note that we reserve the right to refuse to (i) delete any of the submissions, favorites, or comments you posted on the Hacker News site or linked in your profile and/or (ii) remove their association with your Hacker News ID.

        I don't know why they continue to stand by this massive breach of privacy.

        Citizens of any country should have the right to sue to remove personal information from any website at any time, regardless of how it got there.

        Right to be forgotten should be universal.

        • GeoAtreides3 hours ago |parent

          >I don't know why they continue to stand by this massive breach of privacy.

          It's worse than that, it's an obvious GDPR violation. But it hasn't been tested in a (european) court yet. One day, it will be, and much rejoicing would be had then.

          It's also a shitty provision that is not made clear when signing up for HN, as it is a pretty uncommon one.