Map of GitHub (github.com)
717 points by vortex_ape 9 months ago | 103 comments
  • tux39 months ago

    Somehow torvalds/linux is in Fronterra, next to JS projects, awesome-X lists, and frontend checklists.

    Either kernel hackers unexpectedly love frontend, or, more likely, the people who write the code don't overlap much with the people who star GitHub projects!

    • anvaka9 months ago |parent

      Jaccard similarity is not particularly good for "celebrity" projects.

      They are similar because they are popular, not because there is a semantic relationship.

      It's the same problem I faced with the map of reddit (https://anvaka.github.io/map-of-reddit/ ) - all popular subreddits are just "similar" to each other.

      Still works great for smaller, non-celebrity projects :D
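
The "celebrity" effect can be seen with a little arithmetic. This is a rough sketch, assuming (hypothetically) that stars are handed out independently at random: if two repositories are starred by fractions p and q of all users, chance overlap alone produces a sizeable Jaccard score. The function name and the independence assumption are illustrative and not taken from the map's code.

```python
def expected_jaccard(p: float, q: float) -> float:
    """Approximate Jaccard score of two repos starred independently at random.

    p and q are the fractions of all users who starred each repo.
    This is the ratio of expected intersection to expected union,
    which is an approximation (assumption), not an exact expectation.
    """
    intersection = p * q
    union = p + q - p * q
    return intersection / union if union else 0.0


popular = expected_jaccard(0.5, 0.5)    # two "celebrity" repos
niche = expected_jaccard(0.01, 0.01)    # two small repos
```

Under these assumptions, two repos each starred by half of all users score about 0.33 with no semantic relationship at all, while two repos each starred by 1% of users score about 0.005, so a real signal stands out far more clearly for small projects.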

    • supriyo-biswas9 months ago |parent

      I wonder if code embeddings might have been a better way to organize the projects, although probably infeasible given the amount of resources required to download and compute embeddings for each file.

      • machiaweliczny9 months ago |parent

        Embeddings are super cheap to compute

    • dataviz10009 months ago |parent

      Perhaps for the same reason heat maps are often really just the underlying population map: https://xkcd.com/1138/

      • wodenokoto9 months ago |parent

        That’s why in NLP we use term frequency times inverse document frequency (tf-idf). It gives you a measure of how common uncommon things are.

        Wonder how you’d implement that in a heat map. Just call each pixel a document and see where it takes you?

        • bravura9 months ago |parent

          People have been critiquing the collaborative filtering aspect of this work vs content analysis ("[why use stars instead of code similarity]") but there's something elegant about the simplicity of using fewer priors here.

          A tf*idf matrix could be applied to the star-feature matrix too. Document = github repo. Term = name of user who starred it.

          THUS, users who overstar are simply less important for computing similarities.

          This would mitigate the phenomenon of massively popular github repos being clustered together because of folks who blithely star the most well known stuff.
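
The idea above can be sketched as follows. This is a minimal illustration and not the map's actual pipeline: term frequency is treated as binary (a user either starred a repo or did not), each user is weighted by log(N / number of repos they starred), and repos are compared by cosine similarity. The function names and sample data are hypothetical.

```python
import math


def idf_weights(stargazers):
    """Weight each user by log(N / number of repos that user starred)."""
    n_repos = len(stargazers)
    stars_per_user = {}
    for users in stargazers.values():
        for user in users:
            stars_per_user[user] = stars_per_user.get(user, 0) + 1
    return {u: math.log(n_repos / c) for u, c in stars_per_user.items()}


def weighted_similarity(a, b, idf):
    """Cosine similarity between two repos' idf-weighted stargazer vectors."""
    dot = sum(idf[u] ** 2 for u in a & b)
    norm_a = math.sqrt(sum(idf[u] ** 2 for u in a))
    norm_b = math.sqrt(sum(idf[u] ** 2 for u in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical data: "star_bot" stars everything, so its weight is log(3/3) = 0.
stargazers = {
    "linux": {"alice", "bob", "star_bot"},
    "react": {"carol", "dave", "star_bot"},
    "vue": {"carol", "dave", "erin", "star_bot"},
}
idf = idf_weights(stargazers)
linux_react = weighted_similarity(stargazers["linux"], stargazers["react"], idf)
react_vue = weighted_similarity(stargazers["react"], stargazers["vue"], idf)
```

Here linux and react share only the user who stars everything, so their weighted similarity drops to zero, whereas plain Jaccard would give them 1/5.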

        • supriyo-biswas9 months ago |parent

          Winsorize the data points to remove outliers and then divide it by the population count for the case of the heatmap?
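
A small sketch of that suggestion, in the order the comment gives it (clamp first, then normalize). The helper names, the 20% clamp fraction, and the sample numbers are assumptions made for illustration only.

```python
def winsorize(values, fraction=0.05):
    """Clamp the lowest and highest `fraction` of values to the nearest kept value.

    Assumes 0 <= fraction < 0.5.
    """
    if not values:
        return []
    ordered = sorted(values)
    k = int(len(ordered) * fraction)
    low, high = ordered[k], ordered[-1 - k]
    return [min(max(v, low), high) for v in values]


def per_capita(counts, population):
    """Divide each region's count by that region's population."""
    return [c / p for c, p in zip(counts, population)]


# Hypothetical event counts per region, with one extreme outlier.
counts = [1, 2, 3, 4, 100]
population = [10, 10, 10, 10, 10]
clamped = winsorize(counts, fraction=0.2)
rates = per_capita(clamped, population)
```

With these inputs the outlier 100 is pulled down to 4 and the lowest value is pulled up to 2 before the per-capita rates are computed.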

    • revskill9 months ago |parent

      Because of react ?

      • jensenbox9 months ago |parent

        That was my first reaction.

        • moffkalast9 months ago |parent

          What's your angle?

      • dbrans9 months ago |parent

        Can you elaborate?

      • odo12429 months ago |parent

        Wait, why React?

  • neonate9 months ago

    Live link: https://anvaka.github.io/map-of-github/

    • stevage9 months ago |parent

      Yeah, this should be the link - not the repo.

  • Weetile9 months ago

    "Sussex" as the name of the Among Us section had me laughing

    • throwaway1274829 months ago |parent

      The funniest one I saw was "Lispaña"

      • rcarmo9 months ago |parent

        I just noticed Halaska right next to it :)

      • rcarmo9 months ago |parent

        Olé! (^..^)

  • huevosabio9 months ago

    Surprised at how small Rustland is. Barely a province in Clouderra.

    Also, interesting how both Bevy and Veloren are in Rustland. Probably, the stars come more from the Rust community than the game dev community. Which I guess makes sense: the Rust ecosystem is still relatively small and feels like a lot of people doing X but in Rust.

    • culi9 months ago |parent

      I'm also shocked how small "nodelandia" is and that it's not even its own continent. I guess we all overestimate the size of our bubbles

      • JBiserkov9 months ago |parent

        Most of the mass is concentrated in the node_modules folder.

      • brabel9 months ago |parent

        OTOH the vim and emacs lands seem to be huge!

        • redman259 months ago |parent

          Vim land seems much larger than I expected.

    • vinc9 months ago |parent

      I can see many osdev Rust projects in "PlusPlus Nation" near other kernels, which means that "X but in Rust" might be in "X" instead of "RustLand".

    • devvvvvvv9 months ago |parent

      Not that surprised. Rust is known for being evangelized by a very loud minority.

    • jascha_eng9 months ago |parent

      The data is from March 2023 according to OP, so a lot of the more recent Rust projects just won't be included yet.

      • anvaka9 months ago |parent

        Yes...

        Aiming to redo it some time in early 2025!

    • ramon1569 months ago |parent

      Happy to see bevy between them though! :)

      • huevosabio9 months ago |parent

        Tangent: it's not that often I see a fellow Ramon on HN :)

    • huevosabio9 months ago |parent

      Also, lol at Zig being a suburb of Rust

  • stevage9 months ago

    Very fun to be able to find my own project there (mapbox-gl-utils):

    https://anvaka.github.io/map-of-github/#12/24.78947/18.85186

  • acmeian9 months ago

    A fun minigame is trying to find a particular project using the map only, without the search feature :-)

    • huevosabio9 months ago |parent

      or start with one project and find your way to another, you can imagine there are shipping lines :)

    • anvaka9 months ago |parent

      love it =)!

  • hatmatrix9 months ago

    As a fan of Julia, surprised to see how julialang/julia has so few links. It's a niche language; how isolated it is on this map is maybe not so unrepresentative of the user or developer experience.

    • sundarurfriend9 months ago |parent

      There's a JuliaLand to the west of the island where julialang/julia is.

      The fact that julialang/julia ended up near tensorflow and opencv, and actual Julia packages ended up elsewhere, probably reflects a difference between aspirational users and real users: a lot of people who starred the Julia project itself were numeric Python users who were looking for a new Python, but then mostly stuck to Python itself, so their other stars are in the numeric Python land. Those who starred the JuliaLand packages are the actual Julia users who aptly enough ended up near Moleculandia and AstroSpace and Quantumia.

      • hatmatrix9 months ago |parent

        Ah I didn't see that.

        That explanation sounds very plausible.

  • jamala19 months ago

    Very neat and creative approach, but I'm honestly conflicted about whether the country/map metaphor is the best choice. In many cases the names are not that clear, so one has to zoom in to understand what they represent. It would perhaps be more interesting to do hierarchical clustering and show something like average connectivity between the (super)clusters with lines, possibly with more descriptive/faithful LLM-generated labels for each cluster.

    • richardw9 months ago |parent

      I was pleasantly surprised that it wasn’t a heavy line drawing creation. As someone who first did those in the 90’s and almost immediately learned their limits, I think this is nice because it doesn’t overclaim. It’s just a view, not a thesis.

      I like diagrams where the axes mean something. Lines, shape, boxes/groups, distance, X vs Y, colour, thickness, texture, background, foreground. I also like simple. So often it’s lines to be fancy with no meaning. This one is just a pic, with some grouping, and it has personality. Yay?

      (Still love lines, just not everywhere always.)

    • anvaka9 months ago |parent

      I haven't found a universal clustering algorithm yet: frequently there is more than one way to group data that still makes sense, so whichever final clustering option we choose will not be perfect.

      Hm... unless maybe we do some sort of quantum clustering, which could be a fun project to explore!

      It's a bit hazy now, but I remember trying the hdbscan algorithm (hierarchical clustering), and on a graph the size of GitHub I just couldn't fit it in memory.

      I did end up using something similar to hierarchical clustering (mix of louvain/leiden/my own), and that's what we see in the final map.
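
For readers unfamiliar with Louvain-style methods: they work by greedily improving a score called modularity, which rewards partitions with many edges inside groups and few between them. The sketch below only computes that score for a given partition on a toy graph. It is not the author's clustering code, and all names and data are hypothetical.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}  # community -> number of edges with both ends inside it
    degree = {}    # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree.items()
    )


# Toy graph: two triangles joined by a single bridge edge (c, d).
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

Splitting the toy graph into its two triangles scores about 0.357, while lumping everything into one group scores 0, which is why an optimizer would prefer the split.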

    • romanobro569 months ago |parent

      They could have done that, but they decided to do a map

  • gudzpoz9 months ago

    Quitlessia and NeoQuitlessia... These names are evil. (Doom Emacs lies in NeoQuitlessia instead of Emacsia, which surprisingly makes sense. :)

    • anvaka9 months ago |parent

      haha! I love vim.

      We shall not quit.

  • HellsMaddy9 months ago

    How are connections between repos determined? I checked some of my repos and don't see any references in either direction for some of the connections.

    • pierrec9 months ago |parent

      The author answered that question in the original HN post: https://news.ycombinator.com/item?id=35933981

      Basically what others are guessing, lines represent the highest similarity scores based on "stargazers", which also forms the entire map. To anyone confused, the lines only appear once you click into a specific country.

    • frereubu9 months ago |parent

      In the first line: "Dots are close to each other if they have a lot of common stargazers."

      • HellsMaddy9 months ago |parent

        That explains why they are "close to each other" but not what determines which nodes are connected by an edge.

        • tux39 months ago |parent

          I think it's the other way around. The similarity metric determines which repos have edges (possibly weighted?)

          And then some clustering algorithm makes sense of this giant graph by laying out sets of nodes that have a lot of edges to each other, close to each other

          The closeness is just layout; the edges are the data structure that determines closeness.

          • anvaka9 months ago |parent

            This is correct!

    • minimaxir9 months ago |parent

      Jaccard similarity returns a value between 0 and 1 (in this case the vast majority of the values being close to 0). I suspect there's a hard-coded threshold value to determine an edge, e.g. if Jaccard similarity between A and B is > 0.2, create an edge.
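
A minimal sketch of that guess. The 0.2 cutoff is the commenter's hypothetical value, not a confirmed setting, and elsewhere in this thread the lines are described as the highest similarity scores rather than a fixed threshold. Repository and user names are made up.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_edges(stargazers, threshold=0.2):
    """Return (repo1, repo2, score) for every pair scoring above the threshold."""
    edges = []
    for r1, r2 in combinations(sorted(stargazers), 2):
        score = jaccard(stargazers[r1], stargazers[r2])
        if score > threshold:
            edges.append((r1, r2, score))
    return edges


# Hypothetical stargazer sets.
stargazers = {
    "repo_a": {"u1", "u2", "u3", "u4"},
    "repo_b": {"u3", "u4", "u5"},
    "repo_c": {"u9"},
}
edges = build_edges(stargazers)
```

Comparing every pair like this is quadratic in the number of repositories, so at GitHub scale some kind of candidate pre-filtering would presumably be needed before computing exact scores.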

  • malux859 months ago

    I'm not sure why BinanceLand is in AILandia though, please don't encourage them XD

    • dandiep9 months ago |parent

      Clearly crypto should be a sinking ship with people swimming to the shores of other places in this metaphor

    • minimaxir9 months ago |parent

      It would make sense that there's an overlap between crypto fans and a certain subset of AI fans.

  • openrisk9 months ago

    It's good to see all those "why is X in Y?" type comments.

    Remember that feeling when deploying algorithms, especially when those affect people (which hopefully is not the case with this nice project).

    A mechanism to explain how specific results came about is as much part of the project as the more technical machine learning choices involved.

  • hirako20009 months ago

    The author of this also made other outstanding visualizations.

    A while back ngraph blew my mind. I built a taxonomy biz off ngraph:

    https://hirako-ngraph.surge.sh/#/world/nature

    • anvaka9 months ago |parent

      Thank you for your kind words!

      That link you've shared - doesn't open for some reason

      • ristomatti9 months ago |parent

        It did for me now, perhaps too much traffic. It's pretty wild, especially on a tablet due to the gyroscope effect!

        • hirako20009 months ago |parent

          Gyro support is also Andrei's work, not mine.

          I found the (awesome) video where he presents ngraph: https://youtu.be/vZ6Yhlxv7Os

          Edit: not loading? surge.sh has been less reliable lately, will get to finish that project some day and will publish elsewhere.

  • labster9 months ago

    Wikimedia is right next to GPT Nation. I think an invasion is imminent.

  • romanroe9 months ago

    Very interesting that HTMX (bigskysoftware/htmx), which is backend-agnostic, lives in Pythonia->Djangonia and not in e.g. Fronterra.

    Does this mean that HTMX is mostly used by Django devs?

    • nasso_dev9 months ago |parent

      Despite HTMX being backend-agnostic, I heard it pairs extremely well with Django, so that's probably why! Maybe the two are particularly well fitting pieces of the web dev puzzle.

      • romanroe9 months ago |parent

        I love Django and it's my primary weapon of choice. But due to its structure and philosophy it doesn't move as fast as e.g. Rails (it's impressive how quickly they picked up SQLite for their Solid* stuff). I guess HTMX was a very welcome solution from the "outside" to allow more interactive frontends.

  • boxed9 months ago

    Django is in the middle of Pythonia, and not in Djangonia. Weird!

    • philipwhiuk9 months ago |parent

      If you develop on Linux you generally probably don't star linux/kernel. But you do star other projects developing on Linux.

      Ditto if you develop Django, you star Python libraries, not Django downstream plugins.

    • throwaway93t329 months ago |parent

      It's similar to how SpringBootia is almost as big as Javaland, but Spring Boot itself is in Javaland and not its "homeland"

  • heeton9 months ago

    Lispaña is a really excellent name for a lisp country :)

    • anvaka9 months ago |parent

      Thank you =)

  • fzeindl9 months ago

    I do have a theory that the more untyped the language is, the larger the islands are: Fronterra (JavaScript), Cloudderra (YAML), AILandia (Python) are way bigger than Java, Swift, DotNet, etc., even though the common prejudice is that the problem of software engineering is stale old enterprise code in Java/DotNet.

    That might be the case, but the libraries seem to be more reusable!

    • brabel9 months ago |parent

      Javascript made the barrier to entry for creating a package nearly zero. In contrast, it's fairly difficult to publish something on Maven Central (the main Java repository). You need to prove you own a domain, setup a GPG key for signing, manually register with Sonatype, which is more than many people are willing to do. I think that explains it much better.

    • jrgv9 months ago |parent

      The stale old enterprise code is not in public repositories.

  • ChrisArchitect9 months ago

    Some previous discussion:

    2023

    https://news.ycombinator.com/item?id=35931402

  • matt_trentini9 months ago

    Cool visualisation!

    It was somewhat amusing that MicroPython isn't in MicroPythonia but Arduinoria...and CircuitPython is in PicoPythonia. :)

  • jamalaramala9 months ago

    Kudos to the author for the amazing idea!

    The only problem I see is that projects don't fit so nicely in the division between languages (Pythonia, Javaland, Clojuria, etc) and applications (Gamedonia, AILandia, etc). There's a lot of intersection between them.

    But the visualization is super-cool nonetheless. :)

  • LtWorf9 months ago

    But stargazers are absolutely meaningless, since most of them are bots that give stars for payment and like random stuff to throw off detection.

    And as usual important libraries don't get as much attention as flashy little leaf projects.

  • uwemaurer9 months ago

    this looks really great!

    I tried something similar a few weeks ago, using the embedding vectors of the Github project descriptions.

    https://awesome.facts.dev/3d

  • buryat9 months ago

    "Stop the war" looks like a very small territory, you don't even need to think what kind of message they send. It's so small in the grand scheme

  • bushbaba9 months ago

    Interesting that azureland is under l33t nation and not clouderia

  • djoldman9 months ago

    > In the second phase I computed exact Jaccard Similarity between each repository.

    Using what inputs? The repo seems to have only the frontend code.

    • philipwhiuk9 months ago |parent

      The star data from the first phase.

  • H2HOE9 months ago

    Why was Jaccard similarity preferred here? I would love to learn more about the selection process. Fantastic work though, love it.

    • anvaka9 months ago |parent

      Thank you!

      I tried quite a few various similarity metrics, and Jaccard was giving me the best results. This is all very subjective, of course.

  • robertclaus9 months ago

    I've been thinking something similar for identifying ownership areas within an organization would be cool.

  • est9 months ago

    ZH.Pyscrapia had an island of its own.

  • Mustachio9 months ago

    docker-minecraft is under Adulttopia. I wonder what made it make that connection

  • rcarmo9 months ago

    Nice, but kind of weird to find piku/Piku in Fronterra.

  • shlomo_z9 months ago

    This is truly a work of art! Great job!

  • ALOHa1009 months ago

    yay anvaka reaches front page!

    fun times from reddit map

    • anvaka9 months ago |parent

      haha thank you!

  • Havoc9 months ago

    > Homelabia

    Definitely some unique naming choices there lol

  • VMG9 months ago

    Very well done, loads quickly and is usable even from mobile.

    I love this sort of concept map and I am typically disappointed by the execution.

  • HPsquared9 months ago

    "The GitHub Archipelago"

  • lizmutton9 months ago

    This is phenomenal!

  • Kylejeong219 months ago

    couldn't find any of my stuff so that means i gotta do more lol

  • robblbobbl9 months ago

    Cool

  • 71bw9 months ago

    Interesting how one fork of Magisk lands in "AndroModLand" and another in some gaming space.

  • cisrockandroll9 months ago

    FORTRAN and COBOL Programming is a part of the AI island, lol.

    • Jimmc4149 months ago |parent

      Fortran has been given new life

      https://arxiv.org/search/?query=fortran&searchtype=all&sourc...

      https://github.com/search?q=fortran+language%3AFortran+neura...

  • k__9 months ago

    Looks like AI is already trashing the place, lol.

  • oystermax9 months ago

    [dead]

  • shayansm19 months ago

    Some might say that PHP is dead (and I’d be one of them too), but there is a PHP kingdom on the map! :) I think we might have all been mistaken.

    • dmd9 months ago |parent

      I don't think anyone has ever seriously suggested PHP is dead. People may want it to be dead, but it's probably still the most-used language on the web!

    • 9dev9 months ago |parent

      Sorry, not sorry: PHP is alive, and thriving! The language runtime is getting ever faster, the Packagist ecosystem in combination with Composer (PHP's package manager) is rock-solid, there are event loops and application servers by now, serverless deployments are the default operation mode, and with Laravel or Symfony, there are trusted and extremely versatile frameworks available that do stuff out of the box that requires lots of manual effort with other languages.

      Add to that the support for type annotations that can go all the way from fully untyped and dynamic, to runtime-enforced primitive constraints and object types, and you'll end up with a very good choice for web applications that evolve quickly.