During many years of operating production Kubernetes clusters of several thousand nodes, I've never seen any of these observability tools that query the kube-apiserver work at that scale. Even popular tools like k9s make extremely expensive queries, like listing all pods in the cluster, which can tip your apiserver over and cause an incident if you don't have enough load protection in place. If you're serious about these querying capabilities, I highly recommend building your own data source (e.g. watch objects with a controller and dump the data into a SQL database) and stop hitting the apiserver for these things. You'll be better off in the long run.
This is the approach we took while building our Internal Developer Platform: watches (via client-go informers with client-side caching) sync data into a Postgres database as JSONB. Changes are tracked using JSON patches and Kubernetes events. To avoid a watch on every resource kind, we instead perform incremental object fetches for the objects involved in watched events.
Getting this to perform well required several optimizations at both the Go and Postgres levels. On the Go side, we use prioritized work queues and event de-duplication, and we even switched to Rust for efficient JSON diffs. On the Postgres side, we leverage materialized views and trigger-based optimistic locking.
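A minimal sketch of that informer-to-JSONB pattern, for a single kind. This is my reconstruction, not their code: the `resources` table and its `(uid, data)` schema are assumptions for illustration, clientset/DB construction is omitted, and the real pipeline's work queues, de-duplication, and diffing are left out:

    import (
        "database/sql"
        "encoding/json"

        _ "github.com/lib/pq"
        corev1 "k8s.io/api/core/v1"
        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/cache"
    )

    func syncPods(cs *kubernetes.Clientset, db *sql.DB) {
        factory := informers.NewSharedInformerFactory(cs, 0)
        podInformer := factory.Core().V1().Pods().Informer()

        upsert := func(obj interface{}) {
            pod, ok := obj.(*corev1.Pod)
            if !ok {
                return
            }
            raw, _ := json.Marshal(pod)
            // Hypothetical table: resources(uid text primary key, data jsonb).
            db.Exec(`INSERT INTO resources (uid, data) VALUES ($1, $2::jsonb)
                     ON CONFLICT (uid) DO UPDATE SET data = EXCLUDED.data`,
                string(pod.UID), raw)
        }

        podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
            AddFunc:    upsert,
            UpdateFunc: func(_, newObj interface{}) { upsert(newObj) },
            DeleteFunc: func(obj interface{}) {
                if pod, ok := obj.(*corev1.Pod); ok {
                    db.Exec(`DELETE FROM resources WHERE uid = $1`, string(pod.UID))
                }
            },
        })

        stop := make(chan struct{})
        factory.Start(stop)
        factory.WaitForCacheSync(stop)
        <-stop // the informer keeps the table in sync until stop is closed
    }

The nice property is that the informer does the list-then-watch and reconnection dance for you, and every reader afterwards hits Postgres instead of the apiserver.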
There is a funny parallel I see with Kubernetes that I also saw a lot with Linux in the early years. There are thousands of packages and tools you can install on Linux (think phpmyadmin for example) and new users sometimes go wild installing every single package they read about.
After a while, the more mature Linux engineers start going the other way. Ripping out as much as possible. Stripping down to the leanest build they can, for performance but also to reduce attack surface and overall complexity.
Very similar dynamic with k8s. Early days are often about scooping up every CNCF project like you're on a shopping spree. Eventually people get to shipping slim clusters running 30 MB containers built with Alpine or Nix, using Kubernetes essentially as open-source clustering for Linux.
What's surprising to me is that there's no way to listen for every object type. You have to know the "kind" beforehand, because the watch API requires it. To watch all objects in the system, you have to start a separate watch request for every type, which can in turn be expensive.
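A sketch of what "a separate watch request for every type" looks like with client-go's discovery and dynamic clients (rest.Config construction and error handling omitted; this really does open one long-lived connection per kind):

    import (
        "context"
        "fmt"
        "slices"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/runtime/schema"
        "k8s.io/client-go/discovery"
        "k8s.io/client-go/dynamic"
        "k8s.io/client-go/rest"
    )

    func watchEverything(cfg *rest.Config) {
        disco, _ := discovery.NewDiscoveryClientForConfig(cfg)
        dyn, _ := dynamic.NewForConfig(cfg)

        // Discover every served resource, then open one watch per kind.
        lists, _ := disco.ServerPreferredResources()
        for _, list := range lists {
            gv, err := schema.ParseGroupVersion(list.GroupVersion)
            if err != nil {
                continue
            }
            for _, r := range list.APIResources {
                if !slices.Contains(r.Verbs, "watch") {
                    continue // not every resource supports watch
                }
                w, err := dyn.Resource(gv.WithResource(r.Name)).
                    Watch(context.Background(), metav1.ListOptions{})
                if err != nil {
                    continue
                }
                go func(kind string) {
                    for ev := range w.ResultChan() {
                        fmt.Println(kind, ev.Type)
                    }
                }(r.Kind)
            }
        }
        select {} // keep all the watch connections alive
    }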
If you have direct access to Etcd (which may not be possible in a managed cloud version of Kubernetes?), putting a watch on / might scale better.
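With etcd's own client, a single prefix watch does cover everything, though you get raw storage values (protobuf-encoded by default) rather than the API server's JSON. A sketch, assuming direct, credentialed access to the cluster's etcd and the default /registry key prefix:

    package main

    import (
        "context"
        "fmt"
        "time"

        clientv3 "go.etcd.io/etcd/client/v3"
    )

    func main() {
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"https://127.0.0.1:2379"}, // TLS config omitted for brevity
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            panic(err)
        }
        defer cli.Close()

        // Kubernetes stores every object under /registry/<resource>/..., so
        // one prefix watch sees changes to all kinds at once.
        for resp := range cli.Watch(context.Background(), "/registry/", clientv3.WithPrefix()) {
            for _, ev := range resp.Events {
                fmt.Printf("%s %s\n", ev.Type, ev.Kv.Key)
            }
        }
    }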
(As an aside, with the Go client API you have to jump through some hoops to even deserialize objects whose kinds' schemas are not already registered: you have to use the special "unstructured" deserializer. The Go SDK often has to deal with unknown types, e.g. for diffing, and all of the serializer/codec/conversion layers in the SDK seem incredibly overengineered for something that could have just assumed a simple nested map structure and then layered validation and parsing on top; the smell of Java programmers is pretty strong.)
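For reference, the unstructured path looks like this; the `Widget` custom resource here is made up:

    package main

    import (
        "fmt"

        "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    )

    func main() {
        // An arbitrary object the client has no registered Go type for.
        data := []byte(`{"apiVersion":"example.com/v1","kind":"Widget","metadata":{"name":"w1"},"spec":{"size":3}}`)

        // Unstructured is essentially a map[string]interface{} underneath.
        obj := &unstructured.Unstructured{}
        if err := obj.UnmarshalJSON(data); err != nil {
            panic(err)
        }

        // Typed field access becomes path-based helper calls.
        size, found, _ := unstructured.NestedInt64(obj.Object, "spec", "size")
        fmt.Println(obj.GetKind(), obj.GetName(), size, found)
    }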
The watch API has a horrible user experience on all platforms. One must send a GET and keep the pipe open, waiting for a stream of responses. If the connection is lost, changes might be lost. If one misses a resource version change, then either the reconnection will fail or a stale resource will be monitored.
The Java client does this with blocking, resulting in a large number of threads.
I truly like Kubernetes, and I think most detractors who complain about its complexity simply don't want to learn it. But the K8s API, especially the watch API, needs some rigorous standards.
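To illustrate, this is roughly the ceremony every client has to reimplement by hand: list, watch from the list's resourceVersion, track the last version seen, and re-list when the server reports the version as expired (410 Gone). A sketch using client-go's typed clientset (clientset construction omitted):

    import (
        "context"
        "fmt"

        corev1 "k8s.io/api/core/v1"
        apierrors "k8s.io/apimachinery/pkg/api/errors"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/watch"
        "k8s.io/client-go/kubernetes"
    )

    func watchPods(ctx context.Context, cs *kubernetes.Clientset) error {
        // 1. List to get a consistent snapshot and its resourceVersion.
        list, err := cs.CoreV1().Pods("").List(ctx, metav1.ListOptions{})
        if err != nil {
            return err
        }
        rv := list.ResourceVersion

        for {
            // 2. Watch from the last version we saw.
            w, err := cs.CoreV1().Pods("").Watch(ctx, metav1.ListOptions{ResourceVersion: rv})
            if err != nil {
                return err
            }
            for ev := range w.ResultChan() {
                if ev.Type == watch.Error {
                    // 3. 410 Gone: our version fell out of etcd's history;
                    // events were lost for good and we must re-list.
                    if apierrors.IsResourceExpired(apierrors.FromObject(ev.Object)) {
                        return watchPods(ctx, cs) // start over
                    }
                    continue
                }
                pod, ok := ev.Object.(*corev1.Pod)
                if !ok {
                    continue
                }
                rv = pod.ResourceVersion
                fmt.Println(ev.Type, pod.Name)
            }
            // Channel closed: connection dropped; loop and reconnect from rv.
        }
    }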
How are Kubernetes apiservers suffering this much from this kind of query? Surely even in huge systems the amount of data that needs to be traversed is super small, right?
Is this a question of Kubernetes just sticking everything into "standard" data structures instead of using a database?
My knowledge is out of date now, but the main issues IMO are/were:
- No concept of apiserver rate limiting, by design. I see there is now an APF (API Priority and Fairness) thingy, but still no basic API / edge rate limiting.
- etcd has bad scalability. It's a very basic, highly consistent KV store with tiny limits (an 8 GB limit in the latest docs, with a 2 GB default). It had large performance issues throughout its life when I was using k8s; I still don't know if it's much better now.
Kubernetes only lets you query resources by object type, and that's only a prefix range scan on the etcd database. There are no indexes whatsoever behind the exhaustive LIST queries, and kube-apiserver handles serialization of the objects back and forth between multiple wire types. Over the years there have been a lot of optimizations, but you don't wanna list all pods in a 5000-node high-density cluster every time you spin up a client-side tool like this.
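To make that concrete: even a LIST that looks selective is still an exhaustive read plus in-memory filtering. In this client-go sketch (the "app=web" label is hypothetical), the selector narrows the response, not the scan:

    import (
        "context"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // Looks selective, but the apiserver still reads every pod from storage
    // (or its watch cache) and filters by label in memory; the label
    // selector is not backed by any index.
    func listWebPods(ctx context.Context, cs *kubernetes.Clientset) error {
        _, err := cs.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
            LabelSelector: "app=web",
        })
        return err
    }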
In my experience they don't; you can just run more of them and stick them behind a load balancer (a regular HTTP reverse proxy). You can scale both etcd and the apiserver pretty easily. Of course you have less control in cloud environments; I have less experience with that.
Pretty sure the apiserver just queries the etcd database (and maybe caches some things, not sure), but I guess it could be the apiserver itself that can't handle the data :P
I no longer know anything about Kubernetes, but share your surprise! From first principles it seems the metadata should be small.
Long ago I wanted to re-implement at least part of kubectl in Python. After all, Kubernetes has a documented API... What I quickly discovered was that kubectl commands don't map to the Kubernetes API. Almost at all. A lot of these commands require multiple queries going back and forth to accomplish what the command does. I quickly abandoned the project... So, maybe I've overlooked something, but, again, my impression was that instead of offering a generic API with queries that can be executed server-side to retrieve the necessary information, the Kubernetes API server offers a very specialized, disjoint set of commands that can each only retrieve one small piece of interesting info at a time.
This, obviously, isn't a scalable approach, but there's no "wrapper" you could write in order to mitigate the problem. The API itself is the problem.
How fun was kube-ops-view though
This is a very good point and is on the roadmap.
I'm not against replacing jq/jsonpath with the right tool; they're not the most ergonomic. What isn't clear to me, though, is why this isn't SQL. It's so nearly SQL, and seems to support almost identical semantics. I realise SQL isn't perfect, but the goal of this project isn't (I assume) to invent a new query language, but to make Kubernetes more easily queryable.
It's based on Cypher, which is a query language for graph databases. The author(s) probably thought the data is more graph-like than relational.
Ah. I’ve not heard of Cypher before.
I’d disagree and say that Kubernetes is much more relational than graph-based, and SQL is pretty good for querying graphs anyway, especially with some custom extensions.
This does make more sense though.
Graph DBs are generalized relationship stores. SQL can work for querying graphs, but graph DB DSLs like Cypher become very powerful when you're trying to match across multiple relationship hops.
For example, to find all friends of friends, or friends of friends of friends: `MATCH (user:User {username: "amanj41"})-[:KNOWS*2..3]->(foaf) WHERE NOT((user)-[:KNOWS]->(foaf)) RETURN user, foaf`
I haven't tried it, but Steampipe has a k8s plugin which lets you use PG/SQLite: https://hub.steampipe.io/plugins/turbot/kubernetes/tables
Reading your comment made me think that they're so close to "OSQuery for k8s", but that already seems to exist: https://www.uptycs.com/blog/kubequery-brings-the-power-of-os...
This looks great for scripting. I will say that the query language looks a bit too verbose for daily use, meaning when you're interacting with a cluster to diagnose a problem, follow a job, test the rollout of something experimental, or similar.
For example, I'd love to be able to just do this as the whole query:

    $pod and metadata.name =~ "foo%"   // Shorthand to filter by type

or maybe:

    metadata.name =~ "foo%"

or maybe:

    .. =~ "foo%"   // Any field matches

I think a query language for querying Kubernetes ought to start with predicate-based filtering as the foundation. Having graph operators seems like a nice addition, but maybe not the first thing people generally need?
It's not quite clear who this tool is for, so maybe this is not the intended purpose?
The `brew install cyphernetes` at the top of the page is an immediate turn-off.
Agree, but I'm not sure why. I'm not a Mac user, so the initial impression is like "this isn't for you, go away". At least add a Linux command alongside it!
Thanks for the feedback. Will add more commands there on rotation to show the different installation options.
Homebrew has a Linux variant, but I assume almost nobody uses it.
I personally use a Mac with Nix, and so do many of my coworkers. Assuming Homebrew, even for a Mac user, leaves a bad impression on me.
I also prefer Mac with Nix over homebrew.
Even on macOS, brew is wildly inferior to MacPorts; to be fair, brew is “blessed” by Swift Package Manager whereas MacPorts is not, but this is ironic given the guy behind MacPorts both worked at Apple and designed the original FreeBSD ports system.
go run github.com/avitaltamir/cyphernetes/cmd/cyphernetes@v0.14.0 --help
It would be good to have some example commands that can be run right after installation, rather than having to figure out how to write the queries.
why?..
Kubernetes only runs on Linux, so it stands to reason that if you care about k8s you should care about Linux. My experience is also that good, experienced sysadmins often use Linux on their own machines as well.
Targeting a tool at macOS users and omitting Linux instructions gives the impression that the tool isn't targeted at sysadmins or hackers (i.e. at us), but rather at beginners, frontend developers, etc.
Saying it's targeted at beginners because it supports macOS shows a real disconnect from what many DevOps people use these days. The year of the Linux desktop has yet to arrive, and the Mac is king for people in IT (at least in the US).
I have yet to meet a competent sysadmin who cares much about the "desktop", and to the extent they do, they mostly seem to invent their own graphical tools, with Tcl/Tk and so on.
Are they common where you live?
Brew runs on Linux too...
I'm a "sysadmin". I only run Linux on my workstation. I even run NixOs on a home server. I manager Kubernetes clusters. Yet, I use Homebrew on Linux.
Most, however, do not, nor should they be expected to. Homebrew is not a safe or viable package manager, especially when better and safer package managers exist in the Linux ecosystem.
What? I love seeing this. I want to see how to get it quickly via package manager.
Not everyone uses the same package manager that you use.
This is fantastic. I’ve always enjoyed the cypher language that the neo4j team created for querying graph data. The connected k8s api objects seem like a great place to apply that lens.
I really, really like Steampipe for this kind of query: https://steampipe.io. It's essentially PostgreSQL (literally) for querying many different kinds of APIs, which means you have access to everything PostgreSQL's SQL language can offer when requesting data.
They have a Kubernetes plugin at https://hub.steampipe.io/plugins/turbot/kubernetes and there are a couple of things I really like:
* It's super easy to query multiple Kubernetes clusters transparently: define one Steampipe "connection" for each of your clusters, plus an "aggregator" connection that aggregates all of them, then query the "aggregator" connection. You get a "context" column that indicates which Kubernetes cluster each row came from.
* It's relatively fast in my experience, even for large result sets. It's also possible to configure a caching mechanism inside Steampipe to speed up your queries.
* It also understands custom resource definitions, although you need to help Steampipe a bit (explained here: https://hub.steampipe.io/plugins/turbot/kubernetes/tables/ku...)
Last but not least: you can of course join multiple "plugins" together. I used it a couple of times to join content exposed only in GCP with content from Kubernetes; that was quite useful.
The things I don't like so much but can be lived with:
* Several columns are just exposed as plain JSON fields; you need to get familiar with PostgreSQL's JSON operators to get something useful out of them. There's a page in Steampipe's docs explaining how to use them better.
* Be familiar also with PostgreSQL's common table expressions: they are not difficult to use and make the SQL code much easier to read.
* It's SQL, so you have to know which columns you want to pick before selecting the table they come from; not ideal for autocompletion.
* The Steampipe "psql" client is good, but sometimes a bit counter-intuitive; I don't have specific examples, but I have the feeling it behaves slightly differently from other CLI clients I've used.
All in all: I think Steampipe is a cool tool to know about, for Kubernetes but also other API systems.
Steampipe project lead here - thanks for the shout out & feedback multani!
I agree with your comment about JSON columns being more difficult to work with at times. On balance, we've found that approach more robust than creating new columns (names and formats) that effectively become Steampipe specific.
Our built-in SQL client is convenient, but it can definitely be better to run Steampipe in service mode and use any Postgres compatible SQL client you prefer [1].
You might also enjoy our open source mods for compliance scanning [2] and visualizing clusters [3]. They are Powerpipe [4] dashboards as code written in HCL + SQL that query Steampipe.
1 - https://steampipe.io/docs/query/third-party
2 - https://hub.powerpipe.io/mods/turbot/kubernetes_compliance
3 - https://hub.powerpipe.io/mods/turbot/kubernetes_insights
4 - https://github.com/turbot/powerpipe
I really like Steampipe too. Writing the plugins is quite fun.
This is way cool. The ability to visualize the k8s object model as a graph and query it as such makes so much sense! The hottest feature in my mind is applying this in an operator - maintaining state as defined by a simple graph query. It is much more readable, and does so with very little code. Well Done!
Since it's Cypher-based (instead of SQL), is the key question whether my k8s data is more graph-like or relational?
Adjacent, but there are lots of experts here: independent of Cyphernetes or specific tooling, what are you doing to secure the k8s API / kubectl / the k8s control plane?
What does this offer over jq, which I can also afford?
Cyphernetes seems capable of graph/relational logic.
The example on the homepage is literally "give me deployments with more than 2 replicas with pods that are not Running, and give me the IP address of the service they're serving"...
Any idea how to do that with kubectl | jq? Their solution seems elegant to me.
Can't you just use normal jq select filters, unless I'm missing something?
The thing is, you'd need three k8s queries (one for pods, one for deployments, one for services), then link all of them and filter... jq helps with the filtering and kubectl can query, but you still need to join the three resources to answer the query...
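A sketch of just the pod-to-service half of that join, matching pod labels against service selectors with jq's object-level `contains`; the namespace and file names are made up, and the deployment half would be more of the same:

    kubectl get pods -n foo -o json > pods.json
    kubectl get svc  -n foo -o json > svcs.json
    jq -n --slurpfile p pods.json --slurpfile s svcs.json '
      ($s[0].items[] | select(.spec.selector)) as $svc
      | $p[0].items[]
      | select((.metadata.labels // {}) | contains($svc.spec.selector))
      | {service: $svc.metadata.name, ip: $svc.spec.clusterIP, pod: .metadata.name}'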
Right, so doable, just a bit more effort: three queries into pipes or tmp files.
This is the Dropbox comment all over again. Lots of things are doable with more manual effort.
True - it's a trade-off like everything in life: do I want to learn yet another language syntax, or master one like jq?
Personally I feel like mastering jq has more value across a lot more things.
I am a big fan of Cypher, so I love this. I really wish actual Cypher supported the dot notation for nested keys.
The one thing I have been waiting for
I dunno, Kubernetes has a query language, it's called jq. As in, kubectl get pods -A -ojson | jq -r '.items[] | ...'. Cyphernetes seems simpler perhaps but it's not the 10x improvement I need to switch and introduce a new dependency.
I guess they would say that you have to send the output of that to be the input of another kubectl command, like

    $ kubectl logs -n foo $(kubectl get pod -n foo | awk '/Running/{print $1}')

because one of their selling points is "no nested kubectl queries".
I don't see how their queries can be more efficient than hitting the kube-apiserver multiple times, unless they have something that lives cluster-side, observing lifecycle events for all CRDs and answering queries with only one round trip instead of several.
Or maybe they're selling "no nested kubectl queries" as an experience feature, saying that a query language is more ergonomic than bash command redirection. My brain has been warped into the shape of the shell, for better or for worse, so it's not a selling point for me.
You usually don't need that, since kubectl supports jsonpath.
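For instance, something like this (the namespace is illustrative) replicates the awk filter above without leaving kubectl:

    kubectl get pods -n foo -o jsonpath='{.items[?(@.status.phase=="Running")].metadata.name}'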
I am firmly in the camp of jq because (a) I can bring my years of muscle memory to this problem, (b) jq is without a doubt more expressive than jsonpath, and (c) related to the muscle-memory part, I get uncanny-valley syndrome trying to context-switch between jsonpath and jmespath (used by awscli for some stupid reason), so it's much easier to just use the one true JSON swiss-army tool on the JSON those CLIs emit.