MitmProxy2Swagger: Automagically reverse-engineer REST APIs (github.com)

590 points by AbuAssar a year ago | 81 comments

Gamemaster1379 a year ago
This is a nice tool. A game I liked to play announced end of service back in 2023. They gave enough notice to let me capture some logs from their coordinator service.
I captured them in mitmproxy and ran those through this to help me identify all the endpoints and their general structure. (A few things were a misnomer, like the examples suggesting certain values were able to be floats when they could only be integers)
I was able to get a team together and we were able to stand up private servers as a result.
- simonjgreen a year ago |parent
  Amazing! What game was this for? I was involved in the RE efforts around UO way back in the day.
  - kirici a year ago |parent
    Gundam Evolution, going by comment history.
    - ge96 a year ago |parent
      Different plot/game mechanics but armored core 6 is great if you like mecha
  - Gamemaster1379 a year ago |parent
    Gundam Evolution, as someone else noted from my comment history.
swyx a year ago
did i miss something or why are there TWO (2) "magically reverse engineer REST APIs" projects on the HN front page right now? is there some offline beef going on?
(screenshot in case this goes away https://x.com/swyx/status/1874762725383188502)
- Quarrel a year ago |parent
  Presumably, because the closed source one got some traction, so people are pointing out the open source alternative.
- littlestymaar a year ago |parent
  Likely because of this comment[1] in the other thread which made people submit this link, and when multiple independent people submit the same link in a short period of time you're very likely to end up on the front page (this exact situation happened to me once)
  [1] https://news.ycombinator.com/item?id=42568121
  - AbuAssar a year ago |parent
    Yeah, that's where I got the link from.
- mylastattempt a year ago |parent
  Offtopic and meta, but, you share a screenshot using Twitter/X? That's really bizarre to me. That is all, just had to say that.
  - swyx a year ago |parent
    how is it worse than photobucket or imgur
- TechDebtDevin a year ago |parent
  [flagged]
  - jereees a year ago |parent
    I put swyx up there with sama in the category of extremely smart people that give me the ick for reasons I cannot articulate
    - swyx a year ago |parent
      honored i guess. sorry about that.
colesantiago a year ago
Again, this is the very easy part of the reverse engineering API process that most tools can do, similar to API Parrot and the rest of them. This is not hard to do.
The hard part is that inevitably, all these internal APIs will just add aggressive CAPTCHAs, Device Check, fingerprinting, etc to prevent common drive by re'ing. Easy to add these on the defence side, and extremely difficult to bypass on the other side.
I can imagine all developer teams now upping their security with the combination of the above mentioned to prevent this.
- sebmellen a year ago |parent
  Depends on the age of the tool. We work with a lot of legacy systems that actually want us to integrate with them but don’t have the dev resources to build a proper API surface. As a result, we end up doing a lot of painful reverse engineering. These tools look promising for purposes like this.
- devjab a year ago |parent
  I'm curious as to why people would have a public API to begin with if they wanted to protect it from people using it. Then again, why would anyone have a public undocumented API in 2024 when an LLM can give you a CLI tool to auto-generate 90% of the OpenAPI spec in a couple of hours? The last question isn't serious; I've worked in enterprise for decades and almost none of the tools organisations end up buying have good documentation for their APIs. Not that those are publicly available, but still.
  - lesuorac a year ago |parent
    I think you have a misunderstanding here.
    The API needs to be "public" because the app uses the internet to communicate back to the home server.
    The API is not "public" in the sense that the app developers want anybody to use it; they just want their app to use this API. So they don't write publicly accessible documentation about it because they don't want to encourage its use.
    A tool like MitmProxy2Swagger lets you run the app and record all of its API calls so that you can use this unadvertised API.
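    To make the "record all of its API calls" step concrete, here is a minimal sketch of a mitmproxy addon that tallies the endpoints an app hits. It is not part of MitmProxy2Swagger; the class name `EndpointRecorder` and the script filename are assumptions for illustration. It relies only on mitmproxy's `request` hook and the `method` and `path` attributes of the request object.

```python
# Sketch of a mitmproxy addon; run with: mitmdump -s record_calls.py
# (the filename and the class name are hypothetical).
from collections import Counter


class EndpointRecorder:
    """Counts how often each (method, path-without-query) pair is seen."""

    def __init__(self):
        self.seen = Counter()

    def request(self, flow):
        # mitmproxy calls this hook once per client request.
        # request.path includes the query string, so strip it.
        path = flow.request.path.split("?", 1)[0]
        self.seen[(flow.request.method, path)] += 1


# mitmproxy looks for a module-level list named `addons`.
addons = [EndpointRecorder()]
```

    A tally like this only tells you which endpoints exist; turning the captured traffic into an OpenAPI description is the part the tool automates.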
    - devjab a year ago |parent
      Why wouldn’t you add authentication to an API you don’t want others to use?
      - ssdspoimdsjvv a year ago |parent
        The web app probably authenticates using an API as well, in which case it's trivial to add that to your shadow client as long as you have the credentials.
      - lesuorac a year ago |parent
        Laziness / skill issue.
        How many apps have you seen only do client-side protection?
- jampekka a year ago |parent
  Making a mitmproxy dump from a manual browsing session is more or less unblockable, barring some TPM or similar fuckery.
  Usage of the API even with the protocol known OTOH can be quite easily made really hard.
- mad_vill a year ago |parent
  There are many cases where users are behind a forward proxy for security/compliance reasons. Most applications need to support these types of users.
zebomon a year ago
I looked through this earlier today when I saw it mentioned in that thread about the closed source tool for the same purpose.
Having done a good bit of this type of reverse engineering the hard way over the years, it's a very exciting find. I had been talking with my partner about building something similar for the past six months. How exciting to learn that it's already out there and open source too!
tecleandor a year ago
I've used this tool in the past with success. Not perfect but it accelerates the work greatly if you can launch a mitm proxy quickly and are familiar with the tool.
I've been fighting lately with an API, though. It's not very, let's say, RESTy. It has only one endpoint, and the different "sections" of the API are defined in parameters, so MitmProxy2Swagger doesn't detect them properly :(
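One possible workaround for an API like this (an assumption on my part, not a MitmProxy2Swagger feature) is to rewrite the captured URLs before feeding them to a path-based tool, so the parameters that select the "section" become path segments. The parameter names `section` and `action` below are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def to_pseudo_path(url, selector_params=("section", "action")):
    """Move selector query parameters into the path.

    `section` and `action` are hypothetical parameter names; substitute
    whatever the API under study actually uses.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    segments = [parts.path.rstrip("/")]
    for name in selector_params:
        values = query.pop(name, None)
        if values:
            # Use the first value if the parameter is repeated.
            segments.append(values[0])
    remaining = urlencode(query, doseq=True)
    return urlunsplit(
        (parts.scheme, parts.netloc, "/".join(segments), remaining, parts.fragment)
    )
```

With this, `https://example.com/api?section=users&action=list&page=2` becomes `https://example.com/api/users/list?page=2`, which a path-oriented tool can group as its own endpoint.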
- quectophoton a year ago |parent
  > It's not very, let's say, RESTy. It has only one endpoint,
  To be fair, from what I understand an actual(tm) REST API would only have a single defined endpoint[1]: the entry point. With every other endpoint being discovered from the responses. And also from your message I'm guessing a URI still uniquely identifies a resource (specifically through the "query" part of the URI, instead of the more common "path").
  So, technically, assuming there's nothing too weird with that API, it seems like MitmProxy2Swagger is failing to detect a REST API.
  [1]: Corollary: If an API is RESTful, it should be possible to rename any endpoint (except the entry point) at any moment in time without prior notice, and clients would not break as long as the response types/schemas are still supported by the clients. In-flight requests might fail with a 4xx, but after a retry they should go to the correct endpoint without any code change required.
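  The corollary can be illustrated with a toy sketch. The "server" here is just an in-memory dictionary and every name in it is hypothetical; the point is only that the client hard-codes the entry point and a link relation, never the endpoint path.

```python
def make_server(orders_path):
    """Hypothetical in-memory 'server': maps a path to a JSON-like document."""
    return {
        "/": {"links": {"orders": orders_path}},
        orders_path: {"items": ["order-1", "order-2"], "links": {}},
    }


def follow(server, entry_point, rel):
    """Fetch the entry point, then follow the link named `rel`."""
    root = server[entry_point]
    return server[root["links"][rel]]


# The same client code works before and after the endpoint is renamed.
before = follow(make_server("/orders"), "/", "orders")
after = follow(make_server("/v2/customer-orders"), "/", "orders")
```

  Both calls return the same document, because the client discovers the orders URL from the entry point response instead of assuming it.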
  - zdragnar a year ago |parent
    This is HATEOAS, basically the core feature of REST that very few people actually use. Most of what the industry calls REST or RESTful is just structured and inefficient RPC.
    - tecleandor a year ago |parent
      True, I almost never see the endpoint discovery thing, I almost forgot about it...
  - pests a year ago |parent
    I don't think anyone has ever used REST in the way you are using it - the sibling comment points out that HATEOAS is probably what you mean - this generally embeds links to all resources, full data navigation, next/prev links, and so on. It is true that a proper HATEOAS client should be able to navigate an endpoint completely with just a starting address.
    - quectophoton a year ago |parent
      Yeah unfortunately despite it being part of the REST definition, nowadays "REST" has become a term that means "REST but without HATEOAS". Similar to how "API" now means specifically "HTTP API that returns JSON", or "AI" now means "Generative AI specifically".
- nejsjsjsbsb a year ago |parent
  Nothing is RESTy
notcrazylol a year ago
I was wondering how it would take in GraphQL endpoints and convert them to Swagger, since it's just a single POST API with a change in params. But that's more of a Swagger issue than the tool's. Has anyone dealt with this? Would be really helpful if you could share your ideas too :)
- asabla a year ago |parent
  Why would you tho?
  If you're working against a GraphQL-based API, you should be able to pull a schema file and use that to implement your own API.
  All you would get from mitmproxy is example queries and mutations, with the additional complexity of extra tooling to stitch together the schema file.
  - jampekka a year ago |parent
    Pulling the schema file can, and often is, disabled server side. And GraphQL APIs can, and often do, decline to serve other than persisted queries, and those can't be really inferred even with known schema.
  - notcrazylol a year ago |parent
    So I am working with a new company that has a ton of graphql queries. What I wanted to do was write an integration test for them in the fastest and easiest way possible.
    I don't want to sit and read each query to identify where it is in the user flow. So I was thinking if I run this in the background and go through a happy flow, I can get the APIs in order and write an integration test.
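    A rough sketch of that idea, assuming the captured POST bodies are available as JSON strings and that the client sends the conventional `operationName` field (both are assumptions; the function name is hypothetical):

```python
import json


def operation_sequence(request_bodies):
    """Return GraphQL operation names in capture order, collapsing immediate repeats."""
    names = []
    for body in request_bodies:
        payload = json.loads(body)
        # Batched requests arrive as a JSON array of operations.
        operations = payload if isinstance(payload, list) else [payload]
        for op in operations:
            name = op.get("operationName") or "<anonymous>"
            if not names or names[-1] != name:
                names.append(name)
    return names
```

    The resulting list gives the order of operations in the happy flow, which can serve as the skeleton of an integration test.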
mkagenius a year ago
If only someone could automate[1] the clicking and navigating part by writing in plaintext something like "Open airbnb and explore all the features as much as possible" :)
1. https://github.com/BandarLabs/clickclickclick - It does that and I am one of the authors.
youngNed a year ago
perhaps a n00b question, but would this work, or is there something similar for apps, specifically android apps?
- tecleandor a year ago |parent
  I've used this specific tool to help me reverse engineer the private API of an Android App.
  The thing is, depending on how hardened the app is, you'll have to play with Android to allow this interception, mostly because of certificate pinning. Also I remember something about apps not using the system wide trusted certificates you install (IIRC).
  I remember using a rooted device with LineageOS, and downloading the APK and modifying it with a tool so the self signed certificate for the mitm proxy works with it.
  The mitm proxy docs have some links to tools that can do that [0] and you could also use an Android emulator if you don't have an extra phone to mess with it [1]
```
  0: https://docs.mitmproxy.org/stable/concepts-certificates/
  1: https://docs.mitmproxy.org/stable/howto-install-system-trusted-ca-android/
```
- whilenot-dev a year ago |parent
  A MITM proxy isn't specific to any app; it's a forward proxy for your outgoing network connection. In the case of an Android app you'd need to run mitmproxy on a machine in your network and set up the connection as a proxy in your Android's network settings. Then you'd need to follow http://mitm.it to install mitmproxy's root certificate on the Android device (to trust the connection with TLS) and off you go.
  EDIT: or rather follow the docs[0]
  [0]: https://docs.mitmproxy.org/stable/howto-install-system-trust...
- rhaps0dy a year ago |parent
  Depends on the app. If it uses some online functionality probably yes. You could also try decompilation, it’s decent on java apps like android’s.
- jazz9k a year ago |parent
  I use burp suite combined with Frida (which can remove root check and override ssl pinning).
  - nsteel a year ago |parent
    Yes, this. The Frida tools method to remove cert pinning is the only method that has worked for me. The mitmproxy docs for android (as referred to by another commenter) didn't work for any apps I tried.
a year ago
[deleted]
zython a year ago
This is so cool. Thanks for sharing !
srameshc a year ago
Obvious question: How to protect against this ?
- mathgeek a year ago |parent
  Build your API assuming anything public facing will be known. This includes anything downloaded to a device.
- K0nserv a year ago |parent
  Your first line of defence should be a secure API where an attacker doesn't gain anything by knowing it.
  You can add obfuscation, but ultimately if the client is shipped to the user you must assume an attacker can reverse engineer it.
- smallnix a year ago |parent
  What specifically do you want to protect?
- tonyhart7 a year ago |parent
  For me, we can't 100% protect against this type of usage, but we can minimize it with good observability and monitoring tools that always check whether the user is running this in a verified way (signed app, web, etc.) or RE'ing the API.
  Because guess what? We are the creators of the system; it's easy to detect bots and similar cases when you have good analytical data, because this type of usage does not give any "traces".
- bandrami a year ago |parent
  I find this confusing because the point of an API is to be known, yes? Otherwise who's accessing it?
  - quesera a year ago |parent
    It's a valid desire, but you have to be really dedicated to the effort to block it, in practice.
    You might intend your API to be consumed only by your own clients. E.g. your published mobile apps.
    A well-designed API won't allow a third-party client to do anything that your own client wouldn't allow of course. Permissions are always enforced on the back end.
    But there are many cases where a user might want a custom/different client:
    If your mobile apps are not awesome, or if they deprioritize a specific use case, or if they serve ads ... or even if your users want to automate some action in your service...
    If your service is popular enough (or you attract a certain kind of user), you will have some people building their own clients.
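    As a small sketch of "permissions are always enforced on the back end": the handler below checks ownership itself, so it makes no difference whether the request came from the official app or a reverse-engineered client. The data and names are hypothetical, and the caller's identity is assumed to have been authenticated already.

```python
class Forbidden(Exception):
    """Raised when the authenticated user lacks permission."""


# Hypothetical data store mapping documents to their owners.
DOCUMENT_OWNERS = {"doc-1": "alice", "doc-2": "bob"}


def delete_document(authenticated_user, doc_id):
    """Server-side handler: the ownership check runs for every client."""
    if DOCUMENT_OWNERS.get(doc_id) != authenticated_user:
        raise Forbidden(f"{authenticated_user} may not delete {doc_id}")
    del DOCUMENT_OWNERS[doc_id]
    return "deleted"
```

    A custom client can call this endpoint, but it cannot do anything the official client could not do on the same account.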
    - bandrami a year ago |parent
      Those sound like bad use cases for a client-server model with public endpoints, then? I mean, you could cert-pin yourself in the client app, I guess.
      - quesera a year ago |parent
        Not sure what you mean here. All endpoints are equally public.
  - kube-system a year ago |parent
    Not necessarily. A common pattern is to build a 'private API' intended to be used by one's own front-end applications. For example: most client-rendered applications, like the Airbnb example on this page.
  - nsonha a year ago |parent
    Modern APIs are actually, most of the time, poor man's RPC; they don't need to exist, much less be known.
- soheil a year ago |parent
  [flagged]
  - RandomRandy a year ago |parent
    You can read SSL traffic if you're able to install a root certificate on your device and the website/app doesn't use certificate pinning.
    I recently used HttpToolkit to reverse engineer a REST endpoint that used SSL encryption
    - pimterry a year ago |parent
      Even if it does use certificate pinning, you can generally disable that using tools like Frida (https://frida.re) with scripts like https://github.com/httptoolkit/frida-interception-and-unpinn...
  - batch12 a year ago |parent
    This isn't true. Mitmproxy and burp can both proxy TLS. Maybe you're misunderstanding the use case.
  - iBotPeaches a year ago |parent
    A good deal of APIs don't pin SSL certs so MITM works for a solid amount of them.
  - erk__ a year ago |parent
    Only as long as you cannot load your own certificates, which you are able to do in a lot of cases. On Android you can lock the certificates allowed in an app; this can be circumvented, though it adds another step. I am unsure if the same is the case for Apple's devices; you might at least need a jailbreak there.
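    For readers unfamiliar with what "locking certificates" means in practice, here is a simplified sketch of a pinning check. It pins the SHA-256 of the whole DER-encoded certificate; many real apps pin the public key instead, so treat this as an illustration of the idea and the function name as hypothetical.

```python
import hashlib
import hmac


def matches_pin(cert_der, pinned_sha256_hex):
    """True if the SHA-256 of the DER-encoded certificate equals the pinned value."""
    actual = hashlib.sha256(cert_der).hexdigest()
    # Constant-time comparison; the pin is normalised to lowercase hex.
    return hmac.compare_digest(actual, pinned_sha256_hex.lower())
```

    A certificate generated by an intercepting proxy has different bytes, so the check fails even when the proxy's CA is trusted by the device. That is why the pinning code itself has to be patched or hooked.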
  - a year ago |parent
    [deleted]
construct0 a year ago
Yeah - does this get nullabilities right?
Zinkay a year ago
[flagged]
Zinkay a year ago
[flagged]
waseemmalik a year ago
[flagged]
- efilife a year ago |parent
  [flagged]
  - dang a year ago |parent
    They aren't.
    - NicholasGurr a year ago |parent
      [dead]
soheil a year ago
[flagged]
- CubsFan1060 a year ago |parent
  https://docs.mitmproxy.org/stable/overview-getting-started/#...
  Seems like the proxy handles all the SSL, and likely strips any SSL-specific headers, etc.
  But also, many, many, many companies do this exact thing. Just one example: https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?...
- daemonhorn a year ago |parent
  1) Having the TLS stack trust a "custom CA" provided by any number of debug tools (like mitmproxy or OWASP Zap) is a relatively simple operation and can be done by anyone on any OS as long as you have admin/root. 2) There are a number of additional debug ways to decode the encryption from an application endpoint (e.g. https://wiki.wireshark.org/TLS and look at the SSLKEYLOGFILE environment variable supported by most major TLS stacks and all major browsers). Since MitmProxy2Swagger also supports HAR format ingest (e.g. https://github.com/alufers/mitmproxy2swagger#HAR ), this can easily be exported from any browser's built-in debug tools (which also removes the encryption).
  Modern TLS is great, but there are limitations on what it actually provides especially around the CA trust model. These mitm tools are not designed to take random traffic from the internet you intercepted, they require privileged endpoint access to enable specific debug features or configurations.
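  As one example of the key-log mechanism, Python's own `ssl` module exposes it through `SSLContext.keylog_filename`. The sketch below assumes Python 3.8+ built against OpenSSL 1.1.1 or newer; the log path is a hypothetical choice.

```python
import os
import ssl
import tempfile

# Hypothetical location for the key log; Wireshark can be pointed at this file.
keylog_path = os.path.join(tempfile.gettempdir(), "tls-keys.log")

context = ssl.create_default_context()
# Session secrets for connections made with this context are appended here.
context.keylog_filename = keylog_path
```

  Connections made with `context` can then be decrypted from a packet capture, which only works because you control the endpoint doing the TLS handshake.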
- at0mic22 a year ago |parent
  I think HAR export consumption lets you avoid the whole MITM part if we are talking about website API detection
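  A HAR export is plain JSON, so a first look at the endpoints needs nothing more than the standard library. This is a minimal sketch with a hypothetical function name; it relies only on the `log.entries[].request` structure of the HAR format.

```python
import json


def har_endpoints(har_text):
    """Return sorted unique (method, url-without-query) pairs from a HAR export."""
    har = json.loads(har_text)
    pairs = set()
    for entry in har["log"]["entries"]:
        request = entry["request"]
        pairs.add((request["method"], request["url"].split("?", 1)[0]))
    return sorted(pairs)
```

  This only lists what was called; inferring path parameters and response schemas from the same file is the part MitmProxy2Swagger is meant to handle.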
- eightnoneone a year ago |parent
  This is only a problem if a client application has a server certificate pinned in source code. Otherwise, you can create a cert with a private CA and add it to a desktop OS trusted cert store.
  - K0nserv a year ago |parent
    Adding a CA cert to the OS trust store only works if the application uses it. I've encountered apps that don't use the OS trust store or networking stack; even then it's possible to reverse engineer the traffic though[0].
    0: https://hugotunius.se/2020/08/07/stealing-tls-sessions-keys-...
- dtn a year ago |parent
  Isn't that the point of mitmproxy? https://github.com/mitmproxy/mitmproxy
- cenamus a year ago |parent
  Option b could be more about breaking into some office that happens to contain those keys ;)
- bandrami a year ago |parent
  Wait, it works fine on production APIs, what it doesn't work on is "production clients". You're deliberately man-in-the-middling yourself.
tinchox5 a year ago
Coool!
andrewstuart a year ago
This is something that would be easy to do an ordinary job of, missing lots of edge cases and not making something thorough and complete.
A really professional and thorough job would be extremely time consuming and hard.
- matthewolfe a year ago |parent
  I do this a lot for my work. A tool like this that can help get me to a nice starting point is huge. Instead of developing a mental model of the API in my head by manually looking through API requests/responses in ProxyMan, this can start me off much more quickly. From there, the edge cases can be worked out.