Clubhouse data leak: 1.3M user records leaked online for free

(cybernews.com)

306 points | by 0xmohit 1103 days ago

21 comments

  • mittermayr 1103 days ago
    I reported this to Clubhouse in February and got no response whatsoever (I am not involved in this leak, just to be extra clear). Essentially anyone with the token from the iOS app (MITMproxy + SSL Kill Switch) can query through the entire public (records are cleaned) user profile database. It supports wildcard queries and just responds with some 20M records you can page through if you have the time. It luckily (!) doesn't expose e-mail and phone number, which is why I also agree with others here that this is only mildly interesting. The news won't care, however. I think at around 4M users or so they switched from auto-incrementing IDs to a better numbering format; until then, all records remain as-is (increasing).

    I think Clubhouse can fix this quite easily (limit the records returned in search!!!) and apply some harsher rate limits on a per-token basis (tokens never expire, that's another thing).

    I think they relied a bit too much on certificate pinning. Once that's bypassed, it's relatively easy to query your way through the data. If you managed to grab someone else's token (which doesn't expire), you impersonate them (without logging the other session out), and continue to show up/talk in rooms using the Agora SDK as that person.

    They also upload the phone numbers from the address book in clear text (non-hashed), although I can see there's not much point in hashing them, since unsalted hashes of phone numbers could probably be reversed easily.
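On the hashing point: phone numbers have a small keyspace, so an unsalted hash offers little protection. A minimal Python sketch, assuming SHA-256 and a known number prefix (both illustrative choices; nothing here reflects how Clubhouse actually handles contacts), recovers a number simply by trying every possible suffix:

```python
from __future__ import annotations

import hashlib


def hash_phone(number: str) -> str:
    """Unsalted SHA-256 of a phone number string (a hypothetical 'anonymization')."""
    return hashlib.sha256(number.encode()).hexdigest()


def reverse_hash(target: str, prefix: str, digits: int) -> str | None:
    """Brute-force the number behind `target` by trying every suffix after a known prefix."""
    for n in range(10 ** digits):
        candidate = f"{prefix}{n:0{digits}d}"  # zero-padded suffix, e.g. 0123
        if hash_phone(candidate) == target:
            return candidate
    return None  # not in the searched range
```

A full 10-digit space is only 10^10 candidates. A per-record salt defeats precomputed lookup tables, but if the attacker also holds the salt, each individual number can still be brute-forced the same way.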

    • ramoz 1103 days ago
      I was in some of those CH convos with you. I was actually suspended for a little while and tried clearing it up with them. Sent all the details I had and the original google doc I published w/ a lot of styprs work. They never responded but I was unbanned and given some fresh invites... but yea... strange it hasn't been cleared up.

      Ultimately I think the premise is around a completely open and a transparent digital experience. Clubhouse still needs to defend against those with malicious intent and a new realm of psychographics to abuse.

      Side note: I was hooked on the app until that suspension (lasted ~2w)... I haven't been able to get back into a groove. I rarely log on anymore.

    • rsj_hn 1103 days ago
      Reading your post, it's amazing how many boxes of failed access-control efforts it checks:

      * trying to control clients

      * obfuscating IDs

      * rate limiting data

      ...rather than the more boring yet standard approach of thinking through an access control policy and then enforcing that at the server.
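A minimal sketch of that "boring" alternative in Python: a deny-by-default policy function that the server consults on every request, regardless of what the client is or how guessable the IDs are. The `Principal`, `Profile`, and `Action` names are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum, auto


class Action(Enum):
    READ_PUBLIC = auto()
    READ_PRIVATE = auto()
    DELETE = auto()


@dataclass(frozen=True)
class Principal:
    """The authenticated caller, as established by the server (never by the client)."""
    user_id: int
    is_admin: bool = False


@dataclass
class Profile:
    user_id: int
    public: dict = field(default_factory=dict)
    private: dict = field(default_factory=dict)


def authorize(who: Principal, action: Action, target: Profile) -> bool:
    """Deny by default; grant only what the policy explicitly allows."""
    if action is Action.READ_PUBLIC:
        return True
    if action in (Action.READ_PRIVATE, Action.DELETE):
        return who.is_admin or who.user_id == target.user_id
    return False  # any action the policy doesn't mention is refused


def read_private(who: Principal, target: Profile) -> dict:
    # Enforced server-side: a guessed or sequential user_id grants nothing by itself.
    if not authorize(who, Action.READ_PRIVATE, target):
        raise PermissionError(f"user {who.user_id} may not read profile {target.user_id}")
    return target.private
```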

    • aasasd 1103 days ago
      > Once that's bypassed

      Do you mean that you trick the app into accepting a wrong cert? How does one do that, apart from decompilation?

      • jeroenhd 1103 days ago
        Jailbreaking an iPhone and using a tool like SSL Kill Switch [1] or just plain old Frida with a script like [2] will do the job. Jailbreaking is the hard part, especially for an up-to-date iPhone; after that, there are loads of guides you can follow that disable certificate validation for pretty much every application. It all boils down to hooking the necessary validation functions and having the APIs lie to the app code.

        Some apps package their own crypto helpers (often with big crypto problems) to make this harder and require actual reverse engineering, but those are a pain to maintain and it's only a matter of time before someone finds a way around them. If you can extract the symbols (i.e. if the app has not been obfuscated well), you can use Frida's API to hook those as well, from any language you like. There's even an interactive JavaScript console you can attach to the apps you're hooking!

        Certificate pinning is a great way to protect users' security and privacy, especially in countries with questionable governments or ISPs, but it won't protect your app's secrets.

        [1]: https://github.com/nabla-c0d3/ssl-kill-switch2 [2]: https://techblog.mediaservice.net/2020/08/ios-13-certificate...
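To illustrate why pinning can't protect an app's secrets: the pin check is ordinary code running on the device, so hooking it (as SSL Kill Switch or a Frida script does) simply makes it report success. A minimal Python sketch of whole-certificate pinning, assuming a SHA-256 fingerprint of the DER certificate. This is not the SPKI (public-key) pinning that mobile platforms typically use, which would need a certificate-parsing library.

```python
from __future__ import annotations

import hashlib
import hmac
import ssl


def fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def pin_matches(der_cert: bytes, pins: set[str]) -> bool:
    """True if the certificate's fingerprint is one of the pinned values."""
    fp = fingerprint(der_cert)
    return any(hmac.compare_digest(fp, pin) for pin in pins)


def fetch_and_check(host: str, pins: set[str], port: int = 443) -> bool:
    # get_server_certificate does no chain validation here (no ca_certs given);
    # a real client would validate normally first and then apply the pin check.
    pem = ssl.get_server_certificate((host, port))
    return pin_matches(ssl.PEM_cert_to_DER_cert(pem), pins)
```

Everything above runs inside the client, which is exactly the part an attacker with a jailbroken device controls.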

        • mittermayr 1103 days ago
          That's exactly right. The hardest part is finding a phone that runs iOS 13+ and can be jailbroken still. I think I used an iPhone 7 or 8. If someone's really curious, it's probably even worth the $50-$100 for a used iPhone that can be jailbroken, it opens up A LOT of similar doors for investigating.
          • lukec11 1103 days ago
            FWIW, the iPhone 12 I bought 2 weeks ago came with 14.2.1 (which has 2 jailbreaks available for it, unc0ver and Taurine)

            Not sure if older iPhones would come with older versions too; I assume used ones would normally be up to date, though iPhone X and earlier are jailbreakable on all versions via checkm8.

        • aasasd 1103 days ago
          > Some apps package their own crypto helpers

          Didn't even know that mobile OSes have APIs for cert validation, since that's not a part of the OS in my books. Though the motivation for shared libs is understandable (Facebook being an example of what not to do).

          I guess one drunken evening I'm gonna read through lists of the APIs just to see what kinds of stuff are crammed there.

          • jeroenhd 1103 days ago
            I'm not sure why you're surprised, Windows has come with a library for certificate validation since the late 90s. The OSX documentation library has an example of using SecureTransport all the way back to 2004, but the API is probably older. The *nix systems, with their modular nature, may be technically usable without a TLS library, but even your average RTOS comes with fully-featured TLS support built in these days.

            Mobile operating systems provide a very broad API so that access management and sandboxing are made easy. I'm not sure how things are done on the iOS side, but on Android you can enable certificate pinning application-wide by just putting an XML file with the right name in the right place and adding a key/value pair for the hostname and the pinned public key (anywhere in the validation chain, AFAIK). The same XML file also allows disabling plain-text requests from your application runtime, preventing accidental data leaks to insecure networks.

            Because adding security is so easy, there are loads of apps enforcing a security setting that would otherwise be considered obscure by most application developers. Exposing an optional, application-wide API is a pretty solid idea in my book; I'm not aware of any Linux system API that can easily enforce certificate pinning on an application-wide level.

      • nerdbaggy 1103 days ago
        On iOS you can use the Charles app to intercept HTTPS requests without any extra device.

        https://www.charlesproxy.com/documentation/ios/

      • captn3m0 1103 days ago
        You usually recompile the app, or if you have a jailbroken phone, you can do it at runtime.

        But considering how the API is documented now (and alternative third party apps exist), it might not even be necessary.

    • bitexploder 1103 days ago
      Nothing wrong with auto incrementing identifiers if actual security controls (authorization) are implemented for already authenticated users.
      • yardstick 1103 days ago
        Sequential is still bad if you don’t want to disclose the size of your customer base or other commercially sensitive information.

        Also see the German Tank Problem[1].

        1. https://en.m.wikipedia.org/wiki/German_tank_problem
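For context, the textbook estimator for the German tank problem is N ≈ m + m/k - 1, where m is the largest serial number observed and k is the number of distinct serials seen. A small Python sketch, assuming IDs start at 1 and the observed ones are a uniform random sample:

```python
def estimate_total(observed_ids: list[int]) -> float:
    """German tank estimator: largest ID seen plus the average gap, minus one."""
    distinct = set(observed_ids)  # the estimator assumes sampling without replacement
    if not distinct:
        raise ValueError("need at least one observed ID")
    m = max(distinct)
    k = len(distinct)
    return m + m / k - 1
```

For example, seeing user IDs 19, 40, 42 and 60 gives 60 + 60/4 - 1 = 74 as the estimated total.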

        • bitexploder 1103 days ago
          Am aware. Still don’t think it’s worth the hassle for most situations. Can leak information, but context is really important. I have rarely seen it be an issue over many years of app assessments. Just something to keep in the threat model for when it’s relevant.
          • yardstick 1102 days ago
            There was one wireless ISP many years ago in a city I lived in that had a signal/reception page to see your signal to their closest tower. The URL included the customer number to identify your location. I quickly discovered it had no authorisation checks. You could easily find the exact addresses of all of their customers. Inactive/old customers returned no data.
          • shalmanese 1103 days ago
            What hassle is it? Where in your codebase do you assume sequentiality? It should be a one line change in your db configs to generate GUIDs instead of ids. You have to do it eventually anyway as sequentiality can't be assumed once you shard.
            • bitexploder 1103 days ago
              Depends on the needs, I suppose. I don’t like starting off with GUIDs until it’s proven they are needed, because, as you say, it’s a simple change. Sharding does complicate the picture, but how many apps really need sharding?
              • sangnoir 1102 days ago
                > I don’t like starting off with GUIDs until it’s proven they are needed

                For security incidents, "when they are needed" will be too late to do anything. If it's all the same to you, I'd advise that you default to GUIDs.
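One common way to get both: keep the sequential integer as the internal primary key and expose only a random identifier. A minimal in-memory Python sketch, assuming UUIDv4 for the public ID (the `User` and `UserStore` names are hypothetical). A random ID only makes enumeration impractical; it is not a substitute for authorization checks.

```python
from __future__ import annotations

import itertools
import uuid
from dataclasses import dataclass, field

_pk_counter = itertools.count(1)  # stands in for a database auto-increment column


@dataclass
class User:
    name: str
    # Internal key: sequential, convenient for joins, never leaves the server.
    pk: int = field(default_factory=lambda: next(_pk_counter))
    # External key: random UUIDv4, used in URLs and API responses.
    public_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class UserStore:
    def __init__(self) -> None:
        self._by_public_id: dict[str, User] = {}

    def add(self, name: str) -> User:
        user = User(name)
        self._by_public_id[user.public_id] = user
        return user

    def get(self, public_id: str) -> User | None:
        # Client-facing lookups go through the opaque ID only.
        return self._by_public_id.get(public_id)
```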

        • jgalt212 1103 days ago
          true, but if you do sequential for users and free trials, the information leakage can be close to zero. Think about if all those AOL CDs were sequentially numbered.
          • SahAssar 1103 days ago
            That's not zero information leakage. That's just leaking another statistic that is somewhat correlated to the one you want to hide (you're leaking the production of trial AOL CDs, and production of trial AOL CDs have some correlation to number of new users).
        • jbluepolarbear 1103 days ago
          Don’t return indexes with user queries.
          • wongarsu 1103 days ago
            Usually you need some external unique identifier so you can interact with the object. Sure, that doesn't have to be the db index, but it is the convenient choice.
            • azinman2 1103 days ago
              Except when you want to change databases, or grow beyond a single one, or shard what you have. If you do this you’re binding your future self.
      • sangnoir 1102 days ago
        If you follow the "defense in depth" paradigm, then sequential identifiers are bad when the other controls are defeated. Sequential IDs make it trivial to crawl the entire dataset, which could be the difference between "Information on 4 million users was stolen" and "Information from 4 users was stolen".
  • rvz 1103 days ago
    From [0]:

    > This is misleading and false. Clubhouse has not been breached or hacked. The data referred to is all public profile information from our app, which anyone can access via the app or our API.

    So just like what happened to Parler and LinkedIn. A so-called 'data breach' of its public data via scraping.

    But last time I checked the private API in a GitHub repo, Clubhouse was using integer IDs rather than random alphanumeric strings for its users.

    This can essentially be scraped by a while loop, incrementing all the way to whoever last signed up.

    Did Clubhouse even implement rate limiting to combat this?

    [0] https://twitter.com/joinClubhouse/status/1381066324105854977
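For reference, the two mitigations suggested upthread (capping search results and rate limiting per API token) can be sketched in a few lines of Python. This is a minimal illustration, not Clubhouse's code; `MAX_PAGE_SIZE`, the refill rate, and the burst size are assumptions.

```python
from __future__ import annotations

import time
from typing import Callable

MAX_PAGE_SIZE = 50  # assumed cap on records returned by one search request


class PerTokenLimiter:
    """Token-bucket rate limiter keyed by API token (illustrative sketch)."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate      # requests refilled per second
        self.burst = burst    # maximum requests allowed in a burst
        self.clock = clock    # injectable so the limiter can be tested deterministically
        self._buckets: dict[str, tuple[float, float]] = {}  # api_token -> (allowance, last_seen)

    def allow(self, api_token: str) -> bool:
        now = self.clock()
        allowance, last = self._buckets.get(api_token, (float(self.burst), now))
        # Refill in proportion to elapsed time, never beyond the burst size.
        allowance = min(float(self.burst), allowance + (now - last) * self.rate)
        if allowance >= 1.0:
            self._buckets[api_token] = (allowance - 1.0, now)
            return True
        self._buckets[api_token] = (allowance, now)
        return False


def search(names: list[str], query: str, page_size: int) -> list[str]:
    """Substring search that refuses wildcards and never returns more than MAX_PAGE_SIZE rows."""
    if query.strip() in ("", "*"):
        raise ValueError("wildcard queries are not allowed")
    limit = max(1, min(page_size, MAX_PAGE_SIZE))
    return [n for n in names if query.lower() in n.lower()][:limit]
```

Neither measure stops a patient scraper on its own: with a 50-record cap and one request per second per token, 20M records is 400,000 requests, roughly 4.6 days for a single token, and less with several tokens that never expire. That is why server-side authorization still matters.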

    • sschueller 1103 days ago
      Does anyone remember the AT&T "hack"? These two just used curl to get the e-mail addresses and ICC-IDs of AT&T iPad users, which were publicly accessible. [1] It was still labeled a hack and went through the brain-dead media that way. Instead of AT&T getting in trouble, Auernheimer got a 41-month sentence, and the judge also ordered him and Spitler to pay $73,000 in restitution.

      [1] https://www.wired.com/2013/03/att-hacker-gets-3-years/

      • soulofmischief 1103 days ago
        weev had a lot of friends come forth to defend his actions, but then he went full neo-Nazi in prison and all of that support disappeared.

        He now runs admin for The Daily Stormer and somehow finds a way to keep popping up in the worst places, can't seem to shake the guy.

        Additionally, he didn't even do the legwork for the AT&T op, but really wanted to take credit for something for cool hacker cred, and it came back and bit him. I keep company with several people involved in that ordeal.

      • FDSGSG 1103 days ago
        If this wasn't a hack, would a SQL injection have been a hack? Where do you draw the line?

        What if they had exploited the heartbleed bug, would that have been a hack?

        • sbarre 1103 days ago
          I think this comes down to an oft-repeated discussion of what we (society) consider the "proper" securing of data.

          If a company leaves data available in a manner that is accessible without using any kind of vulnerability, but rather allows the unintended (ab)use of a poorly implemented service, then that's not a "hack", that's on the company, and they should be held accountable.

          Personally, I think an SQL injection still falls under the above. Securing public endpoints against long-known and easily mitigated vulnerabilities is 100% the company's responsibility.

          There is no "we couldn't have prevented this" bullshit defense in that case.

          • FDSGSG 1103 days ago
            The company probably deserves punishment for negligence, but that should have zero impact on how we view the actions of the "hacker".

            >If a company leaves data available in a manner that is accessible without using any kind of vulnerability, but rather allows the unintended (ab)use of a poorly implemented service, then that's not a "hack", that's on the company, and they should be held accountable.

            Is it "theft" if you leave your keys in your car and I take it?

            • strogonoff 1103 days ago
              IMO the “car with keys inside” analogy is not great for poorly implemented infosec. The latter is both more benign (no physical property directly stolen) and at the same time worse (the scale means multitudes of people will become vulnerable to further attacks, identity theft, doxxing and so on, rather than just one person losing the means of movement). It’s just different qualitatively.

              What should have impact on how we view the actions of a hacker is the context of said actions, not just the “hacking” part.

              If the hacker exploited company’s infosec negligence to profit in some way (say, by selling the data), or irresponsibly disclosed the data or the vulnerability possibly causing harm to affected users, it is one thing.

              Otherwise, the same sequence of hacker’s actions that exploits the vulnerability does not compare to stealing a car with keys inside—it is (another faulty analogy warning) more like looking at a car through some magical looking glass that highlights the keys left inside by the owner.

            • sbarre 1103 days ago
              It certainly is, but now let's go talk to your insurance company and see how they feel about covering the loss.
    • Kye 1103 days ago
      It's only a problem if you think it's a problem for someone to trivially build a social graph for every person on your exclusive social network with lots of high profile people.

      So...it's a problem.

    • eswat 1103 days ago
      Not a great response on their part since the article they reference in the tweet does not say that they have been breached or hacked. Only that there's a limited dataset of users out there and that Techmeme reached out to Clubhouse to know if they are aware of any breaches of their systems.

      Pretty bad optics if the other stuff is true: incremental IDs, no rate limiting, tokens that don't expire.

    • mvanaltvorst 1103 days ago
      Correct, and judging from someone else in this thread, it was even possible to use wildcard matching to get access to an entire list of users at once.
  • benja123 1103 days ago
    I understand why people are saying this is not a breach and I tend to agree. I do think there are some basic measures you can put in place to make this kind of abuse harder.

    The real problem is that most users don’t understand when they sign up for a service like clubhouse, what information is public, how easy it is for bad actors to get access to that information and how this information can be used to harm them later (phishing, identity theft etc.).

    Who should be educating the average non-technical user about the risk of agreeing to share their information publicly? And even if they knew, would it actually change anything?

    Personally, I have hit the point where I have accepted that all my (and my family's) information is public, and for that reason, with people like my parents, I tend to focus on teaching them to avoid falling for phone scams and phishing.

  • lovedswain 1103 days ago
    I guess a leak requires private data to be exposed, this is just a collection of public data.
    • p49k 1103 days ago
      Is it public info who invited you to the Clubhouse app? If not, that would imply some kind of breach, since that info is part of the leak.
      • saurik 1103 days ago
        > Is it public info who invited you to the Clubhouse app?

        Yes: that is public info. This is all no more a "leak" than the original service is a "leak" of itself.

      • ThomPete 1103 days ago
        Everyone can see who invited you to Clubhouse down in the bottom of the profile.
      • Kiro 1103 days ago
        It's not only public, it's central to the whole concept. You can always walk up your tree to see who the original member in your line is.
    • xyst 1103 days ago
      I agree. This "hack" is the equivalent of any search engine indexing public Facebook or LinkedIn profiles.
    • gabipurcaru 1103 days ago
      Same as the recent FB and Linkedin incidents. It's all scraped data. Doesn't mean that collecting public data at scale is not something bad
      • kostarelo 1103 days ago
        Which one are you referring to? The recent millions of contacts that were exposed from FB contained phone numbers. It had my phone number and it's not public.
  • jtokoph 1103 days ago
    It looks like someone just scraped all of the public profiles.
    • Zealotux 1103 days ago
      It looks more like a SQL dump to me. The data doesn't seem to be too critical, however.
      • tW4r 1103 days ago
        How so? It is exactly the data you see when you open any clubhouse profile in the app

        Almost as if there was an endpoint /profiles/id that someone just scraped by using ids 0..9999999999

        • vidarh 1103 days ago
          One of the first places I worked they had that.

          For private data.

          Guess their user id and you could get someone's whole contact list, access their voicemail, or start a 30-person conference call which could dial out internationally with calls billed to the affected user...

          The entire top management had user ids below 100...

          I found the problem because on login all it set was a cookie with the userid, and so of course I tried changing it.

          When I alerted my manager to the problem they put in place 'encryption' of said cookie.

          It was base64 encoding.

          They were shocked when I broke that too.

          Writing this now it sounds invented, but it's not. To be fair this was more than 20 years ago, and a lot of developers did not yet have any understanding of security, so they at least had a shred of an excuse.

          I left that company first chance I got.

          • Cheezmeister 1103 days ago
            > 'encryption' of said cookie...It was base64 encoding.

            Made me chuckle.

            • vidarh 1103 days ago
              I never figured out what thought process led to them considering base64 a security feature. I mean, I could tell just by looking at the cookie it was base64, but I expected that meant they'd encrypted it and then base64 encoded the result. But no. It made me treat every bit of code I was handed with extreme caution.
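For anyone curious about the difference: Base64 is a reversible encoding with no key, so anyone can decode and re-encode a cookie. A minimal Python sketch contrasting that with an HMAC-signed cookie (the key and the `payload.tag` cookie format are hypothetical); a real application would more likely use an opaque random session ID or its framework's signed-session support.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # hypothetical server-side key; never sent to clients


def base64_cookie(user_id: int) -> str:
    """What the story describes: 'encryption' that is really just an encoding."""
    return base64.b64encode(str(user_id).encode()).decode()


def signed_cookie(user_id: int) -> str:
    """user_id plus an HMAC-SHA256 tag the client cannot forge without SECRET_KEY."""
    payload = str(user_id)
    tag = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{tag}"


def verify_cookie(cookie: str) -> int | None:
    """Return the user_id if the tag is valid, else None."""
    payload, _, tag = cookie.partition(".")
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # Compare as bytes in constant time; rejects edited payloads and forged tags.
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        return None
    return int(payload)
```

Signing stops tampering but does not hide the user ID; the ID is still readable in the cookie.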
      • frombody 1103 days ago
        If I were collecting a large amount of data, I would most likely store it in a database.
        • eurasiantiger 1103 days ago
          Active data yes, archived data goes into the warehouse.
    • bloudermilk 1103 days ago
      Yup, this seems to be the case. I don’t know how this could be characterized as a leak.
      • eswat 1103 days ago
        It's definitely a grey area here. Clubhouse strongly encourages using real names on their platform. That can be considered personal information that should be protected by security controls that they seem to lack, based on what others have mentioned here, that could have limited this "leak".
  • monkey_monkey 1103 days ago
    Perhaps we need to add a term such as "harvesting", to better distinguish between hacks/leaks and mass aggregation of public profile data.
  • hashhar 1103 days ago
    Looks like it's a scrape of public profile information from Clubhouse.

    Also it reads more like an advertisement for the author's services.

    I'd like to see a more credible source.

    • coldcode 1103 days ago
      That someone wrote an iOS app with such a lame concept of security that anyone could dump the entire database (even if it's only "public" data) with a script is not surprising, as most startups and even big companies don't give a crap about security. I've seen this way too often. If you are so cavalier about security in simple REST queries, imagine what lurks beneath, not yet discovered.
  • rogers18445 1103 days ago
    This data seems to have been public and free for a while... Here: https://www.kaggle.com/johntukey/clubhouse-dataset
    • xyst 1103 days ago
      The generated graph is interesting. I guess everyone in the middle are early adopters and people with high numbers of followers. Then the clusters on the outer edges are people catering to a niche audience. Then those niche audiences spawn their own microcosms.
  • brown9-2 1103 days ago
    As long as we are talking about “leaks”, someone at Clubhouse might want to look into being compliant with California’s Consumer Privacy Act:

    https://twitter.com/wbm312/status/1360014416087945222?s=21

  • ulzeraj 1103 days ago
    This is the cherry on the top for their policy that requires real names. People never learn.
    • asimjalis 1103 days ago
      They also know your birthdate and phone number. The only thing they don’t know is the name of your first pet.
      • judge2020 1103 days ago
        These things weren’t exposed via the API.
  • swiley 1103 days ago
    I'm completely done with centralized social media "apps." I'm not signing up for any more and other than HN I've stopped using all of them and recommended that my friends do the same (surprisingly, many have listened.)
  • asimjalis 1103 days ago
    I feel this validates their decision to only release on iOS first. On Android there would be even fewer barriers to this kind of scraping.
  • cblconfederate 1103 days ago
    I think we're watching the implosion of the cloud. Those leaks are not even illegal, yet they will lead to a lot of spam, a lot of phishing, and a lot of other clumsy actions by clumsy actors that will alienate users and make them more reluctant to give out their information next time. At least, I hope we're post-peak cloud and falling fast back into the norm of the internet the way it was meant to be: pseudonymous.
    • runeks 1103 days ago
      I don’t get it. This “leaked” information looks like something that would be displayed on a public website for each user. As far as I can see it’s just public information, like user names and avatars on e.g. stack overflow.
      • cblconfederate 1103 days ago
        The difference is it's all in a SQL dump that any kiddo can use to spam. Ease of use matters.
        • zepto 1103 days ago
          Do you think this additional spam will be noticeable on top of what is already there?
          • cblconfederate 1103 days ago
            phone/sms spam is definitely more noticeable and actually hard to block
  • _trampeltier 1103 days ago
    2021 the year of leaks ..
    • o_m 1103 days ago
      Not really. It seems like we have redefined what a leak is.
    • quickthrower2 1103 days ago
      Hopefully this is a good omen for Julian
  • girlinIT 1101 days ago
    Honestly, I think Clubhouse will fail. Telegram also launched a feature with podcasts. And in order to listen to lectures, you need some kind of material other than audio, so that everything is better assimilated and remembered. For this reason I use https://audext.com/
  • williesleg 1103 days ago
    It's always leaked, it's just about who has it.
  • tonetheman 1103 days ago
    All of their devs must be too busy working on an Android app to fix these minor security bugs... :)
  • kulikalov 1103 days ago
    Why is this a leak? Looks like someone scraped the data any user has access to. If this is a leak, then the cybernews.com feed should be filled with linkedin/if/fb/etc leaks every minute.