Ask HN: Create embeddings efficiently for an AI notes app with E2EE

Hi,

I am building a notes application that automatically finds related notes - thinkdeli.com. I want to implement end-to-end encryption sometime soon.

To find related notes, I create embeddings, add them to an index, and find the nearest neighbors.
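
For readers unfamiliar with the approach, here is a minimal sketch of the lookup step, assuming embeddings are plain lists of floats and the index is a small in-memory dict. The note IDs, the vectors, and the `nearest_notes` helper are made up for illustration; they are not the app's actual code, and a real index would likely use an approximate nearest-neighbor library instead of a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def nearest_notes(index, query_id, k=2):
    """Return the ids of the k notes most similar to query_id (brute force)."""
    query = index[query_id]
    scored = [
        (cosine_similarity(query, vec), note_id)
        for note_id, vec in index.items()
        if note_id != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [note_id for _, note_id in scored[:k]]


# Hypothetical 3-dimensional embeddings; real models emit hundreds of dimensions.
index = {
    "groceries": [0.9, 0.1, 0.0],
    "meal-plan": [0.8, 0.2, 0.1],
    "tax-forms": [0.0, 0.1, 0.9],
}

related = nearest_notes(index, "groceries", k=1)
```

A brute-force scan like this is linear in the number of notes, which is usually acceptable for a personal notes collection held on the client.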

I create the embeddings locally from the user's notes using transformers.js. But setting up the pipeline uses a lot of memory (over 400 MB in Chrome), which makes it somewhat impractical on older devices.

Is there a more efficient way to create embeddings locally?

Creating embeddings via an API would be more efficient, but that would mean sending users' unencrypted notes to a cloud service. Edit: what I mean is that the user's notes must be unencrypted and readable as plain text on the server to create embeddings. This defeats the purpose of end-to-end encryption.

I would appreciate any pointers. Thanks a lot!

3 points | by satyajeetjadhav 9 days ago

4 comments

  • Someone 7 days ago
    > The user's notes must be unencrypted and readable as plain text on the server to create embeddings.

    Consult a security expert before doing this, but here’s an idea: encrypt each word of the text, send the encrypted tokens over the wire, and then use an embedder trained on text encrypted with that method.

    If you use an asymmetric encryption method, you could even throw away the private key.

    The result would still be a substitution cypher on words, so it would not resist frequency analysis. It also won’t help that users who manage to extract the key can encrypt text of their own to figure out the mapping. But it would protect against people ‘accidentally’ looking at your users’ text.

    Periodically switching the encryption key wouldn’t be that hard.
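
A rough sketch of the word-level idea in this comment, to make the frequency-analysis caveat concrete. It swaps in a keyed hash (HMAC-SHA256) for the encryption step, which matches the "throw away the private key" variant because the mapping cannot be reversed without guessing words. The key, the 12-character truncation, and the `pseudonymize_words` helper are assumptions for illustration only, not a vetted design.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize_words(text, key):
    """Map each word to a deterministic keyed token (same word -> same token)."""
    tokens = []
    for word in text.lower().split():
        digest = hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest()
        tokens.append(digest[:12])  # truncated only to keep the tokens short
    return tokens


key = b"hypothetical-client-side-key"  # placeholder; never hard-code a real key
sentence = "the cat saw the dog and the bird"
tokens = pseudonymize_words(sentence, key)

# The weakness: word frequencies survive the mapping unchanged, so the most
# common token is very likely a common word such as "the".
plain_counts = sorted(Counter(sentence.split()).values())
token_counts = sorted(Counter(tokens).values())
most_common_token, occurrences = Counter(tokens).most_common(1)[0]
```

Switching the key changes every token, which is why the embedder would have to be retrained or remapped whenever the key rotates.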

  • rahimnathwani 8 days ago
    Which embedding model are you using?

    Perhaps pick one with lower memory usage from this list?

    https://huggingface.co/spaces/mteb/leaderboard

  • pmtolk 9 days ago
    • satyajeetjadhav 9 days ago
      Sorry, I should have phrased the last part of the problem better. I already use https.

      The user's notes must be unencrypted and readable as plain text on the server to create embeddings. This defeats the purpose of end-to-end encryption.

  • innethread 8 days ago
    I’m not sure if there are implementations for browsers, but look into embeddings with homomorphic encryption.
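
To give a feel for what homomorphic encryption allows, here is a toy sketch of the Paillier scheme, which is additively homomorphic: a server can compute a weighted sum of encrypted numbers without decrypting them. This is only the linear building block. Real embedding models also need non-linear steps, which this does not cover, and the tiny primes and function names below are assumptions chosen for readability, not security.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier keys from two small primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g = n + 1 is used
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt a non-negative integer m < n under public key n."""
    n_sq = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) % n_sq) * pow(r, n, n_sq) % n_sq


def decrypt(n, private_key, c):
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def encrypted_dot(n, encrypted_vector, weights):
    """Server side: weighted sum of ciphertexts using plaintext integer weights."""
    n_sq = n * n
    acc = 1  # 1 is a valid encryption of 0
    for c, w in zip(encrypted_vector, weights):
        acc = acc * pow(c, w, n_sq) % n_sq  # E(x)^w = E(w * x); products add
    return acc


n, private_key = keygen()
client_vector = [3, 1, 4]   # stays on the client in plain text
server_weights = [2, 7, 1]  # known to the server
ciphertexts = [encrypt(n, x) for x in client_vector]
encrypted_result = encrypted_dot(n, ciphertexts, server_weights)
result = decrypt(n, private_key, encrypted_result)  # 3*2 + 1*7 + 4*1
```

Only the client holds `private_key`, so the server sees ciphertexts throughout. The cost is that every operation is a big-integer exponentiation, which hints at why running a full embedding model this way is expensive.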