• axg11 11 days ago
    First off, I want to make the disclaimer that this project is very much an MVP and the landing page/front end is quite ugly.

    I'm sharing tingy - this is a service that allows you to upload a video and query it using a text description. For each upload, you can make two queries and they can be any text that you wish (in English).

    Here's an example: You have 4 hours worth of security footage. During those 4 hours you know that someone stole a bike. You could query the video for "person riding bike".

    I'm looking for some test users - please reach out here if you would like to trial tingy. I would be happy to set you up with a free account.

    • grepfru_it 10 days ago
      Very interesting. I do this with my security cameras right now but mainly to find out who left poop in front of my house or who left the trash can behind my garage door. The second use case is much easier than the first since a lot of dogs walk past my house, a lot of dogs poop in front of my house, but not a lot of dogs walk by that leave poop behind (or sometimes the poop is out of view).

      Interesting idea nonetheless

      • axg11 10 days ago
        In this example you could query for "poop" and then look at the resulting plot for a sustained period of time with high poop probability? :)
        • grepfru_it 9 days ago
          how do you distinguish between poop and mud? :)

          It's great in theory, but the real world creates harder challenges. The real answer is to keep track of all the dogs that pass by and then compare the before/after scenes for differences. I'm doing this with my garage door camera for detecting water (for a completely different problem).
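          The before/after comparison could be sketched as a crude pixel-difference check (a hypothetical sketch, not the actual setup; the thresholds here are made-up assumptions):

```python
import numpy as np

def scene_changed(before, after, pixel_thresh=30, frac_thresh=0.01):
    """Crude before/after comparison: compute the fraction of pixels whose
    absolute difference exceeds pixel_thresh, and flag the scene as changed
    if that fraction exceeds frac_thresh."""
    # Cast to a signed type so the subtraction doesn't wrap around.
    diff = np.abs(before.astype(np.int16) - after.astype(np.int16))
    changed_fraction = (diff > pixel_thresh).mean()
    return changed_fraction > frac_thresh
```

          In practice a raw diff like this is fragile (lighting shifts, swaying branches, camera noise all register as "change"), which is part of why the before/after approach gets hard in the real world.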

          • axg11 9 days ago
            Perhaps analyze for the combination of dogs + poop? The dog probability will be transient/a spike, whereas the poop probability will be constant (after appearance of the dog).
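            The "sustained high probability" idea could be sketched as a simple pass over per-frame scores (a hypothetical sketch; the threshold and minimum window length are made-up parameters):

```python
def sustained_windows(probs, threshold=0.8, min_frames=30):
    """Return (start, end) frame-index ranges where probs stays at or above
    threshold for at least min_frames consecutive frames. A transient spike
    (e.g. a dog passing) is too short to qualify; a constant signal
    (e.g. poop left behind) produces a long window."""
    windows, start = [], None
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i  # window opens
        else:
            if start is not None and i - start >= min_frames:
                windows.append((start, i))
            start = None  # window closed (or too short, discarded)
    # Handle a window still open at the end of the video.
    if start is not None and len(probs) - start >= min_frames:
        windows.append((start, len(probs)))
    return windows
```

            For example, forty frames of high "poop" probability would be reported as one window, while a five-frame "dog" spike would be filtered out.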
      • gotostatement 10 days ago
        I could see that there are some users who would want an ongoing subscription, such as a security office that looks through security videos on a regular basis to investigate thefts. But I have to think that in a lot of cases there will be people who just want to search through a video or two on a one-time basis - why not a per-video, non-subscription pricing option?
        • axg11 10 days ago
          Interesting point, any idea at what level you would price this?
          • gotostatement 8 days ago
            I can't speak to the economics on your side, but from a consumer point of view, the main issue I'm having is: how do I know this thing will even work? I don't want to shell out $10 and wait for processing time only to find out it doesn't even work.

            Maybe one way is to let users upload it for free, interactively search the first 10s for free, and then require payment to look through the rest of the video. If I felt confident that it was going to work, and I had a real reason to search through the video, I would probably pay between $1 and $5, depending on the length of the video. Much more than that and I'd rather just scrub through it myself. I guess the length of the video is roughly proportional to how much I'm willing to pay to not have to scrub through it myself.

            The other thing is that it's much easier to scrub through and find an image than to find spoken audio. If I want to find a bike, I can just scrub through at 10x and look for bikes; a bike isn't going to appear in a single frame and disappear in the blink of an eye. If it also searched voice, it would be worth even more.

      • chirau 10 days ago
        Nice to see someone pursue this. I built something similar at a hackathon in NYC a few years back and won. I remember the tricky part was that in addition to uploading videos, I wanted people to just use a URL, like a YouTube video or something and process that. But the Vision API only worked with videos that were in storage on a bucket or something. Spent the whole night building a workaround.
        • axg11 10 days ago
          Did you use an API from one of the big cloud providers? Tingy is based on a custom variant of CLIP, which is quite computationally expensive but worth it for the general querying ability.
        • avatar042 10 days ago
          Interesting idea. As others have suggested it would really help if you have an accompanying video to support the claims.

          A few thoughts/questions here:

          1. What markets and use-cases were you thinking of when building out this MVP? The applications could be broad enough, but it seems like you expect CLIP to handle bespoke queries and hope that they return relevant results. Also, what happens if you search for something that doesn't exist in the video? Can you handle that well enough (assuming it's just a simple threshold you're picking to identify relevant search results)?

          2. Licensing is something that has always piqued my curiosity when it comes to ML-based apps. Do you have a sense of the commercial use of models such as CLIP, especially when the datasets they were probably trained on were not permitted for commercial use? This also applies to the raw video data uploaded by the user.

          • axg11 10 days ago
            Some potential markets:

            - home security

            - searching through long home videos

            - production companies with large video archives (this would require more tooling)

            I am unsure whether to focus on one of these groups or to go for a more generic tool. I'll add a video demo to the landing page. So far, for all the tests I've performed the ML model can generalize well enough to cover this range of uses.

            Licensing: I need to research this further. I'm not sure how the licensing changes due to the fact that I've also fine-tuned the model on my own data.

            • avatar042 10 days ago
              Thanks for the info on markets. What made you consider fine-tuning further on your own data? Was CLIP not good enough to test the market?

              FWIW I recall having seen something similar with Google Cloud's Video Intelligence API (https://towardsdatascience.com/building-an-ai-powered-search...). Building something generic would be especially hard to get right, particularly if your users want high precision and recall from their search results.

              Re: licensing, the world of startups is somewhat of a wild-west these days with folks offering pre-trained models as-a-service without really thinking about the licensing implications (both on the dataset and model front). Huggingface is a classic example, and they seem to suggest that it's perfectly OK to fine-tune and use commercially (https://github.com/huggingface/transformers/issues/3357#issu...), but I'm not certain that their lawyers would put it the same way.

              • axg11 9 days ago
                Pre-trained CLIP gets you 95% of the way there, so you're correct, fine-tuning isn't necessary to test the market. The one downside of pre-trained CLIP is that it hasn't been trained on still images from videos. These have a different noise characteristic and contain considerably more motion blur than your average image used for training.
          • davidatbu 10 days ago
            This looks dope! Any chance you'd be willing to share a bit about the core tech underneath? Ie, assuming this is neural net based, which architecture/paper/repo did you use? Did you have to do any training / finetuning? ..etc.

            I totally understand if you'd like to keep some/all of this secret, but I thought it's worth a shot :)

            • axg11 10 days ago
              For sure, here's a few tidbits:

              ML: fine-tuned CLIP model. Each video frame is embedded using CLIP and then the image embedding is compared against the text query embedding.

              Architecture: everything is serverless using AWS Lambda. The basic flow is: the video is uploaded to storage, a Lambda converts the video to still frames, ML inference runs on each frame, then the inference results are aggregated to create the customer output.
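              A minimal sketch of the per-frame scoring step, assuming the CLIP encoders have already produced embeddings (the function names and the use of plain cosine similarity are assumptions for illustration, not the actual implementation):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def score_frames(frame_embeddings, text_embedding):
    """Compare each frame's image embedding against the text query
    embedding, returning one similarity score per frame. Plotting these
    scores over time gives the per-frame 'probability' curve the user sees."""
    return [cosine(f, text_embedding) for f in frame_embeddings]
```

              In the serverless flow described above, this would correspond to the aggregation step: the per-frame scores are what get collected into the customer-facing output.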

              • davidatbu 10 days ago
                Thank you so much for these pointers! I'm trying to keep up with the SoTA both in terms of research and production models in ML, and this is super helpful. Best of luck!
            • axg11 10 days ago
              Use the code HNDISCOUNT to get the first month free (100% off). Feel free to cancel anytime during the first month, you won't be charged.
              • rontoes 11 days ago
                Interesting idea - I would like to try it out.
                • axg11 11 days ago
                  Great! Please email support@tingy.video
                • XCSme 11 days ago
                  Allowing users to try a few searches before registering could increase your sales. In its current form, this is not a valid Show HN either, as you can't try the product without paying.
                  • axg11 11 days ago
                    Ah I didn’t realise that was the rule - I will work on adding a free trial