Ah yes. This has to do with the thresholding 'bug' I discovered some time ago. I will update the algorithm soon. It happens with color images in the preprocessing stage, when the query image is converted to a binary image, especially with images that use flat/palette colours.
I eventually came up with a contrived set of heuristics to tackle this problem, as you can see in the example below. With the right set of weights, I managed to get accurate thresholding more than 90% of the time for pathological cases like these. --- https://imgur.com/a/XMhdnjH
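For context, here is a minimal sketch of a plain global threshold (Otsu's method) on a colour image. This is not the weighted heuristics mentioned above, just the textbook baseline they would improve on. The function names and the choice of Rec. 601 luma for the grayscale step are my assumptions. With flat/palette colours the gray histogram collapses to a few spikes, which is where a single global threshold tends to pick the wrong split.

```python
def to_gray(rgb):
    """Rec. 601 luma for an (r, g, b) tuple with 0-255 channels (assumed conversion)."""
    r, g, b = rgb
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def otsu_threshold(gray_values):
    """Return the threshold t that maximises between-class variance.

    Pixels with value <= t form the dark class, pixels > t the bright class.
    """
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1
    total = len(gray_values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count in the dark class so far
    sum0 = 0  # sum of gray values in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # nothing in the dark class yet
        w1 = total - w0
        if w1 == 0:
            break             # everything is in the dark class
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels):
    """Convert a flat list of RGB tuples to a list of 0/1 values plus the threshold used."""
    gray = [to_gray(p) for p in pixels]
    t = otsu_threshold(gray)
    return [1 if g > t else 0 for g in gray], t
```

Usage: `bits, t = binarize(list_of_rgb_tuples)`. On a two-colour image this separates the colours only if their luma values differ, which is one reason palette images can be pathological.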
It was the other way around for me: I linked it to the DuckDuckGo icon SVG and the horse started running. I thought it was a loading animation (one that took more than a few seconds), so I threw it onto another monitor and continued reading HN.
...30 minutes later the horse is still running and I'm like 'wtf? what does a horse have to do with the DDG logo?' close tab. read comments...
It turns out the app doesn't handle SVG (it is actually on the to-do list) and returned a 500, but the failure was never presented to the user.
I scraped it from their website and then asked for their permission by sharing this link with them. They appreciated that I linked all the icons to their website and gave their consent to make this public.
The MPEG-7 dataset is what most researchers use to benchmark shape similarity algorithms. There are a couple of other datasets that I used but can't recollect now. These datasets are relatively simple, with a single shape per image, as opposed to logos and icons, which comprise multiple elements in different configurations.
I would test on the MPEG-7 dataset to begin with and, once the precision and recall values are good enough, go ahead with testing on logos and icons. I must've manually tested the algorithm more than 100,000 times, probably because that was the only way to do it with untagged datasets. Quite tedious indeed. This version gives pretty decent results about 7-8 out of 10 times, I'd say.
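For anyone who wants to run the same kind of check on a tagged dataset, precision and recall for a single query can be computed as below. This is a generic sketch, not my actual test harness; the function name and the item ids in the usage note are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query.

    retrieved: ids returned by the search, in any order (assumed free of duplicates)
    relevant:  ids that belong to the same class as the query
    """
    retrieved = list(retrieved)
    relevant = set(relevant)
    hits = sum(1 for item in retrieved if item in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Usage: `precision_recall(["apple-2", "apple-3", "bat-1", "apple-5"], {"apple-2", "apple-3", "apple-4", "apple-5", "apple-6"})` counts 3 hits out of 4 retrieved and 5 relevant. With untagged data there is no `relevant` set to compare against, hence the manual testing.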
I suspect it has to do with the lower resolution. I'm using nearest-neighbor interpolation for resizing images and have noticed similar behaviour before. It would be great if you could try higher-resolution versions (preferably > 200px) of the same images and let me know the results.
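To illustrate why nearest-neighbor resizing can hurt at low resolutions, here is a small sketch on a 2D list of pixel values. It is a generic version of the technique, not the code the site runs, and the function name is hypothetical. Downscaling can drop thin strokes entirely, and upscaling just repeats pixels, so edges stay blocky.

```python
def resize_nearest(img, new_w, new_h):
    """Resize a 2D list (rows of pixel values) by nearest-neighbor sampling."""
    old_h, old_w = len(img), len(img[0])
    out = []
    for y in range(new_h):
        # Map each output row back to a source row (integer arithmetic).
        src_y = min(old_h - 1, y * old_h // new_h)
        row = []
        for x in range(new_w):
            src_x = min(old_w - 1, x * old_w // new_w)
            row.append(img[src_y][src_x])
        out.append(row)
    return out


# A one-pixel-wide stroke at column 3 of an 8-pixel row.
thin_stroke = [[0, 0, 0, 1, 0, 0, 0, 0]]
halved = resize_nearest(thin_stroke, 4, 1)   # samples columns 0, 2, 4, 6
doubled = resize_nearest([[0, 1]], 4, 1)     # each source pixel is repeated
```

In this example `halved` no longer contains the stroke at all, since column 3 is never sampled. Whether that is what happened with your images is only my guess.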
A closer inspection actually shows that some of the results aren't that bad a match. Results ordered 1, 4, 5, 7 and 7 in particular vaguely have the same outline as the query image. Still, if I had to score this result, I wouldn't give it more than a 3 out of 10.
:) Please feel free to share the SVGs. I will convert them to PNGs and test them out. I will add SVG support real soon. Right now I've put in an exception handler that passes an empty array as the query if an image format that can't be decoded is thrown at it :|
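A better approach than silently passing an empty array would be to detect the format up front and raise an explicit error that the UI can show. A rough sketch follows, assuming the raw bytes of the query are available. The names `sniff_format`, `check_query` and `UnsupportedImageError` are hypothetical, and the list of supported formats is an assumption, not what the app actually supports.

```python
class UnsupportedImageError(ValueError):
    """Raised when the query bytes are not in a format we can decode."""


# Well-known file signatures (magic bytes) for common raster formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_format(data):
    """Guess the image format from the leading bytes, or return None."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    # SVG is XML text, so look for an <svg tag near the start instead.
    head = data[:512].lstrip().lower()
    if head.startswith(b"<svg") or (head.startswith(b"<?xml") and b"<svg" in head):
        return "svg"
    return None


def check_query(data, supported=("png", "jpeg", "gif")):
    """Return the detected format, or raise instead of passing an empty query on."""
    fmt = sniff_format(data)
    if fmt not in supported:
        raise UnsupportedImageError(
            f"cannot decode query image (detected: {fmt or 'unknown'})"
        )
    return fmt
```

The request handler could catch `UnsupportedImageError` and return a 4xx response with the message, so the user sees why nothing came back.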
If you are referring to entering those words in the search box, yes, I should've put in some warnings/checks there to make sure a valid image URL is entered. Will fix it soon. And yes, I should make the site secure too. Thanks for letting me know.
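The check on the search box could be as simple as the sketch below. The function name is hypothetical, and it only verifies that the input is an http(s) URL with a host; whether the URL actually points to an image would still have to be checked after fetching it.

```python
from urllib.parse import urlparse


def looks_like_url(text):
    """True if text parses as an http(s) URL with a host part."""
    try:
        parts = urlparse(text.strip())
    except ValueError:
        # urlparse raises on some malformed inputs, e.g. a broken IPv6 host.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

With this, plain words typed into the box would be rejected with a warning before any request is made.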
PS: You can explore company logos here http://compute.vision/brands/index.html . It's implemented using an older iteration of the algorithm and performance isn't that great compared to the one used with the icons database.