• mceachen 136 days ago

    Getting image and video orientation correct, which should be trivial, is decidedly not. In writing PhotoStructure [1], I discovered

    1) Firefox respects the Orientation tag via a standard [2] CSS property, but Chrome doesn't [3]

    2) That videos sometimes use "Rotation", not "Orientation" (and Rotation is encoded in degrees, not the crazy EXIF rotation enum). Oh, and some manufacturers use "CameraOrientation" instead, just for the LOLs [4]

    3) That the embedded preview images in JPEGs, RAW files, and videos are sometimes rotated correctly and sometimes not, depending on the Make and Model of the camera that produced the file. If Orientation or Rotation says anything other than "not rotated," you can't trust what's in that bag of bits.
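
    For anyone normalizing these by hand, here's roughly what folding the two tag styles into one representation looks like (a minimal Python sketch; the enum descriptions follow exiftool's output, and the `normalize` helper name is made up):

```python
# Sketch: fold EXIF's 8-value Orientation enum and a video-style
# "Rotation" tag (plain degrees) into one (rotate_cw_degrees, mirrored)
# representation.  Enum meanings follow exiftool's descriptions.
EXIF_ORIENTATION = {
    1: (0, False),    # Horizontal (normal)
    2: (0, True),     # Mirror horizontal
    3: (180, False),  # Rotate 180
    4: (180, True),   # Mirror vertical
    5: (270, True),   # Mirror horizontal, then rotate 270 CW
    6: (90, False),   # Rotate 90 CW
    7: (90, True),    # Mirror horizontal, then rotate 90 CW
    8: (270, False),  # Rotate 270 CW
}

def normalize(tag_name, value):
    """Return (rotate_cw_degrees, mirrored) for either tag style."""
    if tag_name == "Orientation":   # EXIF enum, values 1-8
        return EXIF_ORIENTATION[value]
    if tag_name == "Rotation":      # video metadata, plain degrees
        return (value % 360, False)
    raise ValueError(f"unexpected tag: {tag_name}")
```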

    [1] https://blog.photostructure.com/introducing-photostructure/

    [2] https://www.w3.org/TR/css-images-4/#image-notation

    [3] https://bugs.chromium.org/p/chromium/issues/detail?id=158753

    [4] https://exiftool-vendored.js.org/interfaces/makernotestags.h... (caution, large page!)

    • miahi 136 days ago

      And after all that, editing (simply rotating) the photo in different applications has different results. Some change only the Orientation tag, others change the actual pixel data, and some seem to change both, so it's still incorrect when opened in other viewers. Then there's the embedded thumbnail (though that is rarely used). The result is a mess.

      I'm interested in your PhotoStructure application, just subscribed to the beta!

      • mceachen 135 days ago

        When you rotate a photo or video in PhotoStructure, you can have it persist rotation/orientation by updating the file directly (PhotoStructure uses exiftool under the hood), but it's not the default out of concern for unknown bugs that may invalidate the original file in some way.

        By default it just writes the new orientation to an XMP or MIE sidecar. The downside of this approach is that most applications don't respect sidecars.

      • asveikau 136 days ago

        > That videos sometimes use "Rotation", not "Orientation" (and Rotation is encoded in degrees, not the crazy EXIF rotation enum)

        Not technically degrees. MP4 encodes it as a matrix. (Now, there are matrices corresponding to degree transformations...)
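
        (For the curious: the ISO BMFF matrix is nine values {a, b, u, c, d, v, x, y, w}, stored as fixed-point in the file, and for a pure rotation you can recover the angle with atan2. A sketch, with a hypothetical function name:)

```python
import math

# Sketch: recover the rotation angle from an MP4/MOV display matrix.
# ISO BMFF stores nine values {a, b, u, c, d, v, x, y, w} (fixed-point
# in the file; assume they're already converted to floats here), and a
# point (p, q) maps to (a*p + c*q + x, b*p + d*q + y).  For a pure
# rotation the 2x2 block is [[cos t, sin t], [-sin t, cos t]].
def rotation_degrees(a, b, c, d):
    # sanity-check that this really is (close to) a pure rotation
    if not (math.isclose(a, d, abs_tol=1e-6)
            and math.isclose(b, -c, abs_tol=1e-6)):
        raise ValueError("matrix is not a pure rotation")
    return round(math.degrees(math.atan2(b, a))) % 360
```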

      • byuu 136 days ago

        This has caused me quite a bit of trouble with my image gallery as well. I've taken to just using exiftool to remove all rotation metadata from my photos, which breaks half of them, and then manually hard-rotating them using ImageMagick, which is technically not a lossless operation if I'm understanding it correctly.

        I really don't understand why it was decided that most photo viewing applications would honor EXIF rotation, but web browsers would not.

        • leni536 136 days ago

          Use jpegtran, it can rotate jpg images losslessly.


          • mceachen 136 days ago

            Make sure you rtfm carefully.

            Many years ago I accidentally deleted all the metadata from many images that needed rotating.

            • acqq 136 days ago

              > -copy comments

              > Copy only comment markers. This setting copies comments from the source file but discards any other data that is inessential for image display.

              > The default behavior is -copy comments.

              Argh! Thanks! So also my "lossless" transformations .. weren't.
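
              A way to avoid that footgun is to always pass -copy all explicitly. A sketch (the -rotate, -copy, -perfect, and -outfile flags are from jpegtran's man page; the helper itself is hypothetical):

```python
# Hypothetical helper that builds a jpegtran invocation which keeps
# EXIF/metadata intact.  The default is "-copy comments", which silently
# drops everything else; "-perfect" makes jpegtran fail outright instead
# of trimming edge blocks when the image dimensions aren't a multiple of
# the MCU size (i.e. when the transform couldn't be fully lossless).
def jpegtran_cmd(src, dst, degrees):
    if degrees not in (90, 180, 270):
        raise ValueError("jpegtran only does 90-degree steps")
    return ["jpegtran", "-rotate", str(degrees),
            "-copy", "all", "-perfect", "-outfile", dst, src]
```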

        • goberoi 136 days ago

          For those interested in how EXIF is treated in other apps, a dated but informative article from 2012: https://www.daveperrett.com/articles/2012/07/28/exif-orienta...

          • jcims 136 days ago

            They should just put a gravity vector in the image.

            • chmullig 136 days ago

              I’m pretty sure iPhones do put it in. You should exif dump an iPhone XS image. INSANE what’s in there.

            • gregmac 136 days ago

              This would (maybe) help with a common thing my wife does when taking videos: Start recording in portrait mode, then realize you did that, and rotate the phone 90 degrees to get widescreen video (but without restarting the recording).

              When you play it back on a phone (with auto-orientation mode on), starting from holding the phone in portrait mode (as you normally do):

              * it starts playing back as portrait, which looks fine

              * the video rotates (because the camera was physically rotated), so now you're watching a widescreen video that's 90 degrees off

              * Your natural reaction is to flip the phone 90 degrees to make down "down" again, but this changes the phone into widescreen mode, and because it thinks it's playing a portrait-style video, it changes to portrait-in-widescreen mode, and now the video is again tilted 90 degrees but 1/3 the size with huge black bars on either side

              If you play it back on a computer/TV, you get the same end result: a widescreen video that's rotated 90 degrees, and 1/3 the size with huge black bars on either side.

              • taneq 136 days ago

                Can't help you with the video-taking technique but when playing back the videos, if you hold the phone so that the video matches the screen, then rotate it so the screen is upwards, you can then spin it so it looks the right way up as long as you keep the screen vertical enough.

                • PinguTS 136 days ago

                  In these cases I just enable rotation lock.

                  • ralfd 136 days ago

                    Haha! I guess one has to cut the video and rotate manually one part?

                  • microcolonel 136 days ago

                    Might as well put in acceleration and rotation while we're at it.

                    • edoceo 136 days ago

                      Pitch and yaw

                      • mceachen 136 days ago

                        You're joking, but many smartphones encode gyroscope readings in the metadata.

                        And uptime seconds (!?)

                        And estimated distance to subject, AGPS information, GPS acquisition time, depth field metadata, current battery level, operating system version, ...

                        • ronsor 136 days ago

                          We can use both sets of names, better yet, mix them!

                          • fooker 136 days ago

                            Ah, the famous ropitchaw vector!

                      • dotancohen 136 days ago

                        Won't help when filming in the ISS. Microgravity or not, I would like to view the result in the same orientation as the crewmember who filmed it.

                        • wccrawford 136 days ago

                          And how would the camera know without gravity?

                      • GWSchulz 136 days ago

                        “You can’t trust that bag of bits.” That made me giggle.

                        • olah_1 135 days ago

                          Have you submitted a post just about PhotoStructure yet?

                          I've been looking for a replacement for Google Photos. It's the only thing keeping me on Google products at this point.

                          • savolai 135 days ago

                            Shameless plug: JPEG Autorotate 3 lets you both view the orientation of the raw JPEG data and preview how it looks with the orientation data applied. :)


                            • martin-adams 136 days ago

                              Wow, your description of the problem of photo organisation is exactly right. I've signed up for the beta!

                            • 6gvONxR4sf7o 136 days ago

                              One of my pet peeves in ML/stats/data science is people who hardly look at their data. Unless there are privacy reasons not to, you really need to look at some of it. You'll learn more from looking at a few hundred samples than you will from any number of metrics. You'll get a feel for how complex the problem is, or whether something simple will do. Check your assumptions. You might even realize that your images are sideways.

                              • pinouchon 136 days ago

                                As Karpathy said:

                                1. Become one with the data.

                                The first step to training a neural net is to not touch any neural net code at all and instead begin by thoroughly inspecting your data. This step is critical. I like to spend copious amount of time (measured in units of hours) scanning through thousands of examples, understanding their distribution and looking for patterns. Luckily, your brain is pretty good at this. One time I discovered that the data contained duplicate examples. Another time I found corrupted images / labels. I look for data imbalances and biases. I will typically also pay attention to my own process for classifying the data, which hints at the kinds of architectures we’ll eventually explore. As an example - are very local features enough or do we need global context? How much variation is there and what form does it take? What variation is spurious and could be preprocessed out? Does spatial position matter or do we want to average pool it out? How much does detail matter and how far could we afford to downsample the images? How noisy are the labels?

                                source: http://karpathy.github.io/2019/04/25/recipe/

                                • Izkata 136 days ago

                                  It's not directly related to this topic, but my favorite example of this sentiment is Anscombe's Quartet [0], four sets of datapoints that have (almost) the same statistical values, but a very obviously different layout when simply viewed together on a graph.

                                  Plus an animated version that includes a T-Rex [1].

                                  [0] https://en.m.wikipedia.org/wiki/Anscombe%27s_quartet

                                  [1] https://www.autodeskresearch.com/publications/samestats
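
                                  You can reproduce the punchline with nothing but the stdlib — sets I and IV below use the values from the Wikipedia article:

```python
from statistics import mean, variance

# Sets I and IV of Anscombe's quartet (values from the Wikipedia page).
x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

# Identical summary statistics (to 2 decimal places)...
assert mean(x1) == mean(x4) == 9
assert variance(x1) == variance(x4) == 11
assert round(mean(y1), 2) == round(mean(y4), 2) == 7.5
# ...yet a scatter plot shows completely different shapes: a loose
# cloud (set I) vs. a vertical line plus one outlier (set IV).
```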

                                  • JorgeGT 135 days ago

                                    Off topic, but each time I accidentally open the Wikipedia mobile layout on the desktop I'm amazed at how much better it is. Is there a way to always redirect to the mobile version?

                                • avip 136 days ago

                                  That’s what my first numeric mentor taught me. You have to look at raw data. The first q he’d ask any programmer was “did you look at the data, the actual data?”. He was a PhD in Physics and his approach really sticked with me.

                                  But it’s not always straightforward to “look at data”

                                  • bonoboTP 136 days ago

                                    And also at the results and the intermediate steps. Visualize everything. Don't just look at the final evaluation metric but dive in and see where things go wrong.

                                    Do just a few bad predictions skew your score? What does the best prediction look like? What does the worst look like?

                                    Are all your results just shifted by 2 pixels to the left due to some bug? Are there mislabeled examples in the test set? Etc. etc.

                                    • barry-cotter 136 days ago

                                      Stuck. Everything else is perfect idiomatic English.

                                  • mlthoughts2018 136 days ago

                                    Just looking at data and descriptive statistics is one of the first things a person is taught in machine learning, data science and statistics coursework. It’s a major skill in the field that is emphasized all the time.

                                    Practitioners frequently do cursory data analysis and data exploration to gain insight into the data, corner cases and which modeling approaches are plausible.

                                    Just to give some examples, Bayesian Data Analysis (Gelman et al), Data Analysis Using Regression and Multilevel/Hierarchical Models (Gelman and Hill), Doing Bayesian Data Analysis (Kruschke), Deep Learning (Goodfellow, Bengio, Courville), Pattern Recognition and Machine Learning (Bishop) and the excellent practical blog post [0] by Karpathy all list graphical data checking, graphical goodness of fit investigation, descriptive statistics and basic data exploration as critical parts of the model building workflow.

                                    If you are seeing people produce models without this, it’s likely because companies try to have engineers do this work, or hire from bootcamps or other sources that don’t produce professional statisticians with grounding in proper professional approach to these problems.

                                    When people mistakenly think models are commodities you can copy paste from some tutorials, and don’t require real professional specialization, then yes, you get this kind of over-engineered outcome with tons of “modeling” that’s disconnected from the actual data or stakeholder problem at hand.

                                    [0]: https://karpathy.github.io/2019/04/25/recipe/

                                    • anoncareer0212 136 days ago

                                      The article explains why it's practically impossible to see that (check your assumptions! :P )

                                      • ken 136 days ago

                                        It’s easy to print out the dimensions of the image when it’s loaded. I always log the size (pixels, rows, etc) of any data I load.
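
                                          For images specifically, you don't even need an imaging library to log dimensions — e.g. a PNG's width and height sit at fixed offsets in the header. A pure-stdlib sketch (the function name is made up):

```python
import struct

# Sketch: read a PNG's pixel dimensions straight from its header, so
# you can log them the moment the file is loaded.  Width and height
# are big-endian uint32s at bytes 16..24 (inside the IHDR chunk).
def png_dimensions(data):
    if data[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG")
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```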

                                        • m12k 136 days ago

                                          Well, after you've read in your data, read them back out and display them on the screen.

                                          • toxik 136 days ago

                                              This is actually generally true for debugging any computer program. You need to look at the data as it appears to your program to catch bugs in your preprocessing pipelines.

                                      • dimatura 136 days ago

                                        A critical piece of wisdom I find myself repeating to younger students, handed down from an advisor: "always look at the pixels". (Sometimes replacing pixels with voxels/lidar points/meshes, etc.)

                                        • gbrown 136 days ago

                                          Parallel issue: people who fit complex models when simple descriptive statistics would be more useful/a better fit.

                                          • mlthoughts2018 136 days ago

                                            This is a byproduct of hiring bootcamp grads or tasking a modeling project to engineers who read some tutorials. People think they can scan a few Jupyter notebooks and then professionally solve statistics problems.

                                            People wonder why it’s hard and expensive to hire ML engineers... because they actually solve these problems with craft. Meaning, they systematically grow understanding of the data, start with simple models, and have well articulated reasons explaining cases when complexity is justified.

                                            • ma2rten 136 days ago

                                                It's the machine learning version of people who use Hadoop when they could have used command-line utilities.

                                            • heyyyouu 136 days ago

                                              Yes, thank you. This can't be stated enough or loudly enough! You absolutely have to manually check, then check it again -- even with privacy issues, find a way to be non-identifiable and check it. Data is such a huge factor in whether your project will fail or not, yet too many people don't give it the respect (and dare I say love) it deserves.

                                              • faceshapeapp 136 days ago

                                                 For one of my projects, I looked at and manually labeled thousands of pictures, yet this issue never came up until people actually started using pictures from their smartphones.

                                                • AceJohnny2 136 days ago

                                                  somehow, the Allegory of the Cave comes to mind... ;)

                                                  • tomaskafka 135 days ago

                                                    Yes. It's a method I use:

                                                    "Did you look at the actual data?"

                                                    "No."

                                                    "Then they are wrong. Find a way to cheaply visualize data/state and call me back when you're done."

                                                    • rajacombinator 136 days ago

                                                      100%. One of the most common mistakes too.

                                                    • enlyth 136 days ago

                                                      I ran into this headache while writing a web app.

                                                      I made an image upload widget that provided a preview, and when users selected the "take a picture" option on their phones, I showed the preview with a blob link and CSS background-image property. The images were showing sideways on some phones.

                                                      I looked at the EXIF data of those photos and of course Orientation: 90 showed up.

                                                      It was easy to fix on the backend when processing the images but I struggled to do it in a performant way on the front-end. One solution involved reading the EXIF data with JS and rotating with canvas and .toBlob(), but proved too slow for large 10MB photos as it blocks the main UI thread.

                                                      One thing I thought of is just reading the orientation and the using CSS transforms to rotate the preview, but I never got around to trying it.
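
                                                      For what it's worth, the orientation-to-transform lookup is small. A sketch of the mapping (expressed in Python for brevity; the transform strings are plain CSS, but note that the 90°/270° cases still leave the element's layout box unswapped, and the mirrored compositions are worth double-checking against a known test image):

```python
# Sketch: EXIF Orientation value -> CSS transform that displays the
# raw (untouched) pixels upright.  Mirrored cases combine scaleX(-1)
# with a rotation; verify the composition order with a test image.
CSS_FIX = {
    1: "none",
    2: "scaleX(-1)",
    3: "rotate(180deg)",
    4: "scaleX(-1) rotate(180deg)",
    5: "scaleX(-1) rotate(270deg)",
    6: "rotate(90deg)",
    7: "scaleX(-1) rotate(90deg)",
    8: "rotate(270deg)",
}

def css_for_orientation(value):
    # Unknown or absent values are treated as "already upright".
    return CSS_FIX.get(value, "none")
```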

                                                      • paultopia 136 days ago

                                                        This shows up even in basic websites! My partner, who is an artist, ended up with some of her portfolio broken seemingly at random in different browsers. For some of it, "up" wasn't visually obvious, and she'd also rotated some of the images in various apps (Preview, Adobe things, etc.), so there wasn't a good way to just change everything. She ended up having to do a ton of work to strip EXIF data on an image-by-image basis.

                                                        Basically entirely because of this ridiculous issue where chrome refuses to respect exif tags.[1]

                                                        [1] https://stackoverflow.com/questions/42401203/chrome-image-ex...

                                                        • kyle-rb 136 days ago

                                                          Squoosh.app, a webapp from Google for compressing and resizing images, has an article about using WebAssembly for exactly this purpose: https://developers.google.com/web/updates/2019/02/hotpath-wi...

                                                          • acomjean 136 days ago

                                                            "In squoosh we wrote a JavaScript function that rotates an image buffer by multiples of 90 degrees. While OffscreenCanvas would be ideal for this, it isn't supported across the browsers we were targeting, and a little buggy in Chrome."

                                                            Wait, I know it's a big company, but you're at the same company...

                                                          • RandallBrown 136 days ago

                                                            Your second approach is almost exactly what I've done in an iPhone video editing app I've been working on to deal with video previews.

                                                            Rather than reencode the video just to preview it, I can just apply the same transforms (scale, rotation, translation) to the view that's displaying the preview. I then mask that view with another view of the same size so it doesn't go outside the edges.

                                                            Of course I still need to encode the video with those transforms if I want it to show up in their camera roll later.

                                                            • Waterluvian 136 days ago

                                                              Try web workers! I've been doing all kinds of crazy expensive spatial data validation in many parallel threads. It's amazing. I think off screen canvas in workers is gaining browser support recently.

                                                              • bschwindHN 136 days ago

                                                                If you don't need to actually modify the image data you shouldn't be doing it at all. The correct solution is to specify a rotational transform so the hardware will do the heavy lifting on the GPU where this kind of computation belongs.

                                                                • inetknght 136 days ago

                                                                  > Try web workers!

                                                                  NO, please do not try web workers. I don't want any more code running on my device than absolutely necessary.

                                                                  • Franciscouzo 136 days ago

                                                                    That ship has sailed. If you don't want people running arbitrary code on your device, you should disable JavaScript.

                                                                  • BubRoss 136 days ago

                                                                    You don't want to use all your cores to run things faster?

                                                                    • dralley 136 days ago

                                                                      It would be nice if they only made things faster. In practice it seems like they use all my cores to do more things, things that I don't want or care about them doing.

                                                                      • inetknght 136 days ago

                                                                        Nope. I want things to look plain and simple. I want to use fewer cores for longer battery life.

                                                                        • BubRoss 136 days ago

                                                                          The faster a CPU finishes, the faster the CPU can sleep, which is how you save battery life. The more cores you use, the lower the CPU frequency can be, which saves power, since power draw does not scale linearly with frequency.

                                                                          I think it goes without saying that javascript workers have nothing to do with the design and layout of a webpage.

                                                                          • inetknght 135 days ago

                                                                            > The faster a CPU finishes, the faster the CPU can sleep, which is how you save battery life.


                                                                            > The more cores you use the lower the CPU frequency can be, which saves power, since frequency increases do not use power linearly.

                                                                            More cores in use has little to do with frequency and more to do with heat. More heat means more thermal throttling which lowers frequency. Lower frequency means that the CPU doesn't sleep sooner.

                                                                            > I think it goes without saying that javascript workers have nothing to do with the design and layout of a webpage.

                                                                            Yup. That's exactly why I don't want them. Why should I execute something which doesn't, and shouldn't, have anything to do with rendering page content?

                                                                            Don't get me wrong, I'm fine with using more cores if it's actually beneficial. But every use I've ever seen for a web/service/javascript worker has always been user hostile by taking what should be done on a server and offloading it onto the user's device instead.

                                                                            • BubRoss 135 days ago

                                                                              Using all your cores for the same workload would mean it finishes faster or finishes in the same time with significantly lower frequency. It saves power and heat. Your example would mean using more cores for the same amount of time, which makes no sense in this comparison.

                                                                              Do you also buy single core computers to save power?

                                                                              It is clear that what you are saying has nothing to do with javascript features at all and just boils down to not liking bloated web pages.

                                                                  • missblit 136 days ago

                                                                    Was rotating with CSS transforms an option?

                                                                    That way you'd only need to read the EXIF data, but wouldn't need to go all the way to rotating the pixels yourself.

                                                                    • dgritsko 136 days ago

                                                                        Have run into this myself when building the same "preview upload" feature. It's annoying that it's not just handled by the browser; it feels like it should be a supported feature of the "img" tag or something.

                                                                      • robocat 136 days ago

                                                                        > One solution involved reading the EXIF data with JS and rotating with canvas and .toBlob()

                                                                        At my previous job I did the same thing, although I never noticed a significant slowdown. I also made the file size smaller since we wanted to have predictable upload times and mitigate excessive usage of storage space.

                                                                        The other reason was that the EXIF data was weird on some devices and the back-end library didn't rotate them correctly.

                                                                        • acomjean 136 days ago

                                                                          I had this problem with an artist open-studio website.

                                                                          Oddly, it didn't seem like as big a problem until cell phones. (I honestly couldn't tell which way was up for some images, being abstract.)

                                                                          We ended up using a JS library called "croppie", but I'm not sure it helps with large images.

                                                                          • avip 136 days ago

                                                                            As we speak I need a file upload widget that will reject some images based on exif. What’s your recommendation for client side js exif extraction?

                                                                          • LeoPanthera 136 days ago

                                                                            ImageMagick will "apply" the exif rotation to the image data for you.

                                                                                convert input -auto-orient output

                                                                            Or in place:

                                                                                mogrify -auto-orient *

                                                                            • rkagerer 136 days ago

                                                                              Even outside of machine vision this issue has caused me all sorts of headaches. Especially when switching between tools that honor or ignore the tag.

                                                                              "But the tricky part is that your camera doesn’t actually rotate the image data inside the file that it saves to disk."

                                                                              Cameraphones are so powerful these days. I don't understand why the app can't simply include a behavior setting to always flip the original pixels.

                                                                              • joosters 136 days ago

                                                                                It might not be up to the app? If the phone has hardware-accelerated JPEG compression, then potentially the image will already be compressed in the 'wrong' orientation before the app gets its hands on it. So rotating the image data could involve re-compressing the image again, leading to quality loss. Or if you choose to get the raw sensor data instead, and rotate the data before doing the compression yourself, you might lose out on the hw acceleration entirely.

                                                                                (I've not developed any camera apps, so this is just a guess!)

                                                                                • pwg 136 days ago

                                                                                  > So rotating the image data could involve re-compressing the image again, leading to quality loss.

                                                                                  With JPEG, no, 90 degree rotations can be accomplished in a lossless manner.

                                                                                  See jpegtran from the libjpeg library. A version of its manpage is here:



                                                                                  "... It can also perform some rearrangements of the image data, for example turning an image from landscape to portrait format by rotation.

                                                                                  jpegtran works by rearranging the compressed data (DCT coefficients), without ever fully decoding the image. Therefore, its transformations are lossless: there is no image degradation at all, which would not be true if you used djpeg followed by cjpeg to accomplish the same conversion. ..."

                                                                                • kllrnohj 136 days ago

                                                                                  Because the data sources are unrelated. The camera sensor is hooked up to the ISP which is hooked up to a hardware JPEG encoder. This is necessary in order to get those hyper fast shots off.

                                                                                  You'll notice that an orientation sensor is nowhere in that list. So what happens is the camera hardware spits out a JPEG. The app then combines it with the orientation sensor & produces the EXIF headers. It could choose to decode, rotate, re-encode, but that's slow (~100ms) and hurts shot-to-shot latency. And it loses quality. And, hey, since everything supports EXIF orientation anyway, why bother?
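                                                                                  That tagging step is indeed cheap. A rough sketch of it in Python with Pillow (a stand-in for what the camera firmware does natively; `save_with_orientation` is a hypothetical helper name, and 0x0112 is the standard EXIF tag ID for Orientation):

```python
from PIL import Image

ORIENTATION = 0x0112  # standard EXIF tag ID for Orientation

def save_with_orientation(img, path, orientation):
    """Save img as JPEG, tagging (not rotating) it with an EXIF orientation."""
    exif = img.getexif()
    exif[ORIENTATION] = orientation  # e.g. 6 = display rotated 90 degrees CW
    img.save(path, exif=exif)
```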

                                                                                  • kchamplewski 136 days ago

                                                                                    > It could choose to decode, rotate, re-encode

                                                                                    Or it could simply rotate without decoding or re-encoding, which has the added advantage of being lossless.

                                                                                    Obviously it's still added processing time and (probably more importantly) development time, so it's generally not worth bothering; however, it's important to point out that JPEG rotation can (in the case of 90 degree increments) be done losslessly.

                                                                                  • not2b 136 days ago

                                                                                    The phone is accurately recording the image, as well as the orientation. The bug happens when the EXIF information is stripped. Sure, phone apps could add an option to physically rotate the image, as a bug workaround, but it's not surprising that they don't do this.

                                                                                    • uoaei 136 days ago

                                                                                      It is surprising, because it seems like such an obvious fix to the problem of stripped EXIF data. Couldn't be that hard to implement a user setting which tells the app to rotate the file itself before saving.

                                                                                      This is a major oversight for the companies who develop camera apps. The major ones even have whole teams dedicated to that single app.

                                                                                      • cookingrobot 136 days ago

                                                                                        It would be trivial to rotate a bitmap with no loss of quality, but you can’t rotate an already compressed jpg without changing the image itself, which reduces quality.

                                                                                        • mrob 136 days ago

                                                                                          Assuming the image size is a multiple of the block size (8x8, 16x8, or 16x16 depending on the chroma subsampling), lossless rotation of JPEGs is possible, e.g. with jpegtran: `jpegtran -rot 90 in.jpg > out.jpg`

                                                                                  • on_and_off 136 days ago

                                                                                    Funny, it looks like a common headache.

                                                                                    Lots of other complaints in this thread and also something I have encountered on Android with a small side feature allowing users to upload some pictures.

                                                                                  • jeroenhd 136 days ago

                                                                                    I'm not sure how good your algorithm is if it only works in a specific orientation. A slightly tilted image can already cause problems for such an algorithm, and many people struggle to take a picture that isn't tilted by at least a few degrees.

                                                                                    It would probably be better to deal with this in an elegant way, e.g. set up the algorithm to work regardless of orientation. This seems like a (mostly) solved problem: https://d4nst.github.io/2017/01/12/image-orientation/

                                                                                    • gugagore 136 days ago

                                                                                      You might not want rotation invariance in your algorithm. Our own perceptual system is not rotation-invariant, most obviously for faces in this illusion: https://en.wikipedia.org/wiki/Thatcher_effect

                                                                                      • H8crilA 136 days ago

                                                                                        Forcing rotation in the training phase would be a good way to regularize the network. But of course that will lower the accuracy.

                                                                                        • cookingrobot 136 days ago

                                                                                          That’s not an option if you’re trying to detect left-arrows vs right-arrows, on street signs for ex.

                                                                                          • jeroenhd 136 days ago

                                                                                            Depends. Based on the linked article, the picture can be completely upside down if it was taken in landscape mode. The article I linked is capable of rotating it upright regardless. After rotating to the right orientation, left and right are suddenly perfectly workable.

                                                                                            Of course using exif data for such rotations is easier, but a tilted picture of a tilted sign can create a lot of tilt that human vision copes with fine but an orientation dependent network cannot.

                                                                                        • sandos 136 days ago

                                                                                          If the algorithm fails with a rotated image, then I would claim it's a bad algorithm that is over-fit and not at all generalising what it has learnt. What about slight rotation? Where does it start failing, and would a human?

                                                                                          Also, making an algorithm for detecting rotated images should be easy if rotation affects the results so much.

                                                                                        • planis 136 days ago

                                                                                          Topic of image orientation always reminds me about this article: https://www.daveperrett.com/articles/2012/07/28/exif-orienta...

                                                                                          • quietbritishjim 136 days ago

                                                                                            One of the common image processing libraries does take into account EXIF rotation: OpenCV [1] I would tend to use that over manually rotating using the code from the article. Although beware that OpenCV cannot open quite as many formats as Pillow, most notably it cannot open GIFs due to patent issues. You can get a prebuilt wheel of OpenCV by pip installing the package opencv-python [2]

                                                                                            [1] https://docs.opencv.org/trunk/d4/da8/group__imgcodecs.html#g...

                                                                                            [2] https://pypi.org/project/opencv-python/
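                                                                                            If you're staying with Pillow instead, there is a one-call fix (Pillow ≥ 6 ships `ImageOps.exif_transpose`; `load_upright` is just a hypothetical wrapper name):

```python
from PIL import Image, ImageOps

def load_upright(path):
    """Open an image and bake any EXIF Orientation tag into the pixels."""
    img = Image.open(path)
    # Returns a transposed copy (with the Orientation tag removed) when
    # Orientation != 1, otherwise an unchanged copy.
    return ImageOps.exif_transpose(img)
```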

                                                                                            • mark-r 136 days ago

                                                                                              There's a big photography site I use that treats the EXIF inconsistently. I rotate the image in my editor to work on it, then save and upload. In some contexts it looks OK, but in other contexts the site rotates the image again and it's wrong. I don't want to strip the EXIF because it has interesting information such as the camera model and lens, and the exposure settings. My editor doesn't correct the EXIF rotation setting, so I have no choice but to use a utility to strip that single value from the file before I upload it.

                                                                                              • eggie5 136 days ago

                                                                                                CNNs are translation and scale invariant thanks mostly to the pooling operation. Good data augmentation (rotating images, for example) would have built a model more robust to this effect.

                                                                                                • Fission 136 days ago

                                                                                                  This is functionally a coordinate system problem. Thankfully, it's pretty easy here. Just wait until we start getting more models for 3D data. I worked with 3D data for the longest time, and it was incredibly painful — different libraries can use wildly different coordinate systems (e.g. I've seen the up-direction be +z, -z, +y, and -y). At that point, it's nontrivially difficult to even figure out what the right way to convert between coordinate systems is.

                                                                                                  • m3kw9 136 days ago

                                                                                                    Train a model that recognizes L,R, or U side way images to reorient them all. Run it once.

                                                                                                    • anilakar 136 days ago

                                                                                                      Some websites also strip EXIF data from user submitted content but do not rotate the image to match the now-missing information about which side is up.

                                                                                                      • layoutIfNeeded 136 days ago

                                                                                                        Why don’t you train your classifiers to be orientation-agnostic? My brain certainly had no problem recognizing the geese on the sideways image...

                                                                                                        • gwern 136 days ago

                                                                                                          Data augmentation can have unwanted consequences. For example, horizontal or vertical flipping, what could be more harmless? You can still recognize stuff when it's upside-down, can't you? It's a great data augmentation... Unless, of course, your dataset happens to involve text or symbols in any way, such as the numbers '6' and '9', or left/right arrows in traffic signs, or such obscure letters as 'd' and 'b'.

                                                                                                          • thaumasiotes 136 days ago

                                                                                                            > Unless, of course, your dataset happens to involve text or symbols in any way, such as the numbers '6' and '9', or left/right arrows in traffic signs, or such obscure letters as 'd' and 'b'.

                                                                                                            If your dataset consists of nothing but isolated 'd's and 'p's in unknown orientation, you won't be able to classify them correctly because that is an impossible task. But it would be more common for your dataset to consist of text rather than isolated letters, and in the context of surrounding text it's easy to determine the correct orientation, and therefore to discriminate 'd' from 'p'.

                                                                                                            • gwern 132 days ago

                                                                                                              So it's not a problem, except when it is. Good to know.

                                                                                                              Incidentally, how does that work for mirroring, when all that surrounding text gets mirrored too? (Consider the real example of the lolcats generated by Nvidia's StyleGAN, where the text captions are completely wrong, and will always be wrong, because it looks like Cyrillic - due to the horizontal dataflipping StyleGAN has enabled by default...)

                                                                                                          • pgeorgi 136 days ago

                                                                                                            • bacon_waffle 136 days ago

                                                                                                              Interestingly, your brain actually isn't orientation agnostic: http://nancysbraintalks.mit.edu/video/what-you-can-learn-stu...

                                                                                                              • 6gvONxR4sf7o 136 days ago

                                                                                                                Why make it harder than it has to be?

                                                                                                                • variaga 136 days ago

                                                                                                                  I'd say you're not making the problem "harder" - rather, requiring that object detection be orientation-agnostic makes the problem exactly as hard as the problem actually is. Allowing the network to train only on images with a known, fixed, (correct) orientation makes the object detection too _easy_, so the network results will likely fail if you feed it any real-world data.

                                                                                                                  i.e. you should be training your image application with skew/stretch/shrink/rotation/color-palette-shift/contrast-adjust/noise-addition/etc. applied to all training images if you want it to be useful for anything other than getting a high top-N score on the validation set.
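                                                                                                                  A toy version of one draw from such a pipeline (hypothetical `augment` helper using Pillow; real pipelines would reach for torchvision- or albumentations-style transforms):

```python
import random
from PIL import Image, ImageEnhance

def augment(img, rng=random):
    """One random augmentation draw: a 90-degree rotation plus brightness jitter."""
    img = img.rotate(rng.choice([0, 90, 180, 270]), expand=True)
    return ImageEnhance.Brightness(img).enhance(1.0 + rng.uniform(-0.2, 0.2))
```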

                                                                                                                  • nightfly 136 days ago

                                                                                                                    A goose is still a goose, even if not viewed from the "normal" orientation.

                                                                                                                  • klyrs 136 days ago

                                                                                                                    Yeah... this article really looks like blame-shifting to me. I'm imagining a future wherein we have bipedal murderbots, but we're still training AI with "properly oriented" images... a bot trips over a rock, and starts blasting away at misidentified objects.

                                                                                                                    • TeMPOraL 136 days ago

                                                                                                                      Well, even humans find it more difficult to process upside-down faces or objects. There's power in assumptions that are correct 99% of the time.

                                                                                                                      Regardless, the article isn't really shifting blame, in so much as explaining what's happening in the real world, with the real tools. The tools don't care about EXIF. Consumer software uses EXIF to abstract over reality. A lot of people playing with ML don't know about either.

                                                                                                                      • princeb 136 days ago

                                                                                                                        You may not know whose face it is, but you do know that it is a face, right? It's like saying you'd know someone is in front of you while he's standing up in your clear view, but the moment he's lying on a couch in the centre of your vision you can't tell with certainty whether the thing on the couch is a human being.

                                                                                                                        • TeMPOraL 136 days ago

                                                                                                                          I could tell, but I'm under the impression that my 4-month-old kid still has problems with that, or at least had when she was 2 months old.

                                                                                                                          I think this is closer to the performance we should expect of current neural models - a few months old child, not an adult. NNs may be good at doing stuff similar to what some of our sight apparatus does, but they're missing additional layers of "higher-level" processing humans employ. That, and a whole lot of training data.

                                                                                                                  • peterhj 136 days ago

                                                                                                                    Bonus round:

                                                                                                                    (1) Find the ImageNet (ILSVRC2012) images possessing EXIF orientation metadata.

                                                                                                                    (2) Of those images, find which ones have the "correct" EXIF orientation.

                                                                                                                    • yeldarb 135 days ago

                                                                                                                      Has anyone done this yet? I'd be curious to know if their dataset is "cleaned" or not.

                                                                                                                      • peterhj 135 days ago

                                                                                                                        Spoiler, ish:

                                                                                                                        The last time I measured ImageNet JPEGs with EXIF orientation metadata, the number of affected images was actually quite small (< 100, out of a dataset of 1.28M). There are also some duplicates, but altogether it seems fairly "clean."

                                                                                                                    • spullara 136 days ago

                                                                                                                      When I built a consumer site that processed tons of photos I ran into this all the time. Ended up doing it all myself by parsing the exif data and doing the rotation. Also ended up writing some pretty extensive resize code that worked much better than what was built-in and more like the great scaling effects you see in Preview.app.

                                                                                                                      • kccqzy 136 days ago

                                                                                                        It always bothers me that many libraries still use nearest-neighbor to resize images instead of some kind of Lanczos filtering.

                                                                                                                      • unnouinceput 136 days ago

                                                                                                                        Hi Adam, if you read this, you have a small mistake.

                                                                                                                        Your quote: "This tells the image viewer program that the image needs to be rotated 90 degrees counter-clockwise before being displayed on screen"

                                                                                                                        should be:

                                                                                                                        "This tells the image viewer program that the image needs to be rotated 90 degrees clockwise before being displayed on screen". Cheers.

                                                                                                                      • theon144 135 days ago

                                                                                                                        >Most Python libraries for working with image data like numpy, scipy, TensorFlow, Keras, etc, think of themselves as scientific tools for serious people who work with generic arrays of data. They don’t concern themselves with consumer-level problems like automatic image rotation — even though basically every image in the world captured with a modern camera needs it.

                                                                                                        What a snotty attitude. The tools are already complex enough without taking on the responsibility of parsing the plethora of ways a JPEG can be "rotated". This thread is a testament to the non-triviality of the issue and I certainly don't want a matrix manipulation or machine learning library to bloat up and have opinions on how to load JPEGs just so someone careless out there can save a couple lines.

                                                                                                                        • kwhitefoot 135 days ago

                                                                                                          If your computer vision application can't even recognize which way up a picture like the ones in the article is, then surely you have bigger problems than EXIF orientation.

                                                                                                                          As others have said: what about pictures that are simply not aligned with the horizon?

                                                                                                                          • sicariusnoctis 135 days ago

                                                                                                                            All the top architectures are not rotation-invariant. I think some of the blame lies in how successful CNNs are (as Hinton claims).

                                                                                                                          • faceshapeapp 136 days ago

                                                                                                            I ran into this issue while developing https://www.faceshapeapp.com, where users upload their photo and a face detector is run on top of the picture. Shortly after launching, about 10% of users complained about rotated images; after some debugging, I discovered that it was due to exif rotation.

                                                                                                                            You could parse the exif and rotate the image using canvas, but thankfully, there's already a JS library which does it for you: https://github.com/blueimp/JavaScript-Load-Image.

                                                                                                                            • NeoBasilisk 136 days ago

                                                                                                                              Would it really be too much of a burden for phones to just save photos at the correct orientation now? I understand the hardware limitations that were present in 2007, but surely these can't still be a factor?

                                                                                                                              • ryandrake 136 days ago

                                                                                                                                Is there any good reason to save a photo in an orientation that does not match the orientation that the device is being held in? Shouldn’t up be up? If that results in a NxM photo, save it NxM. If it results in a MxN photo save it that way!

                                                                                                                                The only edge case is a camera pointed straight up or straight down. Or a camera in space.

                                                                                                                                • TeMPOraL 136 days ago

                                                                                                                  Correct orientation is relative to the photographer, not to the ground. Most of the time the photographer is in the usual, vertical orientation, but sometimes you really want to take a shot at an angle, and you have to fight with your phone to do it correctly. I really dislike these kinds of "convenience" optimizations.

                                                                                                                                  • uoaei 136 days ago

                                                                                                                                    Coming back to ancient software design principles:

                                                                                                                                    > The user is always right

                                                                                                                                    Let them have a setting. It's really that easy.

                                                                                                                                  • anchpop 136 days ago

                                                                                                                                    > Or a camera in space.

                                                                                                                                    Or freefall

                                                                                                                                  • kllrnohj 136 days ago

                                                                                                                                    Would it really be too much of a burden for python image libraries to decode jpegs to the correct orientation? I understand lack of knowledge about exif in the 2000s, but surely these days that can't still be a problem?

                                                                                                                                  • jonshariat 136 days ago

                                                                                                                                    Isn't it better to train the model that way though? In fact it might be good to load every image in different orientations to add to the training data.

                                                                                                                                    • lolc 136 days ago

                                                                                                                                      Only goes to show how far we have to go still when computers don't even tilt their heads automatically when given rotated input.

                                                                                                                                      • mister_hn 136 days ago

                                                                                                                        Well, if your input image is a JPEG, the information is stored in a tag, and you can remove it with pretty much any language. There's also a C++ library for it: https://github.com/mayanklahiri/easyexif

                                                                                                                                        • octorian 136 days ago

                                                                                                                                          I should add that this very problem also exists with video. What makes it worse is that "smartphone" video apps started regularly using video orientation metadata years before desktop video apps even seemed to acknowledge that it was a thing.

                                                                                                                                          • BEEdwards 136 days ago

                                                                                                                                            It's the simplest way to do it and has so few downsides that it's not going to change until someone invents a googly eye camera that rotates with the phone.

                                                                                                                                          • brlewis 136 days ago

                                                                                                                                            As an alternative to the python code in the article, there's this command-line tool: https://linux.die.net/man/1/exiftran

                                                                                                                                            • superjan 135 days ago

                                                                                                                              I just could not stop laughing reading this. It does remind me that my professor once showed us a slide of the coastline of Africa, which none of us recognized until he rotated it to the correct orientation.

                                                                                                                                              • UglycupRawky 136 days ago

                                                                                                                                Who thought Exif Orientation was a good idea? It's a needless complication of the file format. It's not like iPhones (or any camera) lack the computing power to rotate the image before saving it.

                                                                                                                                                • bradfa 135 days ago

                                                                                                                                                  Wikipedia says EXIF was released in 1995 (https://en.wikipedia.org/wiki/Exif). If you were shooting with a DSLR, say 6 megapixels at 8 bits per color channel (3 bytes per pixel), the raw output would be 18MB in size (https://en.wikipedia.org/wiki/Kodak_DCS_400_series). In order to rotate this raw 6Mp image you would need 36MB of RAM (input and output buffers of the same size, non-overlapping). Then, after the rotation, you could perform JPEG compression, so that the rotation is lossless. Finally, you could store the JPEG image to disk.

                                                                                                                                                  36MB of RAM just for raw image buffers would have been quite expensive in 1995. Simply tagging some extra data onto the image to say which orientation it should be presented in takes almost no extra memory or processing within the camera, and some big desktop PC could easily rotate the image to present a "lossless" rotation after the fact (i.e., uncompress the JPEG in the wrong orientation, rotate, present to user).

                                                                                                                                                  • bradfa 135 days ago

                                                                                                                                                    Technically, you wouldn't need a full 18MB for the output buffer so long as you perform the JPEG encoding in-line with the rotation and are willing to deal with slicing the image into swaths. So in theory you could get away with something like a 1MB output buffer, but then your rotation time would depend on your JPEG timing, and you couldn't take another picture with the main raw buffer until rotation and JPEG encoding were both complete. It's a tradeoff, time versus memory.
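bradfa's buffer arithmetic above can be sketched quickly; the 3000x2000 dimensions are an assumption for a roughly 6-megapixel sensor, and the figures use decimal megabytes as in the comment:

```python
# Back-of-the-envelope memory math for rotating a raw ~6 MP image,
# assuming 3000x2000 pixels and 3 bytes per pixel (8 bits per channel).
width, height = 3000, 2000
bytes_per_pixel = 3

raw_buffer = width * height * bytes_per_pixel
both_buffers = 2 * raw_buffer  # non-overlapping input + output buffers

print(raw_buffer // 10**6)    # 18 (MB, decimal)
print(both_buffers // 10**6)  # 36 (MB, decimal)
```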

                                                                                                                                                • pequalsnp 135 days ago
                                                                                                                                                  • ensiferum 136 days ago

                                                                                                                                                    Maybe the first step before doing image classification or object detection is to run an image rotation detector ;)

                                                                                                                                                    • rhizome 136 days ago

                                                                                                                                                      Sure, that would reduce the problem space by one, but that ain't the reason CV doesn't work.

                                                                                                                                                      • raveenb 136 days ago

                                                                                                                                                        Some of the object detection AI systems do actually rotate the images in 4 to 8 ways, like in fast.ai. When trained with such input data, the AI detects the object in almost all orientations.
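The 4-to-8-way augmentation raveenb describes can be sketched in plain Python on a list-of-rows "image"; the helper names here are my own, not fast.ai's API:

```python
def rot90_cw(rows):
    # Rotate a row-major grid 90 degrees clockwise.
    return [list(r) for r in zip(*rows[::-1])]

def mirror(rows):
    # Mirror left-right.
    return [list(reversed(r)) for r in rows]

def augment(rows, mirrored=True):
    # Emit the 4 rotations of a sample, plus their mirrors for 8 total.
    variants = []
    for _ in range(4):
        variants.append(rows)
        if mirrored:
            variants.append(mirror(rows))
        rows = rot90_cw(rows)
    return variants

sample = [[1, 2],
          [3, 4]]
print(len(augment(sample)))         # 8 variants per sample
print(len(augment(sample, False)))  # 4 rotations only
```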

                                                                                                                                                        • yellowapple 136 days ago

                                                                                                                                                          Sounds like the training sets should include more rotated samples.

                                                                                                                                                          • moonbug 136 days ago

                                                                                                                                                            that or train models to be rotationally invariant.

                                                                                                                                                            • secraetomani 136 days ago

                                                                                                                                                              Surely someone can create a neural network that reorients the image, even if the Exif orientation is wrong (JPEG lets you do this without re-encoding). Sounds like it should be a very simple problem for the vast majority of "regular" images.

                                                                                                                                                              • machinelearning 136 days ago

                                                                                                                                                                Or how about just using the damn EXIF information to orient it correctly, as the article outlines? The actual article is far less interesting than what the title seems to imply: while the title suggests a problem with computer vision, this is more of a programmer logic error.

                                                                                                                                                                • magicalhippo 136 days ago

                                                                                                                                                                  There are many times my phone detects I'm holding it horizontally while taking a picture, but I'm actually trying to take a vertical picture, and vice versa.

                                                                                                                                                                  For those images, the exif information is technically correct, but actually wrong.

                                                                                                                                                                  • SiVal 136 days ago

                                                                                                                                                                    EXIF info is often missing or wrong. Photos that have been passed around, going through various services, apps, scanners, screenshots, etc. along the way, frequently have their EXIF info stripped (e.g., for privacy when exported or uploaded), written incorrectly originally (e.g., you scan an old B&W print, put it in sideways to fit in the scanner, and the scanner invisibly includes EXIF data assuming it was correctly oriented), or rewritten incorrectly (e.g., stripped EXIF is replaced with default orientation).

                                                                                                                                                                    • machinelearning 136 days ago

                                                                                                                                                                      If exif information is incorrect, that's just programmer error at the firmware/library level.

                                                                                                                                                                      • ska 136 days ago

                                                                                                                                                                        EXIF is a bit of a swamp, and may well be missing entirely.

                                                                                                                                                                        • secraetomani 136 days ago

                                                                                                                                                                          I've encountered plenty of pictures with incorrect EXIF orientation, mostly caused by holding the phone at a weird angle.

                                                                                                                                                                          For example, if I hold the phone parallel with the ground (aimed at the floor), what is the correct EXIF orientation?

                                                                                                                                                                        • fireattack 136 days ago

                                                                                                                                                                          JPEG lossless rotation can only be done if the dimensions (height and width in px) of the image are multiples of the MCU size (typically 8x8 for 4:4:4 or 16x16 for 4:2:0).
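As a quick illustration of that constraint (the function and table names are my own; a real check would read the actual chroma subsampling from the JPEG headers):

```python
# Whether a JPEG can be rotated losslessly without trimmed edge blocks,
# assuming the MCU sizes quoted above for the two common
# chroma-subsampling modes.
MCU_SIZES = {"4:4:4": (8, 8), "4:2:0": (16, 16)}

def losslessly_rotatable(width, height, subsampling="4:2:0"):
    mcu_w, mcu_h = MCU_SIZES[subsampling]
    return width % mcu_w == 0 and height % mcu_h == 0

print(losslessly_rotatable(4032, 3024))  # True: both are multiples of 16
print(losslessly_rotatable(4030, 3024))  # False: 4030 is not
```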

                                                                                                                                                                          • zaat 136 days ago

                                                                                                                                                                            For the uninitiated, MCU is not Marvel Cinematic Universe as the search results might try to confuse you. It's Minimum Coded Unit, and a nice explanation is over here:


                                                                                                                                                                            Edit: the link also closed a forgotten open loop for me about the meaning of an annoying error message from the Windows XP era. Thanks for making me look it up!

                                                                                                                                                                          • klyrs 136 days ago

                                                                                                                                                                            Sure, that's a good enough excuse. I look forward to defeating government facial recognition algorithms by holding my head at an angle.

                                                                                                                                                                          • swiley 136 days ago

                                                                                                                                                                            Better (in this case) would be to use algorithms that take advantage of rotational symmetry. What do you do when the phone is at 45 degrees?

                                                                                                                                                                          • Kenji 136 days ago

                                                                                                                                                                            I hold the - apparently controversial - view that if your ML algorithm cannot detect a clean picture of a goose when you rotate it by 90°, your ML algorithm is garbage. How is this of any use when it's so easily fooled?

                                                                                                                                                                            • logicallee 136 days ago

                                                                                                                                                                              It's kind of funny, but makes sense. If you think about it, the gold standard for computer vision is humans, but people are pretty bad at reading upside-down text (if you're on mobile you can try it, but don't forget to lock orientation first!). The same is true of upside-down images, as the famous "Thatcher effect" illusion shows: http://thatchereffect.com/ demonstrates it pretty clearly.

                                                                                                                                                                              • irascible 136 days ago

                                                                                                                                                                                Why isn't a "detect and fix orientation" NN the first stage in the machine vision pipeline?

                                                                                                                                                                                • fake_satire 136 days ago

                                                                                                                                                                                  "Computer Science Expert".



                                                                                                                                                                                          • codesushi42 136 days ago

                                                                                                                                                                                            Nope. It is not working because of your own dumb decision not to augment the training data with different orientations.


                                                                                                                                                                                            • machinelearning 136 days ago

                                                                                                                                                                                              Also known as "programmer error".

                                                                                                                                                                                              • acdha 136 days ago

                                                                                                                                                                                                What value do you think this comment contributed? The entire article clearly describes it as a common mistake and is trying to raise awareness so people stop repeating it.

                                                                                                                                                                                                • machinelearning 135 days ago

                                                                                                                                                                                                  I think it contributed the warning that the title is clickbait. It's very clear from the discussion in the thread that people who've just read the title are discussing computer vision and machine learning strategies to solve this problem, when the article describes an almost elementary programming issue.

                                                                                                                                                                                                • lugg 136 days ago

                                                                                                                                                                                                  Aka PEBKAC

                                                                                                                                                                                                • egfx 136 days ago

                                                                                                                                                                                                  Nonsense. Image recognition engines are very capable of detection even at the most extreme angles. Sure, it won't rotate the image for you, but it will certainly tell you there is a goose in it. What tricks image recognition technology more than anything is lighting.

                                                                                                                                                                                                  • egfx 136 days ago

                                                                                                                                                                                                    Let me clarify and say Google's IR engine is capable. I actually ran the first set of tests developed around image recognition technology. I worked at Neven Vision, which got acquired by Google in 2006.

                                                                                                                                                                                                  • MrStonedOne 136 days ago

                                                                                                                                                                                                    This article is all wrong.

                                                                                                                                                                                                    At the core, there is something that is converting this JPEG or otherwise-encoded data into raw pixel data, and this process MUST account for the orientation.

                                                                                                                                                                                                    Either the app is reading the image and converting it before passing it to the CV/ML/AI library, and this conversion step needs to respect the tag and either transfer it or apply it to the transformed object; OR the CV/ML/AI library is getting the encoded image data, and it needs to check for this tag.

                                                                                                                                                                                                    Those are the two options: either the CV/ML/AI library sees the tag and should consider it, or it doesn't, and the library that is stripping it away shouldn't be doing that.
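The correction step described above can be sketched in pure Python on a row-major list of rows. The eight Orientation values and their fixes follow the well-known EXIF mapping (as applied by, e.g., Pillow's exif_transpose), but the helper names here are my own:

```python
def rot90_cw(rows):
    # Rotate a row-major grid 90 degrees clockwise.
    return [list(r) for r in zip(*rows[::-1])]

def flip_h(rows):
    # Mirror left-right.
    return [list(reversed(r)) for r in rows]

# EXIF Orientation value -> operations that make the stored pixels upright.
CORRECTIONS = {
    1: [],                              # already upright
    2: [flip_h],                        # mirrored
    3: [rot90_cw, rot90_cw],            # upside down
    4: [rot90_cw, rot90_cw, flip_h],    # upside down and mirrored
    5: [rot90_cw, flip_h],              # transposed
    6: [rot90_cw],                      # needs 90 degrees clockwise
    7: [flip_h, rot90_cw],              # transversed
    8: [rot90_cw, rot90_cw, rot90_cw],  # needs 90 degrees counterclockwise
}

def apply_exif_orientation(rows, orientation):
    for op in CORRECTIONS[orientation]:
        rows = op(rows)
    return rows

tile = [[1, 2, 3],
        [4, 5, 6]]
print(apply_exif_orientation(tile, 6))  # [[4, 1], [5, 2], [6, 3]]
```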

                                                                                                                                                                                                    • laughinghan 136 days ago

                                                                                                                                                                                                      This article helps people use the libraries they have, and it does so correctly. If you want to fix the libraries, submit a PR, don't complain about this article.