Mt. St. Helens abstract, winter. Super resolution of this Canon 5D Mark III image from 2013 yields an 88.5 MP file ripe for a super large canvas print. Photo: Rishi Sanyal
Back in March, Senior Principal Scientist Eric ‘I like to mess around with pixels’ Chan published a blog post on Adobe.com outlining the strides he and his team at Adobe had made in using machine learning to make your photos better. Along with collaborators Michaël Gharbi and Richard Zhang of Adobe Research, Eric and the team produced a tool that extracts more detail from your photos, without the typically concomitant noise penalty, by throwing machine learning at the problem.
Let’s try something: in the slider below, move the little circle with the arrows left and right. Does moving it to the right feel like an ‘Enhance!’ super-power? Does moving it to the left feel like smearing vaseline over your lens? If so, read on to find out how machine learning can perhaps make your images better – even images you’ve already shot.
The problem
Adobe’s goal was simple: to increase the amount of detail in your photos. No one ever complained about more detail; we photographers apply sharpening all the time, and sometimes even upsample our photos for larger prints or bigger, higher-resolution displays, to name a couple of use-cases. But simple sharpening algorithms are rather crude: the popular unsharp mask is really just a command that says ‘make brights brighter and make darks darker’, but only in high frequency portions of the image, leaving lower frequencies, like areas of smooth skies, alone.
Unsharp mask. In the ‘intensity profile’ at bottom right, note the dip in darks and the boost in brights at the edge boundary. This is how unsharp mask creates the impression of increased sharpness. Image credit: Wikipedia
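For those who like to see the recipe spelled out, here’s a minimal sketch of the classic unsharp mask in Python; it’s an illustration of the general technique on a made-up grayscale image with values in [0, 1], not the exact filter Photoshop ships:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(image, radius=2.0, amount=1.0):
    """Classic unsharp mask: subtract a blurred copy to isolate the high
    frequencies, then add them back scaled up. Edges get 'brights brighter,
    darks darker'; unfortunately, noise lives in the same frequencies."""
    image = image.astype(np.float64)
    blurred = gaussian_filter(image, sigma=radius)   # low-frequency version
    detail = image - blurred                         # high-frequency content
    return np.clip(image + amount * detail, 0.0, 1.0)

# Toy usage on a random grayscale 'image' with values in [0, 1]
img = np.random.rand(256, 256)
sharpened = unsharp_mask(img, radius=2.0, amount=0.8)
```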
There’s a limit to how discriminating such an approach can be: what is high enough frequency to be sharpened and what is low enough frequency to be left alone? Quite often, sharpening of detail leads to sharpening of noise. Furthermore, simple ‘brights brighter, darks darker’ contrast boosts around high frequency content can only net you so much ‘real’, new information or detail increase. What if, instead, we could sharpen and upsample based on context?
Machine learning
Here’s where machine learning (ML) comes in. At its simplest, ML is an attempt to train an algorithm to learn and do something it wasn’t previously capable of. Think of raising a child: with some structure and a whole lot of examples, you can train your child to do things he or she never imagined possible.
The idea is this: your child has some rudimentary understanding of, say, how to sing do-re-mi-fa-so. You, in your infinite auricular wisdom, sing do-re-mi-fa-so back, but in tune. This is the ‘ground truth’. You ask your child to try and match it. They try, they fail, and you keep singing your ‘ground truth’. You give some guidance along the way – you tell them when they’re wrong – but eventually they learn how to sing do-re-mi-fa-so.
Your child learns through this iterative process, as neural networks do. You take the input (the child’s rudimentary understanding of how to sing a scale), show the neural network (brain) the ground truth (an in-tune scale), and iterate until the neural net can take the input and produce the output.
Applying ML to upsampling
When it comes to super resolution, the idea is to take a whole lot of low-resolution and high-resolution image pairs, and train a neural network how to translate one to the other.
Input (left): undemosaiced Raw capture. Output (right): the demosaiced result; this is the ground truth. The job of the deep CNN is to learn how to take the input on the left and produce the output on the right. Image credit: Adobe
Adobe used a database of millions of such image pairs, with subjects ranging from natural details like trees and foliage to manmade patterns like fabrics and housing materials, to… who knows? The idea is that with enough different examples, the model will learn how to upscale detail in any type of image, be it a landscape, cityscape, portrait, or whatever.
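For the technically curious, here’s a heavily simplified sketch of what training on such image pairs might look like in PyTorch. The tiny model, the L1 loss and the random stand-in data are placeholder assumptions for illustration, not a description of Adobe’s actual network or pipeline:

```python
import torch
import torch.nn as nn

# Placeholder model: a tiny 2x upsampler. Adobe's real network is far deeper
# and works from undemosaiced Raw data; this only shows the shape of the loop.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3 * 4, kernel_size=3, padding=1),
    nn.PixelShuffle(2),  # rearranges channels into an image twice as large
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

# Stand-in data: in reality, millions of low-res / high-res image pairs.
low_res = torch.rand(8, 3, 64, 64)          # inputs
ground_truth = torch.rand(8, 3, 128, 128)   # matching 'ground truth'

for step in range(1000):
    prediction = model(low_res)
    loss = loss_fn(prediction, ground_truth)  # how wrong is the network?
    optimizer.zero_grad()
    loss.backward()                           # 'tell them when they're wrong'
    optimizer.step()                          # nudge the weights and repeat
```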
The particular type of neural network that Adobe trained was a ‘deep convolutional neural network’, which essentially attempts to determine what the output pixel for any input pixel should be based on neighboring input pixels. This isn’t too difficult to understand: if you see an image of one eyelash, you may not know what it is, but ‘zoom out’ and see it in the context of an entire eyelid, and you’ll recognize it for what it is. That the fate of a pixel is determined by its neighbors is nothing new: simple bicubic upsampling algorithms also consider adjacent pixels, but such algorithms do so only in a static and simplistic manner (‘weight this neighboring pixel this much and that neighboring pixel that much’).
A neural network can operate in a far more sophisticated manner, and gets better with iterative training. Training on a diversified set of images also keeps the neural network from being pigeon-holed into optimizing for only one, or one type of, image. Below, you can see the progress a neural network makes as it is trained and learns how to translate the low-res undemosaiced monochromatic Raw input to the final higher-res color output.
Training a deep convolutional neural network to demosaic and upsample Raw input to high-resolution output. Undemosaiced, low-res input Raw (upper left). Final, demosaiced high-resolution result (bottom right). Image credit: Adobe
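To see why a learned kernel beats a fixed one, a toy comparison helps. The snippet below (our own illustration, not Adobe’s code) upscales a small patch once with fixed bicubic weights and once with a convolution whose weights are trainable parameters:

```python
import torch
import torch.nn.functional as F

patch = torch.rand(1, 1, 8, 8)  # a tiny single-channel image patch

# Static approach: bicubic interpolation weighs neighboring pixels with the
# same fixed recipe everywhere, regardless of what the image contains.
bicubic_up = F.interpolate(patch, scale_factor=2, mode='bicubic',
                           align_corners=False)

# Learned approach: a convolution also mixes neighboring pixels, but its
# weights are trainable parameters; training adjusts them so the output
# matches the high-res ground truth, so the mixing adapts to content.
conv = torch.nn.Conv2d(1, 4, kernel_size=3, padding=1)
to_2x = torch.nn.PixelShuffle(2)
learned_up = to_2x(conv(patch))  # meaningless until the weights are trained
```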
The proof of the pudding…
While we were flattered when Eric used DPReview studio scene test shots to demonstrate some of the improvements in textures, the ‘proof of the pudding’ was in examples of real-world photographs. To that end, we at DPReview went back through our archives to find some of our favorite photographs – particularly ones taken on older, lower-resolution cameras – to see how the super-resolution pipeline handles them.
First we’ll take a look at this shot I took in Iceland, which provides a bunch of both high and low contrast natural foliage textures in areas of sun and shadow. In the slider below, as long as you’re viewing on a HiDPI display in a scaled mode or a smartphone, you’re getting a 1:1 magnification view of a 42 MP Sony a7R II image upscaled 2x linearly (in each dimension) using super resolution – at full size that’s roughly a 170 MP image! We’re comparing it to a version upscaled the same amount using Adobe’s most advanced prior upscaling algorithm, ‘Preserve Details 2.0’:
Note how details and shapes barely visible in the ground cover suddenly take on rather definitive shape and form. Next we’ll take a look at another favorite of mine, a 220mm F22 diffraction-tastic photo I took in the dead of winter, an abstract and oft-overlooked view of the basin of Mt. St. Helens, inspired by the work of the late, great Johsel Namkung. This time we’re looking at the super resolution result downscaled – using bicubic sharper – back down to the Canon 5D Mark III’s original 22 MP resolution. Since we’ve optimized this slider for HiDPI displays, if you’re having trouble appreciating the difference, we recommend magnifying your browser to 200%, or pinch-and-zooming your smartphone display.
Comparing an original Canon 5D Mark III image (right) to a 2x upscaled super resolution result downscaled back – using bicubic sharper – to native camera resolution (left). 1:1 magnification only on HiDPI displays using scaled modes. Magnify browser to 200% or pinch and zoom your smartphone display to enhance differences. Download: Original (22 MP) | Super Res downscaled to native res using bicubic sharper (22 MP)
There are patterns in the snowpack one can’t even make out in the original that become obvious in the super resolution result. Meanwhile, tree trunks and branches tighten up. Diffraction? What diffraction?
Finally, before we move on, let’s take a look at what super resolution can do with yesteryear’s 3.1 MP Canon EOS D30. Below is, again, a super resolution result downscaled – using bicubic sharper – back to the camera’s original 3.1 MP resolution (left) compared to the original (right).
There’s an incredible amount of detail extracted from the foreground grass as well as the tree branches and foliage. Tree trunks suddenly tighten up, as in our last image. Pay close attention to the tops of the snow-covered mountains in the distance. Despite having extremely low contrast, you begin to make out the trees against the snow in the super resolution result. Any attempt to bring out that level of detail from such low contrast areas with traditional sharpening would surely lead to considerable noise; yet, there is no noise introduced in the smooth skies at the top right of the image after super resolution upscaling, precisely because this isn’t just a sharpening algorithm.
… is in the eating
Our page would quickly fill up were we to present only slider comparisons (and sliders show only one small portion of the total image), so below we present a whole set of images from a bunch of us on the team, upscaled and downscaled in various ways, for your viewing pleasure and analysis. Interestingly, you’ll find some results more compelling than others, and learn a thing or two about Adobe’s new tool along the way. We want this to be a ‘choose your own adventure’, so we encourage you to browse our tool and, when you’ve found a particularly compelling comparison, click on ‘Go to full screen mode’ at the top right of the widget and paste the link of the pop-up window into the comments below to share it.
To use our comparison tool: first use the drop-down menu up top to select from a number of photographs (‘scenes’) shot by the DPReview team. After selecting a scene, you’ll see the photographer noted above each image crop. To the right of the author drop-down is one that allows you to choose between four options: ‘Original’ is an Adobe Camera Raw (ACR) conversion at the camera’s native resolution; ‘Preserve Details 2.0’ is the result of 4x upscaling (2x in each linear dimension) using Adobe’s best upscaling algorithm prior to ‘super resolution’; ‘Super Resolution’ uses the current ML-based algorithm to upscale the image 4x with all edits preserved. Finally, ‘Super Resolution (Downsampled)’ resamples the Super Resolution result back to the camera’s native resolution, for those times when you don’t need more pixels, but just want better pixels (while ‘Downsampled’ uses bicubic resampling, ‘Downsampled sharper’ uses bicubic sharper for generally more pleasing, higher-acuity results).
Below are a few interesting findings. Note: the following in-line links dynamically change the state of the widget above. Mobile users will want to switch to our desktop site as the widget and links do not function on mobile.
First, the super resolution algorithm is phenomenal at teasing more resolution out of natural textures, far better than the Preserve Details 2.0 upscaling algorithm, as you can see in this 2x result here, where microcontrast is improved throughout. Natural textures contained even within shadowy areas of low contrast are pulled out with uncanny prowess. The lead abstract for this article was shot at F22, but super resolution removes the effects of diffraction and distinctly identifies the red branches of the naked, leafless trees. You’ll start to think you’re seeing individual skin cells on people’s faces – just don’t show your models the zoom-ins.
It’s not all sparkles and glitter though. We see no appreciable increase in detail in Carey’s Kathmandu sunrise shot (Nikon D700, ISO 200), nor in Dale’s Amazonian rainforest shot (Canon Rebel XT, ISO 800). Along similar lines, my Seattle cityscape sunset, shot at ISO 100 with a Canon 5D Mark II, shows only a slight improvement in detail when downscaling the super resolution result using bicubic sharper, and it’s a wash when using bicubic downsampling. Compare that to the dramatic improvement using ‘only’ bicubic downsampling in my a7R II example. (Bicubic sharper applies some sharpening upon downscaling to combat the softening often introduced by rescaling, but can lead to overshoot or noise.)
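If you want to approximate that last step yourself, a rough Pillow-based stand-in for ‘bicubic’ versus ‘bicubic sharper’ downscaling might look like this; the filenames and the unsharp mask strength are made-up assumptions, and Photoshop’s own implementation differs in its details:

```python
from PIL import Image, ImageFilter

# Hypothetical input: a 2x super resolution export we want back at native size.
upscaled = Image.open('super_resolution_result.tif')
native_size = (upscaled.width // 2, upscaled.height // 2)

# Plain bicubic downscale: tends to look slightly soft.
bicubic = upscaled.resize(native_size, resample=Image.BICUBIC)

# 'Bicubic sharper'-style downscale: follow the resize with a touch of
# sharpening to counter that softness (overdo it and you get halos and noise).
sharper = bicubic.filter(ImageFilter.UnsharpMask(radius=1, percent=50, threshold=2))

bicubic.save('downsampled_bicubic.tif')
sharper.save('downsampled_bicubic_sharper.tif')
```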
All signs point to super resolution working best with low-noise images, and we wonder if the training set consisted mainly of noise-free images. All the shots we’ve just been talking about have heightened noise characteristics: Carey’s D700 and my 5D Mark II had low dynamic range, which we tried to exploit to the fullest in our images, such that even base ISO shots were noisy. Meanwhile, Dale’s older-generation APS-C sensor guaranteed that ISO 800 would be, relatively speaking, noisy. Further supporting the hypothesis that the super resolution pipeline works well with low-noise pixels are the spectacular results with the 3.1 MP Canon EOS D30: those big pixels have whopping pixel-level dynamic range and, therefore, low pixel noise.
Finally, for fun we shot the same macro scene with the Nikon D3S and the Z7, since upscaling the D3S 2x linearly results in a file of similar resolution to the Z7’s. There’s not much in focus here, but what is in focus in the upscaled D3S image is at least comparable in detail level to the native Z7 shot, with the caveat that detail finer than what the D3S actually captured may still be better resolved by the Z7. Of course, one should keep in mind that the Z7 has potential for even more detail once it’s run through the super resolution pipeline.
What’s next?
If Marc Levoy’s arrival at Adobe from Google, where he spearheaded numerous computational approaches to improving camera quality, is any indication, Adobe is clearly doubling down on the future of imaging. And if these super resolution results are any indication, machine learning will have a large part to play in those advances. Sharpening and resolution are just two aspects of image quality, but there are so many more: color, contrast, dynamic range, noise and, perhaps most importantly, artistic or rendering intent. It’s not hard to imagine a future where all of these image (and video!) attributes are augmented using machine learning; in fact, it’s already here.
In a manner similar to that outlined here, you can train noise reduction algorithms using noisy (low-light) images as input and low-noise ‘ground truth’ versions produced from better (e.g. larger sensor) imagers. A better demosaicing algorithm can be trained using pairs of Bayer Raws and ground-truth RGB images made with the RGGB color filter array removed and a color light wheel to sample red, green and blue values at each pixel.
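As a purely hypothetical sketch, wiring up such noisy/clean pairs for training in PyTorch might look something like the following; the folder layout and the filename-matching scheme are assumptions for illustration only, not a description of any shipping product:

```python
import glob
import os
from torch.utils.data import Dataset
from torchvision.io import read_image

class NoisyCleanPairs(Dataset):
    """Pairs a noisy (e.g. low-light) capture with its low-noise 'ground
    truth' counterpart, matched here by filename. A denoising network would
    then be trained to map the former onto the latter."""
    def __init__(self, noisy_dir, clean_dir):
        self.noisy_paths = sorted(glob.glob(os.path.join(noisy_dir, '*.png')))
        self.clean_dir = clean_dir

    def __len__(self):
        return len(self.noisy_paths)

    def __getitem__(self, idx):
        noisy_path = self.noisy_paths[idx]
        clean_path = os.path.join(self.clean_dir, os.path.basename(noisy_path))
        noisy = read_image(noisy_path).float() / 255.0   # network input
        clean = read_image(clean_path).float() / 255.0   # ground truth target
        return noisy, clean

# Usage (hypothetical folders): NoisyCleanPairs('pairs/noisy', 'pairs/clean')
```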
Computational photography needn’t be about creating artificial images or replacing the creative skills involved in editing… it can help execute and grow one’s own creative intent
Researchers are doing precisely this. One can easily imagine more creative applications, like training a neural network to process images in the style of your favorite photographer using pairs of unedited and edited images. Or to even process images as you tend to process them, by training the neural network using images you’ve edited as ‘ground truths’ for the original Raws.
The possibilities are endless. And we don’t know about you, but while we eagerly anticipate the future, we’ll be having some fun making huge prints with all the extra pixels Adobe’s super resolution grants us! Let us know in the comments what excites you the most, and what you’d like to see out of computational approaches to imaging.
Samples
All images in our Image Comparison Tool above are presented at full resolution in the following gallery. All conversions were performed in Adobe Camera Raw / Lightroom. ‘Original’ indicates the image at its original resolution; ‘Super Resolution’ is the result of 2x (in each linear dimension) upscaling using the ML-trained algorithm described in this article; and ‘Preserve Details 2.0’ is the result of similar 2x upscaling using what was previously the most sophisticated upsampling algorithm available. Finally, ‘Super Resolution downsampled to camera native resolution using bicubic’ and ‘Super Resolution downsampled to camera native resolution using bicubic sharper’ simply take the Super Resolution output and resize it back to the camera’s native resolution. This gives you a result with the same total number of pixels as the ‘Original’ file, but with ‘better’ pixels – as if you’d shot with a similar resolution system with higher resolving power.
Notes from DPReview editors:
Barney: I’ve been working with digital cameras since around 2001, and in the past 20 years I’ve used just about every kind of camera out there, from cheap point and shoots to expensive professional bodies. These two images were taken on two of my favorite ‘low resolution’ bodies of yesteryear.
Farmstead in Montana (Canon EOS D30 – 3MP): This shot was taken on my EOS D30 in 2015, on a video shoot with Canon. I brought the camera along as a conversation-starter with the late, great Chuck Westfall, who accompanied us for part of the shoot. I didn’t get a lot of time to shoot with it (and its 15-year-old batteries barely lasted long enough for 20 exposures), but I quite like this shot, and it shows just how much detail that early Canon CMOS sensor could capture, despite having ‘only’ 3MP.
Vale of York, UK (Nikon D3S – 12MP): Shot from the top of Sutton Bank, overlooking the Vale of York, the scene depicted in this photograph is what I see when I think of home. The structures in the middle distance are the windblown remains of High Barn: a favored destination for the Britton family on clifftop walks, year-round.
Carey: Back in 2013, I embarked on my fourth trip to Nepal. I went with my father, who runs a 501(c)3 nonprofit with the aim of providing education and opportunities to Nepalese children and young adults with disabilities. Though we spent lots of time volunteering, checking in at project sites and attending meetings, we still found time for some sightseeing and trekking.
Kathmandu at sunrise and Woman in Kathmandu Market (Nikon D700 – 12MP): These two images, of Kathmandu at sunrise (turns out jet-lag is good for something after all) and of a woman in a city market, were captured on my trusty Nikon D700 and AF-D 35mm F2 lens. These images always remind me of the sense of loosely organized chaos that pervades so many aspects of life in Kathmandu.
Dale: The point at which I became serious about digital photography happened to coincide with two years spent living in South America. To this day, I’m thankful that I made the switch to digital to capture those important memories.
Amazon rainstorm (Canon EOS Digital Rebel XT – 8MP): I spent several months living in the Amazon jungle and it remains one of my favorite places to photograph to this day. This image stands out because it reminds me of the torrential downpours I experienced almost daily. I was at a friend’s house near Iquitos, Peru, and grabbed this shot as an afternoon storm started. To me, it captures the essence of Amazon rain.
Pan flute (Canon EOS Digital Rebel XT – 8MP): I was with a friend who was teaching himself to play the pan flute when we decided to take a break on a grassy slope below the erupting Tungurahua volcano near the town of Baños, Ecuador. I distinctly remember listening to his gentle flute music while watching an ash cloud rise into the sky just a few miles away.
Rishi: There were of course too many images I wanted to run through the super resolution pipeline, but only so much time before publishing this article. The ones I included here were the intersection of ‘photos I liked that I wanted to & had the time to run through the pipeline’ and ‘photos that showed something interesting about the super resolution algorithm’. Here are some brief summaries for the shots I haven’t already touched upon:
Iceland: This photo was taken on our honeymoon in the fall of 2015. We’d been driving under clouds along the Ring Road for much of the day until finally the clouds broke. The mossy fields of volcanic rock by the side of the road were bathed in sunlight through a hole in the clouds for a brief moment. I captured the shot on my Sony a7R II and a Canon EF 24-70mm F2.8L II (still the best 24-70 for that system, IMHO) quickly, before the hole in the clouds closed back up, returning the landscape to an all-too-familiar grey.
Wide-angle Portrait: This was taken during the shoot for our video: Wide-angle portraiture with the Sigma 24-35mm F2 DG HSM Art. I’m a fan of taking pictures of people with wider angle lenses, like 35mm or 24mm. As long as you take into consideration subject placement, distance, height and the like, these portraits can often have more depth, feel more intimate, and tell more of a story than long-telephoto, completely blurred-background portraits. This is not one of those photos! It’s not even my favorite photo from the shoot. It just happens to be the most in-focus one (super resolution will magnify the most minute errors in focus, as will the 50MP resolution of the 5DS R!). Super resolution does well with the natural, random textures of skin, hair and lashes, and if you shoot shallow depth-of-field, sharp areas will get crisper while soft areas will remain soft.