Edge-Aware Smoothing, Intrinsic Image Decomposition, L1 Sparsity Model

Image and Video Computing



Exemplar-Based Image and Video Stylization Using Fully Convolutional Semantic Features

Feida Zhu, Zhicheng Yan, and Yizhou Yu
IEEE Transactions on Image Processing, Vol 26, No 7, 2017, (PDF)

Color and tone stylization in images and videos strives to enhance unique themes with artistic color and tone adjustments. It has a broad range of applications, from professional image postprocessing to photo sharing over social networks. Mainstream photo enhancement software, such as Adobe Lightroom and Instagram, provides users with predefined styles, which are often hand-crafted through a trial-and-error process. Such photo adjustment tools lack a semantic understanding of image content, and the resulting global color transforms limit the range of artistic styles they can represent. Stylistic enhancement, on the other hand, needs to apply distinct adjustments to different semantic regions, an ability that enables a broader range of visual styles. In this paper, we first propose a novel deep learning architecture for exemplar-based image stylization, which learns local enhancement styles from image pairs. Our deep learning architecture consists of fully convolutional networks (FCNs) for automatic semantics-aware feature extraction and fully connected neural layers for adjustment prediction. Image stylization can be efficiently accomplished with a single forward pass through our deep network. To extend our deep network from image stylization to video stylization, we exploit temporal superpixels (TSPs) to facilitate the transfer of artistic styles from image exemplars to videos. Experiments on a number of datasets for image stylization as well as a diverse set of video clips demonstrate the effectiveness of our deep learning architecture.
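
A minimal PyTorch sketch of this two-stage design, under illustrative assumptions: a small fully convolutional backbone stands in for the semantics-aware FCN, and a fully connected head maps each per-pixel feature to the coefficients of a 3x4 affine color transform. The layer sizes, the StylizationNet name, and the affine adjustment model are assumptions for illustration, not the paper's exact architecture.

import torch
import torch.nn as nn

class StylizationNet(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        # Fully convolutional feature extractor (stand-in for a pretrained FCN).
        self.fcn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # Fully connected head predicting a 3x4 affine color transform per pixel.
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 12),
        )

    def forward(self, x):                        # x: (B, 3, H, W) in [0, 1]
        f = self.fcn(x)                          # (B, C, H, W)
        B, C, H, W = f.shape
        params = self.head(f.permute(0, 2, 3, 1).reshape(-1, C))
        A = params.reshape(B, H, W, 3, 4)        # per-pixel affine transform
        rgb1 = torch.cat([x, torch.ones_like(x[:, :1])], dim=1)  # homogeneous RGB
        rgb1 = rgb1.permute(0, 2, 3, 1).unsqueeze(-1)            # (B, H, W, 4, 1)
        out = (A @ rgb1).squeeze(-1)             # (B, H, W, 3)
        return out.permute(0, 3, 1, 2)

net = StylizationNet()
styled = net(torch.rand(1, 3, 64, 64))           # one forward pass stylizes the image

Because the network is fully convolutional, a single forward pass adjusts every pixel at once, which matches the efficiency claim above.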



An L1 Image Transform for Edge-Preserving Smoothing and Scene-Level Intrinsic Decomposition

Sai Bi, Xiaoguang Han, and Yizhou Yu
SIGGRAPH 2015, [BibTex], (PDF, Supplemental Materials)

Code Release
    Code for L1 image flattening and edge-preserving smoothing can be downloaded here or from GitHub.

Data Download
    Edge-preserving smoothing results
    Image flattening and intrinsic decomposition results on the Intrinsic-Images-in-the-Wild database (Baidu Cloud, Google Drive)

Identifying sparse salient structures from dense pixels is a long-standing problem in visual computing. Solutions to this problem can benefit both image manipulation and understanding. In this paper, we introduce an image transform based on the L1 norm for piecewise image flattening. This transform can effectively preserve and sharpen salient edges and contours while eliminating insignificant details, producing a nearly piecewise constant image with sparse structures. A variant of this image transform can perform edge-preserving smoothing more effectively than existing state-of-the-art algorithms. We further present a new method for complex scene-level intrinsic image decomposition. Our method relies on the above image transform to suppress surface shading variations, and performs probabilistic reflectance clustering on the flattened image instead of the original input image to achieve higher accuracy. Extensive testing on the Intrinsic-Images-in-the-Wild database indicates that our method performs significantly better than existing techniques, both visually and numerically. The obtained intrinsic images have been successfully used in two applications, surface retexturing and 3D object compositing in photographs.
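
The following sketch mimics the decomposition pipeline with off-the-shelf stand-ins: total-variation denoising (scikit-image) approximates the piecewise flattening step, and k-means replaces the paper's probabilistic reflectance clustering. The weight, cluster count, and test image are arbitrary choices; the paper's L1 transform produces sharper, more nearly piecewise constant results than TV smoothing.

import numpy as np
from skimage import data, img_as_float
from skimage.restoration import denoise_tv_chambolle
from sklearn.cluster import KMeans

img = img_as_float(data.astronaut())              # (H, W, 3) in [0, 1]

# Step 1: flatten the image toward a piecewise-constant result.
flat = denoise_tv_chambolle(img, weight=0.15, channel_axis=-1)

# Step 2: cluster reflectance on the flattened image, not the original.
H, W, _ = flat.shape
labels = KMeans(n_clusters=12, n_init=4, random_state=0).fit_predict(
    flat.reshape(-1, 3))
reflectance = np.zeros_like(flat).reshape(-1, 3)
for k in range(12):
    reflectance[labels == k] = flat.reshape(-1, 3)[labels == k].mean(axis=0)
reflectance = reflectance.reshape(H, W, 3)

# Step 3: recover shading as the per-pixel residual intensity, using I = R * S.
shading = img.mean(axis=2) / np.maximum(reflectance.mean(axis=2), 1e-6)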



Automatic Photo Adjustment Using Deep Neural Networks

Zhicheng Yan, Hao Zhang, Baoyuan Wang, Sylvain Paris, and Yizhou Yu
ACM Transactions on Graphics, Vol 35, No 2, 2016, [BibTex] (PDF, Supplemental Materials)

Photo retouching enables photographers to invoke dramatic visual impressions by artistically enhancing their photos through stylistic color and tone adjustments. However, it is also a time-consuming and challenging task that requires advanced skills beyond the abilities of casual photographers. Using an automated algorithm is an appealing alternative to manual work, but such an algorithm faces many hurdles. Many photographic styles rely on subtle adjustments that depend on the image content and even its semantics. Further, these adjustments are often spatially varying. Existing automatic algorithms are still limited and cover only a subset of these challenges. Recently, deep learning has shown unique abilities to address hard problems. This motivated us to explore the use of deep neural networks in the context of photo editing. In this paper, we formulate automatic photo adjustment in a way suitable for this approach. We also introduce an image descriptor accounting for the local semantics of an image. Our experiments demonstrate that training deep neural networks using these descriptors successfully captures sophisticated photographic styles. In particular, and unlike previous techniques, it can model local adjustments that depend on image semantics. We show that this yields results that are qualitatively and quantitatively better than previous work.
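
As a hedged sketch of this formulation, the snippet below maps a hypothetical 128-D semantic descriptor through a small MLP to the coefficients of a quadratic color transform, applied per pixel via the standard 10-term quadratic color basis. The descriptor size and hidden widths are assumptions; only the overall descriptor-to-color-mapping structure follows the paper.

import torch
import torch.nn as nn

def quad_basis(rgb):                # rgb: (N, 3) -> (N, 10) quadratic color basis
    r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
    return torch.cat([r*r, g*g, b*b, r*g, g*b, r*b, r, g, b,
                      torch.ones_like(r)], dim=1)

mlp = nn.Sequential(                # descriptor (assumed 128-D) -> 3x10 coefficients
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 30),
)

desc = torch.rand(4096, 128)        # per-pixel descriptors (hypothetical features)
rgb = torch.rand(4096, 3)           # input pixel colors
A = mlp(desc).reshape(-1, 3, 10)    # per-pixel transform matrices
out = (A @ quad_basis(rgb).unsqueeze(-1)).squeeze(-1)   # adjusted colors, (N, 3)

Since the transform coefficients vary per pixel with the descriptor, the same network can realize spatially varying, semantics-dependent adjustments rather than one global curve.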



Audeosynth: Music-Driven Video Montage

Zicheng Liao, Yizhou Yu, Bingchen Gong, and Lechao Cheng
SIGGRAPH 2015, [BibTex], (PDF, Project Webpage)

We introduce music-driven video montage, a media format that offers a pleasant way to browse or summarize video clips collected from various occasions, including gatherings and adventures. In music-driven video montage, the music drives the composition of the video content. According to musical movement and beats, video clips are organized to form a montage that visually reflects the experiential properties of the music. Nonetheless, creating such a montage takes enormous manual work and artistic expertise. In this paper, we develop a framework for automatically generating music-driven video montages. The input is a set of video clips and a piece of background music. By analyzing the music and video content, our system extracts carefully designed temporal features from the input, casts the synthesis problem as an optimization, and solves for the parameters through Markov chain Monte Carlo sampling. The output is a video montage whose visual activities are cut and synchronized with the rhythm of the music, rendering a symphony of audio-visual resonance.
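
A toy Metropolis-Hastings loop illustrating the optimization-by-sampling idea: assign one video clip to each music segment and accept proposals that lower a mismatch energy (or occasionally raise it, with Boltzmann probability). The scalar "activity" and "intensity" scores below are placeholders for the paper's carefully designed temporal features.

import numpy as np

rng = np.random.default_rng(0)
clip_activity = rng.random(20)        # hypothetical per-clip motion scores
seg_intensity = rng.random(8)         # hypothetical per-segment musical intensity

def energy(assign):
    return np.sum((clip_activity[assign] - seg_intensity) ** 2)

assign = rng.choice(20, size=8, replace=False)    # one clip per music segment
E, T = energy(assign), 0.1
for _ in range(5000):
    prop = assign.copy()
    prop[rng.integers(8)] = rng.integers(20)      # re-draw one segment's clip
    if len(set(prop)) == 8:                       # keep clips distinct
        E_new = energy(prop)
        if E_new < E or rng.random() < np.exp((E - E_new) / T):
            assign, E = prop, E_new
print("assignment:", assign, "energy:", E)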



ColorSketch: A Drawing Assistant for Generating Color Sketches from Photos

Guanbin Li, Sai Bi, Jue Wang, Ying-Qing Xu, and Yizhou Yu
IEEE Computer Graphics and Applications, Vol 37, No 3, 2017, PDF

A color sketch creates a vivid depiction of a scene using sparse pencil strokes and casual colored brush strokes. In this paper, we introduce an interactive drawing system, called ColorSketch, for helping novice users generate color sketches from photos. Our system is motivated by the fact that novice users are often capable of tracing object boundaries using pencil strokes, but have difficulty choosing proper colors and brushing over an image region in a visually pleasing way. To preserve artistic freedom and expressiveness, our system lets users keep full control over pencil strokes for depicting object shapes and geometric details at an appropriate level of abstraction, and automatically augments pencil sketches with color brush effects, including color mapping, brush-stroke rendering, and blank-area creation. Experimental and user study results demonstrate that users, especially novice ones, can generate much better color sketches more efficiently with our system than with traditional manual tools.



Example-Based Image Color and Tone Style Enhancement

Baoyuan Wang, Yizhou Yu, and Ying-Qing Xu
SIGGRAPH 2011, [BibTex], (PDF, Supplemental Materials)

Color and tone adjustments are among the most frequent image enhancement operations. We define a color and tone style as a set of explicit or implicit rules governing color and tone adjustments. Our goal in this paper is to learn implicit color and tone adjustment rules from examples. That is, given a set of examples, each of which is a pair of corresponding images before and after adjustments, we would like to discover the underlying mathematical relationships optimally connecting the color and tone of corresponding pixels in all image pairs. We formally define tone and color adjustment rules as mappings, and propose to approximate complicated spatially varying nonlinear mappings in a piecewise manner. The reason behind this is that a very complicated mapping can still be locally approximated with a low-order polynomial model. Parameters within such low-order models are trained using data extracted from example image pairs. We successfully apply our framework in two scenarios, low-quality photo enhancement by transferring the style of a high-end camera, and photo enhancement using styles learned from photographers and designers.
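
A minimal numpy sketch of the piecewise approximation idea, assuming a simple setting: the luminance range is split into bins and a quadratic is fit per bin from corresponding before/after values. Bin count and polynomial order are illustrative; the paper's mappings are spatially varying and multi-dimensional.

import numpy as np

def fit_piecewise_tone_curve(before, after, n_bins=8):
    """before, after: flat arrays of corresponding luminance values in [0, 1]."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    models = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (before >= lo) & (before < hi)
        # Fit y = a*x^2 + b*x + c locally; fall back to identity in sparse bins.
        p = np.polyfit(before[mask], after[mask], 2) if mask.sum() > 3 \
            else np.array([0.0, 1.0, 0.0])
        models.append(p)
    return edges, models

def apply_tone_curve(x, edges, models):
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(models) - 1)
    return np.array([np.polyval(models[i], v) for i, v in zip(idx, x)])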



Data-Driven Image Color Theme Enhancement

Baoyuan Wang, Yizhou Yu, Tien-Tsin Wong, Chun Chen, and Ying-Qing Xu
SIGGRAPH Asia 2010, [BibTex], (PDF, Supplemental Materials)

It is often important for designers and photographers to convey or enhance desired color themes in their work. A color theme is typically defined as a template of colors and an associated verbal description. This paper presents a data-driven method for enhancing a desired color theme in an image. We formulate our goal as a unified optimization that simultaneously considers a desired color theme, texture-color relationships as well as automatic or user-specified color constraints. Quantifying the difference between an image and a color theme is made possible by color mood spaces and a generalization of an additivity relationship for two-color combinations. We incorporate prior knowledge, such as texture-color relationships, extracted from a database of photographs to maintain a natural look of the edited images. Experiments and a user study have confirmed the effectiveness of our method.
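
As a heavily simplified stand-in for the unified optimization, the sketch below pulls each pixel toward its nearest template color while a data term keeps it near the original, which admits a per-pixel closed-form blend. The paper's formulation additionally involves color mood spaces, texture-color priors, and user-specified constraints.

import numpy as np

def enhance_theme(img, theme, lam=0.4):
    """img: (H, W, 3) floats in [0, 1]; theme: (K, 3) template colors."""
    flat = img.reshape(-1, 3)
    # Nearest theme color per pixel.
    d = ((flat[:, None, :] - theme[None, :, :]) ** 2).sum(axis=2)
    target = theme[d.argmin(axis=1)]
    # Per-pixel minimizer of ||u - f||^2 + lam * ||u - t||^2.
    out = (flat + lam * target) / (1.0 + lam)
    return out.reshape(img.shape)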




Speaker-Following Video Subtitles

Yongtao Hu, Jan Kautz, Yizhou Yu, and Wenping Wang
ACM Transactions on Multimedia Computing, Communications, and Applications, Vol 11, No 2, 2014, (PDF)

We propose a new method for improving the presentation of subtitles in video (e.g., TV and movies). With conventional subtitles, the viewer has to constantly look away from the main viewing area to read the subtitles at the bottom of the screen, which disrupts the viewing experience and causes unnecessary eyestrain. Our method places on-screen subtitles next to the respective speakers to allow the viewer to follow the visual content while simultaneously reading the subtitles. We use novel identification algorithms to detect the speakers based on audio and visual information. Then the placement of the subtitles is determined using global optimization. A comprehensive usability study indicated that our subtitle placement method outperformed both conventional fixed-position subtitling and another previous dynamic subtitling method in terms of enhancing the overall viewing experience and reducing eyestrain.
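
A compact sketch of the global-optimization flavor of the placement step: per frame, choose among candidate subtitle anchors (e.g., near detected faces), trading off distance to the active speaker against frame-to-frame jumps, solved exactly with dynamic programming. The cost terms and candidate set are illustrative assumptions, not the paper's exact objective.

import numpy as np

def place_subtitles(cand_xy, speaker_xy, jump_w=0.5):
    """cand_xy: (T, K, 2) candidate positions; speaker_xy: (T, 2) active speaker."""
    T, K, _ = cand_xy.shape
    unary = np.linalg.norm(cand_xy - speaker_xy[:, None, :], axis=2)
    cost, back = unary[0], np.zeros((T, K), dtype=int)
    for t in range(1, T):
        jump = np.linalg.norm(cand_xy[t][None, :, :] - cand_xy[t - 1][:, None, :],
                              axis=2)                    # (K_prev, K_cur) jump costs
        total = cost[:, None] + jump_w * jump
        back[t] = total.argmin(axis=0)                   # best predecessor per candidate
        cost = unary[t] + total.min(axis=0)
    path = [int(cost.argmin())]
    for t in range(T - 1, 0, -1):                        # backtrack the optimal path
        path.append(int(back[t][path[-1]]))
    return path[::-1]                                    # chosen candidate per frame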




Single-View Hair Modeling for Portrait Manipulation

Menglei Chai, Lvdi Wang, Yanlin Weng, Yizhou Yu, Baining Guo, and Kun Zhou
SIGGRAPH 2012, PDF

Human hair is known to be very difficult to model or reconstruct. In this paper, we focus on applications related to portrait manipulation and take an application-driven approach to hair modeling. To enable an average user to achieve interesting portrait manipulation results, we develop a single-view hair modeling technique with modest user interaction to meet the unique requirements set by portrait manipulation. Our method relies on heuristics to generate a plausible high-resolution strand-based 3D hair model. This is made possible by an effective high-precision 2D strand tracing algorithm, which explicitly models uncertainty and local layering during tracing. The depth of the traced strands is solved through an optimization, which simultaneously considers depth constraints, layering constraints, and regularization terms. Our single-view hair modeling enables a number of interesting applications that were previously challenging, including transferring the hairstyle of one subject to another in a potentially different pose, rendering the original portrait in a novel view, and image-space hair editing.



Interactive Image Segmentation Based on Level Sets of Probabilities

Yugang Liu and Yizhou Yu
IEEE Transactions on Visualization and Computer Graphics, Vol 18, No 2, 2012, [BibTex], PDF

In this paper, we present a robust and accurate level-set-based algorithm for interactive image segmentation. The level set method is clearly advantageous for image objects with a complex topology and fragmented appearance. Our method integrates discriminative classification models with the level set method to better avoid local minima. Our level set function approximates a posterior probabilistic mask of a foreground object. The evolution of its zero level set is driven by three force terms, region force, edge field force, and curvature force. These forces are based on a probabilistic classifier and an unsigned distance transform of salient edges. We further apply expectation-maximization to improve the performance of both the probabilistic classifier and the level set method over multiple passes. Experiments and comparisons demonstrate the superior performance of our method.
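
A bare-bones numpy sketch of such an evolution, assuming a given foreground probability map and a boolean salient-edge map: the level set function is initialized from the probabilities and updated by region, edge, and curvature terms. The weights and sign conventions below are illustrative guesses, not the paper's tuned formulation.

import numpy as np
from scipy import ndimage

def curvature(phi):
    gy, gx = np.gradient(phi)
    norm = np.sqrt(gx**2 + gy**2) + 1e-8
    nyy, _ = np.gradient(gy / norm)
    _, nxx = np.gradient(gx / norm)
    return nxx + nyy                       # divergence of the unit normal

def evolve(prob, edges, iters=200, dt=0.5, w_r=1.0, w_e=0.3, w_c=0.2):
    """prob: P(foreground) per pixel in [0,1]; edges: boolean salient-edge map."""
    phi = prob - 0.5                       # initialize from the probability mask
    edge_dist = ndimage.distance_transform_edt(~edges)   # unsigned distance
    gy, gx = np.gradient(edge_dist)
    for _ in range(iters):
        py, px = np.gradient(phi)
        grad = np.sqrt(px**2 + py**2) + 1e-8
        region = (prob - 0.5) * grad       # expand where foreground is likely
        edge = gx * px + gy * py           # advect toward salient edges
        phi += dt * (w_r * region - w_e * edge + w_c * curvature(phi) * grad)
    return phi > 0                         # zero level set = object boundary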



A Subdivision-Based Representation for Vector Image Editing

Zicheng Liao, Hugues Hoppe, David Forsyth, and Yizhou Yu
IEEE Transactions on Visualization and Computer Graphics, Vol 18, No 11, 2012 (spotlight paper), PDF

Vector graphics has been employed in a wide variety of applications due to its scalability and editability. Editability is a high priority for artists and designers who wish to produce vector-based graphical content with user interaction. In this paper, we introduce a new vector image representation based on piecewise smooth subdivision surfaces, which is a simple, unified and flexible framework that supports a variety of operations, including shape editing, color editing, image stylization, and vector image processing. These operations effectively create novel vector graphics by reusing and altering existing image vectorization results. Because image vectorization yields an abstraction of the original raster image, controlling the level of detail of this abstraction is highly desirable. To this end, we design a feature-oriented vector image pyramid that offers multiple levels of abstraction simultaneously. Our new vector image representation can be rasterized efficiently using GPU-accelerated subdivision. Experiments indicate that our vector image representation achieves high visual quality and better supports editing operations than existing representations.
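
A tiny illustration of why a subdivision-based representation rasterizes cleanly at any resolution: a coarse grid of color control points is refined by repeated midpoint insertion and averaging (uniform, B-spline-style subdivision on a regular grid). The paper's piecewise smooth surfaces, feature handling, and GPU rasterization go well beyond this uniform toy.

import numpy as np

def subdivide(ctrl):
    """One refinement step on an (H, W, 3) color control grid."""
    H, W, _ = ctrl.shape
    fine = np.zeros((2 * H - 1, 2 * W - 1, 3))
    fine[::2, ::2] = ctrl
    fine[1::2, ::2] = 0.5 * (ctrl[:-1] + ctrl[1:])              # row edge midpoints
    fine[::2, 1::2] = 0.5 * (ctrl[:, :-1] + ctrl[:, 1:])        # column edge midpoints
    fine[1::2, 1::2] = 0.25 * (ctrl[:-1, :-1] + ctrl[:-1, 1:] +
                               ctrl[1:, :-1] + ctrl[1:, 1:])    # face points
    return fine

coarse = np.random.rand(4, 4, 3)          # a 4x4 vector "image" of control colors
img = coarse
for _ in range(5):                        # rasterize by repeated subdivision
    img = subdivide(img)                  # 4x4 -> 97x97 after five steps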



Patch-Based Image Vectorization with Automatic Curvilinear Feature Alignment

Tian Xia, Zicheng (Binbin) Liao, and Yizhou Yu
SIGGRAPH Asia 2009, PDF

Raster image vectorization is increasingly important since vector-based graphical content has been adopted in personal computers and on the Internet. In this paper, we introduce an effective vector-based representation and its associated vectorization algorithm for full-color raster images. There are two important characteristics of our representation. First, the image plane is decomposed into non-overlapping parametric triangular patches with curved boundaries. Such a simplicial layout supports a flexible topology and facilitates adaptive patch distribution. Second, a subset of the curved patch boundaries is dedicated to faithfully representing curvilinear features. They are automatically aligned with the features. Because of this, patches are expected to have moderate internal variations that can be well approximated using smooth functions. We have developed effective techniques for patch boundary optimization and patch color fitting to accurately and compactly approximate raster images with both smooth variations and curvilinear features. A real-time GPU-accelerated parallel algorithm based on recursive patch subdivision has also been developed for rasterizing a vectorized image. Experiments and comparisons indicate our image vectorization algorithm achieves a more accurate and compact vector-based representation than existing ones do.
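
A short sketch of the patch color fitting step under simplifying assumptions: the colors inside one patch are approximated by a quadratic polynomial in image coordinates via linear least squares. The paper fits over curved triangular patches with continuity constraints; a plain per-patch fit is shown here for illustration.

import numpy as np

def fit_patch_quadratic(xy, rgb):
    """xy: (N, 2) pixel coordinates in a patch; rgb: (N, 3) their colors."""
    x, y = xy[:, 0], xy[:, 1]
    A = np.column_stack([x*x, y*y, x*y, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, rgb, rcond=None)   # (6, 3) per-channel fit
    return coeffs

def eval_patch(xy, coeffs):
    x, y = xy[:, 0], xy[:, 1]
    A = np.column_stack([x*x, y*y, x*y, x, y, np.ones_like(x)])
    return A @ coeffs                                   # reconstructed colors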



Lazy Texture Selection Based on Active Learning

Tian Xia, Qing Wu, Chun Chen, and Yizhou Yu
The Visual Computer, Vol 26, No 3, 2010, PDF

Selecting desired textures and textured objects across both spatial and temporal domains with minimal user interaction poses a great challenge. This paper presents a method for achieving this goal. With this method, the appearance of similar texture regions within an entire image or video can be simultaneously manipulated. The technique we developed applies the active learning methodology. The user only needs to label minimal initial training data and subsequent query data. An active learning algorithm uses these labeled data to obtain an initial classifier and iteratively improves it until its performance becomes satisfactory. A revised graph cut algorithm based on the trained classifier has also been developed to improve the spatial coherence of selected texture regions. A variety of operations, such as color editing, matting, and texture cloning, can be applied to the selected textures to achieve interesting editing effects.
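
A compact scikit-learn sketch of the active learning loop described above: train a classifier on a few labeled texture feature vectors, query the most uncertain unlabeled samples for the user to label next, and retrain. The random forest, entropy-based uncertainty rule, and synthetic features are generic stand-ins, not the paper's exact components.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def active_learning_round(clf, X_lab, y_lab, X_unlab, n_query=10):
    clf.fit(X_lab, y_lab)
    proba = clf.predict_proba(X_unlab)
    entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)
    return np.argsort(entropy)[-n_query:]      # indices to show the user next

rng = np.random.default_rng(0)
X_lab, y_lab = rng.random((20, 16)), rng.integers(0, 2, 20)   # initial user labels
X_unlab = rng.random((500, 16))                               # unlabeled texture features
query_idx = active_learning_round(RandomForestClassifier(), X_lab, y_lab, X_unlab)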



Hierarchical Tensor Approximation of Multi-Dimensional Visual Data

Qing Wu, Tian Xia, C. Chen, H.-Y. Lin, H. Wang, and Yizhou Yu
IEEE Transactions on Visualization and Computer Graphics, Vol 14, No 1, 2008, PDF

Visual data comprise multi-scale and inhomogeneous signals. In this paper, we exploit these characteristics and develop a compact data representation technique based on a hierarchical tensor-based transformation. In this technique, an original multi-dimensional dataset is transformed into a hierarchy of signals to expose its multi-scale structures. The signal at each level of the hierarchy is further divided into a number of smaller tensors to expose its spatially inhomogeneous structures. These smaller tensors are further transformed and pruned using a tensor approximation technique. Our hierarchical tensor approximation supports progressive transmission and partial decompression. Experimental results indicate that our technique can achieve higher compression ratios and quality than previous methods, including wavelet transforms, wavelet packet transforms, and single-level tensor approximation. We have successfully applied our technique to multiple tasks involving multi-dimensional visual data, including medical and scientific data visualization, data-driven rendering and texture synthesis.
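
A minimal numpy implementation of truncated higher-order SVD (Tucker-style tensor approximation), the kind of building block applied to each small tensor in the hierarchy; the ranks are illustrative.

import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Truncated HOSVD: returns a core tensor and per-mode factor matrices."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):   # project onto each mode's subspace
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1),
                           0, mode)
    return core, factors

def reconstruct(core, factors):
    T = core
    for mode, U in enumerate(factors):
        T = np.moveaxis(np.tensordot(U, np.moveaxis(T, mode, 0), axes=1), 0, mode)
    return T

T = np.random.rand(32, 32, 16)
core, factors = hosvd(T, ranks=(8, 8, 4))          # compact Tucker approximation
err = np.linalg.norm(T - reconstruct(core, factors)) / np.linalg.norm(T)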



Out-of-Core Tensor Approximation of Multi-Dimensional Matrices of Visual Data

Hongcheng Wang, Qing Wu, Lin Shi, Yizhou Yu and Narendra Ahuja
SIGGRAPH 2005, [BibTex], PDF

Tensor approximation is necessary to obtain compact multilinear models for multi-dimensional visual datasets. Traditionally, each multi-dimensional data item is represented as a vector. Such a scheme flattens the data and partially destroys the internal structures established throughout the multiple dimensions. In this paper, we retain the original dimensionality of the data items to more effectively exploit existing spatial redundancy and allow more efficient computation. Since the size of visual datasets can easily exceed the memory capacity of a single machine, we also present an out-of-core algorithm for higher-order tensor approximation. The basic idea is to partition a tensor into smaller blocks and perform tensor-related operations blockwise. We have successfully applied our techniques to three graphics-related data-driven models, including 6D bidirectional texture functions, 7D dynamic BTFs and 4D volume simulation sequences. Experimental results indicate that our techniques can not only process out-of-core data, but also achieve higher compression ratios and quality than previous methods.
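
A sketch of the blockwise idea for data exceeding memory: stream a large mode-1 unfolding from disk with np.memmap and accumulate its Gram matrix block by block; the leading eigenvectors then yield a mode-1 factor without ever loading the full tensor. The file name, sizes, and demo data are hypothetical.

import numpy as np

n0, n_rest, block = 64, 500_000, 65_536
# Create demo data on disk; in practice this file would hold the precomputed
# mode-1 unfolding of a tensor too large for memory.
M = np.memmap("unfolding_mode1.dat", dtype=np.float32, mode="w+",
              shape=(n0, n_rest))
M[:] = np.random.default_rng(0).random((n0, n_rest), dtype=np.float32)

gram = np.zeros((n0, n0))
for start in range(0, n_rest, block):              # one block in memory at a time
    B = np.asarray(M[:, start:start + block], dtype=np.float64)
    gram += B @ B.T

w, V = np.linalg.eigh(gram)                        # eigenvectors, ascending order
U1 = V[:, ::-1][:, :16]                            # 16 leading mode-1 components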



Feature Matching and Deformation for Texture Synthesis

Qing Wu and Yizhou Yu
SIGGRAPH 2004, PDF

One significant problem in patch-based texture synthesis is the presence of broken features at the boundary of adjacent patches. The reason is that optimization schemes for patch merging may fail when the neighborhood search cannot find satisfactory candidates in the sample texture because of an inaccurate similarity measure. In this paper, we consider both curvilinear features and their deformation. We develop a novel algorithm to perform feature matching and alignment by measuring structural similarity. Our technique extracts a feature map from the sample texture, and produces both a new feature map and a new texture map. Texture synthesis guided by feature maps can significantly reduce the number of feature discontinuities and related artifacts, and give rise to satisfactory results.
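
A small sketch of the core idea under stated assumptions: compare candidate patches using both color and an extracted feature map (Canny edges here), so patches with misaligned curvilinear features score poorly even when their colors match. The weight and the edge detector are stand-ins for the paper's structural similarity measure.

import numpy as np
from skimage import data, img_as_float
from skimage.feature import canny

tex = img_as_float(data.brick())                   # grayscale sample texture
feat = canny(tex, sigma=1.5).astype(float)         # binary curvilinear feature map

def patch_distance(p1, p2, f1, f2, w=4.0):
    """Color SSD plus weighted feature-map SSD between two patches."""
    return np.sum((p1 - p2) ** 2) + w * np.sum((f1 - f2) ** 2)

y, x, s = 10, 10, 32
d = patch_distance(tex[y:y+s, x:x+s], tex[50:50+s, 60:60+s],
                   feat[y:y+s, x:x+s], feat[50:50+s, 60:60+s])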