Signal Processing - November 2017 - 126

Cross-domain generative models enable adaptation in the image
space itself. Evolution in cross-domain generative models will
provide robust mechanisms for accurately mapping the image
spaces across domains, thereby alleviating the need for labeled
target data. Classifier and feature adaptation can also be applied
on top of image translation to enhance domain adaptation. Cross-domain generative modeling is a relatively new frontier in computer vision research with promising results in domain adaptation.
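One common way to write down such a mapping is the standard adversarial objective, given here as an illustrative sketch rather than the formulation of any cited work. The symbols are assumptions introduced for this example: $G$ is a generator that maps source images to target-like images, $D$ is a discriminator that separates real target images from generated ones, $f$ is the target classifier, and $\ell$ is a classification loss.

```latex
% Adversarial image translation from the source to the target domain
\min_{G}\,\max_{D}\;
  \mathbb{E}_{x_t \sim P_T(X)}\bigl[\log D(x_t)\bigr]
  + \mathbb{E}_{x_s \sim P_S(X)}\bigl[\log\bigl(1 - D(G(x_s))\bigr)\bigr]

% Target classifier trained on translated source images with source labels
\min_{f}\;
  \mathbb{E}_{(x_s,\, y_s) \sim P_S(X, Y)}\bigl[\ell\bigl(f(G(x_s)),\, y_s\bigr)\bigr]
```

The second objective uses only source labels, which is why a translation of this kind could reduce the need for labeled target data.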

model, then algorithms that are developed using this data set
can be applied to any domain adaptation problem where the
same domain shift is observed. This is a primary reason why
new data sets for domain adaptation, based on modeling domain
shift, are needed.
The standard data sets for computer vision-based domain
adaptation are the facial expression data sets CKPlus [103] and
MMI [104]; the digit data sets SVHN [105], USPS, and MNIST
[102]; the head pose recognition data set PIE [56]; and the object recognition data sets COIL [56], Office [49], and Office-Caltech [30].
These data sets were created before deep learning became
popular and are insufficient for training and evaluating deep
learning-based domain adaptation approaches. A deep-learning model with millions of parameters requires millions of
images for training. Current approaches fine-tune pretrained
deep networks with these small data sets to avoid overfitting
issues. The current data sets are small with a limited number of
categories and limited variation. For instance, the most popular object-recognition data set Office has 4,110 images across
31 categories. In addition, the image statistics of the three
domains in Office are nearly identical.
Due to some inconsistencies in the Office data set [33], [96],
recent approaches evaluate their models using the MNIST, modified-MNIST, and SVHN data sets [33], [81], [85], [89], [94]. Recently,
a couple of data sets have been introduced for deep-learning-based domain adaptation. Office-Home is an object recognition
data set that can be used to evaluate deep-learning algorithms
for domain adaptation [65]. The Office-Home data set consists
of four domains, with each domain containing images from
65 categories of everyday objects and a total of around 15,500
images. Castrejon et al. [8] introduce a multimodal domain
adaptation data set CMPlaces with RGB, sketches, clipart, and
textual descriptions of indoor scenes with 205 categories and
millions of images. However, these data sets do not address all
of the concerns raised above, and newer and larger data sets
based on modeling domain shift are still necessary. The evolution
of data sets and the evolution of models for domain shift need to
complement each other.

Current forms of adaptation merely align the marginal distributions of the source, $P_S(X)$, and the target, $P_T(X)$. The popular
MMD measure from [52] is often applied to align the marginal distributions of the source and target data, as in the instance
selection approach [55]. The goal of domain adaptation, however, is not
merely to align the domains but also to apply models
trained on the source to the target. In most cases, domain-adaptive models are created for classification. It would, therefore, make
more sense to align the joint distributions $P_S(X, Y)$ and $P_T(X, Y)$
rather than merely the marginal distributions. Aligning the
joint distributions would make a classifier trained on the source an
effective classifier for the target.
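To make the marginal-alignment criterion concrete, the following is a minimal sketch of a biased estimate of the squared MMD between two samples, using only the Python standard library. The Gaussian kernel, the bandwidth `gamma`, and the toy Gaussian data are assumptions chosen for illustration; they are not taken from [52] or [55].

```python
import math
import random


def rbf(x, y, gamma):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2(source, target, gamma=0.5):
    """Biased estimate of the squared MMD between two samples."""
    return (
        mean_kernel(source, source, gamma)
        + mean_kernel(target, target, gamma)
        - 2.0 * mean_kernel(source, target, gamma)
    )


def sample(rng, mean, n, dim=2):
    """Draw n points from an isotropic unit-variance Gaussian (toy data)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
source = sample(rng, 0.0, 100)
target_same = sample(rng, 0.0, 100)     # same marginal as the source
target_shifted = sample(rng, 2.0, 100)  # shifted marginal

print("MMD^2, same marginal:   ", mmd2(source, target_same))
print("MMD^2, shifted marginal:", mmd2(source, target_shifted))
```

This quantity depends only on the inputs $X$, so a small value says nothing about whether the class-conditional structure of the two domains matches. That gap is the motivation for aligning the joint distributions instead.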
The challenge with this approach is that target labels are not
available in unsupervised domain adaptation. The workaround is
to impute the target data labels and refine them iteratively. This
has been explored in [56], where the joint distributions
are aligned with a spectral method using kernel-PCA by imputing
the labels and refining them over multiple iterations. A deep-learning approach has also been attempted in [112],
using a transductive approach to learn the target labels while also
minimizing the joint domain discrepancy. As discussed in the
section "Miscellaneous Hierarchical Methods," Sener et al. [94]
develop a deep-learning approach that imputes the labels for the
target in a transductive learning setting. These approaches use the predicted target data labels to ensure
joint distribution alignment. Conditional generative models along
with joint distribution alignment could usher in the next wave of
domain adaptation models.
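The impute-and-refine loop can be sketched on toy data as follows. This is a deliberately simplified stand-in, not the method of [56], [112], or [94]: it assumes the domain shift is a constant offset, uses a nearest-class-mean classifier to impute target labels, and re-estimates the offset from class-conditional means at each iteration. All function names and the synthetic data are hypothetical.

```python
import random


def mean(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def nearest_class(x, class_means):
    """Label of the class whose mean is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, class_means[c]))
    return min(class_means, key=sq_dist)


def impute_and_align(source, source_labels, target, n_iter=5):
    """Alternate between imputing target labels and re-estimating the shift."""
    classes = sorted(set(source_labels))
    class_means = {
        c: mean([x for x, y in zip(source, source_labels) if y == c])
        for c in classes
    }
    offset = [0.0] * len(target[0])
    history = []
    for _ in range(n_iter):
        # Impute labels for the target after undoing the current offset.
        aligned = [[v - o for v, o in zip(x, offset)] for x in target]
        labels = [nearest_class(x, class_means) for x in aligned]
        history.append(labels)
        # Re-estimate the offset from class-conditional means.
        per_class = []
        for c in classes:
            members = [x for x, y in zip(target, labels) if y == c]
            if members:  # skip classes with no imputed members
                m = mean(members)
                per_class.append([a - b for a, b in zip(m, class_means[c])])
        offset = mean(per_class)
    return history, offset


def make_domain(rng, shift, n_per_class=100):
    """Two Gaussian classes; `shift` moves the whole domain (toy assumption)."""
    centers = {0: (0.0, 0.0), 1: (4.0, 0.0)}
    points, labels = [], []
    for c, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            points.append([rng.gauss(cx + shift[0], 0.7),
                           rng.gauss(cy + shift[1], 0.7)])
            labels.append(c)
    return points, labels


def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


rng = random.Random(0)
source, source_labels = make_domain(rng, shift=(0.0, 0.0))
# The true target labels are used only to evaluate the imputed labels.
target, target_truth = make_domain(rng, shift=(1.8, 0.5))
history, offset = impute_and_align(source, source_labels, target)

print("accuracy per iteration:",
      [round(accuracy(h, target_truth), 3) for h in history])
print("estimated offset:", [round(o, 2) for o in offset])
```

The first pass labels the target with the unadapted source classifier. Later passes use the imputed labels to align each class separately, which is the sense in which predicted labels stand in for the missing $Y$ in $P_T(X, Y)$.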

Cross-domain generative models

Generative models like GANs are currently very popular in the
computer vision research community [80]. They have a wide
range of applications, including image superresolution [106], text-to-image generation [107], [108], image-to-image translation [91],
and conditional image generation [109], [110]. Adversarial methods have been successfully applied in domain adaptation in the
form of cross-domain image generation. Cross-domain generative
models transform images from one domain into images in another
domain [91]. These models can be applied to transform labeled
source images into target images. These transformed images are
then used to train a target classifier [81], [88], [89]. One can argue
that cross-domain generative models learn a transformation that maps images from one domain into another. This is a unique
procedure to achieve domain adaptation in the space of images.
Current procedures in domain adaptation learn to adapt either
the classifiers [10], [47], [49], [111] or the features [32], [50], [63].

Person-centered models

Computing will soon become all-pervasive. The
environment is filled with computing devices, and an average person carries several smart devices, such as a phone, watch,
and wristband. Can this computing be adapted to every user?
Computing that adapts to a user's needs and idiosyncrasies can
be called person-centered computing [113]. This would mean
that personal devices would model their interaction and response
based on the user's needs rather than a one-size-fits-all approach
where users train themselves to adapt to their devices to effectively interact with them. This paradigm, where the user and the
device adapt to each other, is termed coadaptation.
These personalized devices will need to be designed to have
core functional components making them applicable to a broad
range of users. In addition, they must also have coadaptive components that help customize the device at an individual user level.
The device must adapt to the user based on patterns gathered


Joint distribution models

IEEE SIGNAL PROCESSING MAGAZINE
