Human Face Recognition

Adaptive framework

Our goal is to implicitly learn how similar the novel and training (or gallery) illumination conditions are, so as to appropriately emphasize either face comparisons on the raw input or comparisons on its filtered output.

Let { X1, ..., XN } be a database of known individuals, X a novel input corresponding to one of the gallery classes, and ρ(·) and F(·), respectively, a given similarity function and a quasi illumination-invariant filter. We then express the degree of belief η that two face sets X and Xi belong to the same person as a weighted combination of similarities between the corresponding unprocessed and filtered image sets:

                         η = (1 - α*) ρ(X, Xi) + α* ρ(F(X), F(Xi))                         (1)
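As a concrete illustration, (1) can be computed as below; rho and F are hypothetical stand-ins for the similarity function ρ(·) and the filter F(·), which the text leaves unspecified:

```python
def mixed_similarity(X, Xi, alpha, rho, F):
    """Eq. (1): weighted combination of raw and filtered set similarities.

    rho -- similarity function between two face sets (assumed given)
    F   -- quasi illumination-invariant filter (assumed given)
    """
    return (1.0 - alpha) * rho(X, Xi) + alpha * rho(F(X), F(Xi))
```

With alpha = 0 this reduces to matching on raw input only, and with alpha = 1 to matching on filtered data only.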
In the light of the previous discussion, we want α* to be small (closer to 0.0) when the novel and the corresponding gallery data have been acquired under similar illumination conditions, and large (closer to 1.0) when they have been acquired under very different ones. We show that α* can be learnt as a function:

                                       α* = α*(μ)                                          (2)

where μ is the confusion margin - the difference between the similarities of the two gallery classes Xi most similar to X. The value of α*(μ) can then be interpreted statistically as the optimal choice of the mixing coefficient α given the confusion margin μ. Formalizing this, we can write

                                α*(μ) = arg max_α p(α | μ)                                 (3)

or equivalently,

                                α*(μ) = arg max_α [ p(α, μ) / p(μ) ]                       (4)

Under the assumption of a uniform prior on the confusion margin p(μ),

                                p(μ) = const                                               (5)

and

                                α*(μ) = arg max_α p(α, μ)                                  (6)
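Given a discretized estimate of p(α, μ) (learnt as described in the next section), (6) reduces to one argmax over α per μ-bin. A minimal sketch, assuming the density is stored as a 2D NumPy array over an evenly spaced α-μ grid (array layout and names are illustrative, not from the original):

```python
import numpy as np

def alpha_function(density):
    """Eq. (6): alpha*(mu) = argmax_alpha p(alpha, mu), per mu-bin.

    density -- 2D array, density[k, j] proportional to p(alpha_k, mu_j),
               with alpha_k on an even grid over [0, 1]
    """
    alphas = np.linspace(0.0, 1.0, density.shape[0])
    return alphas[np.argmax(density, axis=0)]  # one alpha* value per mu-bin
```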

Learning the α-function

To learn the α-function α*(μ) as defined in (3), we first need an estimate of the joint probability density p(α, μ) as per (6). The main difficulty of this problem is of a practical nature: in order to obtain an accurate estimate using one of the many off-the-shelf density estimation techniques, a prohibitively large training database would be needed to ensure a well-sampled distribution of the variable μ. Instead, we propose a heuristic alternative which, as we will show, allows us to do this from a small training corpus of individuals imaged in various illumination conditions. The key idea that makes such a drastic reduction in the amount of training data possible is to use domain-specific knowledge of the properties of p(α, μ) in the estimation process.

Our algorithm is based on an iterative incremental update of the density, initialized as a uniform density over the domain α, μ ∈ [0, 1], see Figure 7. Given a training corpus, we iteratively simulate matching of an "unknown" person against a set of provisional gallery individuals. In each iteration of the algorithm, these are randomly drawn from the offline training database. Since the ground-truth identities of all persons in the offline database are known, we can compute the confusion margin μ(α) for each α = k∆α, using the inter-personal similarity score defined in (1). The density is then incremented at each (k∆α, μ(0)), proportionally to μ(k∆α), to reflect the goodness of a particular weighting in the simulated recognition. The proposed offline learning algorithm is summarized in Figure 6, with a typical evolution of p(α, μ) shown in Figure 7.
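This offline procedure can be sketched as follows; a minimal sketch, assuming similarity values (and hence confusion margins) are normalized to [0, 1], with draw_match a hypothetical helper standing in for the random draw of a provisional gallery and a query of known identity from the offline training set:

```python
import numpy as np

def confusion_margin(scores):
    """Confusion margin mu: difference between the two largest
    gallery similarity scores (see definition following eq. (2))."""
    top = np.sort(np.asarray(scores))[-2:]
    return top[1] - top[0]

def learn_density(draw_match, rho, F, n_alpha=20, n_mu=20, n_iter=10000, seed=0):
    """Iterative incremental estimate of p(alpha, mu) (cf. Figures 6 and 7)."""
    rng = np.random.default_rng(seed)
    density = np.ones((n_alpha, n_mu))          # uniform initialisation over [0,1]^2
    alphas = np.linspace(0.0, 1.0, n_alpha)     # alpha = k * delta_alpha
    for _ in range(n_iter):
        query, gallery = draw_match(rng)        # hypothetical helper (see lead-in)
        # confusion margin mu(alpha) for every alpha on the grid, via eq. (1)
        margins = []
        for alpha in alphas:
            scores = [(1.0 - alpha) * rho(query, g)
                      + alpha * rho(F(query), F(g)) for g in gallery]
            margins.append(confusion_margin(scores))
        # increment density at (k * delta_alpha, mu(0)), proportionally to mu(k * delta_alpha)
        j = min(int(margins[0] * n_mu), n_mu - 1)  # bin index of mu(0)
        for k, m in enumerate(margins):
            density[k, j] += m
    return density / density.sum()              # normalise to a valid density
```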

Figure 6. Summary of the proposed offline learning algorithm.

Figure 7. Typical evolution of the density estimate p(α, μ).

Figure 8. Typical estimate of the α-function plotted against the confusion margin μ. The estimate shown was computed using 40 individuals in 5 illumination conditions for a Gaussian high-pass filter. As expected, α* assumes high values for small confusion margins, when matching on raw input is least reliable, and low values for large confusion margins (see (1)).
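For concreteness, online matching then amounts to computing the confusion margin on raw comparisons, looking up α*(μ), and rescoring with the weighted combination (1). A minimal sketch reusing alpha_function from above (all names illustrative):

```python
import numpy as np

def match(query, gallery, density, rho, F):
    """Online recognition with an adaptively chosen mixing coefficient."""
    raw = np.array([rho(query, g) for g in gallery])
    top = np.sort(raw)[-2:]
    mu = top[1] - top[0]                          # confusion margin on raw data
    j = min(int(mu * density.shape[1]), density.shape[1] - 1)
    alpha = alpha_function(density)[j]            # alpha*(mu), via eq. (6)
    scores = [(1.0 - alpha) * r + alpha * rho(F(query), F(g))
              for r, g in zip(raw, gallery)]      # eq. (1)
    return int(np.argmax(scores))                 # index of the best-matching class
```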

Empirical evaluation

The proposed framework was evaluated on four video databases:

CamFace: contains 100 individuals of varying age and ethnicity, and equally represented genders. For each person in the database we collected 7 video sequences of the person in arbitrary motion (significant translation, yaw and pitch, negligible roll), each in a different illumination setting, at 10 fps and 320 x 240 pixel resolution (face size ≈ 60 pixels), see Figures 9 (a) and 10.

ToshFace: kindly provided to us by Toshiba Corp., this database contains 60 individuals of varying age, mostly male Japanese, with 10 sequences per person. Each sequence corresponds to a different illumination setting, at 10 fps and 320 x 240 pixel resolution (face size ≈ 60 pixels), see Figure 9 (b).

Face Video: freely available and described in [14]. Briefly, it contains 11 individuals with 2 sequences per person, little variation in illumination, but extreme and uncontrolled variations in pose and motion, see Figure 9 (c).

Faces96: the most challenging subset of the University of Essex face database, freely available from http://cswww.essex.ac.uk/mv/allfaces/faces96.html. It contains 152 individuals, most 18-20 years old, with a single 20-frame sequence per person at 196 x 196 pixel resolution (face size ≈ 80 pixels). The users were asked to approach the camera while performing arbitrary head motion. Although the illumination was kept constant throughout each sequence, there is some variation in the manner in which faces were lit due to the change in the relative position of the user with respect to the lighting sources, see Figure 9 (d).

For each database except Faces96, we trained our algorithm using a single sequence per person and tested against a single other sequence per person, acquired in a different session (for CamFace and ToshFace, different sessions correspond to different illumination conditions). Since the Faces96 database contains only a single sequence per person, we used frames 1-10 of each sequence for training and frames 11-20 for testing. Since each video sequence in this database corresponds to a person walking towards the camera, this split maximizes the variation in illumination, scale and pose between training and test, thus maximizing the recognition challenge.

 

Results

To establish baseline performance, we first performed recognition with both MSM and CMSM using raw data. A summary is shown in Table 3.1. As these results illustrate, the CamFace and ToshFace data sets were found to be very challenging, primarily due to extreme variations in illumination. The performance on the Face Video and Faces96 databases was significantly better. This can be explained by noting that the first major source of appearance variation present in these sets, scale, is normalized for in the data extraction stage; the remainder of the appearance variation is dominated by pose changes, to which MSM and CMSM are particularly robust.

Next, we evaluated the two methods with each of the 6 filter-based face representations. Confirming the first premise of this work, as well as previous findings, all of the filters produced an improvement in average recognition rates. Little interaction between method/filter combinations was found, with the Laplacian-of-Gaussian and the horizontal intensity derivative producing the best results, bringing the average and best recognition error rates down to 12% and 9%, respectively.

 

Conclusions

We have described a novel framework for automatic face recognition in the presence of varying illumination, primarily applicable to matching face sets or sequences. The framework is based on simple image processing filters that compete with unprocessed greyscale input to yield a single matching score between individuals. By performing all computationally demanding learning offline, our method (i) retains the matching efficiency of simple image filters, while (ii) achieving greatly increased robustness, as all online processing is performed in closed form. Evaluated on a large, real-world data corpus, the proposed framework was shown to be successful in video-based recognition across a wide range of changes in illumination, pose and face motion pattern.

 
