Abstract

Our approach takes in multiple images of the same group and outputs a morphed image containing the best faces from across the input photos. To do so, we score each face for open eyes and quality of smile. We then transform and replace faces in the highest-scoring initial image and return the result.

As of now, we're able to detect faces and smiles. Our work on eye detection is also well underway, with some work remaining on accuracy. Below is an example of the detection at work.

Teaser Figure

The first two group pictures represent group photos with flaws, and the third shows an ideal composite of the positive aspects of both group pictures. In the resulting image, both individuals are smiling at the camera with their eyes open.

First Group Picture

Second Group Picture

Fixed Group Picture (Result)

Introduction

Motivation

We’ve all faced the struggle of coordinating a group photo: somehow, someone’s always blinking, looking at their phone, or talking. Usually, we take more than a few photos and hope for the best: surely one will do the trick. Sometimes, none of the pictures turn out well; maybe someone closed their eyes, maybe someone didn't smile. We want to resolve this issue by combining the best elements (faces) from multiple photos to produce the best final image, which people can use for their own purposes.

Application

Given multiple photos, parts of which are good, we want to create the best possible photo, removing the need for a compromise. Why choose a subpar image when you can have the ideal? Implemented, this system takes in several photos (some or all of which may include “flaws”), cherry-picks the best parts, and combines them to form the best overall photo. For example, if the best input photo had one person frowning, we would replace that person’s face with a version from another input in which they were smiling, effectively returning an image with the prime version of each person from across the inputs.

Novelty

This concept has been worked on by a number of academics and industry researchers. An example is Microsoft Group Shot, which has its roots in the 2003 paper "Image Stacks" and the 2004 paper "Interactive Digital Photomontage". Those systems appear to primarily use graph-cut optimization to find the best seams, then perform image alignment to come up with the best composite. We look specifically at faces, not at other features like lighting, depth of field, or unwanted extras; additionally, we do more analysis on the actual quality of the facial expression.

Face, Eye, and Smile Detection | Testing & Evaluation

Approach

We intend to train classification models to recognize the common positive and negative elements in photos, which should inform our notion of “good” and “bad” segments of the input images. (Before this, we’ll have to identify faces, for which pre-trained models and algorithms exist.) With this as a basis, we can detect which image is already the best (has the fewest negative elements relative to the other photos), pick a better version of each weak segment, and restitch it into the base image. This can be done with homography transformations, using edge and feature detection to find notable common points in the images and compute a homography matrix.
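To make that alignment idea concrete, below is a minimal sketch of how the homography step could look with OpenCV's feature matching. The ORB detector, the number of matches kept, and the RANSAC threshold are illustrative assumptions rather than final settings:

    import cv2
    import numpy as np

    def align_to_base(img, base):
        # Sketch: estimate a homography mapping img onto base via ORB
        # feature matches, then warp img into the base image's frame.
        orb = cv2.ORB_create(nfeatures=2000)
        kp1, des1 = orb.detectAndCompute(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), None)
        kp2, des2 = orb.detectAndCompute(cv2.cvtColor(base, cv2.COLOR_BGR2GRAY), None)

        # Match descriptors and keep the strongest correspondences.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]

        src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

        # RANSAC rejects outlier matches while estimating the homography.
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        h, w = base.shape[:2]
        return cv2.warpPerspective(img, H, (w, h))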

Experiments + Results

Our setup is as follows: we take a series of input images. From those images, we count the number of faces present (per image), and then find, for each person, the face with the best eye and smile score. To score these faces, we favor the face with the most open eyes; for smile detection, we locate the mouth and score the curvature of the mouth. We currently have working code for face detection, smile detection, and eye detection.

Taking Input

In order to correlate faces with each other, we created a Python script called main.py, which takes in a directory containing a set of group pictures as well as the number of faces present in the pictures. After running, the script outputs an image titled best.jpg into the directory.
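As a rough illustration of this interface (load_inputs is a hypothetical helper name; the actual argument handling in our script may differ):

    import glob
    import sys

    import cv2

    def load_inputs(directory, num_faces):
        # Load every group picture in the directory; num_faces is the
        # known face count, used later to tune the face detector.
        paths = sorted(glob.glob(directory + "/*.jpg"))  # assumes .jpg inputs
        return [cv2.imread(p) for p in paths], num_faces

    if __name__ == "__main__":
        # e.g.  python main.py ./photos 5
        images, num_faces = load_inputs(sys.argv[1], int(sys.argv[2]))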

Face Detection

As mentioned in the proposal, a number of resources exist to aid in face detection. We used OpenCV's pre-trained Haar cascade detector, which can be found here. This pre-trained face detection model has trouble picking up the correct number of faces across different sets of group images with constant parameters: some parameter values work better for particular images, but no constant value of minNeighbors worked across all images (shown in the Eye Detection & Evaluation section). For example, minNeighbors = 6 for the Haar cascade detector would produce 2 detected faces for one group photo but 5 for another.

To mitigate this, we added a parameter to main.py that requires the number of faces in the picture. We then loop through a range of values (4-30) for minNeighbors in the face detector's prediction call. Once the correct number of faces has been detected (or all candidate minNeighbors values have been tried), we run smile and eye detection to score the faces for replacement.
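A condensed sketch of that sweep (the scaleFactor value and the closest-count fallback are our illustrative reading, not exact code from main.py):

    import cv2

    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def detect_faces(gray, expected):
        # Sweep minNeighbors until the cascade reports the expected
        # number of faces; otherwise fall back to the closest count.
        best = []
        for n in range(4, 31):
            faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=n)
            if len(faces) == expected:
                return faces
            if abs(len(faces) - expected) < abs(len(best) - expected):
                best = faces
        return best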

Smile Detection & Evaluation

To detect smiles more reliably, we found that it helped greatly to locate the mouth region and perform smile detection on that region rather than on the whole face. For this, we used a Haar cascade to detect mouths once a face was found by the face detection code above.
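A sketch of that two-stage detection, using OpenCV's bundled haarcascade_smile.xml as the mouth cascade; the specific cascade file, the lower-half restriction, and the detector parameters are assumptions for illustration:

    import cv2

    mouth_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_smile.xml")

    def mouth_region(face_img):
        # Search the lower half of a detected face crop for the mouth
        # and return that sub-image (or None if nothing is found).
        gray = cv2.cvtColor(face_img, cv2.COLOR_BGR2GRAY)
        lower = gray[gray.shape[0] // 2:, :]
        boxes = mouth_cascade.detectMultiScale(lower, scaleFactor=1.5, minNeighbors=15)
        if len(boxes) == 0:
            return None
        x, y, w, h = boxes[0]
        return lower[y:y + h, x:x + w]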

To determine whether an individual is smiling, we found based on prior research that it helps to detect the curvature of the mouth by identifying corners within the image. To accomplish this, we used the Harris corner detection algorithm, which uses eigenvalues derived from local intensity differences to determine whether a region contains a corner, an edge, or neither. We used the OpenCV implementation of the Harris algorithm, allowing us to easily detect corners in an image.

Next, once we found the corners in the mouth region, we fit a degree-2 polynomial to the corner points, essentially capturing the curvature of the mouth. This gives three values, one per coefficient of the polynomial (Ax^2 + Bx + C -> [A, B, C]). To determine whether these coefficients represent a smile, we fit a model to a dataset of images, some with smiles and some without. Each image was reduced to the polynomial representation of its mouth, and those coefficients were the features used with K-nearest neighbors to classify whether a new face contains a smile. This dataset was built from 603 non-smile images and 600 smile images of various individuals.
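Putting the pieces together, the feature extraction and classification might look like the following sketch (the corner threshold and scikit-learn's KNeighborsClassifier with k = 5 are illustrative choices, not necessarily our exact training setup):

    import cv2
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def mouth_coefficients(mouth_gray):
        # Harris corners on the mouth crop, then a degree-2 polynomial
        # fit through the corner coordinates -> features [A, B, C].
        resp = cv2.cornerHarris(np.float32(mouth_gray), 2, 3, 0.04)
        ys, xs = np.where(resp > 0.01 * resp.max())  # strong-corner pixels
        return np.polyfit(xs, ys, 2)                 # Ax^2 + Bx + C

    # Train on labeled mouth crops (1 = smile, 0 = no smile), then classify.
    knn = KNeighborsClassifier(n_neighbors=5)
    # knn.fit([mouth_coefficients(m) for m in train_mouths], train_labels)
    # is_smile = knn.predict([mouth_coefficients(new_mouth)])[0]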

Below are examples of images where the subject has a smile in one and not in the other. The polynomial fit and detected smile value are shown on each image.

Original Image

Detected Face from Image

Polynomial Fit and Smile Detection on Mouth Region

Original Image

Detected Face from Image

Polynomial Fit and Smile Detection on Mouth Region

Eye Detection & Evaluation

We started by using two Haar classifiers, also from OpenCV: eyes and eyes+eyeglasses. These detected, with varying accuracy, eyes that were reasonably wide open (hard squints went undetected, as did closed eyes, which we wanted to detect).

The results below from the mid-project update show the difficulty the classifier had in detecting all eyes and faces correctly with a constant set of input parameters to the predict() function. As such, we decided to keep the same Haar classifier for faces rather than build our own model; instead, we iterate through the possible parameter inputs to the classifier's prediction function and pick the parameters that most closely reproduce the number of faces in the image (discussed in the Face Detection section).

For eye detection specifically, we used the haarcascade_eye classifier, which works best on fully open eyes. This suits our use case, because we want to score faces higher when their eyes are clearly open and easily detected.
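A small sketch of how this counting can work on a detected face crop (the detector parameters and the cap at two eyes are assumptions for illustration):

    import cv2

    eye_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_eye.xml")

    def count_open_eyes(face_gray):
        # The eye cascade fires mostly on clearly open eyes, so the
        # number of detections (capped at 2) doubles as an openness score.
        eyes = eye_cascade.detectMultiScale(face_gray, scaleFactor=1.1, minNeighbors=10)
        return min(len(eyes), 2)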

These modifications have produced some strong face replacement results, discussed in the Restitching the Composite section. We have also shown examples of less-than-optimal results.

Face/eye detection using the eyes classifier

Face/eye detection using the eyes_eyeglasses classifier

Face/eye detection using the eyes classifier

Face/eye detection using the eyes_eyeglasses classifier


Face/eye detection using the eyes classifier

Face/eye detection using the eyes_eyeglasses classifier

Face/eye detection using the eyes classifier

Face/eye detection using the eyes_eyeglasses classifier

Face Scoring

For scoring, we detect whether the face has a smile and then normalize by the total number of smiles in the image, gauging how 'positive' the individual's smile is relative to the picture; the smile score contribution lies in [0, 1]. Then, we count the number of open eyes in the face and multiply by 0.75, giving a value in [0, 1.5]. This way, a face with two open eyes outweighs the smile, because closed eyes in an image are less ideal than a missing smile.
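In code, this rule reduces to a few lines; below is a sketch of the scoring as we describe it above (the exact normalization in main.py may differ):

    def face_score(has_smile, total_smiles, open_eyes):
        # Smiles contribute at most 1.0; two open eyes contribute 1.5,
        # so open eyes outweigh a smile, as described above.
        smile_score = (1.0 / total_smiles) if (has_smile and total_smiles) else 0.0
        eye_score = 0.75 * open_eyes  # open_eyes in {0, 1, 2}
        return smile_score + eye_score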

Restitching the Composite

Factors/constraints assumed in the creation of our solution:
  1. The members in each group photo will remain the same, with no new members added or removed
  2. No single image will be an optimal image, that is, not everybody will be smiling or have their eyes open for each image (This allows room for improvement when stitching the new image)
  3. Each member of the group will have one photo where they have an optimal face
    1. This can be modified later to combine positive features from different faces if the solution is trivial given an optimal face for each person
  4. Other factors in the picture (e.g., poses, ordering of members, setting, lighting) will all remain constant
    1. Once again this may be modified to take photos in different lightings if the solution is trivial otherwise
When calculating the optimality of each face, we can add more factors if we find that elements beyond smiles and open eyes contribute to an optimal face. Because group photo optimality is largely a qualitative goal, measuring success will also be mostly qualitative. When assessing each stitched image, we determine whether the stitching was completed effectively based on these guidelines:
  1. The image contains the optimal face for each person from the collection of images
  2. Each face replaces the old one (if necessary) properly without translational errors
  3. The area around the replaced face has minimal or no aberrations (discontinuities, the old face sticking out, color mismatches)
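At its core, the replacement step is a rectangle copy at the face's location. A minimal sketch, assuming the photos are aligned and the face boxes coincide (the Future Work section discusses the limits of this location-based approach):

    def replace_face(base, donor, box):
        # Paste the donor's face rectangle into the base image at the
        # same (x, y) location; assumes aligned photos and matching boxes.
        x, y, w, h = box
        base[y:y + h, x:x + w] = donor[y:y + h, x:x + w]
        return base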


Qualitative Results

Below are some examples of the original images and the final composite image where the result was optimal:

First Group Picture

Second Group Picture

Third Group Picture

Fourth Group Picture

Fifth Group Picture

Final Optimal Picture

Above, we can see that two of the faces were replaced, while the remaining one was not replaced with a smiling one. We take this to mean our scoring is working as intended: two open eyes (as opposed to squinted eyes) outweighed a detected smile.

First Group Picture

Second Group Picture

Third Group Picture

Final Optimal Picture

Above, we see essentially perfect face replacement. Although at first it doesn't look like any faces were replaced, one face in the front row and four in the middle row were replaced with marginally better face subsections.

First Group Picture

Second Group Picture

Third Group Picture

Fourth Group Picture

Fifth Group Picture

Final Optimal Picture

Here, we see that the faces on the left and right were replaced with slightly better versions from photos where those people were smiling. Because of the orientation of the face and body, the replaced faces slightly mismatch their boundaries, but the program achieved its goal of replacing the less optimal faces with better ones.

First Group Picture

Second Group Picture

Third Group Picture

Final "Optimal" Picture - Didn't work very well

Above is an example of our model working incorrectly, though not in the face replacement itself, which works well when faces are detected properly. Here, the number of faces is incorrectly detected across the group pictures, so different faces are mismatched and replaced with faces that are not theirs; two faces end up in incorrect positions.



Generally, we are able to score and replace faces correctly whenever the face classifier's parameter tuning provides optimal detected faces, as seen in the first three sets of images. However, the last set shows what happens when the tuning doesn't result in proper face detection.

Conclusion

Through this process, we see success in some cases and potential points of improvement in others. The current implementation is able to score photos based on smiles and eyes, and restitch an image with the best faces.

We currently take in a directory of multiple images and the number of faces, and output a "best" image with the best faces from across the input photos. We modified our earlier work to re-detect faces multiple times until the correct (or as close as possible) number of faces is detected. At the midpoint, we set our goals on adding functionality to identify the best faces (effective scoring) and replace them in the best image. We accomplished this with reasonable success (as demonstrated in the qualitative results). We evaluated the restitching by checking for aberrations, translational errors, and optimal face placement. There are instances in which not all of these are satisfied, but by and large, it works. Improving on these would require additional morphing and a system for comparing facial features, which we were not able to get to in this project.

Future Work

In the future, we'd like to compare faces across photos by the actual similarity of their features, as opposed to where in the image they are located. Right now, we require that there are no pose changes across photos and that the same faces are detected in every image. Whenever that falters (sometimes faces go undetected in some images but not others), the faces get switched up, as seen in the last example above.
In terms of refining our process, one improvement is the restitching algorithm itself. Right now, the squares from restitching are visible when one zooms into the picture, and in certain cases even without zooming. As stated before, the restitching process currently uses the face's location rather than comparing features, which leads to irregularities in face shape. This could be fixed by using edge detection to determine where the heads and necks line up, rather than simply replacing the square at a fixed location; the same method could remove the excess information outside the face edge and eliminate the out-of-place square shape. Another restitching issue arises when an incorrect face is placed on a different person, which could be fixed by comparing the restitching location against the candidate replacement face using some variance or distance metric.

Resources

Datasets

  • Smile Detection from Face Images
    This dataset contains labels indicating whether each face from the dataset below is smiling or not.
  • Labeled Faces in the Wild
    This dataset contains images that correspond to the labels from the dataset above.
  • Closed Eyes in the Wild
    This dataset contains images of faces with eyes closed or open, along with individual open- and closed-eye images for training purposes.

Existing Code

  • Python OpenCV Face Detection
    This code uses OpenCV to detect faces in Python. It outlines a method using a Cascade Classifier and the parameters to be used to detect faces most effectively.
  • Using OpenCV for eye detection
    This code uses OpenCV for face and eye detection. We used it as a framework for eye detection, then modified it to improve accuracy and add scoring.

Referenced Papers

  • Image Stacks
  • Interactive Digital Photomontage
  • Smile Detection Using Hybrid Face Representation
    This paper describes methods of detecting smiles by segmenting faces and using SVMs.
  • Smile Identification via Feature Recognition and Corner Detection
    This paper describes the process of detecting optimal smiles by detecting corners on the mouth region of a face and fitting a polynomial to detect mouth curvature.

Code

    Github Repo

    Demo

    Video coming later.