mexca.video

Facial feature extraction from videos.

Module Contents

Classes

VideoDataset

Custom torch dataset for a video file.

FaceExtractor

Combine steps to extract features from faces in a video file.

Functions

cli()

Command line interface for extracting facial features.

Attributes

EMPTY_VALUE

Value that is returned if no faces are detected in a video frame.

mexca.video.EMPTY_VALUE

Value that is returned if no faces are detected in a video frame.

class mexca.video.VideoDataset(video_file: str, skip_frames: int = 1, start: float = 0, end: Optional[float] = None)

Bases: torch.utils.data.Dataset

Custom torch dataset for a video file.

Parameters
  • video_file (str) – Path to the video file.

  • skip_frames (int, default=1) – Only load every nth frame.

  • start (float, default=0) – Start of the subclip of the video to be loaded (in seconds).

  • end (float, optional, default=None) – End of the subclip of the video to be loaded (in seconds).
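How skip_frames, start, and end could combine to select frames can be sketched in plain Python. The helper select_frame_indices and its n_frames and fps arguments are hypothetical and only illustrate the parameter semantics described above. This is not mexca's implementation, and truncating the subclip boundaries to whole frames is an assumption.

```python
from typing import List, Optional


def select_frame_indices(
    n_frames: int,
    fps: float,
    skip_frames: int = 1,
    start: float = 0,
    end: Optional[float] = None,
) -> List[int]:
    """Return indices of every nth frame inside the subclip [start, end)."""
    if skip_frames < 1:
        raise ValueError("skip_frames must be a positive integer")
    # Convert the subclip boundaries from seconds to frame indices
    # (truncation toward zero is an assumption of this sketch).
    first = max(0, int(start * fps))
    last = n_frames if end is None else min(n_frames, int(end * fps))
    return list(range(first, last, skip_frames))


# A 4-second clip at 25 fps has 100 frames; keep every 10th frame
# between second 1 and second 3.
indices = select_frame_indices(n_frames=100, fps=25, skip_frames=10, start=1, end=3)
print(indices)  # [25, 35, 45, 55, 65]
```

With the defaults (skip_frames=1, start=0, end=None), the helper returns every frame index of the video.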

file_name

Name of the video file.

Type

str

video

Loaded video frames.

Type

torch.Tensor

video_fps

Frames per second.

Type

int

video_frames

Indices of loaded frames.

Type

numpy.ndarray

property duration: float

Duration of the video (read-only).

__len__() → int

Number of loaded video frames.

__getitem__(idx: int) → Dict[str, torch.Tensor]

Get an item from the dataset.

Parameters

idx (int) – Index of the item in the dataset.

Returns

Dictionary with ‘Image’ containing the video frame (T, H, W, C) and ‘Frame’ containing the frame index.

Return type

dict

class mexca.video.FaceExtractor(num_faces: Optional[int], min_face_size: int = 20, thresholds: Tuple[float] = (0.6, 0.7, 0.7), factor: float = 0.709, post_process: bool = True, select_largest: bool = True, selection_method: Optional[str] = None, keep_all: bool = True, device: Optional[torch.device] = None, embeddings_model: str = 'vggface2', au_model: str = 'xgb', landmark_model: str = 'mobilefacenet')

Combine steps to extract features from faces in a video file.

Parameters
  • num_faces (int, optional) – Number of faces to identify.

  • min_face_size (int, default=20) – Minimum size required for detected faces (in pixels).

  • thresholds (tuple, default=(0.6, 0.7, 0.7)) – Face detection thresholds.

  • factor (float, default=0.709) – Factor used to create a scaling pyramid of face sizes.

  • post_process (bool, default=True) – Whether detected faces are post processed before computing embeddings.

  • select_largest (bool, default=True) – Whether to return the largest face or the one with the highest probability if multiple faces are detected.

  • selection_method ({None, 'probability', 'largest', 'largest_over_threshold', 'center_weighted_size'}, optional, default=None) – The heuristic used for selecting detected faces. If not None, overrides select_largest.

  • keep_all (bool, default=True) – Whether all faces should be returned in the order of select_largest.

  • device (torch.device, optional, default=None) – The device on which face detection and embedding computations are performed.

  • embeddings_model ({'vggface2', 'casia-webface'}, default='vggface2') – Pretrained Inception Resnet V1 model for computing face embeddings.

  • au_model ({'xgb', 'svm'}, default='xgb') – Pretrained model for predicting facial action unit activations.

  • landmark_model ({'mobilefacenet', 'mobilenet', 'pfld'}, default='mobilefacenet') – Pretrained model for detecting facial landmarks.
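To illustrate how min_face_size and factor interact, the sketch below builds a scaling pyramid the way MTCNN-style detectors commonly do: the image is repeatedly downscaled by factor until its shorter side falls below the 12-pixel input size of the first detection stage. The function scale_pyramid is hypothetical and written for illustration; it is an assumption about the general technique, not code taken from mexca or facenet-pytorch.

```python
from typing import List


def scale_pyramid(
    height: int,
    width: int,
    min_face_size: int = 20,
    factor: float = 0.709,
) -> List[float]:
    """Return the image scales of a face detection pyramid."""
    if not 0 < factor < 1:
        raise ValueError("factor must lie strictly between 0 and 1")
    # At the first scale, a face of min_face_size pixels maps onto the
    # 12-pixel window of the first detection stage.
    scale = 12.0 / min_face_size
    min_side = min(height, width) * scale
    scales = []
    # Shrink the image until the window no longer fits into it.
    while min_side >= 12:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


scales = scale_pyramid(height=240, width=240)
print(len(scales))
```

A smaller factor produces fewer pyramid levels (faster, but coarser in face size), whereas a larger min_face_size lowers the first scale and skips small faces.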

Notes

For details on the available pretrained models for facial action unit and landmark detection, see the documentation of py-feat. The pretrained action unit models return different outputs: ‘xgb’ returns continuous values between 0 and 1, whereas ‘svm’ returns binary values (0 or 1).
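If continuous ‘xgb’ activations need to be compared with binary ‘svm’ activations, one simple option is to threshold them. The helper binarize_activations and the cutoff of 0.5 are assumptions made for this sketch; thresholding will not necessarily reproduce the output of the ‘svm’ model.

```python
from typing import List


def binarize_activations(activations: List[float], threshold: float = 0.5) -> List[int]:
    """Map continuous action unit activations in [0, 1] to binary values."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must lie between 0 and 1")
    # Activations at or above the threshold count as active (1).
    return [int(a >= threshold) for a in activations]


binary = binarize_activations([0.1, 0.5, 0.93])
print(binary)  # [0, 1, 1]
```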

property detector: facenet_pytorch.MTCNN

The MTCNN model for face detection and extraction. See facenet-pytorch for details.

property encoder: facenet_pytorch.InceptionResnetV1

The Inception Resnet V1 model for computing face embeddings. See facenet-pytorch for details.

property clusterer: spectralcluster.SpectralClusterer

The spectral clustering model for identifying faces based on embeddings. See spectralcluster for details.

property extractor: feat.detector.Detector

The model for extracting facial landmarks and action units. See py-feat for details.

__call__(**callargs) → mexca.data.VideoAnnotation

Alias for apply.

detect(frame: Union[numpy.ndarray, torch.Tensor]) → Tuple[List[torch.Tensor], Union[List[numpy.ndarray], numpy.ndarray], Union[List[numpy.ndarray], numpy.ndarray]]

Detect faces in a video frame.

Parameters

frame (numpy.ndarray or torch.Tensor) – Batch of B frames containing RGB values with dimensions (B, H, W, 3).

Returns

  • faces (list) – Batch of B tensors containing the N cropped face images from each batched frame with dimensions (N, 3, 160, 160). Is None if a frame contains no faces.

  • boxes (numpy.ndarray or list) – Batch of B bounding boxes of the N detected faces as (x1, y1, x2, y2) coordinates with dimensions (B, N, 4). Returns a list if different numbers of faces are detected across batched frames. Is None if a frame contains no faces.

  • probs (numpy.ndarray or list) – Probabilities of the detected faces (B, N). Returns a list if different numbers of faces are detected across batched frames. Is None if a frame contains no faces.
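Because the returned entries can be None for frames without faces and the number of faces can differ between frames, downstream code has to handle both cases. The sketch below picks the largest face per frame from (x1, y1, x2, y2) boxes. The helpers box_area and largest_face_per_frame and the example boxes are hypothetical; plain lists stand in for the arrays that detect returns.

```python
from typing import List, Optional, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2)


def box_area(box: Box) -> float:
    """Area of a bounding box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def largest_face_per_frame(
    boxes: Sequence[Optional[Sequence[Box]]],
) -> List[Optional[int]]:
    """Return the index of the largest box per frame, or None without faces."""
    result: List[Optional[int]] = []
    for frame_boxes in boxes:
        # Frames without detected faces are represented by None.
        if frame_boxes is None or len(frame_boxes) == 0:
            result.append(None)
            continue
        areas = [box_area(b) for b in frame_boxes]
        result.append(areas.index(max(areas)))
    return result


# Three frames: two faces, no faces, one face.
batch_boxes = [
    [(10, 10, 50, 50), (0, 0, 100, 80)],
    None,
    [(5, 5, 25, 45)],
]
largest = largest_face_per_frame(batch_boxes)
print(largest)  # [1, None, 0]
```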

encode(faces: torch.Tensor) → numpy.ndarray

Compute embeddings for face images.

Parameters

faces (torch.Tensor) – Cropped N face images from a video frame with dimensions (N, 3, H, W). H and W must be at least 80 for the encoding to work.

Returns

Embeddings of the N face images with dimensions (N, 512).

Return type

numpy.ndarray

identify(embeddings: numpy.ndarray) → numpy.ndarray

Cluster faces based on their embeddings.

Parameters

embeddings (numpy.ndarray) – Embeddings of the N face images with dimensions (N, E) where E is the length of the embedding vector.

Returns

Cluster indices for the N face embeddings.

Return type

numpy.ndarray
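Spectral clustering operates on a pairwise affinity matrix of the embeddings. A common choice for embedding vectors is cosine similarity rescaled to the range 0 to 1, which the sketch below computes in plain Python. The functions cosine_affinity and affinity_matrix are hypothetical, and whether the clusterer used by mexca is configured in exactly this way is an assumption.

```python
import math
from typing import List, Sequence


def cosine_affinity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors, rescaled from [-1, 1] to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must not be zero vectors")
    cosine = dot / (norm_a * norm_b)
    return (cosine + 1.0) / 2.0


def affinity_matrix(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise affinities for N embeddings as an N x N nested list."""
    return [[cosine_affinity(a, b) for b in embeddings] for a in embeddings]


# Toy 2-dimensional "embeddings": the second points in the opposite
# direction of the first, the third is orthogonal to the first.
toy_embeddings = [[3.0, 4.0], [-3.0, -4.0], [4.0, -3.0]]
affinities = affinity_matrix(toy_embeddings)
```

Embeddings of the same face should yield affinities close to 1, so they end up in the same cluster.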

extract(frame: Union[numpy.ndarray, torch.Tensor], boxes: Union[List[numpy.ndarray], numpy.ndarray]) → Tuple[List[List[numpy.ndarray]], List[numpy.ndarray]]

Detect facial action units and landmarks.

Parameters
  • frame (numpy.ndarray or torch.Tensor) – Batch of B frames containing RGB values with dimensions (B, H, W, 3).

  • boxes (numpy.ndarray or list) – Batch of B bounding boxes of the N detected faces as (x1, y1, x2, y2) coordinates with dimensions (B, N, 4) or list of B elements with (N, 4).

Returns

  • landmarks (list) – Batch of B facial landmarks for N detected faces as (x, y) coordinates with dimensions (N, 68, 2). Is None if a frame contains no faces.

  • aus (list) – Batch of B action unit activations for N detected faces with dimensions (N, 20). Is None if a frame contains no faces.

compute_confidence(embeddings: numpy.ndarray, labels: numpy.ndarray) → numpy.ndarray

Compute face label classification confidence.

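Parameters
  • embeddings (numpy.ndarray) – Embeddings of the N face images with dimensions (N, E).

  • labels (numpy.ndarray) – Cluster labels assigned to the N face embeddings.

Returns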

confidence – Confidence scores between 0 and 1. Returns numpy.nan if no label was assigned to a face.

Return type

numpy.ndarray

apply(filepath: str, batch_size: int = 1, skip_frames: int = 1, process_subclip: Tuple[Optional[float]] = (0, None), show_progress: bool = True) → mexca.data.VideoAnnotation

Apply multiple steps to extract features from faces in a video file.

This method successively calls the other methods on each batch of frames of a video file to detect and cluster faces. It also extracts facial landmarks and action units.

Parameters
  • filepath (str) – Path to the video file.

  • batch_size (int, default=1) – Size of the batch of video frames that are loaded and processed at the same time.

  • skip_frames (int, default=1) – Only process every nth frame, starting at 0.

  • process_subclip (tuple, default=(0, None)) – Process only a part of the video clip. Must be the start and end of the subclip in seconds.

  • show_progress (bool, default=True) – Enables the display of a progress bar.

Returns

A data class object with extracted facial features.

Return type

VideoAnnotation
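A minimal usage sketch of apply, using only the parameters documented above. The helper validate_subclip, the wrapper extract_features, and the chosen argument values (two faces, batches of five frames, every fifth frame) are hypothetical and only illustrate one way to call the method; running extract_features requires mexca to be installed and a real video file.

```python
from typing import Optional, Tuple


def validate_subclip(
    process_subclip: Tuple[Optional[float], Optional[float]],
) -> Tuple[float, Optional[float]]:
    """Hypothetical helper: check (start, end) in seconds before processing."""
    start, end = process_subclip
    start = 0.0 if start is None else float(start)
    if start < 0:
        raise ValueError("start must not be negative")
    if end is not None and end <= start:
        raise ValueError("end must be greater than start")
    return start, end


def extract_features(filepath: str, subclip=(0, None)):
    """Run the face feature extraction steps on (a part of) a video file."""
    # Imported lazily so that validate_subclip works without mexca installed.
    from mexca.video import FaceExtractor

    extractor = FaceExtractor(num_faces=2)
    return extractor.apply(
        filepath,
        batch_size=5,
        skip_frames=5,
        process_subclip=validate_subclip(subclip),
        show_progress=False,
    )


# Only the validation step is executed here; extract_features would be
# called with the path to a real video file.
subclip = validate_subclip((1, 10))
```

Larger values of batch_size and skip_frames reduce processing time at the cost of memory and temporal resolution, respectively.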

mexca.video.cli()

Command line interface for extracting facial features. See extract-faces -h for details.