mexca.video
Facial feature extraction from videos.
Module Contents
Classes
VideoDataset – Custom torch dataset for a video file.
FaceExtractor – Combine steps to extract features from faces in a video file.
Functions
Command line interface for extracting facial features.
Attributes
Value that is returned if no faces are detected in a video frame.
- exception mexca.video.NotEnoughFacesError(msg: str)
Bases: Exception
Fewer faces were detected than num_faces.
Clustering cannot be performed when there are fewer samples than clusters.
- Parameters:
msg (str) – Error message.
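The condition behind this error can be sketched as follows. This is a minimal illustration, not the library's code: the class below is a local stand-in for the documented exception, and `check_enough_faces` is a hypothetical helper introduced only for this example.

```python
class NotEnoughFacesError(Exception):
    """Local stand-in mirroring the documented exception (not imported from mexca)."""

    def __init__(self, msg: str):
        self.message = msg
        super().__init__(msg)


def check_enough_faces(n_detected: int, num_faces: int) -> None:
    """Hypothetical helper: raise if there are fewer samples than clusters."""
    if n_detected < num_faces:
        raise NotEnoughFacesError(
            f"Detected {n_detected} faces but num_faces={num_faces}; "
            "clustering needs at least as many samples as clusters."
        )


# Enough samples: no error is raised.
check_enough_faces(n_detected=3, num_faces=2)

# Too few samples: the error is raised and can be handled by the caller.
try:
    check_enough_faces(n_detected=1, num_faces=2)
except NotEnoughFacesError as err:
    print(f"Caught: {err}")
```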
- class mexca.video.VideoDataset(video_file: str, skip_frames: int = 1, start: float = 0, end: Optional[float] = None)
Bases: torch.utils.data.Dataset
Custom torch dataset for a video file.
When initialized, reads only the frame timestamps of the video, not the frames themselves. The video is decoded frame by frame.
- Parameters:
- video_pts
Timestamps of video frames.
- Type:
- video_frames_idx
Indices of video frames.
- Type:
- video_frames
Indices of loaded frames.
- Type:
- __getitem__(idx: int) → Dict[str, torch.Tensor]
Get an item from the data set.
Loads the video frame into memory.
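The lazy-loading behaviour described for this class can be illustrated with a small, dependency-free sketch. Everything here is an assumption made for illustration: `LazyFrameDataset`, the `decode` callback, the dictionary keys, and the inclusive treatment of `end` are hypothetical and do not reproduce the actual implementation.

```python
from typing import Callable, Dict, List, Optional


class LazyFrameDataset:
    """Hypothetical stand-in: index frames up front, decode only on access."""

    def __init__(
        self,
        timestamps: List[float],
        decode: Callable[[int], object],
        skip_frames: int = 1,
        start: float = 0.0,
        end: Optional[float] = None,
    ):
        if skip_frames < 1:
            raise ValueError("skip_frames must be at least 1")
        # Only timestamps are inspected here; no frame is decoded yet.
        in_range = [
            i
            for i, t in enumerate(timestamps)
            if t >= start and (end is None or t <= end)
        ]
        self.frame_idx = in_range[::skip_frames]
        self.timestamps = timestamps
        self._decode = decode

    def __len__(self) -> int:
        return len(self.frame_idx)

    def __getitem__(self, idx: int) -> Dict[str, object]:
        # The frame is decoded (loaded into memory) only at this point.
        frame_no = self.frame_idx[idx]
        return {
            "Frame": frame_no,
            "Time": self.timestamps[frame_no],
            "Image": self._decode(frame_no),
        }


# Usage with a fake decoder that records which frames were requested.
decoded: List[int] = []


def fake_decode(frame_no: int) -> str:
    decoded.append(frame_no)
    return f"frame-{frame_no}"


dataset = LazyFrameDataset([i / 10 for i in range(10)], fake_decode, skip_frames=3)
item = dataset[1]
print(len(dataset), item["Frame"], decoded)
```

Keeping decoding inside `__getitem__` means memory use scales with the batch size rather than with the length of the video.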
- class mexca.video.FaceExtractor(num_faces: Optional[int], min_face_size: int = 20, thresholds: Tuple[float] = (0.6, 0.7, 0.7), factor: float = 0.709, post_process: bool = True, select_largest: bool = True, selection_method: Optional[str] = None, keep_all: bool = True, device: Optional[torch.device] = None, max_cluster_frames: Optional[int] = None, embeddings_model: str = 'vggface2', au_model: str = 'xgb', landmark_model: str = 'mobilefacenet')
Combine steps to extract features from faces in a video file.
- Parameters:
num_faces (int, optional) – Number of faces to identify.
min_face_size (int, default=20) – Minimum size required for detected faces (in pixels).
thresholds (tuple, default=(0.6, 0.7, 0.7)) – Face detection thresholds.
factor (float, default=0.709) – Factor used to create a scaling pyramid of face sizes.
post_process (bool, default=True) – Whether detected faces are post processed before computing embeddings.
select_largest (bool, default=True) – Whether to return the largest face or the one with the highest probability if multiple faces are detected.
selection_method ({None, 'probability', 'largest', 'largest_over_threshold', 'center_weighted_size'}, optional, default=None) – The heuristic used for selecting detected faces. If not None, overrides select_largest.
keep_all (bool, default=True) – Whether all faces should be returned in the order of select_largest.
device (torch.device, optional, default=None) – The device on which face detection and embedding computations are performed.
max_cluster_frames (int, optional, default=None) – Maximum number of frames that are used for spectral clustering. If the number of frames exceeds the maximum, hierarchical clustering is applied first to reduce the frames to this number. This can reduce the computational costs for long videos.
embeddings_model ({'vggface2', 'casia-webface'}, default='vggface2') – Pretrained Inception Resnet V1 model for computing face embeddings.
au_model ({'xgb', 'svm'}, default='xgb') – Pretrained model for predicting facial action unit activations.
landmark_model ({'mobilefacenet', 'mobilenet', 'pfld'}, default='mobilefacenet') – Pretrained model for detecting facial landmarks.
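To make the interaction between min_face_size and factor more concrete, the sketch below computes a scaling pyramid the way MTCNN-style detectors commonly do. It is an assumption about how the underlying detector uses these two parameters, not code taken from this module; `pyramid_scales` is a hypothetical name, and the constant 12 reflects the 12x12 input size of MTCNN's first-stage network.

```python
from typing import List


def pyramid_scales(
    height: int, width: int, min_face_size: int = 20, factor: float = 0.709
) -> List[float]:
    """Hypothetical helper: image scales for an MTCNN-style scaling pyramid."""
    first_stage_input = 12  # side length of the first-stage network input
    # At the first scale, a face of min_face_size pixels maps onto 12 pixels.
    scale = first_stage_input / min_face_size
    min_side = min(height, width) * scale
    scales = []
    # Shrink the image by `factor` until it is smaller than the network input.
    while min_side >= first_stage_input:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


for s in pyramid_scales(480, 640):
    print(f"{s:.4f}")
```

Under this scheme, a larger min_face_size or a smaller factor yields fewer pyramid levels and therefore less computation, at the cost of missing small faces or intermediate sizes.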
Notes
For details on the available pretrained models for facial action unit and landmark detection, see the documentation of py-feat. The pretrained action unit models return different outputs: ‘xgb’ returns continuous values (0-1), whereas ‘svm’ returns binary (0, 1) values.
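Because the two action unit models produce outputs on different scales, downstream code may need to bring them onto a common one. The sketch below binarizes continuous activations; the helper name `binarize` and the 0.5 cutoff are assumptions chosen for illustration and are not prescribed by the library.

```python
from typing import List, Sequence


def binarize(activations: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Hypothetical helper: map continuous activations (0-1) to binary values."""
    return [1 if a >= threshold else 0 for a in activations]


# Made-up continuous activations, as an 'xgb'-style model might return them.
continuous = [0.10, 0.50, 0.93]
print(binarize(continuous))
```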
- property detector: facenet_pytorch.MTCNN
The MTCNN model for face detection and extraction. See facenet-pytorch for details.
- property encoder: facenet_pytorch.InceptionResnetV1
The Inception Resnet V1 model for computing face embeddings. See facenet-pytorch for details.
- property clusterer: spectralcluster.SpectralClusterer
The spectral clustering model for identifying faces based on embeddings. See spectralcluster for details.
- property extractor: feat.detector.Detector
The model for extracting facial landmarks and action units. See py-feat for details.
- __call__(**callargs) → mexca.data.VideoAnnotation
Alias for apply.
- detect(frame: Union[numpy.ndarray, torch.Tensor]) → Tuple[List[torch.Tensor], Union[List[numpy.ndarray], numpy.ndarray], Union[List[numpy.ndarray], numpy.ndarray]]
Detect faces in a video frame.
- Parameters:
frame (numpy.ndarray or torch.Tensor) – Batch of B frames containing RGB values with dimensions (B, H, W, 3).
- Returns:
faces (list) – Batch of B tensors containing the N cropped face images from each batched frame with dimensions (N, 3, 160, 160). Is None if a frame contains no faces.
boxes (numpy.ndarray or list) – Batch of B bounding boxes of the N detected faces as (x1, y1, x2, y2) coordinates with dimensions (B, N, 4). Returns a list if different numbers of faces are detected across batched frames. Is None if a frame contains no faces.
probs (numpy.ndarray or list) – Probabilities of the detected faces (B, N). Returns a list if different numbers of faces are detected across batched frames. Is None if a frame contains no faces.
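Since the entries for frames without faces are None, callers have to guard against that before indexing. The following sketch shows one way to do so on the boxes output, using plain Python tuples as stand-ins for the arrays; `faces_per_frame` and `box_areas` are hypothetical helpers and the coordinates are made up.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def faces_per_frame(boxes: Sequence[Optional[Sequence[Box]]]) -> List[int]:
    """Hypothetical helper: count detected faces, treating None as zero faces."""
    return [0 if frame_boxes is None else len(frame_boxes) for frame_boxes in boxes]


def box_areas(frame_boxes: Optional[Sequence[Box]]) -> List[float]:
    """Hypothetical helper: area in pixels of each box in a single frame."""
    if frame_boxes is None:
        return []
    return [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in frame_boxes]


# A batch of three frames: two faces, no faces, one face.
batch_boxes = [
    [(10, 20, 50, 80), (100, 40, 160, 120)],
    None,
    [(5, 5, 45, 60)],
]
print(faces_per_frame(batch_boxes))
print([box_areas(frame_boxes) for frame_boxes in batch_boxes])
```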
- encode(faces: torch.Tensor) → numpy.ndarray
Compute embeddings for face images.
- Parameters:
faces (torch.Tensor) – Cropped N face images from a video frame with dimensions (N, 3, H, W). H and W must be at least 80 for the encoding to work.
- Returns:
Embeddings of the N face images with dimensions (N, 512).
- Return type: numpy.ndarray
- identify(embeddings: numpy.ndarray) → numpy.ndarray
Cluster faces based on their embeddings.
- Parameters:
embeddings (numpy.ndarray) – Embeddings of the N face images with dimensions (N, E) where E is the length of the embedding vector.
- Returns:
Cluster indices for the N face embeddings.
- Return type: numpy.ndarray
- extract(frame: Union[numpy.ndarray, torch.Tensor], boxes: Union[List[numpy.ndarray], numpy.ndarray]) → Tuple[List[List[numpy.ndarray]], List[numpy.ndarray]]
Detect facial action units and landmarks.
- Parameters:
frame (numpy.ndarray or torch.Tensor) – Batch of B frames containing RGB values with dimensions (B, H, W, 3).
boxes (numpy.ndarray or list) – Batch of B bounding boxes of the N detected faces as (x1, y1, x2, y2) coordinates with dimensions (B, N, 4) or list of B elements with (N, 4).
- Returns:
landmarks (list) – Batch of B facial landmarks for N detected faces as (x, y) coordinates with dimensions (68, 2). Is None if a frame contains no faces.
aus (list) – Batch of B action unit activations for N detected faces with dimensions (N, 20). Is None if a frame contains no faces.
- compute_confidence(embeddings: numpy.ndarray, labels: numpy.ndarray) → numpy.ndarray
Compute face label classification confidence.
- Parameters:
embeddings (numpy.ndarray) – Face embeddings.
labels (numpy.ndarray) – Face labels.
- Returns:
confidence – Confidence scores between 0 and 1. Returns numpy.nan if no label was assigned to a face.
- Return type: numpy.ndarray
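Because faces without a label receive numpy.nan, summary statistics over the confidence scores need to skip those entries. The sketch below does this with the standard library only; `mean_confidence` is a hypothetical helper and the scores are made up.

```python
import math
from typing import Sequence


def mean_confidence(scores: Sequence[float]) -> float:
    """Hypothetical helper: average confidence, ignoring NaN entries."""
    valid = [s for s in scores if not math.isnan(s)]
    # With no labelled faces there is nothing to average, so return NaN.
    return sum(valid) / len(valid) if valid else math.nan


# The second face has no label, so its score is NaN and is skipped.
print(mean_confidence([0.8, math.nan, 0.6]))
```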
- apply(filepath: str, batch_size: int = 1, skip_frames: int = 1, process_subclip: Tuple[Optional[float]] = (0, None), show_progress: bool = True) → mexca.data.VideoAnnotation
Apply multiple steps to extract features from faces in a video file.
This method successively calls the other methods on each frame of a video file to detect and cluster faces. It also extracts facial landmarks and action units.
- Parameters:
filepath (str) – Path to the video file.
batch_size (int, default=1) – Size of the batch of video frames that are loaded and processed at the same time.
skip_frames (int, default=1) – Only process every nth frame, starting at 0.
process_subclip (tuple, default=(0, None)) – Process only a part of the video clip. Must be the start and end of the subclip in seconds.
show_progress (bool, default=True) – Enables the display of a progress bar.
- Returns:
A data class object with extracted facial features.
- Return type: mexca.data.VideoAnnotation
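How skip_frames and batch_size interact can be sketched without the library. The helper below is hypothetical and only mirrors the documented semantics (every nth frame, starting at 0, grouped into batches); it is not the method's actual implementation.

```python
from typing import List


def frame_batches(
    n_frames: int, batch_size: int = 1, skip_frames: int = 1
) -> List[List[int]]:
    """Hypothetical helper: frame indices processed together in each batch."""
    if batch_size < 1 or skip_frames < 1:
        raise ValueError("batch_size and skip_frames must be at least 1")
    # Keep every nth frame, starting at frame 0.
    selected = list(range(0, n_frames, skip_frames))
    # Group the selected frames into consecutive batches.
    return [
        selected[i : i + batch_size] for i in range(0, len(selected), batch_size)
    ]


print(frame_batches(10, batch_size=2, skip_frames=3))
```

Larger batches reduce per-call overhead but hold more frames in memory at once, while a larger skip_frames lowers the temporal resolution of the extracted features.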