mexca.video.extraction
Facial feature extraction from videos.
Module Contents
Classes
VideoDataset | Custom torch dataset for a video file.
FaceExtractor | Combine steps to extract features from faces in a video file.
Functions
Command line interface for extracting facial features.
- exception mexca.video.extraction.NotEnoughFacesError(msg: str)
Fewer faces were detected than num_faces.
Clustering cannot be performed when there are fewer samples than clusters.
- Parameters:
msg (str) – Error message.
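The condition behind this error can be sketched as a simple guard. The class below is a local stand-in defined only for illustration (it is not imported from mexca), and check_enough_faces is a hypothetical helper, not part of the documented API.

```python
class NotEnoughFacesError(Exception):
    """Local stand-in for the documented exception (not imported from mexca)."""

    def __init__(self, msg: str):
        self.msg = msg
        super().__init__(msg)


def check_enough_faces(num_detected: int, num_faces: int) -> None:
    """Hypothetical guard: raise if there are fewer samples than clusters."""
    if num_detected < num_faces:
        raise NotEnoughFacesError(
            f"Detected {num_detected} faces but num_faces={num_faces}; "
            "cannot cluster fewer samples than clusters."
        )


# One detected face cannot be split into two clusters.
try:
    check_enough_faces(num_detected=1, num_faces=2)
except NotEnoughFacesError as err:
    message = err.msg
```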
- class mexca.video.extraction.VideoDataset(video_file: str, skip_frames: int = 1, start: float = 0, end: float | None = None)
Custom torch dataset for a video file.
Only reads the frame timestamps of the video but not the frames themselves when initialized. Decodes the video frame-by-frame.
- Parameters:
- video_pts
Timestamps of video frames.
- Type:
- video_frames_idx
Indices of video frames.
- Type:
- video_frames
Indices of loaded frames.
- Type:
- __getitem__(idx: int) -> Dict[str, torch.Tensor]
Get an item from the data set.
Loads the video frame into memory.
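Because only the timestamps are read at initialization, the set of frames to decode can be decided before any frame is loaded. The following is a minimal sketch of that selection, not the class's actual implementation. The helper select_frame_indices is hypothetical, and it assumes that timestamps are in seconds, that end is inclusive, and that every skip_frames-th frame is counted from index 0 of the whole video.

```python
from typing import List, Optional


def select_frame_indices(
    video_pts: List[float],
    skip_frames: int = 1,
    start: float = 0.0,
    end: Optional[float] = None,
) -> List[int]:
    """Hypothetical helper: pick frame indices from timestamps alone."""
    if skip_frames < 1:
        raise ValueError("skip_frames must be >= 1")
    selected = []
    for idx, pts in enumerate(video_pts):
        if idx % skip_frames != 0:
            continue  # keep only every skip_frames-th frame
        if pts < start:
            continue  # before the requested window
        if end is not None and pts > end:
            continue  # after the requested window (end assumed inclusive)
        selected.append(idx)
    return selected


# Ten frames at 5 frames per second: 0.0, 0.2, ..., 1.8 seconds.
timestamps = [i / 5 for i in range(10)]
indices = select_frame_indices(timestamps, skip_frames=2, start=0.4, end=1.2)
```

Frames are then decoded only for the selected indices, for example when __getitem__ is called.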
- class mexca.video.extraction.FaceExtractor(num_faces: int | None, min_face_size: int = 20, thresholds: Tuple[float] = (0.6, 0.7, 0.7), factor: float = 0.709, post_process: bool = True, select_largest: bool = True, selection_method: str | None = 'num_faces', keep_all: bool = True, device: torch.device = torch.device(type='cpu'), clusterer: sklearn.base.ClusterMixin | None = None, embeddings_model: str = 'vggface2', post_min_face_size: Tuple[float, float] = (45.0, 45.0), au_model: str | None = None)
Combine steps to extract features from faces in a video file.
- Parameters:
num_faces (int, optional) – Number of faces to identify. Must not be None if clusterer=None.
min_face_size (int, default=20) – Minimum face size (in pixels) for face detection in MTCNN.
thresholds (tuple, default=(0.6, 0.7, 0.7)) – Thresholds for face detection in MTCNN.
factor (float, default=0.709) – Factor used to create a scaling pyramid of face sizes in MTCNN.
post_process (bool, default=True) – Whether detected faces are post processed before computing embeddings. The post processing standardizes the detected faces.
select_largest (bool, default=True) – Whether to return the largest face or the one with the highest probability if multiple faces are detected.
selection_method ({None, 'num_faces', 'probability', 'largest', 'largest_over_threshold', 'center_weighted_size'}, optional, default='num_faces') – The heuristic used for selecting detected faces. If not None, overrides select_largest. The default, num_faces, selects at most num_faces faces per frame.
keep_all (bool, default=True) – Whether all faces should be returned, in the order defined by select_largest.
device (torch.device, optional, default=torch.device("cpu")) – The device on which face detection and embedding computations are performed.
clusterer (sklearn.base.ClusterMixin, optional, default=None) – Class instance from sklearn.cluster used for clustering face embeddings. If None (default), creates a sklearn.cluster.SpectralClustering instance with n_clusters=num_faces. For large datasets, sklearn.cluster.KMeans is recommended to avoid memory issues.
embeddings_model ({'vggface2', 'casia-webface'}, default='vggface2') – Pretrained Inception Resnet V1 model for computing face embeddings.
post_min_face_size (tuple, default=(45.0, 45.0)) – Minimal width and height (in pixels) for filtering out faces after detection. This can be useful to exclude small faces before clustering their embeddings and can improve clustering performance.
au_model (str, optional, default=None) – Pretrained MEFARG model on Hugging Face Hub for extracting facial action unit activations. If None, uses the default model mexca/mefarg-open-graph-au-resnet50-stage-2.
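The effect of post_min_face_size can be illustrated with a small filter over bounding boxes. This is a sketch only: filter_small_faces is a hypothetical helper, and whether the library compares with "greater than or equal" or strictly "greater than" is an assumption here.

```python
import numpy as np


def filter_small_faces(boxes: np.ndarray, post_min_face_size=(45.0, 45.0)) -> np.ndarray:
    """Hypothetical helper: boolean mask of boxes meeting the minimum size.

    `boxes` has shape (N, 4) with (x1, y1, x2, y2) coordinates in pixels.
    """
    min_width, min_height = post_min_face_size
    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    # Assumption: a face is kept when both sides reach the minimum (>=).
    return (widths >= min_width) & (heights >= min_height)


boxes = np.array([
    [10.0, 10.0, 110.0, 130.0],  # 100 x 120 pixels: kept
    [200.0, 50.0, 230.0, 95.0],  # 30 x 45 pixels: too narrow
    [300.0, 20.0, 360.0, 60.0],  # 60 x 40 pixels: too short
])
mask = filter_small_faces(boxes)
kept = boxes[mask]
```

Only the embeddings of the kept faces would then be passed on to clustering.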
- property detector: facenet_pytorch.MTCNN
The MTCNN model for face detection and extraction. See facenet-pytorch for details.
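To see how min_face_size and factor interact, the scaling pyramid can be sketched as below. This follows the commonly described MTCNN procedure, in which the first network scans 12 x 12 pixel windows; the function pyramid_scales is a hypothetical illustration, and the facenet-pytorch implementation may differ in detail.

```python
from typing import List


def pyramid_scales(
    height: int, width: int, min_face_size: int = 20, factor: float = 0.709
) -> List[float]:
    """Hypothetical sketch: scales at which a frame is resized for detection."""
    # Scale the frame so that a face of min_face_size pixels maps to 12 pixels.
    scale = 12.0 / min_face_size
    min_side = min(height, width) * scale
    scales = []
    # Shrink by `factor` until the shorter side no longer fits a 12-pixel window.
    while min_side >= 12:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


scales = pyramid_scales(height=480, width=640)
```

A larger min_face_size lowers the first scale and shortens the pyramid, which reduces computation at the cost of missing small faces.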
- property encoder: facenet_pytorch.InceptionResnetV1
The Inception Resnet V1 model for computing face embeddings. See facenet-pytorch for details.
- property extractor: mexca.video.mefarg.MEFARG
The MEFARG model for extracting action unit activations. See ME-GraphAU model and paper for details.
- __call__(**callargs) -> mexca.data.VideoAnnotation
Alias for apply.
- detect(frame: numpy.ndarray | torch.Tensor) -> Tuple[List[torch.Tensor], List[numpy.ndarray] | numpy.ndarray, List[numpy.ndarray] | numpy.ndarray, List[numpy.ndarray] | numpy.ndarray]
Detect faces in a video frame.
- Parameters:
frame (numpy.ndarray or torch.Tensor) – Batch of B frames containing RGB values with dimensions (B, H, W, 3).
- Returns:
faces (list) – Batch of B tensors containing the N cropped face images from each batched frame with dimensions (N, 3, 160, 160). Is None if a frame contains no faces.
boxes (numpy.ndarray or list) – Batch of B bounding boxes of the N detected faces as (x1, y1, x2, y2) coordinates with dimensions (B, N, 4). Returns a list if different numbers of faces are detected across batched frames. Is None if a frame contains no faces.
probs (numpy.ndarray or list) – Probabilities of the detected faces with dimensions (B, N). Returns a list if different numbers of faces are detected across batched frames. Is None if a frame contains no faces.
landmarks (numpy.ndarray or list) – Batch of B facial landmarks for N detected faces as (x, y) coordinates with dimensions (5, 2). Is None if a frame contains no faces.
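Because each returned element can be None for a frame without faces, downstream code needs to skip those entries. The sketch below uses plain Python lists as stand-ins for the probs output; flatten_detections is a hypothetical helper and the values are made up for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def flatten_detections(
    probs: Sequence[Optional[Sequence[float]]],
) -> List[Tuple[int, int, float]]:
    """Hypothetical helper: (frame index, face index, probability) records."""
    records = []
    for frame_idx, frame_probs in enumerate(probs):
        if frame_probs is None:
            continue  # no faces were detected in this frame
        for face_idx, prob in enumerate(frame_probs):
            records.append((frame_idx, face_idx, float(prob)))
    return records


# Stand-in for a batch of three frames: two faces, no faces, one face.
batch_probs = [[0.99, 0.87], None, [0.95]]
records = flatten_detections(batch_probs)
```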
- encode(faces: torch.Tensor) -> numpy.ndarray
Compute embeddings for face images.
- Parameters:
faces (torch.Tensor) – Cropped N face images from a video frame with dimensions (N, 3, H, W). H and W must be at least 80 for the encoding to work.
- Returns:
Embeddings of the N face images with dimensions (N, 512).
- Return type:
numpy.ndarray
- identify(embeddings: numpy.ndarray) -> numpy.ndarray
Cluster faces based on their embeddings.
- Parameters:
embeddings (numpy.ndarray) – Embeddings of the N face images with dimensions (N, E) where E is the length of the embedding vector.
- Returns:
Cluster indices for the N face embeddings.
- Return type:
numpy.ndarray
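The clustering step can be tried in isolation with scikit-learn. This sketch uses sklearn.cluster.KMeans, which the clusterer parameter recommends for large datasets, on synthetic stand-ins for face embeddings. It does not call mexca itself, and the data are random values chosen only so that two groups are clearly separated.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-ins for (N, 512) face embeddings of two different people.
person_a = rng.normal(loc=0.0, scale=0.05, size=(20, 512))
person_b = rng.normal(loc=1.0, scale=0.05, size=(20, 512))
embeddings = np.vstack([person_a, person_b])

# One cluster per face to identify, mirroring n_clusters=num_faces.
num_faces = 2
clusterer = KMeans(n_clusters=num_faces, n_init=10, random_state=0)

# fit_predict returns one cluster index per embedding, with shape (N,).
labels = clusterer.fit_predict(embeddings)
```

An instance configured this way could presumably be passed as the clusterer argument of FaceExtractor in place of the default spectral clustering.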
- extract(frame: numpy.ndarray | torch.Tensor) -> List[numpy.ndarray] | numpy.ndarray
Detect facial action unit activations.
- Parameters:
frame (numpy.ndarray or torch.Tensor) – Batch of B frames containing RGB values with dimensions (B, H, W, 3).
- Returns:
aus – Batch of B action unit activations for N detected faces with dimensions (N, 41). Is None if a frame contains no faces.
- Return type:
list or numpy.ndarray
- compute_avg_embeddings(embeddings: numpy.ndarray, labels: numpy.ndarray) -> dict
Compute the average embedding vector for each face detected in the video.
- Parameters:
embeddings (numpy.ndarray) – Face embeddings.
labels (numpy.ndarray) – Face labels.
- Returns:
average embedding dictionary – Dictionary with keys representing face labels and values representing the average embedding vector for each face label.
- Return type:
dict
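The averaging can be sketched with NumPy as follows. This is an illustrative re-implementation, not the library's code: average_embeddings is a hypothetical name, and it assumes that faces without a label are marked with numpy.nan and are left out of the averages.

```python
import numpy as np


def average_embeddings(embeddings: np.ndarray, labels: np.ndarray) -> dict:
    """Hypothetical sketch: mean embedding vector per face label."""
    averages = {}
    # Assumption: unlabeled faces carry NaN and are ignored.
    for label in np.unique(labels[~np.isnan(labels)]):
        averages[int(label)] = embeddings[labels == label].mean(axis=0)
    return averages


# Four tiny stand-in embeddings; the last one has no label.
embeddings = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [9.0, 9.0]])
labels = np.array([0.0, 0.0, 1.0, np.nan])
avg = average_embeddings(embeddings, labels)
```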
- compute_confidence(embeddings: numpy.ndarray, labels: numpy.ndarray) -> numpy.ndarray
Compute face label classification confidence.
- Parameters:
embeddings (numpy.ndarray) – Face embeddings.
labels (numpy.ndarray) – Face labels.
- Returns:
confidence – Confidence scores between 0 and 1. Returns numpy.nan if no label was assigned to a face.
- Return type:
numpy.ndarray
- apply(filepath: str, batch_size: int = 1, skip_frames: int = 1, process_subclip: Tuple[float | None] = (0, None), cluster_embeddings: bool = True, return_embeddings: bool = False, show_progress: bool = True) -> mexca.data.VideoAnnotation
Apply multiple steps to extract features from faces in a video file.
This method sequentially calls the other methods for each frame of a video file to detect and cluster faces. It also extracts facial landmarks and action units.
- Parameters:
filepath (str) – Path to the video file.
batch_size (int, default=1) – Size of the batch of video frames that are loaded and processed at the same time.
skip_frames (int, default=1) – Only process every nth frame, starting at 0.
process_subclip (tuple, default=(0, None)) – Process only a part of the video clip. Must be the start and end of the subclip in seconds.
cluster_embeddings (bool, default=True) – Whether to cluster the face embeddings with the configured clusterer (spectral clustering by default).
return_embeddings (bool, default=False) – Return embedding vectors for each detected face.
show_progress (bool, default=True) – Enables the display of a progress bar.
- Returns:
A data class object with extracted facial features.
- Return type:
mexca.data.VideoAnnotation