mexca.video.extraction
======================

.. py:module:: mexca.video.extraction

.. autoapi-nested-parse::

   Facial feature extraction from videos.


Exceptions
----------

.. autoapisummary::

   mexca.video.extraction.NotEnoughFacesError


Classes
-------

.. autoapisummary::

   mexca.video.extraction.VideoDataset
   mexca.video.extraction.FaceExtractor


Functions
---------

.. autoapisummary::

   mexca.video.extraction.cli


Module Contents
---------------

.. py:exception:: NotEnoughFacesError(msg: str)


   Fewer faces were detected than `num_faces`.

   Clustering cannot be performed if there are fewer samples than clusters.

   :param msg: Error message.
   :type msg: str
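
   The condition behind this exception can be sketched as follows. This is a minimal illustration, not mexca's implementation: the exception class is a local stand-in and ``check_enough_faces`` is a hypothetical helper.

   .. code-block:: python

      class NotEnoughFacesError(Exception):
          """Local stand-in for mexca.video.extraction.NotEnoughFacesError."""

          def __init__(self, msg: str):
              self.msg = msg
              super().__init__(msg)


      def check_enough_faces(num_detected: int, num_faces: int) -> None:
          """Hypothetical guard: clustering needs at least as many samples as clusters."""
          if num_detected < num_faces:
              raise NotEnoughFacesError(
                  f"Detected {num_detected} faces but num_faces={num_faces}; "
                  "cannot form more clusters than samples."
              )


      check_enough_faces(num_detected=5, num_faces=2)  # passes silently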


.. py:class:: VideoDataset(video_file: str, skip_frames: int = 1, start: float = 0, end: Optional[float] = None)


   Custom torch dataset for a video file.

   On initialization, reads only the frame timestamps of the video, not the frames themselves.
   Frames are decoded one at a time when they are accessed.

   :param video_file: Path to the video file.
   :type video_file: str
   :param skip_frames: Only load every nth frame.
   :type skip_frames: int, default=1
   :param start: Start of the subclip of the video to be loaded (in seconds).
   :type start: float, default=0
   :param end: End of the subclip of the video to be loaded (in seconds).
   :type end: float, optional, default=None

   .. attribute:: file_name

      Name of the video file.

      :type: str

   .. attribute:: video_pts

      Timestamps of video frames.

      :type: torch.Tensor

   .. attribute:: video_frames_idx

      Indices of video frames.

      :type: torch.Tensor

   .. attribute:: video_fps

      Frames per second.

      :type: int

   .. attribute:: video_frames

      Indices of loaded frames.

      :type: numpy.ndarray
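
   How `skip_frames`, `start`, and `end` might interact can be sketched in plain Python. The helper ``select_frame_indices`` is hypothetical. It assumes a constant frame rate, counts skipped frames from frame 0, and treats both subclip boundaries as inclusive; the class itself reads timestamps from the video file and may handle boundaries differently.

   .. code-block:: python

      from typing import List, Optional


      def select_frame_indices(
          num_frames: int,
          fps: float,
          skip_frames: int = 1,
          start: float = 0.0,
          end: Optional[float] = None,
      ) -> List[int]:
          """Return indices of every nth frame whose timestamp lies in [start, end]."""
          selected = []
          for idx in range(0, num_frames, skip_frames):
              timestamp = idx / fps  # assumes a constant frame rate
              if timestamp < start:
                  continue
              if end is not None and timestamp > end:
                  break
              selected.append(idx)
          return selected


      # 10 s clip at 25 fps: every 5th frame between 1 s and 2 s.
      indices = select_frame_indices(250, fps=25, skip_frames=5, start=1.0, end=2.0)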


   .. py:property:: duration
      :type: float


      Duration of the video (read-only).


   .. py:method:: __len__() -> int

      Number of video frames.


   .. py:method:: __getitem__(idx: int) -> Dict[str, torch.Tensor]

      Get an item from the data set.

      Loads the video frame into memory.

      :param idx: Index of the item in the dataset.
      :type idx: int

      :returns: Dictionary with 'Image' containing the video frame (T, H, W, C)
                and 'Frame' containing the frame index.
      :rtype: dict


.. py:class:: FaceExtractor(num_faces: Optional[int], min_face_size: int = 20, thresholds: Tuple[float] = (0.6, 0.7, 0.7), factor: float = 0.709, post_process: bool = True, select_largest: bool = True, selection_method: Optional[str] = 'num_faces', keep_all: bool = True, device: torch.device = torch.device(type='cpu'), clusterer: Optional[sklearn.base.ClusterMixin] = None, embeddings_model: str = 'vggface2', post_min_face_size: Tuple[float, float] = (45.0, 45.0), au_model: Optional[str] = None)

   Combine steps to extract features from faces in a video file.

   :param num_faces: Number of faces to identify. Must not be `None` if `clusterer=None`.
   :type num_faces: int, optional
   :param min_face_size: Minimum face size (in pixels) for face detection in MTCNN.
   :type min_face_size: int, default=20
   :param thresholds: Thresholds for face detection in MTCNN.
   :type thresholds: tuple, default=(0.6, 0.7, 0.7)
   :param factor: Factor used to create a scaling pyramid of face sizes in MTCNN.
   :type factor: float, default=0.709
   :param post_process: Whether detected faces are post processed before computing embeddings.
                        The post processing standardizes the detected faces.
   :type post_process: bool, default=True
   :param select_largest: Whether to return the largest face or the one with the highest probability
                          if multiple faces are detected.
   :type select_largest: bool, default=True
   :param selection_method: The heuristic used for selecting detected faces. If not `None`, overrides `select_largest`.
                            The default, `num_faces`, selects a maximum of `num_faces` faces per frame.
   :type selection_method: {None, 'num_faces', 'probability', 'largest', 'largest_over_threshold', 'center_weighted_size'}, optional, default='num_faces'
   :param keep_all: Whether all faces should be returned in the order of `select_largest`.
   :type keep_all: bool, default=True
   :param device: The device on which face detection and embedding computations are performed.
   :type device: torch.device, optional, default=torch.device("cpu")
   :param clusterer: Class instance from :class:`sklearn.cluster` used for clustering face embeddings.
                     If `None` (default), creates a :class:`sklearn.cluster.SpectralClustering` instance with `n_clusters=num_faces`.
                     For large datasets, :class:`sklearn.cluster.KMeans` is recommended to avoid memory issues.
   :type clusterer: sklearn.base.ClusterMixin, optional, default=None
   :param embeddings_model: Pretrained Inception Resnet V1 model for computing face embeddings.
   :type embeddings_model: {'vggface2', 'casia-webface'}, default='vggface2'
   :param post_min_face_size: Minimal width and height (in pixels) for filtering out faces after detection.
                              This can be useful to exclude small faces before clustering their embeddings
                              and can improve clustering performance.
   :type post_min_face_size: tuple, default=(45.0, 45.0)
   :param au_model: Pretrained MEFARG model on Hugging Face Hub for extracting facial action unit activations. If `None`, uses the default model
                    `mexca/mefarg-open-graph-au-resnet50-stage-2`.
   :type au_model: str, optional, default=None
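
   The effect of `post_min_face_size` can be illustrated with NumPy. This is a sketch under the assumption that a face is kept only when both its width and its height reach the minimum; ``filter_small_faces`` is a hypothetical helper, not part of mexca.

   .. code-block:: python

      from typing import Tuple

      import numpy as np


      def filter_small_faces(
          boxes: np.ndarray, min_size: Tuple[float, float] = (45.0, 45.0)
      ) -> np.ndarray:
          """Return a boolean mask for boxes (N, 4) given as (x1, y1, x2, y2)."""
          widths = boxes[:, 2] - boxes[:, 0]
          heights = boxes[:, 3] - boxes[:, 1]
          return (widths >= min_size[0]) & (heights >= min_size[1])


      boxes = np.array(
          [
              [10.0, 10.0, 110.0, 130.0],  # 100 x 120: kept
              [200.0, 50.0, 230.0, 90.0],  # 30 x 40: dropped
              [300.0, 20.0, 350.0, 60.0],  # 50 x 40: dropped, too short
          ]
      )
      keep = filter_small_faces(boxes)
      large_boxes = boxes[keep]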


   .. py:property:: detector
      :type: facenet_pytorch.MTCNN


      The MTCNN model for face detection and extraction.
      See `facenet-pytorch <https://github.com/timesler/facenet-pytorch/blob/555aa4bec20ca3e7c2ead14e7e39d5bbce203e4b/models/mtcnn.py#L157>`_ for details.


   .. py:property:: encoder
      :type: facenet_pytorch.InceptionResnetV1


      The Inception Resnet V1 model for computing face embeddings.
      See `facenet-pytorch <https://github.com/timesler/facenet-pytorch/blob/555aa4bec20ca3e7c2ead14e7e39d5bbce203e4b/models/inception_resnet_v1.py#L184>`__ for details.


   .. py:property:: clusterer
      :type: sklearn.base.ClusterMixin


      The clusterer instance from :class:`sklearn.cluster`.


   .. py:property:: extractor
      :type: mexca.video.mefarg.MEFARG


      The MEFARG model for extracting action unit activations.
      See ME-GraphAU `model <https://github.com/CVI-SZU/ME-GraphAU>`_ and
      `paper <https://arxiv.org/abs/2205.01782>`_ for details.


   .. py:method:: __call__(**callargs) -> mexca.data.VideoAnnotation

      Alias for `apply`.


   .. py:method:: detect(frame: Union[numpy.ndarray, torch.Tensor]) -> Tuple[List[torch.Tensor], Union[List[numpy.ndarray], numpy.ndarray], Union[List[numpy.ndarray], numpy.ndarray], Union[List[numpy.ndarray], numpy.ndarray]]

      Detect faces in a video frame.

      :param frame: Batch of B frames containing RGB values with dimensions (B, W, H, 3).
      :type frame: numpy.ndarray or torch.Tensor

      :returns: * **faces** (*list*) -- Batch of B tensors containing the N cropped face images from each batched frame with dimensions (N, 3, 160, 160).
                  Is `None` if a frame contains no faces.
                * **boxes** (*numpy.ndarray or list*) -- Batch of B bounding boxes of the N detected faces as (x1, y1, x2, y2) coordinates with
                  dimensions (B, N, 4). Returns a list if different numbers of faces are detected across batched frames.
                  Is `None` if a frame contains no faces.
                * **probs** (*numpy.ndarray or list*) -- Probabilities of the detected faces (B, N).
                  Returns a list if different numbers of faces are detected across batched frames.
                  Is `None` if a frame contains no faces.
                * **landmarks** (*numpy.ndarray or list*) -- Batch of B facial landmarks for N detected faces as (x, y) coordinates with dimensions (5, 2).
                  Is `None` if a frame contains no faces.
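
      Because frames without faces yield `None`, downstream code usually has to skip them. The sketch below flattens a batched result into per-face records. ``flatten_detections`` is a hypothetical helper, and the nested lists stand in for the arrays returned by `detect`.

      .. code-block:: python

         from typing import List, Optional, Sequence, Tuple

         Box = Sequence[float]


         def flatten_detections(
             boxes: Sequence[Optional[Sequence[Box]]],
             probs: Sequence[Optional[Sequence[float]]],
         ) -> List[Tuple[int, Box, float]]:
             """Return (frame_index, box, probability) for every detected face."""
             records = []
             for frame_idx, (frame_boxes, frame_probs) in enumerate(zip(boxes, probs)):
                 if frame_boxes is None:  # no faces in this frame
                     continue
                 for box, prob in zip(frame_boxes, frame_probs):
                     records.append((frame_idx, box, prob))
             return records


         # Batch of three frames: two faces, no face, one face.
         boxes = [
             [[10, 10, 60, 60], [100, 20, 150, 80]],
             None,
             [[12, 11, 61, 62]],
         ]
         probs = [[0.99, 0.87], None, [0.98]]
         records = flatten_detections(boxes, probs)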


   .. py:method:: encode(faces: torch.Tensor) -> numpy.ndarray

      Compute embeddings for face images.

      :param faces: Cropped N face images from a video frame with dimensions (N, 3, H, W). H and W must be at least 80 for
                    the encoding to work.
      :type faces: torch.Tensor

      :returns: Embeddings of the N face images with dimensions (N, 512).
      :rtype: numpy.ndarray


   .. py:method:: identify(embeddings: numpy.ndarray) -> numpy.ndarray

      Cluster faces based on their embeddings.

      :param embeddings: Embeddings of the N face images with dimensions (N, E) where E is the length
                         of the embedding vector.
      :type embeddings: numpy.ndarray

      :returns: Cluster indices for the N face embeddings.
      :rtype: numpy.ndarray


   .. py:method:: extract(frame: Union[numpy.ndarray, torch.Tensor]) -> Union[List[numpy.ndarray], numpy.ndarray]

      Detect facial action unit activations.

      :param frame: Batch of B frames containing RGB values with dimensions (B, H, W, 3).
      :type frame: numpy.ndarray or torch.Tensor

      :returns: **aus** -- Batch of B action unit activations for N detected faces with dimensions (N, 41).
                Is `None` if a frame contains no faces.
      :rtype: numpy.ndarray or list


   .. py:method:: compute_avg_embeddings(embeddings: numpy.ndarray, labels: numpy.ndarray) -> dict

      Compute the average embedding vector for each face detected in the video.

      :param embeddings: Face embeddings.
      :type embeddings: numpy.ndarray
      :param labels: Face labels.
      :type labels: numpy.ndarray

      :returns: **average embedding dictionary** -- Dictionary with keys representing face labels and values representing
                the average embedding vector for each face label.
      :rtype: dict
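
      A minimal NumPy sketch of the averaging step, assuming that faces without a label are marked with `numpy.nan` and skipped. ``average_by_label`` is a hypothetical helper, not the method itself.

      .. code-block:: python

         from typing import Dict

         import numpy as np


         def average_by_label(
             embeddings: np.ndarray, labels: np.ndarray
         ) -> Dict[int, np.ndarray]:
             """Map each face label to the mean of its embedding vectors."""
             averages = {}
             # Ignore faces whose label is NaN (assumed to mean "no label").
             for label in np.unique(labels[~np.isnan(labels)]):
                 averages[int(label)] = embeddings[labels == label].mean(axis=0)
             return averages


         embeddings = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [9.0, 9.0]])
         labels = np.array([0.0, 0.0, 1.0, np.nan])  # last face has no label
         avg = average_by_label(embeddings, labels)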


   .. py:method:: compute_confidence(embeddings: numpy.ndarray, labels: numpy.ndarray) -> numpy.ndarray

      Compute face label classification confidence.

      :param embeddings: Face embeddings.
      :type embeddings: numpy.ndarray
      :param labels: Face labels.
      :type labels: numpy.ndarray

      :returns: **confidence** -- Confidence scores between 0 and 1. Returns `numpy.nan` if no label was assigned to a face.
      :rtype: numpy.ndarray


   .. py:method:: apply(filepath: str, batch_size: int = 1, skip_frames: int = 1, process_subclip: Tuple[Optional[float]] = (0, None), cluster_embeddings: bool = True, return_embeddings: bool = False, show_progress: bool = True) -> mexca.data.VideoAnnotation

      Apply multiple steps to extract features from faces in a video file.

      This method successively calls other methods for each frame of a video file to detect
      and cluster faces. It also extracts facial landmarks and action units.

      :param filepath: Path to the video file.
      :type filepath: str
      :param batch_size: Size of the batch of video frames that are loaded and processed at the same time.
      :type batch_size: int, default=1
      :param skip_frames: Only process every nth frame, starting at 0.
      :type skip_frames: int, default=1
      :param process_subclip: Process only a part of the video clip. Must be the start and end of the subclip in seconds.
      :type process_subclip: tuple, default=(0, None)
      :param cluster_embeddings: Cluster embeddings using the configured clusterer (spectral clustering by default).
      :type cluster_embeddings: bool, default=True
      :param return_embeddings: Return embedding vectors for each detected face.
      :type return_embeddings: bool, default=False
      :param show_progress: Enables the display of a progress bar.
      :type show_progress: bool, default=True

      :returns: A data class object with extracted facial features.
      :rtype: VideoAnnotation
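
      A typical call might look like the sketch below, which relies only on the signatures documented on this page. The wrapper ``annotate_video`` and the file path are placeholders. The imports sit inside the function so that the snippet can be defined without mexca or scikit-learn installed, and `KMeans` is passed as the clusterer following the recommendation for large datasets.

      .. code-block:: python

         def annotate_video(filepath: str, num_faces: int = 2):
             """Run the face extraction pipeline on a subclip of a video file."""
             # Imported lazily; both packages must be installed to call this function.
             from sklearn.cluster import KMeans

             from mexca.video.extraction import FaceExtractor

             extractor = FaceExtractor(
                 num_faces=num_faces,
                 clusterer=KMeans(n_clusters=num_faces),
             )
             # Process every 5th frame of the first 30 seconds in batches of 8 frames.
             return extractor.apply(
                 filepath,
                 batch_size=8,
                 skip_frames=5,
                 process_subclip=(0, 30),
                 show_progress=False,
             )


         # Example call with a placeholder path:
         # annotation = annotate_video("path/to/video.mp4")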


.. py:function:: cli()

   Command line interface for extracting facial features.
   See `extract-faces -h` for details.