mexca.pipeline

Build a pipeline to extract emotion expression features from a video file.

Module Contents

Classes

Pipeline

Build a pipeline to extract emotion expression features from a video file.

class mexca.pipeline.Pipeline(face_extractor: FaceExtractor | FaceExtractorContainer | None = None, speaker_identifier: SpeakerIdentifier | SpeakerIdentifierContainer | None = None, voice_extractor: VoiceExtractor | VoiceExtractorContainer | None = None, audio_transcriber: AudioTranscriber | AudioTranscriberContainer | None = None, sentiment_extractor: SentimentExtractor | SentimentExtractorContainer | None = None)[source]

Build a pipeline to extract emotion expression features from a video file.

Takes either component objects or container component objects (or a mix of both) as input.

Parameters:
  • face_extractor (FaceExtractor or FaceExtractorContainer, optional, default=None) – Component for detecting faces and extracting facial features.

  • speaker_identifier (SpeakerIdentifier or SpeakerIdentifierContainer, optional, default=None) – Component for identifying speakers in the audio track.

  • voice_extractor (VoiceExtractor or VoiceExtractorContainer, optional, default=None) – Component for extracting voice features from the audio track.

  • audio_transcriber (AudioTranscriber or AudioTranscriberContainer, optional, default=None) – Component for transcribing speech segments.

  • sentiment_extractor (SentimentExtractor or SentimentExtractorContainer, optional, default=None) – Component for extracting sentiment from transcribed text.

Examples

Create a pipeline with standard components.

>>> from mexca import Pipeline
>>> from mexca.audio import SpeakerIdentifier, VoiceExtractor
>>> from mexca.text import AudioTranscriber, SentimentExtractor
>>> from mexca.video import FaceExtractor
>>> num_faces = 2
>>> num_speakers = 2
>>> pipeline = Pipeline(
...     face_extractor=FaceExtractor(num_faces=num_faces),
...     speaker_identifier=SpeakerIdentifier(
...         num_speakers=num_speakers
...     ),
...     voice_extractor=VoiceExtractor(),
...     audio_transcriber=AudioTranscriber(),
...     sentiment_extractor=SentimentExtractor()
... )

Create a pipeline with container components.

>>> from mexca import Pipeline
>>> from mexca.container import (
...     AudioTranscriberContainer, FaceExtractorContainer,
...     SentimentExtractorContainer, SpeakerIdentifierContainer,
...     VoiceExtractorContainer
... )
>>> num_faces = 2
>>> num_speakers = 2
>>> pipeline = Pipeline(
...     face_extractor=FaceExtractorContainer(num_faces=num_faces),
...     speaker_identifier=SpeakerIdentifierContainer(
...         num_speakers=num_speakers
...     ),
...     voice_extractor=VoiceExtractorContainer(),
...     audio_transcriber=AudioTranscriberContainer(),
...     sentiment_extractor=SentimentExtractorContainer()
... )

Create a pipeline with standard and container components.

>>> from mexca import Pipeline
>>> from mexca.audio import SpeakerIdentifier, VoiceExtractor
>>> from mexca.container import (
...     AudioTranscriberContainer, FaceExtractorContainer,
...     SentimentExtractorContainer
... )
>>> num_faces = 2
>>> num_speakers = 2
>>> pipeline = Pipeline(
...     face_extractor=FaceExtractorContainer(num_faces=num_faces),
...     speaker_identifier=SpeakerIdentifier( # standard
...         num_speakers=num_speakers
...     ),
...     voice_extractor=VoiceExtractor(), # standard
...     audio_transcriber=AudioTranscriberContainer(),
...     sentiment_extractor=SentimentExtractorContainer()
... )
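
Create a pipeline with only a subset of components. This is a minimal sketch based solely on the constructor signature, in which every component defaults to None; it assumes that omitted components are simply not run, which this docstring does not spell out.

>>> from mexca import Pipeline
>>> from mexca.audio import SpeakerIdentifier, VoiceExtractor
>>> num_speakers = 2
>>> pipeline = Pipeline( # audio-only; other components stay None
...     speaker_identifier=SpeakerIdentifier(
...         num_speakers=num_speakers
...     ),
...     voice_extractor=VoiceExtractor()
... )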
apply(filepath: str | collections.abc.Iterable, frame_batch_size: int = 1, skip_frames: int = 1, process_subclip: Tuple[float | None] = (0, None), return_embeddings: bool = False, language: str | None = None, keep_audiofile: bool = False, merge: bool = True, show_progress: bool = True) mexca.data.Multimodal | collections.abc.Iterable[source]

Extract emotion expression features from a video file.

This is the main function to apply the complete mexca pipeline to a video file.

Parameters:
  • filepath (str or collections.abc.Iterable) – Path to the video file or iterable returning paths to multiple video files.

  • frame_batch_size (int, default=1) – Size of the batch of video frames that are loaded and processed at the same time.

  • skip_frames (int, default=1) – Only process every nth frame, starting at 0.

  • process_subclip (tuple, default=(0, None)) – Process only a part of the video clip. Must be the start and end of the subclip in seconds. None indicates the end of the video.

  • return_embeddings (bool, default=False) – Return embeddings for each detected face. For large input files, this can increase the size of the output substantially as a 512-element vector is stored for each face. Face embeddings are stored in the video_annotation attribute of the Multimodal object.

  • language (str, optional, default=None) – The language of the speech that is transcribed. If None, the language is detected for each speech segment.

  • keep_audiofile (bool, default=False) – Keeps the audio file after processing. If False, the audio file is only stored temporarily.

  • merge (bool, default=True) – Whether to merge the output from the different components into a single polars.LazyFrame. If True (default), the method merge_features() is called after all components have finished processing, and the resulting polars.LazyFrame is stored in the features attribute. If False, the method is not called and the features attribute is None.

  • show_progress (bool, default=True) – Enables progress bars and prints info logging messages to the console. The logging is overridden when a custom logger is explicitly created.

Returns:

A data class object that contains the extracted, merged features in its features attribute. See the Output section for details. If filepath is a collections.abc.Iterable, a collections.abc.Iterable of mexca.data.Multimodal objects is returned.

Return type:

Multimodal or collections.abc.Iterable

Examples

>>> import polars as pl
>>> from mexca.data import Multimodal
>>> # Single video file
>>> filepath = 'path/to/video'
>>> output = pipeline.apply(filepath)
>>> isinstance(output, Multimodal)
True
>>> isinstance(output.features, pl.LazyFrame)
True
>>> # List of video files
>>> filepaths = ['path/to/video', 'path/to/another/video']
>>> output = pipeline.apply(filepaths)
>>> isinstance(output, list)
True
>>> all(isinstance(r, Multimodal) for r in output)
True
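
The optional arguments of apply() can be combined. A minimal sketch, reusing the pipeline object from above with a placeholder video path: every second frame of the first 60 seconds is processed, and merging is deferred, so the features attribute stays None as documented for merge=False.

>>> output = pipeline.apply(
...     'path/to/video',
...     skip_frames=2, # process every 2nd frame
...     process_subclip=(0, 60), # first 60 seconds only
...     merge=False # do not call merge_features()
... )
>>> output.features is None
True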