mexca.pipeline

Build a pipeline to extract emotion expression features from a video file.

Module Contents

Classes

Pipeline

Build a pipeline to extract emotion expression features from a video file.

class mexca.pipeline.Pipeline(face_extractor: FaceExtractor | FaceExtractorContainer | None = None, speaker_identifier: SpeakerIdentifier | SpeakerIdentifierContainer | None = None, voice_extractor: VoiceExtractor | VoiceExtractorContainer | None = None, audio_transcriber: AudioTranscriber | AudioTranscriberContainer | None = None, sentiment_extractor: SentimentExtractor | SentimentExtractorContainer | None = None)[source]

Build a pipeline to extract emotion expression features from a video file.

Takes either component objects or container component objects (or a mix of both) as input.

Parameters:
  • face_extractor (FaceExtractor or FaceExtractorContainer, optional, default=None) – Component for detecting faces and extracting facial features.

  • speaker_identifier (SpeakerIdentifier or SpeakerIdentifierContainer, optional, default=None) – Component for identifying speakers in the audio track.

  • voice_extractor (VoiceExtractor or VoiceExtractorContainer, optional, default=None) – Component for extracting voice features from the audio track.

  • audio_transcriber (AudioTranscriber or AudioTranscriberContainer, optional, default=None) – Component for transcribing speech segments.

  • sentiment_extractor (SentimentExtractor or SentimentExtractorContainer, optional, default=None) – Component for extracting sentiment from transcribed text.

Examples

Create a pipeline with standard components.

>>> from mexca import Pipeline
>>> from mexca.audio import SpeakerIdentifier, VoiceExtractor
>>> from mexca.text import AudioTranscriber, SentimentExtractor
>>> from mexca.video import FaceExtractor
>>> num_faces = 2
>>> num_speakers = 2
>>> pipeline = Pipeline(
...     face_extractor=FaceExtractor(num_faces=num_faces),
...     speaker_identifier=SpeakerIdentifier(
...         num_speakers=num_speakers
...     ),
...     voice_extractor=VoiceExtractor(),
...     audio_transcriber=AudioTranscriber(),
...     sentiment_extractor=SentimentExtractor()
... )

Create a pipeline with container components.

>>> from mexca import Pipeline
>>> from mexca.container import (
...     AudioTranscriberContainer, FaceExtractorContainer,
...     SentimentExtractorContainer, SpeakerIdentifierContainer,
...     VoiceExtractorContainer
... )
>>> num_faces = 2
>>> num_speakers = 2
>>> pipeline = Pipeline(
...     face_extractor=FaceExtractorContainer(num_faces=num_faces),
...     speaker_identifier=SpeakerIdentifierContainer(
...         num_speakers=num_speakers
...     ),
...     voice_extractor=VoiceExtractorContainer(),
...     audio_transcriber=AudioTranscriberContainer(),
...     sentiment_extractor=SentimentExtractorContainer()
... )

Create a pipeline with standard and container components.

>>> from mexca import Pipeline
>>> from mexca.audio import SpeakerIdentifier, VoiceExtractor
>>> from mexca.container import (
...     AudioTranscriberContainer, FaceExtractorContainer,
...     SentimentExtractorContainer
... )
>>> num_faces = 2
>>> num_speakers = 2
>>> pipeline = Pipeline(
...     face_extractor=FaceExtractorContainer(num_faces=num_faces),
...     speaker_identifier=SpeakerIdentifier( # standard
...         num_speakers=num_speakers
...     ),
...     voice_extractor=VoiceExtractor(), # standard
...     audio_transcriber=AudioTranscriberContainer(),
...     sentiment_extractor=SentimentExtractorContainer()
... )
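
Create a pipeline with only a subset of components. This is a minimal sketch based solely on the constructor signature, in which every component defaults to None; it assumes that omitted components are simply not run, which this docstring does not spell out.

>>> from mexca import Pipeline
>>> from mexca.audio import SpeakerIdentifier, VoiceExtractor
>>> num_speakers = 2
>>> pipeline = Pipeline( # audio-only; other components stay None
...     speaker_identifier=SpeakerIdentifier(
...         num_speakers=num_speakers
...     ),
...     voice_extractor=VoiceExtractor()
... )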
apply(filepath: str | collections.abc.Iterable, frame_batch_size: int = 1, skip_frames: int = 1, process_subclip: Tuple[float | None] = (0, None), return_embeddings: bool = False, language: str | None = None, keep_audiofile: bool = False, merge: bool = True, show_progress: bool = True) mexca.data.Multimodal | collections.abc.Iterable[source]

Extract emotion expression features from a video file.

This is the main function to apply the complete mexca pipeline to a video file.

Parameters:
  • filepath (str or collections.abc.Iterable) – Path to the video file or iterable returning paths to multiple video files.

  • frame_batch_size (int, default=1) – Size of the batch of video frames that are loaded and processed at the same time.

  • skip_frames (int, default=1) – Only process every nth frame, starting at 0.

  • process_subclip (tuple, default=(0, None)) – Process only a part of the video clip. Must be the start and end of the subclip in seconds. None indicates the end of the video.

  • return_embeddings (bool, default=False) – Return embeddings for each detected face. For large input files, this can increase the size of the output substantially as a 512-element vector is stored for each face. Face embeddings are stored in the video_annotation attribute of the Multimodal object.

  • language (str, optional, default=None) – The language of the speech that is transcribed. If None, the language is detected for each speech segment.

  • keep_audiofile (bool, default=False) – Keeps the audio file after processing. If False, the audio file is only stored temporarily.

  • merge (bool, default=True) – Whether to merge the output from the different components into a single polars.LazyFrame. If True (default), the method merge_features() is called after all components have finished processing, and the resulting polars.LazyFrame is stored in the features attribute. If False, the method is not called and the features attribute is None.

  • show_progress (bool, default=True) – Enables progress bars and prints info logging messages to the console. The logging is overridden when a custom logger is explicitly created.

Returns:

A data class object that contains the extracted, merged features in its features attribute. See the Output section for details. If filepath is a collections.abc.Iterable, a collections.abc.Iterable of mexca.data.Multimodal objects is returned.

Return type:

Multimodal or collections.abc.Iterable

Examples

>>> import polars as pl
>>> from mexca.data import Multimodal
>>> # Single video file
>>> filepath = 'path/to/video'
>>> output = pipeline.apply(filepath)
>>> isinstance(output, Multimodal)
True
>>> isinstance(output.features, pl.LazyFrame)
True
>>> # List of video files
>>> filepaths = ['path/to/video', 'path/to/another/video']
>>> output = pipeline.apply(filepaths)
>>> isinstance(output, list)
True
>>> all(isinstance(r, Multimodal) for r in output)
True
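
The optional arguments of apply() can be combined. A minimal sketch, reusing the pipeline object from above with a placeholder video path: every second frame of the first 60 seconds is processed, and merging is deferred, so the features attribute stays None as documented for merge=False.

>>> output = pipeline.apply(
...     'path/to/video',
...     skip_frames=2, # process every 2nd frame
...     process_subclip=(0, 60), # first 60 seconds only
...     merge=False # do not call merge_features()
... )
>>> output.features is None
True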