mexca.data

Objects for storing multimodal data.

Module Contents

Classes

  • VideoAnnotation – Video annotation class for storing facial features.

  • VoiceFeatures – Class for storing voice features.

  • SegmentData – Class for storing speech segment data.

  • SpeakerAnnotation – Class for storing speaker and speech segment annotations.

  • TranscriptionData – Class for storing transcription data.

  • AudioTranscription – Class for storing audio transcriptions.

  • SentimentData – Class for storing sentiment data.

  • SentimentAnnotation – Class for storing sentiment scores of transcribed sentences.

  • Multimodal – Class for storing multimodal features.

class mexca.data.VideoAnnotation

Video annotation class for storing facial features.

Parameters:
  • frame (list, optional) – Index of each frame.

  • time (list, optional) – Timestamp of each frame in seconds.

  • face_box (list, optional) – Bounding box of a detected face. Is numpy.nan if no face was detected.

  • face_prob (list, optional) – Probability of a detected face. Is numpy.nan if no face was detected.

  • face_landmarks (list, optional) – Facial landmarks of a detected face. Is numpy.nan if no face was detected.

  • face_aus (list, optional) – Facial action unit activations of a detected face. Is numpy.nan if no face was detected.

  • face_label (list, optional) – Label of a detected face. Is numpy.nan if no face was detected.

  • face_confidence (list, optional) – Confidence of the face_label assignment. Is numpy.nan if no face was detected or only one face label was assigned.

classmethod from_json(filename: str)

Load a video annotation from a JSON file.

Parameters:

filename (str) – Name of the JSON file from which the object should be loaded. Must have a .json ending.

write_json(filename: str)

Write the video annotation to a JSON file.

Parameters:

filename (str) – Name of the destination file. Must have a .json ending.
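The JSON round trip described above can be illustrated with a minimal sketch. It does not use mexca itself: VideoAnnotationSketch is a hypothetical stand-in that keeps only three of the documented fields, and the .json check is an assumption about how the "Must have a .json ending" requirement might be enforced. It shows how column-wise lists, including NaN entries for frames without a detected face, could survive a write and reload.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class VideoAnnotationSketch:
    """Hypothetical stand-in (not the mexca class): one list per feature."""

    frame: List[int] = field(default_factory=list)
    time: List[float] = field(default_factory=list)
    face_prob: List[float] = field(default_factory=list)

    def write_json(self, filename: str) -> None:
        # Assumed check mirroring the documented ".json ending" requirement.
        if not filename.endswith(".json"):
            raise ValueError("filename must have a .json ending")
        with open(filename, "w", encoding="utf-8") as file:
            # Python's json module writes float("nan") as the literal NaN.
            json.dump(asdict(self), file)

    @classmethod
    def from_json(cls, filename: str) -> "VideoAnnotationSketch":
        if not filename.endswith(".json"):
            raise ValueError("filename must have a .json ending")
        with open(filename, "r", encoding="utf-8") as file:
            return cls(**json.load(file))


def roundtrip_example() -> VideoAnnotationSketch:
    """Write a two-frame annotation to a temporary file and load it again."""
    annotation = VideoAnnotationSketch(
        frame=[0, 1],
        time=[0.0, 0.04],
        face_prob=[0.99, float("nan")],  # no face detected in frame 1
    )
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "video_annotation.json")
        annotation.write_json(path)
        return VideoAnnotationSketch.from_json(path)
```

Note that NaN is not valid in strict JSON; Python's json module accepts it by default, but other parsers may reject such files.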

class mexca.data.VoiceFeatures

Class for storing voice features.

Features are stored as lists (like columns of a data frame). Optional features are initialized as empty lists.

Parameters:
  • frame (list) – The frame index for which features were extracted.

  • time (list) – The time stamp at which features were extracted.

classmethod from_json(filename: str)

Load voice features from a JSON file.

Parameters:

filename (str) – Name of the JSON file from which the object should be loaded. Must have a .json ending.

write_json(filename: str)

Store voice features in a JSON file.

Parameters:

filename (str) – Name of the destination file. Must have a .json ending.

class mexca.data.SegmentData

Class for storing speech segment data.

Parameters:
  • filename (str) – Name of the file from which the segment was obtained.

  • channel (int) – Channel index.

  • name (str, optional, default=None) – Speaker label.

  • conf (float, optional, default=None) – Confidence of speaker label.

class mexca.data.SpeakerAnnotation(intervals: List[intervaltree.Interval] = None)

Bases: intervaltree.IntervalTree

Class for storing speaker and speech segment annotations.

Stores speech segments as intervaltree.Interval in an intervaltree.IntervalTree. Speaker labels are stored in SegmentData objects in the data attribute of each interval.

__str__(end: str = '\t', file: TextIO = sys.stdout, header: bool = True)

Return str(self).

classmethod from_pyannote(annotation: Any)

Create a SpeakerAnnotation object from a pyannote.core.Annotation object.

Parameters:

annotation (pyannote.core.Annotation) – Annotation object containing speech segments and speaker labels.

classmethod from_rttm(filename: str)

Load a speaker annotation from an RTTM file.

Parameters:

filename (str) – Path to the file. Must have an .rttm ending.

write_rttm(filename: str)

Write a speaker annotation to an RTTM file.

Parameters:

filename (str) – Path to the file. Must have an .rttm ending.
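For readers unfamiliar with RTTM, the following sketch shows how one speech segment could map to one SPEAKER line and back. It is an illustration under assumptions, not mexca's implementation: Segment is a hypothetical record that combines an interval's begin and end with the SegmentData fields documented above, and the three-decimal formatting is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Hypothetical record: interval bounds plus SegmentData-like fields."""

    begin: float
    end: float
    filename: str
    channel: int
    name: Optional[str] = None
    conf: Optional[float] = None


def to_rttm_line(seg: Segment) -> str:
    """Format a segment as a 10-field RTTM SPEAKER line."""
    # Field order: type, file ID, channel, onset, duration, orthography,
    # speaker type, speaker name, confidence, lookahead.
    # "<NA>" marks fields that are unused or unknown.
    name = seg.name if seg.name is not None else "<NA>"
    conf = f"{seg.conf:.3f}" if seg.conf is not None else "<NA>"
    return (
        f"SPEAKER {seg.filename} {seg.channel} {seg.begin:.3f} "
        f"{seg.end - seg.begin:.3f} <NA> <NA> {name} {conf} <NA>"
    )


def from_rttm_line(line: str) -> Segment:
    """Parse a SPEAKER line back into a segment (file IDs without spaces)."""
    fields = line.split()
    if len(fields) != 10 or fields[0] != "SPEAKER":
        raise ValueError(f"not a SPEAKER RTTM line: {line!r}")
    begin, duration = float(fields[3]), float(fields[4])
    return Segment(
        begin=begin,
        end=begin + duration,
        filename=fields[1],
        channel=int(fields[2]),
        name=None if fields[7] == "<NA>" else fields[7],
        conf=None if fields[8] == "<NA>" else float(fields[8]),
    )
```

RTTM stores onset and duration rather than begin and end, so the end time is recovered by addition when reading.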

class mexca.data.TranscriptionData

Class for storing transcription data.

Parameters:
  • index (int) – Index of the transcribed sentence.

  • text (str) – Transcribed text.

  • speaker (str, optional) – Speaker of the transcribed text.

class mexca.data.AudioTranscription(filename: str, subtitles: Optional[intervaltree.IntervalTree] = None)

Class for storing audio transcriptions.

Parameters:
  • filename (str) – Name of the transcribed audio file.

  • subtitles (intervaltree.IntervalTree, optional, default=None) – Interval tree containing the transcribed speech segments split into sentences as intervals. The transcribed sentences are stored in the data attribute of each interval.

classmethod from_srt(filename: str)

Load an audio transcription from an SRT file.

Parameters:

filename (str) – Name of the file to be loaded. Must have an .srt ending.

write_srt(filename: str)

Write an audio transcription to an SRT file.

Parameters:

filename (str) – Name of the file to write to. Must have an .srt ending.
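As a rough illustration of the SRT layout, the sketch below turns timed sentences into subtitle blocks. It is not taken from mexca: the helper names srt_timestamp and to_srt are hypothetical, and sentences are passed as plain (begin, end, text) tuples in seconds rather than as intervals in an interval tree.

```python
from typing import List, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm (SRT uses a comma)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(sentences: List[Tuple[float, float, str]]) -> str:
    """Render (begin, end, text) tuples as numbered SRT blocks in time order."""
    blocks = []
    for index, (begin, end, text) in enumerate(sorted(sentences), start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(begin)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)
```

Each block consists of a running index, a time range, and the sentence text, which matches the per-sentence structure described for the subtitles parameter.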

class mexca.data.SentimentData

Class for storing sentiment data.

Parameters:
  • text (str) – Text of the sentence for which sentiment scores were predicted.

  • pos (float) – Positive sentiment score.

  • neg (float) – Negative sentiment score.

  • neu (float) – Neutral sentiment score.

class mexca.data.SentimentAnnotation(intervals: List[intervaltree.Interval] = None)

Bases: intervaltree.IntervalTree

Class for storing sentiment scores of transcribed sentences.

Stores sentiment scores as intervals in an interval tree. The scores are stored in the data attribute of each interval.

classmethod from_json(filename: str)

Load a sentiment annotation from a JSON file.

Parameters:

filename (str) – Name of the JSON file from which the object should be loaded. Must have a .json ending.

write_json(filename: str)

Write a sentiment annotation to a JSON file.

Parameters:

filename (str) – Name of the destination file. Must have a .json ending.

class mexca.data.Multimodal(filename: str, duration: Optional[float] = None, fps: Optional[int] = None, fps_adjusted: Optional[int] = None, video_annotation: Optional[VideoAnnotation] = None, audio_annotation: Optional[SpeakerAnnotation] = None, voice_features: Optional[VoiceFeatures] = None, transcription: Optional[AudioTranscription] = None, sentiment: Optional[SentimentAnnotation] = None, features: Optional[pandas.DataFrame] = None)

Class for storing multimodal features.

See the Output section for details.

Parameters:
  • filename (str) – Name of the file from which features were extracted.

  • duration (float, optional, default=None) – Video duration in seconds.

  • fps (float) – Frames per second.

  • fps_adjusted (float) – Frames per second, adjusted for skipped frames. Mostly needed for internal computations.

  • video_annotation (VideoAnnotation) – Object containing facial features.

  • audio_annotation (SpeakerAnnotation) – Object containing speech segments and speakers.

  • voice_features (VoiceFeatures) – Object containing voice features.

  • transcription (AudioTranscription) – Object containing transcribed speech segments split into sentences.

  • sentiment (SentimentAnnotation) – Object containing sentiment scores for transcribed sentences.

  • features (pandas.DataFrame) – Merged features.

merge_features() → pandas.DataFrame

Merge multimodal features from pipeline components into a common data frame.

Transforms and merges the available output stored in the Multimodal object based on the ‘frame’ variable. Stores the merged features as a pandas.DataFrame in the features attribute.

Returns:

Merged multimodal features.

Return type:

pandas.DataFrame
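The idea of merging on the ‘frame’ variable can be sketched in plain Python. This is one plausible reading, not mexca's actual algorithm: the helper names, the pitch_f0 column, and the rule that a frame's time is frame / fps are all assumptions made for illustration, and the sketch returns a list of row dictionaries instead of a pandas.DataFrame. It also assumes at most one row per frame, which would not hold when several faces are detected in the same frame.

```python
from typing import Dict, List, Optional, Tuple


def columns_to_rows(columns: Dict[str, list]) -> Dict[int, dict]:
    """Turn column-wise lists into one row per frame, keyed by 'frame'."""
    names = [name for name in columns if name != "frame"]
    rows = {}
    for i, frame in enumerate(columns["frame"]):
        rows[frame] = {name: columns[name][i] for name in names}
    return rows


def speaker_at(
    time: float, segments: List[Tuple[float, float, str]]
) -> Optional[str]:
    """Return the label of the first segment whose [begin, end) contains time."""
    for begin, end, label in segments:
        if begin <= time < end:
            return label
    return None


def merge_on_frame(
    video: Dict[str, list],
    voice: Dict[str, list],
    segments: List[Tuple[float, float, str]],
    fps: float,
) -> List[dict]:
    """Outer-join frame-wise features and attach interval-based speaker labels."""
    video_rows = columns_to_rows(video)
    voice_rows = columns_to_rows(voice)
    merged = []
    for frame in sorted(set(video_rows) | set(voice_rows)):
        time = frame / fps  # assumed mapping from frame index to seconds
        row = {"frame": frame, "time": time}
        row.update(video_rows.get(frame, {}))
        row.update(voice_rows.get(frame, {}))
        row["speaker"] = speaker_at(time, segments)
        merged.append(row)
    return merged
```

Frame-wise outputs (facial and voice features) join directly on the frame index, whereas interval-based outputs (speakers, transcriptions, sentiment) first need to be mapped to frames through their time spans.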