mexca.audio.identification

Speech segment and speaker identification.

Module Contents

Classes

SpeakerIdentifier

Identify speech segments and cluster speakers using speaker diarization.

Functions

cli()

Command line interface for identifying speech segments and speakers.

exception mexca.audio.identification.AuthenticationError(msg: str)

Failed authentication to HuggingFace Hub.

Parameters:

msg (str) – Error message.

class mexca.audio.identification.SpeakerIdentifier(num_speakers: int | None = None, device: torch.device = torch.device(type='cpu'), use_auth_token: bool | str = True)

Identify speech segments and cluster speakers using speaker diarization.

Wrapper class for pyannote.audio.SpeakerDiarization. Uses the pretrained speaker diarization model pyannote/speaker-diarization-3.1 from HuggingFace.

Parameters:
  • num_speakers (int, optional) – Number of speakers to which speech segments are assigned during clustering (oracle speakers). If None, the number of speakers is estimated from the audio signal.

  • device (torch.device, default=torch.device("cpu")) – The device on which the speaker diarization model is run.

  • use_auth_token (bool or str, default=True) – If bool, whether to use the HuggingFace authentication token stored on the machine. If str, a HuggingFace authentication token with access to the models pyannote/speaker-diarization and pyannote/segmentation.

Notes

This class requires pretrained models for speaker diarization and segmentation from HuggingFace. To download the models, accept the user conditions on hf.co/pyannote/speaker-diarization and hf.co/pyannote/segmentation, then generate an authentication token on hf.co/settings/tokens.
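A minimal sketch of how the token could be passed to the class, assuming it is kept in an environment variable. The variable name HF_TOKEN and the helper names resolve_auth_token and build_identifier are hypothetical and not part of mexca; only the SpeakerIdentifier constructor arguments are taken from the signature above.

```python
import os
from typing import Optional, Union


def resolve_auth_token(env_var: str = "HF_TOKEN") -> Union[bool, str]:
    """Return the token stored in `env_var`, or True if it is unset or empty.

    True corresponds to the documented default of `use_auth_token`, i.e. using
    the HuggingFace token stored on the machine. The variable name is an
    assumption made for this example.
    """
    token = os.environ.get(env_var, "").strip()
    return token if token else True


def build_identifier(num_speakers: Optional[int] = None):
    """Create a SpeakerIdentifier on the CPU with the resolved token."""
    # Imported lazily so that resolve_auth_token() works without mexca installed.
    import torch
    from mexca.audio.identification import SpeakerIdentifier

    return SpeakerIdentifier(
        num_speakers=num_speakers,  # None: estimate the number of speakers
        device=torch.device("cpu"),
        use_auth_token=resolve_auth_token(),
    )
```

Calling build_identifier(num_speakers=2) would request two oracle speakers; leaving the argument at None lets the number of speakers be estimated from the audio signal.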

property pipeline: pyannote.audio.Pipeline

The pretrained speaker diarization pipeline. See pyannote.audio.SpeakerDiarization for details.

apply(filepath: str, show_progress: bool = True) → mexca.data.SpeakerAnnotation

Identify speech segments and speakers.

Parameters:
  • filepath (str) – Path to the audio file.

  • show_progress (bool, default=True) – Enables the display of a progress bar.

Returns:

A data class object that contains detected speech segments and speakers.

Return type:

SpeakerAnnotation
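A short usage sketch for apply(), assuming mexca is installed and the model conditions described in the Notes have been accepted. The wrapper name identify_speakers and the file name in the usage note are hypothetical; the sketch only relies on the constructor and the apply() signature documented above and makes no assumptions about the attributes of the returned SpeakerAnnotation.

```python
from typing import Any, Optional


def identify_speakers(
    filepath: str,
    identifier: Optional[Any] = None,
    show_progress: bool = False,
) -> Any:
    """Run speaker identification on one audio file and return the result.

    `identifier` can be any object with an `apply(filepath, show_progress=...)`
    method; if omitted, a SpeakerIdentifier with default arguments is created.
    """
    if identifier is None:
        # Imported lazily so the wrapper can be exercised without mexca installed.
        from mexca.audio.identification import SpeakerIdentifier

        identifier = SpeakerIdentifier()

    # According to the documentation, this returns a SpeakerAnnotation object.
    return identifier.apply(filepath, show_progress=show_progress)
```

For example, annotation = identify_speakers("interview.wav") would return the SpeakerAnnotation for a hypothetical file interview.wav. Passing an existing SpeakerIdentifier as identifier lets the same object be reused across several files.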

mexca.audio.identification.cli()

Command line interface for identifying speech segments and speakers. Run identify-speakers -h for details.