mexca.audio.extraction

Extract voice features from an audio file.

Construct a dictionary with feature names as keys and feature objects as values. Passing this dictionary to the VoiceExtractor extracts the specified features. Feature objects depend on lower-level voice signal properties, which each feature class declares in its requires property. The VoiceExtractor class computes these properties and supplies them to the feature objects.
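The workflow described above might look as follows. This is a minimal sketch, not an official example: the function name extract_selected_features, the dictionary keys, and the default time_step value are illustrative assumptions, while the class names and call signatures are the ones listed in this module. mexca is imported inside the function so that the definition can be loaded without the package installed.

```python
def extract_selected_features(filepath: str, time_step: float = 0.04):
    """Extract a hand-picked set of voice features from one audio file.

    The keys of the feature dictionary are free-form names chosen here
    for illustration; time_step=0.04 is an arbitrary example value.
    """
    # Imported lazily so this sketch can be defined without mexca installed.
    from mexca.audio.extraction import (
        FeatureFormantFreq,
        FeatureMfcc,
        FeaturePitchF0,
        VoiceExtractor,
    )

    # Keys are feature names, values are feature objects.
    features = {
        "pitch_f0_hz": FeaturePitchF0(),
        "formant_0_freq_hz": FeatureFormantFreq(n_formant=0),
        "mfcc_0": FeatureMfcc(n_mfcc=0),
    }

    # The extractor computes the required signal properties and
    # supplies them to each feature object before calling it.
    extractor = VoiceExtractor(features=features)
    return extractor.apply(filepath, time_step=time_step)
```

Leaving out the features argument would extract the default feature set instead.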

Module Contents

Classes

BaseFeature

Abstract base class for features.

FeaturePitchF0

Extract voice pitch as the fundamental frequency F0 in Hz.

FeatureJitter

Extract local jitter relative to the fundamental frequency.

FeatureShimmer

Extract local shimmer relative to the fundamental frequency.

FeatureHnr

Extract the harmonicity-to-noise ratio in dB.

FeatureFormantFreq

Extract formant central frequency in Hz.

FeatureFormantBandwidth

Extract formant frequency bandwidth in Hz.

FeatureFormantAmplitude

Extract formant amplitude relative to F0 harmonic amplitude.

FeatureAlphaRatio

Extract the alpha ratio in dB.

FeatureHammarIndex

Extract the Hammarberg index in dB.

FeatureSpectralSlope

Extract spectral slopes for frequency bands.

FeatureHarmonicDifference

Extract the difference between pitch harmonic and/or formant amplitudes in dB.

FeatureMfcc

Extract Mel frequency cepstral coefficients (MFCCs).

FeatureSpectralFlux

Extract spectral flux.

FeatureRmsEnergy

Extract the root mean squared energy in dB.

VoiceExtractor

Extract voice features from an audio file.

Functions

cli()

Command line interface for extracting voice features.

class mexca.audio.extraction.BaseFeature

Abstract base class for features.

Can be used to create custom voice feature extraction classes.

abstract property requires: Dict[str, type] | None

Specify objects required for feature extraction.

This abstract property must be overridden to return a dictionary whose keys are the names of the objects required for computing the feature and whose values are the types of these objects. The VoiceExtractor object looks for objects of the specified types and adds them as attributes to the feature object under the names given by the dictionary keys.

Returns:

Dictionary whose keys are the names and whose values are the types of the required objects.

Return type:

dict
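The type-based lookup described for requires can be illustrated with stand-in classes. This is a sketch under stated assumptions, not the mexca implementation: PitchFrames and HnrFrames below are empty placeholders for the real frame classes, and PitchFeature and supply_requirements are hypothetical names.

```python
from typing import Dict, Iterable, Optional


class PitchFrames:
    """Placeholder for a lower-level signal property (assumption)."""


class HnrFrames:
    """Second placeholder, used to show that unrelated objects are skipped."""


class PitchFeature:
    """Stand-in feature that declares what it needs via `requires`."""

    pitch_frames: Optional[PitchFrames] = None

    @property
    def requires(self) -> Optional[Dict[str, type]]:
        # Key: attribute name to set; value: type of the object to look for.
        return {"pitch_frames": PitchFrames}


def supply_requirements(feature, available: Iterable[object]) -> None:
    """Attach the first object of each required type to `feature`."""
    candidates = list(available)
    for name, required_type in (feature.requires or {}).items():
        match = next(
            (obj for obj in candidates if isinstance(obj, required_type)), None
        )
        if match is None:
            raise LookupError(f"no object of type {required_type.__name__}")
        setattr(feature, name, match)
```

After supply_requirements(feature, objects) has run, the feature can read feature.pitch_frames inside its apply method.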

abstract apply(time: numpy.ndarray) -> numpy.ndarray

Extract features at time points by linear interpolation.

Parameters:

time (numpy.ndarray) – Time points.

Returns:

Feature values interpolated at time points.

Return type:

numpy.ndarray
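Linear interpolation at requested time points can be sketched in plain Python as below. Lists stand in for numpy arrays so that the sketch has no dependencies; the function name interpolate_linear and the choice to hold the edge values outside the sampled range are assumptions, not documented mexca behaviour (numpy.interp clamps in the same way by default).

```python
from bisect import bisect_right
from typing import List, Sequence


def interpolate_linear(
    time: Sequence[float],
    frame_times: Sequence[float],
    frame_values: Sequence[float],
) -> List[float]:
    """Interpolate frame_values, sampled at increasing frame_times, at `time`."""
    result = []
    for t in time:
        # Outside the sampled range: hold the nearest edge value (assumption).
        if t <= frame_times[0]:
            result.append(float(frame_values[0]))
            continue
        if t >= frame_times[-1]:
            result.append(float(frame_values[-1]))
            continue
        # Index of the first frame time strictly greater than t.
        right = bisect_right(frame_times, t)
        left = right - 1
        t0, t1 = frame_times[left], frame_times[right]
        v0, v1 = frame_values[left], frame_values[right]
        weight = (t - t0) / (t1 - t0)
        result.append(v0 + weight * (v1 - v0))
    return result
```

For example, values 100 and 120 sampled at 0 s and 1 s give 110 at 0.5 s.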

class mexca.audio.extraction.FeaturePitchF0

Extract voice pitch as the fundamental frequency F0 in Hz.

property requires: Dict[str, emvoice.pitch.PitchFrames] | None

Specify objects required for feature extraction.

Returns:

Dictionary with key pitch_frames.

Return type:

dict

apply(time: numpy.ndarray) -> numpy.ndarray

Extract features at time points by linear interpolation.

Parameters:

time (numpy.ndarray) – Time points.

Returns:

Feature values interpolated at time points.

Return type:

numpy.ndarray

class mexca.audio.extraction.FeatureJitter

Extract local jitter relative to the fundamental frequency.

property requires: Dict[str, emvoice.pitch.JitterFrames] | None

Specify objects required for feature extraction.

Returns:

Dictionary with key jitter_frames.

Return type:

dict

apply(time: numpy.ndarray) -> numpy.ndarray

Extract features at time points by linear interpolation.

Parameters:

time (numpy.ndarray) – Time points.

Returns:

Feature values interpolated at time points.

Return type:

numpy.ndarray

class mexca.audio.extraction.FeatureShimmer

Extract local shimmer relative to the fundamental frequency.

property requires: Dict[str, emvoice.pitch.ShimmerFrames] | None

Specify objects required for feature extraction.

Returns:

Dictionary with key shimmer_frames.

Return type:

dict

apply(time: numpy.ndarray) -> numpy.ndarray

Extract features at time points by linear interpolation.

Parameters:

time (numpy.ndarray) – Time points.

Returns:

Feature values interpolated at time points.

Return type:

numpy.ndarray

class mexca.audio.extraction.FeatureHnr

Extract the harmonicity-to-noise ratio in dB.

property requires: Dict[str, emvoice.energy.HnrFrames] | None

Specify objects required for feature extraction.

Returns:

Dictionary with key hnr_frames.

Return type:

dict

apply(time: numpy.ndarray) -> numpy.ndarray

Extract features at time points by linear interpolation.

Parameters:

time (numpy.ndarray) – Time points.

Returns:

Feature values interpolated at time points.

Return type:

numpy.ndarray

class mexca.audio.extraction.FeatureFormantFreq(n_formant: int)

Extract formant central frequency in Hz.

Parameters:

n_formant (int) – Index of the formant (starting at 0).

property requires: Dict[str, emvoice.formants.FormantFrames] | None

Specify objects required for feature extraction.

Returns:

Dictionary with key formant_frames.

Return type:

dict

apply(time: numpy.ndarray) -> numpy.ndarray | None

Extract features at time points by linear interpolation.

Parameters:

time (numpy.ndarray) – Time points.

Returns:

Feature values interpolated at time points.

Return type:

numpy.ndarray

class mexca.audio.extraction.FeatureFormantBandwidth(n_formant: int)

Extract formant frequency bandwidth in Hz.

Parameters:

n_formant (int) – Index of the formant (starting at 0).

apply(time: numpy.ndarray) -> numpy.ndarray | None

Extract features at time points by linear interpolation.

Parameters:

time (numpy.ndarray) – Time points.

Returns:

Feature values interpolated at time points.

Return type:

numpy.ndarray

class mexca.audio.extraction.FeatureFormantAmplitude(n_formant: int)

Extract formant amplitude relative to F0 harmonic amplitude.

Parameters:

n_formant (int) – Index of the formant (starting at 0).

property requires: Dict[str, emvoice.formants.FormantAmplitudeFrames] | None

Specify objects required for feature extraction.

Returns:

Dictionary with key formant_amp_frames.

Return type:

dict

apply(time: numpy.ndarray) -> numpy.ndarray | None

Extract features at time points by linear interpolation.

Parameters:

time (numpy.ndarray) – Time points.

Returns:

Feature values interpolated at time points.

Return type:

numpy.ndarray

class mexca.audio.extraction.FeatureAlphaRatio

Extract the alpha ratio in dB.

property requires: Dict[str, emvoice.spectral.AlphaRatioFrames] | None

Specify objects required for feature extraction.

Returns:

Dictionary with key alpha_ratio_frames.

Return type:

dict

apply(time: numpy.ndarray) -> numpy.ndarray

Extract features at time points by linear interpolation.

Parameters:

time (numpy.ndarray) – Time points.

Returns:

Feature values interpolated at time points.

Return type:

numpy.ndarray

class mexca.audio.extraction.FeatureHammarIndex

Extract the Hammarberg index in dB.

property requires: Dict[str, emvoice.spectral.HammarIndexFrames] | None

Specify objects required for feature extraction.

Returns:

Dictionary with key hammar_index_frames.

Return type:

dict

apply(time: numpy.ndarray) -> numpy.ndarray

Extract features at time points by linear interpolation.

Parameters:

time (numpy.ndarray) – Time points.

Returns:

Feature values interpolated at time points.

Return type:

numpy.ndarray

class mexca.audio.extraction.FeatureSpectralSlope(lower: float, upper: float)

Extract spectral slopes for frequency bands.

Parameters:
  • lower (float) – Lower boundary of the frequency band for which to extract the spectral slope. A band with boundaries (lower, upper) must exist in the required spectral_slope_frames object.

  • upper (float) – Upper boundary of the frequency band for which to extract the spectral slope. A band with boundaries (lower, upper) must exist in the required spectral_slope_frames object.
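The requirement that the band must already exist can be pictured as a lookup by boundaries. The sketch below is an assumption about the general idea only: the data layout (a list of band tuples plus one row of slopes per frame), the function name slopes_for_band, and the band values in the usage note are hypothetical and do not describe the spectral_slope_frames object.

```python
from typing import List, Sequence, Tuple


def slopes_for_band(
    slopes: Sequence[Sequence[float]],
    bands: Sequence[Tuple[float, float]],
    lower: float,
    upper: float,
) -> List[float]:
    """Return the per-frame slopes of the band with boundaries (lower, upper).

    `slopes` holds one row per frame and one column per entry in `bands`.
    """
    band_list = list(bands)
    try:
        column = band_list.index((lower, upper))
    except ValueError:
        # No band with exactly these boundaries was computed.
        raise ValueError(
            f"no band with boundaries ({lower}, {upper}); available: {band_list}"
        ) from None
    return [row[column] for row in slopes]
```

With bands [(0.0, 500.0), (500.0, 1500.0)], asking for (500.0, 1500.0) selects the second column, while asking for (0.0, 1000.0) raises an error.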

property requires: Dict[str, type] | None

Specify objects required for feature extraction.

Returns:

Dictionary with key spectral_slope_frames.

Return type:

dict

apply(time: numpy.ndarray) -> numpy.ndarray

Extract features at time points by linear interpolation.

Parameters:

time (numpy.ndarray) – Time points.

Returns:

Feature values interpolated at time points.

Return type:

numpy.ndarray

class mexca.audio.extraction.FeatureHarmonicDifference(x_idx: int = 0, x_type: str = 'h', y_idx: int = 1, y_type: str = 'h')

Extract the difference between pitch harmonic and/or formant amplitudes in dB.

Parameters:
  • x_idx (int, default=0) – Index of the first amplitude.

  • y_idx (int, default=1) – Index of the second amplitude.

  • x_type (str, default='h') – Type of the first amplitude. Must be either ‘h’ for pitch harmonic or ‘f’ for formant.

  • y_type (str, default='h') – Type of the second amplitude. Must be either ‘h’ for pitch harmonic or ‘f’ for formant.

Raises:

ValueError – If x_type or y_type is not ‘h’ or ‘f’.
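How the four parameters select two amplitudes, and when the ValueError applies, can be sketched as follows. This is an illustration under assumptions: amplitudes are taken to be already in dB for a single frame, the result is computed as first minus second, and the function name amplitude_difference_db is hypothetical rather than part of mexca.

```python
from typing import Sequence


def amplitude_difference_db(
    harmonic_amps: Sequence[float],
    formant_amps: Sequence[float],
    x_idx: int = 0,
    x_type: str = "h",
    y_idx: int = 1,
    y_type: str = "h",
) -> float:
    """Difference (first minus second, an assumption) of two dB amplitudes."""
    for name, value in (("x_type", x_type), ("y_type", y_type)):
        if value not in ("h", "f"):
            raise ValueError(f"{name} must be 'h' or 'f', got {value!r}")

    # 'h' selects a pitch harmonic amplitude, 'f' a formant amplitude.
    sources = {"h": harmonic_amps, "f": formant_amps}
    first = sources[x_type][x_idx]
    second = sources[y_type][y_idx]
    return first - second
```

With the defaults, the sketch compares the harmonics at indices 0 and 1; setting y_type='f' and y_idx=2 compares the harmonic at index 0 with the formant at index 2.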

property requires: Dict[str, emvoice.formants.FormantAmplitudeFrames | emvoice.pitch.PitchHarmonicsFrames] | None

Specify objects required for feature extraction.

Returns:

Dictionary with keys formant_amp_frames and pitch_harmonics_frames.

Return type:

dict

apply(time: numpy.ndarray) -> numpy.ndarray

Extract features at time points by linear interpolation.

Parameters:

time (numpy.ndarray) – Time points.

Returns:

Feature values interpolated at time points.

Return type:

numpy.ndarray

class mexca.audio.extraction.FeatureMfcc(n_mfcc: int = 0)

Extract Mel frequency cepstral coefficients (MFCCs).

Parameters:

n_mfcc (int, default=0) – Index of the MFCC to be extracted.

property requires: Dict[str, emvoice.spectral.MfccFrames] | None

Specify objects required for feature extraction.

Returns:

Dictionary with key mfcc_frames.

Return type:

dict

apply(time: numpy.ndarray) -> numpy.ndarray

Extract features at time points by linear interpolation.

Parameters:

time (numpy.ndarray) – Time points.

Returns:

Feature values interpolated at time points.

Return type:

numpy.ndarray

class mexca.audio.extraction.FeatureSpectralFlux

Extract spectral flux.

property requires: Dict[str, emvoice.spectral.SpectralFluxFrames] | None

Specify objects required for feature extraction.

Returns:

Dictionary with key spectral_flux_frames.

Return type:

dict

apply(time: numpy.ndarray) -> numpy.ndarray

Extract features at time points by linear interpolation.

Parameters:

time (numpy.ndarray) – Time points.

Returns:

Feature values interpolated at time points.

Return type:

numpy.ndarray

class mexca.audio.extraction.FeatureRmsEnergy

Extract the root mean squared energy in dB.

property requires: Dict[str, type] | None

Specify objects required for feature extraction.

Returns:

Dictionary with key rms_frames.

Return type:

dict

apply(time: numpy.ndarray) -> numpy.ndarray

Extract features at time points by linear interpolation.

Parameters:

time (numpy.ndarray) – Time points.

Returns:

Feature values interpolated at time points.

Return type:

numpy.ndarray

class mexca.audio.extraction.VoiceExtractor(features: Dict[str, BaseFeature] | None = None, config: mexca.data.VoiceFeaturesConfig | None = None)

Extract voice features from an audio file.

For default features, see the Output section.

Parameters:
  • features (dict, optional, default=None) – Dictionary with keys as feature names and values as feature extraction objects. If None, default features are extracted.

  • config (VoiceFeaturesConfig, optional, default=None) – Voice feature extraction configuration object. If None, uses VoiceFeaturesConfig’s default configuration.

apply(filepath: str, time_step: float, skip_frames: int = 1) -> mexca.data.VoiceFeatures

Extract voice features from an audio file.

Parameters:
  • filepath (str) – Path to the audio file.

  • time_step (float) – The interval between time points at which features are extracted.

  • skip_frames (int, default=1) – Process only every nth frame, starting at frame 0.

Returns:

A data class object containing the extracted voice features.

Return type:

VoiceFeatures
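The roles of time_step and skip_frames can be made concrete with two small helpers. They are a sketch of one plausible reading of the parameter descriptions, not the mexca implementation: the helper names are hypothetical, and starting the time grid at 0 and ending it at the audio duration are assumptions.

```python
from typing import List


def time_points(duration: float, time_step: float) -> List[float]:
    """Evenly spaced time points 0, time_step, 2 * time_step, ... <= duration."""
    if time_step <= 0:
        raise ValueError("time_step must be positive")
    n_points = int(duration // time_step) + 1
    return [i * time_step for i in range(n_points)]


def frames_to_process(n_frames: int, skip_frames: int = 1) -> List[int]:
    """Indices of the frames kept when only every nth frame is processed."""
    if skip_frames < 1:
        raise ValueError("skip_frames must be at least 1")
    # Frame 0 is always kept; then every skip_frames-th frame after it.
    return list(range(0, n_frames, skip_frames))
```

Under these assumptions, a larger skip_frames reduces the number of frames that are analysed, while time_step only controls how densely the results are sampled.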

mexca.audio.extraction.cli()

Command line interface for extracting voice features. See extract-voice -h for details.