Spatial Effects

This section covers four spatial effects: stereo panning (StereoPanning), stereo width control (StereoWidener), multi-band stereo imaging (StereoImager), and Haas-effect stereo enhancement (StereoEnhancer).

StereoPanning

class diffFx_pytorch.processors.spatial.StereoPanning(sample_rate=44100, param_range=None)

Bases: ProcessorsBase

Differentiable implementation of constant-power stereo panning.

This processor implements stereo panning using a constant-power (equal-power) panning law, which maintains consistent perceived loudness across the stereo field. It converts mono input signals to stereo by applying complementary gain coefficients to create the desired stereo position.

The panning uses a sine/cosine-based gain law that ensures:
  • Constant total power across all pan positions

  • Smooth transitions between channels

  • -3dB center attenuation for optimal power distribution

The gain calculations follow:

\[ \begin{align}\begin{aligned}g_L = \sqrt{\frac{\pi/2 - \theta}{\pi/2}} \cos(\theta)\\g_R = \sqrt{\frac{\theta}{\pi/2}} \sin(\theta)\end{aligned}\end{align} \]
where:
  • θ is the panning angle (0 to π/2)

  • g_L is the gain coefficient for the left channel

  • g_R is the gain coefficient for the right channel
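To make the gain law concrete, the sketch below evaluates the formula exactly as written above next to the textbook sine/cosine constant-power law (g_L = cos θ, g_R = sin θ). It uses only the Python standard library, assumes the linear mapping θ = pan · π/2 implied by the ranges listed here, and is independent of diffFx_pytorch; both function names are hypothetical.

```python
import math


def documented_pan_gains(pan: float) -> tuple[float, float]:
    """Evaluate g_L and g_R exactly as in the formula above.

    Assumes theta = pan * pi/2 (pan 0.0 = full left, 1.0 = full right).
    """
    theta = pan * math.pi / 2
    g_left = math.sqrt((math.pi / 2 - theta) / (math.pi / 2)) * math.cos(theta)
    g_right = math.sqrt(theta / (math.pi / 2)) * math.sin(theta)
    return g_left, g_right


def sin_cos_pan_gains(pan: float) -> tuple[float, float]:
    """Textbook constant-power law: g_L = cos(theta), g_R = sin(theta)."""
    theta = pan * math.pi / 2
    return math.cos(theta), math.sin(theta)


for pan in (0.0, 0.25, 0.5, 0.75, 1.0):
    gl, gr = documented_pan_gains(pan)
    cl, cr = sin_cos_pan_gains(pan)
    # Compare per-channel gains and total power g_L^2 + g_R^2 for each law
    print(f"pan={pan:.2f}  formula: L={gl:.3f} R={gr:.3f} P={gl**2 + gr**2:.3f}"
          f"  sin/cos: L={cl:.3f} R={cr:.3f} P={cl**2 + cr**2:.3f}")
```

Both laws reach (1, 0) at full left and (0, 1) at full right; printing g_L² + g_R² for each is a quick way to see how either law distributes power across the stereo field.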

Parameters:

sample_rate (int) – Audio sample rate in Hz

Parameter Details:
pan: Panning position control
  • 0.0: Full left

  • 0.5: Center

  • 1.0: Full right

  • Controls the perceived position in the stereo field

  • Mapped internally to panning angle θ

Note

  • Input must be mono (single channel)

  • Output is always stereo (two channels)

  • Total power is preserved across all pan positions

  • Uses equal-power (constant-power) panning law

Warning

When using with neural networks:
  • norm_params must be in range [0, 1]

  • The parameter is automatically mapped to the pan position

  • Ensure your network output is properly normalized (e.g., using sigmoid)

Examples

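Basic DSP Usage:
>>> import torch
>>> from diffFx_pytorch.processors.spatial import StereoPanning
>>> # Create a stereo panner
>>> panner = StereoPanning(sample_rate=44100)
>>> # Mono input: (batch, 1, samples)
>>> input_audio = torch.randn(1, 1, 44100)
>>> # Process mono audio with direct panning
>>> output = panner(input_audio, dsp_params={
...     'pan': 0.75  # Halfway between center and full right
... })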
Neural Network Control:
>>> import torch
>>> import torch.nn as nn
>>> from diffFx_pytorch.processors.spatial import StereoPanning
>>> # Simple parameter prediction
>>> class PanningController(nn.Module):
...     def __init__(self, input_size):
...         super().__init__()
...         self.net = nn.Sequential(
...             nn.Linear(input_size, 32),
...             nn.ReLU(),
...             nn.Linear(32, 1),
...             nn.Sigmoid()  # Ensures output is in [0,1] range
...         )
...
...     def forward(self, x):
...         return self.net(x)
>>>
>>> # Initialize controller
>>> panner = StereoPanning(sample_rate=44100)
>>> controller = PanningController(input_size=16)
>>>
>>> # Process with features
>>> batch_size = 4
>>> input_audio = torch.randn(batch_size, 1, 44100)  # Mono input
>>> features = torch.randn(batch_size, 16)  # Audio features
>>> norm_params = {'pan': controller(features).squeeze(-1)}  # Shape: (batch_size,)
>>> output = panner(input_audio, norm_params=norm_params)
Parameters:

param_range (Optional[Dict[str, EffectParam]]) – Optional parameter definitions to override the defaults.

A stereo positioning processor that controls the balance between left and right channels. It allows for continuous panning across the stereo field, from fully left to fully right, while maintaining consistent overall loudness through amplitude-compensated panning laws.

_register_default_parameters()

Register the panning parameter.

Sets up the pan parameter with range 0.0 (full left) to 1.0 (full right).

process(x, norm_params=None, dsp_params=None)

Process input signal through the stereo panner.

Parameters:
  • x (torch.Tensor) – Input audio tensor. Shape: (batch, 1, samples)

  • norm_params (Dict[str, torch.Tensor]) – Normalized parameters (0 to 1). Must contain the following key:

      - ‘pan’: Stereo position from left to right (0 to 1)

    Each value should be a tensor of shape (batch_size,).

  • dsp_params (Dict[str, Union[float, torch.Tensor]], optional) – Direct DSP parameters. Each panner parameter can be given as:

      - float/int: single value applied to the entire batch
      - 0D tensor: single value applied to the entire batch
      - 1D tensor: batch of values matching the input batch size

    Parameters are automatically expanded to match the batch size and moved to the input device if necessary. If provided, norm_params must be None.

Returns:

Processed stereo audio tensor. Shape: (batch, 2, samples)

Return type:

torch.Tensor

Raises:

AssertionError – If input is not mono (single channel)
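The broadcasting rule described for dsp_params can be sketched as a small helper. This is a hypothetical function written for illustration, not the library's internal code; it assumes the target batch size, dtype, and device are all taken from the input tensor x.

```python
import torch


def expand_param(value, x: torch.Tensor) -> torch.Tensor:
    """Broadcast a dsp_params value to shape (batch,) on x's device and dtype.

    Accepts a Python float/int, a 0D tensor, or a 1D tensor of length batch.
    """
    batch_size = x.shape[0]
    value = torch.as_tensor(value, dtype=x.dtype, device=x.device)
    if value.dim() == 0:
        # Single value: repeat it for every item in the batch (a view, no copy)
        return value.expand(batch_size)
    if value.dim() == 1 and value.shape[0] == batch_size:
        return value
    raise ValueError(
        f"expected a scalar or a tensor of shape ({batch_size},), got {tuple(value.shape)}"
    )


x = torch.randn(4, 1, 1024)  # (batch, 1, samples) mono input
print(expand_param(0.75, x))
print(expand_param(torch.tensor([0.0, 0.25, 0.75, 1.0]), x))
```

Returning an expanded view for scalars avoids allocating a copy; call `.clone()` on the result if it will be modified in place.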

StereoWidener

class diffFx_pytorch.processors.spatial.StereoWidener(sample_rate=44100, param_range=None)

Bases: ProcessorsBase

Differentiable implementation of mid-side stereo width control.

This processor implements stereo width adjustment using mid-side (M/S) processing, allowing continuous control from mono to enhanced stereo width. It operates by converting the input to M/S representation, scaling the side signal, and converting back to left-right stereo.

The width control is implemented using the following process:

\[ \begin{align}\begin{aligned}M_{out} = M_{in} * 2(1 - width)\\S_{out} = S_{in} * 2(width)\end{aligned}\end{align} \]
where:
  • M is the mid (mono) signal: (L + R) / √2

  • S is the side (difference) signal: (L - R) / √2

  • width is the stereo width control parameter

  • At width = 0.5 both scale factors equal 1, so the signal passes through unchanged

Processing Chain:
  1. Convert L/R to M/S representation

  2. Scale mid and side signals based on width

  3. Convert back to L/R representation
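The three steps above can be sketched in NumPy using the M/S definitions and width formula given in this section. The sketch works on unbatched 1-D arrays and only illustrates the math; it is not the library's implementation, and the function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def lr_to_ms(left, right):
    """Orthonormal L/R -> M/S: M = (L + R)/sqrt(2), S = (L - R)/sqrt(2)."""
    return (left + right) / SQRT2, (left - right) / SQRT2


def ms_to_lr(mid, side):
    """Inverse of lr_to_ms (the conversion matrix is its own inverse)."""
    return (mid + side) / SQRT2, (mid - side) / SQRT2


def widen(left, right, width):
    """Apply M_out = M * 2(1 - width) and S_out = S * 2 * width, then return to L/R."""
    mid, side = lr_to_ms(left, right)
    mid = mid * 2.0 * (1.0 - width)
    side = side * 2.0 * width
    return ms_to_lr(mid, side)


rng = np.random.default_rng(0)
left, right = rng.standard_normal((2, 1024))

# width = 0.5: both scale factors are 1, so the input is returned unchanged
l_out, r_out = widen(left, right, 0.5)
# width = 0.0: side removed, both channels carry the same (mid) signal
l_mono, r_mono = widen(left, right, 0.0)
```

With width = 0.0 each output channel equals L + R, because the mid gain 2(1 - width) is 2 at that setting.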

Parameters:

sample_rate (int) – Audio sample rate in Hz

Parameter Details:
width: Stereo width control
  • 0.0: Mono (side signal removed)

  • 0.5: Original stereo (no change)

  • 1.0: Maximum width (side signal doubled, mid signal removed)

  • Continuously variable between these points

Note

  • Input must be stereo (two channels)

  • Uses energy-preserving M/S conversion matrices

  • Width control affects the ratio of mid to side signal

  • Extreme width settings may cause phase issues

  • Mono compatibility decreases as width approaches 1.0, where the mid signal is removed entirely

Warning

When using with neural networks:
  • norm_params must be in range [0, 1]

  • The parameter is automatically mapped to the width range

  • Ensure your network output is properly normalized (e.g., using sigmoid)

  • Parameter order must match _register_default_parameters()

Examples

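Basic DSP Usage:
>>> import torch
>>> from diffFx_pytorch.processors.spatial import StereoWidener
>>> # Create a stereo widener
>>> widener = StereoWidener(sample_rate=44100)
>>> # Stereo input: (batch, 2, samples)
>>> input_audio = torch.randn(1, 2, 44100)
>>> # Process stereo audio with direct width control
>>> output = widener(input_audio, dsp_params={
...     'width': 0.75  # Side gain 1.5, mid gain 0.5 (wider than original)
... })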
Neural Network Control:
>>> import torch
>>> import torch.nn as nn
>>> from diffFx_pytorch.processors.spatial import StereoWidener
>>> # Simple parameter prediction
>>> class WidthController(nn.Module):
...     def __init__(self, input_size):
...         super().__init__()
...         self.net = nn.Sequential(
...             nn.Linear(input_size, 32),
...             nn.ReLU(),
...             nn.Linear(32, 1),
...             nn.Sigmoid()  # Ensures output is in [0,1] range
...         )
...
...     def forward(self, x):
...         return self.net(x)
>>>
>>> # Initialize controller
>>> widener = StereoWidener(sample_rate=44100)
>>> controller = WidthController(input_size=16)
>>>
>>> # Process with features
>>> batch_size = 4
>>> input_audio = torch.randn(batch_size, 2, 44100)  # Stereo input
>>> features = torch.randn(batch_size, 16)  # Audio features
>>> norm_params = {'width': controller(features).squeeze(-1)}  # Shape: (batch_size,)
>>> output = widener(input_audio, norm_params=norm_params)
Parameters:

param_range (Optional[Dict[str, EffectParam]]) – Optional parameter definitions to override the defaults.

A stereo width processor that changes the perceived width of the stereo image by manipulating the mid-side (M/S) representation of the signal. It can expand or contract the stereo field with a single width setting applied to the full frequency range; use StereoImager for per-band control.

_register_default_parameters()

Register the width parameter.

Sets up the width parameter with range:
  • 0.0: Mono (collapse to center)

  • 0.5: No change (original stereo)

  • 1.0: Enhanced stereo (maximum width)

process(x, norm_params=None, dsp_params=None)

Process input signal through the stereo widener.

Parameters:
  • x (torch.Tensor) – Input audio tensor. Shape: (batch, 2, samples)

  • norm_params (Dict[str, torch.Tensor]) – Normalized parameters (0 to 1). Must contain the following key:

      - ‘width’: Stereo width control (0 to 1): 0.0 = mono/centered, 0.5 = original stereo width, 1.0 = maximum width

    Each value should be a tensor of shape (batch_size,).

  • dsp_params (Dict[str, Union[float, torch.Tensor]], optional) – Direct DSP parameters. Each widener parameter can be given as:

      - float/int: single value applied to the entire batch
      - 0D tensor: single value applied to the entire batch
      - 1D tensor: batch of values matching the input batch size

    Parameters are automatically expanded to match the batch size and moved to the input device if necessary. If provided, norm_params must be None.

Returns:

Processed stereo audio tensor. Shape: (batch, 2, samples)

Return type:

torch.Tensor

Raises:

AssertionError – If input is not stereo (two channels)

StereoImager

class diffFx_pytorch.processors.spatial.StereoImager(sample_rate, param_range=None, num_bands=3)

Bases: ProcessorsBase

Differentiable implementation of a multi-band stereo imaging processor.

This processor implements frequency-dependent stereo width control using mid-side (M/S) processing combined with Linkwitz-Riley crossover filters. It allows independent width control over multiple frequency bands, enabling precise stereo field manipulation across the frequency spectrum.

The processor splits the signal into frequency bands using a series of Linkwitz-Riley crossover filters, processes each band’s stereo width independently, then recombines the bands.

Processing Chain:
  1. Convert L/R to M/S representation

  2. Split M/S signals into frequency bands using crossovers

  3. Apply independent width control to each band

  4. Sum processed bands

  5. Convert back to L/R representation

The width control for each band follows:

\[ \begin{align}\begin{aligned}M_{out} = M_{in} * 2(1 - width)\\S_{out} = S_{in} * 2(width)\end{aligned}\end{align} \]
where:
  • M is the mid (mono) signal for the band

  • S is the side (difference) signal for the band

  • width is the stereo width control parameter for that band
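A two-band version of this chain is sketched below in NumPy. It assumes a 4th-order Linkwitz-Riley crossover built from two cascaded 2nd-order Butterworth biquads per band, with coefficients from the RBJ Audio EQ Cookbook; the library's filter order, coefficient design, and extension to more bands may differ, and all function names are hypothetical. `lr4_sum_response` evaluates |H_low + H_high| to show that the two bands recombine with a flat magnitude response.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def butterworth_biquads(fc, fs):
    """2nd-order Butterworth low- and high-pass biquads (RBJ cookbook, Q = 1/sqrt(2))."""
    w0 = 2.0 * np.pi * fc / fs
    cos_w0 = np.cos(w0)
    alpha = np.sin(w0) / (2.0 / SQRT2)  # sin(w0) / (2Q) with Q = 1/sqrt(2)
    a = np.array([1.0 + alpha, -2.0 * cos_w0, 1.0 - alpha])
    b_lp = np.array([1.0, 2.0, 1.0]) * (1.0 - cos_w0) / 2.0
    b_hp = np.array([1.0, -2.0, 1.0]) * (1.0 + cos_w0) / 2.0
    return b_lp / a[0], b_hp / a[0], a / a[0]


def biquad(x, b, a):
    """Direct-form I biquad filter; assumes a[0] == 1."""
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y


def lr4_split(x, fc, fs):
    """4th-order Linkwitz-Riley split: each Butterworth section is applied twice."""
    b_lp, b_hp, a = butterworth_biquads(fc, fs)
    low = biquad(biquad(x, b_lp, a), b_lp, a)
    high = biquad(biquad(x, b_hp, a), b_hp, a)
    return low, high


def lr4_sum_response(fc, fs, freqs):
    """|H_low(f) + H_high(f)| of the LR4 crossover, evaluated on the unit circle."""
    b_lp, b_hp, a = butterworth_biquads(fc, fs)
    z_inv = np.exp(-2j * np.pi * np.asarray(freqs) / fs)

    def h(b):
        return (b[0] + b[1] * z_inv + b[2] * z_inv**2) / (a[0] + a[1] * z_inv + a[2] * z_inv**2)

    # Each band is a cascade of two identical sections, hence the squares
    return np.abs(h(b_lp) ** 2 + h(b_hp) ** 2)


def two_band_imager(left, right, fc, fs, width_low, width_high):
    """L/R -> M/S -> split at fc -> per-band width -> sum bands -> L/R."""
    mid, side = (left + right) / SQRT2, (left - right) / SQRT2
    mid_lo, mid_hi = lr4_split(mid, fc, fs)
    side_lo, side_hi = lr4_split(side, fc, fs)
    # Per-band width formula: M * 2(1 - width), S * 2 * width
    mid_out = mid_lo * 2 * (1 - width_low) + mid_hi * 2 * (1 - width_high)
    side_out = side_lo * 2 * width_low + side_hi * 2 * width_high
    return (mid_out + side_out) / SQRT2, (mid_out - side_out) / SQRT2


rng = np.random.default_rng(0)
left, right = rng.standard_normal((2, 2048))
out_l, out_r = two_band_imager(left, right, fc=1000.0, fs=44100.0, width_low=0.2, width_high=0.8)
print(lr4_sum_response(1000.0, 44100.0, [50.0, 1000.0, 10000.0]))
```

The per-sample loop in `biquad` favors readability over speed; a batched, differentiable implementation would need a faster filtering routine.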

Parameters:
  • sample_rate (int) – Audio sample rate in Hz

  • num_bands (int) – Number of frequency bands. Defaults to 3.

crossovers

List of Linkwitz-Riley crossover filters

Type:

nn.ModuleList

num_bands

Number of frequency bands

Type:

int

Parameter Details:
For each band i (0-indexed):
band{i}_width: Stereo width control for band i
  • 0.0: Mono (only mid signal)

  • 0.5: Original stereo

  • 1.0: Maximum width (enhanced side signal)

For each crossover i:
crossover{i}_freq: Crossover frequency between bands i and i+1
  • Frequency range scales with band number

  • Default ranges follow standard mastering crossover points

  • Min frequency doubles for each successive crossover

  • Max frequency is limited to 20kHz

Note

  • Input must be stereo (two channels)

  • Uses energy-preserving M/S conversion matrices

  • Linkwitz-Riley crossovers ensure phase coherence

  • Total number of parameters = 2 * num_bands - 1

  • Width controls affect the ratio of mid to side signal per band
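The naming scheme and the 2 · num_bands - 1 parameter count above can be written out as a short sketch. `imager_param_names` is hypothetical, and the order it returns (all widths, then all crossovers) is an assumption; check `_register_default_parameters()` for the actual order before mapping a flat norm_params tensor onto these names.

```python
def imager_param_names(num_bands: int) -> list[str]:
    """Hypothetical helper listing StereoImager parameter names (0-indexed).

    Widths come first, then crossovers; this ordering is an assumption.
    """
    widths = [f"band{i}_width" for i in range(num_bands)]
    crossovers = [f"crossover{i}_freq" for i in range(num_bands - 1)]
    return widths + crossovers


names = imager_param_names(3)
print(names)
```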

Warning

When using with neural networks:
  • norm_params must be in range [0, 1]

  • Parameters will be automatically mapped to their ranges

  • Ensure your network output is properly normalized (e.g., using sigmoid)

  • Parameter order must match _register_default_parameters()

Examples

Basic DSP Usage:
>>> import torch
>>> from diffFx_pytorch.processors.spatial import StereoImager
>>> # Create a 3-band stereo imager
>>> imager = StereoImager(
...     sample_rate=44100,
...     num_bands=3
... )
>>> # Stereo input: (batch, 2, samples)
>>> input_audio = torch.randn(1, 2, 44100)
>>> # Process with different width for each band
>>> output = imager(input_audio, dsp_params={
...     'band0_width': 0.3,  # Reduce width in low frequencies
...     'band1_width': 0.5,  # Keep mids unchanged
...     'band2_width': 0.8,  # Enhance width in highs
...     'crossover0_freq': 200.0,  # Low/mid crossover (Hz)
...     'crossover1_freq': 2000.0  # Mid/high crossover (Hz)
... })
Neural Network Control:
>>> import torch
>>> import torch.nn as nn
>>> from diffFx_pytorch.processors.spatial import StereoImager
>>> # Simple parameter prediction
>>> class ImagerController(nn.Module):
...     def __init__(self, input_size, num_params):
...         super().__init__()
...         self.net = nn.Sequential(
...             nn.Linear(input_size, 32),
...             nn.ReLU(),
...             nn.Linear(32, num_params),
...             nn.Sigmoid()  # Ensures output is in [0,1] range
...         )
...
...     def forward(self, x):
...         return self.net(x)
>>>
>>> # Initialize controller (sample_rate has no default for StereoImager)
>>> imager = StereoImager(sample_rate=44100, num_bands=3)
>>> num_params = imager.count_num_parameters()  # 5 parameters for 3 bands
>>> controller = ImagerController(input_size=16, num_params=num_params)
>>>
>>> # Process with features
>>> batch_size = 4
>>> input_audio = torch.randn(batch_size, 2, 44100)  # Stereo input
>>> features = torch.randn(batch_size, 16)  # Audio features
>>> # Shape: (batch_size, num_params), ordered as in _register_default_parameters()
>>> norm_params = controller(features)
>>> output = imager(input_audio, norm_params=norm_params)
Parameters:

param_range (Optional[Dict[str, EffectParam]]) – Optional parameter definitions to override the defaults.

A multi-band stereo processing tool that provides independent control over the stereo width in different frequency bands. It uses mid-side processing with crossover filters to allow precise adjustment of the stereo image across the frequency spectrum, enabling frequency-dependent stereo manipulation.

__init__(sample_rate, param_range=None, num_bands=3)

Initialize the processor base.

Parameters:
  • sample_rate – Audio sample rate in Hz

  • param_range (Optional[Dict[str, EffectParam]]) – Optional parameter definitions to override defaults

_register_default_parameters()

Register parameters for band widths and crossover frequencies.

Sets up:
  • Width control for each frequency band (0.0 to 1.0)

  • Crossover frequencies between bands (frequency ranges scale with band)

_apply_width(mid, side, width)

Apply stereo width processing to mid/side signals for a single band.

Parameters:
  • mid (torch.Tensor) – Mid signal for the band. Shape: (batch, 1, samples)

  • side (torch.Tensor) – Side signal for the band. Shape: (batch, 1, samples)

  • width (torch.Tensor) – Width control parameter. Shape: (batch,)

Returns:

Processed (mid, side) signals

Return type:

Tuple[torch.Tensor, torch.Tensor]

Note

Scales the mid signal by 2(1 - width) and the side signal by 2 · width, following the per-band width formula.

process(x, norm_params=None, dsp_params=None)

Process input signal through the multi-band stereo imager.

Parameters:
  • x (torch.Tensor) – Input audio tensor. Shape: (batch, 2, samples)

  • norm_params (Dict[str, torch.Tensor]) – Normalized parameters (0 to 1). Must contain the following keys:

      - ‘band{i}_width’: Width control for band i (0 to 1)
      - ‘crossover{i}_freq’: Crossover frequency between band i and band i+1 (0 to 1)

    Each value should be a tensor of shape (batch_size,).

  • dsp_params (Dict[str, Union[float, torch.Tensor]], optional) – Direct DSP parameters. Each imager parameter can be given as:

      - float/int: single value applied to the entire batch
      - 0D tensor: single value applied to the entire batch
      - 1D tensor: batch of values matching the input batch size

    Parameters are automatically expanded to match the batch size and moved to the input device if necessary. If provided, norm_params must be None.

Returns:

Processed stereo audio tensor. Shape: (batch, 2, samples)

Return type:

torch.Tensor

Raises:

AssertionError – If input is not stereo (two channels)

StereoEnhancer

class diffFx_pytorch.processors.spatial.StereoEnhancer(sample_rate=44100, param_range=None)

Bases: ProcessorsBase

Differentiable implementation of stereo enhancement using the Haas effect.

This processor implements stereo enhancement using the Haas effect (precedence effect), which creates an enhanced sense of stereo width by introducing small time delays between channels. The implementation combines mid-side processing with frequency-domain delay to achieve precise control over the stereo image.

The Haas effect exploits the human auditory system’s precedence effect, in which inter-channel delays of roughly 1 to 30 ms influence spatial perception without being heard as distinct echoes. The processor applies the delay in the frequency domain, which also allows delay times that are not whole numbers of samples.

Processing Chain:
  1. Convert L/R to M/S representation

  2. Apply frequency-domain delay to side signal

  3. Apply width scaling to delayed side signal

  4. Convert back to L/R representation

The frequency domain delay is implemented as:

\[S_{delayed}(f) = S(f) * e^{-j2\pi f \tau}\]
where:
  • S(f) is the side signal in frequency domain

  • f is frequency

  • τ is the delay time in seconds

  • Phase is unwrapped to ensure continuous delay
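A NumPy sketch of the processing chain and delay formula above follows. It assumes the side gain is 2 · width, mirroring StereoWidener so that 0.5 leaves the side level unchanged; the processor's actual width mapping, padding, and phase handling may differ, and the function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def fft_delay(x, delay_s, fs):
    """Delay x by delay_s seconds: X(f) * exp(-j 2 pi f tau), then inverse FFT.

    The result is a circular shift (delayed samples wrap to the start);
    zero-padding x beforehand would avoid the wrap-around.
    """
    n = x.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)  # bin frequencies in Hz
    spectrum = np.fft.rfft(x) * np.exp(-2j * np.pi * freqs * delay_s)
    return np.fft.irfft(spectrum, n=n)


def haas_enhance(left, right, fs, delay_ms, width):
    """L/R -> M/S -> delay side -> scale side -> L/R.

    The side gain 2 * width is an assumption (0.5 leaves the side level unchanged).
    """
    mid = (left + right) / SQRT2
    side = (left - right) / SQRT2
    side = fft_delay(side, delay_ms / 1000.0, fs) * 2.0 * width
    return (mid + side) / SQRT2, (mid - side) / SQRT2


fs = 44100
rng = np.random.default_rng(0)
left, right = rng.standard_normal((2, fs))
out_l, out_r = haas_enhance(left, right, fs, delay_ms=12.0, width=0.7)
```

Because only the side signal is modified, L + R equals the input's mono sum for any delay and width. At 44.1 kHz a 12 ms delay corresponds to about 529 samples.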

Parameters:

sample_rate (int) – Audio sample rate in Hz

Parameter Details:
delay_ms: Delay time for the Haas effect
  • Range: 0 to 30 milliseconds

  • Values around 10 to 15 ms are typically the most effective

  • Controls the perceived spatial width

  • Based on psychoacoustic precedence effect

width: Overall stereo width control
  • Range: 0.0 to 1.0

  • 0.5: No enhancement (original signal)

  • 1.0: Maximum enhancement

  • Scales the processed side signal

Note

  • Input must be stereo (two channels)

  • Uses frequency domain processing for precise delays

  • Phase unwrapping ensures continuous delay response

  • Delay range chosen based on psychoacoustic research

  • Maintains mono compatibility

  • Most effective on transient-rich material

Warning

When using with neural networks:
  • norm_params must be in range [0, 1]

  • Parameters will be automatically mapped to their ranges

  • Ensure your network output is properly normalized (e.g., using sigmoid)

  • Parameter order must match _register_default_parameters()

High delay values (>20ms) may cause noticeable separation of channels, particularly on transient material.

Examples

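Basic DSP Usage:
>>> import torch
>>> from diffFx_pytorch.processors.spatial import StereoEnhancer
>>> # Create a stereo enhancer
>>> enhancer = StereoEnhancer(sample_rate=44100)
>>> # Stereo input: (batch, 2, samples)
>>> input_audio = torch.randn(1, 2, 44100)
>>> # Process with moderate Haas effect
>>> output = enhancer(input_audio, dsp_params={
...     'delay_ms': 12.0,  # 12 ms delay for natural width
...     'width': 0.7       # Moderate enhancement (0.5 = none, 1.0 = maximum)
... })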
Neural Network Control:
>>> import torch
>>> import torch.nn as nn
>>> from diffFx_pytorch.processors.spatial import StereoEnhancer
>>> # Simple parameter prediction
>>> class EnhancerController(nn.Module):
...     def __init__(self, input_size):
...         super().__init__()
...         self.net = nn.Sequential(
...             nn.Linear(input_size, 32),
...             nn.ReLU(),
...             nn.Linear(32, 2),  # 2 parameters: delay and width
...             nn.Sigmoid()  # Ensures output is in [0,1] range
...         )
...
...     def forward(self, x):
...         return self.net(x)
>>>
>>> # Initialize controller
>>> enhancer = StereoEnhancer(sample_rate=44100)
>>> controller = EnhancerController(input_size=16)
>>>
>>> # Process with features
>>> batch_size = 4
>>> input_audio = torch.randn(batch_size, 2, 44100)  # Stereo input
>>> features = torch.randn(batch_size, 16)  # Audio features
>>> # Shape: (batch_size, 2), ordered as in _register_default_parameters()
>>> norm_params = controller(features)
>>> output = enhancer(input_audio, norm_params=norm_params)
Parameters:

param_range (Optional[Dict[str, EffectParam]]) – Optional parameter definitions to override the defaults.

A stereo enhancement processor that implements the Haas effect (also known as the precedence effect or law of the first wavefront) to create a wider stereo image. It introduces a small time delay (0 to 30 ms in this processor) into the side component of the signal, exploiting the human auditory system’s spatial perception mechanisms. When one arrival is delayed relative to another within this time window, the sound appears to come from the direction of the first arrival while both arrivals still contribute to its loudness, widening the image without collapsing the phantom center.

_register_default_parameters()

Register delay and width parameters.

Sets up:
  • delay_ms: Haas effect delay time (0 to 30 ms)

  • width: Enhancement amount (0.0 to 1.0)

process(x, norm_params=None, dsp_params=None)

Process input signal through the stereo enhancer.

Parameters:
  • x (torch.Tensor) – Input audio tensor. Shape: (batch, 2, samples)

  • norm_params (Dict[str, torch.Tensor]) – Normalized parameters (0 to 1). Must contain the following keys:

      - ‘delay_ms’: Delay time for the side signal (0 to 1)
      - ‘width’: Stereo width enhancement (0 to 1)

    Each value should be a tensor of shape (batch_size,).

  • dsp_params (Dict[str, Union[float, torch.Tensor]], optional) – Direct DSP parameters. Each enhancer parameter can be given as:

      - float/int: single value applied to the entire batch
      - 0D tensor: single value applied to the entire batch
      - 1D tensor: batch of values matching the input batch size

    Parameters are automatically expanded to match the batch size and moved to the input device if necessary. If provided, norm_params must be None.

Returns:

Processed stereo audio tensor. Shape: (batch, 2, samples)

Return type:

torch.Tensor

Raises:

AssertionError – If input is not stereo (two channels)