LatteTransformer3DModel

A Diffusion Transformer model for 3D data from Latte.

class diffusers.LatteTransformer3DModel

( num_attention_heads: int = 16, attention_head_dim: int = 88, in_channels: int | None = None, out_channels: int | None = None, num_layers: int = 1, dropout: float = 0.0, cross_attention_dim: int | None = None, attention_bias: bool = False, sample_size: int = 64, patch_size: int | None = None, activation_fn: str = 'geglu', num_embeds_ada_norm: int | None = None, norm_type: str = 'layer_norm', norm_elementwise_affine: bool = True, norm_eps: float = 1e-05, caption_channels: int = None, video_length: int = 16 )

forward

( hidden_states: Tensor, timestep: torch.LongTensor | None = None, encoder_hidden_states: torch.Tensor | None = None, encoder_attention_mask: torch.Tensor | None = None, enable_temporal_attentions: bool = True, return_dict: bool = True )

Parameters

hidden_states (torch.Tensor of shape (batch size, channel, num_frame, height, width)) — Input hidden_states.
timestep (torch.LongTensor, optional) — The denoising step. If given, it is applied as an embedding in AdaLayerNorm.
encoder_hidden_states (torch.FloatTensor of shape (batch size, sequence len, embed dims), optional) — Conditional embeddings for the cross-attention layer. If not given, cross-attention defaults to self-attention.
encoder_attention_mask (torch.Tensor, optional) — Cross-attention mask applied to encoder_hidden_states. Two formats are supported:
- Mask (batch, sequence_length): True = keep, False = discard.
- Bias (batch, 1, sequence_length): 0 = keep, -10000 = discard.
If ndim == 2, the tensor is interpreted as a mask and converted into a bias in the format above. This bias is added to the cross-attention scores.
enable_temporal_attentions (bool, optional, defaults to True) — Whether to enable temporal attentions.
return_dict (bool, optional, defaults to True) — Whether or not to return a Transformer2DModelOutput instead of a plain tuple.

The LatteTransformer3DModel forward method.
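
The mask-to-bias conversion described for encoder_attention_mask can be illustrated in isolation. The following is a minimal sketch in plain PyTorch, not the library's internal code: the helper name mask_to_bias and the dummy tensor shapes are assumptions made for illustration. It shows a 2D keep-mask being turned into an additive bias of shape (batch, 1, sequence_length), and how that bias suppresses discarded positions once it is added to attention scores.

```python
import torch


def mask_to_bias(mask: torch.Tensor, dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """Hypothetical helper: turn a (batch, sequence_length) keep-mask into an additive bias.

    A tensor that is not 2D is assumed to already be a bias and is returned unchanged.
    """
    if mask.ndim != 2:
        return mask
    # True (keep) -> 0, False (discard) -> -10000
    bias = (1 - mask.to(dtype)) * -10000.0
    # Add a singleton axis so the bias broadcasts over the query dimension.
    return bias.unsqueeze(1)


# Two sequences of length 3; trailing positions are padding.
mask = torch.tensor([[True, True, False],
                     [True, False, False]])
bias = mask_to_bias(mask)

# Dummy attention scores of shape (batch, num_queries, sequence_length).
scores = torch.zeros(2, 4, 3)
weights = torch.softmax(scores + bias, dim=-1)

print(tuple(bias.shape))
print(weights[0, 0])
```

Because the bias is added before the softmax, positions marked False receive a weight that is numerically zero, while the kept positions share the remaining probability mass.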
