Diffusers documentation

HiDreamImageTransformer2DModel

A Transformer model for image-like data from HiDream-I1.

The model can be loaded with the following code snippet.

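import torch
from diffusers import HiDreamImageTransformer2DModel

# Load only the transformer weights from the "transformer" subfolder of the repository.
transformer = HiDreamImageTransformer2DModel.from_pretrained("HiDream-ai/HiDream-I1-Full", subfolder="transformer", torch_dtype=torch.bfloat16)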

Loading GGUF quantized checkpoints for HiDream-I1

GGUF checkpoints for the HiDreamImageTransformer2DModel can be loaded with the from_single_file method, provided by FromOriginalModelMixin:

import torch
from diffusers import GGUFQuantizationConfig, HiDreamImageTransformer2DModel

ckpt_path = "https://huggingface.co/city96/HiDream-I1-Dev-gguf/blob/main/hidream-i1-dev-Q2_K.gguf"
transformer = HiDreamImageTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16
)

HiDreamImageTransformer2DModel

class diffusers.HiDreamImageTransformer2DModel

( patch_size: int | None = None, in_channels: int = 64, out_channels: int | None = None, num_layers: int = 16, num_single_layers: int = 32, attention_head_dim: int = 128, num_attention_heads: int = 20, caption_channels: list = None, text_emb_dim: int = 2048, num_routed_experts: int = 4, num_activated_experts: int = 2, axes_dims_rope: tuple = (32, 32), max_resolution: tuple = (128, 128), llama_layers: list = None, force_inference_output: bool = False )
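
To make a few of these defaults concrete, here is a minimal sketch in plain Python. It assumes the hidden width is the product of num_attention_heads and attention_head_dim, as is typical for transformer blocks, and it illustrates generic top-k expert gating for num_routed_experts and num_activated_experts. The route_top_k helper and the example logits are hypothetical and are not the library's implementation; in particular, whether the selected weights are renormalized is left out here.

```python
import math


def route_top_k(router_logits, num_activated_experts=2):
    """Return (expert_index, softmax_weight) pairs for the top-k experts of one token.

    Hypothetical helper: a generic top-k gate, not the Diffusers code.
    """
    peak = max(router_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:num_activated_experts]]


# Defaults taken from the constructor signature.
num_attention_heads = 20
attention_head_dim = 128
num_routed_experts = 4
num_activated_experts = 2

# Assumption: hidden width = heads * head size.
inner_dim = num_attention_heads * attention_head_dim

# Made-up router scores for a single token, one score per routed expert.
example_logits = [0.1, 2.0, -1.0, 1.5]
assert len(example_logits) == num_routed_experts
selected = route_top_k(example_logits, num_activated_experts)
print(inner_dim, [index for index, _ in selected])
```

With these defaults, each token would be processed by 2 of the 4 routed experts, so only part of the expert parameters is active for any one token.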

forward

( hidden_states: Tensor, timesteps: LongTensor = None, encoder_hidden_states_t5: Tensor = None, encoder_hidden_states_llama3: Tensor = None, pooled_embeds: Tensor = None, img_ids: torch.Tensor | None = None, img_sizes: list[tuple[int, int]] | None = None, hidden_states_masks: torch.Tensor | None = None, attention_kwargs: dict[str, typing.Any] | None = None, return_dict: bool = True, **kwargs )

Parameters

  • hidden_states (torch.Tensor of shape (batch_size, in_channels, height, width) or (batch_size, patch_height * patch_width, patch_size * patch_size * channels)) — Input hidden_states.
  • timesteps (torch.LongTensor) — Used to indicate denoising step.
  • encoder_hidden_states_t5 (torch.Tensor) — Conditional embeddings computed from the T5 text encoder.
  • encoder_hidden_states_llama3 (torch.Tensor) — Conditional embeddings computed from the Llama3 text encoder.
  • pooled_embeds (torch.Tensor) — Pooled text embeddings used for additional conditioning.
  • img_ids (torch.Tensor, optional) — Image position ids for the patched hidden states.
  • img_sizes (list of tuple of int, optional) — Per-sample patch grid sizes used to unpatchify the output.
  • hidden_states_masks (torch.Tensor, optional) — Mask over patched hidden_states.
  • attention_kwargs (dict, optional) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined under self.processor in diffusers.models.attention_processor.
  • return_dict (bool, optional, defaults to True) — Whether or not to return a Transformer2DModelOutput instead of a plain tuple.

The HiDreamImageTransformer2DModel forward method.
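
The two accepted layouts for hidden_states are related by simple shape arithmetic, which the following sketch spells out. The patched_shape helper is hypothetical, and the concrete sizes (16 channels, a 128 x 128 latent, patch size 2) are illustrative values rather than values read from a checkpoint. It also assumes that each img_sizes entry is the (patch_height, patch_width) grid for one sample, as the parameter description suggests.

```python
def patched_shape(batch_size, channels, height, width, patch_size):
    """Map an unpatched (B, C, H, W) shape to its patched form and patch grid size.

    Hypothetical helper for illustration; it only computes shapes.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    patch_height = height // patch_size
    patch_width = width // patch_size
    sequence_shape = (
        batch_size,
        patch_height * patch_width,          # number of patches (sequence length)
        patch_size * patch_size * channels,  # features per patch
    )
    return sequence_shape, (patch_height, patch_width)


# Illustrative values only.
sequence_shape, grid = patched_shape(
    batch_size=1, channels=16, height=128, width=128, patch_size=2
)
print(sequence_shape, grid)
```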

Transformer2DModelOutput

class diffusers.models.modeling_outputs.Transformer2DModelOutput


( sample: torch.Tensor )

Parameters

  • sample (torch.Tensor of shape (batch_size, num_channels, height, width) or (batch_size, num_vector_embeds - 1, num_latent_pixels) if Transformer2DModel is discrete) — The hidden states output conditioned on the encoder_hidden_states input. If discrete, returns probability distributions for the unnoised latent pixels.

The output of Transformer2DModel.
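
Because return_dict switches the return type between a Transformer2DModelOutput and a plain tuple, calling code may want to handle both. The sketch below uses a hypothetical extract_sample helper and stand-in objects in place of real model outputs, and it assumes that the sample is the first element of the tuple.

```python
from types import SimpleNamespace


def extract_sample(output):
    """Return the sample from either a tuple or an object with a .sample attribute.

    Hypothetical helper; assumes the tuple form carries the sample at index 0.
    """
    if isinstance(output, tuple):
        return output[0]
    return output.sample


# Stand-ins for the two return styles; a real call would yield a torch.Tensor sample.
dict_style = SimpleNamespace(sample="denoised latents")
tuple_style = ("denoised latents",)

print(extract_sample(dict_style) == extract_sample(tuple_style))
```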
