Diffuser Classes

Pipeline API

`QEffTextEncoder`

class QEfficient.diffusers.pipelines.pipeline_module.QEffTextEncoder(model: Module)[source]

Wrapper for text encoder models with ONNX export and QAIC compilation capabilities.

This class handles text encoder models (CLIP, T5) with specific transformations and optimizations for efficient inference on Qualcomm AI hardware. It applies custom PyTorch and ONNX transformations to prepare models for deployment.

model

The wrapped text encoder model (deep copy of original)

Type:: nn.Module

_pytorch_transforms

PyTorch transformations applied before ONNX export

Type:: List

_onnx_transforms

ONNX transformations applied after export

Type:: List

compile(specializations: List[Dict], **compiler_options) → None[source]

Compile the ONNX model for Qualcomm AI hardware.

Parameters:

specializations (List[Dict]) – Model specialization configurations
**compiler_options – Additional compiler options (e.g., num_cores, aic_num_of_activations)
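Below is a minimal sketch of how a specialization list relates to the dynamic axes chosen at export time. It assumes each specialization is a dict that binds every symbolic dimension name to a concrete size. The helper `missing_dims` and the names `batch_size`/`seq_len` are hypothetical and are not part of QEfficient.

```python
from typing import Dict, List, Set

# Hypothetical dynamic axes as they might be passed to export():
# input/output name -> {axis index: symbolic dimension name}
DYNAMIC_AXES: Dict[str, Dict[int, str]] = {
    "input_ids": {0: "batch_size", 1: "seq_len"},
    "last_hidden_state": {0: "batch_size", 1: "seq_len"},
}

# Assumed specialization format: symbolic dimension name -> concrete size
SPECIALIZATIONS: List[Dict[str, int]] = [
    {"batch_size": 1, "seq_len": 77},
]


def missing_dims(dynamic_axes: Dict[str, Dict[int, str]],
                 specializations: List[Dict[str, int]]) -> List[Set[str]]:
    """Return, for each specialization, the symbolic dims it leaves unbound."""
    symbols = {name for axes in dynamic_axes.values() for name in axes.values()}
    return [symbols - set(spec) for spec in specializations]


if __name__ == "__main__":
    print(missing_dims(DYNAMIC_AXES, SPECIALIZATIONS))  # [set()]
```

An empty set means every dynamic dimension has a fixed size for compilation. Under the same assumption, the list would then be passed as `encoder.compile(SPECIALIZATIONS, num_cores=16)`, where `num_cores=16` is only an illustrative value.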

export(inputs: Dict, output_names: List[str], dynamic_axes: Dict, export_dir: str = None, export_kwargs: Dict = {}) → str[source]

Export the text encoder model to ONNX format.

Parameters:

inputs (Dict) – Example inputs for ONNX export
output_names (List[str]) – Names of model outputs
dynamic_axes (Dict) – Specification of dynamic dimensions
export_dir (str, optional) – Directory to save ONNX model
export_kwargs (Dict, optional) – Additional export arguments

Returns:

Path to the exported ONNX model

Return type:

str

property get_model_config: Dict

Get the model configuration as a dictionary.

Returns:: The configuration dictionary of the underlying text encoder model
Return type:: Dict

get_onnx_params() → Tuple[Dict, Dict, List[str]][source]

Generate ONNX export configuration for the text encoder.

Creates example inputs, dynamic axes specifications, and output names tailored to the specific text encoder type (CLIP vs T5).

Returns:

example_inputs (Dict): Sample inputs for ONNX export
dynamic_axes (Dict): Specification of dynamic dimensions
output_names (List[str]): Names of model outputs

Return type:

Tuple containing
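The sketch below shows what such a tuple could look like for a CLIP-style encoder. Shape tuples stand in for real tensors. The names `input_ids`, `last_hidden_state`, and `pooler_output` follow Hugging Face CLIP conventions, and whether QEfficient uses exactly these names is an assumption. The check reflects how `torch.onnx.export` treats these arguments: it expects every key in `dynamic_axes` to name a model input or output.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def clip_like_onnx_params(batch_size: int = 1, seq_len: int = 77
                          ) -> Tuple[Dict[str, Shape], Dict[str, Dict[int, str]], List[str]]:
    """Hypothetical stand-in for get_onnx_params(); shapes replace tensors."""
    example_inputs = {"input_ids": (batch_size, seq_len)}
    output_names = ["last_hidden_state", "pooler_output"]
    dynamic_axes = {
        "input_ids": {0: "batch_size", 1: "seq_len"},
        "last_hidden_state": {0: "batch_size", 1: "seq_len"},
        "pooler_output": {0: "batch_size"},
    }
    return example_inputs, dynamic_axes, output_names


def unknown_axis_keys(example_inputs: Dict[str, Shape],
                      dynamic_axes: Dict[str, Dict[int, str]],
                      output_names: List[str]) -> List[str]:
    """Keys of dynamic_axes that are neither an input nor an output name."""
    known = set(example_inputs) | set(output_names)
    return [name for name in dynamic_axes if name not in known]


if __name__ == "__main__":
    inputs, axes, outputs = clip_like_onnx_params()
    print(unknown_axis_keys(inputs, axes, outputs))  # []
```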

`QEffUNet`

class QEfficient.diffusers.pipelines.pipeline_module.QEffUNet(model: Module)[source]

Wrapper for UNet models with ONNX export and QAIC compilation capabilities.

This class handles UNet models with specific transformations and optimizations for efficient inference on Qualcomm AI hardware. UNet is commonly used in diffusion models for image generation tasks.

model

The wrapped UNet model

Type:: nn.Module

_pytorch_transforms

PyTorch transformations applied before ONNX export

Type:: List

_onnx_transforms

ONNX transformations applied after export

Type:: List

compile(specializations: List[Dict], **compiler_options) → None[source]

Compile the ONNX model for Qualcomm AI hardware.

Parameters:

specializations (List[Dict]) – Model specialization configurations
**compiler_options – Additional compiler options

export(inputs: Dict, output_names: List[str], dynamic_axes: Dict, export_dir: str = None, export_kwargs: Dict = {}) → str[source]

Export the UNet model to ONNX format.

Parameters:

inputs (Dict) – Example inputs for ONNX export
output_names (List[str]) – Names of model outputs
dynamic_axes (Dict) – Specification of dynamic dimensions
export_dir (str, optional) – Directory to save ONNX model
export_kwargs (Dict, optional) – Additional export arguments

Returns:

Path to the exported ONNX model

Return type:

str

property get_model_config: Dict

Get the model configuration as a dictionary.

Returns:: The configuration dictionary of the underlying UNet model
Return type:: Dict

`QEffVAE`

class QEfficient.diffusers.pipelines.pipeline_module.QEffVAE(model: Module, type: str)[source]

Wrapper for Variational Autoencoder (VAE) models with ONNX export and QAIC compilation.

This class handles VAE models with specific transformations and optimizations for efficient inference on Qualcomm AI hardware. VAE models are used in diffusion pipelines for encoding images to latent space and decoding latents back to images.

model

The wrapped VAE model (deep copy of original)

Type:: nn.Module

type

VAE operation type (“encoder” or “decoder”)

Type:: str

_pytorch_transforms

PyTorch transformations applied before ONNX export

Type:: List

_onnx_transforms

ONNX transformations applied after export

Type:: List

compile(specializations: List[Dict], **compiler_options) → None[source]

Compile the ONNX model for Qualcomm AI hardware.

Parameters:

specializations (List[Dict]) – Model specialization configurations
**compiler_options – Additional compiler options

export(inputs: Dict, output_names: List[str], dynamic_axes: Dict, export_dir: str = None, export_kwargs: Dict = {}) → str[source]

Export the VAE model to ONNX format.

Parameters:

inputs (Dict) – Example inputs for ONNX export
output_names (List[str]) – Names of model outputs
dynamic_axes (Dict) – Specification of dynamic dimensions
export_dir (str, optional) – Directory to save ONNX model
export_kwargs (Dict, optional) – Additional export arguments

Returns:

Path to the exported ONNX model

Return type:

str

get_img_encoder_onnx_params() → Tuple[Dict, Dict, List[str]][source]

Generate ONNX export configuration for the VAE Encoder.

Returns:

example_inputs (Dict): Sample inputs for ONNX export
dynamic_axes (Dict): Specification of dynamic dimensions
output_names (List[str]): Names of model outputs

Return type:

Tuple containing

property get_model_config: Dict

Get the model configuration as a dictionary.

Returns:: The configuration dictionary of the underlying VAE model
Return type:: Dict

get_onnx_params(latent_height: int = 32, latent_width: int = 32) → Tuple[Dict, Dict, List[str]][source]

Generate ONNX export configuration for the VAE decoder.

Parameters:

latent_height (int) – Height of latent representation (default: 32)
latent_width (int) – Width of latent representation (default: 32)

Returns:

example_inputs (Dict): Sample inputs for ONNX export
dynamic_axes (Dict): Specification of dynamic dimensions
output_names (List[str]): Names of model outputs

Return type:

Tuple containing
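The defaults `latent_height=32` and `latent_width=32` are sizes in latent space, not pixels. The sketch below converts between the two. It assumes the spatial downsampling factor of 8 used by Stable Diffusion and Flux VAEs. In diffusers this factor is typically derived as `2 ** (len(vae.config.block_out_channels) - 1)`.

```python
from typing import Tuple

VAE_SCALE_FACTOR = 8  # assumption: SD/Flux-style VAE with 8x spatial downsampling


def latent_hw(height: int, width: int, factor: int = VAE_SCALE_FACTOR) -> Tuple[int, int]:
    """Latent (height, width) for a pixel-space image."""
    if height % factor or width % factor:
        raise ValueError(f"height and width must be multiples of {factor}")
    return height // factor, width // factor


def pixel_hw(latent_height: int, latent_width: int,
             factor: int = VAE_SCALE_FACTOR) -> Tuple[int, int]:
    """Pixel (height, width) produced by decoding a latent of the given size."""
    return latent_height * factor, latent_width * factor


if __name__ == "__main__":
    print(pixel_hw(32, 32))     # (256, 256): what the defaults decode to
    print(latent_hw(512, 512))  # (64, 64): latent size for a 512x512 image
```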

get_video_onnx_params() → Tuple[Dict, Dict, List[str]][source]

Generate ONNX export configuration for the VAE decoder in video pipelines.

Unlike get_onnx_params(), this method takes no latent_height or latent_width arguments.

Returns:

example_inputs (Dict): Sample inputs for ONNX export
dynamic_axes (Dict): Specification of dynamic dimensions
output_names (List[str]): Names of model outputs

Return type:

Tuple containing

`QEffFluxTransformerModel`

class QEfficient.diffusers.pipelines.pipeline_module.QEffFluxTransformerModel(model: Module)[source]

Wrapper for Flux Transformer2D models with ONNX export and QAIC compilation capabilities.

This class handles Flux Transformer2D models with specific transformations and optimizations for efficient inference on Qualcomm AI hardware. Flux uses a transformer-based diffusion architecture instead of a traditional UNet, combining dual-stream and single-stream transformer blocks with adaptive layer normalization (AdaLN) for conditioning.

model

The wrapped Flux transformer model

Type:: nn.Module

_pytorch_transforms

PyTorch transformations applied before ONNX export

Type:: List

_onnx_transforms

ONNX transformations applied after export

Type:: List

compile(specializations: List[Dict], **compiler_options) → None[source]

Compile the ONNX model for Qualcomm AI hardware.

Parameters:

specializations (List[Dict]) – Model specialization configurations
**compiler_options – Additional compiler options (e.g., num_cores, aic_num_of_activations)

export(inputs: Dict, output_names: List[str], dynamic_axes: Dict, export_dir: str = None, use_onnx_subfunctions: bool = False) → str[source]

Export the Flux transformer model to ONNX format.

Parameters:

inputs (Dict) – Example inputs for ONNX export
output_names (List[str]) – Names of model outputs
dynamic_axes (Dict) – Specification of dynamic dimensions
export_dir (str, optional) – Directory to save ONNX model
use_onnx_subfunctions (bool) – Whether to export transformer blocks as ONNX functions for better modularity and potential optimization

Returns:

Path to the exported ONNX model

Return type:

str

property get_model_config: Dict

Get the model configuration as a dictionary.

Returns:: The configuration dictionary of the underlying Flux transformer model
Return type:: Dict

get_onnx_params(batch_size: int = 1, seq_length: int = 256, cl: int = 4096) → Tuple[Dict, Dict, List[str]][source]

Generate ONNX export configuration for the Flux transformer.

Creates example inputs for all Flux-specific inputs including hidden states, text embeddings, timestep conditioning, and AdaLN embeddings.

Parameters:

batch_size (int) – Batch size for example inputs (default: 1, from FLUX_ONNX_EXPORT_BATCH_SIZE)
seq_length (int) – Text sequence length (default: 256, from FLUX_ONNX_EXPORT_SEQ_LENGTH)
cl (int) – Compressed latent dimension (default: 4096, from FLUX_ONNX_EXPORT_COMPRESSED_LATENT_DIM)

Returns:

example_inputs (Dict): Sample inputs for ONNX export
dynamic_axes (Dict): Specification of dynamic dimensions
output_names (List[str]): Names of model outputs

Return type:

Tuple containing
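Suppose `cl` is the length of the packed latent sequence that Flux feeds to the transformer. Flux packs each 2x2 patch of the 8x-downsampled latent into one token, so under that reading the default of 4096 corresponds to a 1024x1024 image. The helper below is a hypothetical illustration of that arithmetic, not a QEfficient function.

```python
VAE_SCALE_FACTOR = 8  # spatial downsampling of the Flux VAE
PATCH = 2             # Flux packs 2x2 latent patches into one token


def packed_latent_len(height: int, width: int) -> int:
    """Number of image tokens for a height x width image (hypothetical helper)."""
    step = VAE_SCALE_FACTOR * PATCH
    if height % step or width % step:
        raise ValueError(f"height and width must be multiples of {step}")
    return (height // step) * (width // step)


if __name__ == "__main__":
    print(packed_latent_len(1024, 1024))  # 4096, matching the default cl
    print(packed_latent_len(512, 512))    # 1024
```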

`QEffWanUnifiedTransformer`

class QEfficient.diffusers.pipelines.pipeline_module.QEffWanUnifiedTransformer(unified_transformer)[source]

Wrapper for WAN Unified Transformer with ONNX export and QAIC compilation capabilities.

This class handles the unified WAN transformer model that combines high and low noise transformers into a single model for efficient deployment. Based on the timestep shape, the model dynamically selects between high and low noise transformers during inference.

The wrapper applies specific transformations and optimizations for efficient inference on Qualcomm AI hardware, particularly for video diffusion models.
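In the WAN 2.2 A14B models, the high-noise transformer denoises the early, noisy timesteps and the low-noise transformer handles the later ones. The two are split at `boundary_ratio * num_train_timesteps`, as in the diffusers WAN 2.2 pipeline. The sketch below shows only that routing rule. The ratio 0.875 is an assumption taken from the T2V A14B configuration. The sketch does not reproduce how QEfficient encodes the choice in the timestep shape.

```python
NUM_TRAIN_TIMESTEPS = 1000
BOUNDARY_RATIO = 0.875  # assumption: WAN 2.2 T2V A14B value


def select_transformer(t: float, boundary_ratio: float = BOUNDARY_RATIO,
                       num_train_timesteps: int = NUM_TRAIN_TIMESTEPS) -> str:
    """Return which expert denoises timestep t."""
    boundary = boundary_ratio * num_train_timesteps
    return "transformer_high" if t >= boundary else "transformer_low"


if __name__ == "__main__":
    for t in (999, 875, 874, 100):
        print(t, select_transformer(t))
```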

model

The QEffWanUnifiedWrapper model that combines high/low noise transformers

Type:: nn.Module

_pytorch_transforms

PyTorch transformations applied before ONNX export

Type:: List

_onnx_transforms

ONNX transformations applied after export

Type:: List

compile(specializations, **compiler_options) → None[source]

Compile the ONNX model for Qualcomm AI hardware.

Parameters:

specializations (List[Dict]) – Model specialization configurations
**compiler_options – Additional compiler options (e.g., num_cores, aic_num_of_activations)

export(inputs: Dict, output_names: List[str], dynamic_axes: Dict, export_dir: str = None, use_onnx_subfunctions: bool = False) → str[source]

Export the Wan transformer model to ONNX format.

Parameters:

inputs (Dict) – Example inputs for ONNX export
output_names (List[str]) – Names of model outputs
dynamic_axes (Dict) – Specification of dynamic dimensions
export_dir (str, optional) – Directory to save ONNX model
use_onnx_subfunctions (bool) – Whether to export transformer blocks as ONNX functions for better modularity and potential optimization

Returns:

Path to the exported ONNX model

Return type:

str

property get_model_config: Dict

Get the model configuration as a dictionary.

Returns:: The configuration dictionary of the underlying Wan transformer model
Return type:: Dict

get_onnx_params()[source]

Generate ONNX export configuration for the Wan transformer.

Creates example inputs for all Wan-specific inputs, including hidden states, text embeddings, and timestep conditioning.

Returns:

example_inputs (Dict): Sample inputs for ONNX export
dynamic_axes (Dict): Specification of dynamic dimensions
output_names (List[str]): Names of model outputs

Return type:

Tuple containing
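The size of the hidden-states example input follows from the latent grid. The sketch below assumes the WAN VAE compresses 4x in time and 8x in space, and that the transformer patchifies with patch size (1, 2, 2). Both helpers are hypothetical.

```python
from typing import Tuple


def wan_latent_grid(num_frames: int, height: int, width: int,
                    t_factor: int = 4, s_factor: int = 8) -> Tuple[int, int, int]:
    """Latent (frames, height, width) for a pixel-space video."""
    if (num_frames - 1) % t_factor:
        raise ValueError(f"num_frames must be 1 + a multiple of {t_factor}")
    return (num_frames - 1) // t_factor + 1, height // s_factor, width // s_factor


def wan_tokens(num_frames: int, height: int, width: int,
               patch: Tuple[int, int, int] = (1, 2, 2)) -> int:
    """Transformer sequence length after patchifying the latent grid."""
    f, h, w = wan_latent_grid(num_frames, height, width)
    pt, ph, pw = patch
    return (f // pt) * (h // ph) * (w // pw)


if __name__ == "__main__":
    print(wan_latent_grid(81, 480, 832))  # (21, 60, 104)
    print(wan_tokens(81, 480, 832))       # 32760
```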

Model Classes

`QEffWanPipeline`

WAN supports two execution architectures:

use_unified=True (default): one unified transformer module.
use_unified=False: separate transformer_high and transformer_low modules.

First-block-cache is currently supported only for non-unified WAN:

from QEfficient import QEffWanPipeline

pipeline = QEffWanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers",
    use_unified=False,
    enable_first_block_cache=True,
    first_block_cache_downsample_factor=4,
)

output = pipeline(
    prompt="A cat playing in a sunny garden",
    cache_threshold_high=0.1,
    cache_threshold_low=0.065,
)

See examples:

examples/diffusers/wan/wan_lightning.py
examples/diffusers/wan/wan_lightning_custom.py
examples/diffusers/wan/wan_first_block_cache.py

class QEfficient.diffusers.pipelines.wan.pipeline_wan.QEffWanPipeline(model, use_unified: bool = True, enable_first_block_cache: bool = False, first_block_cache_downsample_factor: int = 4, **kwargs)[source]

QEfficient-optimized WAN pipeline for high-performance text-to-video generation on Qualcomm AI hardware.

This pipeline provides an optimized implementation of the WAN diffusion model specifically designed for deployment on Qualcomm AI Cloud (QAIC) devices. It extends the original HuggingFace WAN model with QEfficient-optimized components that can be exported to ONNX format and compiled into Qualcomm Program Container (QPC) files for efficient video generation.

The pipeline supports the complete WAN workflow, including:

UMT5 text encoding for rich semantic understanding
Transformer denoising, either with a unified transformer that combines the high- and low-noise stages into a single optimized model (use_unified=True) or with separate high- and low-noise transformer modules (use_unified=False)
VAE decoding for final video output
Performance monitoring and hardware optimization

text_encoder: UMT5 text encoder for semantic text understanding (TODO: QEfficient optimization)

unified_wrapper

Wrapper combining transformer stages (unified mode)

Type:: QEffWanUnifiedWrapper

transformer

Optimized unified transformer for denoising (unified mode)

Type:: QEffWanUnifiedTransformer

transformer_high

High-noise transformer module (non-unified mode)

Type:: QEffWanTransformer

transformer_low

Low-noise transformer module (non-unified mode)

Type:: QEffWanTransformer

vae_decode: VAE decoder for latent-to-video conversion

modules

Dictionary of pipeline modules for batch operations

Type:: Dict[str, Any]

model

Original HuggingFace WAN model reference

Type:: WanPipeline

tokenizer: Text tokenizer for preprocessing

scheduler: Diffusion scheduler for timestep management

Example

>>> from QEfficient.diffusers.pipelines.wan import QEffWanPipeline
>>> pipeline = QEffWanPipeline.from_pretrained("path/to/wan/model")
>>> videos = pipeline(
...     prompt="A cat playing in a garden",
...     height=480,
...     width=832,
...     num_frames=81,
...     num_inference_steps=4
... )
>>> # Save generated video
>>> from diffusers.utils import export_to_video
>>> export_to_video(videos.images[0], "generated_video.mp4", fps=16)

compile(compile_config: str | None = None, parallel: bool = False, height: int = 48, width: int = 64, num_frames: int = 81, use_onnx_subfunctions: bool = False) → str[source]

Compiles the ONNX graphs of the different model components for deployment on Qualcomm AI hardware.

This method compiles the exported ONNX graph(s) of the transformer into an optimized format for inference, driven by a JSON-based configuration.

Parameters:

compile_config (str, optional) – Path to a JSON configuration file containing compilation settings, device mappings, and optimization parameters. If None, uses the default configuration.
parallel (bool, default=False) – Compilation mode: if True, compile modules in parallel using ThreadPoolExecutor for faster processing; if False, compile modules sequentially for lower resource usage.
height (int, default=48) – Target frame height in pixels.
width (int, default=64) – Target frame width in pixels.
num_frames (int, default=81) – Target number of frames in the generated video (pixel space).
use_onnx_subfunctions (bool, default=False) – Whether to export models with ONNX subfunctions before compilation if not already exported.

Raises:

RuntimeError – If compilation fails for any module or if QAIC compiler is not available
FileNotFoundError – If ONNX models haven’t been exported or config file is missing
ValueError – If configuration parameters are invalid
OSError – If there are issues with file I/O during compilation

Example

>>> pipeline = QEffWanPipeline.from_pretrained("Wan-AI/Wan2.2-T2V-A14B-Diffusers")
>>> # Sequential compilation with default config
>>> pipeline.compile(height=480, width=832, num_frames=81)
>>>
>>> # Parallel compilation with custom config
>>> pipeline.compile(
...     compile_config="/path/to/custom_config.json",
...     parallel=True,
...     height=480,
...     width=832,
...     num_frames=81
... )
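The `parallel` flag chooses between compiling modules one after another and distributing them over a thread pool. The sketch below shows that pattern with `concurrent.futures.ThreadPoolExecutor`. Here, `compile_module`, `fake_compile`, and the module names are hypothetical placeholders for the per-module QAIC compile step. Threads are a reasonable fit for this pattern if each compile mostly waits on an external compiler process.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def compile_all(modules: Dict[str, dict],
                compile_module: Callable[[str, dict], str],
                parallel: bool = False) -> Dict[str, str]:
    """Compile every module and return {module name: QPC path}."""
    if not parallel:
        # Sequential: one compile at a time, lower peak resource usage
        return {name: compile_module(name, opts) for name, opts in modules.items()}
    # Parallel: one worker per module (at least one, since max_workers=0 is invalid)
    with ThreadPoolExecutor(max_workers=max(1, len(modules))) as pool:
        futures = {name: pool.submit(compile_module, name, opts)
                   for name, opts in modules.items()}
        # result() re-raises any exception raised inside a worker thread
        return {name: fut.result() for name, fut in futures.items()}


def fake_compile(name: str, opts: dict) -> str:
    # Hypothetical stand-in for the real per-module compile step
    return f"qpc/{name}_{opts.get('num_cores', 16)}cores"


if __name__ == "__main__":
    modules = {"transformer_high": {"num_cores": 16}, "transformer_low": {"num_cores": 16}}
    print(compile_all(modules, fake_compile, parallel=True))
```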

property do_classifier_free_guidance

Determine if classifier-free guidance should be used.

Returns:: True if CFG should be applied based on current guidance scales
Return type:: bool

export(export_dir: str | None = None, use_onnx_subfunctions: bool = False) → str[source]

Export all pipeline modules to ONNX format for deployment preparation.

This method exports the transformer module(s) to ONNX format (the unified transformer, or the separate high- and low-noise transformers in non-unified mode) with video-specific configurations, including temporal dimensions, dynamic axes, and optimization settings. The export prepares the model for subsequent compilation to QPC format for efficient inference on QAIC hardware.

Parameters:

export_dir (str, optional) – Target directory for saving ONNX model files. If None, uses the default export directory structure. The directory will be created if it doesn’t exist.
use_onnx_subfunctions (bool, default=False) – Whether to enable ONNX subfunction optimization for supported modules. This can optimize the graph structure and improve compilation efficiency for complex models like the transformer.

Returns:

Absolute path to the export directory containing all ONNX model files.

Return type:

str

Raises:

RuntimeError – If ONNX export fails for any module
OSError – If there are issues creating the export directory or writing files
ValueError – If module configurations are invalid

Example

>>> pipeline = QEffWanPipeline.from_pretrained("Wan-AI/Wan2.2-T2V-A14B-Diffusers")
>>> export_path = pipeline.export(
...     export_dir="/path/to/export",
...     use_onnx_subfunctions=True
... )

classmethod from_pretrained(pretrained_model_name_or_path: str | PathLike | None, use_unified: bool = True, enable_first_block_cache: bool = False, first_block_cache_downsample_factor: int = 4, **kwargs)[source]

Load a pretrained WAN model from HuggingFace Hub or local path and wrap it with QEfficient optimizations.

This class method provides a convenient way to instantiate a QEffWanPipeline from a pretrained WAN model. It automatically loads the base WanPipeline model in float32 precision on CPU and wraps all components with QEfficient-optimized versions for QAIC deployment.

Parameters:

pretrained_model_name_or_path (str or os.PathLike) – Either a HuggingFace model identifier or a local path to a saved WAN model directory. Should contain transformer, transformer_2, text_encoder, and VAE components.
use_unified (bool, optional) – Selects the WAN execution architecture: True uses a single unified high/low transformer module; False uses separate high- and low-noise transformer modules.
enable_first_block_cache (bool, optional) – Enables retained-state first-block-cache for non-unified mode.
first_block_cache_downsample_factor (int, optional) – Downsample factor for first-block cache key when cache is enabled.
**kwargs – Additional keyword arguments passed to WanPipeline.from_pretrained().

Returns:

A fully initialized pipeline instance with QEfficient-optimized components, ready for export, compilation, and inference on QAIC devices.

Return type:

QEffWanPipeline

Raises:

ValueError – If the model path is invalid or model cannot be loaded
OSError – If there are issues accessing the model files
RuntimeError – If model initialization fails

Example

>>> # Load from HuggingFace Hub
>>> pipeline = QEffWanPipeline.from_pretrained("Wan-AI/Wan2.2-T2V-A14B-Diffusers")
>>>
>>> # Load from local path
>>> pipeline = QEffWanPipeline.from_pretrained("/local/path/to/wan")
>>>
>>> # Load with custom cache directory
>>> pipeline = QEffWanPipeline.from_pretrained(
...     "wan-model-id",
...     cache_dir="/custom/cache/dir"
... )

get_default_config_path()[source]

Get the default configuration file path for WAN pipeline.

Returns:: Path to the default WAN configuration JSON file.
Return type:: str

`QEffWanImageToVideoPipeline`

class QEfficient.diffusers.pipelines.wan.pipeline_wan_i2v.QEffWanImageToVideoPipeline(model, **kwargs)[source]

QEfficient-optimized WAN image-to-video pipeline for high-performance video generation on Qualcomm AI hardware.

This pipeline provides an optimized implementation of the WAN image-to-video diffusion model specifically designed for deployment on Qualcomm AI Cloud (QAIC) devices. It extends the original HuggingFace WAN image-to-video model with QEfficient-optimized components that can be exported to ONNX format and compiled into Qualcomm Program Container (QPC) files for efficient video generation from static images.

The pipeline supports the complete WAN image-to-video workflow, including:

Image conditioning and preprocessing for temporal consistency
UMT5 text encoding for rich semantic understanding
Unified transformer architecture that combines multiple transformer stages into a single optimized model
VAE encoding/decoding for image-to-latent and latent-to-video conversion

text_encoder: UMT5 text encoder for semantic text understanding (TODO: QEfficient optimization)

vae_encoder

VAE encoder for converting input images to latent space

Type:: QEffVAE

unified_wrapper

Wrapper combining transformer stages

Type:: QEffWanUnifiedWrapper

transformer

Optimized unified transformer for denoising

Type:: QEffWanUnifiedTransformer

vae_decoder

VAE decoder for latent-to-video conversion

Type:: QEffVAE

modules

Dictionary of pipeline modules for batch operations

Type:: Dict[str, Any]

model

Original HuggingFace WAN I2V model reference

Type:: WanImageToVideoPipeline

tokenizer: Text tokenizer for preprocessing

scheduler: Diffusion scheduler for timestep management

Example

>>> from QEfficient.diffusers.pipelines.wan import QEffWanImageToVideoPipeline
>>> from PIL import Image
>>> from diffusers.utils import export_to_video
>>>
>>> # Load pipeline and input image
>>> pipeline = QEffWanImageToVideoPipeline.from_pretrained("Wan-AI/Wan2.2-I2V-A14B-Diffusers")
>>> image = Image.open("input_frame.jpg")
>>>
>>> # Generate video with motion
>>> result = pipeline(
...     image=image,
...     prompt="A person walking through a sunny garden with flowing motion",
...     height=544,
...     width=720,
...     num_frames=81,
...     num_inference_steps=4,
...     guidance_scale=1.0
... )
>>> # Save generated video
>>> frames = result.images[0]
>>> export_to_video(frames, "generated_video.mp4", fps=16)

compile(compile_config: str | None = None, parallel: bool = False, height: int = 48, width: int = 64, num_frames: int = 81, use_onnx_subfunctions: bool = False) → str[source]

Compiles the ONNX graphs of the different model components for deployment on Qualcomm AI hardware.

This method compiles the exported ONNX graphs of the pipeline modules into an optimized format for inference, driven by a JSON-based configuration.

Parameters:

compile_config (str, optional) – Path to a JSON configuration file containing compilation settings, device mappings, and optimization parameters. If None, uses the default configuration.
parallel (bool, default=False) – Compilation mode: if True, compile modules in parallel using ThreadPoolExecutor for faster processing; if False, compile modules sequentially for lower resource usage.
height (int, default=48) – Target frame height in pixels.
width (int, default=64) – Target frame width in pixels.
num_frames (int, default=81) – Target number of frames in the generated video (pixel space).
use_onnx_subfunctions (bool, default=False) – Whether to export models with ONNX subfunctions before compilation if not already exported.

Raises:

RuntimeError – If compilation fails for any module or if QAIC compiler is not available
FileNotFoundError – If ONNX models haven’t been exported or config file is missing
ValueError – If configuration parameters are invalid
OSError – If there are issues with file I/O during compilation

Example

>>> pipeline = QEffWanImageToVideoPipeline.from_pretrained("Wan-AI/Wan2.2-I2V-A14B-Diffusers")
>>> # Sequential compilation with default config
>>> pipeline.compile(height=480, width=832, num_frames=81)
>>>
>>> # Parallel compilation with custom config
>>> pipeline.compile(
...     compile_config="/path/to/custom_config.json",
...     parallel=True,
...     height=480,
...     width=832,
...     num_frames=81
... )

property do_classifier_free_guidance

Determine if classifier-free guidance should be used.

Returns:: True if CFG should be applied based on current guidance scales
Return type:: bool

export(export_dir: str | None = None, use_onnx_subfunctions: bool = False) → str[source]

Export all pipeline modules to ONNX format for deployment preparation.

This method systematically exports the VAE encoder, unified transformer, and VAE decoder to ONNX format with image-to-video specific configurations including temporal dimensions, dynamic axes, and optimization settings.

The export process prepares the models for subsequent compilation to QPC format, enabling efficient inference on QAIC hardware. ONNX subfunctions can be used for certain modules to optimize memory usage and performance.

Parameters:

export_dir (str, optional) – Target directory for saving ONNX model files. If None, uses the default export directory structure. The directory will be created if it doesn’t exist.
use_onnx_subfunctions (bool, default=False) – Whether to enable ONNX subfunction optimization for supported modules. This can optimize the graph structure and improve compilation efficiency for complex models like the transformer.

Returns:

Absolute path to the export directory containing all ONNX model files.

Return type:

str

Raises:

RuntimeError – If ONNX export fails for any module
OSError – If there are issues creating the export directory or writing files
ValueError – If module configurations are invalid

Example

>>> pipeline = QEffWanImageToVideoPipeline.from_pretrained("Wan-AI/Wan2.2-I2V-A14B-Diffusers")
>>> export_path = pipeline.export(
...     export_dir="/path/to/export",
...     use_onnx_subfunctions=True
... )
>>> print(f"Models exported to: {export_path}")

classmethod from_pretrained(pretrained_model_name_or_path: str | PathLike | None, **kwargs)[source]

Load a pretrained WAN image-to-video model from HuggingFace Hub or local path and wrap it with QEfficient optimizations.

This class method provides a convenient way to instantiate a QEffWanImageToVideoPipeline from a pretrained WAN I2V model. It automatically loads the base WanImageToVideoPipeline model in float32 precision on CPU and wraps all components with QEfficient-optimized versions for QAIC deployment.

Parameters:

pretrained_model_name_or_path (str or os.PathLike) – Either a HuggingFace model identifier or a local path to a saved WAN I2V model directory. Should contain transformer, transformer_2, text_encoder, and VAE components optimized for image-to-video generation.
**kwargs – Additional keyword arguments passed to WanImageToVideoPipeline.from_pretrained().

Returns:

A fully initialized I2V pipeline instance with QEfficient-optimized components, ready for export, compilation, and inference on QAIC devices.

Return type:

QEffWanImageToVideoPipeline

Raises:

ValueError – If the model path is invalid or model cannot be loaded
OSError – If there are issues accessing the model files
RuntimeError – If model initialization fails

Example

>>> # Load from HuggingFace Hub
>>> pipeline = QEffWanImageToVideoPipeline.from_pretrained("Wan-AI/Wan2.2-I2V-A14B-Diffusers")
>>>
>>> # Load from local path
>>> pipeline = QEffWanImageToVideoPipeline.from_pretrained("/local/path/to/wan/i2v")
>>>
>>> # Load with custom cache directory
>>> pipeline = QEffWanImageToVideoPipeline.from_pretrained(
...     "Wan-AI/Wan2.2-I2V-A14B-Diffusers",
...     cache_dir="/custom/cache/dir"
... )

static get_default_config_path()[source]

Get the default configuration file path for WAN pipeline.

Returns:: Path to the default WAN configuration JSON file.
Return type:: str

static get_vae_encoder_npi_path()[source]

Get the default VAE encoder NPI configuration file path for WAN I2V pipeline.

Returns:: Path to the default WAN I2V VAE encoder NPI file.
Return type:: str

Prepare latent variables for image-to-video generation with temporal conditioning.

This method handles the complex process of preparing latent tensors for I2V generation, including image conditioning, temporal mask generation, and VAE encoding. It creates the initial noise latents and processes the input image(s) to create conditioning information that maintains temporal consistency throughout video generation.

Parameters:

image (PipelineImageInput) – Input image(s) to condition the video generation. Can be PIL Image, numpy array, or torch tensor.
batch_size (int) – Number of videos to generate in parallel.
num_channels_latents (int, default=16) – Number of channels in the latent space.
height (int, default=480) – Target video height in pixels.
width (int, default=832) – Target video width in pixels.
num_frames (int, default=81) – Number of frames in the generated video.
dtype (torch.dtype, optional) – Data type for latent tensors. If None, uses float32.
device (torch.device, optional) – Device to place tensors on. If None, uses CPU.
generator (torch.Generator or List[torch.Generator], optional) – Random generator(s) for reproducible latent initialization.
latents (torch.Tensor, optional) – Pre-generated latent tensors. If None, random latents are created.
last_image (torch.Tensor, optional) – Optional last frame image for video completion tasks. Used to create temporal boundaries.

Returns:

A tuple containing:

latents: Initial noise latents for the denoising process
condition: Conditioning tensor combining temporal masks and image latents

Or, if expand_timesteps=True:

latents: Initial noise latents
latent_condition: Image conditioning latents

Return type:

Tuple[torch.Tensor, torch.Tensor]

Raises:

ValueError – If generator list length doesn’t match batch size
RuntimeError – If VAE encoding fails or tensor operations fail
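The sketch below covers the temporal part of the conditioning mask for the non-expand_timesteps path, reduced to plain lists along the time axis. It follows the construction in the diffusers WanImageToVideoPipeline. The first frame, and optionally the last frame, is marked as known. The first frame is then repeated four times so the frame count divides by the VAE's temporal factor, which is assumed to be 4. Treat this as an illustration, not QEfficient's code.

```python
from typing import List

TEMPORAL_FACTOR = 4  # assumption: WAN VAE compresses time by 4


def temporal_mask(num_frames: int, has_last_image: bool = False) -> List[List[int]]:
    """Return mask[c][t]: 1 where latent frame t is given by an input image."""
    if num_frames < 1 or (num_frames - 1) % TEMPORAL_FACTOR:
        raise ValueError(f"num_frames must be 1 + a multiple of {TEMPORAL_FACTOR}")
    # Pixel-space mask: 1 = known frame, 0 = frame to generate
    mask = [1] + [0] * (num_frames - 1)
    if has_last_image:
        mask[-1] = 1
    # Repeat the first frame so the length is a multiple of the factor
    # (81 frames -> 84 entries -> 21 latent frames)
    mask = [mask[0]] * TEMPORAL_FACTOR + mask[1:]
    groups = [mask[i:i + TEMPORAL_FACTOR] for i in range(0, len(mask), TEMPORAL_FACTOR)]
    # Each position inside a group becomes one mask channel
    return [[group[c] for group in groups] for c in range(TEMPORAL_FACTOR)]


if __name__ == "__main__":
    m = temporal_mask(81)
    print(len(m), len(m[0]))      # 4 21
    print([row[0] for row in m])  # [1, 1, 1, 1]
```

Concatenating these 4 mask channels with 16 image-latent channels would give a 20-channel condition. With the 16 noise-latent channels added, that makes 36 transformer input channels. That figure is assumed from the WAN I2V transformer configuration.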

`QEffFluxPipeline`

FLUX supports optional first-block-cache via runtime monkey patching:

from QEfficient import QEffFluxPipeline

pipeline = QEffFluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    enable_first_block_cache=True,
    first_block_cache_downsample_factor=4,
)

output = pipeline(
    prompt="A laughing girl",
    cache_threshold=0.1,
)

When enable_first_block_cache=False, the pipeline follows baseline behavior and ignores cache_threshold.
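The decision behind cache_threshold can be sketched with the general first-block-cache heuristic. At each step, compare the first block's output residual with the one from the previous step. The comparison is made on a downsampled copy, which is where first_block_cache_downsample_factor comes in. If the relative L1 change is below the threshold, reuse the cached output of the remaining blocks. This is a simplified illustration on plain lists, not QEfficient's retained-state implementation. The strided downsampling in particular is an assumption.

```python
from typing import List, Optional


def downsample(x: List[float], factor: int) -> List[float]:
    # Keep every factor-th element (assumed downsampling scheme)
    return x[::factor]


def relative_l1(curr: List[float], prev: List[float]) -> float:
    """sum|curr - prev| / sum|prev|; infinite if prev is all zeros."""
    denom = sum(abs(p) for p in prev)
    diff = sum(abs(c - p) for c, p in zip(curr, prev))
    return diff / denom if denom else float("inf")


def can_reuse_cache(curr_residual: List[float], prev_residual: Optional[List[float]],
                    threshold: float, factor: int = 4) -> bool:
    """True if the remaining transformer blocks can be skipped this step."""
    if prev_residual is None:  # first step: nothing cached yet
        return False
    change = relative_l1(downsample(curr_residual, factor),
                         downsample(prev_residual, factor))
    return change < threshold


if __name__ == "__main__":
    prev = [1.0] * 16
    print(can_reuse_cache([1.05] * 16, prev, threshold=0.1))  # True
    print(can_reuse_cache([1.5] * 16, prev, threshold=0.1))   # False
```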

See examples:

examples/diffusers/flux/flux_1_schnell.py
examples/diffusers/flux/flux_1_shnell_custom.py
examples/diffusers/flux/flux_1_schnell_first_block_cache.py

class QEfficient.diffusers.pipelines.flux.pipeline_flux.QEffFluxPipeline(model, enable_first_block_cache: bool = False, first_block_cache_downsample_factor: int = 4, *args, **kwargs)[source]

QEfficient-optimized Flux pipeline for high-performance text-to-image generation on Qualcomm AI hardware.

This pipeline provides an optimized implementation of the Flux diffusion model specifically designed for deployment on Qualcomm AI Cloud (QAIC) devices. It wraps the original HuggingFace Flux model components with QEfficient-optimized versions that can be exported to ONNX format and compiled into Qualcomm Program Container (QPC) files for efficient inference.

The pipeline supports the complete Flux workflow, including:

Dual text encoding with CLIP and T5 encoders
Transformer-based denoising with adaptive layer normalization
VAE decoding for final image generation
Performance monitoring and optimization

text_encoder

Optimized CLIP text encoder for pooled embeddings

Type:: QEffTextEncoder

text_encoder_2

Optimized T5 text encoder for sequence embeddings

Type:: QEffTextEncoder

transformer

Optimized Flux transformer for denoising

Type:: QEffFluxTransformerModel

vae_decode

Optimized VAE decoder for latent-to-image conversion

Type:: QEffVAE

modules

Dictionary of all pipeline modules for batch operations

Type:: Dict[str, Any]

model

Original HuggingFace Flux model reference

Type:: FluxPipeline

tokenizer: CLIP tokenizer for text preprocessing

scheduler: Diffusion scheduler for timestep management

Example

>>> from QEfficient.diffusers.pipelines.flux import QEffFluxPipeline
>>> pipeline = QEffFluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell")
>>> images = pipeline(
...     prompt="A beautiful sunset over mountains",
...     height=512,
...     width=512,
...     num_inference_steps=28
... )
>>> images.images[0].save("generated_image.png")

compile(compile_config: str | None = None, parallel: bool = False, height: int = 512, width: int = 512, use_onnx_subfunctions: bool = False) → None[source]

Compile ONNX models into optimized QPC format for deployment on Qualcomm AI hardware.

Parameters:

compile_config (str, optional) – Path to a JSON configuration file containing compilation settings, device mappings, and optimization parameters. If None, uses the default configuration from get_default_config_path().
parallel (bool, default=False) – Compilation mode. If True, modules are compiled in parallel using a ThreadPoolExecutor for faster processing; if False, they are compiled sequentially for lower resource usage.
height (int, default=512) – Target image height in pixels.
width (int, default=512) – Target image width in pixels.
use_onnx_subfunctions (bool, default=False) – Whether to export models with ONNX subfunctions before compilation.

Raises:

RuntimeError – If compilation fails for any module or if QAIC compiler is not available
FileNotFoundError – If ONNX models haven’t been exported or config file is missing
ValueError – If configuration parameters are invalid
OSError – If there are issues with file I/O during compilation

Example

>>> pipeline = QEffFluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell")
>>> # Sequential compilation with default config
>>> pipeline.compile(height=1024, width=1024)
>>>
>>> # Parallel compilation with custom config
>>> pipeline.compile(
...     compile_config="/path/to/custom_config.json",
...     parallel=True,
...     height=512,
...     width=512
... )
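The difference between the two `parallel` modes can be sketched as follows. `compile_all` and the per-module `compile_fn` callback are hypothetical stand-ins, not QEfficient functions. The module names in the usage note are taken from the attribute list above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def compile_all(
    modules: Dict[str, object],
    compile_fn: Callable[[str, object], str],
    parallel: bool = False,
) -> Dict[str, str]:
    """Compile every module and return {module_name: qpc_path}.

    `compile_fn` is a hypothetical per-module compile call returning a QPC path.
    """
    if not parallel:
        # Sequential: one compilation at a time, lowest peak resource usage.
        return {name: compile_fn(name, module) for name, module in modules.items()}

    # Parallel: submit all compilations at once; result() re-raises any failure.
    with ThreadPoolExecutor(max_workers=len(modules) or 1) as pool:
        futures = {
            name: pool.submit(compile_fn, name, module)
            for name, module in modules.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

For example, `compile_all({"text_encoder": te, "text_encoder_2": t5, "transformer": tr, "vae_decode": vae}, fn, parallel=True)` would run the four compilations concurrently, which is faster but holds every compiler process in memory at once.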

Encode text prompts using Flux’s dual text encoder architecture.

Flux employs both CLIP and T5 encoders for comprehensive text understanding: CLIP provides pooled embeddings for global semantic conditioning, while T5 provides detailed sequence embeddings for fine-grained text control.

Parameters:

prompt (str or List[str]) – Primary prompt(s) for both encoders
prompt_2 (str or List[str], optional) – Secondary prompt(s) for T5. If None, uses primary prompt
num_images_per_prompt (int) – Number of images to generate per prompt
prompt_embeds (torch.FloatTensor, optional) – Pre-computed T5 embeddings
pooled_prompt_embeds (torch.FloatTensor, optional) – Pre-computed CLIP pooled embeddings
max_sequence_length (int) – Maximum sequence length for T5 tokenization

Returns:

(prompt_embeds, pooled_prompt_embeds, text_ids, encoder_perf_times)

prompt_embeds (torch.Tensor): T5 sequence embeddings [batch*num_images, seq_len, 4096]
pooled_prompt_embeds (torch.Tensor): CLIP pooled embeddings [batch*num_images, 768]
text_ids (torch.Tensor): Position IDs for text tokens [seq_len, 3]
encoder_perf_times (List[float]): Performance times [CLIP_time, T5_time]

Return type:

tuple
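The shapes listed under Returns follow from simple batch arithmetic. The sketch below assumes the diffusers FluxPipeline convention of repeating each prompt's embeddings contiguously for `num_images_per_prompt`, and assumes T5 inputs are padded to `max_sequence_length`, so `seq_len` equals it. Both helpers are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def repeat_per_prompt(embeds: Sequence[T], num_images_per_prompt: int) -> List[T]:
    # Each prompt's embedding is repeated contiguously: [p0, p0, p1, p1, ...].
    return [e for e in embeds for _ in range(num_images_per_prompt)]


def expected_shapes(
    batch_size: int, num_images_per_prompt: int, seq_len: int
) -> Tuple[Tuple[int, int, int], Tuple[int, int], Tuple[int, int]]:
    n = batch_size * num_images_per_prompt
    prompt_embeds = (n, seq_len, 4096)  # T5 sequence embeddings
    pooled_prompt_embeds = (n, 768)     # CLIP pooled embeddings
    text_ids = (seq_len, 3)             # position IDs, shared across the batch
    return prompt_embeds, pooled_prompt_embeds, text_ids
```

With two prompts, three images per prompt, and a 512-token T5 sequence, the encoder outputs would be shaped `(6, 512, 4096)`, `(6, 768)`, and `(512, 3)`.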

export(export_dir: str | None = None, use_onnx_subfunctions: bool = False) → str[source]

Export all pipeline modules to ONNX format for deployment preparation.

This method systematically exports each pipeline component (CLIP text encoder, T5 text encoder, Flux transformer, and VAE decoder) to ONNX format. Each module is exported with its specific configuration including dynamic axes, input/output specifications, and optimization settings.

The export process prepares the models for subsequent compilation to QPC format, enabling efficient inference on QAIC hardware. ONNX subfunctions can be used for certain modules to optimize memory usage and performance.

Parameters:

export_dir (str, optional) – Target directory for saving ONNX model files. If None, uses the default export directory structure based on model name and configuration. The directory will be created if it doesn’t exist.
use_onnx_subfunctions (bool, default=False) – Whether to enable ONNX subfunction optimization for supported modules. This can optimize the graph and improve compilation efficiency for models such as the transformer.

Returns:

Absolute path to the export directory containing all ONNX model files. Each module has its own subdirectory containing its exported ONNX file.

Return type:

str

Raises:

RuntimeError – If ONNX export fails for any module
OSError – If there are issues creating the export directory or writing files
ValueError – If module configurations are invalid

Note

All models are exported in float32 precision for maximum compatibility
Dynamic axes are configured to support variable batch sizes and sequence lengths
The export process may take several minutes depending on model size
Exported ONNX files can be large (several GB for complete pipeline)
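Dynamic axes are usually specified in the dictionary format accepted by `torch.onnx.export(..., dynamic_axes=...)`: `{tensor_name: {axis_index: symbolic_dimension_name}}`. The sketch below shows that format for a CLIP-style text encoder together with a small sanity check. The tensor names and ranks follow Hugging Face naming and are assumptions. The graphs QEfficient exports may use different names, and `validate_dynamic_axes` is hypothetical.

```python
from typing import Dict

# Assumed tensor names and ranks for a CLIP-style text encoder.
TENSOR_RANKS: Dict[str, int] = {"input_ids": 2, "last_hidden_state": 3, "pooler_output": 2}

# {tensor_name: {axis_index: symbolic_dimension_name}}; reusing a name such as
# "batch_size" labels the same logical dimension on several tensors.
DYNAMIC_AXES: Dict[str, Dict[int, str]] = {
    "input_ids": {0: "batch_size", 1: "seq_len"},          # [batch, seq]
    "last_hidden_state": {0: "batch_size", 1: "seq_len"},  # [batch, seq, hidden]
    "pooler_output": {0: "batch_size"},                    # [batch, hidden]
}


def validate_dynamic_axes(
    dynamic_axes: Dict[str, Dict[int, str]], ranks: Dict[str, int]
) -> None:
    """Raise ValueError for unknown tensors or out-of-range axis indices."""
    for name, axes in dynamic_axes.items():
        if name not in ranks:
            raise ValueError(f"unknown tensor {name!r}")
        for axis in axes:
            if not 0 <= axis < ranks[name]:
                raise ValueError(f"axis {axis} out of range for {name!r}")
```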

Example

>>> pipeline = QEffFluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell")
>>> export_path = pipeline.export(
...     export_dir="/path/to/export",
...     use_onnx_subfunctions=True
... )
>>> print(f"Models exported to: {export_path}")

classmethod from_pretrained(pretrained_model_name_or_path: str | PathLike | None, enable_first_block_cache: bool = False, first_block_cache_downsample_factor: int = 4, **kwargs)[source]

Load a pretrained Flux model from HuggingFace Hub or local path and wrap it with QEfficient optimizations.

This class method provides a convenient way to instantiate a QEffFluxPipeline from a pretrained Flux model. It automatically loads the base FluxPipeline model in float32 precision on CPU and wraps all components with QEfficient-optimized versions for QAIC deployment.

Parameters:

pretrained_model_name_or_path (str or os.PathLike) – Either a HuggingFace model identifier (e.g., “black-forest-labs/FLUX.1-schnell”) or a local path to a saved model directory.
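enable_first_block_cache (bool, optional) – Whether to enable the first-block-cache path, which keeps its cache in retained state. Defaults to False.
first_block_cache_downsample_factor (int, optional) – Downsampling factor applied to the first-block residual that serves as the cache key. Used only when enable_first_block_cache=True. Defaults to 4.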
**kwargs – Additional keyword arguments passed to FluxPipeline.from_pretrained().

Returns:

A fully initialized pipeline instance with QEfficient-optimized components, ready for export, compilation, and inference on QAIC devices.

Return type:

QEffFluxPipeline

Raises:

ValueError – If the model path is invalid or model cannot be loaded
OSError – If there are issues accessing the model files
RuntimeError – If model initialization fails

Example

>>> # Load from HuggingFace Hub
>>> pipeline = QEffFluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell")
>>>
>>> # Load from local path
>>> pipeline = QEffFluxPipeline.from_pretrained("/path/to/local/flux/model")
>>>
>>> # Load with custom cache directory
>>> pipeline = QEffFluxPipeline.from_pretrained(
...     "black-forest-labs/FLUX.1-dev",
...     cache_dir="/custom/cache/dir"
... )

static get_default_config_path() → str[source]

Get the absolute path to the default Flux pipeline configuration file.

Returns:

Absolute path to the flux_config.json file containing the default pipeline configuration settings for compilation and device allocation.

Return type:

str
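The documented fallback in compile() (use the default configuration when `compile_config` is None) can be sketched as below. `load_compile_config` is a hypothetical helper and makes no assumption about the keys inside the JSON file. In practice, `default_path` would be the value returned by `QEffFluxPipeline.get_default_config_path()`.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def load_compile_config(compile_config: Optional[str], default_path: str) -> Dict[str, Any]:
    """Load the JSON compile config, falling back to the default path when None."""
    path = Path(compile_config if compile_config is not None else default_path)
    if not path.is_file():
        # compile() documents FileNotFoundError for a missing config file.
        raise FileNotFoundError(f"Config file not found: {path}")
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)
```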
