Diffuser Classes
Pipeline API
QEffTextEncoder
- class QEfficient.diffusers.pipelines.pipeline_module.QEffTextEncoder(model: Module)[source]
Wrapper for text encoder models with ONNX export and QAIC compilation capabilities.
This class handles text encoder models (CLIP, T5) with specific transformations and optimizations for efficient inference on Qualcomm AI hardware. It applies custom PyTorch and ONNX transformations to prepare models for deployment.
- model
The wrapped text encoder model (deep copy of original)
- Type:
nn.Module
- _pytorch_transforms
PyTorch transformations applied before ONNX export
- Type:
List
- _onnx_transforms
ONNX transformations applied after export
- Type:
List
- compile(specializations: List[Dict], **compiler_options) None[source]
Compile the ONNX model for Qualcomm AI hardware.
- Parameters:
specializations (List[Dict]) – Model specialization configurations
**compiler_options – Additional compiler options (e.g., num_cores, aic_num_of_activations)
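The two arguments to compile() can be pictured as follows. This is a minimal sketch, not the library's code: `build_specializations` and `describe_compile_call` are hypothetical helpers, and the dictionary keys (`batch_size`, `seq_len`) and option values are illustrative assumptions. Only the option names `num_cores` and `aic_num_of_activations` come from the text above.

```python
from typing import Any, Dict, List


def build_specializations(batch_sizes: List[int], seq_len: int) -> List[Dict[str, int]]:
    """Build one specialization dict per batch size (key names are assumptions)."""
    return [{"batch_size": b, "seq_len": seq_len} for b in batch_sizes]


def describe_compile_call(specializations: List[Dict[str, int]], **compiler_options: Any) -> str:
    """Hypothetical stand-in for compile(): it only formats its arguments."""
    opts = ", ".join(f"{key}={value}" for key, value in sorted(compiler_options.items()))
    return f"{len(specializations)} specialization(s); options: {opts}"


specs = build_specializations([1, 2], seq_len=77)
summary = describe_compile_call(specs, num_cores=16, aic_num_of_activations=2)
print(summary)
```

Extra keyword arguments are collected into `compiler_options`, which is why options such as `num_cores` can be passed without appearing in the signature.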
- export(inputs: Dict, output_names: List[str], dynamic_axes: Dict, export_dir: str = None, export_kwargs: Dict = {}) str[source]
Export the text encoder model to ONNX format.
- Parameters:
inputs (Dict) – Example inputs for ONNX export
output_names (List[str]) – Names of model outputs
dynamic_axes (Dict) – Specification of dynamic dimensions
export_dir (str, optional) – Directory to save ONNX model
export_kwargs (Dict, optional) – Additional export arguments
- Returns:
Path to the exported ONNX model
- Return type:
str
- property get_model_config: Dict
Get the model configuration as a dictionary.
- Returns:
The configuration dictionary of the underlying text encoder model
- Return type:
Dict
- get_onnx_params() Tuple[Dict, Dict, List[str]][source]
Generate ONNX export configuration for the text encoder.
Creates example inputs, dynamic axes specifications, and output names tailored to the specific text encoder type (CLIP vs T5).
- Returns:
Tuple containing:
  - example_inputs (Dict): Sample inputs for ONNX export
  - dynamic_axes (Dict): Specification of dynamic dimensions
  - output_names (List[str]): Names of model outputs
- Return type:
Tuple[Dict, Dict, List[str]]
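To make the shape of this three-part tuple concrete, here is a hedged sketch for a CLIP-like encoder. The function names, the input name `input_ids`, and the output names are assumptions chosen for illustration. Nested lists stand in for the tensors a real export would use. The `dynamic_axes` layout follows the convention accepted by `torch.onnx.export` (tensor name mapped to `{axis_index: axis_name}`).

```python
from typing import Dict, List, Tuple

OnnxParams = Tuple[Dict[str, list], Dict[str, Dict[int, str]], List[str]]


def clip_like_onnx_params(batch_size: int = 1, seq_len: int = 77) -> OnnxParams:
    """Return (example_inputs, dynamic_axes, output_names) for a toy text encoder."""
    # Nested lists stand in for an integer tensor of shape (batch_size, seq_len).
    example_inputs = {"input_ids": [[0] * seq_len for _ in range(batch_size)]}
    # Axis 0 and axis 1 of input_ids are left dynamic in the exported graph.
    dynamic_axes = {"input_ids": {0: "batch_size", 1: "seq_len"}}
    output_names = ["last_hidden_state", "pooler_output"]  # illustrative names
    return example_inputs, dynamic_axes, output_names


def check_onnx_params(params: OnnxParams) -> None:
    """Reject dynamic axes that refer to a tensor that is neither an input nor an output."""
    example_inputs, dynamic_axes, output_names = params
    unknown = set(dynamic_axes) - (set(example_inputs) | set(output_names))
    if unknown:
        raise ValueError(f"dynamic_axes names unknown tensors: {sorted(unknown)}")


params = clip_like_onnx_params(batch_size=2, seq_len=4)
check_onnx_params(params)
```

The three elements line up with the `inputs`, `dynamic_axes`, and `output_names` arguments of export().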
QEffUNet
- class QEfficient.diffusers.pipelines.pipeline_module.QEffUNet(model: Module)[source]
Wrapper for UNet models with ONNX export and QAIC compilation capabilities.
This class handles UNet models with specific transformations and optimizations for efficient inference on Qualcomm AI hardware. UNet is commonly used in diffusion models for image generation tasks.
- model
The wrapped UNet model
- Type:
nn.Module
- _pytorch_transforms
PyTorch transformations applied before ONNX export
- Type:
List
- _onnx_transforms
ONNX transformations applied after export
- Type:
List
- compile(specializations: List[Dict], **compiler_options) None[source]
Compile the ONNX model for Qualcomm AI hardware.
- Parameters:
specializations (List[Dict]) – Model specialization configurations
**compiler_options – Additional compiler options
- export(inputs: Dict, output_names: List[str], dynamic_axes: Dict, export_dir: str = None, export_kwargs: Dict = {}) str[source]
Export the UNet model to ONNX format.
- Parameters:
inputs (Dict) – Example inputs for ONNX export
output_names (List[str]) – Names of model outputs
dynamic_axes (Dict) – Specification of dynamic dimensions
export_dir (str, optional) – Directory to save ONNX model
export_kwargs (Dict, optional) – Additional export arguments
- Returns:
Path to the exported ONNX model
- Return type:
str
- property get_model_config: Dict
Get the model configuration as a dictionary.
- Returns:
The configuration dictionary of the underlying UNet model
- Return type:
Dict
QEffVAE
- class QEfficient.diffusers.pipelines.pipeline_module.QEffVAE(model: Module, type: str)[source]
Wrapper for Variational Autoencoder (VAE) models with ONNX export and QAIC compilation.
This class handles VAE models with specific transformations and optimizations for efficient inference on Qualcomm AI hardware. VAE models are used in diffusion pipelines for encoding images to latent space and decoding latents back to images.
- model
The wrapped VAE model (deep copy of original)
- Type:
nn.Module
- type
VAE operation type (“encoder” or “decoder”)
- Type:
str
- _pytorch_transforms
PyTorch transformations applied before ONNX export
- Type:
List
- _onnx_transforms
ONNX transformations applied after export
- Type:
List
- compile(specializations: List[Dict], **compiler_options) None[source]
Compile the ONNX model for Qualcomm AI hardware.
- Parameters:
specializations (List[Dict]) – Model specialization configurations
**compiler_options – Additional compiler options
- export(inputs: Dict, output_names: List[str], dynamic_axes: Dict, export_dir: str = None, export_kwargs: Dict = {}) str[source]
Export the VAE model to ONNX format.
- Parameters:
inputs (Dict) – Example inputs for ONNX export
output_names (List[str]) – Names of model outputs
dynamic_axes (Dict) – Specification of dynamic dimensions
export_dir (str, optional) – Directory to save ONNX model
export_kwargs (Dict, optional) – Additional export arguments
- Returns:
Path to the exported ONNX model
- Return type:
str
- get_img_encoder_onnx_params() Tuple[Dict, Dict, List[str]][source]
Generate ONNX export configuration for the VAE Encoder.
- Returns:
Tuple containing:
  - example_inputs (Dict): Sample inputs for ONNX export
  - dynamic_axes (Dict): Specification of dynamic dimensions
  - output_names (List[str]): Names of model outputs
- Return type:
Tuple[Dict, Dict, List[str]]
- property get_model_config: Dict
Get the model configuration as a dictionary.
- Returns:
The configuration dictionary of the underlying VAE model
- Return type:
Dict
- get_onnx_params(latent_height: int = 32, latent_width: int = 32) Tuple[Dict, Dict, List[str]][source]
Generate ONNX export configuration for the VAE decoder.
- Parameters:
latent_height (int) – Height of latent representation (default: 32)
latent_width (int) – Width of latent representation (default: 32)
- Returns:
Tuple containing:
  - example_inputs (Dict): Sample inputs for ONNX export
  - dynamic_axes (Dict): Specification of dynamic dimensions
  - output_names (List[str]): Names of model outputs
- Return type:
Tuple[Dict, Dict, List[str]]
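The latent_height and latent_width arguments set the spatial size of the example decoder input. The sketch below shows how those values could relate to the decoded image size. It assumes 16 latent channels and an 8x spatial upsampling in the decoder, neither of which is stated in this reference, and `vae_decoder_shapes` is a hypothetical helper.

```python
from typing import Tuple

Shape = Tuple[int, int, int, int]


def vae_decoder_shapes(
    batch_size: int = 1,
    latent_channels: int = 16,   # assumption, not taken from this reference
    latent_height: int = 32,
    latent_width: int = 32,
    scale_factor: int = 8,       # assumption: decoder upsamples 8x per spatial axis
) -> Tuple[Shape, Shape]:
    """Return (latent_input_shape, decoded_image_shape) in NCHW layout."""
    latent_shape = (batch_size, latent_channels, latent_height, latent_width)
    image_shape = (batch_size, 3, latent_height * scale_factor, latent_width * scale_factor)
    return latent_shape, image_shape


latent_shape, image_shape = vae_decoder_shapes()
print(latent_shape, image_shape)
```

Under these assumptions the default 32 x 32 latent corresponds to a 256 x 256 RGB image.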
- get_video_onnx_params() Tuple[Dict, Dict, List[str]][source]
Generate ONNX export configuration for the VAE decoder.
- Parameters:
latent_height (int) – Height of latent representation (default: 32)
latent_width (int) – Width of latent representation (default: 32)
- Returns:
Tuple containing:
  - example_inputs (Dict): Sample inputs for ONNX export
  - dynamic_axes (Dict): Specification of dynamic dimensions
  - output_names (List[str]): Names of model outputs
- Return type:
Tuple[Dict, Dict, List[str]]
QEffFluxTransformerModel
- class QEfficient.diffusers.pipelines.pipeline_module.QEffFluxTransformerModel(model: Module)[source]
Wrapper for Flux Transformer2D models with ONNX export and QAIC compilation capabilities.
This class handles Flux Transformer2D models with specific transformations and optimizations for efficient inference on Qualcomm AI hardware. Flux uses a transformer-based diffusion architecture instead of a traditional UNet, with dual transformer blocks and adaptive layer normalization (AdaLN) for conditioning.
- model
The wrapped Flux transformer model
- Type:
nn.Module
- _pytorch_transforms
PyTorch transformations applied before ONNX export
- Type:
List
- _onnx_transforms
ONNX transformations applied after export
- Type:
List
- compile(specializations: List[Dict], **compiler_options) None[source]
Compile the ONNX model for Qualcomm AI hardware.
- Parameters:
specializations (List[Dict]) – Model specialization configurations
**compiler_options – Additional compiler options (e.g., num_cores, aic_num_of_activations)
- export(inputs: Dict, output_names: List[str], dynamic_axes: Dict, export_dir: str = None, use_onnx_subfunctions: bool = False) str[source]
Export the Flux transformer model to ONNX format.
- Parameters:
inputs (Dict) – Example inputs for ONNX export
output_names (List[str]) – Names of model outputs
dynamic_axes (Dict) – Specification of dynamic dimensions
export_dir (str, optional) – Directory to save ONNX model
use_onnx_subfunctions (bool) – Whether to export transformer blocks as ONNX functions for better modularity and potential optimization
- Returns:
Path to the exported ONNX model
- Return type:
str
- property get_model_config: Dict
Get the model configuration as a dictionary.
- Returns:
The configuration dictionary of the underlying Flux transformer model
- Return type:
Dict
- get_onnx_params(batch_size: int = 1, seq_length: int = 256, cl: int = 4096) Tuple[Dict, Dict, List[str]][source]
Generate ONNX export configuration for the Flux transformer.
Creates example inputs for all Flux-specific inputs including hidden states, text embeddings, timestep conditioning, and AdaLN embeddings.
- Parameters:
batch_size (int) – Batch size for example inputs (default: FLUX_ONNX_EXPORT_BATCH_SIZE)
seq_length (int) – Text sequence length (default: FLUX_ONNX_EXPORT_SEQ_LENGTH)
cl (int) – Compressed latent dimension (default: FLUX_ONNX_EXPORT_COMPRESSED_LATENT_DIM)
- Returns:
Tuple containing:
  - example_inputs (Dict): Sample inputs for ONNX export
  - dynamic_axes (Dict): Specification of dynamic dimensions
  - output_names (List[str]): Names of model outputs
- Return type:
Tuple[Dict, Dict, List[str]]
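The cl argument (compressed latent dimension) can be related to image size with a small calculation. The sketch assumes the usual Flux layout, in which the VAE downsamples each spatial axis by 8 and the latents are then packed into 2 x 2 patches. Under that assumption a 1024 x 1024 image gives the default cl=4096. The helper name is hypothetical and the formula is an illustration, not the library's definition.

```python
def flux_compressed_latent_dim(
    height: int,
    width: int,
    vae_scale_factor: int = 8,  # assumption: VAE spatial downsampling
    patch_size: int = 2,        # assumption: 2x2 latent packing
) -> int:
    """Number of packed latent tokens for an image of the given pixel size."""
    latent_h = height // vae_scale_factor
    latent_w = width // vae_scale_factor
    return (latent_h // patch_size) * (latent_w // patch_size)


cl_1024 = flux_compressed_latent_dim(1024, 1024)
cl_512 = flux_compressed_latent_dim(512, 512)
print(cl_1024, cl_512)
```

Because the token count grows with the product of height and width, halving both sides of the image cuts cl by a factor of four.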
QEffWanUnifiedTransformer
- class QEfficient.diffusers.pipelines.pipeline_module.QEffWanUnifiedTransformer(unified_transformer)[source]
Wrapper for WAN Unified Transformer with ONNX export and QAIC compilation capabilities.
This class handles the unified WAN transformer model that combines high and low noise transformers into a single model for efficient deployment. Based on the timestep shape, the model dynamically selects between high and low noise transformers during inference.
The wrapper applies specific transformations and optimizations for efficient inference on Qualcomm AI hardware, particularly for video diffusion models.
- model
The QEffWanUnifiedWrapper model that combines high/low noise transformers
- Type:
nn.Module
- _pytorch_transforms
PyTorch transformations applied before ONNX export
- Type:
List
- _onnx_transforms
ONNX transformations applied after export
- Type:
List
- compile(specializations, **compiler_options) None[source]
Compile the ONNX model for Qualcomm AI hardware.
- Parameters:
specializations (List[Dict]) – Model specialization configurations
**compiler_options – Additional compiler options (e.g., num_cores, aic_num_of_activations)
- export(inputs: Dict, output_names: List[str], dynamic_axes: Dict, export_dir: str = None, use_onnx_subfunctions: bool = False) str[source]
Export the Wan transformer model to ONNX format.
- Parameters:
inputs (Dict) – Example inputs for ONNX export
output_names (List[str]) – Names of model outputs
dynamic_axes (Dict) – Specification of dynamic dimensions
export_dir (str, optional) – Directory to save ONNX model
use_onnx_subfunctions (bool) – Whether to export transformer blocks as ONNX functions for better modularity and potential optimization
- Returns:
Path to the exported ONNX model
- Return type:
str
- property get_model_config: Dict
Get the model configuration as a dictionary.
- Returns:
The configuration dictionary of the underlying Wan transformer model
- Return type:
Dict
- get_onnx_params()[source]
Generate ONNX export configuration for the Wan transformer.
Creates example inputs for all Wan-specific inputs including hidden states, text embeddings, and timestep conditioning.
- Returns:
Tuple containing:
  - example_inputs (Dict): Sample inputs for ONNX export
  - dynamic_axes (Dict): Specification of dynamic dimensions
  - output_names (List[str]): Names of model outputs
- Return type:
Tuple[Dict, Dict, List[str]]
Model Classes
QEffWanPipeline
WAN supports two execution architectures:
- use_unified=True (default): one unified transformer module.
- use_unified=False: separate transformer_high and transformer_low modules.
First-block-cache is currently supported only for non-unified WAN:
from QEfficient import QEffWanPipeline

pipeline = QEffWanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers",
    use_unified=False,
    enable_first_block_cache=True,
    first_block_cache_downsample_factor=4,
)
output = pipeline(
    prompt="A cat playing in a sunny garden",
    cache_threshold_high=0.1,
    cache_threshold_low=0.065,
)
See examples:
- examples/diffusers/wan/wan_lightning.py
- examples/diffusers/wan/wan_lightning_custom.py
- examples/diffusers/wan/wan_first_block_cache.py
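The constraint that first-block-cache requires non-unified WAN can be checked before loading a large model. The following is a sketch only: `WanLoadOptions` and `validate_wan_options` are hypothetical names, and whether the library itself raises, warns, or ignores an unsupported combination is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WanLoadOptions:
    """Mirror of the from_pretrained keyword arguments discussed above."""
    use_unified: bool = True
    enable_first_block_cache: bool = False
    first_block_cache_downsample_factor: int = 4


def validate_wan_options(opts: WanLoadOptions) -> WanLoadOptions:
    """Fail early on combinations the text describes as unsupported."""
    if opts.enable_first_block_cache and opts.use_unified:
        raise ValueError("first-block-cache requires use_unified=False")
    if opts.first_block_cache_downsample_factor < 1:
        raise ValueError("first_block_cache_downsample_factor must be >= 1")
    return opts


ok = validate_wan_options(WanLoadOptions(use_unified=False, enable_first_block_cache=True))
print(ok)
```

A validated options object could then be unpacked into the call, for example `QEffWanPipeline.from_pretrained(model_id, **dataclasses.asdict(ok))`.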
- class QEfficient.diffusers.pipelines.wan.pipeline_wan.QEffWanPipeline(model, use_unified: bool = True, enable_first_block_cache: bool = False, first_block_cache_downsample_factor: int = 4, **kwargs)[source]
QEfficient-optimized WAN pipeline for high-performance text-to-video generation on Qualcomm AI hardware.
This pipeline provides an optimized implementation of the WAN diffusion model specifically designed for deployment on Qualcomm AI Cloud (QAIC) devices. It extends the original HuggingFace WAN model with QEfficient-optimized components that can be exported to ONNX format and compiled into Qualcomm Program Container (QPC) files for efficient video generation.
The pipeline supports the complete WAN workflow including:
  - UMT5 text encoding for rich semantic understanding
  - Unified transformer architecture: Combines multiple transformer stages into a single optimized model
  - VAE decoding for final video output
  - Performance monitoring and hardware optimization
- text_encoder
UMT5 text encoder for semantic text understanding (TODO: QEfficient optimization)
- unified_wrapper
Wrapper combining transformer stages (unified mode)
- Type:
QEffWanUnifiedWrapper
- transformer
Optimized unified transformer for denoising (unified mode)
- transformer_high
High-noise transformer module (non-unified mode)
- Type:
QEffWanTransformer
- transformer_low
Low-noise transformer module (non-unified mode)
- Type:
QEffWanTransformer
- vae_decode
VAE decoder for latent-to-video conversion
- modules
Dictionary of pipeline modules for batch operations
- Type:
Dict[str, Any]
- model
Original HuggingFace WAN model reference
- Type:
WanPipeline
- tokenizer
Text tokenizer for preprocessing
- scheduler
Diffusion scheduler for timestep management
Example
>>> from QEfficient.diffusers.pipelines.wan import QEffWanPipeline
>>> pipeline = QEffWanPipeline.from_pretrained("path/to/wan/model")
>>> videos = pipeline(
...     prompt="A cat playing in a garden",
...     height=480,
...     width=832,
...     num_frames=81,
...     num_inference_steps=4
... )
>>> # Save generated video
>>> videos.images[0].save("generated_video.mp4")
- compile(compile_config: str | None = None, parallel: bool = False, height: int = 48, width: int = 64, num_frames: int = 81, use_onnx_subfunctions: bool = False) str[source]
Compiles the ONNX graphs of the different model components for deployment on Qualcomm AI hardware.
This method takes the ONNX paths of the transformer and compiles them into an optimized format for inference using JSON-based configuration.
- Parameters:
compile_config (str, optional) – Path to a JSON configuration file containing compilation settings, device mappings, and optimization parameters. If None, uses the default configuration.
parallel (bool, default=False) – Compilation mode selection:
  - True: Compile modules in parallel using ThreadPoolExecutor for faster processing
  - False: Compile modules sequentially for lower resource usage
height (int, default=48) – Target image height in pixels.
width (int, default=64) – Target image width in pixels.
num_frames (int, default=81) – Target number of frames in pixel space.
use_onnx_subfunctions (bool, default=False) – Whether to export models with ONNX subfunctions before compilation if not already exported.
- Raises:
RuntimeError – If compilation fails for any module or if QAIC compiler is not available
FileNotFoundError – If ONNX models haven’t been exported or config file is missing
ValueError – If configuration parameters are invalid
OSError – If there are issues with file I/O during compilation
Example
>>> pipeline = QEffWanPipeline.from_pretrained("Wan-AI/Wan2.2-T2V-A14B-Diffusers")
>>> # Sequential compilation with default config
>>> pipeline.compile(height=480, width=832, num_frames=81)
>>>
>>> # Parallel compilation with custom config
>>> pipeline.compile(
...     compile_config="/path/to/custom_config.json",
...     parallel=True,
...     height=480,
...     width=832,
...     num_frames=81
... )
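The difference between the two values of parallel can be sketched with the standard library alone. This is an illustration of the sequential versus ThreadPoolExecutor pattern named above, not the pipeline's code: `compile_modules` and `fake_compile` are hypothetical, and each job simply returns a made-up artifact name instead of invoking a compiler.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def fake_compile(name: str) -> Callable[[], str]:
    """Return a job that pretends to compile one module (hypothetical)."""
    return lambda: f"{name}.qpc"


def compile_modules(jobs: Dict[str, Callable[[], str]], parallel: bool = False) -> Dict[str, str]:
    """Run every job, either one after another or on a thread pool."""
    if not parallel:
        return {name: job() for name, job in jobs.items()}
    with ThreadPoolExecutor(max_workers=len(jobs) or 1) as pool:
        futures = {name: pool.submit(job) for name, job in jobs.items()}
        # result() re-raises any exception thrown inside the worker thread.
        return {name: future.result() for name, future in futures.items()}


jobs = {name: fake_compile(name) for name in ("transformer", "vae_decoder")}
sequential = compile_modules(jobs, parallel=False)
threaded = compile_modules(jobs, parallel=True)
print(sequential == threaded)
```

Both modes produce the same mapping. The parallel path trades higher peak resource usage for shorter wall-clock time when the jobs are independent.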
- property do_classifier_free_guidance
Determine if classifier-free guidance should be used.
- Returns:
True if CFG should be applied based on current guidance scales
- Return type:
bool
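For readers unfamiliar with classifier-free guidance (CFG), the sketch below shows the standard combination rule and a common enablement test. The threshold `guidance_scale > 1.0` is an assumption borrowed from typical diffusion pipelines. The exact rule this property applies, including how it treats multiple guidance scales, is not given here. Plain lists stand in for noise tensors.

```python
from typing import List


def do_classifier_free_guidance(guidance_scale: float) -> bool:
    """Common convention (assumed here): CFG is active only above a scale of 1.0."""
    return guidance_scale > 1.0


def apply_cfg(noise_uncond: List[float], noise_cond: List[float], guidance_scale: float) -> List[float]:
    """Standard CFG update: uncond + scale * (cond - uncond), element by element."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


guided = apply_cfg([0.0, 1.0], [1.0, 3.0], guidance_scale=2.0)
print(do_classifier_free_guidance(2.0), guided)
```

With a scale of exactly 1.0 the formula returns the conditional prediction unchanged, which is why the unconditional pass can be skipped in that case.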
- export(export_dir: str | None = None, use_onnx_subfunctions: bool = False) str[source]
Export all pipeline modules to ONNX format for deployment preparation.
This method systematically exports the unified transformer to ONNX format with video-specific configurations including temporal dimensions, dynamic axes, and optimization settings. The export process prepares the model for subsequent compilation to QPC format for efficient inference on QAIC hardware.
- Parameters:
export_dir (str, optional) – Target directory for saving ONNX model files. If None, uses the default export directory structure. The directory will be created if it doesn’t exist.
use_onnx_subfunctions (bool, default=False) – Whether to enable ONNX subfunction optimization for supported modules. This can optimize the graph structure and improve compilation efficiency for complex models like the transformer.
- Returns:
Absolute path to the export directory containing all ONNX model files.
- Return type:
str
- Raises:
RuntimeError – If ONNX export fails for any module
OSError – If there are issues creating the export directory or writing files
ValueError – If module configurations are invalid
Example
>>> pipeline = QEffWanPipeline.from_pretrained("Wan-AI/Wan2.2-T2V-A14B-Diffusers")
>>> export_path = pipeline.export(
...     export_dir="/path/to/export",
...     use_onnx_subfunctions=True
... )
- classmethod from_pretrained(pretrained_model_name_or_path: str | PathLike | None, use_unified: bool = True, enable_first_block_cache: bool = False, first_block_cache_downsample_factor: int = 4, **kwargs)[source]
Load a pretrained WAN model from HuggingFace Hub or local path and wrap it with QEfficient optimizations.
This class method provides a convenient way to instantiate a QEffWanPipeline from a pretrained WAN model. It automatically loads the base WanPipeline model in float32 precision on CPU and wraps all components with QEfficient-optimized versions for QAIC deployment.
- Parameters:
pretrained_model_name_or_path (str or os.PathLike) – Either a HuggingFace model identifier or a local path to a saved WAN model directory. Should contain transformer, transformer_2, text_encoder, and VAE components.
use_unified (bool, optional) – Selects WAN execution architecture:
  - True: unified high/low transformer module
  - False: separate high and low transformer modules
enable_first_block_cache (bool, optional) – Enables retained-state first-block-cache for non-unified mode.
first_block_cache_downsample_factor (int, optional) – Downsample factor for first-block cache key when cache is enabled.
**kwargs – Additional keyword arguments passed to WanPipeline.from_pretrained().
- Returns:
A fully initialized pipeline instance with QEfficient-optimized components, ready for export, compilation, and inference on QAIC devices.
- Return type:
QEffWanPipeline
- Raises:
ValueError – If the model path is invalid or model cannot be loaded
OSError – If there are issues accessing the model files
RuntimeError – If model initialization fails
Example
>>> # Load from HuggingFace Hub
>>> pipeline = QEffWanPipeline.from_pretrained("Wan-AI/Wan2.2-T2V-A14B-Diffusers")
>>>
>>> # Load from local path
>>> pipeline = QEffWanPipeline.from_pretrained("/local/path/to/wan")
>>>
>>> # Load with custom cache directory
>>> pipeline = QEffWanPipeline.from_pretrained(
...     "wan-model-id",
...     cache_dir="/custom/cache/dir"
... )
QEffWanImageToVideoPipeline
- class QEfficient.diffusers.pipelines.wan.pipeline_wan_i2v.QEffWanImageToVideoPipeline(model, **kwargs)[source]
QEfficient-optimized WAN image-to-video pipeline for high-performance video generation on Qualcomm AI hardware.
This pipeline provides an optimized implementation of the WAN image-to-video diffusion model specifically designed for deployment on Qualcomm AI Cloud (QAIC) devices. It extends the original HuggingFace WAN image-to-video model with QEfficient-optimized components that can be exported to ONNX format and compiled into Qualcomm Program Container (QPC) files for efficient video generation from static images.
The pipeline supports the complete WAN image-to-video workflow including:
  - Image conditioning and preprocessing for temporal consistency
  - UMT5 text encoding for rich semantic understanding
  - Unified transformer architecture: Combines multiple transformer stages into a single optimized model
  - VAE encoding/decoding for image-to-latent and latent-to-video conversion
- text_encoder
UMT5 text encoder for semantic text understanding (TODO: QEfficient optimization)
- unified_wrapper
Wrapper combining transformer stages
- Type:
QEffWanUnifiedWrapper
- transformer
Optimized unified transformer for denoising
- modules
Dictionary of pipeline modules for batch operations
- Type:
Dict[str, Any]
- model
Original HuggingFace WAN I2V model reference
- Type:
WanImageToVideoPipeline
- tokenizer
Text tokenizer for preprocessing
- scheduler
Diffusion scheduler for timestep management
Example
>>> from diffusers.utils import export_to_video
>>> from PIL import Image
>>> from QEfficient.diffusers.pipelines.wan import QEffWanImageToVideoPipeline
>>>
>>> # Load pipeline and input image
>>> pipeline = QEffWanImageToVideoPipeline.from_pretrained("Wan-AI/Wan2.2-I2V-A14B-Diffusers")
>>> image = Image.open("input_frame.jpg")
>>>
>>> # Generate video with motion
>>> result = pipeline(
...     image=image,
...     prompt="A person walking through a sunny garden with flowing motion",
...     height=544,
...     width=720,
...     num_frames=81,
...     num_inference_steps=4,
...     guidance_scale=1.0
... )
>>> # Save generated video
>>> frames = result.images[0]
>>> export_to_video(frames, "generated_video.mp4", fps=16)
- compile(compile_config: str | None = None, parallel: bool = False, height: int = 48, width: int = 64, num_frames: int = 81, use_onnx_subfunctions: bool = False) str[source]
Compiles the ONNX graphs of the different model components for deployment on Qualcomm AI hardware.
This method takes the ONNX paths of the transformer and compiles them into an optimized format for inference using JSON-based configuration.
- Parameters:
compile_config (str, optional) – Path to a JSON configuration file containing compilation settings, device mappings, and optimization parameters. If None, uses the default configuration.
parallel (bool, default=False) – Compilation mode selection:
  - True: Compile modules in parallel using ThreadPoolExecutor for faster processing
  - False: Compile modules sequentially for lower resource usage
height (int, default=48) – Target image height in pixels.
width (int, default=64) – Target image width in pixels.
num_frames (int, default=81) – Target number of frames in pixel space.
use_onnx_subfunctions (bool, default=False) – Whether to export models with ONNX subfunctions before compilation if not already exported.
- Raises:
RuntimeError – If compilation fails for any module or if QAIC compiler is not available
FileNotFoundError – If ONNX models haven’t been exported or config file is missing
ValueError – If configuration parameters are invalid
OSError – If there are issues with file I/O during compilation
Example
>>> pipeline = QEffWanImageToVideoPipeline.from_pretrained("Wan-AI/Wan2.2-I2V-A14B-Diffusers")
>>> # Sequential compilation with default config
>>> pipeline.compile(height=480, width=832, num_frames=81)
>>>
>>> # Parallel compilation with custom config
>>> pipeline.compile(
...     compile_config="/path/to/custom_config.json",
...     parallel=True,
...     height=480,
...     width=832,
...     num_frames=81
... )
- property do_classifier_free_guidance
Determine if classifier-free guidance should be used.
- Returns:
True if CFG should be applied based on current guidance scales
- Return type:
bool
- export(export_dir: str | None = None, use_onnx_subfunctions: bool = False) str[source]
Export all pipeline modules to ONNX format for deployment preparation.
This method systematically exports the VAE encoder, unified transformer, and VAE decoder to ONNX format with image-to-video specific configurations including temporal dimensions, dynamic axes, and optimization settings.
The export process prepares the models for subsequent compilation to QPC format, enabling efficient inference on QAIC hardware. ONNX subfunctions can be used for certain modules to optimize memory usage and performance.
- Parameters:
export_dir (str, optional) – Target directory for saving ONNX model files. If None, uses the default export directory structure. The directory will be created if it doesn’t exist.
use_onnx_subfunctions (bool, default=False) – Whether to enable ONNX subfunction optimization for supported modules. This can optimize the graph structure and improve compilation efficiency for complex models like the transformer.
- Returns:
Absolute path to the export directory containing all ONNX model files.
- Return type:
str
- Raises:
RuntimeError – If ONNX export fails for any module
OSError – If there are issues creating the export directory or writing files
ValueError – If module configurations are invalid
Example
>>> pipeline = QEffWanImageToVideoPipeline.from_pretrained("Wan-AI/Wan2.2-I2V-A14B-Diffusers")
>>> export_path = pipeline.export(
...     export_dir="/path/to/export",
...     use_onnx_subfunctions=True
... )
>>> print(f"Models exported to: {export_path}")
- classmethod from_pretrained(pretrained_model_name_or_path: str | PathLike | None, **kwargs)[source]
Load a pretrained WAN image-to-video model from HuggingFace Hub or local path and wrap it with QEfficient optimizations.
This class method provides a convenient way to instantiate a QEffWanImageToVideoPipeline from a pretrained WAN I2V model. It automatically loads the base WanImageToVideoPipeline model in float32 precision on CPU and wraps all components with QEfficient-optimized versions for QAIC deployment.
- Parameters:
pretrained_model_name_or_path (str or os.PathLike) – Either a HuggingFace model identifier or a local path to a saved WAN I2V model directory. Should contain transformer, transformer_2, text_encoder, and VAE components optimized for image-to-video generation.
**kwargs – Additional keyword arguments passed to WanImageToVideoPipeline.from_pretrained().
- Returns:
A fully initialized I2V pipeline instance with QEfficient-optimized components, ready for export, compilation, and inference on QAIC devices.
- Return type:
QEffWanImageToVideoPipeline
- Raises:
ValueError – If the model path is invalid or model cannot be loaded
OSError – If there are issues accessing the model files
RuntimeError – If model initialization fails
Example
>>> # Load from HuggingFace Hub
>>> pipeline = QEffWanImageToVideoPipeline.from_pretrained("Wan-AI/Wan2.2-I2V-A14B-Diffusers")
>>>
>>> # Load from local path
>>> pipeline = QEffWanImageToVideoPipeline.from_pretrained("/local/path/to/wan/i2v")
>>>
>>> # Load with custom cache directory
>>> pipeline = QEffWanImageToVideoPipeline.from_pretrained(
...     "Wan-AI/Wan2.2-I2V-A14B-Diffusers",
...     cache_dir="/custom/cache/dir"
... )
- static get_default_config_path()[source]
Get the default configuration file path for WAN pipeline.
- Returns:
Path to the default WAN configuration JSON file.
- Return type:
str
- static get_vae_encoder_npi_path()[source]
Get the default VAE encoder NPI configuration file path for WAN I2V pipeline.
- Returns:
Path to the default WAN I2V VAE encoder NPI file.
- Return type:
str
- prepare_latents(image: Image | ndarray | Tensor | List[Image] | List[ndarray] | List[Tensor], batch_size: int, num_channels_latents: int = 16, height: int = 480, width: int = 832, num_frames: int = 81, dtype: dtype | None = None, device: device | None = None, generator: Generator | List[Generator] | None = None, latents: Tensor | None = None, last_image: Tensor | None = None) Tuple[Tensor, Tensor][source]
Prepare latent variables for image-to-video generation with temporal conditioning.
This method handles the complex process of preparing latent tensors for I2V generation, including image conditioning, temporal mask generation, and VAE encoding. It creates the initial noise latents and processes the input image(s) to create conditioning information that maintains temporal consistency throughout video generation.
- Parameters:
image (PipelineImageInput) – Input image(s) to condition the video generation. Can be PIL Image, numpy array, or torch tensor.
batch_size (int) – Number of videos to generate in parallel.
num_channels_latents (int, default=16) – Number of channels in the latent space.
height (int, default=480) – Target video height in pixels.
width (int, default=832) – Target video width in pixels.
num_frames (int, default=81) – Number of frames in the generated video.
dtype (torch.dtype, optional) – Data type for latent tensors. If None, uses float32.
device (torch.device, optional) – Device to place tensors on. If None, uses CPU.
generator (torch.Generator or List[torch.Generator], optional) – Random generator(s) for reproducible latent initialization.
latents (torch.Tensor, optional) – Pre-generated latent tensors. If None, random latents are created.
last_image (torch.Tensor, optional) – Optional last frame image for video completion tasks. Used to create temporal boundaries.
- Returns:
A tuple containing:
  - latents: Initial noise latents for the denoising process
  - condition: Conditioning tensor combining temporal masks and image latents
Or, if expand_timesteps=True:
  - latents: Initial noise latents
  - latent_condition: Image conditioning latents
- Return type:
Tuple[torch.Tensor, torch.Tensor]
- Raises:
ValueError – If generator list length doesn’t match batch size
RuntimeError – If VAE encoding fails or tensor operations fail
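The defaults of prepare_latents imply a particular latent tensor shape. The sketch below computes it under two assumptions that are not stated in this reference: the VAE compresses each spatial axis by 8 and the time axis by 4, with the first frame kept on its own. It also mirrors the documented ValueError for a mismatched generator list. Both helper names are hypothetical.

```python
from typing import List, Optional, Tuple, Union


def wan_latent_shape(
    batch_size: int,
    num_channels_latents: int = 16,
    height: int = 480,
    width: int = 832,
    num_frames: int = 81,
    spatial_scale: int = 8,   # assumption: VAE spatial compression
    temporal_scale: int = 4,  # assumption: VAE temporal compression
) -> Tuple[int, int, int, int, int]:
    """Shape of the noise latents as (batch, channels, frames, height, width)."""
    latent_frames = (num_frames - 1) // temporal_scale + 1
    return (batch_size, num_channels_latents, latent_frames,
            height // spatial_scale, width // spatial_scale)


def check_generators(generator: Optional[Union[object, List[object]]], batch_size: int) -> None:
    """Raise ValueError when a generator list does not match the batch size."""
    if isinstance(generator, list) and len(generator) != batch_size:
        raise ValueError(
            f"got {len(generator)} generators for a batch of {batch_size}"
        )


shape = wan_latent_shape(batch_size=1)
check_generators(None, batch_size=1)
print(shape)
```

Under these assumptions, 81 frames at 480 x 832 become 21 latent frames of 60 x 104.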
QEffFluxPipeline
FLUX supports optional first-block-cache via runtime monkey patching:
from QEfficient import QEffFluxPipeline

pipeline = QEffFluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    enable_first_block_cache=True,
    first_block_cache_downsample_factor=4,
)
output = pipeline(
    prompt="A laughing girl",
    cache_threshold=0.1,
)
When enable_first_block_cache=False, the pipeline follows baseline behavior and ignores cache_threshold.
See examples:
- examples/diffusers/flux/flux_1_schnell.py
- examples/diffusers/flux/flux_1_shnell_custom.py
- examples/diffusers/flux/flux_1_schnell_first_block_cache.py
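The role of cache_threshold and first_block_cache_downsample_factor can be illustrated with the general first-block-cache idea: after running only the first transformer block, compare its output with the previous step's, and reuse the cached result of the remaining blocks when the change is small. The sketch below is an assumption-laden illustration of that decision. The stride-based downsampling and the relative L1 metric are hypothetical choices, and the implementation used by QEfficient on device may differ.

```python
from typing import Sequence


def should_reuse_cache(
    prev_residual: Sequence[float],
    curr_residual: Sequence[float],
    cache_threshold: float,
    downsample_factor: int = 4,
) -> bool:
    """Return True when the first-block output changed little since the last step."""
    # Compare only every downsample_factor-th element to keep the check cheap.
    prev = prev_residual[::downsample_factor]
    curr = curr_residual[::downsample_factor]
    denom = sum(abs(p) for p in prev)
    if denom == 0.0:
        return False  # nothing to compare against, so recompute
    rel_change = sum(abs(c - p) for c, p in zip(curr, prev)) / denom
    return rel_change < cache_threshold


small_change = should_reuse_cache([1.0] * 8, [1.05] * 8, cache_threshold=0.1)
large_change = should_reuse_cache([1.0] * 8, [1.5] * 8, cache_threshold=0.1)
print(small_change, large_change)
```

A higher threshold reuses the cache more often, which saves compute at the cost of fidelity. A threshold of zero never reuses it.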
- class QEfficient.diffusers.pipelines.flux.pipeline_flux.QEffFluxPipeline(model, enable_first_block_cache: bool = False, first_block_cache_downsample_factor: int = 4, *args, **kwargs)[source]
QEfficient-optimized Flux pipeline for high-performance text-to-image generation on Qualcomm AI hardware.
This pipeline provides an optimized implementation of the Flux diffusion model specifically designed for deployment on Qualcomm AI Cloud (QAIC) devices. It wraps the original HuggingFace Flux model components with QEfficient-optimized versions that can be exported to ONNX format and compiled into Qualcomm Program Container (QPC) files for efficient inference.
The pipeline supports the complete Flux workflow including:
  - Dual text encoding with CLIP and T5 encoders
  - Transformer-based denoising with adaptive layer normalization
  - VAE decoding for final image generation
  - Performance monitoring and optimization
- text_encoder
Optimized CLIP text encoder for pooled embeddings
- Type:
QEffTextEncoder
- text_encoder_2
Optimized T5 text encoder for sequence embeddings
- Type:
QEffTextEncoder
- transformer
Optimized Flux transformer for denoising
- Type:
QEffFluxTransformerModel
- modules
Dictionary of all pipeline modules for batch operations
- Type:
Dict[str, Any]
- model
Original HuggingFace Flux model reference
- Type:
FluxPipeline
- tokenizer
CLIP tokenizer for text preprocessing
- scheduler
Diffusion scheduler for timestep management
Example
>>> from QEfficient.diffusers.pipelines.flux import QEffFluxPipeline
>>> pipeline = QEffFluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell")
>>> images = pipeline(
...     prompt="A beautiful sunset over mountains",
...     height=512,
...     width=512,
...     num_inference_steps=28
... )
>>> images.images[0].save("generated_image.png")
- compile(compile_config: str | None = None, parallel: bool = False, height: int = 512, width: int = 512, use_onnx_subfunctions: bool = False) None[source]
Compile ONNX models into optimized QPC format for deployment on Qualcomm AI hardware.
- Parameters:
compile_config (str, optional) – Path to a JSON configuration file containing compilation settings, device mappings, and optimization parameters. If None, uses the default configuration from get_default_config_path().
parallel (bool, default=False) – Compilation mode selection:
- True: Compile modules in parallel using ThreadPoolExecutor for faster processing
- False: Compile modules sequentially for lower resource usage
height (int, default=512) – Target image height in pixels.
width (int, default=512) – Target image width in pixels.
use_onnx_subfunctions (bool, default=False) – Whether to export models with ONNX subfunctions before compilation.
- Raises:
RuntimeError – If compilation fails for any module or if the QAIC compiler is not available
FileNotFoundError – If ONNX models haven’t been exported or config file is missing
ValueError – If configuration parameters are invalid
OSError – If there are issues with file I/O during compilation
Example
>>> pipeline = QEffFluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell")
>>> # Sequential compilation with default config
>>> pipeline.compile(height=1024, width=1024)
>>>
>>> # Parallel compilation with custom config
>>> pipeline.compile(
...     compile_config="/path/to/custom_config.json",
...     parallel=True,
...     height=512,
...     width=512
... )
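The difference between the two compilation modes can be sketched with stub compile functions. This is only an illustration of sequential versus ThreadPoolExecutor-based dispatch, not the pipeline's actual code. The module name "vae_decoder", the make_stub helper, and the returned "<name>/qpc" strings are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def compile_modules(
    compile_fns: Dict[str, Callable[[], str]], parallel: bool = False
) -> Dict[str, str]:
    """Run one compile callable per module, sequentially or in a thread pool."""
    if not parallel:
        # Sequential: one module at a time, lower peak resource usage.
        return {name: fn() for name, fn in compile_fns.items()}
    # Parallel: submit every module, then wait for each result.
    # result() re-raises any exception raised inside the worker thread.
    with ThreadPoolExecutor(max_workers=max(1, len(compile_fns))) as pool:
        futures = {name: pool.submit(fn) for name, fn in compile_fns.items()}
        return {name: future.result() for name, future in futures.items()}


def make_stub(name: str) -> Callable[[], str]:
    """Hypothetical stand-in for a module's compile step."""
    def _compile() -> str:
        return f"{name}/qpc"
    return _compile


# Hypothetical module names; the real pipeline keeps its own `modules` dict.
stubs = {
    name: make_stub(name)
    for name in ("text_encoder", "text_encoder_2", "transformer", "vae_decoder")
}
sequential_result = compile_modules(stubs, parallel=False)
parallel_result = compile_modules(stubs, parallel=True)
```

Both modes produce the same mapping. The parallel mode trades higher peak memory and CPU usage for shorter wall-clock time.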
- encode_prompt(prompt: str | List[str], prompt_2: str | List[str] | None = None, num_images_per_prompt: int = 1, prompt_embeds: FloatTensor | None = None, pooled_prompt_embeds: FloatTensor | None = None, max_sequence_length: int = 512)[source]
Encode text prompts using Flux’s dual text encoder architecture.
Flux employs both CLIP and T5 encoders for comprehensive text understanding:
- CLIP provides pooled embeddings for global semantic conditioning
- T5 provides detailed sequence embeddings for fine-grained text control
- Parameters:
prompt (str or List[str]) – Primary prompt(s) for both encoders
prompt_2 (str or List[str], optional) – Secondary prompt(s) for T5. If None, uses primary prompt
num_images_per_prompt (int) – Number of images to generate per prompt
prompt_embeds (torch.FloatTensor, optional) – Pre-computed T5 embeddings
pooled_prompt_embeds (torch.FloatTensor, optional) – Pre-computed CLIP pooled embeddings
max_sequence_length (int) – Maximum sequence length for T5 tokenization
- Returns:
- (prompt_embeds, pooled_prompt_embeds, text_ids, encoder_perf_times)
prompt_embeds (torch.Tensor): T5 sequence embeddings [batch*num_images, seq_len, 4096]
pooled_prompt_embeds (torch.Tensor): CLIP pooled embeddings [batch*num_images, 768]
text_ids (torch.Tensor): Position IDs for text tokens [seq_len, 3]
encoder_perf_times (List[float]): Performance times [CLIP_time, T5_time]
- Return type:
tuple
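The documented return shapes can be turned into a small checking helper. The sketch below only restates the shapes listed above. It assumes that seq_len equals max_sequence_length (that is, T5 inputs are padded to the maximum length), and the function name expected_encode_prompt_shapes is hypothetical.

```python
from typing import Dict, Tuple

T5_HIDDEN_SIZE = 4096    # last dimension of prompt_embeds, per the Returns section
CLIP_POOLED_SIZE = 768   # last dimension of pooled_prompt_embeds


def expected_encode_prompt_shapes(
    batch_size: int, num_images_per_prompt: int = 1, max_sequence_length: int = 512
) -> Dict[str, Tuple[int, ...]]:
    """Shapes implied by the documentation for a given batch configuration."""
    rows = batch_size * num_images_per_prompt
    return {
        "prompt_embeds": (rows, max_sequence_length, T5_HIDDEN_SIZE),
        "pooled_prompt_embeds": (rows, CLIP_POOLED_SIZE),
        "text_ids": (max_sequence_length, 3),
    }


# Two prompts, three images each, T5 sequences limited to 256 tokens.
shapes = expected_encode_prompt_shapes(2, num_images_per_prompt=3, max_sequence_length=256)
```

Comparing these tuples with tensor.shape on the real outputs is a quick way to catch a mismatched batch size or sequence length before passing pre-computed embeddings back into the pipeline.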
- export(export_dir: str | None = None, use_onnx_subfunctions: bool = False) str[source]
Export all pipeline modules to ONNX format for deployment preparation.
This method systematically exports each pipeline component (CLIP text encoder, T5 text encoder, Flux transformer, and VAE decoder) to ONNX format. Each module is exported with its specific configuration including dynamic axes, input/output specifications, and optimization settings.
The export process prepares the models for subsequent compilation to QPC format, enabling efficient inference on QAIC hardware. ONNX subfunctions can be used for certain modules to optimize memory usage and performance.
- Parameters:
export_dir (str, optional) – Target directory for saving ONNX model files. If None, uses the default export directory structure based on model name and configuration. The directory will be created if it doesn’t exist.
use_onnx_subfunctions (bool, default=False) – Whether to enable ONNX subfunction optimization for supported modules. This can optimize the graph and improve compilation efficiency for models like the transformer.
- Returns:
- Absolute path to the export directory containing all ONNX model files.
Each module will have its own subdirectory with the exported ONNX file.
- Return type:
str
- Raises:
RuntimeError – If ONNX export fails for any module
OSError – If there are issues creating the export directory or writing files
ValueError – If module configurations are invalid
Note
All models are exported in float32 precision for maximum compatibility
Dynamic axes are configured to support variable batch sizes and sequence lengths
The export process may take several minutes depending on model size
Exported ONNX files can be large (several GB for complete pipeline)
Example
>>> pipeline = QEffFluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell")
>>> export_path = pipeline.export(
...     export_dir="/path/to/export",
...     use_onnx_subfunctions=True
... )
>>> print(f"Models exported to: {export_path}")
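Because export() returns a directory in which each module has its own subdirectory, the result can be inspected with the standard library. The sketch below assumes only that layout. The helper name list_exported_onnx and the file and directory names in the demonstration are hypothetical stand-ins for a real export.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def list_exported_onnx(export_dir: str) -> Dict[str, List[str]]:
    """Map each subdirectory (relative to export_dir) to its .onnx file names."""
    root = Path(export_dir)
    found: Dict[str, List[str]] = {}
    for onnx_path in sorted(root.rglob("*.onnx")):
        module_dir = onnx_path.parent.relative_to(root).as_posix()
        found.setdefault(module_dir, []).append(onnx_path.name)
    return found


# Demonstration on a temporary directory that mimics the described layout.
with tempfile.TemporaryDirectory() as tmp:
    for module_name in ("text_encoder", "transformer"):  # hypothetical names
        module_path = Path(tmp) / module_name
        module_path.mkdir()
        (module_path / "model.onnx").touch()
    demo_result = list_exported_onnx(tmp)
```

With a real export, the same function would be called on the path returned by pipeline.export().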
- classmethod from_pretrained(pretrained_model_name_or_path: str | PathLike | None, enable_first_block_cache: bool = False, first_block_cache_downsample_factor: int = 4, **kwargs)[source]
Load a pretrained Flux model from HuggingFace Hub or local path and wrap it with QEfficient optimizations.
This class method provides a convenient way to instantiate a QEffFluxPipeline from a pretrained Flux model. It automatically loads the base FluxPipeline model in float32 precision on CPU and wraps all components with QEfficient-optimized versions for QAIC deployment.
- Parameters:
pretrained_model_name_or_path (str or os.PathLike) – Either a HuggingFace model identifier (e.g., “black-forest-labs/FLUX.1-schnell”) or a local path to a saved model directory.
enable_first_block_cache (bool, optional) – Enables the retained-state first-block-cache path.
first_block_cache_downsample_factor (int, optional) – Downsample factor for the first-block residual cache key when cache is enabled.
**kwargs – Additional keyword arguments passed to FluxPipeline.from_pretrained().
- Returns:
- A fully initialized pipeline instance with QEfficient-optimized components
ready for export, compilation, and inference on QAIC devices.
- Return type:
- Raises:
ValueError – If the model path is invalid or model cannot be loaded
OSError – If there are issues accessing the model files
RuntimeError – If model initialization fails
Example
>>> # Load from HuggingFace Hub
>>> pipeline = QEffFluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell")
>>>
>>> # Load from local path
>>> pipeline = QEffFluxPipeline.from_pretrained("/path/to/local/flux/model")
>>>
>>> # Load with custom cache directory
>>> pipeline = QEffFluxPipeline.from_pretrained(
...     "black-forest-labs/FLUX.1-dev",
...     cache_dir="/custom/cache/dir"
... )