Neural Graphics Texture Compression

PyTorch · Neural Networks · Computer Graphics · Texture Compression

A novel approach to texture set compression that integrates traditional GPU texture representation and Neural Image Compression techniques, designed to enable random access and support many-channel texture sets.

Introduction

This groundbreaking research by Farzad Farhadzadeh et al. [1] addresses a critical challenge in computer graphics: efficiently compressing texture assets while maintaining quality and enabling random access during rendering.

"Advances in rendering have led to tremendous growth in texture assets, including resolution, complexity, and novel textures components, but this growth in data volume has not been matched by advances in its compression." - Farhadzadeh et al. [1]

Motivation

Modern renderers utilize a broad range of material properties beyond just color channels. Conventional texture compression methods like ASTC [2] can only compress textures with up to four channels and compress each mip level separately, failing to capture correlations across all channels and mip levels.

The research identifies significant redundancy within feature pyramids used in previous neural texture compression methods [3, 4]. As texture resolution increases, this redundancy becomes more pronounced, adversely affecting compression performance.

Figure 1: Visualization of grid features showing the dual-bank representation that captures different frequency information.

Method

The paper introduces an asymmetric autoencoder framework with four key components:

Global Transformer

Maps a texture set to a bottleneck latent representation, capturing spatial-channel-resolution redundancy.

Grid Constructor

Two grid constructors map the latent representation to grid pairs that store quantized features.

Grid Sampler

Samples the grids based on texture coordinates and mip level, facilitating texel reconstruction from different mip levels by sampling features with varying strides.

Texture Synthesizer

Reconstructs texels at specific positions and mip levels.
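
The paper does not include reference code, so the following is a minimal sketch of the four components (all layer choices, signatures, the sigmoid-plus-rounding quantizer, and the average-pooled mip emulation are assumptions), written just so the CompressionModel below runs end to end:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalTransformer(nn.Module):
    """Stand-in encoder: a strided conv stack mapping the texture set to a latent."""
    def __init__(self, in_channels, channels=(64, 128, 256), use_attention=False):
        super().__init__()
        layers, prev = [], in_channels
        for ch in channels:
            layers += [nn.Conv2d(prev, ch, 3, stride=2, padding=1), nn.ReLU()]
            prev = ch
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class GridConstructor(nn.Module):
    """Projects the latent to a grid pair and quantizes it (straight-through)."""
    def __init__(self, in_channels, grid_channels=(16, 16), quantization_bits=4):
        super().__init__()
        self.head0 = nn.Conv2d(in_channels, grid_channels[0], 1)
        self.head1 = nn.Conv2d(in_channels, grid_channels[1], 1)
        self.levels = 2 ** quantization_bits - 1

    def _quantize(self, g):
        g = torch.sigmoid(g)                       # squash features into [0, 1]
        q = torch.round(g * self.levels) / self.levels
        return g + (q - g).detach()                # straight-through gradient

    def forward(self, z):
        g0 = self._quantize(self.head0(z))                   # finer bank (latent resolution)
        g1 = self._quantize(self.head1(F.avg_pool2d(z, 2)))  # coarser bank (half resolution)
        return g0, g1

class GridSampler(nn.Module):
    """Bilinearly samples both grids at query coordinates; mip levels are
    emulated here by strided (average-pooled) access into the same grids."""
    def forward(self, g0, g1, coords, mip_level=0):
        if mip_level > 0:
            s = 2 ** mip_level
            g0 = F.avg_pool2d(g0, kernel_size=s, stride=s)
            g1 = F.avg_pool2d(g1, kernel_size=s, stride=s)
        grid = coords.unsqueeze(1)                 # [B, 1, N, 2], values in [-1, 1]
        f0 = F.grid_sample(g0, grid, align_corners=True).squeeze(2).transpose(1, 2)
        f1 = F.grid_sample(g1, grid, align_corners=True).squeeze(2).transpose(1, 2)
        return torch.cat([f0, f1], dim=-1)         # [B, N, C0 + C1]

class TextureSynthesizer(nn.Module):
    """Small residual MLP decoding sampled features plus positional encoding."""
    def __init__(self, g0_channels, g1_channels, out_channels, hidden_dim=32,
                 num_residual_blocks=4, positional_encoding_levels=10):
        super().__init__()
        pe_dim = 4 * positional_encoding_levels + 2    # sin/cos of x, y + identity
        self.inp = nn.Linear(g0_channels + g1_channels + pe_dim, hidden_dim)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, hidden_dim))
            for _ in range(num_residual_blocks))
        self.out = nn.Linear(hidden_dim, out_channels)

    def forward(self, feats, pos_enc):
        h = torch.relu(self.inp(torch.cat([feats, pos_enc], dim=-1)))
        for block in self.blocks:
            h = h + block(h)                       # residual connection
        return self.out(h)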

class CompressionModel(nn.Module):
    """
    Complete neural texture compression model.
    
    This integrates all components of the pipeline:
    1. Global Transformer (Encoder)
    2. Grid Constructor
    3. Grid Sampler
    4. Texture Synthesizer (Decoder)
    """
    
    def __init__(self, in_channels, encoder_channels=(64, 128, 256),
                 grid_channels=(16, 16), quantization_bits=4, hidden_dim=32,
                 num_residual_blocks=4, positional_encoding_levels=10,
                 use_attention=False):
        super().__init__()
        
        # Global Transformer (Encoder)
        self.global_transformer = GlobalTransformer(
            in_channels=in_channels,
            channels=encoder_channels,
            use_attention=use_attention
        )
        
        # Grid Constructor
        self.grid_constructor = GridConstructor(
            in_channels=encoder_channels[-1],
            grid_channels=grid_channels,
            quantization_bits=quantization_bits
        )
        
        # Grid Sampler
        self.grid_sampler = GridSampler()
        
        # Texture Synthesizer (Decoder)
        self.texture_synthesizer = TextureSynthesizer(
            g0_channels=grid_channels[0],
            g1_channels=grid_channels[1],
            out_channels=in_channels,
            hidden_dim=hidden_dim,
            num_residual_blocks=num_residual_blocks,
            positional_encoding_levels=positional_encoding_levels
        )
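
    def forward(self, textures, coords, pos_enc, mip_level=0):
        # Assumed end-to-end wiring (not from the paper's code): encode the
        # texture set, build the quantized grid pair, sample it at the query
        # coordinates, and decode texels with the synthesizer. pos_enc holds
        # per-texel positional encoding features (see PositionalEncoding below).
        z = self.global_transformer(textures)                 # bottleneck latent
        g0, g1 = self.grid_constructor(z)                     # quantized grid pair
        feats = self.grid_sampler(g0, g1, coords, mip_level)  # [B, N, C0 + C1]
        return self.texture_synthesizer(feats, pos_enc)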

Multi-Resolution Support

A key innovation in this work is the support for multi-resolution mip levels, which is essential for texture filtering in real-time rendering. The method can reconstruct textures at any mip level from the same compressed representation.
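
With the sketch components above, querying a different mip level just changes the sampling stride into the same grid pair. The shapes below are hypothetical, and GridSampler is the stand-in defined earlier:

import torch

g0 = torch.randn(1, 16, 256, 256)          # finer feature bank (G0)
g1 = torch.randn(1, 16, 128, 128)          # coarser feature bank (G1)
coords = torch.rand(1, 1024, 2) * 2 - 1    # query texels in [-1, 1]

sampler = GridSampler()
feats_mip0 = sampler(g0, g1, coords, mip_level=0)  # finest level
feats_mip2 = sampler(g0, g1, coords, mip_level=2)  # stride-4 sampling
print(feats_mip0.shape, feats_mip2.shape)          # both torch.Size([1, 1024, 32])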

Figure 2: Comparison of original (top) and reconstructed (bottom) textures at mip levels 0, 2, and 4, showing the method's ability to maintain quality across resolutions.

Experimental Results

The method achieves impressive compression results:

  • Neural compression is 240× more efficient
  • Average PSNR of 27.3 dB across all textures
  • 88% BD-rate savings vs. ASTC

Figure 3: PSNR comparison across different textures.

Figure 4: SSIM comparison across different textures.

Performance varies across different texture types, with some textures achieving PSNR values over 31 dB. Compared to conventional methods like ASTC [2], the approach achieves a BD-rate of -88.67%, i.e., roughly an 88% bitrate reduction at matched quality.

Implementation Details

The positional encoding used in the texture synthesizer is a critical component that enables high-quality reconstruction:

import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """
    Positional encoding as described in the paper.
    
    This is based on the encoding used in NeRF and similar methods,
    which maps coordinates to a higher-dimensional space using
    sinusoidal functions at different frequencies.
    """
    
    def __init__(self, num_levels=10, include_identity=True):
        super().__init__()
        self.num_levels = num_levels
        self.include_identity = include_identity
        
        # Frequency multipliers: 2^0, 2^1, 2^2, ...
        # Registered as a buffer so it follows the module across devices.
        self.register_buffer('freq_bands', 2 ** torch.arange(num_levels).float())
        
    def forward(self, x, y):
        # Ensure inputs are float tensors
        x = x.float()
        y = y.float()
        
        # Reshape to [batch_size, num_points, 1]
        x = x.unsqueeze(-1)
        y = y.unsqueeze(-1)
        
        # Scale each coordinate by every frequency band
        x_enc = x * self.freq_bands
        y_enc = y * self.freq_bands
        
        # Apply sin and cos to each frequency
        x_sin = torch.sin(x_enc)
        x_cos = torch.cos(x_enc)
        y_sin = torch.sin(y_enc)
        y_cos = torch.cos(y_enc)
        
        # Concatenate all encodings: 4 * num_levels features per point
        out = torch.cat([x_sin, x_cos, y_sin, y_cos], dim=-1)
        
        # Optionally prepend the original coordinates
        if self.include_identity:
            out = torch.cat([x, y, out], dim=-1)
        
        return out
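
As a quick sanity check (hypothetical usage), with num_levels=10 and the identity term included, each (x, y) pair maps to a 4 * 10 + 2 = 42-dimensional feature:

enc = PositionalEncoding(num_levels=10)
x = torch.rand(2, 1024)    # normalized texel x-coordinates
y = torch.rand(2, 1024)    # normalized texel y-coordinates
out = enc(x, y)
print(out.shape)           # torch.Size([2, 1024, 42])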

Analysis

The research analyzes several aspects of the compression method:

  • Interpolation in grid samplers: The grid pair captures different frequency information, with G0 focusing on high-frequency details and G1 on low-frequency features (see the sketch after this list).
  • Global transformer impact: Models without a global transformer struggle to capture high-frequency information.
  • Sampling with stride: Using a single resolution grid-pair with stride sampling outperforms multi-resolution grid-pairs.
  • Synthesizer depth: The residual blocks in the texture synthesizer are critical for performance.
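
As loose intuition for that frequency split (this two-band decomposition is an illustration, not the paper's construction), low frequencies survive coarse storage, as in G1, while the residual carries the high-frequency detail that G0 must encode:

import torch
import torch.nn.functional as F

img = torch.randn(1, 3, 256, 256)    # stand-in texture
# Low-frequency band: downsample then upsample (G1-like content)
low = F.interpolate(F.avg_pool2d(img, 4), scale_factor=4,
                    mode='bilinear', align_corners=False)
# High-frequency residual (G0-like content)
high = img - low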

Conclusion

This research introduces an effective method for texture compression in photorealistic rendering, leveraging multiple levels of redundancy:

  • Among different channels of a texture
  • Across various resolutions of the same texture
  • Across neighboring pixels within each channel

The method achieves state-of-the-art performance, significantly outperforming conventional texture compression methods and competitive neural compression methods.

References

  1. Farhadzadeh, F., et al. "Neural Graphics Texture Compression: Supporting Random Access." arXiv:2407.00021, 2024.
  2. Nystad, J., et al. "Adaptive scalable texture compression." In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on High Performance Graphics, 2012.
  3. Mentzer, F., et al. "High-Fidelity Generative Image Compression." NeurIPS, 2020.
  4. Ballé, J., et al. "Variational image compression with a scale hyperprior." ICLR, 2018.