Neural Graphics Texture Compression

PyTorch · Neural Networks · Computer Graphics · Texture Compression

A novel approach to texture set compression that integrates traditional GPU texture representation and Neural Image Compression techniques, designed to enable random access and support many-channel texture sets.

Introduction

This groundbreaking research by Farzad Farhadzadeh et al. [1] addresses a critical challenge in computer graphics: efficiently compressing texture assets while maintaining quality and enabling random access during rendering.

"Advances in rendering have led to tremendous growth in texture assets, including resolution, complexity, and novel textures components, but this growth in data volume has not been matched by advances in its compression." - Farhadzadeh et al. [1]

Motivation

Modern renderers utilize a broad range of material properties beyond just color channels. Conventional texture compression methods like ASTC [2] can only compress textures with up to four channels and compress each mip level separately, failing to capture correlations across all channels and mip levels.

The research identifies significant redundancy within feature pyramids used in previous neural texture compression methods [3, 4]. As texture resolution increases, this redundancy becomes more pronounced, adversely affecting compression performance.

Figure 1: Visualization of grid features showing the dual-bank representation that captures different frequency information.

Method

The paper introduces an asymmetric autoencoder framework with four key components:

Global Transformer

Maps a texture set to a bottleneck latent representation, capturing spatial-channel-resolution redundancy.

Grid Constructor

Two grid constructors map the latent representation to grid pairs that store quantized features.

Grid Sampler

Samples the grids based on texture coordinates and mip level, facilitating texel reconstruction from different mip levels by sampling features with varying strides.

Texture Synthesizer

Reconstructs texels at specific positions and mip levels.
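
The paper does not include reference code, so the following is a minimal sketch of the four components (all layer choices, signatures, the sigmoid-plus-rounding quantizer, and the average-pooled mip emulation are assumptions), written just so the CompressionModel below runs end to end:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalTransformer(nn.Module):
    """Stand-in encoder: a strided conv stack mapping the texture set to a latent."""
    def __init__(self, in_channels, channels=(64, 128, 256), use_attention=False):
        super().__init__()
        layers, prev = [], in_channels
        for ch in channels:
            layers += [nn.Conv2d(prev, ch, 3, stride=2, padding=1), nn.ReLU()]
            prev = ch
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class GridConstructor(nn.Module):
    """Projects the latent to a grid pair and quantizes it (straight-through)."""
    def __init__(self, in_channels, grid_channels=(16, 16), quantization_bits=4):
        super().__init__()
        self.head0 = nn.Conv2d(in_channels, grid_channels[0], 1)
        self.head1 = nn.Conv2d(in_channels, grid_channels[1], 1)
        self.levels = 2 ** quantization_bits - 1

    def _quantize(self, g):
        g = torch.sigmoid(g)                       # squash features into [0, 1]
        q = torch.round(g * self.levels) / self.levels
        return g + (q - g).detach()                # straight-through gradient

    def forward(self, z):
        g0 = self._quantize(self.head0(z))                   # finer bank (latent resolution)
        g1 = self._quantize(self.head1(F.avg_pool2d(z, 2)))  # coarser bank (half resolution)
        return g0, g1

class GridSampler(nn.Module):
    """Bilinearly samples both grids at query coordinates; mip levels are
    emulated here by strided (average-pooled) access into the same grids."""
    def forward(self, g0, g1, coords, mip_level=0):
        if mip_level > 0:
            s = 2 ** mip_level
            g0 = F.avg_pool2d(g0, kernel_size=s, stride=s)
            g1 = F.avg_pool2d(g1, kernel_size=s, stride=s)
        grid = coords.unsqueeze(1)                 # [B, 1, N, 2], values in [-1, 1]
        f0 = F.grid_sample(g0, grid, align_corners=True).squeeze(2).transpose(1, 2)
        f1 = F.grid_sample(g1, grid, align_corners=True).squeeze(2).transpose(1, 2)
        return torch.cat([f0, f1], dim=-1)         # [B, N, C0 + C1]

class TextureSynthesizer(nn.Module):
    """Small residual MLP decoding sampled features plus positional encoding."""
    def __init__(self, g0_channels, g1_channels, out_channels, hidden_dim=32,
                 num_residual_blocks=4, positional_encoding_levels=10):
        super().__init__()
        pe_dim = 4 * positional_encoding_levels + 2    # sin/cos of x, y + identity
        self.inp = nn.Linear(g0_channels + g1_channels + pe_dim, hidden_dim)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, hidden_dim))
            for _ in range(num_residual_blocks))
        self.out = nn.Linear(hidden_dim, out_channels)

    def forward(self, feats, pos_enc):
        h = torch.relu(self.inp(torch.cat([feats, pos_enc], dim=-1)))
        for block in self.blocks:
            h = h + block(h)                       # residual connection
        return self.out(h)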

class CompressionModel(nn.Module):
    """
    Complete neural texture compression model.
    
    This integrates all components of the pipeline:
    1. Global Transformer (Encoder)
    2. Grid Constructor
    3. Grid Sampler
    4. Texture Synthesizer (Decoder)
    """
    
    def __init__(self, in_channels, encoder_channels=(64, 128, 256),
                 grid_channels=(16, 16), quantization_bits=4, hidden_dim=32,
                 num_residual_blocks=4, positional_encoding_levels=10,
                 use_attention=False):
        super().__init__()
        
        # Global Transformer (Encoder)
        self.global_transformer = GlobalTransformer(
            in_channels=in_channels,
            channels=encoder_channels,
            use_attention=use_attention
        )
        
        # Grid Constructor
        self.grid_constructor = GridConstructor(
            in_channels=encoder_channels[-1],
            grid_channels=grid_channels,
            quantization_bits=quantization_bits
        )
        
        # Grid Sampler
        self.grid_sampler = GridSampler()
        
        # Texture Synthesizer (Decoder)
        self.texture_synthesizer = TextureSynthesizer(
            g0_channels=grid_channels[0],
            g1_channels=grid_channels[1],
            out_channels=in_channels,
            hidden_dim=hidden_dim,
            num_residual_blocks=num_residual_blocks,
            positional_encoding_levels=positional_encoding_levels
        )
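
    def forward(self, textures, coords, pos_enc, mip_level=0):
        # Assumed end-to-end wiring (not from the paper's code): encode the
        # texture set, build the quantized grid pair, sample it at the query
        # coordinates, and decode texels with the synthesizer. pos_enc holds
        # per-texel positional encoding features (see PositionalEncoding below).
        z = self.global_transformer(textures)                 # bottleneck latent
        g0, g1 = self.grid_constructor(z)                     # quantized grid pair
        feats = self.grid_sampler(g0, g1, coords, mip_level)  # [B, N, C0 + C1]
        return self.texture_synthesizer(feats, pos_enc)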

Multi-Resolution Support

A key innovation in this work is the support for multi-resolution mip levels, which is essential for texture filtering in real-time rendering. The method can reconstruct textures at any mip level from the same compressed representation.
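
With the sketch components above, querying a different mip level just changes the sampling stride into the same grid pair. The shapes below are hypothetical, and GridSampler is the stand-in defined earlier:

import torch

g0 = torch.randn(1, 16, 256, 256)          # finer feature bank (G0)
g1 = torch.randn(1, 16, 128, 128)          # coarser feature bank (G1)
coords = torch.rand(1, 1024, 2) * 2 - 1    # query texels in [-1, 1]

sampler = GridSampler()
feats_mip0 = sampler(g0, g1, coords, mip_level=0)  # finest level
feats_mip2 = sampler(g0, g1, coords, mip_level=2)  # stride-4 sampling
print(feats_mip0.shape, feats_mip2.shape)          # both torch.Size([1, 1024, 32])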

Figure 2: Comparison of original (top) and reconstructed (bottom) textures at mip levels 0, 2, and 4, showing the method's ability to maintain quality across resolutions.

Experimental Results

The method achieves impressive compression results:

  • Neural compression is 240× more efficient
  • Average PSNR of 27.3 dB across all textures
  • 88% BD-rate savings vs. ASTC

Figure 3: PSNR comparison across different textures.

Figure 4: SSIM comparison across different textures.

Performance varies across different texture types, with some textures achieving PSNR values over 31 dB. Compared to conventional methods like ASTC [2], the approach achieves a BD-rate of -88.67%, i.e., roughly an 88% bitrate reduction at matched quality.

Implementation Details

The positional encoding used in the texture synthesizer is a critical component that enables high-quality reconstruction:

import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """
    Positional encoding as described in the paper.
    
    This is based on the encoding used in NeRF and similar methods,
    which maps coordinates to a higher-dimensional space using
    sinusoidal functions at different frequencies.
    """
    
    def __init__(self, num_levels=10, include_identity=True):
        super().__init__()
        self.num_levels = num_levels
        self.include_identity = include_identity
        
        # Frequency multipliers: 2^0, 2^1, 2^2, ...
        # Registered as a buffer so it follows the module across devices.
        self.register_buffer('freq_bands', 2 ** torch.arange(num_levels).float())
        
    def forward(self, x, y):
        # Ensure inputs are float tensors
        x = x.float()
        y = y.float()
        
        # Reshape to [batch_size, num_points, 1]
        x = x.unsqueeze(-1)
        y = y.unsqueeze(-1)
        
        # Scale each coordinate by every frequency band
        x_enc = x * self.freq_bands
        y_enc = y * self.freq_bands
        
        # Apply sin and cos to each frequency
        x_sin = torch.sin(x_enc)
        x_cos = torch.cos(x_enc)
        y_sin = torch.sin(y_enc)
        y_cos = torch.cos(y_enc)
        
        # Concatenate all encodings: 4 * num_levels features per point
        out = torch.cat([x_sin, x_cos, y_sin, y_cos], dim=-1)
        
        # Optionally prepend the original coordinates
        if self.include_identity:
            out = torch.cat([x, y, out], dim=-1)
        
        return out
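
As a quick sanity check (hypothetical usage), with num_levels=10 and the identity term included, each (x, y) pair maps to a 4 * 10 + 2 = 42-dimensional feature:

enc = PositionalEncoding(num_levels=10)
x = torch.rand(2, 1024)    # normalized texel x-coordinates
y = torch.rand(2, 1024)    # normalized texel y-coordinates
out = enc(x, y)
print(out.shape)           # torch.Size([2, 1024, 42])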

Analysis

The research analyzes several aspects of the compression method:

  • Interpolation in grid samplers: The grid pair captures different frequency information, with G0 focusing on high-frequency details and G1 on low-frequency features (see the sketch after this list).
  • Global transformer impact: Models without a global transformer struggle to capture high-frequency information.
  • Sampling with stride: Using a single resolution grid-pair with stride sampling outperforms multi-resolution grid-pairs.
  • Synthesizer depth: The residual blocks in the texture synthesizer are critical for performance.
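
As loose intuition for that frequency split (this two-band decomposition is an illustration, not the paper's construction), low frequencies survive coarse storage, as in G1, while the residual carries the high-frequency detail that G0 must encode:

import torch
import torch.nn.functional as F

img = torch.randn(1, 3, 256, 256)    # stand-in texture
# Low-frequency band: downsample then upsample (G1-like content)
low = F.interpolate(F.avg_pool2d(img, 4), scale_factor=4,
                    mode='bilinear', align_corners=False)
# High-frequency residual (G0-like content)
high = img - low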

Conclusion

This research introduces an effective method for texture compression in photorealistic rendering, leveraging multiple levels of redundancy:

  • Among different channels of a texture
  • Across various resolutions of the same texture
  • Across neighboring pixels within each channel

The method achieves state-of-the-art performance, significantly outperforming conventional texture compression methods and competitive neural compression methods.

References

  1. Farhadzadeh, F., et al. "Neural Graphics Texture Compression: Supporting Random Access." arXiv:2407.00021, 2024.
  2. Nystad, J., et al. "Adaptive scalable texture compression." In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on High Performance Graphics, 2012.
  3. Mentzer, F., et al. "High-Fidelity Generative Image Compression." NeurIPS, 2020.
  4. Ballé, J., et al. "Variational image compression with a scale hyperprior." ICLR, 2018.