GaINeR: Geometry-Aware Implicit Network Representation

Abstract

Implicit Neural Representations (INRs) are widely used for modeling continuous 2D images, enabling high-fidelity reconstruction, super-resolution, and compression. Architectures such as SIREN, WIRE, and FINER demonstrate their ability to capture fine image details. However, conventional INRs lack explicit geometric structure, which limits local editing and integration with physical simulation. To address these limitations, we propose GaINeR (Geometry-Aware Implicit Network Representation), a novel framework for 2D images that combines trainable Gaussian distributions with a neural-network-based INR. For a given image coordinate, the model retrieves the K nearest Gaussians, aggregates distance-weighted embeddings, and predicts the RGB value via a neural network. This design enables continuous image representation, interpretable geometric structure, and flexible local editing, providing a foundation for physically aware and interactive image manipulation. Our method supports geometry-consistent transformations, seamless super-resolution, and integration with physics-based simulations. Moreover, the Gaussian representation allows lifting a single 2D image into a geometry-aware 3D representation, enabling depth-guided editing. Experiments demonstrate that GaINeR achieves state-of-the-art reconstruction quality while maintaining flexible and physically consistent image editing.
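The per-coordinate lookup described in the abstract (retrieve the K nearest Gaussians, aggregate distance-weighted embeddings, predict RGB with a network) can be illustrated with a minimal NumPy sketch. This is not the GaINeR implementation. The abstract does not specify the weighting scheme, the network architecture, or the neighbor search, so the normalized isotropic Gaussian falloff, the two-layer ReLU network, the brute-force search, and all names (`aggregate_embedding`, `mlp_rgb`, `predict_rgb`, `sigmas`) are assumptions made for illustration only.

```python
import numpy as np

def aggregate_embedding(query, means, sigmas, embeddings, k=4):
    """Blend the embeddings of the k Gaussians nearest to `query`.

    query: (2,) image coordinate; means: (N, 2) Gaussian centers;
    sigmas: (N,) isotropic scales (assumed); embeddings: (N, D).
    """
    k = min(k, len(means))
    d2 = np.sum((means - query) ** 2, axis=1)   # squared distance to every center
    idx = np.argpartition(d2, k - 1)[:k]        # indices of the k nearest (unordered)
    # Assumed weighting: normalized Gaussian falloff, computed as a softmax
    # over negative scaled squared distances for numerical stability.
    logits = -d2[idx] / (2.0 * sigmas[idx] ** 2)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ embeddings[idx]            # aggregated feature, shape (D,)

def mlp_rgb(feature, W1, b1, W2, b2):
    """Hypothetical two-layer network mapping a feature to RGB in [0, 1]."""
    hidden = np.maximum(0.0, feature @ W1 + b1)       # ReLU layer
    return 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))  # sigmoid output

def predict_rgb(query, means, sigmas, embeddings, params, k=4):
    feature = aggregate_embedding(query, means, sigmas, embeddings, k)
    return mlp_rgb(feature, *params)

# Usage with randomly initialized (untrained) parameters.
rng = np.random.default_rng(0)
means = rng.uniform(0.0, 1.0, size=(100, 2))
sigmas = np.full(100, 0.05)
embeddings = rng.normal(size=(100, 16))
params = (0.1 * rng.normal(size=(16, 32)), np.zeros(32),
          0.1 * rng.normal(size=(32, 3)), np.zeros(3))
rgb = predict_rgb(np.array([0.5, 0.5]), means, sigmas, embeddings, params, k=4)
```

Because the query coordinate is continuous, the same function can be evaluated on a denser grid than the training image, which is one way to read the super-resolution claim. In training, the Gaussian parameters, embeddings, and network weights would all be optimized jointly, and a spatial index would likely replace the brute-force search.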

Video

Results

GaINeR enables intuitive and flexible image manipulation. Manual edits are simple and precise: adjusting the learned Gaussian field directly reshapes the image, producing smooth, artifact-free deformations that preserve structure and appearance. The same formulation supports physics-based effects: forces, collisions, or fluid-like motion modify the Gaussian geometry, yielding stable, coherent dynamics without tearing or retraining. Together, these capabilities allow GaINeR to handle both local edits and complex physics-driven transformations while maintaining high visual fidelity.
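To make the editing idea concrete, the sketch below shows two ways of moving Gaussian centers while leaving embeddings and network weights untouched, which is why no retraining would be needed. Both functions are hypothetical illustrations rather than the project's code: `drag_region` is an assumed manual edit with a smoothstep falloff, and `physics_step` is a generic semi-implicit Euler step with a floor collision, standing in for whatever simulator is actually used.

```python
import numpy as np

def drag_region(means, center, radius, offset):
    """Manual edit: shift Gaussian centers near `center` by `offset`.

    Centers at `center` move by the full offset; the displacement fades
    smoothly to zero at distance `radius`, so the edit stays local.
    """
    means = np.asarray(means, dtype=float)
    dist = np.linalg.norm(means - np.asarray(center, dtype=float), axis=1)
    t = np.clip(1.0 - dist / radius, 0.0, 1.0)
    weight = t * t * (3.0 - 2.0 * t)            # smoothstep falloff in [0, 1]
    return means + weight[:, None] * np.asarray(offset, dtype=float)

def physics_step(means, velocities, dt, gravity=(0.0, 9.8),
                 floor_y=1.0, restitution=0.5):
    """One semi-implicit Euler step for Gaussian centers (y points down).

    Centers that pass below `floor_y` are clamped to the floor and their
    vertical velocity is reflected and damped by `restitution`.
    """
    vel = np.asarray(velocities, dtype=float) + dt * np.asarray(gravity, dtype=float)
    pos = np.asarray(means, dtype=float) + dt * vel
    hit = pos[:, 1] > floor_y
    pos[hit, 1] = floor_y
    vel[hit, 1] *= -restitution
    return pos, vel

# Usage: drag a local region, then advance a few simulation steps.
rng = np.random.default_rng(0)
centers = rng.uniform(0.0, 1.0, size=(50, 2))
edited = drag_region(centers, center=(0.5, 0.5), radius=0.2, offset=(0.1, 0.0))
vel = np.zeros_like(edited)
for _ in range(10):
    edited, vel = physics_step(edited, vel, dt=0.01)
```

After either operation, the image would be re-rendered by querying the representation at each pixel coordinate against the updated centers.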