Chenyang Zhu

朱晨阳

My name is Chenyang Zhu. I am currently an Associate Professor at the School of Computer Science, National University of Defense Technology (NUDT). I am a faculty member of the iGrape Lab @ NUDT, which conducts research in computer graphics and computer vision. My current research interests include data-driven shape analysis and modeling, 3D vision, and robot perception & navigation.

I was a Ph.D. student in the GrUVi Lab, School of Computing Science at Simon Fraser University, under the supervision of Prof. Hao (Richard) Zhang. I earned my Bachelor's and Master's degrees in computer science from the National University of Defense Technology (NUDT) in June 2011 and December 2013, respectively.

News

  • One paper was conditionally accepted by PG 2024 (Journal track)!
  • One paper was conditionally accepted by SIGGRAPH Asia 2024!
  • One paper was accepted by ACM MM 2024!

Research

  • Shape analysis
  • 3D Vision
  • Robotic applications

Grants
  • Graduate School Funding, National University of Defense Technology (university-level research project). 2019-2022.
  • National Natural Science Foundation of China, Young Scientists Fund. 2020-2023.
  • Young Elite Scientists Sponsorship Program by the China Association for Science and Technology (CAST). 2020-2023.
  • Hunan Provincial Science and Technology Department Funding (Huxiang Young Talents Program). 2021-2024.
  • National Natural Science Foundation of China, General Program. 2024-2027.

Publications

2024

PG 2024

DSGI-Net: Density-based Selective Grouping Point Cloud Learning Network for Indoor Scene

Xin Wen, Yao Duan, Kai Xu and Chenyang Zhu

Indoor scene point clouds exhibit diverse distributions and varying sparsity, characterized by more complex geometry and occlusion than outdoor scenes or individual objects. Although recent advances in 3D point cloud analysis have introduced various network architectures, there remains a lack of frameworks tailored to the unique attributes of indoor scenarios. To address this, we propose DSGI-Net, a novel indoor scene point cloud learning network that can be embedded into other models. The key innovations of this work are adaptively sampling more informative neighbor points in sparse regions and promoting semantic consistency in local areas where different instances are in proximity but belong to distinct categories. Furthermore, our method encodes the semantic and spatial relationships between points within local regions to mitigate the loss of local geometric detail. Extensive experiments on the ScanNetv2, SUN RGB-D, and S3DIS indoor scene benchmarks demonstrate that our method is both concise and efficient.
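
The density-adaptive grouping idea can be illustrated outside any network. Below is a minimal numpy/scipy sketch, not the paper's implementation; the function name, thresholds, and radius rule are hypothetical stand-ins for the learned behavior. It simply enlarges a point's grouping radius where its k-nearest-neighbor distances indicate sparsity:

    import numpy as np
    from scipy.spatial import cKDTree

    def density_adaptive_group(points, k=16, base_radius=0.1, max_radius=0.4):
        """Group neighbors with a radius that grows in sparse regions."""
        tree = cKDTree(points)
        # Local density proxy: distance to the k-th nearest neighbor
        # (k + 1 because the query point itself is returned first).
        knn_dist, _ = tree.query(points, k=k + 1)
        dk = knn_dist[:, -1]
        # Sparse regions (large dk) get a proportionally larger radius.
        radii = np.clip(base_radius * dk / np.median(dk), base_radius, max_radius)
        return [tree.query_ball_point(p, r) for p, r in zip(points, radii)]

    # Toy usage: a dense cluster next to a sparse one.
    rng = np.random.default_rng(0)
    pts = np.concatenate([rng.random((500, 3)) * 0.5,
                          rng.random((50, 3)) * 0.5 + 1.0])
    groups = density_adaptive_group(pts)
    print(len(groups[0]), len(groups[-1]))  # sparse points reach farther for neighbors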

SIGGRAPH Asia 2024

LLM-enhanced Scene Graph Learning for Household Rearrangement

Wenhao Li, Zhiyuan Yu, Qijin She, Zhinan Yu, Yuqing Lan, Chenyang Zhu, Ruizhen Hu and Kai Xu

The household rearrangement task involves spotting misplaced objects in a scene and placing them in proper locations. It depends on common-sense knowledge on the objective side and on human user preference on the subjective side. To achieve this task, we propose to mine object functionality with user preference alignment directly from the scene itself, without relying on human intervention. To do so, we work with a scene graph representation and propose LLM-enhanced scene graph learning, which transforms the input scene graph into an affordance-enhanced graph (AEG) with information-enhanced nodes and newly discovered edges (relations). In an AEG, the nodes corresponding to receptacle objects are augmented with context-induced affordance, which encodes what kind of carriable objects can be placed on them; the new edges capture newly discovered non-local relations. With the AEG, we perform task planning for scene rearrangement by detecting misplaced carriables and determining a proper placement for each of them. We test our method by implementing a tidying robot in a simulator and evaluate it on a new benchmark we built. Extensive evaluations demonstrate that our method achieves state-of-the-art performance on misplacement detection and the subsequent rearrangement planning.

ACM Multimedia 2024

CSO: Constraint-guided Space Optimization for Active Scene Mapping

Xuefeng Yin, Chenyang Zhu, Shanglai Qu, Yuqi Li, Kai Xu, Baocai Yin and Xin Yang

Simultaneously mapping and exploring a complex unknown scene is an NP-hard problem, which remains challenging despite the rapid development of deep learning techniques. We present CSO, a deep reinforcement learning-based framework for efficient active scene mapping. Constraint-guided space optimization is adopted for both the state and critic spaces to reduce the difficulty of finding a globally optimal exploration path and to avoid long-distance round trips while exploring. We first feed the frontier-based entropy as an input constraint along with the raw observation into the network, which guides training to start by imitating local greedy search. However, entropy-based optimization can easily get stuck in local optima or cause inefficient round trips, since the entropy space and the real world do not share the same metric. Inspired by constrained reinforcement learning, we then introduce an action-mask-based optimization constraint to align the metrics of these two spaces. Exploration optimization in the aligned spaces avoids long-distance round trips more effectively.

IEEE Transactions on Visualization and Computer Graphics

SuperUDF: Self-Supervised UDF Estimation for Surface Reconstruction

Hui Tian, Chenyang Zhu, Yifei Shi and Kai Xu

Learning-based surface reconstruction with unsigned distance functions (UDF) has many advantages, such as handling open surfaces. We propose SuperUDF, a self-supervised UDF learning method that exploits a learned geometry prior for efficient training and a novel regularization for robustness to sparse sampling. The core idea of SuperUDF draws inspiration from the classical surface approximation operator of locally optimal projection (LOP). The key insight is that if the UDF is estimated correctly, the 3D points should be locally projected onto the underlying surface following the gradient of the UDF. Based on that, a number of inductive biases on UDF geometry and a pre-learned geometry prior are devised to learn UDF estimation efficiently. A novel regularization loss is proposed to make SuperUDF robust to sparse sampling. Furthermore, we also contribute a learning-based mesh extraction from the estimated UDFs. Extensive evaluations demonstrate that SuperUDF outperforms the state of the art on several public datasets in terms of both quality and efficiency.
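
The key insight has a compact numerical illustration. The sketch below is a toy with an analytic sphere UDF and finite-difference gradients, not the paper's learned estimator; it projects query points onto the surface by stepping along the negative UDF gradient, scaled by the UDF value:

    import numpy as np

    def sphere_udf(p, radius=1.0):
        """Unsigned distance from points p (N, 3) to an origin-centered sphere."""
        return np.abs(np.linalg.norm(p, axis=-1) - radius)

    def udf_gradient(udf, p, eps=1e-4):
        """Central finite-difference gradient of the UDF at points p."""
        g = np.zeros_like(p)
        for i in range(3):
            d = np.zeros(3)
            d[i] = eps
            g[:, i] = (udf(p + d) - udf(p - d)) / (2 * eps)
        return g

    def project_to_surface(udf, p, steps=3):
        """Step each point along -grad(UDF) by the UDF value; with a correct
        UDF this lands the point on the zero level set, as described above."""
        for _ in range(steps):
            g = udf_gradient(udf, p)
            g /= np.linalg.norm(g, axis=-1, keepdims=True) + 1e-12
            p = p - udf(p)[:, None] * g
        return p

    rng = np.random.default_rng(0)
    pts = rng.normal(size=(1000, 3)) * 1.5
    proj = project_to_surface(sphere_udf, pts)
    print(np.abs(np.linalg.norm(proj, axis=-1) - 1.0).max())  # ~0: on the sphere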

Computational Visual Media

THP: Tensor-Field-Driven Hierarchical Path Planning for Autonomous Scene Exploration with Depth Sensors

Yuefeng Xi, Chenyang Zhu, Yao Duan, Renjiao Yi, Lintao Zheng, Hongjun He and Kai Xu

It is challenging to automatically explore an unknown 3D environment with a robot only equipped with depth sensors due to the limited field of view. We introduce THP, a tensor field-based framework for efficient environment exploration which can better utilize the encoded depth information through the geometric characteristics of tensor fields. Specifically, a corresponding tensor field is constructed incrementally and guides the robot to formulate optimal global exploration paths and a collision-free local movement strategy...

Computational Visual Media

Learning Accurate Template Matching with Differentiable Coarse-to-fine Correspondence Refinement

Zhirui Gao, Renjiao Yi, Zheng Qin, Yunfan Ye, Chenyang Zhu and Kai Xu

Template matching is a fundamental task in computer vision and has been studied for decades. It plays an essential role in the manufacturing industry for estimating the poses of different parts, facilitating downstream tasks such as robotic grasping. Existing works fail when the template and source images have different modalities, cluttered backgrounds, or weak textures. They also rarely consider geometric transformations via homographies, which commonly exist even for planar industrial parts. To tackle these challenges, we propose an accurate template matching method based on differentiable coarse-to-fine correspondence refinement...

2023

ICCV 2023

2D3D-MATR: 2D-3D Matching Transformer for Detection-free Registration between Images and Point Clouds

Minhao Li, Zheng Qin, Zhirui Gao, Renjiao Yi, Chenyang Zhu, Yulan Guo and Kai Xu

The commonly adopted detect-then-match approach to registration encounters difficulties in cross-modality cases due to incompatible keypoint detection and inconsistent feature description. We propose 2D3D-MATR, a detection-free method for accurate and robust registration between images and point clouds. Our method adopts a coarse-to-fine pipeline: it first computes coarse correspondences between downsampled patches of the input image and the point cloud, and then extends them to form dense correspondences between pixels and points within the patch region...

IEEE Transactions on Multimedia

Tensorformer: Normalized Matrix Attention Transformer for High-quality Point Cloud Reconstruction

Hui Tian, Zheng Qin, Renjiao Yi, Chenyang Zhu and Kai Xu

Surface reconstruction from raw point clouds has been studied for decades in the computer graphics community and is in high demand in today's modeling and rendering applications. Classic solutions, such as Poisson surface reconstruction, require point normals as extra input to produce reasonable results. Modern transformer-based methods can work without normals, but their results are less fine-grained due to limited encoding performance in local fusion from discrete points. We introduce a novel normalized matrix attention transformer (Tensorformer) to perform high-quality reconstruction...

Computational Visual Media

EFECL: Feature Encoding Enhancement with Contrastive Learning for Indoor 3D Object Detection

Yao Duan, Renjiao Yi, Yuanming Gao, Kai Xu and Chenyang Zhu

Good proposal initials are critical for 3D object detection applications. However, due to the significant geometry variation of indoor scenes, incomplete and noisy proposals are inevitable in most cases. Mining feature information among these "bad" proposals may mislead the detection. Contrastive learning provides a feasible way to represent proposals, as it can align complete and incomplete/noisy proposals in feature space. The aligned feature space helps build a robust 3D representation even when bad proposals are given. Therefore, we devise a new contrastive learning framework for indoor 3D object detection, called EFECL, that learns robust 3D representations by contrastive learning of proposals on two different levels...

CVPR 2023

Self-supervised Non-Lambertian Single-view Image Relighting

Renjiao Yi, Chenyang Zhu (co-first author) and Kai Xu

We present a learning-based approach to relighting a single image of non-Lambertian objects. Our method enables inserting objects from photographs into new scenes and relighting them under the new environment lighting, which is essential for AR applications. To relight the object, we solve both inverse rendering and re-rendering. To resolve the ill-posed inverse rendering, we propose a self-supervised method with a low-rank constraint. To facilitate the self-supervised training, we contribute Relit, a large-scale (750K images) dataset of videos with aligned objects under changing illuminations. For re-rendering, we propose a differentiable specular rendering layer to render non-Lambertian materials under various illuminations represented by spherical harmonics. The whole pipeline is end-to-end and efficient, allowing for a mobile-app implementation of AR object insertion. Extensive evaluations demonstrate that our method achieves state-of-the-art performance.
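
For context, the standard diffuse counterpart of such a spherical-harmonics rendering layer fits in a few lines. This sketch implements the classic Ramamoorthi-Hanrahan order-2 irradiance formula, not the paper's specular layer; the 9-coefficient lighting vector L is an assumed input:

    import numpy as np

    # Cosine-lobe coefficients A_l for bands l = 0, 1, 2 (one per SH basis term).
    A = np.array([np.pi,
                  2 * np.pi / 3, 2 * np.pi / 3, 2 * np.pi / 3,
                  np.pi / 4, np.pi / 4, np.pi / 4, np.pi / 4, np.pi / 4])

    def sh_basis(n):
        """Evaluate the 9 real SH basis functions at unit normals n, shape (N, 3)."""
        x, y, z = n[:, 0], n[:, 1], n[:, 2]
        return np.stack([
            0.282095 * np.ones_like(x),
            0.488603 * y, 0.488603 * z, 0.488603 * x,
            1.092548 * x * y, 1.092548 * y * z,
            0.315392 * (3 * z**2 - 1),
            1.092548 * x * z, 0.546274 * (x**2 - y**2),
        ], axis=1)

    def diffuse_shading(normals, L):
        """Irradiance per normal: E(n) = sum_lm A_l * L_lm * Y_lm(n)."""
        return sh_basis(normals) @ (A * L)

    # Toy light: an ambient term plus a lobe from +z.
    L = np.zeros(9); L[0], L[2] = 1.0, 0.8
    n = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
    print(diffuse_shading(n, L))  # the up-facing normal receives more light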

CVPR 2023

NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction from Multi-view Images

Yunfan Ye, Renjiao Yi, Zhirui Gao, Chenyang Zhu, Zhiping Cai and Kai Xu

We study the problem of reconstructing 3D feature curves of an object from a set of calibrated multi-view images. To do so, we learn a neural implicit field representing the density distribution of 3D edges, which we refer to as a Neural Edge Field (NEF). Inspired by NeRF, the NEF is optimized with a view-based rendering loss, where a 2D edge map is rendered at a given view and compared to the ground-truth edge map extracted from the image of that view. The rendering-based differentiable optimization of the NEF fully exploits 2D edge detection, without needing supervision of 3D edges, a 3D geometric operator, or cross-view edge correspondence. Several technical designs are devised to ensure learning a range-limited and view-independent NEF for robust edge extraction. The final parametric 3D curves are extracted from the NEF with an iterative optimization method. On our benchmark with synthetic data, we demonstrate that NEF outperforms existing state-of-the-art methods on all metrics.
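
The view-based rendering loss rests on the standard NeRF compositing equation, sketched below for a single ray. This is a generic illustration, not the NEF codebase; sigma holds the sampled edge densities and deltas the sample spacings:

    import numpy as np

    def render_edge_density(sigma, deltas):
        """NeRF-style compositing of per-sample densities sigma_i along one ray.

        alpha_i = 1 - exp(-sigma_i * delta_i); T_i = prod_{j<i} (1 - alpha_j).
        The composited value sum_i T_i * alpha_i plays the role of the 2D edge
        response that is compared against the ground-truth edge map.
        """
        alpha = 1.0 - np.exp(-sigma * deltas)
        # Transmittance: probability the ray reaches sample i without hitting an edge.
        T = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
        weights = T * alpha
        return weights.sum()  # in [0, 1]: edge response rendered at this pixel

    # A ray whose samples cross a thin, dense "edge" region renders close to 1.
    sigma = np.zeros(64); sigma[30:33] = 60.0
    print(render_edge_density(sigma, deltas=np.full(64, 0.02)))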

AAAI 2023 (Oral)

Multi-resolution Monocular Depth Map Fusion by Self-supervised Gradient-based Composition

Yaqiao Dai, Renjiao Yi, Chenyang Zhu, Hongjun He and Kai Xu

Monocular depth estimation is a challenging problem on which deep neural networks have demonstrated great potential. However, depth maps predicted by existing deep models usually lack fine-grained details due to the convolution operations and down-sampling in networks. We find that increasing the input resolution helps preserve more local details, while estimation at low resolution is more accurate globally. Therefore, we propose a novel depth map fusion module to combine the advantages of estimations with multi-resolution inputs. Instead of merging the low- and high-resolution estimations equally, we adopt the core idea of Poisson fusion, trying to implant the gradient domain of the high-resolution depth into the low-resolution depth. While classic Poisson fusion requires a fusion mask as supervision, we propose a self-supervised framework based on guided image filtering. We demonstrate that this gradient-based composition performs much better in terms of noise immunity, compared with the state-of-the-art depth map fusion method.
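
The gradient-implanting idea can be approximated with a simple frequency split. The sketch below is a numpy/scipy stand-in, not the paper's guided-filter self-supervised framework; it keeps the low-frequency structure of the low-resolution estimate and implants the high-frequency residual of the high-resolution one, with sigma controlling the split:

    import numpy as np
    from scipy.ndimage import gaussian_filter, zoom

    def fuse_depth(low_res, high_res, sigma=8.0):
        """Fuse a globally accurate low-res depth with a detailed high-res one."""
        # Upsample the low-res estimate to the high-res grid.
        up = zoom(low_res, np.array(high_res.shape) / np.array(low_res.shape), order=1)
        base = gaussian_filter(up, sigma)                     # global structure
        detail = high_res - gaussian_filter(high_res, sigma)  # fine-grained detail
        return base + detail

    rng = np.random.default_rng(0)
    low = rng.random((60, 80))
    high = zoom(low, 4, order=1) + 0.01 * rng.random((240, 320))
    fused = fuse_depth(low, high)
    print(fused.shape)  # (240, 320)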

2022

Computational Visual Media

6DOF Pose Estimation of a 3D Rigid Object based on Edge-enhanced Point Pair Features

Chenyi Liu, Fei Chen, Lu Deng, Renjiao Yi, Lintao Zheng, Chenyang Zhu and Kai Xu

The point pair feature (PPF) is widely used for 6D pose estimation. In this paper, we propose an efficient 6D pose estimation method based on the PPF framework. We introduce a well-targeted down-sampling strategy that focuses more on edge areas for efficient feature extraction from complex geometry. A pose hypothesis validation approach is proposed to resolve symmetric ambiguity by calculating the edge matching degree. We perform evaluations on two challenging datasets and one real-world collected dataset, demonstrating the superiority of our method for pose estimation of geometrically complex, occluded, symmetric objects. We further validate our method by applying it to simulated punctures.

CVPR 2022

DisARM: Displacement Aware Relation Module for 3D Detection

Yao Duan, Chenyang Zhu, Yuqing Lan, Renjiao Yi, Xinwang Liu and Kai Xu

The core idea of DisARM is that contextual information is critical to tell the difference between different objects when the instance geometry is incomplete or featureless. We find that relations between proposals provide a good representation to describe the context. Rather than working with all relations, we find that training with relations only between the most representative ones, or anchors, can significantly boost the detection performance.

Computational Visual Media

ARM3D: Attention-based relation module for indoor 3D object detection

Yuqing Lan, Yao Duan, Chenyi Liu, Chenyang Zhu, Yueshan Xiong, Hui Huang and Kai Xu

Relation contexts have proven to be useful for many challenging vision tasks. In the field of 3D object detection, previous methods have taken advantage of context encoding, graph embedding, or explicit relation reasoning to extract relation contexts. However, there inevitably exist redundant relation contexts due to noisy or low-quality proposals. In fact, invalid relation contexts usually indicate underlying scene misunderstanding and ambiguity, which may, on the contrary, reduce performance in complex scenes...

Science China Information Sciences

Learning Practically Feasible Policies for Online 3D Bin Packing

Hang Zhao, Chenyang Zhu (co-first author), Xin Xu, Hui Huang and Kai Xu

This is a follow-up to our AAAI 2021 work on online 3D-BPP. In this work, we aim to learn more practically feasible policies, validated with real robot testing! To that end, we propose three critical designs: (1) an online analysis of packing stability based on a novel stacking tree, which is highly accurate and computationally efficient and hence especially suited for RL training; (2) a decoupled packing policy learning for different placement dimensions, enabling high-resolution spatial discretization and hence high packing precision; and (3) a reward function dictating that the robot place items in a far-to-near order, thereby simplifying motion planning for the robotic arm.

2021

SIGGRAPH 2021, ACM Transactions on Graphics

ROSEFusion: Random Optimization for Online Dense Reconstruction under Fast Camera Motion

Jiazhao Zhang, Chenyang Zhu, Lintao Zheng and Kai Xu

Online dense reconstruction under fast camera motion is difficult because severe motion blur and large inter-frame displacement break the feature correspondence and photometric consistency that conventional camera tracking relies on. ROSEFusion tackles camera tracking with randomized optimization: each frame's pose is estimated by minimizing a depth-to-TSDF fitting objective through particle-filter-based random search guided by pre-sampled particle swarm templates, using depth information only. The method runs in real time and achieves robust online reconstruction under fast camera motion without relying on color or inertial data.

MMM 2021

Fine-Grained Video Deblurring with Event Camera

Limeng Zhang, Hongguang Zhang, Chenyang Zhu, Shasha Guo, Jihua Chen, Lei Wang

Although CNN-based deblurring models have shown their superiority in handling motion blur, restoring photorealistic images from severe motion blur remains an ill-posed problem due to the loss of temporal information and textures. In this paper, we propose a deep fine-grained video deblurring pipeline consisting of a deblurring module and a recurrent module to address severe motion blur. By concatenating the blurry image with event representations at a fine-grained temporal period, our proposed model achieves state-of-the-art performance on both the popular GoPro dataset and real blurry datasets captured by DAVIS, and it is capable of generating high frame-rate video by applying a tiny shift to the event representations in the recurrent module.

AAAI 2021

Online 3D Bin Packing with Constrained Deep Reinforcement Learning

Hang Zhao, Qijin She, Chenyang Zhu, Yin Yang and Kai Xu

We solve a challenging yet practically useful variant of the 3D Bin Packing Problem (3D-BPP). In our problem, the agent has limited information about the items to be packed into the bin, and an item must be packed immediately after its arrival without buffering or readjusting. The item's placement is also subject to the constraints of collision avoidance and physical stability.
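
In constrained DRL for this setting, infeasible placements are typically screened out with an action mask before the policy acts. The sketch below is a hand-crafted illustration, not the paper's learned feasibility predictor; the heightmap state and the 80% flat-support threshold are arbitrary stand-ins for a real stability test:

    import numpy as np

    def feasible_mask(heightmap, item, bin_height):
        """Mask of feasible (x, y) placements for an axis-aligned item.

        heightmap: (W, D) current top heights; item: (w, d, h). A placement is
        feasible if the item fits under the bin ceiling and its footprint rests
        on a mostly flat region (a crude proxy for physical stability).
        """
        W, D = heightmap.shape
        w, d, h = item
        mask = np.zeros((W, D), dtype=bool)
        for x in range(W - w + 1):
            for y in range(D - d + 1):
                footprint = heightmap[x:x + w, y:y + d]
                top = footprint.max()
                flat = (footprint == top).mean() >= 0.8  # >= 80% support area
                mask[x, y] = flat and (top + h <= bin_height)
        return mask

    # Toy bin: a 5x5 plateau of height 3 in one corner of a 10x10 floor.
    hm = np.zeros((10, 10), dtype=int); hm[:5, :5] = 3
    print(feasible_mask(hm, item=(3, 3, 4), bin_height=10).sum())  # feasible cells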

Before 2020

CVPR 2020

Fusion-Aware Point Convolution for Online Semantic 3D Scene Segmentation

Jiazhao Zhang, Chenyang Zhu (co-first author), Lintao Zheng and Kai Xu

Online semantic scene segmentation with high speed (12 FPS) and SOTA accuracy (avg. IoU = 0.72 measured w.r.t. per-frame ground-truth image labels). We have also submitted our results to the ScanNet benchmark, demonstrating an avg. IoU of 0.63 on the leaderboard. Note, however, that this number was obtained by spatially transferring the point-wise labels of our online reconstructed point clouds to the pre-reconstructed point clouds of the benchmark scenes...

CVPR 2020, Oral

AdaCoSeg: Adaptive Shape Co-Segmentation with Group Consistency Loss

Chenyang Zhu, Kai Xu, Siddhartha Chaudhuri, Li Yi, Leonidas J. Guibas and Hao Zhang

We introduce AdaCoSeg, a deep neural network architecture for adaptive co-segmentation of a set of 3D shapes represented as point clouds. Unlike the familiar single-instance segmentation problem, co-segmentation is intrinsically contextual: how a shape is segmented can vary depending on the set it is in. Hence, our network features an adaptive learning module to produce a consistent shape segmentation that adapts to a set.

Pacific Graphics 2019, Computer Graphics Forum

Active Scene Understanding via Online Semantic Reconstruction

Lintao Zheng, Chenyang Zhu, Jiazhao Zhang, Hang Zhao, Hui Huang, Matthias Niessner and Kai Xu

We propose a novel approach to robot-operated active understanding of unknown indoor scenes, based on online RGBD reconstruction with semantic segmentation. In our method, the exploratory robot scanning is both driven by and targeted at the recognition and segmentation of semantic objects in the scene. Our algorithm is built on top of a volumetric depth fusion framework (e.g., KinectFusion) and performs real-time voxel-based semantic labeling over the online reconstructed volume. The robot is guided by an online estimated discrete viewing score field (VSF) parameterized over the 3D space of ...

CVPR 2019

PartNet: A Recursive Part Decomposition Network for Fine-grained and Hierarchical Shape Segmentation

Fenggen Yu, Kun Liu, Yan Zhang, Chenyang Zhu and Kai Xu

Deep learning approaches to 3D shape segmentation are typically formulated as a multi-class labeling problem. Existing models are trained for a fixed set of labels, which greatly limits their flexibility and adaptivity. We opt for top-down recursive decomposition and develop the first deep learning model for hierarchical segmentation of 3D shapes, based on recursive neural networks. Starting from a full shape represented as a point cloud, our model performs recursive binary decomposition, where the decomposition networks at all nodes in the hierarchy share weights. At each node, a node classifier is trained to determine the type (adjacency or symmetry) and stopping criteria of its decomposition ...

SIGGRAPH Asia 2018, ACM Transactions on Graphics

SCORES: Shape Composition with Recursive Substructure Priors

Chenyang Zhu, Kai Xu, Siddhartha Chaudhuri, Renjiao Yi and Hao Zhang

We introduce SCORES, a recursive neural network for shape composition. Our network takes as input sets of parts from two or more source 3D shapes and a rough initial placement of the parts. It outputs an optimized part structure for the composed shape, leading to high-quality geometry construction. A unique feature of our composition network is that it is not merely learning how to connect parts. Our goal is to produce a coherent and plausible 3D shape, despite large incompatibilities among the input parts. The network may significantly alter the geometry and structure of the input parts ...

ECCV 2018

Faces as Lighting Probes via Unsupervised Deep Highlight Extraction

Renjiao Yi, Chenyang Zhu, Ping Tan and Stephen Lin

We present a method for estimating detailed scene illumination using human faces in a single image. In contrast to previous works that estimate lighting in terms of low-order basis functions or distant point lights, our technique estimates illumination at a higher precision in the form of a non-parametric environment map...

SIGGRAPH 2017, ACM Transactions on Graphics

Deformation-Driven Shape Correspondence via Shape Recognition

Chenyang Zhu, Renjiao Yi, Wallace Lira, Ibraheem Alhashim, Kai Xu and Hao Zhang

Many approaches to shape comparison and recognition start by establishing a shape correspondence. We “turn the table” and show that quality shape correspondences can be obtained by performing many shape recognition tasks. What is more, the method we develop computes a fine-grained, topology-varying part correspondence between two 3D shapes where the core evaluation mechanism only recognizes shapes globally. This is made possible by casting the part correspondence problem in a deformation-driven framework and relying on a data-driven “deformation energy” which rates visual similarity between deformed shapes and models from a shape repository. Our basic premise is that if a correspondence between two chairs (or airplanes, bicycles, etc.) is correct, then a reasonable deformation between the two chairs anchored on ...

SIGGRAPH 2015, ACM Transactions on Graphics

Interaction Context (ICON): Towards a Geometric Functionality Descriptor

Ruizhen Hu, Chenyang Zhu, Oliver van Kaick, Ligang Liu, Ariel Shamir and Hao Zhang

We introduce a contextual descriptor which aims to provide a geometric description of the functionality of a 3D object in the context of a given scene. Unlike previous works, we do not regard functionality as an abstract label or represent it implicitly through an agent. Our descriptor, called interaction context, or ICON for short, explicitly represents the geometry of object-to-object interactions...

SIGGRAPH 2014, ACM Transactions on Graphics

Organizing Heterogeneous Scene Collections through Contextual Focal Points

Kai Xu, Rui Ma, Hao Zhang, Chenyang Zhu, Ariel Shamir, Daniel Cohen-Or and Hui Huang

We introduce focal points for characterizing, comparing, and organizing collections of complex and heterogeneous data and apply the concepts and algorithms developed to collections of 3D indoor scenes. We represent each scene by a graph of its constituent objects and define focal points as representative substructures in a scene collection. To organize a heterogeneous scene collection, we cluster the scenes...

Contact

chenyang.chandler.zhu@gmail.com

zhuchenyang07@nudt.edu.cn

Address

School of Computer Science
National University of Defense Technology
109 Deya Rd.
Kaifu District
Changsha, Hunan 410073
China