ToC Figure. Two point clouds \(PC_t\) and \(PC_{t+1}\) from consecutive frames are passed into the scene flow generator \(G_{sf}\). \(G_{sf}\) consists of three parts: point cloud feature learning with the set conv layer, point relationship learning with the flow embedding layer, and flow refinement with the set upconv layer. The point cloud \(PC_t\) at time \(t\) is warped to \(PC_{t+1}^*\) based on the predicted scene flow \(SF\). \(PC_t\), \(PC_{t+1}\), and \(PC_{t+1}^*\) are fed into our designed discriminator \(D_{pc}\), which predicts the probability that the input point cloud is a real point cloud. The \(G_{sf}\) loss and the \(D_{pc}\) loss are designed to optimize \(G_{sf}\) and \(D_{pc}\), respectively.
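The warping step described in the caption amounts to translating each point of \(PC_t\) by its predicted per-point flow vector, \(PC_{t+1}^* = PC_t + SF\). A minimal NumPy sketch of this operation (function and array names are illustrative, not from the source):

```python
import numpy as np

def warp_point_cloud(pc_t, scene_flow):
    """Warp PC_t toward frame t+1 by adding the predicted per-point flow.

    pc_t:       (N, 3) array of point coordinates at time t
    scene_flow: (N, 3) array of predicted 3D flow vectors, one per point
    returns:    (N, 3) warped point cloud PC_{t+1}^*
    """
    assert pc_t.shape == scene_flow.shape
    return pc_t + scene_flow

# Toy example: 4 points at the origin, constant flow of +1 along x.
pc_t = np.zeros((4, 3))
sf = np.tile(np.array([1.0, 0.0, 0.0]), (4, 1))
pc_warped = warp_point_cloud(pc_t, sf)
```

The warped cloud \(PC_{t+1}^*\) can then be compared against the real \(PC_{t+1}\) by the discriminator, which is what drives the adversarial training signal described above.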