ICCV19-Paper-Review

Summaries of ICCV 2019 papers.

Fine-Grained Segmentation Networks: Self-Supervised Segmentation for Improved Long-Term Visual Localization

In computer vision, image segmentation is the process of partitioning a digital image into multiple segments . The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Visual localization is the problem of estimating the 6 Degree-of-Freedom (DoF) camera pose from which a given image was taken relative to a reference scene representation.

Traditional Method and problems with it:

Semantic Segmentation:

Semantic segmentation is the task of assigning a class label to each pixel in an input image. Mostly CNN are used and are trained in fully supervised fashion.However, obtaining a large amount of densely labeled images is very time-consuming and expensive . As a result, approaches based on weaker forms of annotations have been developed. CNNs for semantic segmentation typically only perform well under varying conditions if these conditions are reflected in the training set. Yet, creating pixel-level annotations for large image sets is a time consuming task.

(Semantic) Visual Localization :

It is done by using a 3D scene model constructed from a set of database images via Structure-from Motion.Thus approach establish a set of 2D-3D correspondences between a query image and the model. There is another machine learning based approaches either replace the 2D-3D matching stage through scene coordinate regression. The former type of methods achieves state-of-the-art localization accuracy in smallscale scenes , but do not seem to easily scale to larger scenes.

In this paper a new neural network is proposed, the Fine-Grained Segmentation Network (FGSN), that can be used to provide image segmentations with a larger number of labels and can be trained in a self-supervised fashion and then after integrating the fine-grained segmentation produced by FSGNs we get substantial improvment in visual localization algorithm.

Fine-Grained Segmentation Networks (FGNs):

The Fine-Grained Segmentation Network (FGSN) has the same structure as a standard CNN used for semantic segmentation, but instead of being trained on a set of manually created labels, labels are created in a self-supervised manner.During training, at certain intervals, features are extracted from the images in the training set and clustered using k-means clustering. In this we use a set of 2D-2D point correspondences during training to ensure that the predictions are stable under seasonal changes and viewpoint variations, method we use for label creation is based on k-means clustering.

Segmantic Visual Localization using FSN:

In this we use fine semantic segmantations from FSN to get substantial improvment in visual localization algorithm.

Simple Semantic Match Consistency (SSMC):

The first approach is a simple-to-implement match consistency filter used as a baseline method . Given a set of 2D-3D matches between features in a query image and 3D points in a Structure-from-Motion (SfM) point cloud, SSMC uses semantics to filter out inconsistent matches.

Geometric-Semantic Match Consistency (GSMC):

The projections are used to measure a semantic consistency score for the pose by counting the number of points projecting into a query image region with the same label as the point.While performing significantly better than SSMC, GSMC makes additional assumptions and is computationally less efficient

Particle Filter-based Semantic Localization (PFSL):

In this approach, localization is approached as a filtering problem where we, in addition to a sequence of camera images, also have access to noisy odometry information. Both these sources are combined in a particle filter to sequentially estimate the pose of the camera by letting each particle describe a possible camera pose.

Evaluation measures:

We are doing evalution based on this table:

Conclusion

  1. We present a novel type of segmentation network, the Fine-Grained Segmentation Network (FGSN), that outputs dense segmentation maps based on cluster indices.This removes the need for human-defined classes.

  2. FGSNs allow us to create finer segmentations with more classes. We show that this has a positive impact on semantic visual localization algorithm.

  3. We perform detailed experiments to investigate the impact the number of clusters has on multiple visual localization algorithms. In addition, we compare two types of weight initializations, using networks pre-trained for semantic segmentation and image classification, respectively.