These are my notes for the talk “From Faster-RCNN to Mask-RCNN” by Shaoqing Ren on April 26th, 2017.

Yesterday – background and prior work behind Mask R-CNN

Key functions

Mask R-CNN Architecture

Mask R-CNN Architecture

Classification

CNN for classification

Please ignore the bounding box in the image.

$$ \text{class} = Classifier(\text{image}) $$

Problems

Solutions

Detection

Detection Concept

$$ \text{location} = Classifier(\text{all patches of an image}) $$

$$ \text{precise\_location} = Regressor(\text{image}, \text{rough\_location}) $$
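
The two formulas can be read as a sliding-window loop: score every patch, keep the confident ones as rough locations, then refine each one. Below is a minimal sketch of that idea on a toy binary image. The names `sliding_windows`, `detect`, `fill_ratio` and `tight_box` are hypothetical stand-ins made up for illustration (a real detector would use a CNN for both the classifier and the regressor), and the window size, stride and threshold are arbitrary assumptions.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)
Image = List[List[int]]          # toy binary image, indexed as image[y][x]


def sliding_windows(img_w: int, img_h: int, win: int, stride: int) -> List[Box]:
    """Enumerate square patches on a regular grid ("all patches of an image")."""
    return [(x, y, win, win)
            for y in range(0, img_h - win + 1, stride)
            for x in range(0, img_w - win + 1, stride)]


def detect(image: Image,
           classifier: Callable[[Image, Box], float],
           regressor: Callable[[Image, Box], Box],
           win: int = 4, stride: int = 2,
           threshold: float = 0.5) -> List[Tuple[Box, float]]:
    """rough location = classifier over patches; precise location = regressor."""
    img_h, img_w = len(image), len(image[0])
    detections = []
    for rough in sliding_windows(img_w, img_h, win, stride):
        score = classifier(image, rough)
        if score >= threshold:
            detections.append((regressor(image, rough), score))
    return detections


def fill_ratio(image: Image, box: Box) -> float:
    """Toy classifier (assumption): fraction of object pixels in the patch."""
    x0, y0, w, h = box
    ones = sum(image[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w))
    return ones / (w * h)


def tight_box(image: Image, box: Box) -> Box:
    """Toy regressor (assumption): tightest box around object pixels in the patch."""
    x0, y0, w, h = box
    pts = [(x, y) for y in range(y0, y0 + h) for x in range(x0, x0 + w) if image[y][x]]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


# 8x8 image with a 3x3 object covering x in 3..5 and y in 2..4.
image = [[1 if 3 <= x <= 5 and 2 <= y <= 4 else 0 for x in range(8)] for y in range(8)]
detections = detect(image, fill_ratio, tight_box)
print(detections)
```

The cost of this loop grows with the number of patches, which is the motivation for sharing computation across patches in the later sections.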

Problems

Solutions

R-CNN

RCNN

SPP-net / Fast R-CNN

SPP-net 1

SPP-net 2

Faster R-CNN

Faster-RCNN

Multiple scales / ratios

Different schemes for addressing multiple scales and sizes.
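
One of these schemes is a pyramid of reference boxes (anchors) at each feature-map location, instead of a pyramid of images or filters. Here is a small sketch of how such anchors could be generated. The function name `make_anchors` is hypothetical, and the default scales and ratios are the ones I remember from the Faster R-CNN paper (3 scales x 3 ratios), not something stated on the slide.

```python
import math
from typing import List, Sequence, Tuple

# (x_min, y_min, x_max, y_max) in image coordinates
Anchor = Tuple[float, float, float, float]


def make_anchors(cx: float, cy: float,
                 scales: Sequence[float] = (128, 256, 512),
                 ratios: Sequence[float] = (0.5, 1.0, 2.0)) -> List[Anchor]:
    """Anchors centred at (cx, cy); each has area scale**2 and height/width = ratio."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)  # a wider box when r < 1
            h = s * math.sqrt(r)  # a taller box when r > 1
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(0.0, 0.0)
print(len(anchors))  # one anchor per (scale, ratio) pair
```

Every sliding position on the feature map gets the same set of anchors, so a single-scale feature map can still propose boxes of several sizes and shapes.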

SSD / FPN

SSD and FPN

Instance Segmentation

Instance Segmentation

Keypoint Detection

Keypoint Detection

Today - details about Mask-RCNN and comparisons

RoI Align

RoI Align 1
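
The core idea of RoI Align is to avoid rounding the RoI or its bins to integer feature-map cells, and to read feature values at fractional positions with bilinear interpolation instead. Below is a rough single-channel sketch of that idea. The names `bilinear` and `roi_align`, the `samples` x `samples` grid per bin, the average pooling and the convention that feature values sit at integer coordinates are all assumptions made for illustration; real implementations differ in details.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[float]]  # single channel, indexed as feat[y][x]


def bilinear(feat: FeatureMap, x: float, y: float) -> float:
    """Interpolate feat at a fractional (x, y); coordinates are clamped to the map."""
    h, w = len(feat), len(feat[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = feat[y0][x0] * (1 - ax) + feat[y0][x1] * ax
    bottom = feat[y1][x0] * (1 - ax) + feat[y1][x1] * ax
    return top * (1 - ay) + bottom * ay


def roi_align(feat: FeatureMap,
              roi: Tuple[float, float, float, float],
              out_size: int = 2, samples: int = 2) -> List[List[float]]:
    """Pool a continuous RoI (x_min, y_min, x_max, y_max) into out_size x out_size bins."""
    x_min, y_min, x_max, y_max = roi
    bin_w = (x_max - x_min) / out_size  # kept fractional: no quantization
    bin_h = (y_max - y_min) / out_size
    out = []
    for by in range(out_size):
        row = []
        for bx in range(out_size):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    # Regularly spaced sample points inside the bin.
                    x = x_min + (bx + (sx + 0.5) / samples) * bin_w
                    y = y_min + (by + (sy + 0.5) / samples) * bin_h
                    total += bilinear(feat, x, y)
            row.append(total / (samples * samples))  # average pooling
        out.append(row)
    return out


# Feature map that is a linear ramp in x, so interpolation is easy to check by hand.
ramp = [[float(x) for x in range(4)] for _ in range(4)]
pooled = roi_align(ramp, (0.5, 0.5, 2.5, 2.5))
print(pooled)
```

With RoI Pooling, the same RoI would first be snapped to integer cells, which shifts the pooled features relative to the image; that misalignment matters much more for pixel-level masks than for box classification.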

Ablations for Mask R-CNN

Multinomial vs. Independent Masks
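
As I understand this ablation, "multinomial" means a per-pixel softmax across classes (the classes compete for each pixel), while "independent" means a per-pixel sigmoid on the mask channel of the RoI's class only, with the class decided by the box branch. The following per-pixel sketch contrasts the two losses. The function names are hypothetical and the logits are made-up numbers, not results from the talk.

```python
import math
from typing import Sequence


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def multinomial_pixel_loss(logits: Sequence[float], target_class: int) -> float:
    """Softmax cross-entropy over classes for one pixel: classes compete."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target_class]


def independent_pixel_loss(logits: Sequence[float], roi_class: int,
                           is_foreground: bool) -> float:
    """Binary cross-entropy on the roi_class channel only; other channels are ignored."""
    z = logits[roi_class]
    return softplus(-z) if is_foreground else softplus(z)


# Same logit for class 0, different logit for a competing class 1.
weak_rival = [0.0, -5.0]
strong_rival = [0.0, 5.0]
print(multinomial_pixel_loss(weak_rival, 0), multinomial_pixel_loss(strong_rival, 0))
print(independent_pixel_loss(weak_rival, 0, True), independent_pixel_loss(strong_rival, 0, True))
```

The multinomial loss for class 0 changes when the rival class's logit changes, while the independent loss does not, which is the decoupling of mask prediction from class prediction.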

Multi-task Cascade vs. Joint Learning

Multi-task Cascade vs. Joint Learning

Table for Mask R-CNN

Comparison on Human Keypoints

Results

Keypoint detection results

More results of Mask R-CNN on COCO test images

Future - discussion

Order of key functions?

Precise & semantic label

Precise & semantic label

Box-level label -> instance segmentation & keypoint detection -> instance segmentation with body parts

Semantic 3D reconstruction

Semantic 3D reconstruction

Future