**Deep-MAC**, or **Deep Mask-heads Above CenterNet**, is a type of anchor-free instance segmentation model based on (). The motivation for this new architecture is that boxes are much cheaper to annotate than masks, so the authors address the “partially supervised” instance segmentation problem, where all classes have bounding box annotations but only a subset of classes have mask annotations.

For predicting bounding boxes, CenterNet outputs 3 tensors: (1) a class-specific () which indicates the probability of the center of a bounding box being present at each location, (2) a class-agnostic 2-channel tensor indicating the height and width of the bounding box at each center pixel, and (3) an x- and y-direction offset at each center pixel to recover the discretization error introduced because the output feature map is typically smaller than the image (stride 4 or 8).

For Deep-MAC, in parallel to the box-related prediction heads, we add a fourth pixel embedding branch $P$. Given a bounding box $b$, we crop a region $P_b$ from $P$ and feed it to a mask-head. The final prediction is a class-agnostic 32 × 32 tensor which we pass through a sigmoid to get per-pixel probabilities. We train this mask-head via a per-pixel cross-entropy loss averaged over all pixels and instances. During post-processing, the predicted mask is re-aligned according to the predicted box and resized to the resolution of the image.

In addition to this 32 × 32 cropped feature map, we add two inputs for improved stability of some mask-heads: (1) Instance embedding: an additional head is added to the backbone that predicts a per-pixel embedding. For each bounding box $b$, we extract its embedding from the center pixel.
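The crop-then-predict step can be sketched roughly as follows. This is a minimal NumPy sketch, not the authors' implementation: it substitutes a nearest-neighbour crop for whatever ROIAlign-style crop-and-resize the real model uses, and stands in a single 1 × 1 linear layer for the actual mask-head; all function names (`crop_and_resize`, `mask_head`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def crop_and_resize(P, box, out_size=32):
    """Crop the region of the pixel-embedding map P (H, W, C) covered by
    box = (ymin, xmin, ymax, xmax) in feature-map coordinates and resize
    it to out_size x out_size with nearest-neighbour sampling.
    A crude stand-in for an ROIAlign-style crop."""
    ymin, xmin, ymax, xmax = box
    ys = np.clip(np.linspace(ymin, ymax, out_size).astype(int), 0, P.shape[0] - 1)
    xs = np.clip(np.linspace(xmin, xmax, out_size).astype(int), 0, P.shape[1] - 1)
    return P[np.ix_(ys, xs)]  # shape (out_size, out_size, C)

def mask_head(P_b, W):
    """Toy class-agnostic mask-head: a per-pixel linear layer (a 1x1 conv)
    followed by a sigmoid, yielding per-pixel mask probabilities."""
    logits = P_b @ W            # (32, 32, C) @ (C,) -> (32, 32)
    return sigmoid(logits)      # class-agnostic 32 x 32 probabilities
```

In the full pipeline the resulting 32 × 32 mask would then be pasted back into the predicted box and resized to image resolution during post-processing.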
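The training signal described above (per-pixel cross-entropy averaged over all pixels and instances) can be written out as a short function. A sketch under the assumption that masks arrive as a batch of per-instance 32 × 32 probability/target pairs; the function name is hypothetical.

```python
import numpy as np

def per_pixel_bce(probs, targets, eps=1e-7):
    """Per-pixel binary cross-entropy averaged over all pixels and instances.

    probs:   (N, 32, 32) predicted per-pixel probabilities (post-sigmoid)
    targets: (N, 32, 32) binary ground-truth masks cropped to each box
    """
    p = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    losses = -(targets * np.log(p) + (1 - targets) * np.log(1 - p))
    return losses.mean()  # single scalar: mean over instances and pixels
```

For a sanity check, a uniform prediction of 0.5 gives a loss of ln 2 ≈ 0.693 regardless of the targets.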