mmrotate.apis

mmrotate.apis.inference_detector_by_patches(model: Module, imgs: str | ndarray | Sequence[str] | Sequence[ndarray], sizes: List[int], steps: List[int], ratios: List[float], nms_cfg: dict, test_pipeline: Compose | None = None, bs: int = 1) DetDataSample | List[DetDataSample][source]

Run inference on image patches with the detector.

Split huge image(s) into patches and run the detector on each patch. Finally, merge the patch results onto the full image with NMS.

Parameters:
  • model (nn.Module) – The loaded detector.

  • imgs (str, ndarray, Sequence[str/ndarray]) – Either image files or loaded images.

  • sizes (list[int]) – The sizes of patches.

  • steps (list[int]) – The steps between two patches.

  • ratios (list[float]) – Image resizing ratios for multi-scale detecting.

  • nms_cfg (dict) – nms config.

  • bs (int) – Batch size, must be greater than or equal to 1.

Returns:

Detection results.

Return type:

DetDataSample or list[DetDataSample]
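The patch splitting described above can be sketched in plain Python. This is a minimal illustration and not mmrotate's implementation: it assumes each entry of steps is the stride between neighbouring patch origins, and that the last window is shifted back so it ends at the image border. The helper names window_starts and slide_windows are hypothetical.

```python
import math


def window_starts(length, size, step):
    """Start offsets of windows of `size` pixels moved by `step` along one axis."""
    if length <= size:
        return [0]  # a single window covers the whole axis
    count = math.ceil((length - size) / step) + 1
    starts = [i * step for i in range(count)]
    # Assumption: the last window is shifted back so it ends at the border.
    if starts[-1] + size > length:
        starts[-1] = length - size
    return starts


def slide_windows(width, height, size, step):
    """Return (x1, y1, x2, y2) patch windows covering a width x height image."""
    return [
        (x, y, min(x + size, width), min(y + size, height))
        for y in window_starts(height, size, step)
        for x in window_starts(width, size, step)
    ]


# One patch size and one step; a multi-scale run would repeat this per ratio.
windows = slide_windows(width=2048, height=1024, size=1024, step=824)
```

Detections from each window would then be shifted by the window origin (x1, y1) before the merged set is passed to NMS.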

mmrotate.core

anchor

bbox

patch

evaluation

post_processing

visualization

mmrotate.datasets

datasets

class mmrotate.datasets.DIORDataset(ann_subdir: str = 'Annotations/Oriented Bounding Boxes/', file_client_args: dict | None = None, backend_args: dict | None = None, ann_type: str = 'obb', **kwargs)[source]

DIOR dataset for detection.

Parameters:
  • ann_subdir (str) – Subdir where annotations are. Defaults to ‘Annotations/Oriented Bounding Boxes/’.

  • file_client_args (dict) – Arguments to instantiate the corresponding backend in mmdet <= 3.0.0rc6. Defaults to None.

  • backend_args (dict, optional) – Arguments to instantiate the corresponding backend. Defaults to None.

  • ann_type (str) – Choose obb or hbb as ground truth. Defaults to obb.

property bbox_min_size: str | None

Return the minimum size of bounding boxes in the images.

filter_data() List[dict][source]

Filter annotations according to filter_cfg.

Returns:

Filtered results.

Return type:

List[dict]

get_cat_ids(idx: int) List[int][source]

Get DIOR category ids by index.

Parameters:

idx (int) – Index of data.

Returns:

All categories in the image of specified index.

Return type:

List[int]

load_data_list() List[dict][source]

Load annotation from XML style ann_file.

Returns:

Annotation info from XML file.

Return type:

list[dict]

parse_data_info(img_info: dict) dict | List[dict][source]

Parse raw annotation to target format.

Parameters:

img_info (dict) – Raw image information, usually it includes img_id, file_name, and xml_path.

Returns:

Parsed annotation.

Return type:

Union[dict, List[dict]]

class mmrotate.datasets.DOTADataset(diff_thr: int = 100, img_suffix: str = 'png', **kwargs)[source]

DOTA-v1.0 dataset for detection.

Note: ann_file in DOTADataset is different from the BaseDataset. In BaseDataset, it is the path of an annotation file. In DOTADataset, it is the path of a folder containing XML files.

Parameters:
  • diff_thr (int) – The difficulty threshold of ground truth. Bboxes with difficulty higher than it will be ignored. The value should be a non-negative integer. Defaults to 100.

  • img_suffix (str) – The suffix of images. Defaults to ‘png’.
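As a rough illustration of the diff_thr rule, the sketch below marks every instance whose difficulty exceeds the threshold as ignored. The dictionary keys difficulty and ignore_flag, and the helper name flag_difficult, are assumptions made for this example and are not taken from the reference above.

```python
def flag_difficult(instances, diff_thr=100):
    """Return copies of the instances with an ignore flag set by difficulty."""
    flagged = []
    for inst in instances:
        item = dict(inst)  # copy so the input list is left untouched
        # Strictly "higher than" the threshold means ignored.
        item['ignore_flag'] = 1 if inst['difficulty'] > diff_thr else 0
        flagged.append(item)
    return flagged


instances = [{'difficulty': 0}, {'difficulty': 1}, {'difficulty': 2}]
result = flag_difficult(instances, diff_thr=1)
```

With the default of 100, no instance would be ignored unless its difficulty value is above 100.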

filter_data() List[dict][source]

Filter annotations according to filter_cfg.

Returns:

Filtered results.

Return type:

List[dict]

get_cat_ids(idx: int) List[int][source]

Get DOTA category ids by index.

Parameters:

idx (int) – Index of data.

Returns:

All categories in the image of specified index.

Return type:

List[int]

load_data_list() List[dict][source]

Load annotations from an annotation file named as self.ann_file.

Returns:

A list of annotations.

Return type:

List[dict]

class mmrotate.datasets.DOTAv15Dataset(diff_thr: int = 100, img_suffix: str = 'png', **kwargs)[source]

DOTA-v1.5 dataset for detection.

Note: ann_file in DOTAv15Dataset is different from the BaseDataset. In BaseDataset, it is the path of an annotation file. In DOTAv15Dataset, it is the path of a folder containing XML files.

class mmrotate.datasets.DOTAv2Dataset(diff_thr: int = 100, img_suffix: str = 'png', **kwargs)[source]

DOTA-v2.0 dataset for detection.

Note: ann_file in DOTAv2Dataset is different from the BaseDataset. In BaseDataset, it is the path of an annotation file. In DOTAv2Dataset, it is the path of a folder containing XML files.

class mmrotate.datasets.HRSCDataset(img_subdir: str = 'AllImages', ann_subdir: str = 'Annotations', classwise: bool = False, file_client_args: dict | None = None, backend_args: dict | None = None, **kwargs)[source]

HRSC dataset for detection.

Note: There are two evaluation methods for HRSC datasets, which can be chosen through classwise. When classwise=False, there is only one class; when classwise=True, there are 31 classes of ships.

Parameters:
  • img_subdir (str) – Subdir where images are stored. Defaults to ‘AllImages’.

  • ann_subdir (str) – Subdir where annotations are. Defaults to ‘Annotations’.

  • classwise (bool) – Whether to use all 31 classes or only one class. Defaults to False.

  • file_client_args (dict) – Arguments to instantiate the corresponding backend in mmdet <= 3.0.0rc6. Defaults to None.

  • backend_args (dict, optional) – Arguments to instantiate the corresponding backend. Defaults to None.

property bbox_min_size: str | None

Return the minimum size of bounding boxes in the images.

filter_data() List[dict][source]

Filter annotations according to filter_cfg.

Returns:

Filtered results.

Return type:

List[dict]

get_cat_ids(idx: int) List[int][source]

Get HRSC category ids by index.

Parameters:

idx (int) – Index of data.

Returns:

All categories in the image of specified index.

Return type:

List[int]

load_data_list() List[dict][source]

Load annotation from XML style ann_file.

Returns:

Annotation info from XML file.

Return type:

list[dict]

parse_data_info(img_info: dict) dict | List[dict][source]

Parse raw annotation to target format.

Parameters:

img_info (dict) – Raw image information, usually it includes img_id, file_name, and xml_path.

Returns:

Parsed annotation.

Return type:

Union[dict, List[dict]]

property sub_data_root: str

Return the sub data root.

pipelines

mmrotate.models

detectors

class mmrotate.models.detectors.H2RBoxDetector(backbone: ConfigDict | dict, neck: ConfigDict | dict, bbox_head: ConfigDict | dict, crop_size: Tuple[int, int] = (768, 768), padding: str = 'reflection', train_cfg: ConfigDict | dict | None = None, test_cfg: ConfigDict | dict | None = None, data_preprocessor: ConfigDict | dict | None = None, init_cfg: ConfigDict | dict | List[ConfigDict | dict] | None = None)[source]

Implementation of H2RBox

loss(batch_inputs: Tensor, batch_data_samples: List[DetDataSample]) dict | list[source]

Calculate losses from a batch of inputs and data samples.

Parameters:
  • batch_inputs (Tensor) – Input images of shape (N, C, H, W). These should usually be mean centered and std scaled.

  • batch_data_samples (list[DetDataSample]) – The batch data samples. It usually includes information such as gt_instance or gt_panoptic_seg or gt_sem_seg.

Returns:

A dictionary of loss components.

Return type:

dict

rotate_crop(batch_inputs: Tensor, rot: float = 0.0, size: Tuple[int, int] = (768, 768), batch_gt_instances: List[InstanceData] | None = None, padding: str = 'reflection') Tuple[Tensor, List[InstanceData]][source]
Parameters:
  • batch_inputs (Tensor) – Input images of shape (N, C, H, W). These should usually be mean centered and std scaled.

  • rot (float) – Angle of view rotation. Defaults to 0.

  • size (tuple[int]) – Crop size from image center. Defaults to (768, 768).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • padding (str) – Padding method of image black edge. Defaults to ‘reflection’.

Returns:

Processed batch_inputs (Tensor) and batch_gt_instances (list[InstanceData])
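The effect of a view rotation followed by a center crop on one ground-truth box can be sketched as follows. This is an illustrative geometry sketch only, not the detector's code: it assumes boxes are (cx, cy, w, h, t) with t in radians, that the rotation is applied about the image center in image coordinates (y pointing down), and that the box angle simply increases by rot. The function name rotate_and_crop_box is hypothetical.

```python
import math


def rotate_and_crop_box(box, rot, img_size, crop_size):
    """Map a box through a rotation about the image center and a center crop.

    box: (cx, cy, w, h, t); img_size: (H, W); crop_size: (h, w).
    """
    cx, cy, w, h, t = box
    img_h, img_w = img_size
    crop_h, crop_w = crop_size
    ctr_x, ctr_y = img_w / 2, img_h / 2
    cos_r, sin_r = math.cos(rot), math.sin(rot)
    # Rotate the box center about the image center.
    dx, dy = cx - ctr_x, cy - ctr_y
    rx = ctr_x + dx * cos_r - dy * sin_r
    ry = ctr_y + dx * sin_r + dy * cos_r
    # Shift into the coordinate frame of the center crop.
    left = (img_w - crop_w) / 2
    top = (img_h - crop_h) / 2
    # Assumed convention: the box angle follows the view rotation.
    return (rx - left, ry - top, w, h, t + rot)


unrotated = rotate_and_crop_box((512, 512, 10, 20, 0.1), 0.0, (1024, 1024), (768, 768))
quarter = rotate_and_crop_box((522, 512, 10, 20, 0.0), math.pi / 2, (1024, 1024), (768, 768))
```

Width and height are unchanged because a rotation preserves lengths; only the center and the angle move.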

class mmrotate.models.detectors.H2RBoxV2Detector(backbone: ConfigDict | dict, neck: ConfigDict | dict, bbox_head: ConfigDict | dict, crop_size: Tuple[int, int] = (768, 768), padding: str = 'reflection', view_range: Tuple[float, float] = (0.25, 0.75), train_cfg: ConfigDict | dict | None = None, test_cfg: ConfigDict | dict | None = None, data_preprocessor: ConfigDict | dict | None = None, init_cfg: ConfigDict | dict | List[ConfigDict | dict] | None = None)[source]

Implementation of H2RBox-v2

loss(batch_inputs: Tensor, batch_data_samples: List[DetDataSample]) dict | list[source]

Calculate losses from a batch of inputs and data samples.

Parameters:
  • batch_inputs (Tensor) – Input images of shape (N, C, H, W). These should usually be mean centered and std scaled.

  • batch_data_samples (list[DetDataSample]) – The batch data samples. It usually includes information such as gt_instance or gt_panoptic_seg or gt_sem_seg.

Returns:

A dictionary of loss components.

Return type:

dict

rotate_crop(batch_inputs: Tensor, rot: float = 0.0, size: Tuple[int, int] = (768, 768), batch_gt_instances: List[InstanceData] | None = None, padding: str = 'reflection') Tuple[Tensor, List[InstanceData]][source]
Parameters:
  • batch_inputs (Tensor) – Input images of shape (N, C, H, W). These should usually be mean centered and std scaled.

  • rot (float) – Angle of view rotation. Defaults to 0.

  • size (tuple[int]) – Crop size from image center. Defaults to (768, 768).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • padding (str) – Padding method of image black edge. Defaults to ‘reflection’.

Returns:

Processed batch_inputs (Tensor) and batch_gt_instances (list[InstanceData])

class mmrotate.models.detectors.RefineSingleStageDetector(backbone: ConfigDict | dict, neck: ConfigDict | dict | None = None, bbox_head_init: ConfigDict | dict | None = None, bbox_head_refine: List[ConfigDict | dict | None] | None = None, train_cfg: ConfigDict | dict | None = None, test_cfg: ConfigDict | dict | None = None, data_preprocessor: ConfigDict | dict | None = None, init_cfg: ConfigDict | dict | List[ConfigDict | dict] | None = None)[source]

Base class for refine single-stage detectors, which is used by S2A-Net and R3Det.

Parameters:
  • backbone (ConfigDict or dict) – The backbone module.

  • neck (ConfigDict or dict) – The neck module.

  • bbox_head_init (ConfigDict or dict) – The bbox head module of the first stage.

  • bbox_head_refine (list[ConfigDict | dict]) – The bbox head module of the refine stage.

  • train_cfg (ConfigDict or dict, optional) – The training config of RefineSingleStageDetector. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – The testing config of RefineSingleStageDetector. Defaults to None.

  • data_preprocessor (ConfigDict or dict, optional) – Config of DetDataPreprocessor to process the input data. Defaults to None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None

extract_feat(batch_inputs: Tensor) Tuple[Tensor][source]

Extract features.

Parameters:

batch_inputs (Tensor) – Image tensor with shape (N, C, H, W).

Returns:

Multi-level features that may have different resolutions.

Return type:

tuple[Tensor]

loss(batch_inputs: Tensor, batch_data_samples: List[DetDataSample]) dict | list[source]

Calculate losses from a batch of inputs and data samples.

Parameters:
  • batch_inputs (Tensor) – Input images of shape (N, C, H, W). These should usually be mean centered and std scaled.

  • batch_data_samples (list[DetDataSample]) – The batch data samples. It usually includes information such as gt_instance or gt_panoptic_seg or gt_sem_seg.

Returns:

A dictionary of loss components.

Return type:

dict

predict(batch_inputs: Tensor, batch_data_samples: List[DetDataSample], rescale: bool = True) List[DetDataSample][source]

Predict results from a batch of inputs and data samples with post-processing.

Parameters:
  • batch_inputs (Tensor) – Inputs with shape (N, C, H, W).

  • batch_data_samples (List[DetDataSample]) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.

  • rescale (bool) – Whether to rescale the results. Defaults to True.

Returns:

Detection results of the input images. Each DetDataSample usually contains ‘pred_instances’, and pred_instances usually contains the following keys.

  • scores (Tensor): Classification scores, has a shape (num_instances, ).

  • labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

  • bboxes (Tensor): Has a shape (num_instances, 5), with the last dimension arranged as (x, y, w, h, t).

Return type:

list[DetDataSample]
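To make the (x, y, w, h, t) layout of bboxes concrete, the sketch below converts one such box into its four corner points. It assumes (x, y) is the box center and t is the rotation angle in radians, which is not stated explicitly in the description above; the helper name rbox_to_corners is hypothetical.

```python
import math


def rbox_to_corners(x, y, w, h, t):
    """Corner points of a rotated box, assuming a center point and radians."""
    cos_t, sin_t = math.cos(t), math.sin(t)
    half_offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    corners = []
    for dx, dy in half_offsets:
        # Rotate the corner offset by t, then translate to the box center.
        corners.append((x + dx * cos_t - dy * sin_t, y + dx * sin_t + dy * cos_t))
    return corners


axis_aligned = rbox_to_corners(0, 0, 4, 2, 0.0)
rotated = rbox_to_corners(0, 0, 4, 2, math.pi / 2)
```

With t = 0 the result is an ordinary axis-aligned rectangle, which is a quick way to check the angle convention of a given model output.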

backbones

class mmrotate.models.backbones.ReResNet(depth: int, in_channels: int = 3, stem_channels: int = 64, base_channels: int = 64, expansion: int | None = None, num_stages: int = 4, strides: Sequence[int] = (1, 2, 2, 2), dilations: Sequence[int] = (1, 1, 1, 1), out_indices: Sequence[int] = (3,), style: str = 'pytorch', deep_stem: bool = False, avg_down: bool = False, frozen_stages: int = -1, conv_cfg: ConfigDict | dict | None = None, norm_cfg: ConfigDict | dict = {'requires_grad': True, 'type': 'BN'}, norm_eval: bool = False, with_cp: bool = False, zero_init_residual: bool = True, init_cfg: ConfigDict | dict | List[ConfigDict | dict] | None = None)[source]

ReResNet backbone.

Please refer to the paper for details.

Parameters:
  • depth (int) – Network depth, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Defaults to 3.

  • stem_channels (int) – Output channels of the stem layer. Defaults to 64.

  • base_channels (int) – Middle channels of the first stage. Defaults to 64.

  • expansion (int, optional) – The expansion for BasicBlock/Bottleneck. If not specified, it will firstly be obtained via block.expansion. If the block has no attribute “expansion”, the following default values will be used: 1 for BasicBlock and 4 for Bottleneck. Defaults to None.

  • num_stages (int) – Stages of the network. Defaults to 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Defaults to (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Defaults to (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Defaults to (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace the 7x7 conv in the input stem with three 3x3 convs. Defaults to False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Defaults to False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • conv_cfg (ConfigDict or dict, optional) – dictionary to construct and config conv layer. Defaults to None.

  • norm_cfg (ConfigDict or dict) – dictionary to construct and config norm layer. Defaults to dict(type='BN', requires_grad=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Defaults to True.

  • init_cfg (ConfigDict or dict or list[ConfigDict or dict], optional) – Initialization config dict. Defaults to None.
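A backbone config using the parameters listed above might look like the fragment below. It is an illustrative sketch: the registry name 'ReResNet' is assumed to match the class name, and the chosen depth, out_indices and frozen_stages values are examples rather than recommended settings.

```python
# Illustrative config fragment; values other than the documented defaults
# (depth, out_indices, frozen_stages) are example choices.
backbone = dict(
    type='ReResNet',           # assumed registry name
    depth=50,                  # one of {18, 34, 50, 101, 152}
    num_stages=4,
    strides=(1, 2, 2, 2),      # documented default
    dilations=(1, 1, 1, 1),    # documented default
    out_indices=(0, 1, 2, 3),  # several stages, so a tuple of tensors is returned
    frozen_stages=1,           # example value; -1 freezes nothing
    style='pytorch',
)
```

The strides and dilations sequences are kept the same length as num_stages, one entry per stage.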

forward(x: Tensor) Tuple[Tensor][source]

Forward function of ReResNet.

make_res_layer(**kwargs) Module[source]

Build Reslayer.

property norm1: str

Get normalization layer’s name.

train(mode: bool = True) None[source]

Train function of ReResNet.

necks

class mmrotate.models.necks.ReFPN(in_channels: List[int], out_channels: int, num_outs: int, start_level: int = 0, end_level: int = -1, add_extra_convs: bool = False, extra_convs_on_inputs: bool = True, relu_before_extra_convs: bool = False, no_norm_on_lateral: bool = False, conv_cfg: ConfigDict | dict | None = None, norm_cfg: ConfigDict | dict | None = None, activation: str | None = None, init_cfg: ConfigDict | dict | List[ConfigDict | dict] = {'distribution': 'uniform', 'layer': 'Conv2d', 'type': 'Xavier'})[source]

ReFPN.

Parameters:
  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale)

  • num_outs (int) – Number of output scales.

  • start_level (int) – Index of the start input backbone level used to build the feature pyramid. Defaults to 0.

  • end_level (int) – Index of the end input backbone level (exclusive) to build the feature pyramid. Defaults to -1, which means the last level.

  • add_extra_convs (bool) – It decides whether to add conv layers on top of the original feature maps. Defaults to False.

  • extra_convs_on_inputs (bool) – Whether the source feature map of the extra convs is the last feature map of the neck inputs. Defaults to True.

  • relu_before_extra_convs (bool) – Whether to apply relu before the extra conv. Defaults to False.

  • no_norm_on_lateral (bool) – Whether to apply norm on lateral. Defaults to False.

  • conv_cfg (ConfigDict or dict, optional) – dictionary to construct and config conv layer. Defaults to None

  • norm_cfg (ConfigDict or dict) – dictionary to construct and config norm layer. Defaults to None

  • activation (str, optional) – Activation layer in ConvModule. Defaults to None.

  • init_cfg (ConfigDict or dict or list[ConfigDict or dict]) – Initialization config dict. Defaults to dict(type='Xavier', layer='Conv2d', distribution='uniform').

forward(inputs: Tuple[Tensor]) Tuple[Tensor][source]

Forward function of ReFPN.

dense_heads

class mmrotate.models.dense_heads.AngleBranchRetinaHead(*args, use_encoded_angle: bool = True, shield_reg_angle: bool = False, use_normalized_angle_feat: bool = False, angle_coder: ConfigDict | dict = {'angle_version': 'le90', 'omega': 1, 'radius': 6, 'type': 'CSLCoder', 'window': 'gaussian'}, loss_angle: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, init_cfg: ConfigDict | dict | List[ConfigDict | dict] = {'layer': 'Conv2d', 'override': [{'bias_prob': 0.01, 'name': 'retina_cls', 'std': 0.01, 'type': 'Normal'}, {'bias_prob': 0.01, 'name': 'retina_angle_cls', 'std': 0.01, 'type': 'Normal'}], 'std': 0.01, 'type': 'Normal'}, **kwargs)[source]

Retina head with angle regression branch.

The head contains three subnetworks. The first classifies anchor boxes, the second regresses deltas for the anchors, and the third regresses angles.

Parameters:
  • use_encoded_angle (bool) – Decide whether to use encoded angle or gt angle as target. Defaults to True.

  • shield_reg_angle (bool) – Decide whether to shield the angle loss from reg branch. Defaults to False.

  • angle_coder (dict) – Config of angle coder.

  • loss_angle (dict) – Config of angle classification loss.

  • init_cfg (ConfigDict or dict or list[ConfigDict or dict]) – Initialization config dict.
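The default angle_coder is a CSLCoder with a Gaussian window (radius 6, omega 1). The idea of such a circular smooth label can be sketched as below. This is a simplified illustration and not the coder's actual implementation: it assumes 180 one-degree bins, takes the target bin index directly (the mapping from an angle to a bin is left out), and applies an untruncated Gaussian over the circular bin distance. The names csl_encode and csl_decode are hypothetical.

```python
import math


def csl_encode(target_bin, num_bins=180, radius=6.0):
    """Smooth label over angle bins, peaked at target_bin, wrapping around."""
    label = []
    for i in range(num_bins):
        dist = abs(i - target_bin)
        dist = min(dist, num_bins - dist)  # circular distance between bins
        label.append(math.exp(-(dist ** 2) / (2 * radius ** 2)))
    return label


def csl_decode(label):
    """Recover the bin with the highest score."""
    return max(range(len(label)), key=label.__getitem__)


label = csl_encode(0)
```

Because the distance wraps around, bins 1 and 179 receive the same weight for a target at bin 0, which is what lets neighbouring angles across the range boundary be treated as close.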

forward_single(x: Tensor) Tuple[Tensor, Tensor, Tensor][source]

Forward feature of a single scale level.

Parameters:

x (Tensor) – Features of a single scale level.

Returns:

  • cls_score (Tensor): Cls scores for a single scale level, the channels number is num_anchors * num_classes.

  • bbox_pred (Tensor): Box energies / deltas for a single scale level, the channels number is num_anchors * 5.

  • angle_pred (Tensor): Angle for a single scale level, the channels number is num_anchors * encode_size.

Return type:

tuple
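The channel counts of the three outputs follow directly from the formulas above. The small sketch below just evaluates them; the example values (9 anchors, 15 classes, encode size 180) are assumptions chosen for illustration, and the helper name is hypothetical.

```python
def angle_branch_out_channels(num_anchors, num_classes, encode_size):
    """Channel count of each forward_single output, per the formulas above."""
    return {
        'cls_score': num_anchors * num_classes,
        'bbox_pred': num_anchors * 5,
        'angle_pred': num_anchors * encode_size,
    }


# Example values only; they are not defaults taken from this reference.
channels = angle_branch_out_channels(num_anchors=9, num_classes=15, encode_size=180)
```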

loss_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], angle_preds: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None) dict[source]

Calculate the loss based on the features extracted by the detection head.

Parameters:
  • cls_scores (list[Tensor]) – Box scores for each scale level with shape (N, num_anchors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_anchors * 5, H, W).

  • angle_preds (list[Tensor]) – Box angles for each scale level with shape (N, num_anchors * encode_size, H, W).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns:

A dictionary of loss components.

Return type:

dict

loss_by_feat_single(cls_score: Tensor, bbox_pred: Tensor, angle_pred: Tensor, anchors: Tensor, labels: Tensor, label_weights: Tensor, bbox_targets: Tensor, bbox_weights: Tensor, angle_targets: Tensor, angle_weights: Tensor, avg_factor: int) tuple[source]

Calculate the loss of a single scale level based on the features extracted by the detection head.

Parameters:
  • cls_score (Tensor) – Box scores for each scale level with shape (N, num_anchors * num_classes, H, W).

  • bbox_pred (Tensor) – Box energies / deltas for each scale level with shape (N, num_anchors * 5, H, W).

  • angle_pred (Tensor) – Box angles for each scale level with shape (N, num_anchors * encode_size, H, W).

  • anchors (Tensor) – Box reference for each scale level with shape (N, num_total_anchors, 5).

  • labels (Tensor) – Labels of each anchor with shape (N, num_total_anchors).

  • label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors).

  • bbox_targets (Tensor) – BBox regression targets of each anchor with shape (N, num_total_anchors, 5).

  • bbox_weights (Tensor) – BBox regression loss weights of each anchor with shape (N, num_total_anchors, 5).

  • angle_targets (Tensor) – Angle regression targets of each anchor with shape (N, num_total_anchors, 1).

  • angle_weights (Tensor) – Angle regression loss weights of each anchor with shape (N, num_total_anchors, 1).

  • avg_factor (int) – Average factor that is used to average the loss.

Returns:

Loss components.

Return type:

tuple

predict_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], angle_preds: List[Tensor], score_factors: List[Tensor] | None = None, batch_img_metas: List[dict] | None = None, cfg: ConfigDict | None = None, rescale: bool = False, with_nms: bool = True) List[InstanceData][source]

Transform a batch of output features extracted from the head into bbox results.

Note: When score_factors is not None, the cls_scores are usually multiplied by it to obtain the real score used in NMS, such as CenterNess in FCOS, IoU branch in ATSS.

Parameters:
  • cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor with shape (batch_size, num_priors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor with shape (batch_size, num_priors * 4, H, W).

  • angle_preds (list[Tensor]) – Box angles for each scale level with shape (N, num_anchors * encode_size, H, W).

  • score_factors (list[Tensor], optional) – Score factor for all scale levels, each is a 4D-tensor with shape (batch_size, num_priors * 1, H, W). Defaults to None.

  • batch_img_metas (list[dict], optional) – Batch image meta info. Defaults to None.

  • cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.

  • rescale (bool) – If True, return boxes in original image space. Defaults to False.

  • with_nms (bool) – If True, do nms before return boxes. Defaults to True.

Returns:

Object detection results of each image after the post process. Each item usually contains the following keys.

  • scores (Tensor): Classification scores, has a shape (num_instances, ).

  • labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

  • bboxes (Tensor): Has a shape (num_instances, 4), with the last dimension arranged as (x1, y1, x2, y2).

Return type:

list[InstanceData]

class mmrotate.models.dense_heads.CFAHead(*args, topk: int = 6, anti_factor: float = 0.75, **kwargs)[source]

CFA head.

Parameters:
  • topk (int) – Number of the highest topk points. Defaults to 6.

  • anti_factor (float) – Feature anti-aliasing coefficient. Defaults to 0.75.

get_cfa_targets(proposals_list: List[Tensor], valid_flag_list: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None, stage: str = 'init', unmap_outputs: bool = True, return_sampling_results: bool = False) tuple[source]

Compute corresponding GT box and classification targets for proposals.

Parameters:
  • proposals_list (list[Tensor]) – Multi level points/bboxes of each image.

  • valid_flag_list (list[Tensor]) – Multi level valid flags of each image.

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

  • stage (str) – ‘init’ or ‘refine’. Generate target for init stage or refine stage. Defaults to ‘init’.

  • unmap_outputs (bool) – Whether to map outputs back to the original set of anchors. Defaults to True.

  • return_sampling_results (bool) – Whether to return the sampling results. Defaults to False.

Returns:

  • all_labels (list[Tensor]): Labels of each level.

  • all_label_weights (list[Tensor]): Label weights of each level.

  • all_bbox_gt (list[Tensor]): Ground truth bbox of each level.

  • all_proposals (list[Tensor]): Proposals (points/bboxes) of each level.

  • all_proposal_weights (list[Tensor]): Proposal weights of each level.

  • pos_inds (list[Tensor]): Index of positive samples in all images.

  • gt_inds (list[Tensor]): Index of ground truth bbox in all images.

Return type:

tuple

get_pos_loss(cls_score: Tensor, pts_pred: Tensor, label: Tensor, bbox_gt: Tensor, label_weight: Tensor, convex_weight: Tensor, pos_inds: Tensor) Tensor[source]

Calculate loss of all potential positive samples obtained from first match process.

Parameters:
  • cls_score (Tensor) – Box scores of a single image with shape (num_anchors, num_classes).

  • pts_pred (Tensor) – Box energies / deltas of a single image with shape (num_anchors, 4).

  • label (Tensor) – Classification target of each anchor with shape (num_anchors,).

  • bbox_gt (Tensor) – Ground truth box.

  • label_weight (Tensor) – Classification loss weight of each anchor with shape (num_anchors).

  • convex_weight (Tensor) – Bbox weight of each anchor with shape (num_anchors, 4).

  • pos_inds (Tensor) – Index of all positive samples obtained from the first assign process.

Returns:

Losses of all positive samples in single image.

Return type:

Tensor

loss_by_feat(cls_scores: List[Tensor], pts_preds_init: List[Tensor], pts_preds_refine: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None) Dict[str, Tensor][source]

Calculate the loss based on the features extracted by the detection head.

Parameters:
  • cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, of shape (batch_size, num_classes, h, w).

  • pts_preds_init (list[Tensor]) – Points for each scale level, each is a 3D-tensor, of shape (batch_size, h_i * w_i, num_points * 2).

  • pts_preds_refine (list[Tensor]) – Points refined for each scale level, each is a 3D-tensor, of shape (batch_size, h_i * w_i, num_points * 2).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

loss_by_feat_single(pts_pred_init: Tensor, bbox_gt_init: Tensor, bbox_weights_init: Tensor, stride: int, avg_factor_init: int) Tuple[Tensor][source]

Calculate the loss of a single scale level based on the features extracted by the detection head.

Parameters:
  • pts_pred_init (Tensor) – Points of shape (batch_size, h_i * w_i, num_points * 2).

  • bbox_gt_init (Tensor) – BBox regression targets in the init stage of shape (batch_size, h_i * w_i, 8).

  • bbox_weights_init (Tensor) – BBox regression loss weights in the init stage of shape (batch_size, h_i * w_i, 8).

  • stride (int) – Point stride.

  • avg_factor_init (int) – Average factor that is used to average the loss in the init stage.

Returns:

Loss components.

Return type:

Tuple[Tensor]

reassign(pos_losses: Tensor, label: Tensor, label_weight: Tensor, pts_pred_init: Tensor, convex_weight: Tensor, gt_instances: InstanceData, pos_inds: Tensor, pos_gt_inds: Tensor, num_proposals_each_level: List | None = None, num_level: int | None = None) tuple[source]

CFA reassign process.

Parameters:
  • pos_losses (Tensor) – Losses of all positive samples in single image.

  • label (Tensor) – Classification target of each anchor with shape (num_anchors,).

  • label_weight (Tensor) – Classification loss weight of each anchor with shape (num_anchors).

  • pts_pred_init (Tensor)

  • convex_weight (Tensor) – Bbox weight of each anchor with shape (num_anchors, 4).

  • gt_instances (InstanceData) – Ground truth of instance annotations. It usually includes bboxes and labels attributes.

  • pos_inds (Tensor) – Index of all positive samples obtained from the first assign process.

  • pos_gt_inds (Tensor) – Gt_index of all positive samples obtained from the first assign process.

  • num_proposals_each_level (list, optional) – Number of proposals of each level.

  • num_level (int, optional) – Number of level.

Returns:

Usually returns a tuple containing learning targets.

  • label (Tensor): Classification target of each anchor after paa assign, with shape (num_anchors,).

  • label_weight (Tensor): Classification loss weight of each anchor after paa assign, with shape (num_anchors).

  • convex_weight (Tensor): Bbox weight of each anchor with shape (num_anchors, 4).

  • pos_normalize_term (list): Pos normalize term for refine points losses.

Return type:

tuple

class mmrotate.models.dense_heads.H2RBoxHead(num_classes: int, in_channels: int, angle_version: str = 'le90', use_hbbox_loss: bool = False, scale_angle: bool = True, angle_coder: ConfigDict | dict = {'type': 'PseudoAngleCoder'}, h_bbox_coder: ConfigDict | dict = {'type': 'mmdet.DistancePointBBoxCoder'}, bbox_coder: ConfigDict | dict = {'type': 'DistanceAnglePointCoder'}, loss_cls: ConfigDict | dict = {'alpha': 0.25, 'gamma': 2.0, 'loss_weight': 1.0, 'type': 'mmdet.FocalLoss', 'use_sigmoid': True}, loss_bbox: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'RotatedIoULoss'}, loss_centerness: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_angle: ConfigDict | dict | None = None, loss_bbox_ss: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'mmdet.IoULoss'}, rotation_agnostic_classes: list | None = None, weak_supervised: bool = True, square_classes: list | None = None, crop_size: Tuple[int, int] = (768, 768), **kwargs)[source]

Anchor-free head used in H2RBox.

Parameters:
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • angle_version (str) – Angle representations. Defaults to ‘le90’.

  • use_hbbox_loss (bool) – If true, use horizontal bbox loss and loss_angle should not be None. Defaults to False.

  • scale_angle (bool) – If true, add scale to angle pred branch. Defaults to True.

  • angle_coder (ConfigDict or dict) – Config of angle coder.

  • h_bbox_coder (dict) – Config of horizontal bbox coder, only used when use_hbbox_loss is True.

  • bbox_coder (ConfigDict or dict) – Config of bbox coder. Defaults to ‘DistanceAnglePointCoder’.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • loss_centerness (ConfigDict, or dict) – Config of centerness loss.

  • loss_angle (ConfigDict or dict, Optional) – Config of angle loss.

  • loss_bbox_ss (ConfigDict or dict) – Config of consistency loss.

  • rotation_agnostic_classes (list) – Ids of rotation agnostic category.

  • weak_supervised (bool) – If true, horizontal gtbox is input. Defaults to True.

  • square_classes (list) – Ids of the square category.

  • crop_size (tuple[int]) – Crop size from image center. Defaults to (768, 768).

Example

>>> self = H2RBoxHead(11, 7)
>>> feats = [torch.rand(1, 7, s, s) for s in [4, 8, 16, 32, 64]]
>>> cls_score, bbox_pred, angle_pred, centerness = self.forward(feats)
>>> assert len(cls_score) == len(self.scales)
forward_ss(feats: Tuple[Tensor]) Tuple[List[Tensor], List[Tensor]][source]

Forward features from the upstream network.

Parameters:

feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns:

A tuple of each level outputs.

  • bbox_pred (list[Tensor]): Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * 4.

  • angle_pred (list[Tensor]): Box angle for each scale level, each is a 4D-tensor, the channel number is num_points * 1.

Return type:

tuple

forward_ss_single(feats: Tensor, scale: Scale, stride: int) Tuple[Tensor, Tensor][source]

Forward features of a single scale level in SS branch.

Parameters:
  • feats (Tensor) – FPN feature maps of the specified stride.

  • scale (mmcv.cnn.Scale) – Learnable scale module to resize the bbox prediction.

  • stride (int) – The corresponding stride for feature maps, only used to normalize the bbox prediction when self.norm_on_bbox is True.

Returns:

bbox predictions and angle predictions of input feature maps.

Return type:

tuple

loss(x_ws: Tuple[Tensor], x_ss: Tuple[Tensor], rot: float, batch_gt_instances: InstanceData, batch_gt_instances_ignore: InstanceData, batch_img_metas: List[dict]) dict[source]

Perform forward propagation and loss calculation of the detection head on the features of the upstream network.

Parameters:
  • x_ws (tuple[Tensor]) – Features from the weakly supervised network, each is a 4D-tensor.

  • x_ss (tuple[Tensor]) – Features from the self-supervised network, each is a 4D-tensor.

  • rot (float) – Angle of view rotation.

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_gt_instances_ignore (list[InstanceData]) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

Returns:

A dictionary of loss components.

Return type:

dict

loss_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], angle_preds: List[Tensor], centernesses: List[Tensor], bbox_preds_ss: List[Tensor], angle_preds_ss: List[Tensor], rot: float, batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None) Dict[str, Tensor][source]

Calculate the loss based on the features extracted by the detection head.

Parameters:
  • cls_scores (list[Tensor]) – Box scores for each scale level in weakly supervised branch, each is a 4D-tensor, the channel number is num_points * num_classes.

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level in weakly supervised branch, each is a 4D-tensor, the channel number is num_points * 4.

  • angle_preds (list[Tensor]) – Box angle for each scale level in weakly supervised branch, each is a 4D-tensor, the channel number is num_points * encode_size.

  • centernesses (list[Tensor]) – Centerness for each scale level in weakly supervised branch, each is a 4D-tensor, the channel number is num_points * 1.

  • bbox_preds_ss (list[Tensor]) – Box energies / deltas for each scale level in self-supervised branch, each is a 4D-tensor, the channel number is num_points * 4.

  • angle_preds_ss (list[Tensor]) – Box angle for each scale level in self-supervised branch, each is a 4D-tensor, the channel number is num_points * encode_size.

  • rot (float) – Angle of view rotation.

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

predict_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], angle_preds: List[Tensor], score_factors: List[Tensor] | None = None, batch_img_metas: List[dict] | None = None, cfg: ConfigDict | None = None, rescale: bool = False, with_nms: bool = True) List[InstanceData][source]

Transform a batch of output features extracted from the head into bbox results.

Note: When score_factors is not None, the cls_scores are usually multiplied by it then obtain the real score used in NMS, such as CenterNess in FCOS, IoU branch in ATSS.

Parameters:
  • cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).

  • angle_preds (list[Tensor]) – Box angle for each scale level with shape (N, num_points * encode_size, H, W)

  • score_factors (list[Tensor], optional) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, num_priors * 1, H, W). Defaults to None.

  • batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.

  • cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.

  • rescale (bool) – If True, return boxes in original image space. Defaults to False.

  • with_nms (bool) – If True, do nms before return boxes. Defaults to True.

Returns:

Object detection results of each image after the post process. Each item usually contains following keys.

  • scores (Tensor): Classification scores, has a shape (num_instances, ).

  • labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

  • bboxes (Tensor): Has a shape (num_instances, 5), the last dimension 5 arrange as (x, y, w, h, t).

Return type:

list[InstanceData]
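The note on score_factors can be illustrated with a minimal, framework-free sketch. It assumes that both branches output raw logits and that the final score is the plain product of the sigmoid-activated values; the helper name fuse_scores is hypothetical and is not part of mmrotate.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_scores(cls_logits, factor_logits):
    """Multiply per-location class scores by a per-location score factor.

    cls_logits: list of per-location lists of class logits.
    factor_logits: list of per-location scalars (e.g. centerness logits).
    Hypothetical helper for illustration only.
    """
    fused = []
    for logits, factor in zip(cls_logits, factor_logits):
        f = sigmoid(factor)
        fused.append([sigmoid(logit) * f for logit in logits])
    return fused


# Two locations with two classes each.
scores = fuse_scores([[0.0, 2.0], [1.0, -1.0]], [0.0, 3.0])
```

A location with a low score factor has all of its class scores scaled down, so it is less likely to survive score thresholding and NMS.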

class mmrotate.models.dense_heads.H2RBoxV2Head(num_classes: int, in_channels: int, angle_version: str = 'le90', use_hbbox_loss: bool = False, scale_angle: bool = False, angle_coder: ConfigDict | dict = {'type': 'PseudoAngleCoder'}, h_bbox_coder: ConfigDict | dict = {'type': 'mmdet.DistancePointBBoxCoder'}, bbox_coder: ConfigDict | dict = {'type': 'DistanceAnglePointCoder'}, loss_cls: ConfigDict | dict = {'alpha': 0.25, 'gamma': 2.0, 'loss_weight': 1.0, 'type': 'mmdet.FocalLoss', 'use_sigmoid': True}, loss_bbox: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'RotatedIoULoss'}, loss_centerness: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_angle: ConfigDict | dict | None = None, loss_symmetry_ss: ConfigDict | dict = {'type': 'H2RBoxV2ConsistencyLoss'}, rotation_agnostic_classes: list | None = None, agnostic_resize_classes: list | None = None, use_circumiou_loss=True, use_standalone_angle=True, use_reweighted_loss_bbox=False, **kwargs)[source]

Anchor-free head used in H2RBox-v2 (https://arxiv.org/abs/2304.04403).

Parameters:
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • angle_version (str) – Angle representations. Defaults to ‘le90’.

  • use_hbbox_loss (bool) – If true, use horizontal bbox loss and loss_angle should not be None. Defaults to False.

  • scale_angle (bool) – If true, add scale to angle pred branch. Defaults to False.

  • angle_coder (ConfigDict or dict) – Config of angle coder.

  • h_bbox_coder (dict) – Config of horizontal bbox coder, only used when use_hbbox_loss is True.

  • bbox_coder (ConfigDict or dict) – Config of bbox coder. Defaults to ‘DistanceAnglePointCoder’.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • loss_centerness (ConfigDict, or dict) – Config of centerness loss.

  • loss_angle (ConfigDict or dict, Optional) – Config of angle loss.

  • loss_symmetry_ss (ConfigDict or dict) – Config of consistency loss.

  • rotation_agnostic_classes (list) – Ids of rotation agnostic category.

  • weak_supervised (bool) – If true, horizontal gtbox is input. Defaults to True.

  • square_classes (list) – Ids of the square category.

  • crop_size (tuple[int]) – Crop size from image center. Defaults to (768, 768).

Example

>>> self = H2RBoxV2Head(11, 7)
>>> feats = [torch.rand(1, 7, s, s) for s in [4, 8, 16, 32, 64]]
>>> cls_score, bbox_pred, angle_pred, centerness = self.forward(feats)
>>> assert len(cls_score) == len(self.scales)
get_targets(points: List[Tensor], batch_gt_instances: List[InstanceData]) Tuple[List[Tensor], List[Tensor], List[Tensor]][source]

Compute regression, classification and centerness targets for points in multiple images.

Parameters:
  • points (list[Tensor]) – Points of each fpn level, each has shape (num_points, 2).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

Returns:

Targets of each level.

  • concat_lvl_labels (list[Tensor]): Labels of each level.

  • concat_lvl_bbox_targets (list[Tensor]): BBox targets of each level.

  • concat_lvl_angle_targets (list[Tensor]): Angle targets of each level.

Return type:

tuple

loss_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], angle_preds: List[Tensor], centernesses: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None) Dict[str, Tensor][source]

Calculate the loss based on the features extracted by the detection head.

Parameters:
  • cls_scores (list[Tensor]) – Box scores for each scale level in weakly supervised branch, each is a 4D-tensor, the channel number is num_points * num_classes.

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level in weakly supervised branch, each is a 4D-tensor, the channel number is num_points * 4.

  • angle_preds (list[Tensor]) – Box angle for each scale level in weakly supervised branch, each is a 4D-tensor, the channel number is num_points * encode_size.

  • centernesses (list[Tensor]) – Centerness for each scale level in weakly supervised branch, each is a 4D-tensor, the channel number is num_points * 1.

  • bbox_preds_ss (list[Tensor]) – Box energies / deltas for each scale level in self-supervised branch, each is a 4D-tensor, the channel number is num_points * 4.

  • angle_preds_ss (list[Tensor]) – Box angle for each scale level in self-supervised branch, each is a 4D-tensor, the channel number is num_points * encode_size.

  • rot (float) – Angle of view rotation.

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

class mmrotate.models.dense_heads.OrientedRPNHead(in_channels: int, num_classes: int = 1, init_cfg: ConfigDict | dict | List[ConfigDict | dict] = {'layer': 'Conv2d', 'std': 0.01, 'type': 'Normal'}, num_convs: int = 1, **kwargs)[source]

Oriented RPN head for Oriented R-CNN.

class mmrotate.models.dense_heads.OrientedRepPointsHead(*args, loss_spatial_init: ConfigDict | dict = {'loss_weight': 0.05, 'type': 'SpatialBorderLoss'}, loss_spatial_refine: ConfigDict | dict = {'loss_weight': 0.1, 'type': 'SpatialBorderLoss'}, top_ratio: float = 0.4, init_qua_weight: float = 0.2, ori_qua_weight: float = 0.3, poc_qua_weight: float = 0.1, **kwargs)[source]

Oriented RepPoints head (https://arxiv.org/pdf/2105.11111v4.pdf). The head contains initial and refined stages based on RepPoints. The initial stage regresses coarse point sets, and the refine stage further regresses the fine point sets. The APAA scheme based on the quality of point set samples in the paper is employed in refined stage.

Parameters:
  • loss_spatial_init (ConfigDict or dict) – Config of initial spatial loss.

  • loss_spatial_refine (ConfigDict or dict) – Config of refine spatial loss.

  • top_ratio (float) – Ratio of top high-quality point sets. Defaults to 0.4.

  • init_qua_weight (float) – Quality weight of initial stage. Defaults to 0.2.

  • ori_qua_weight (float) – Orientation quality weight. Defaults to 0.3.

  • poc_qua_weight (float) – Point-wise correlation quality weight. Defaults to 0.1.
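As a hedged illustration, the parameters above could be written as a config fragment like the one below. The keyword values are the documented defaults; num_classes and in_channels are assumed example values that would be forwarded to the parent head, and the variable name bbox_head is only a convention.

```python
# Hypothetical config fragment. Only the keyword defaults come from the
# documented signature; num_classes and in_channels are assumed values.
bbox_head = dict(
    type='OrientedRepPointsHead',
    num_classes=15,   # assumption: a 15-class dataset
    in_channels=256,  # assumption: a typical FPN channel count
    loss_spatial_init=dict(type='SpatialBorderLoss', loss_weight=0.05),
    loss_spatial_refine=dict(type='SpatialBorderLoss', loss_weight=0.1),
    top_ratio=0.4,        # ratio of top high-quality point sets
    init_qua_weight=0.2,  # quality weight of the initial stage
    ori_qua_weight=0.3,   # orientation quality weight
    poc_qua_weight=0.1,   # point-wise correlation quality weight
)
```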

dynamic_pointset_samples_selection(quality: Tensor, label: Tensor, label_weight: Tensor, bbox_weight: Tensor, pos_inds: Tensor, pos_gt_inds: Tensor, num_proposals_each_level: List[int] | None = None, num_level: int | None = None) tuple[source]

The dynamic top k selection of point set samples based on the quality assessment values.

Parameters:
  • quality (Tensor) – the quality values of positive point set samples

  • label (Tensor) – gt label with shape (N)

  • label_weight (Tensor) – label weight with shape (N)

  • bbox_weight (Tensor) – box weight with shape (N)

  • pos_inds (Tensor) – the inds of positive point set samples

  • pos_gt_inds (Tensor) – the inds of positive ground truth

  • num_proposals_each_level (list[int]) – proposals number of each level

  • num_level (int) – the level number

Returns:

  • label: gt label with shape (N)

  • label_weight: label weight with shape (N)

  • bbox_weight: box weight with shape (N)

  • num_pos (int): the number of selected positive point samples with high-quality

  • pos_normalize_term (Tensor): the corresponding positive normalize term

Return type:

tuple
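The selection idea can be sketched without tensors. The sketch assumes that a lower quality value means a better sample (a loss-like score), that samples are grouped only by their matched ground truth (per-level grouping is ignored), and that the number kept per ground truth is ceil(n * top_ratio). The function name select_top_ratio is hypothetical.

```python
import math


def select_top_ratio(quality, pos_gt_inds, top_ratio=0.4):
    """Keep, per ground-truth box, the best ceil(n * top_ratio) positives.

    quality: list of floats, one per positive sample (assumed: lower is better).
    pos_gt_inds: list of ints, index of the matched ground truth per sample.
    Returns the sorted positions (into `quality`) of the kept samples.
    """
    groups = {}
    for pos, gt in enumerate(pos_gt_inds):
        groups.setdefault(gt, []).append(pos)

    kept = []
    for members in groups.values():
        k = math.ceil(len(members) * top_ratio)
        # Sort this box's samples by quality and keep the k best ones.
        ranked = sorted(members, key=lambda p: quality[p])
        kept.extend(ranked[:k])
    return sorted(kept)


quality = [0.9, 0.2, 0.5, 0.1, 0.3, 0.8]
pos_gt_inds = [0, 0, 0, 0, 1, 1]
kept = select_top_ratio(quality, pos_gt_inds)
```

Samples that are not kept would have their label and weights reset so that they no longer count as positives.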

feature_cosine_similarity(points_features: Tensor) Tensor[source]

Compute the points features similarity for points-wise correlation.

Parameters:

points_features (Tensor) – sampling point feature with shape (N_pointsets, N_points, C)

Returns:

max_correlation (Tensor): max feature similarity in each point set with shape (N_points_set, N_points, C)

Return type:

Tensor
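A minimal sketch of a point-wise cosine similarity follows. It assumes one plausible formulation, comparing each point feature with the mean feature of its point set; the exact reduction used by the head may differ, and both function names are hypothetical.

```python
import math


def cosine(u, v, eps=1e-12):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def similarity_to_mean(point_set):
    """Cosine similarity of each point feature to the set's mean feature.

    point_set: list of N_points feature vectors, each of length C.
    """
    n = len(point_set)
    mean = [sum(col) / n for col in zip(*point_set)]
    return [cosine(feat, mean) for feat in point_set]


# One point set with three 2-D features; the last one points elsewhere.
sims = similarity_to_mean([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

The outlying third feature gets the lowest similarity, which is the kind of signal a point-wise correlation term can penalize.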

forward_single(x: Tensor) Tuple[Tensor][source]

Forward feature map of a single FPN level.

get_adaptive_points_feature(features: Tensor, pt_locations: Tensor, stride: int) Tensor[source]

Get the points features from the locations of predicted points.

Parameters:
  • features (Tensor) – base feature with shape (B,C,W,H)

  • pt_locations (Tensor) – locations of points in each point set with shape (B, N_points_set(number of point set), N_points(number of points in each point set) *2)

  • stride (int) – points stride

Returns:

sampling features with (B, C, N_points_set, N_points)

Return type:

Tensor

get_targets(proposals_list: List[Tensor], valid_flag_list: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None, stage: str = 'init', unmap_outputs: bool = True) tuple[source]

Compute corresponding GT box and classification targets for proposals.

Parameters:
  • proposals_list (list[Tensor]) – Multi level points/bboxes of each image.

  • valid_flag_list (list[Tensor]) – Multi level valid flags of each image.

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

  • stage (str) – ‘init’ or ‘refine’. Generate target for init stage or refine stage.

  • unmap_outputs (bool) – Whether to map outputs back to the original set of anchors.

Returns:

  • labels_list (list[Tensor]): Labels of each level.

  • label_weights_list (list[Tensor]): Label weights of each level.

  • bbox_gt_list (list[Tensor]): Ground truth bbox of each level.

  • proposals_list (list[Tensor]): Proposals(points/bboxes) of each level.

  • proposal_weights_list (list[Tensor]): Proposal weights of each level.

  • avg_factor (int): Average factor that is used to average the loss. When using sampling method, avg_factor is usually the sum of positive and negative priors. When using PseudoSampler, avg_factor is usually equal to the number of positive priors.

Return type:

tuple

init_loss_single(pts_pred_init: Tensor, bbox_gt_init: Tensor, bbox_weights_init: Tensor, stride: int) Tuple[Tensor, Tensor][source]

Single initial stage loss function.

Parameters:
  • pts_pred_init (Tensor) – Initial point sets prediction with shape (N, 9*2)

  • bbox_gt_init (Tensor) – BBox regression targets in the init stage of shape (batch_size, h_i * w_i, 8).

  • bbox_weights_init (Tensor) – BBox regression loss weights in the init stage of shape (batch_size, h_i * w_i, 8).

  • stride (int) – Point stride.

Returns:

  • loss_pts_init (Tensor): Initial bbox loss.

  • loss_border_init (Tensor): Initial spatial border loss.

Return type:

tuple

loss_by_feat(cls_scores: List[Tensor], pts_preds_init: List[Tensor], pts_preds_refine: List[Tensor], base_feat: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None) Dict[str, Tensor][source]

Calculate the loss based on the features extracted by the detection head.

Parameters:
  • cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, of shape (batch_size, num_classes, h, w).

  • pts_preds_init (list[Tensor]) – Points for each scale level, each is a 3D-tensor, of shape (batch_size, h_i * w_i, num_points * 2).

  • pts_preds_refine (list[Tensor]) – Points refined for each scale level, each is a 3D-tensor, of shape (batch_size, h_i * w_i, num_points * 2).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

loss_by_feat_single(cls_score: Tensor, pts_pred_init: Tensor, pts_pred_refine: Tensor, labels: Tensor, label_weights, bbox_gt_init: Tensor, bbox_weights_init: Tensor, bbox_gt_refine: Tensor, bbox_weights_refine: Tensor, stride: int, avg_factor_init: int, avg_factor_refine: int) Tuple[Tensor][source]

Calculate the loss of a single scale level based on the features extracted by the detection head.

Parameters:
  • cls_score (Tensor) – Box scores for each scale level, with shape (N, num_classes, h_i, w_i).

  • pts_pred_init (Tensor) – Points of shape (batch_size, h_i * w_i, num_points * 2).

  • pts_pred_refine (Tensor) – Points refined of shape (batch_size, h_i * w_i, num_points * 2).

  • labels (Tensor) – Ground truth class indices with shape (batch_size, h_i * w_i).

  • label_weights (Tensor) – Label weights of shape (batch_size, h_i * w_i).

  • bbox_gt_init (Tensor) – BBox regression targets in the init stage of shape (batch_size, h_i * w_i, 8).

  • bbox_weights_init (Tensor) – BBox regression loss weights in the init stage of shape (batch_size, h_i * w_i, 8).

  • bbox_gt_refine (Tensor) – BBox regression targets in the refine stage of shape (batch_size, h_i * w_i, 8).

  • bbox_weights_refine (Tensor) – BBox regression loss weights in the refine stage of shape (batch_size, h_i * w_i, 8).

  • stride (int) – Point stride.

  • avg_factor_init (int) – Average factor that is used to average the loss in the init stage.

  • avg_factor_refine (int) – Average factor that is used to average the loss in the refine stage.

Returns:

loss components.

Return type:

Tuple[Tensor]

pointsets_quality_assessment(pts_features: Tensor, cls_score: Tensor, pts_pred_init: Tensor, pts_pred_refine: Tensor, label: Tensor, bbox_gt: Tensor, label_weight: Tensor, bbox_weight: Tensor, pos_inds: Tensor) Tensor[source]

Assess the quality of each point set from the classification, localization, orientation, and point-wise correlation based on the assigned point sets samples.

Parameters:
  • pts_features (Tensor) – points features with shape (N, 9, C)

  • cls_score (Tensor) – classification scores with shape (N, class_num)

  • pts_pred_init (Tensor) – initial point sets prediction with shape (N, 9*2)

  • pts_pred_refine (Tensor) – refined point sets prediction with shape (N, 9*2)

  • label (Tensor) – gt label with shape (N)

  • bbox_gt (Tensor) – gt bbox of polygon with shape (N, 8)

  • label_weight (Tensor) – label weight with shape (N)

  • bbox_weight (Tensor) – box weight with shape (N)

  • pos_inds (Tensor) – the inds of positive point set samples

Returns:

qua (Tensor): weighted quality values for positive point set samples.

Return type:

Tensor

sampling_points(polygons: Tensor, points_num: int, device: str) Tensor[source]

Sample edge points for polygon.

Parameters:
  • polygons (Tensor) – polygons with shape (N, 8)

  • points_num (int) – number of sampling points for each polygon edge. 10 by default.

  • device (str) – The device the tensor will be put on. Defaults to cuda.

Returns:

sampling_points (Tensor): sampling points with shape (N, points_num*4, 2)

Return type:

Tensor
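The edge sampling can be sketched for a single polygon in plain Python. The sketch assumes linear interpolation along each edge with both endpoints included, so corner points appear twice; the function name sample_edge_points is hypothetical.

```python
def sample_edge_points(polygon, points_num):
    """Sample `points_num` evenly spaced points on each of the four edges.

    polygon: 8 floats (x1, y1, x2, y2, x3, y3, x4, y4), corners in order.
    Returns a list of points_num * 4 (x, y) tuples.
    Assumption: both edge endpoints are included, so corners appear twice.
    """
    if points_num < 2:
        raise ValueError("points_num must be at least 2")
    corners = [(polygon[i], polygon[i + 1]) for i in range(0, 8, 2)]
    points = []
    for i, (x0, y0) in enumerate(corners):
        x1, y1 = corners[(i + 1) % 4]  # the last edge closes the polygon
        for j in range(points_num):
            t = j / (points_num - 1)  # interpolation ratio in [0, 1]
            points.append((x0 + (x1 - x0) * t, y0 + (y1 - y0) * t))
    return points


square = [0.0, 0.0, 4.0, 0.0, 4.0, 4.0, 0.0, 4.0]
pts = sample_edge_points(square, points_num=3)
```

For N polygons the same procedure yields the documented (N, points_num*4, 2) layout.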

class mmrotate.models.dense_heads.R3Head(*args, loss_bbox_type: str = 'normal', **kwargs)[source]

An anchor-based head used in R3Det.

filter_bboxes(cls_scores: List[Tensor], bbox_preds: List[Tensor]) List[List[Tensor]][source]

Filter predicted bounding boxes at each position of the feature maps. Only the bounding box with the highest score is kept at each position. This filter is used in R3Det prior to the first feature refinement stage.

Parameters:
  • cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_anchors * num_classes, H, W)

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_anchors * 5, H, W)

Returns:

best or refined rbboxes of each level of each image.

Return type:

list[list[Tensor]]
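The per-position filtering can be sketched with nested lists. The sketch assumes the class dimension has already been reduced to one score per anchor (for example by taking the maximum over classes) and that boxes are already decoded; the function name keep_best_anchor is hypothetical.

```python
def keep_best_anchor(scores, boxes):
    """For each position, keep the box of the anchor with the highest score.

    scores: scores[position][anchor] -> best class score of that anchor.
    boxes: boxes[position][anchor] -> (cx, cy, w, h, t) rotated box.
    Returns one box per position.
    """
    best = []
    for pos_scores, pos_boxes in zip(scores, boxes):
        idx = max(range(len(pos_scores)), key=pos_scores.__getitem__)
        best.append(pos_boxes[idx])
    return best


scores = [[0.1, 0.7, 0.3], [0.9, 0.2, 0.4]]
boxes = [
    [(0, 0, 4, 2, 0.0), (0, 0, 6, 3, 0.5), (0, 0, 8, 4, 1.0)],
    [(8, 0, 4, 2, 0.0), (8, 0, 6, 3, 0.5), (8, 0, 8, 4, 1.0)],
]
best_boxes = keep_best_anchor(scores, boxes)
```

After this step each position carries a single rotated box, which is what a feature refinement stage needs as input.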

class mmrotate.models.dense_heads.R3RefineHead(num_classes: int, in_channels: int, frm_cfg: dict | None = None, **kwargs)[source]

An anchor-based head used in R3Det.

Parameters:
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • frm_cfg (dict) – Config of the feature refine module.

feature_refine(x: List[Tensor], rois: List[List[Tensor]]) List[Tensor][source]

Refine the input features using the feature refine module.

Parameters:
  • x (list[Tensor]) – feature maps of multiple scales.

  • rois (list[list[Tensor]]) – input rbboxes of multiple scales of multiple images, output by former stages and are to be refined.

Returns:

refined feature maps of multiple scales.

Return type:

list[Tensor]

get_anchors(featmap_sizes: List[tuple], batch_img_metas: List[dict], device: device | str = 'cuda') Tuple[List[List[Tensor]], List[List[Tensor]]][source]

Get anchors according to feature map sizes.

Parameters:
  • featmap_sizes (list[tuple]) – Multi-level feature map sizes.

  • batch_img_metas (list[dict]) – Image meta info.

  • device (torch.device | str) – Device for returned tensors. Defaults to cuda.

Returns:

  • anchor_list (list[list[Tensor]]): Anchors of each image.

  • valid_flag_list (list[list[Tensor]]): Valid flags of each image.

Return type:

tuple

loss_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None, rois: List[Tensor] | None = None) dict[source]

Calculate the loss based on the features extracted by the detection head.

Parameters:
  • cls_scores (list[Tensor]) – Box scores for each scale level has shape (N, num_anchors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

  • rois (list[Tensor])

Returns:

A dictionary of loss components.

Return type:

dict

predict_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], score_factors: List[Tensor] | None = None, rois: List[Tensor] | None = None, batch_img_metas: List[dict] | None = None, cfg: ConfigDict | None = None, rescale: bool = False, with_nms: bool = True) List[InstanceData][source]

Transform a batch of output features extracted from the head into bbox results.

Note: When score_factors is not None, the cls_scores are usually multiplied by it then obtain the real score used in NMS, such as CenterNess in FCOS, IoU branch in ATSS.

Parameters:
  • cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).

  • score_factors (list[Tensor], optional) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, num_priors * 1, H, W). Defaults to None.

  • rois (list[Tensor])

  • batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.

  • cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.

  • rescale (bool) – If True, return boxes in original image space. Defaults to False.

  • with_nms (bool) – If True, do nms before return boxes. Defaults to True.

Returns:

Object detection results of each image after the post process. Each item usually contains following keys.

  • scores (Tensor): Classification scores, has a shape (num_instances, )

  • labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

  • bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arrange as (x1, y1, x2, y2).

Return type:

list[InstanceData]

refine_bboxes(cls_scores: List[Tensor], bbox_preds: List[Tensor], rois: List[List[Tensor]]) List[List[Tensor]][source]

Refine predicted bounding boxes at each position of the feature maps. This method will be used in R3Det in refinement stages.

Parameters:
  • cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_classes, H, W)

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, 5, H, W)

  • rois (list[list[Tensor]]) – input rbboxes of each level of each image. rois output by former stages and are to be refined

Returns:

best or refined rbboxes of each level of each image.

Return type:

list[list[Tensor]]

class mmrotate.models.dense_heads.RotatedATSSHead(num_classes: int, in_channels: int, pred_kernel_size: int = 3, stacked_convs: int = 4, conv_cfg: ConfigDict | dict | None = None, norm_cfg: ConfigDict | dict = {'num_groups': 32, 'requires_grad': True, 'type': 'GN'}, reg_decoded_bbox: bool = True, loss_centerness: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, init_cfg: ConfigDict | dict | List[ConfigDict | dict] = {'layer': 'Conv2d', 'override': {'bias_prob': 0.01, 'name': 'atss_cls', 'std': 0.01, 'type': 'Normal'}, 'std': 0.01, 'type': 'Normal'}, **kwargs)[source]

Detection Head of ATSS.

ATSS head structure is similar to FCOS; however, ATSS uses anchor boxes and assigns labels by Adaptive Training Sample Selection instead of max-IoU.

Parameters:
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • pred_kernel_size (int) – Kernel size of nn.Conv2d

  • stacked_convs (int) – Number of stacking convs of the head.

  • conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.

  • norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='GN', num_groups=32, requires_grad=True).

  • reg_decoded_bbox (bool) – If true, the regression loss would be applied directly on decoded bounding boxes, converting both the predicted boxes and regression targets to absolute coordinates format. Defaults to True. It should be True when using IoULoss, GIoULoss, or DIoULoss in the bbox head.

  • loss_centerness (ConfigDict or dict) – Config of centerness loss. Defaults to dict(type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0).

  • init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict]) – Initialization config dict.

centerness_target(anchors: Tensor, gts: Tensor) Tensor[source]

Calculate the centerness between anchors and gts.

Only calculate pos centerness targets, otherwise there may be nan.

Parameters:
  • anchors (Tensor) – Anchors with shape (N, 5), <cx, cy, w, h, t> format.

  • gts (Tensor) – Ground truth bboxes with shape (N, 5), <cx, cy, w, h, t> format.

Returns:

Centerness between anchors and gts.

Return type:

Tensor
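A hedged sketch of a rotated centerness for a single anchor and ground truth pair follows. It assumes the FCOS-style formula sqrt(min(l, r) / max(l, r) * min(t, b) / max(t, b)), evaluated after expressing the anchor center in the ground truth box's rotated frame, with angles in radians. The function name rotated_centerness is hypothetical.

```python
import math


def rotated_centerness(anchor, gt):
    """Centerness of an anchor center inside a rotated ground-truth box.

    anchor, gt: (cx, cy, w, h, t) with t in radians.
    Assumes the anchor center lies strictly inside the gt box, which matches
    the note that only positive samples should be evaluated.
    """
    gx, gy, gw, gh, gt_angle = gt
    dx, dy = anchor[0] - gx, anchor[1] - gy
    cos_t, sin_t = math.cos(gt_angle), math.sin(gt_angle)
    # Express the offset in the gt box's own (rotated) coordinate frame.
    off_x = cos_t * dx + sin_t * dy
    off_y = -sin_t * dx + cos_t * dy
    left, right = gw / 2 + off_x, gw / 2 - off_x
    top, bottom = gh / 2 + off_y, gh / 2 - off_y
    ratio_x = min(left, right) / max(left, right)
    ratio_y = min(top, bottom) / max(top, bottom)
    return math.sqrt(ratio_x * ratio_y)


gt = (10.0, 10.0, 8.0, 4.0, 0.0)
centered = rotated_centerness((10.0, 10.0, 1.0, 1.0, 0.0), gt)
shifted = rotated_centerness((12.0, 10.0, 1.0, 1.0, 0.0), gt)
```

For an anchor center outside the box one of the distances becomes negative and the square root is undefined, which is why only positive samples are evaluated.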

get_targets(anchor_list: List[List[Tensor]], valid_flag_list: List[List[Tensor]], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None, unmap_outputs: bool = True) tuple[source]

Get targets for ATSS head.

This method is almost the same as AnchorHead.get_targets(). Besides returning the targets as the parent method does, it also returns the anchors as the first element of the returned tuple.

loss_by_feat_single(anchors: Tensor, cls_score: Tensor, bbox_pred: Tensor, centerness: Tensor, labels: Tensor, label_weights: Tensor, bbox_targets: Tensor, avg_factor: float) dict[source]

Calculate the loss of a single scale level based on the features extracted by the detection head.

Parameters:
  • cls_score (Tensor) – Box scores for each scale level. Has shape (N, num_anchors * num_classes, H, W).

  • bbox_pred (Tensor) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).

  • anchors (Tensor) – Box reference for each scale level with shape (N, num_total_anchors, 4).

  • labels (Tensor) – Labels of each anchor with shape (N, num_total_anchors).

  • label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors).

  • bbox_targets (Tensor) – BBox regression targets of each anchor with shape (N, num_total_anchors, 4).

  • avg_factor (float) – Average factor that is used to average the loss. When using sampling method, avg_factor is usually the sum of positive and negative priors. When using PseudoSampler, avg_factor is usually equal to the number of positive priors.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]
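
The role of avg_factor can be sketched as follows. Both helpers are hypothetical, and the convention that a label equal to num_classes marks background is an assumption, not something stated in this reference.

```python
def average_loss(per_prior_losses, avg_factor):
    """Sum the per-prior losses and normalise by avg_factor."""
    # Guard against division by zero when no prior contributes.
    return sum(per_prior_losses) / max(avg_factor, 1e-6)


def positive_prior_count(labels, num_classes):
    """avg_factor for a PseudoSampler-style setup: the number of positive priors.

    Assumes labels in [0, num_classes) are foreground and
    label == num_classes is background.
    """
    return sum(1 for label in labels if 0 <= label < num_classes)
```

With labels [0, 2, 3, 3] and num_classes=3 there are two positive priors, so the summed loss would be divided by 2.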

class mmrotate.models.dense_heads.RotatedFCOSHead(num_classes: int, in_channels: int, angle_version: str = 'le90', use_hbbox_loss: bool = False, scale_angle: bool = True, angle_coder: ConfigDict | dict = {'type': 'PseudoAngleCoder'}, h_bbox_coder: ConfigDict | dict = {'type': 'mmdet.DistancePointBBoxCoder'}, bbox_coder: ConfigDict | dict = {'type': 'DistanceAnglePointCoder'}, loss_cls: ConfigDict | dict = {'alpha': 0.25, 'gamma': 2.0, 'loss_weight': 1.0, 'type': 'mmdet.FocalLoss', 'use_sigmoid': True}, loss_bbox: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'RotatedIoULoss'}, loss_centerness: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_angle: ConfigDict | dict | None = None, **kwargs)[source]

Anchor-free head used in FCOS.

Compared with the FCOS head, the Rotated FCOS head adds an angle branch to support rotated object detection.

Parameters:
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • angle_version (str) – Angle representations. Defaults to ‘le90’.

  • use_hbbox_loss (bool) – If true, use horizontal bbox loss and loss_angle should not be None. Defaults to False.

  • scale_angle (bool) – If true, add scale to angle pred branch. Defaults to True.

  • angle_coder (ConfigDict or dict) – Config of angle coder.

  • h_bbox_coder (dict) – Config of horizontal bbox coder, only used when use_hbbox_loss is True.

  • bbox_coder (ConfigDict or dict) – Config of bbox coder. Defaults to ‘DistanceAnglePointCoder’.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • loss_centerness (ConfigDict or dict) – Config of centerness loss.

  • loss_angle (ConfigDict or dict, Optional) – Config of angle loss.
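
For orientation, the sketch below wraps an angle into the range that the 'le90' representation is commonly understood to use, namely [-pi/2, pi/2) with the box width taken as the long edge. That reading of 'le90' and the helper name `wrap_angle_le90` are assumptions for illustration only.

```python
import math


def wrap_angle_le90(angle):
    """Hypothetical helper: wrap an angle in radians into [-pi/2, pi/2)."""
    # Shift by pi/2, wrap with period pi, then shift back.
    return (angle + math.pi / 2) % math.pi - math.pi / 2
```

Because a rectangle is unchanged by a rotation of pi, wrapping with period pi leaves the described box the same while keeping the angle in a single canonical interval.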

Example

>>> self = RotatedFCOSHead(11, 7)
>>> feats = [torch.rand(1, 7, s, s) for s in [4, 8, 16, 32, 64]]
>>> cls_score, bbox_pred, angle_pred, centerness = self.forward(feats)
>>> assert len(cls_score) == len(self.scales)

forward_single(x: Tensor, scale: Scale, stride: int) Tuple[Tensor, Tensor, Tensor, Tensor][source]

Forward features of a single scale level.

Parameters:
  • x (Tensor) – FPN feature maps of the specified stride.

  • scale (mmcv.cnn.Scale) – Learnable scale module to resize the bbox prediction.

  • stride (int) – The corresponding stride for feature maps, only used to normalize the bbox prediction when self.norm_on_bbox is True.

Returns:

Scores for each class, bbox predictions, angle predictions, and centerness predictions of the input feature maps.

Return type:

tuple

get_targets(points: List[Tensor], batch_gt_instances: List[InstanceData]) Tuple[List[Tensor], List[Tensor], List[Tensor]][source]

Compute regression, classification and centerness targets for points in multiple images.

Parameters:
  • points (list[Tensor]) – Points of each FPN level, each has shape (num_points, 2).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

Returns:

Targets of each level.

  • concat_lvl_labels (list[Tensor]): Labels of each level.

  • concat_lvl_bbox_targets (list[Tensor]): BBox targets of each level.

  • concat_lvl_angle_targets (list[Tensor]): Angle targets of each level.

Return type:

tuple
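
The per-level points that this method consumes, each of shape (num_points, 2), can be pictured as the image-space centres of the feature-map cells. The sketch below builds such a grid for one level; the 0.5 cell offset, the row-major ordering, and the helper name `grid_points` are assumptions, not taken from this reference.

```python
def grid_points(featmap_h, featmap_w, stride, offset=0.5):
    """Hypothetical helper: (x, y) image coordinates of every cell centre.

    Points are listed row by row, so the result has
    featmap_h * featmap_w entries of length 2.
    """
    return [((col + offset) * stride, (row + offset) * stride)
            for row in range(featmap_h)
            for col in range(featmap_w)]
```

For a 2 x 3 feature map with stride 8 this gives six points, starting at (4.0, 4.0) and ending at (20.0, 12.0).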

loss_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], angle_preds: List[Tensor], centernesses: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None) Dict[str, Tensor][source]

Calculate the loss based on the features extracted by the detection head.

Parameters:
  • cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * 4.

  • angle_preds (list[Tensor]) – Box angle for each scale level, each is a 4D-tensor, the channel number is num_points * encode_size.

  • centernesses (list[Tensor]) – centerness for each scale level, each is a 4D-tensor, the channel number is num_points * 1.

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

predict_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], angle_preds: List[Tensor], score_factors: List[Tensor] | None = None, batch_img_metas: List[dict] | None = None, cfg: ConfigDict | None = None, rescale: bool = False, with_nms: bool = True) List[InstanceData][source]

Transform a batch of output features extracted from the head into bbox results.

Note: When score_factors is not None, the cls_scores are usually multiplied by it to obtain the real score used in NMS, such as CenterNess in FCOS, IoU branch in ATSS.

Parameters:
  • cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).

  • angle_preds (list[Tensor]) – Box angle for each scale level with shape (N, num_points * encode_size, H, W).

  • score_factors (list[Tensor], optional) – Score factor for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 1, H, W). Defaults to None.

  • batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.

  • cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.

  • rescale (bool) – If True, return boxes in original image space. Defaults to False.

  • with_nms (bool) – If True, apply NMS before returning boxes. Defaults to True.

Returns:

Object detection results of each image after post-processing. Each item usually contains the following keys.

  • scores (Tensor): Classification scores, has a shape (num_instances, ).

  • labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

  • bboxes (Tensor): Has a shape (num_instances, 5), the last dimension of 5 is arranged as (x, y, w, h, t).

Return type:

list[InstanceData]
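
The note on score_factors can be sketched for a single location as follows. The sketch assumes both the class logits and the centerness logit pass through a sigmoid before being multiplied; `fuse_scores` is a hypothetical helper, not part of the library.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_scores(cls_logits, score_factor_logit):
    """Hypothetical helper: per-class scores used for ranking in NMS.

    Each class probability is multiplied by the score factor
    (for example centerness), which down-weights low-quality locations.
    """
    factor = sigmoid(score_factor_logit)
    return [sigmoid(logit) * factor for logit in cls_logits]
```

A location with a confident class score but low centerness therefore ranks below an equally confident location near the object centre.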

class mmrotate.models.dense_heads.RotatedRTMDetHead(num_classes: int, in_channels: int, angle_version: str = 'le90', use_hbbox_loss: bool = False, scale_angle: bool = True, angle_coder: ConfigDict | dict = {'type': 'PseudoAngleCoder'}, loss_angle: ConfigDict | dict | None = None, **kwargs)[source]

Detection Head of Rotated RTMDet.

Parameters:
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • angle_version (str) – Angle representations. Defaults to ‘le90’.

  • use_hbbox_loss (bool) – If true, use horizontal bbox loss and loss_angle should not be None. Defaults to False.

  • scale_angle (bool) – If true, add scale to angle pred branch. Defaults to True.

  • angle_coder (ConfigDict or dict) – Config of angle coder.

  • loss_angle (ConfigDict or dict, Optional) – Config of angle loss.

forward(feats: Tuple[Tensor, ...]) tuple[source]

Forward features from the upstream network.

Parameters:

feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns:

Usually a tuple of classification scores, bbox predictions, and angle predictions.

  • cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.

  • bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.

  • angle_preds (list[Tensor]): Angle prediction for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * angle_dim.

Return type:

tuple

init_weights() None[source]

Initialize weights of the head.

loss_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], angle_preds: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None)[source]

Compute losses of the head.

Parameters:
  • cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_anchors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box predictions for each scale level with shape (N, num_anchors * 4, H, W) in [t, b, l, r] format.

  • angle_preds (list[Tensor]) – Angle predictions for each scale level with shape (N, num_anchors * angle_dim, H, W).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

loss_by_feat_single(cls_score: Tensor, bbox_pred: Tensor, angle_pred: Tensor, labels: Tensor, label_weights: Tensor, bbox_targets: Tensor, assign_metrics: Tensor, stride: List[int])[source]

Compute loss of a single scale level.

Parameters:
  • cls_score (Tensor) – Box scores for each scale level, with shape (N, num_anchors * num_classes, H, W).

  • bbox_pred (Tensor) – Decoded bboxes for each scale level with shape (N, num_anchors * 5, H, W) for rbox loss or (N, num_anchors * 4, H, W) for hbox loss.

  • angle_pred (Tensor) – Angle predictions for each scale level with shape (N, num_anchors * angle_dim, H, W).

  • labels (Tensor) – Labels of each anchor with shape (N, num_total_anchors).

  • label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors).

  • bbox_targets (Tensor) – BBox regression targets of each anchor with shape (N, num_total_anchors, 4).

  • assign_metrics (Tensor) – Assign metrics with shape (N, num_total_anchors).

  • stride (List[int]) – Downsample stride of the feature map.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

predict_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], angle_preds: List[Tensor], score_factors: List[Tensor] | None = None, batch_img_metas: List[dict] | None = None, cfg: ConfigDict | None = None, rescale: bool = False, with_nms: bool = True) List[InstanceData][source]

Transform a batch of output features extracted from the head into bbox results.

Note: When score_factors is not None, the cls_scores are usually multiplied by it to obtain the real score used in NMS, such as CenterNess in FCOS, IoU branch in ATSS.

Parameters:
  • cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).

  • angle_preds (list[Tensor]) – Box angle for each scale level with shape (N, num_points * angle_dim, H, W).

  • score_factors (list[Tensor], optional) – Score factor for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 1, H, W). Defaults to None.

  • batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.

  • cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.

  • rescale (bool) – If True, return boxes in original image space. Defaults to False.

  • with_nms (bool) – If True, apply NMS before returning boxes. Defaults to True.

Returns:

Object detection results of each image after post-processing. Each item usually contains the following keys.

  • scores (Tensor): Classification scores, has a shape (num_instances, ).

  • labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

  • bboxes (Tensor): Has a shape (num_instances, 5), the last dimension of 5 is arranged as (x, y, w, h, t).

Return type:

list[InstanceData]
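
To make the (x, y, w, h, t) layout of the returned bboxes concrete, the sketch below converts one such box into its four corner points. It assumes (x, y) is the box centre, t is in radians, and rotation follows the standard 2D rotation matrix in image coordinates; `rbox_to_corners` is a hypothetical helper for illustration.

```python
import math


def rbox_to_corners(x, y, w, h, t):
    """Hypothetical helper: four (x, y) corners of a rotated box."""
    cos_t, sin_t = math.cos(t), math.sin(t)
    corners = []
    # Corner offsets in the box's own frame, in a fixed cyclic order.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2),
                   (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate the offset by t, then translate to the box centre.
        corners.append((x + dx * cos_t - dy * sin_t,
                        y + dx * sin_t + dy * cos_t))
    return corners
```

With t = 0 the result is the axis-aligned rectangle centred on (x, y); any other angle rotates the same rectangle about its centre.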

class mmrotate.models.dense_heads.RotatedRTMDetSepBNHead(num_classes: int, in_channels: int, share_conv: bool = True, scale_angle: bool = False, norm_cfg: ConfigDict | dict = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: ConfigDict | dict = {'type': 'SiLU'}, pred_kernel_size: int = 1, exp_on_reg: bool = False, **kwargs)[source]

Rotated RTMDetHead with separated BN layers and shared conv layers.

Parameters:
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • share_conv (bool) – Whether to share conv layers between stages. Defaults to True.

  • scale_angle (bool) – Not supported in RotatedRTMDetSepBNHead. Defaults to False.

  • norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU').

  • pred_kernel_size (int) – Kernel size of prediction layer. Defaults to 1.

  • exp_on_reg (bool) – Whether to apply exponential on bbox_pred. Defaults to False.

forward(feats: Tuple[Tensor, ...]) tuple[source]

Forward features from the upstream network.

Parameters:

feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns:

Usually a tuple of classification scores, bbox predictions, and angle predictions.

  • cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.

  • bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.

  • angle_preds (list[Tensor]): Angle prediction for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * angle_dim.

Return type:

tuple

init_weights() None[source]

Initialize weights of the head.

class mmrotate.models.dense_heads.RotatedRepPointsHead(*args, **kwargs)[source]

RotatedRepPoint head.

Parameters:
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • point_feat_channels (int) – Number of channels of points features.

  • num_points (int) – Number of points.

  • gradient_mul (float) – The multiplier to gradients from points refinement and recognition.

  • point_strides (Sequence[int]) – points strides.

  • point_base_scale (int) – bbox scale for assigning labels.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox_init (ConfigDict or dict) – Config of initial points loss.

  • loss_bbox_refine (ConfigDict or dict) – Config of points loss in refinement.

  • transform_method (str) – The method used to transform RepPoints to qbbox, which cannot be ‘moment’ here.

  • init_cfg (ConfigDict or dict or list[ConfigDict or dict]) – Initialization config dict.

forward_single(x: Tensor) Tuple[Tensor][source]

Forward feature map of a single FPN level.

loss_by_feat(cls_scores: List[Tensor], pts_preds_init: List[Tensor], pts_preds_refine: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None) Dict[str, Tensor][source]

Calculate the loss based on the features extracted by the detection head.

Parameters:
  • cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, of shape (batch_size, num_classes, h, w).

  • pts_preds_init (list[Tensor]) – Points for each scale level, each is a 3D-tensor, of shape (batch_size, h_i * w_i, num_points * 2).

  • pts_preds_refine (list[Tensor]) – Points refined for each scale level, each is a 3D-tensor, of shape (batch_size, h_i * w_i, num_points * 2).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

loss_by_feat_single(cls_score: Tensor, pts_pred_init: Tensor, pts_pred_refine: Tensor, labels: Tensor, label_weights, bbox_gt_init: Tensor, bbox_weights_init: Tensor, bbox_gt_refine: Tensor, bbox_weights_refine: Tensor, stride: int, avg_factor_init: int, avg_factor_refine: int) Tuple[Tensor][source]

Calculate the loss of a single scale level based on the features extracted by the detection head.

Parameters:
  • cls_score (Tensor) – Box scores for each scale level, with shape (N, num_classes, h_i, w_i).

  • pts_pred_init (Tensor) – Points of shape (batch_size, h_i * w_i, num_points * 2).

  • pts_pred_refine (Tensor) – Points refined of shape (batch_size, h_i * w_i, num_points * 2).

  • labels (Tensor) – Ground truth class indices with shape (batch_size, h_i * w_i).

  • label_weights (Tensor) – Label weights of shape (batch_size, h_i * w_i).

  • bbox_gt_init (Tensor) – BBox regression targets in the init stage of shape (batch_size, h_i * w_i, 8).

  • bbox_weights_init (Tensor) – BBox regression loss weights in the init stage of shape (batch_size, h_i * w_i, 8).

  • bbox_gt_refine (Tensor) – BBox regression targets in the refine stage of shape (batch_size, h_i * w_i, 8).

  • bbox_weights_refine (Tensor) – BBox regression loss weights in the refine stage of shape (batch_size, h_i * w_i, 8).

  • stride (int) – Point stride.

  • avg_factor_init (int) – Average factor that is used to average the loss in the init stage.

  • avg_factor_refine (int) – Average factor that is used to average the loss in the refine stage.

Returns:

Loss components.

Return type:

Tuple[Tensor]

class mmrotate.models.dense_heads.RotatedRetinaHead(*args, loss_bbox_type: str = 'normal', **kwargs)[source]

Rotated retina head.

Parameters:

loss_bbox_type (str) – Set the input type of loss_bbox. Defaults to ‘normal’.

loss_by_feat_single(cls_score: Tensor, bbox_pred: Tensor, anchors: Tensor, labels: Tensor, label_weights: Tensor, bbox_targets: Tensor, bbox_weights: Tensor, avg_factor: int) tuple[source]

Calculate the loss of a single scale level based on the features extracted by the detection head.

Parameters:
  • cls_score (Tensor) – Box scores for each scale level, with shape (N, num_anchors * num_classes, H, W).

  • bbox_pred (Tensor) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).

  • anchors (Tensor) – Box reference for each scale level with shape (N, num_total_anchors, 4).

  • labels (Tensor) – Labels of each anchor with shape (N, num_total_anchors).

  • label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors).

  • bbox_targets (Tensor) – BBox regression targets of each anchor with shape (N, num_total_anchors, 4).

  • bbox_weights (Tensor) – BBox regression loss weights of each anchor with shape (N, num_total_anchors, 4).

  • avg_factor (int) – Average factor that is used to average the loss.

Returns:

Loss components.

Return type:

tuple

class mmrotate.models.dense_heads.S2AHead(*args, loss_bbox_type: str = 'normal', **kwargs)[source]

An anchor-based head used in S2A-Net.

filter_bboxes(cls_scores: List[Tensor], bbox_preds: List[Tensor]) List[List[Tensor]][source]

This function will be used in S2ANet, whose num_anchors=1.

Parameters:
  • cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, 5, H, W).

Returns:

Refined rbboxes of each level of each image.

Return type:

list[list[Tensor]]

class mmrotate.models.dense_heads.S2ARefineHead(num_classes: int, in_channels: int, frm_cfg: dict | None = None, **kwargs)[source]

Rotated Anchor-based refine head. It’s a part of the Oriented Detection Module (ODM), which produces orientation-sensitive features for classification and orientation-invariant features for localization.

Parameters:
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • frm_cfg (dict) – Config of the feature refine module.

feature_refine(x: List[Tensor], rois: List[List[Tensor]]) List[Tensor][source]

Refine the input features using the feature refine module.

Parameters:
  • x (list[Tensor]) – Feature maps of multiple scales.

  • rois (list[list[Tensor]]) – Input rbboxes of multiple scales of multiple images, output by former stages and to be refined.

Returns:

Refined feature maps of multiple scales.

Return type:

list[Tensor]

forward_single(x: Tensor) Tuple[Tensor, Tensor][source]

Forward feature of a single scale level.

Parameters:

x (Tensor) – Features of a single scale level.

Returns:

  • cls_score (Tensor): Classification scores for a single scale level, the channels number is num_anchors * num_classes.

  • bbox_pred (Tensor): Box energies / deltas for a single scale level, the channels number is num_anchors * 4.

Return type:

tuple

get_anchors(featmap_sizes: List[tuple], batch_img_metas: List[dict], device: device | str = 'cuda') Tuple[List[List[Tensor]], List[List[Tensor]]][source]

Get anchors according to feature map sizes.

Parameters:
  • featmap_sizes (list[tuple]) – Multi-level feature map sizes.

  • batch_img_metas (list[dict]) – Image meta info.

  • device (torch.device | str) – Device for returned tensors. Defaults to cuda.

Returns:

  • anchor_list (list[list[Tensor]]): Anchors of each image.

  • valid_flag_list (list[list[Tensor]]): Valid flags of each image.

Return type:

tuple
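
The valid flags returned alongside the anchors can be thought of as marking which feature-map cells fall inside the image area. The sketch below is one plausible way to derive them for a single level; the ceil-based rule, the (height, width) ordering, and the helper name `valid_flags_for_level` are all assumptions rather than the documented behaviour.

```python
import math


def valid_flags_for_level(featmap_size, img_shape, stride):
    """Hypothetical helper: one bool per feature-map cell, row-major.

    featmap_size and img_shape are (height, width) pairs. A cell is
    treated as valid when its index lies within ceil(img / stride).
    """
    feat_h, feat_w = featmap_size
    valid_h = min(math.ceil(img_shape[0] / stride), feat_h)
    valid_w = min(math.ceil(img_shape[1] / stride), feat_w)
    return [row < valid_h and col < valid_w
            for row in range(feat_h)
            for col in range(feat_w)]
```

For a 2 x 3 feature map at stride 8 over a 16 x 16 image, the third column lies outside the image, so its cells are flagged False.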

loss_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None, rois: List[Tensor] | None = None) dict[source]

Calculate the loss based on the features extracted by the detection head.

Parameters:
  • cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_anchors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

  • rois (list[Tensor])

Returns:

A dictionary of loss components.

Return type:

dict

predict_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], score_factors: List[Tensor] | None = None, rois: List[Tensor] | None = None, batch_img_metas: List[dict] | None = None, cfg: ConfigDict | None = None, rescale: bool = False, with_nms: bool = True) List[InstanceData][source]

Transform a batch of output features extracted from the head into bbox results.

Note: When score_factors is not None, the cls_scores are usually multiplied by it to obtain the real score used in NMS, such as CenterNess in FCOS, IoU branch in ATSS.

Parameters:
  • cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).

  • score_factors (list[Tensor], optional) – Score factor for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 1, H, W). Defaults to None.

  • rois (list[Tensor])

  • batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.

  • cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.

  • rescale (bool) – If True, return boxes in original image space. Defaults to False.

  • with_nms (bool) – If True, apply NMS before returning boxes. Defaults to True.

Returns:

Object detection results of each image after post-processing. Each item usually contains the following keys.

  • scores (Tensor): Classification scores, has a shape (num_instances, ).

  • labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

  • bboxes (Tensor): Has a shape (num_instances, 4), the last dimension of 4 is arranged as (x1, y1, x2, y2).

Return type:

list[InstanceData]

refine_bboxes(cls_scores: List[Tensor], bbox_preds: List[Tensor], rois: List[List[Tensor]]) List[List[Tensor]][source]

Refine predicted bounding boxes at each position of the feature maps. This method will be used in R3Det in refinement stages.

Parameters:
  • cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, 5, H, W).

  • rois (list[list[Tensor]]) – Input rbboxes of each level of each image, output by former stages and to be refined.

Returns:

Best or refined rbboxes of each level of each image.

Return type:

list[list[Tensor]]

class mmrotate.models.dense_heads.SAMRepPointsHead(*args, **kwargs)[source]

SAM RepPoints head.

get_targets(proposals_list: List[Tensor], valid_flag_list: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None, stage: str = 'init', unmap_outputs: bool = True) tuple[source]

Compute corresponding GT box and classification targets for proposals.

Parameters:
  • proposals_list (list[Tensor]) – Multi level points/bboxes of each image.

  • valid_flag_list (list[Tensor]) – Multi level valid flags of each image.

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

  • stage (str) – ‘init’ or ‘refine’. Generate target for init stage or refine stage. Defaults to ‘init’.

  • unmap_outputs (bool) – Whether to map outputs back to the original set of anchors. Defaults to True.

Returns:

  • labels_list (list[Tensor]): Labels of each level.

  • label_weights_list (list[Tensor]): Label weights of each level.

  • bbox_gt_list (list[Tensor]): Ground truth bbox of each level.

  • proposals_list (list[Tensor]): Proposals (points/bboxes) of each level.

  • proposal_weights_list (list[Tensor]): Proposal weights of each level.

  • avg_factor (int): Average factor that is used to average the loss. When using sampling method, avg_factor is usually the sum of positive and negative priors. When using PseudoSampler, avg_factor is usually equal to the number of positive priors.

Return type:

tuple

loss_by_feat(cls_scores: List[Tensor], pts_preds_init: List[Tensor], pts_preds_refine: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None) Dict[str, Tensor][source]

Calculate the loss based on the features extracted by the detection head.

Parameters:
  • cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, of shape (batch_size, num_classes, h, w).

  • pts_preds_init (list[Tensor]) – Points for each scale level, each is a 3D-tensor, of shape (batch_size, h_i * w_i, num_points * 2).

  • pts_preds_refine (list[Tensor]) – Points refined for each scale level, each is a 3D-tensor, of shape (batch_size, h_i * w_i, num_points * 2).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

loss_by_feat_single(cls_score: Tensor, pts_pred_init: Tensor, pts_pred_refine: Tensor, labels: Tensor, label_weights, bbox_gt_init: Tensor, bbox_weights_init: Tensor, sam_weights_init: Tensor, bbox_gt_refine: Tensor, bbox_weights_refine: Tensor, sam_weights_refine: Tensor, stride: int, avg_factor_refine: int) Tuple[Tensor][source]

Calculate the loss of a single scale level based on the features extracted by the detection head.

Parameters:
  • cls_score (Tensor) – Box scores for each scale level, with shape (N, num_classes, h_i, w_i).

  • pts_pred_init (Tensor) – Points of shape (batch_size, h_i * w_i, num_points * 2).

  • pts_pred_refine (Tensor) – Points refined of shape (batch_size, h_i * w_i, num_points * 2).

  • labels (Tensor) – Ground truth class indices with shape (batch_size, h_i * w_i).

  • label_weights (Tensor) – Label weights of shape (batch_size, h_i * w_i).

  • bbox_gt_init (Tensor) – BBox regression targets in the init stage of shape (batch_size, h_i * w_i, 8).

  • bbox_weights_init (Tensor) – BBox regression loss weights in the init stage of shape (batch_size, h_i * w_i, 8).

  • sam_weights_init (Tensor)

  • bbox_gt_refine (Tensor) – BBox regression targets in the refine stage of shape (batch_size, h_i * w_i, 8).

  • bbox_weights_refine (Tensor) – BBox regression loss weights in the refine stage of shape (batch_size, h_i * w_i, 8).

  • sam_weights_refine (Tensor)

  • stride (int) – Point stride.

  • avg_factor_refine (int) – Average factor that is used to average the loss in the refine stage.

Returns:

Loss components.

Return type:

Tuple[Tensor]

roi_heads

class mmrotate.models.roi_heads.GVRatioRoIHead(bbox_roi_extractor: ConfigDict | dict | List[ConfigDict | dict] | None = None, bbox_head: ConfigDict | dict | List[ConfigDict | dict] | None = None, mask_roi_extractor: ConfigDict | dict | List[ConfigDict | dict] | None = None, mask_head: ConfigDict | dict | List[ConfigDict | dict] | None = None, shared_head: ConfigDict | dict | None = None, train_cfg: ConfigDict | dict | None = None, test_cfg: ConfigDict | dict | None = None, init_cfg: ConfigDict | dict | List[ConfigDict | dict] | None = None)[source]

Gliding vertex roi head including one bbox head and one mask head.

bbox_loss(x: Tuple[Tensor], sampling_results: List[SamplingResult]) dict[source]

Perform forward propagation and loss calculation of the bbox head on the features of the upstream network.

Parameters:
  • x (tuple[Tensor]) – List of multi-level img features.

  • sampling_results (list[SamplingResult]) – Sampling results.

Returns:

Usually returns a dictionary with keys:

  • cls_score (Tensor): Classification scores.

  • bbox_pred (Tensor): Box energies / deltas.

  • fix_pred (Tensor): fix / deltas.

  • ratio_pred (Tensor): ratio / deltas.

  • bbox_feats (Tensor): Extract bbox RoI features.

  • loss_bbox (dict): A dictionary of bbox loss components.

Return type:

dict[str, Tensor]

forward(x: Tuple[Tensor], rpn_results_list: List[InstanceData]) tuple[source]

Network forward process. Usually includes backbone, neck and head forward without any post-processing.

Parameters:
  • x (List[Tensor]) – Multi-level features that may have different resolutions.

  • rpn_results_list (list[InstanceData]) – List of region proposals.

Returns:

A tuple of features from bbox_head and mask_head forward.

Return type:

tuple

predict_bbox(x: Tuple[Tensor], batch_img_metas: List[dict], rpn_results_list: List[InstanceData], rcnn_test_cfg: ConfigDict | dict, rescale: bool = False) List[InstanceData][source]

Perform forward propagation of the bbox head and predict detection results on the features of the upstream network.

Parameters:
  • x (tuple[Tensor]) – Feature maps of all scale level.

  • batch_img_metas (list[dict]) – List of image information.

  • rpn_results_list (list[InstanceData]) – List of region proposals.

  • rcnn_test_cfg (ConfigDict) – test_cfg of R-CNN.

  • rescale (bool) – If True, return boxes in original image space. Defaults to False.

Returns:

Detection results of each image after post-processing. Each item usually contains the following keys:

  • scores (Tensor): Classification scores, has a shape (num_instances, ).

  • labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

  • bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arrange as (x1, y1, x2, y2).

Return type:

list[InstanceData]

class mmrotate.models.roi_heads.RotatedShared2FCBBoxHead(*args, loss_bbox_type: str = 'normal', **kwargs)[source]

Rotated Shared2FC RBBox head.

Parameters:

loss_bbox_type (str) – Set the input type of loss_bbox. Defaults to ‘normal’.

loss(cls_score: Tensor, bbox_pred: Tensor, rois: Tensor, labels: Tensor, label_weights: Tensor, bbox_targets: Tensor, bbox_weights: Tensor, reduction_override: str | None = None) dict[source]

Calculate the loss based on the network predictions and targets.

Parameters:
  • cls_score (Tensor) – Classification prediction results of all class, has shape (batch_size * num_proposals_single_image, num_classes)

  • bbox_pred (Tensor) – Regression prediction results, has shape (batch_size * num_proposals_single_image, 4), the last dimension 4 represents [tl_x, tl_y, br_x, br_y].

  • rois (Tensor) – RoIs with the shape (batch_size * num_proposals_single_image, 5) where the first column indicates batch id of each RoI.

  • labels (Tensor) – Gt_labels for all proposals in a batch, has shape (batch_size * num_proposals_single_image, ).

  • label_weights (Tensor) – Labels_weights for all proposals in a batch, has shape (batch_size * num_proposals_single_image, ).

  • bbox_targets (Tensor) – Regression target for all proposals in a batch, has shape (batch_size * num_proposals_single_image, 4), the last dimension 4 represents [tl_x, tl_y, br_x, br_y].

  • bbox_weights (Tensor) – Regression weights for all proposals in a batch, has shape (batch_size * num_proposals_single_image, 4).

  • reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Options are “none”, “mean” and “sum”. Defaults to None.

Returns:

A dictionary of loss.

Return type:

dict

class mmrotate.models.roi_heads.RotatedSingleRoIExtractor(roi_layer, out_channels, featmap_strides, finest_scale=56, init_cfg=None)[source]

Extract RoI features from a single level feature map.

If there are multiple input feature levels, each RoI is mapped to a level according to its scale. The mapping rule is proposed in FPN.

Parameters:
  • roi_layer (dict) – Specify RoI layer type and arguments.

  • out_channels (int) – Output channels of RoI layers.

  • featmap_strides (List[int]) – Strides of input feature maps.

  • finest_scale (int) – Scale threshold of mapping to level 0. Default: 56.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

build_roi_layers(layer_cfg, featmap_strides)[source]

Build RoI operator to extract feature from each level feature map.

Parameters:
  • layer_cfg (dict) – Dictionary to construct and config RoI layer operation. Options are modules under mmcv/ops such as RoIAlign.

  • featmap_strides (List[int]) – The stride of each input feature map w.r.t. the original image size, which is used to scale RoI coordinates from the original image coordinate system to the feature coordinate system.

Returns:

The RoI extractor modules for each level feature map.

Return type:

nn.ModuleList

forward(feats, rois, roi_scale_factor=None)[source]

Forward function.

Parameters:
  • feats (torch.Tensor) – Input features.

  • rois (torch.Tensor) – Input RoIs, shape (k, 5).

  • roi_scale_factor (float, optional) – Scale factor that RoI will be multiplied by. Defaults to None.

Returns:

Scaled RoI features.

Return type:

torch.Tensor

map_roi_levels(rois, num_levels)[source]

Map rois to corresponding feature levels by scales.

  • scale < finest_scale * 2: level 0

  • finest_scale * 2 <= scale < finest_scale * 4: level 1

  • finest_scale * 4 <= scale < finest_scale * 8: level 2

  • scale >= finest_scale * 8: level 3

Parameters:
  • rois (torch.Tensor) – Input RoIs, shape (k, 5).

  • num_levels (int) – Total level number.

Returns:

Level index (0-based) of each RoI, shape (k, )

Return type:

Tensor
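
The thresholds listed above can be sketched for a single RoI in plain Python. This is a minimal illustration rather than the library code: the helper names `roi_scale` and `map_roi_level` are hypothetical, and taking the scale as `sqrt(w * h)` is an assumption about how the RoI size is measured.

```python
import math


def roi_scale(w, h):
    """Assumed scale of an RoI: square root of its area."""
    return math.sqrt(w * h)


def map_roi_level(scale, num_levels, finest_scale=56):
    """Return the 0-based feature level for one RoI scale.

    floor(log2(scale / finest_scale)) reproduces the documented bins:
    [0, 2f) -> 0, [2f, 4f) -> 1, [4f, 8f) -> 2, and so on.
    The small constant keeps log2 defined when scale is 0.
    """
    level = math.floor(math.log2(scale / finest_scale + 1e-6))
    # Clamp so tiny RoIs go to level 0 and huge RoIs to the last level.
    return min(max(level, 0), num_levels - 1)


levels = [map_roi_level(s, num_levels=4) for s in (10, 111, 112, 224, 448, 10000)]
```

With `finest_scale=56`, a scale of 112 is the first value mapped to level 1, and anything at or above 448 stays on level 3 when there are four levels.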

roi_rescale(rois, scale_factor)[source]

Scale RoI coordinates by scale factor.

Parameters:
  • rois (torch.Tensor) – RoI (Region of Interest), shape (n, 6)

  • scale_factor (float) – Scale factor that RoI will be multiplied by.

Returns:

Scaled RoI.

Return type:

torch.Tensor
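
As a rough illustration of what rescaling a rotated RoI means, the sketch below scales one RoI given as a plain tuple. The layout `(batch_idx, cx, cy, w, h, angle)` is an assumption inferred from the documented shape (n, 6), and `rescale_roi` is a hypothetical helper, not part of mmrotate.

```python
def rescale_roi(roi, scale_factor):
    """Scale the width and height of one rotated RoI.

    Assumed layout: (batch_idx, cx, cy, w, h, angle).
    The center and the angle are left unchanged, so the box grows or
    shrinks around its own center.
    """
    if scale_factor is None:
        return roi
    batch_idx, cx, cy, w, h, angle = roi
    return (batch_idx, cx, cy, w * scale_factor, h * scale_factor, angle)


scaled = rescale_roi((0, 50.0, 40.0, 20.0, 10.0, 0.3), 1.5)
```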

losses

class mmrotate.models.losses.BCConvexGIoULoss(reduction='mean', loss_weight=1.0)[source]

BCConvex GIoU loss.

Computes the BCConvex GIoU loss between a set of predicted convexes and target convexes.

Parameters:
  • reduction (str, optional) – The reduction method of the loss. Defaults to ‘mean’.

  • loss_weight (float, optional) – The weight of loss. Defaults to 1.0.

Returns:

Loss tensor.

Return type:

torch.Tensor

forward(pred, target, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]

Forward function.

Parameters:
  • pred (torch.Tensor) – Predicted convexes.

  • target (torch.Tensor) – Corresponding gt convexes.

  • weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.
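
The weight, avg_factor, and reduction_override arguments recur across the loss classes in this module. The sketch below shows how such arguments are commonly combined, assuming the MMDetection-style convention; `reduce_loss` and `pick_reduction` are hypothetical helpers operating on plain Python lists, not functions of this package.

```python
def pick_reduction(default, reduction_override=None):
    """Use the override when given, otherwise the loss's own setting."""
    return reduction_override if reduction_override else default


def reduce_loss(losses, weights=None, reduction="mean", avg_factor=None):
    """Apply per-element weights, then reduce a list of loss values."""
    if weights is not None:
        losses = [loss * w for loss, w in zip(losses, weights)]
    if reduction == "none":
        return losses
    total = sum(losses)
    if reduction == "sum":
        if avg_factor is not None:
            raise ValueError('avg_factor cannot be combined with reduction="sum"')
        return total
    if reduction == "mean":
        # avg_factor replaces the element count as the denominator.
        denom = len(losses) if avg_factor is None else avg_factor
        return total / denom
    raise ValueError(f"unknown reduction: {reduction}")


mean_loss = reduce_loss([1.0, 2.0, 3.0])
masked = reduce_loss([1.0, 2.0, 3.0], weights=[1.0, 0.0, 1.0], avg_factor=2)
```

Passing avg_factor is useful when some elements are masked out by zero weights, since a plain mean would otherwise divide by the total number of elements.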

class mmrotate.models.losses.ConvexGIoULoss(reduction='mean', loss_weight=1.0)[source]

Convex GIoU loss.

Computes the Convex GIoU loss between a set of predicted convexes and target convexes.

Parameters:
  • reduction (str, optional) – The reduction method of the loss. Defaults to ‘mean’.

  • loss_weight (float, optional) – The weight of loss. Defaults to 1.0.

Returns:

Loss tensor.

Return type:

torch.Tensor

forward(pred, target, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]

Forward function.

Parameters:
  • pred (torch.Tensor) – Predicted convexes.

  • target (torch.Tensor) – Corresponding gt convexes.

  • weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.

class mmrotate.models.losses.GDLoss(loss_type, representation='xy_wh_r', fun='log1p', tau=0.0, alpha=1.0, reduction='mean', loss_weight=1.0, **kwargs)[source]

Gaussian based loss.

Parameters:
  • loss_type (str) – Type of loss.

  • representation (str, optional) – Coordinate system. Defaults to ‘xy_wh_r’.

  • fun (str, optional) – The function applied to distance. Defaults to ‘log1p’.

  • tau (float, optional) – Defaults to 0.0.

  • alpha (float, optional) – Defaults to 1.0.

  • reduction (str, optional) – The reduction method of the loss. Defaults to ‘mean’.

  • loss_weight (float, optional) – The weight of loss. Defaults to 1.0.

Returns:

loss (torch.Tensor)

forward(pred, target, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]

Forward function.

Parameters:
  • pred (torch.Tensor) – Predicted convexes.

  • target (torch.Tensor) – Corresponding gt convexes.

  • weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.

class mmrotate.models.losses.GDLoss_v1(loss_type, fun='sqrt', tau=1.0, reduction='mean', loss_weight=1.0, **kwargs)[source]

Gaussian based loss.

Parameters:
  • loss_type (str) – Type of loss.

  • fun (str, optional) – The function applied to distance. Defaults to ‘sqrt’.

  • tau (float, optional) – Defaults to 1.0.

  • reduction (str, optional) – The reduction method of the loss. Defaults to ‘mean’.

  • loss_weight (float, optional) – The weight of loss. Defaults to 1.0.

Returns:

loss (torch.Tensor)

forward(pred, target, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]

Forward function.

Parameters:
  • pred (torch.Tensor) – Predicted convexes.

  • target (torch.Tensor) – Corresponding gt convexes.

  • weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.

class mmrotate.models.losses.H2RBoxConsistencyLoss(center_loss_cfg: ConfigDict | dict = {'loss_weight': 0.0, 'type': 'mmdet.L1Loss'}, shape_loss_cfg: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'mmdet.IoULoss'}, angle_loss_cfg: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'mmdet.L1Loss'}, reduction: str = 'mean', loss_weight: float = 1.0)[source]
forward(pred: Tensor, target: Tensor, weight: Tensor, avg_factor: int | None = None, reduction_override: str | None = None) Tensor[source]

Forward function.

Parameters:
  • pred (Tensor) – Predicted boxes.

  • target (Tensor) – Corresponding gt boxes.

  • weight (Tensor) – The weight of loss for each prediction.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.

Returns:

Calculated loss (Tensor)

class mmrotate.models.losses.H2RBoxV2ConsistencyLoss(loss_rot: ConfigDict | dict = {'beta': 0.1, 'loss_weight': 1.0, 'type': 'mmdet.SmoothL1Loss'}, loss_flp: ConfigDict | dict = {'beta': 0.1, 'loss_weight': 0.05, 'type': 'mmdet.SmoothL1Loss'}, use_snap_loss: bool = True, reduction: str = 'mean')[source]
forward(pred_ori: Tensor, pred_rot: Tensor, pred_flp: Tensor, target_ori: Tensor, target_rot: Tensor, agnostic_mask: Tensor | None = None, avg_factor: int | None = None, reduction_override: str | None = None) Tensor[source]

Forward function.

Parameters:
  • pred (Tensor) – Predicted boxes.

  • target (Tensor) – Corresponding gt boxes.

  • weight (Tensor) – The weight of loss for each prediction.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.

Returns:

Calculated loss (Tensor)

class mmrotate.models.losses.KFLoss(fun='none', reduction='mean', loss_weight=1.0, **kwargs)[source]

Kalman filter based loss.

Parameters:
  • fun (str, optional) – The function applied to distance. Defaults to ‘none’.

  • reduction (str, optional) – The reduction method of the loss. Defaults to ‘mean’.

  • loss_weight (float, optional) – The weight of loss. Defaults to 1.0.

Returns:

loss (torch.Tensor)

forward(pred, target, weight=None, avg_factor=None, pred_decode=None, targets_decode=None, reduction_override=None, **kwargs)[source]

Forward function.

Parameters:
  • pred (torch.Tensor) – Predicted convexes.

  • target (torch.Tensor) – Corresponding gt convexes.

  • weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • pred_decode (torch.Tensor) – Predicted decode bboxes.

  • targets_decode (torch.Tensor) – Corresponding gt decode bboxes.

  • reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.

Returns:

loss (torch.Tensor)

class mmrotate.models.losses.RotatedIoULoss(linear=False, eps=1e-06, reduction='mean', loss_weight=1.0, mode='log')[source]

RotatedIoULoss.

Computes the IoU loss between a set of predicted rbboxes and target rbboxes.

Parameters:
  • linear (bool) – If True, use linear scale of loss, else determined by mode. Default: False.

  • eps (float) – Eps to avoid log(0).

  • reduction (str) – Options are “none”, “mean” and “sum”.

  • loss_weight (float) – Weight of loss.

  • mode (str) – Loss scaling mode, including “linear”, “square”, and “log”. Default: ‘log’

forward(pred, target, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]

Forward function.

Parameters:
  • pred (torch.Tensor) – The prediction.

  • target (torch.Tensor) – The learning target of the prediction.

  • weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None. Options are “none”, “mean” and “sum”.
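
The three scaling modes named for RotatedIoULoss can be illustrated on a single IoU value. This is a sketch under the assumption that the modes follow the usual IoU-loss definitions (1 - IoU, 1 - IoU², and -log(IoU)); `iou_to_loss` is a hypothetical helper, and the rotated IoU itself is taken as an input rather than computed.

```python
import math


def iou_to_loss(iou, mode="log", linear=False, eps=1e-6):
    """Map one IoU value in [0, 1] to a loss value."""
    if linear:
        # The linear flag takes precedence over mode.
        mode = "linear"
    # Clamp from below so that log(0) never occurs.
    iou = max(iou, eps)
    if mode == "linear":
        return 1 - iou
    if mode == "square":
        return 1 - iou ** 2
    if mode == "log":
        return -math.log(iou)
    raise ValueError(f"unknown mode: {mode}")


perfect = iou_to_loss(1.0)
half_square = iou_to_loss(0.5, mode="square")
```

The eps clamp is what keeps the log mode finite when a predicted box does not overlap its target at all.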

class mmrotate.models.losses.SmoothFocalLoss(gamma=2.0, alpha=0.25, reduction='mean', loss_weight=1.0)[source]

Smooth Focal Loss. Implementation of Circular Smooth Label (CSL).

Parameters:
  • gamma (float, optional) – The gamma for calculating the modulating factor. Defaults to 2.0.

  • alpha (float, optional) – A balanced form for Focal Loss. Defaults to 0.25.

  • reduction (str, optional) – The method used to reduce the loss into a scalar. Defaults to ‘mean’. Options are “none”, “mean” and “sum”.

  • loss_weight (float, optional) – Weight of loss. Defaults to 1.0.

Returns:

loss (torch.Tensor)

forward(pred, target, weight=None, avg_factor=None, reduction_override=None)[source]

Forward function.

Parameters:
  • pred (torch.Tensor) – The prediction.

  • target (torch.Tensor) – The learning label of the prediction.

  • weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Options are “none”, “mean” and “sum”.

Returns:

The calculated loss

Return type:

torch.Tensor
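
To show how gamma and alpha interact with a soft (smoothed) label, here is a scalar sketch of the common sigmoid focal loss formulation. It is an assumption that SmoothFocalLoss computes exactly this expression; `smooth_focal_loss` is a hypothetical helper working on one logit and one target in [0, 1].

```python
import math


def smooth_focal_loss(logit, target, gamma=2.0, alpha=0.25):
    """Focal-weighted binary cross entropy for one logit and a soft target.

    Not numerically stable for extreme logits; meant for illustration only.
    """
    p = 1.0 / (1.0 + math.exp(-logit))
    # pt is large when the prediction disagrees with the target.
    pt = (1 - p) * target + p * (1 - target)
    # gamma sharpens the focus on hard examples, alpha balances the classes.
    focal_weight = (alpha * target + (1 - alpha) * (1 - target)) * pt ** gamma
    bce = -(target * math.log(p) + (1 - target) * math.log(1 - p))
    return bce * focal_weight


uncertain = smooth_focal_loss(0.0, 1.0)
confident = smooth_focal_loss(4.0, 1.0)
```

A confident correct prediction gets a much smaller loss than an uncertain one, which is the purpose of the modulating factor.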

class mmrotate.models.losses.SpatialBorderLoss(loss_weight=1.0)[source]

Spatial Border loss for learning points in Oriented RepPoints.

Parameters:
  • pts (torch.Tensor) – point sets with shape (N, 9*2). Default points number in each point set is 9.

  • gt_bboxes (torch.Tensor) – Ground-truth boxes in polygon form with shape (N, 8).

Returns:

spatial border loss.

Return type:

torch.Tensor

forward(pts, gt_bboxes, weight, *args, **kwargs)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

utils

class mmrotate.models.utils.ORConv2d(in_channels, out_channels, kernel_size=3, arf_config=None, stride=1, padding=0, dilation=1, groups=1, bias=True)[source]

Oriented 2-D convolution.

Parameters:
  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale).

  • kernel_size (int, optional) – The size of kernel.

  • arf_config (tuple, optional) – A tuple consisting of nOrientation and nRotation.

  • stride (int, optional) – Stride of the convolution. Default: 1.

  • padding (int or tuple) – Zero-padding added to both sides of the input. Default: 0.

  • dilation (int or tuple) – Spacing between kernel elements. Default: 1.

  • groups (int) – Number of blocked connections from input channels to output channels. Default: 1.

  • bias (bool) – If True, adds a learnable bias to the output. Default: True.

forward(input)[source]

Forward function.

get_indices()[source]

Get the indices of ORConv2d.

reset_parameters()[source]

Reset the parameters of ORConv2d.

rotate_arf()[source]

Build active rotating filter module.

class mmrotate.models.utils.RotationInvariantPooling(nInputPlane, nOrientation=8)[source]

Rotating invariant pooling module.

Parameters:
  • nInputPlane (int) – The number of Input plane.

  • nOrientation (int, optional) – The number of oriented channels.

forward(x)[source]

Forward function.

mmrotate.models.utils.convex_overlaps(gt_bboxes, points)[source]

Compute overlaps between polygons and points.

Parameters:
  • gt_bboxes (torch.Tensor) – Groundtruth polygons, shape (k, 8).

  • points (torch.Tensor) – Points to be assigned, shape (n, 18).

Returns:

Overlaps between k gt_bboxes and n bboxes, shape (k, n).

Return type:

torch.Tensor

mmrotate.models.utils.get_num_level_anchors_inside(num_level_anchors, inside_flags)[source]

Get number of every level anchors inside.

Parameters:
  • num_level_anchors (List[int]) – List of number of every level’s anchors.

  • inside_flags (torch.Tensor) – Flags of all anchors.

Returns:

List of number of inside anchors.

Return type:

List[int]
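
The counting described above amounts to splitting the flat flag sequence by level and summing each chunk. A minimal plain-Python sketch follows; `count_inside_per_level` is a hypothetical name and the flags are an ordinary list of booleans rather than a tensor.

```python
def count_inside_per_level(num_level_anchors, inside_flags):
    """Count the True flags that fall into each level's slice."""
    counts = []
    start = 0
    for num in num_level_anchors:
        # Flags are stored level by level, so each level owns a contiguous slice.
        level_flags = inside_flags[start:start + num]
        counts.append(sum(1 for flag in level_flags if flag))
        start += num
    return counts


inside = count_inside_per_level([3, 2, 1], [True, False, True, True, True, False])
```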

mmrotate.models.utils.levels_to_images(mlvl_tensor, flatten=False)[source]

Concat multi-level feature maps by image.

[feature_level0, feature_level1…] -> [feature_image0, feature_image1…] Convert the shape of each element in mlvl_tensor from (N, C, H, W) to (N, H*W , C), then split the element to N elements with shape (H*W, C), and concat elements in same image of all level along first dimension.

Parameters:
  • mlvl_tensor (list[torch.Tensor]) – list of Tensor which collect from corresponding level. Each element is of shape (N, C, H, W)

  • flatten (bool, optional) – if shape of mlvl_tensor is (N, C, H, W) set False, if shape of mlvl_tensor is (N, H, W, C) set True.

Returns:

A list that contains N tensors, each of shape (num_elements, C).

Return type:

list[torch.Tensor]

mmrotate.models.utils.points_center_pts(RPoints, y_first=True)[source]

Compute center point of Pointsets.

Parameters:
  • RPoints (torch.Tensor) – the lists of Pointsets, shape (k, 18).

  • y_first (bool, optional) – if True, the sequence of Pointsets is (y,x).

Returns:

center_pts, the mean center coordinates of the Pointsets, shape (k, 18).

Return type:

torch.Tensor

mmrotate.utils

mmrotate.utils.collect_env()[source]

Collect environment information.

mmrotate.utils.get_multiscale_patch(sizes, steps, ratios)[source]

Get multiscale patch sizes and steps.

Parameters:
  • sizes (list) – A list of patch sizes.

  • steps (list) – A list of steps to slide patches.

  • ratios (list) – Multiscale ratios. Each size and step is divided by each ratio to generate patches at new scales.

Returns:

  • new_sizes (list) – A list of multiscale patch sizes.

  • new_steps (list) – A list of steps corresponding to new_sizes.

Return type:

tuple[list, list]
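
As an illustration of how the ratios expand the patch configuration, the sketch below divides every (size, step) pair by every ratio. `multiscale_patch` is a hypothetical stand-in, and both the output ordering and the truncation to int are assumptions.

```python
from itertools import product


def multiscale_patch(sizes, steps, ratios):
    """Return new patch sizes and steps for every (size, step) x ratio pair."""
    assert len(sizes) == len(steps), "sizes and steps must have the same length"
    new_sizes, new_steps = [], []
    for (size, step), ratio in product(zip(sizes, steps), ratios):
        # A ratio below 1 yields a larger patch, i.e. a coarser view of the image.
        new_sizes.append(int(size / ratio))
        new_steps.append(int(step / ratio))
    return new_sizes, new_steps


new_sizes, new_steps = multiscale_patch([1024], [824], [1.0, 0.5])
```

With a single 1024-pixel patch, a step of 824, and ratios 1.0 and 0.5, the result contains the original configuration plus one twice as large.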

mmrotate.utils.get_test_pipeline_cfg(cfg: str | ConfigDict) ConfigDict[source]

Get the test dataset pipeline from entire config.

Parameters:

cfg (str or ConfigDict) – the entire config. Can be a config file or a ConfigDict.

Returns:

the config of test dataset.

Return type:

ConfigDict

mmrotate.utils.merge_results_by_nms(results: List[DetDataSample], offsets: ndarray, img_shape: Tuple[int, int], nms_cfg: dict) DetDataSample[source]

Merge patch results by nms.

Parameters:
  • results (List[DetDataSample]) – A list of patches results.

  • offsets (np.ndarray) – Positions of the left top points of patches.

  • img_shape (Tuple[int, int]) – A tuple of the huge image’s width and height.

  • nms_cfg (dict) – it should specify nms type and other parameters like iou_threshold.

Returns:

merged results.

Return type:

DetDataSample
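
Before NMS can be applied across patches, patch-local boxes have to be expressed in the coordinates of the full image. The sketch below shows only that translation step and leaves out NMS. The box layout `(cx, cy, w, h, angle)` and the helper name `to_global` are assumptions for illustration.

```python
def to_global(boxes, offset):
    """Shift patch-local rotated boxes by the patch's top-left corner.

    boxes: list of (cx, cy, w, h, angle) tuples in patch coordinates.
    offset: (x, y) position of the patch's top-left point in the full image.
    Only the centers move; size and angle are unaffected by translation.
    """
    dx, dy = offset
    return [(cx + dx, cy + dy, w, h, angle) for cx, cy, w, h, angle in boxes]


global_boxes = to_global([(10.0, 20.0, 30.0, 8.0, 0.5)], offset=(824, 0))
```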

mmrotate.utils.register_all_modules(init_default_scope: bool = True) None[source]

Register all modules in mmrotate into the registries.

Parameters:

init_default_scope (bool) – Whether to initialize the mmrotate default scope. When init_default_scope=True, the global default scope will be set to mmrotate, and all registries will build modules from mmrotate’s registry node. Defaults to True. To understand more about the registry, please refer to https://github.com/vbti-development/onedl-mmengine/blob/main/docs/en/tutorials/registry.md

mmrotate.utils.slide_window(width, height, sizes, steps, img_rate_thr=0.6)[source]

Slide windows in images and get window position.

Parameters:
  • width (int) – The width of the image.

  • height (int) – The height of the image.

  • sizes (list) – List of window’s sizes.

  • steps (list) – List of window’s steps.

  • img_rate_thr (float) – Threshold of window area divided by image area.

Returns:

Information of valid windows.

Return type:

np.ndarray
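
A simplified sketch of how window positions can be generated is given below. It is not the library implementation: the filtering by img_rate_thr is omitted, the helper names are hypothetical, and returning windows as `(x_start, y_start, x_stop, y_stop)` tuples is an assumption.

```python
from itertools import product
from math import ceil


def window_starts(length, size, step):
    """Start offsets of windows of a given size along one image axis."""
    num = 1 if length <= size else ceil((length - size) / step + 1)
    starts = [step * i for i in range(num)]
    # Pull the last window back so it ends exactly at the image border.
    if len(starts) > 1 and starts[-1] + size > length:
        starts[-1] = length - size
    return starts


def simple_slide_window(width, height, sizes, steps):
    """Enumerate square windows for every (size, step) pair."""
    windows = []
    for size, step in zip(sizes, steps):
        xs = window_starts(width, size, step)
        ys = window_starts(height, size, step)
        windows += [(x, y, x + size, y + size) for x, y in product(xs, ys)]
    return windows


starts = window_starts(2000, 1024, 824)
windows = simple_slide_window(2000, 800, [1024], [824])
```

When the image is smaller than the window along one axis, the single window extends past the border. A threshold such as img_rate_thr is one way to decide whether such windows are kept.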