mmrotate.apis¶
- mmrotate.apis.inference_detector_by_patches(model: Module, imgs: str | ndarray | Sequence[str] | Sequence[ndarray], sizes: List[int], steps: List[int], ratios: List[float], nms_cfg: dict, test_pipeline: Compose | None = None, bs: int = 1) DetDataSample | List[DetDataSample][source]¶
Run inference on image patches with the detector.
Split huge image(s) into patches and run the detector on each patch. Finally, merge the patch results back onto the full image with NMS.
- Parameters:
model (nn.Module) – The loaded detector.
imgs (str, ndarray, Sequence[str/ndarray]) – Either image files or loaded images.
sizes (list[int]) – The sizes of patches.
steps (list[int]) – The steps between two patches.
ratios (list[float]) – Image resizing ratios for multi-scale detecting.
nms_cfg (dict) – NMS config.
bs (int) – Batch size, must be greater than or equal to 1.
- Returns:
Detection results.
- Return type:
DetDataSample or list[DetDataSample]
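The patch layout controlled by sizes and steps can be pictured with a small sliding-window sketch. This is not mmrotate's implementation: the helper name window_starts and the rule of snapping the last window to the image border are assumptions made for illustration.

```python
from typing import List


def window_starts(length: int, size: int, step: int) -> List[int]:
    """Start offsets of sliding windows along one image axis (illustrative)."""
    if length <= size:
        # The image is no larger than one patch along this axis.
        return [0]
    starts = list(range(0, length - size + 1, step))
    if starts[-1] + size < length:
        # Assumption: add a final window aligned with the border so that
        # no pixels are left uncovered.
        starts.append(length - size)
    return starts


# A 2000 x 1500 (width x height) image cut into 1024-pixel patches with step 824.
xs = window_starts(2000, size=1024, step=824)
ys = window_starts(1500, size=1024, step=824)
patches = [(x, y, x + 1024, y + 1024) for y in ys for x in xs]
```

Neighbouring windows overlap by size minus step pixels, so an object cut by one patch border is likely to appear whole in the adjacent patch. The duplicate detections this produces are the reason the merged results are passed through NMS.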
mmrotate.core¶
anchor¶
bbox¶
patch¶
evaluation¶
post_processing¶
visualization¶
mmrotate.datasets¶
datasets¶
- class mmrotate.datasets.DIORDataset(ann_subdir: str = 'Annotations/Oriented Bounding Boxes/', file_client_args: dict | None = None, backend_args: dict | None = None, ann_type: str = 'obb', **kwargs)[source]¶
DIOR dataset for detection.
- Parameters:
ann_subdir (str) – Subdir where annotations are. Defaults to ‘Annotations/Oriented Bounding Boxes/’.
file_client_args (dict) – Arguments to instantiate the corresponding backend in mmdet <= 3.0.0rc6. Defaults to None.
backend_args (dict, optional) – Arguments to instantiate the corresponding backend. Defaults to None.
ann_type (str) – Choose obb or hbb as ground truth. Defaults to obb.
- property bbox_min_size: str | None¶
Return the minimum size of bounding boxes in the images.
- filter_data() List[dict][source]¶
Filter annotations according to filter_cfg.
- Returns:
Filtered results.
- Return type:
List[dict]
- get_cat_ids(idx: int) List[int][source]¶
Get DIOR category ids by index.
- Parameters:
idx (int) – Index of data.
- Returns:
All categories in the image of specified index.
- Return type:
List[int]
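A dataset config built from the DIORDataset arguments above might look like the following sketch. The data_root value is a hypothetical placeholder passed through **kwargs; other keys a real config needs (annotation file, pipeline, and so on) are omitted because they are not documented in this section.

```python
# Hypothetical config fragment; only the documented DIORDataset arguments are set.
dior_dataset = dict(
    type='DIORDataset',
    data_root='data/DIOR/',  # assumption: placeholder path, adjust to your layout
    ann_subdir='Annotations/Oriented Bounding Boxes/',  # documented default
    ann_type='obb',  # use 'hbb' to train on horizontal ground truth instead
)
```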
- class mmrotate.datasets.DOTADataset(diff_thr: int = 100, img_suffix: str = 'png', **kwargs)[source]¶
DOTA-v1.0 dataset for detection.
Note:
ann_file in DOTADataset is different from the BaseDataset. In BaseDataset, it is the path of an annotation file. In DOTADataset, it is the path of a folder containing XML files.
- Parameters:
diff_thr (int) – The difficulty threshold of ground truth. Bboxes with difficulty higher than it will be ignored. The range of this value should be non-negative integer. Defaults to 100.
img_suffix (str) – The suffix of images. Defaults to ‘png’.
- filter_data() List[dict][source]¶
Filter annotations according to filter_cfg.
- Returns:
Filtered results.
- Return type:
List[dict]
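The effect of diff_thr can be illustrated with a short sketch. It is not the DOTADataset code: the helper name and the 'difficulty' field name are assumptions, and the only behaviour taken from the description above is that boxes with a difficulty higher than the threshold are ignored.

```python
from typing import Dict, List, Tuple


def split_by_difficulty(instances: List[Dict],
                        diff_thr: int = 100) -> Tuple[List[Dict], List[Dict]]:
    """Separate kept and ignored boxes; `difficulty` is an assumed field name."""
    kept = [inst for inst in instances if inst['difficulty'] <= diff_thr]
    ignored = [inst for inst in instances if inst['difficulty'] > diff_thr]
    return kept, ignored


boxes = [
    {'label': 'plane', 'difficulty': 0},
    {'label': 'ship', 'difficulty': 1},
    {'label': 'harbor', 'difficulty': 2},
]
kept, ignored = split_by_difficulty(boxes, diff_thr=1)
```

With the default diff_thr of 100, every box in this example would be kept.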
- class mmrotate.datasets.DOTAv15Dataset(diff_thr: int = 100, img_suffix: str = 'png', **kwargs)[source]¶
DOTA-v1.5 dataset for detection.
Note:
ann_file in DOTAv15Dataset is different from the BaseDataset. In BaseDataset, it is the path of an annotation file. In DOTAv15Dataset, it is the path of a folder containing XML files.
- class mmrotate.datasets.DOTAv2Dataset(diff_thr: int = 100, img_suffix: str = 'png', **kwargs)[source]¶
DOTA-v2.0 dataset for detection.
Note:
ann_file in DOTAv2Dataset is different from the BaseDataset. In BaseDataset, it is the path of an annotation file. In DOTAv2Dataset, it is the path of a folder containing XML files.
- class mmrotate.datasets.HRSCDataset(img_subdir: str = 'AllImages', ann_subdir: str = 'Annotations', classwise: bool = False, file_client_args: dict | None = None, backend_args: dict | None = None, **kwargs)[source]¶
HRSC dataset for detection.
Note:
There are two evaluation methods for HRSC datasets, which can be chosen through classwise. When classwise=False, there is only one class; when classwise=True, there are 31 classes of ships.
- Parameters:
img_subdir (str) – Subdir where images are stored. Defaults to ‘AllImages’.
ann_subdir (str) – Subdir where annotations are. Defaults to ‘Annotations’.
classwise (bool) – Whether to use all 31 classes or only one class. Defaults to False.
file_client_args (dict) – Arguments to instantiate the corresponding backend in mmdet <= 3.0.0rc6. Defaults to None.
backend_args (dict, optional) – Arguments to instantiate the corresponding backend. Defaults to None.
- property bbox_min_size: str | None¶
Return the minimum size of bounding boxes in the images.
- filter_data() List[dict][source]¶
Filter annotations according to filter_cfg.
- Returns:
Filtered results.
- Return type:
List[dict]
- get_cat_ids(idx: int) List[int][source]¶
Get HRSC category ids by index.
- Parameters:
idx (int) – Index of data.
- Returns:
All categories in the image of specified index.
- Return type:
List[int]
- load_data_list() List[dict][source]¶
Load annotation from XML style ann_file.
- Returns:
Annotation info from XML file.
- Return type:
list[dict]
- parse_data_info(img_info: dict) dict | List[dict][source]¶
Parse raw annotation to target format.
- Parameters:
img_info (dict) – Raw image information, usually it includes img_id, file_name, and xml_path.
- Returns:
Parsed annotation.
- Return type:
Union[dict, List[dict]]
- property sub_data_root: str¶
Return the sub data root.
pipelines¶
mmrotate.models¶
detectors¶
- class mmrotate.models.detectors.H2RBoxDetector(backbone: ConfigDict | dict, neck: ConfigDict | dict, bbox_head: ConfigDict | dict, crop_size: Tuple[int, int] = (768, 768), padding: str = 'reflection', train_cfg: ConfigDict | dict | None = None, test_cfg: ConfigDict | dict | None = None, data_preprocessor: ConfigDict | dict | None = None, init_cfg: ConfigDict | dict | List[ConfigDict | dict] | None = None)[source]¶
Implementation of H2RBox
- loss(batch_inputs: Tensor, batch_data_samples: List[DetDataSample]) dict | list[source]¶
Calculate losses from a batch of inputs and data samples.
- Parameters:
batch_inputs (Tensor) – Input images of shape (N, C, H, W). These should usually be mean centered and std scaled.
batch_data_samples (list[DetDataSample]) – The batch data samples. It usually includes information such as gt_instance or gt_panoptic_seg or gt_sem_seg.
- Returns:
A dictionary of loss components.
- Return type:
dict
- rotate_crop(batch_inputs: Tensor, rot: float = 0.0, size: Tuple[int, int] = (768, 768), batch_gt_instances: List[InstanceData] | None = None, padding: str = 'reflection') Tuple[Tensor, List[InstanceData]][source]¶
- Parameters:
batch_inputs (Tensor) – Input images of shape (N, C, H, W). These should usually be mean centered and std scaled.
rot (float) – Angle of view rotation. Defaults to 0.
size (tuple[int]) – Crop size from image center. Defaults to (768, 768).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
padding (str) – Padding method of image black edge. Defaults to ‘reflection’.
- Returns:
Processed batch_inputs (Tensor) and batch_gt_instances (list[InstanceData])
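To picture how a view rotation by rot relates a box in the original view to the same box in the rotated view, the sketch below rotates a box (cx, cy, w, h, t) about the image center with a standard 2D rotation. It is not taken from rotate_crop: the helper name, the box layout, and the sign convention are all assumptions.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float, float]  # (cx, cy, w, h, t), t in radians


def rotate_box_about_center(box: Box, rot: float,
                            img_size: Tuple[int, int]) -> Box:
    """Rotate a box by `rot` radians about the center of an (H, W) image."""
    cx, cy, w, h, t = box
    img_h, img_w = img_size
    ox, oy = img_w / 2, img_h / 2
    dx, dy = cx - ox, cy - oy
    cos_r, sin_r = math.cos(rot), math.sin(rot)
    # Standard rotation matrix applied to the offset from the image center.
    new_cx = ox + cos_r * dx - sin_r * dy
    new_cy = oy + sin_r * dx + cos_r * dy
    # Width and height are unchanged; the box angle shifts by the same amount.
    return (new_cx, new_cy, w, h, t + rot)


rotated = rotate_box_about_center(
    (484.0, 384.0, 40.0, 20.0, 0.0), math.pi / 2, (768, 768))
```

A box 100 pixels to the right of the center of a 768 x 768 view ends up 100 pixels along the y axis after a quarter turn, with its angle increased by the same quarter turn.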
- class mmrotate.models.detectors.H2RBoxV2Detector(backbone: ConfigDict | dict, neck: ConfigDict | dict, bbox_head: ConfigDict | dict, crop_size: Tuple[int, int] = (768, 768), padding: str = 'reflection', view_range: Tuple[float, float] = (0.25, 0.75), train_cfg: ConfigDict | dict | None = None, test_cfg: ConfigDict | dict | None = None, data_preprocessor: ConfigDict | dict | None = None, init_cfg: ConfigDict | dict | List[ConfigDict | dict] | None = None)[source]¶
Implementation of H2RBox-v2
- loss(batch_inputs: Tensor, batch_data_samples: List[DetDataSample]) dict | list[source]¶
Calculate losses from a batch of inputs and data samples.
- Parameters:
batch_inputs (Tensor) – Input images of shape (N, C, H, W). These should usually be mean centered and std scaled.
batch_data_samples (list[DetDataSample]) – The batch data samples. It usually includes information such as gt_instance or gt_panoptic_seg or gt_sem_seg.
- Returns:
A dictionary of loss components.
- Return type:
dict
- rotate_crop(batch_inputs: Tensor, rot: float = 0.0, size: Tuple[int, int] = (768, 768), batch_gt_instances: List[InstanceData] | None = None, padding: str = 'reflection') Tuple[Tensor, List[InstanceData]][source]¶
- Parameters:
batch_inputs (Tensor) – Input images of shape (N, C, H, W). These should usually be mean centered and std scaled.
rot (float) – Angle of view rotation. Defaults to 0.
size (tuple[int]) – Crop size from image center. Defaults to (768, 768).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
padding (str) – Padding method of image black edge. Defaults to ‘reflection’.
- Returns:
Processed batch_inputs (Tensor) and batch_gt_instances (list[InstanceData])
- class mmrotate.models.detectors.RefineSingleStageDetector(backbone: ConfigDict | dict, neck: ConfigDict | dict | None = None, bbox_head_init: ConfigDict | dict | None = None, bbox_head_refine: List[ConfigDict | dict | None] | None = None, train_cfg: ConfigDict | dict | None = None, test_cfg: ConfigDict | dict | None = None, data_preprocessor: ConfigDict | dict | None = None, init_cfg: ConfigDict | dict | List[ConfigDict | dict] | None = None)[source]¶
Base class for refine single-stage detectors, which is used by S2A-Net and R3Det.
- Parameters:
backbone (ConfigDict or dict) – The backbone module.
neck (ConfigDict or dict) – The neck module.
bbox_head_init (ConfigDict or dict) – The bbox head module of the first stage.
bbox_head_refine (list[ConfigDict | dict]) – The bbox head module of the refine stage.
train_cfg (ConfigDict or dict, optional) – The training config of RefineSingleStageDetector. Defaults to None.
test_cfg (ConfigDict or dict, optional) – The testing config of RefineSingleStageDetector. Defaults to None.
data_preprocessor (ConfigDict or dict, optional) – Config of DetDataPreprocessor to process the input data. Defaults to None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
- extract_feat(batch_inputs: Tensor) Tuple[Tensor][source]¶
Extract features.
- Parameters:
batch_inputs (Tensor) – Image tensor with shape (N, C, H, W).
- Returns:
Multi-level features that may have different resolutions.
- Return type:
tuple[Tensor]
- loss(batch_inputs: Tensor, batch_data_samples: List[DetDataSample]) dict | list[source]¶
Calculate losses from a batch of inputs and data samples.
- Parameters:
batch_inputs (Tensor) – Input images of shape (N, C, H, W). These should usually be mean centered and std scaled.
batch_data_samples (list[DetDataSample]) – The batch data samples. It usually includes information such as gt_instance or gt_panoptic_seg or gt_sem_seg.
- Returns:
A dictionary of loss components.
- Return type:
dict
- predict(batch_inputs: Tensor, batch_data_samples: List[DetDataSample], rescale: bool = True) List[DetDataSample][source]¶
Predict results from a batch of inputs and data samples with post-processing.
- Parameters:
batch_inputs (Tensor) – Inputs with shape (N, C, H, W).
batch_data_samples (List[DetDataSample]) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.
rescale (bool) – Whether to rescale the results. Defaults to True.
- Returns:
Detection results of the input images. Each DetDataSample usually contains ‘pred_instances’, and pred_instances usually contains the following keys.
scores (Tensor): Classification scores, has a shape (num_instances, ).
labels (Tensor): Labels of bboxes, has a shape (num_instances, ).
bboxes (Tensor): Has a shape (num_instances, 5), the last dimension 5 arrange as (x, y, w, h, t).
- Return type:
list[DetDataSample]
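The five-value box layout (x, y, w, h, t) returned in bboxes can be turned into corner points with basic trigonometry. The sketch below assumes (x, y) is the box center and t is an angle in radians; the exact angle range and direction depend on the angle version in use, so treat the convention here as an assumption. The helper name is hypothetical.

```python
import math
from typing import List, Tuple


def rbox_to_corners(x: float, y: float, w: float, h: float,
                    t: float) -> List[Tuple[float, float]]:
    """Corners of a rotated box with center (x, y), size (w, h), angle t (radians)."""
    cos_t, sin_t = math.cos(t), math.sin(t)
    corners = []
    # Visit the four corners of the axis-aligned box, then rotate each offset.
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        dx, dy = sx * w / 2, sy * h / 2
        corners.append((x + dx * cos_t - dy * sin_t,
                        y + dx * sin_t + dy * cos_t))
    return corners


corners = rbox_to_corners(10.0, 20.0, 4.0, 2.0, 0.0)
```

With t = 0 the result is the ordinary axis-aligned rectangle around the center, which makes a quick sanity check before plotting rotated predictions.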
backbones¶
- class mmrotate.models.backbones.ReResNet(depth: int, in_channels: int = 3, stem_channels: int = 64, base_channels: int = 64, expansion: int | None = None, num_stages: int = 4, strides: Sequence[int] = (1, 2, 2, 2), dilations: Sequence[int] = (1, 1, 1, 1), out_indices: Sequence[int] = (3,), style: str = 'pytorch', deep_stem: bool = False, avg_down: bool = False, frozen_stages: int = -1, conv_cfg: ConfigDict | dict | None = None, norm_cfg: ConfigDict | dict = {'requires_grad': True, 'type': 'BN'}, norm_eval: bool = False, with_cp: bool = False, zero_init_residual: bool = True, init_cfg: ConfigDict | dict | List[ConfigDict | dict] | None = None)[source]¶
ReResNet backbone.
Please refer to the paper for details.
- Parameters:
depth (int) – Network depth, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Defaults to 3.
stem_channels (int) – Output channels of the stem layer. Defaults to 64.
base_channels (int) – Middle channels of the first stage. Defaults to 64.
expansion (int, optional) – The expansion for BasicBlock/Bottleneck. If not specified, it will firstly be obtained via block.expansion. If the block has no attribute “expansion”, the following default values will be used: 1 for BasicBlock and 4 for Bottleneck. Defaults to None.
num_stages (int) – Stages of the network. Defaults to 4.
strides (Sequence[int]) – Strides of the first block of each stage. Defaults to (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Defaults to (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Defaults to (3, ).
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Defaults to False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Defaults to False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
conv_cfg (ConfigDict or dict, optional) – Dictionary to construct and config conv layer. Defaults to None.
norm_cfg (ConfigDict or dict) – Dictionary to construct and config norm layer. Defaults to dict(type='BN').
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Defaults to True.
init_cfg (ConfigDict or dict or list[ConfigDict or dict], optional) – Initialization config dict. Defaults to None.
- property norm1: str¶
Get normalization layer’s name.
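A backbone config using the ReResNet arguments documented above might look like the following sketch. The particular values are illustrative choices, not recommended settings.

```python
# Hypothetical backbone config fragment using only arguments documented above.
backbone = dict(
    type='ReResNet',
    depth=50,  # must be one of 18, 34, 50, 101, 152
    num_stages=4,
    out_indices=(0, 1, 2, 3),  # return all four stages, e.g. for a pyramid neck
    frozen_stages=1,  # illustrative; -1 (the default) freezes nothing
    style='pytorch',
)
```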
necks¶
- class mmrotate.models.necks.ReFPN(in_channels: List[int], out_channels: int, num_outs: int, start_level: int = 0, end_level: int = -1, add_extra_convs: bool = False, extra_convs_on_inputs: bool = True, relu_before_extra_convs: bool = False, no_norm_on_lateral: bool = False, conv_cfg: ConfigDict | dict | None = None, norm_cfg: ConfigDict | dict | None = None, activation: str | None = None, init_cfg: ConfigDict | dict | List[ConfigDict | dict] = {'distribution': 'uniform', 'layer': 'Conv2d', 'type': 'Xavier'})[source]¶
ReFPN.
- Parameters:
in_channels (List[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale)
num_outs (int) – Number of output scales.
start_level (int) – Index of the start input backbone level used to build the feature pyramid. Defaults to 0.
end_level (int) – Index of the end input backbone level (exclusive) to build the feature pyramid. Defaults to -1, which means the last level.
add_extra_convs (bool) – It decides whether to add conv layers on top of the original feature maps. Defaults to False.
extra_convs_on_inputs (bool) – It specifies the source feature map of the extra convs is the last feat map of neck inputs.
relu_before_extra_convs (bool) – Whether to apply relu before the extra conv. Defaults to False.
no_norm_on_lateral (bool) – Whether to apply norm on lateral. Defaults to False.
conv_cfg (ConfigDict or dict, optional) – Dictionary to construct and config conv layer. Defaults to None.
norm_cfg (ConfigDict or dict) – Dictionary to construct and config norm layer. Defaults to None.
activation (str, optional) – Activation layer in ConvModule. Defaults to None.
init_cfg (ConfigDict or dict or list[ConfigDict or dict], optional) – Initialization config dict. Defaults to dict(type='Xavier', layer='Conv2d', distribution='uniform').
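How start_level and end_level select backbone levels can be shown with a short sketch. It follows only the description above (end_level is exclusive, and -1 means the last level); the helper name is hypothetical and this is not the ReFPN code.

```python
from typing import List


def used_backbone_levels(in_channels: List[int], start_level: int = 0,
                         end_level: int = -1) -> List[int]:
    """Channels of the backbone levels that feed the pyramid (illustrative)."""
    # end_level is exclusive; -1 is a sentinel meaning "up to the last level".
    stop = len(in_channels) if end_level == -1 else end_level
    return in_channels[start_level:stop]


all_levels = used_backbone_levels([256, 512, 1024, 2048])
skip_first = used_backbone_levels([256, 512, 1024, 2048], start_level=1)
first_three = used_backbone_levels([256, 512, 1024, 2048], end_level=3)
```

Note that -1 is not a Python-style negative index here: slicing with -1 directly would drop the last level, whereas the documented meaning is to include it.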
dense_heads¶
- class mmrotate.models.dense_heads.AngleBranchRetinaHead(*args, use_encoded_angle: bool = True, shield_reg_angle: bool = False, use_normalized_angle_feat: bool = False, angle_coder: ConfigDict | dict = {'angle_version': 'le90', 'omega': 1, 'radius': 6, 'type': 'CSLCoder', 'window': 'gaussian'}, loss_angle: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, init_cfg: ConfigDict | dict | List[ConfigDict | dict] = {'layer': 'Conv2d', 'override': [{'bias_prob': 0.01, 'name': 'retina_cls', 'std': 0.01, 'type': 'Normal'}, {'bias_prob': 0.01, 'name': 'retina_angle_cls', 'std': 0.01, 'type': 'Normal'}], 'std': 0.01, 'type': 'Normal'}, **kwargs)[source]¶
Retina head with angle regression branch.
The head contains three subnetworks. The first classifies anchor boxes and the second regresses deltas for the anchors, the third regresses angles.
- Parameters:
use_encoded_angle (bool) – Decide whether to use encoded angle or gt angle as target. Defaults to True.
shield_reg_angle (bool) – Decide whether to shield the angle loss from reg branch. Defaults to False.
angle_coder (dict) – Config of angle coder.
loss_angle (dict) – Config of angle classification loss.
init_cfg (ConfigDict or dict or list[ConfigDict or dict]) – Initialization config dict.
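The default angle_coder in the signature is a CSLCoder with window='gaussian', radius=6 and omega=1. The idea of a circular smooth label can be sketched in plain Python as below. This is not the CSLCoder implementation: one bin per degree over a 180-degree range is an assumption, and any angle offset the coder applies for a given angle version is omitted.

```python
import math
from typing import List


def circular_smooth_label(angle_deg: float, num_bins: int = 180,
                          radius: float = 6.0) -> List[float]:
    """Gaussian-windowed label over angle bins that wraps around at the ends."""
    target = int(round(angle_deg)) % num_bins
    label = []
    for k in range(num_bins):
        d = abs(k - target)
        d = min(d, num_bins - d)  # circular distance between bins
        label.append(math.exp(-(d ** 2) / (2 * radius ** 2)))
    return label


label = circular_smooth_label(179.0)
```

Because the distance is circular, bin 0 receives the same weight as bin 178 when the target is bin 179, so angles on either side of the range boundary are treated as neighbours rather than as far apart.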
- forward_single(x: Tensor) Tuple[Tensor, Tensor, Tensor][source]¶
Forward feature of a single scale level.
- Parameters:
x (Tensor) – Features of a single scale level.
- Returns:
cls_score (Tensor): Cls scores for a single scale level, the channels number is num_anchors * num_classes.
bbox_pred (Tensor): Box energies / deltas for a single scale level, the channels number is num_anchors * 5.
angle_pred (Tensor): Angle for a single scale level, the channels number is num_anchors * encode_size.
- Return type:
tuple
- loss_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], angle_preds: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None) dict[source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters:
cls_scores (list[Tensor]) – Box scores for each scale level has shape (N, num_anchors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_anchors * 5, H, W).
angle_preds (list[Tensor]) – Box angles for each scale level with shape (N, num_anchors * encode_size, H, W).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns:
A dictionary of loss components.
- Return type:
dict
- loss_by_feat_single(cls_score: Tensor, bbox_pred: Tensor, angle_pred: Tensor, anchors: Tensor, labels: Tensor, label_weights: Tensor, bbox_targets: Tensor, bbox_weights: Tensor, angle_targets: Tensor, angle_weights: Tensor, avg_factor: int) tuple[source]¶
Calculate the loss of a single scale level based on the features extracted by the detection head.
- Parameters:
cls_score (Tensor) – Box scores for each scale level, has shape (N, num_anchors * num_classes, H, W).
bbox_pred (Tensor) – Box energies / deltas for each scale level with shape (N, num_anchors * 5, H, W).
angle_pred (Tensor) – Box angles for each scale level with shape (N, num_anchors * encode_size, H, W).
anchors (Tensor) – Box reference for each scale level with shape (N, num_total_anchors, 5).
labels (Tensor) – Labels of each anchor with shape (N, num_total_anchors).
label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors).
bbox_targets (Tensor) – BBox regression targets of each anchor with shape (N, num_total_anchors, 5).
bbox_weights (Tensor) – BBox regression loss weights of each anchor with shape (N, num_total_anchors, 5).
angle_targets (Tensor) – Angle regression targets of each anchor with shape (N, num_total_anchors, 1).
angle_weights (Tensor) – Angle regression loss weights of each anchor with shape (N, num_total_anchors, 1).
avg_factor (int) – Average factor that is used to average the loss.
- Returns:
loss components.
- Return type:
tuple
- predict_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], angle_preds: List[Tensor], score_factors: List[Tensor] | None = None, batch_img_metas: List[dict] | None = None, cfg: ConfigDict | None = None, rescale: bool = False, with_nms: bool = True) List[InstanceData][source]¶
Transform a batch of output features extracted from the head into bbox results.
Note: When score_factors is not None, the cls_scores are usually multiplied by it then obtain the real score used in NMS, such as CenterNess in FCOS, IoU branch in ATSS.
- Parameters:
cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).
angle_preds (list[Tensor]) – Box angles for each scale level with shape (N, num_anchors * encode_size, H, W)
score_factors (list[Tensor], optional) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, num_priors * 1, H, W). Defaults to None.
batch_img_metas (list[dict], optional) – Batch image meta info. Defaults to None.
cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.
rescale (bool) – If True, return boxes in original image space. Defaults to False.
with_nms (bool) – If True, do nms before return boxes. Defaults to True.
- Returns:
Object detection results of each image after the post process. Each item usually contains following keys.
scores (Tensor): Classification scores, has a shape (num_instance, )
labels (Tensor): Labels of bboxes, has a shape (num_instances, ).
bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arrange as (x1, y1, x2, y2).
- Return type:
list[InstanceData]
- class mmrotate.models.dense_heads.CFAHead(*args, topk: int = 6, anti_factor: float = 0.75, **kwargs)[source]¶
CFA head.
- Parameters:
topk (int) – Number of the highest topk points. Defaults to 6.
anti_factor (float) – Feature anti-aliasing coefficient. Defaults to 0.75.
- get_cfa_targets(proposals_list: List[Tensor], valid_flag_list: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None, stage: str = 'init', unmap_outputs: bool = True, return_sampling_results: bool = False) tuple[source]¶
Compute corresponding GT box and classification targets for proposals.
- Parameters:
proposals_list (list[Tensor]) – Multi level points/bboxes of each image.
valid_flag_list (list[Tensor]) – Multi level valid flags of each image.
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
stage (str) – ‘init’ or ‘refine’. Generate target for init stage or refine stage. Defaults to ‘init’.
unmap_outputs (bool) – Whether to map outputs back to the original set of anchors. Defaults to True.
return_sampling_results (bool) – Whether to return the sampling results. Defaults to False.
- Returns:
all_labels (list[Tensor]): Labels of each level.
all_label_weights (list[Tensor]): Label weights of each level.
all_bbox_gt (list[Tensor]): Ground truth bbox of each level.
all_proposals (list[Tensor]): Proposals (points/bboxes) of each level.
all_proposal_weights (list[Tensor]): Proposal weights of each level.
pos_inds (list[Tensor]): Index of positive samples in all images.
gt_inds (list[Tensor]): Index of ground truth bbox in all images.
- Return type:
tuple
- get_pos_loss(cls_score: Tensor, pts_pred: Tensor, label: Tensor, bbox_gt: Tensor, label_weight: Tensor, convex_weight: Tensor, pos_inds: Tensor) Tensor[source]¶
Calculate loss of all potential positive samples obtained from first match process.
- Parameters:
cls_score (Tensor) – Box scores of single image with shape (num_anchors, num_classes)
pts_pred (Tensor) – Box energies / deltas of single image with shape (num_anchors, 4)
label (Tensor) – classification target of each anchor with shape (num_anchors,)
bbox_gt (Tensor) – Ground truth box.
label_weight (Tensor) – Classification loss weight of each anchor with shape (num_anchors).
convex_weight (Tensor) – Bbox weight of each anchor with shape (num_anchors, 4).
pos_inds (Tensor) – Index of all positive samples got from first assign process.
- Returns:
Losses of all positive samples in single image.
- Return type:
Tensor
- loss_by_feat(cls_scores: List[Tensor], pts_preds_init: List[Tensor], pts_preds_refine: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None) Dict[str, Tensor][source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters:
cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, of shape (batch_size, num_classes, h, w).
pts_preds_init (list[Tensor]) – Points for each scale level, each is a 3D-tensor, of shape (batch_size, h_i * w_i, num_points * 2).
pts_preds_refine (list[Tensor]) – Points refined for each scale level, each is a 3D-tensor, of shape (batch_size, h_i * w_i, num_points * 2).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns:
A dictionary of loss components.
- Return type:
dict[str, Tensor]
- loss_by_feat_single(pts_pred_init: Tensor, bbox_gt_init: Tensor, bbox_weights_init: Tensor, stride: int, avg_factor_init: int) Tuple[Tensor][source]¶
Calculate the loss of a single scale level based on the features extracted by the detection head.
- Parameters:
pts_pred_init (Tensor) – Points of shape (batch_size, h_i * w_i, num_points * 2).
bbox_gt_init (Tensor) – BBox regression targets in the init stage of shape (batch_size, h_i * w_i, 8).
bbox_weights_init (Tensor) – BBox regression loss weights in the init stage of shape (batch_size, h_i * w_i, 8).
stride (int) – Point stride.
avg_factor_init (int) – Average factor that is used to average the loss in the init stage.
- Returns:
loss components.
- Return type:
Tuple[Tensor]
- reassign(pos_losses: Tensor, label: Tensor, label_weight: Tensor, pts_pred_init: Tensor, convex_weight: Tensor, gt_instances: InstanceData, pos_inds: Tensor, pos_gt_inds: Tensor, num_proposals_each_level: List | None = None, num_level: int | None = None) tuple[source]¶
CFA reassign process.
- Parameters:
pos_losses (Tensor) – Losses of all positive samples in single image.
label (Tensor) – classification target of each anchor with shape (num_anchors,)
label_weight (Tensor) – Classification loss weight of each anchor with shape (num_anchors).
pts_pred_init (Tensor)
convex_weight (Tensor) – Bbox weight of each anchor with shape (num_anchors, 4).
gt_instances (InstanceData) – Ground truth of instance annotations. It usually includes bboxes and labels attributes.
pos_inds (Tensor) – Index of all positive samples got from first assign process.
pos_gt_inds (Tensor) – Gt_index of all positive samples got from first assign process.
num_proposals_each_level (list, optional) – Number of proposals of each level.
num_level (int, optional) – Number of level.
- Returns:
Usually returns a tuple containing learning targets.
label (Tensor): Classification target of each anchor after paa assign, with shape (num_anchors,).
label_weight (Tensor): Classification loss weight of each anchor after paa assign, with shape (num_anchors).
convex_weight (Tensor): Bbox weight of each anchor with shape (num_anchors, 4).
pos_normalize_term (list): pos normalize term for refine points losses.
- Return type:
tuple
- class mmrotate.models.dense_heads.H2RBoxHead(num_classes: int, in_channels: int, angle_version: str = 'le90', use_hbbox_loss: bool = False, scale_angle: bool = True, angle_coder: ConfigDict | dict = {'type': 'PseudoAngleCoder'}, h_bbox_coder: ConfigDict | dict = {'type': 'mmdet.DistancePointBBoxCoder'}, bbox_coder: ConfigDict | dict = {'type': 'DistanceAnglePointCoder'}, loss_cls: ConfigDict | dict = {'alpha': 0.25, 'gamma': 2.0, 'loss_weight': 1.0, 'type': 'mmdet.FocalLoss', 'use_sigmoid': True}, loss_bbox: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'RotatedIoULoss'}, loss_centerness: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_angle: ConfigDict | dict | None = None, loss_bbox_ss: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'mmdet.IoULoss'}, rotation_agnostic_classes: list | None = None, weak_supervised: bool = True, square_classes: list | None = None, crop_size: Tuple[int, int] = (768, 768), **kwargs)[source]¶
Anchor-free head used in H2RBox.
- Parameters:
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
angle_version (str) – Angle representations. Defaults to ‘le90’.
use_hbbox_loss (bool) – If true, use horizontal bbox loss and loss_angle should not be None. Defaults to False.
scale_angle (bool) – If true, add scale to angle pred branch. Defaults to True.
angle_coder (ConfigDict or dict) – Config of angle coder.
h_bbox_coder (dict) – Config of horizontal bbox coder, only used when use_hbbox_loss is True.
bbox_coder (ConfigDict or dict) – Config of bbox coder. Defaults to ‘DistanceAnglePointCoder’.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
loss_centerness (ConfigDict or dict) – Config of centerness loss.
loss_angle (ConfigDict or dict, optional) – Config of angle loss.
loss_bbox_ss (ConfigDict or dict) – Config of consistency loss.
rotation_agnostic_classes (list) – Ids of rotation agnostic category.
weak_supervised (bool) – If true, horizontal gtbox is input. Defaults to True.
square_classes (list) – Ids of the square category.
crop_size (tuple[int]) – Crop size from image center. Defaults to (768, 768).
Example
>>> self = H2RBoxHead(11, 7)
>>> feats = [torch.rand(1, 7, s, s) for s in [4, 8, 16, 32, 64]]
>>> cls_score, bbox_pred, angle_pred, centerness = self.forward(feats)
>>> assert len(cls_score) == len(self.scales)
- forward_ss(feats: Tuple[Tensor]) Tuple[List[Tensor], List[Tensor]][source]¶
Forward features from the upstream network.
- Parameters:
feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns:
A tuple of each level outputs.
bbox_pred (list[Tensor]): Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * 4.
angle_pred (list[Tensor]): Box angle for each scale level, each is a 4D-tensor, the channel number is num_points * 1.
- Return type:
tuple
- forward_ss_single(feats: Tensor, scale: Scale, stride: int) Tuple[Tensor, Tensor][source]¶
Forward features of a single scale level in SS branch.
- Parameters:
feats (Tensor) – FPN feature maps of the specified stride.
scale (mmcv.cnn.Scale) – Learnable scale module to resize the bbox prediction.
stride (int) – The corresponding stride for feature maps, only used to normalize the bbox prediction when self.norm_on_bbox is True.
- Returns:
bbox predictions and angle predictions of input feature maps.
- Return type:
tuple
- loss(x_ws: Tuple[Tensor], x_ss: Tuple[Tensor], rot: float, batch_gt_instances: InstanceData, batch_gt_instances_ignore: InstanceData, batch_img_metas: List[dict]) dict[source]¶
Perform forward propagation and loss calculation of the detection head on the features of the upstream network.
- Parameters:
x_ws (tuple[Tensor]) – Features from the weakly supervised network, each is a 4D-tensor.
x_ss (tuple[Tensor]) – Features from the self-supervised network, each is a 4D-tensor.
rot (float) – Angle of view rotation.
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_gt_instances_ignore (list[InstanceData]) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
- Returns:
A dictionary of loss components.
- Return type:
dict
- loss_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], angle_preds: List[Tensor], centernesses: List[Tensor], bbox_preds_ss: List[Tensor], angle_preds_ss: List[Tensor], rot: float, batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None) Dict[str, Tensor][source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters:
cls_scores (list[Tensor]) – Box scores for each scale level in the weakly supervised branch, each is a 4D-tensor, the channel number is num_points * num_classes.
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level in the weakly supervised branch, each is a 4D-tensor, the channel number is num_points * 4.
angle_preds (list[Tensor]) – Box angle for each scale level in the weakly supervised branch, each is a 4D-tensor, the channel number is num_points * encode_size.
centernesses (list[Tensor]) – Centerness for each scale level in the weakly supervised branch, each is a 4D-tensor, the channel number is num_points * 1.
bbox_preds_ss (list[Tensor]) – Box energies / deltas for each scale level in the self-supervised branch, each is a 4D-tensor, the channel number is num_points * 4.
angle_preds_ss (list[Tensor]) – Box angle for each scale level in the self-supervised branch, each is a 4D-tensor, the channel number is num_points * encode_size.
rot (float) – Angle of view rotation.
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns:
A dictionary of loss components.
- Return type:
dict[str, Tensor]
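The rot argument relates the two views whose predictions are compared. As a rough, library-independent illustration, the sketch below rotates a box given as (cx, cy, w, h, t) around a chosen center by rot radians. The helper name rotate_rbox and the sign convention are assumptions made for this example, not part of mmrotate.

```python
import math

def rotate_rbox(box, rot, center):
    """Rotate a (cx, cy, w, h, t) box around `center` by `rot` radians.

    Hypothetical helper for illustration; the sign convention is an assumption.
    """
    cx, cy, w, h, t = box
    ox, oy = center
    dx, dy = cx - ox, cy - oy
    cos_r, sin_r = math.cos(rot), math.sin(rot)
    # Rotate the box center about the reference point.
    new_cx = ox + cos_r * dx - sin_r * dy
    new_cy = oy + sin_r * dx + cos_r * dy
    # Width and height are unchanged; the angle shifts by the view rotation.
    return (new_cx, new_cy, w, h, t + rot)

rotated = rotate_rbox((10.0, 0.0, 4.0, 2.0, 0.0), math.pi / 2, (0.0, 0.0))
```

A consistency loss can then compare boxes predicted in one view, mapped this way, with boxes predicted in the rotated view.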
- predict_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], angle_preds: List[Tensor], score_factors: List[Tensor] | None = None, batch_img_metas: List[dict] | None = None, cfg: ConfigDict | None = None, rescale: bool = False, with_nms: bool = True) List[InstanceData][source]¶
Transform a batch of output features extracted from the head into bbox results.
Note: When score_factors is not None, the cls_scores are usually multiplied by it to obtain the real score used in NMS, such as CenterNess in FCOS, IoU branch in ATSS.
- Parameters:
cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).
angle_preds (list[Tensor]) – Box angle for each scale level with shape (N, num_points * encode_size, H, W)
score_factors (list[Tensor], optional) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, num_priors * 1, H, W). Defaults to None.
batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.
cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.
rescale (bool) – If True, return boxes in original image space. Defaults to False.
with_nms (bool) – If True, do nms before return boxes. Defaults to True.
- Returns:
Object detection results of each image after the post process. Each item usually contains following keys.
scores (Tensor): Classification scores, has a shape (num_instances, ).
labels (Tensor): Labels of bboxes, has a shape (num_instances, ).
bboxes (Tensor): Has a shape (num_instances, 5), the last dimension 5 arrange as (x, y, w, h, t).
- Return type:
list[InstanceData]
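The note about score_factors can be illustrated with a minimal pure-Python sketch. It assumes that both the class logits and the factor logits pass through a sigmoid and that the fused score is their plain product; fuse_scores is a hypothetical helper, not an mmrotate function.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def fuse_scores(cls_logits, factor_logits):
    """Scale per-location class scores by a score factor such as centerness.

    cls_logits: one list of class logits per location.
    factor_logits: one factor logit per location.
    Hypothetical helper; the product rule is an assumption for illustration.
    """
    fused = []
    for logits, factor_logit in zip(cls_logits, factor_logits):
        factor = sigmoid(factor_logit)
        fused.append([sigmoid(c) * factor for c in logits])
    return fused

# Two locations with identical class logits but different factors.
scores = fuse_scores([[0.0, 2.0], [0.0, 2.0]], [0.0, 4.0])
```

Locations with a higher factor keep more of their class score, so they rank higher when NMS sorts the candidates.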
- class mmrotate.models.dense_heads.H2RBoxV2Head(num_classes: int, in_channels: int, angle_version: str = 'le90', use_hbbox_loss: bool = False, scale_angle: bool = False, angle_coder: ConfigDict | dict = {'type': 'PseudoAngleCoder'}, h_bbox_coder: ConfigDict | dict = {'type': 'mmdet.DistancePointBBoxCoder'}, bbox_coder: ConfigDict | dict = {'type': 'DistanceAnglePointCoder'}, loss_cls: ConfigDict | dict = {'alpha': 0.25, 'gamma': 2.0, 'loss_weight': 1.0, 'type': 'mmdet.FocalLoss', 'use_sigmoid': True}, loss_bbox: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'RotatedIoULoss'}, loss_centerness: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_angle: ConfigDict | dict | None = None, loss_symmetry_ss: ConfigDict | dict = {'type': 'H2RBoxV2ConsistencyLoss'}, rotation_agnostic_classes: list | None = None, agnostic_resize_classes: list | None = None, use_circumiou_loss=True, use_standalone_angle=True, use_reweighted_loss_bbox=False, **kwargs)[source]¶
Anchor-free head used in H2RBox-v2 <https://arxiv.org/abs/2304.04403>.
- Parameters:
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
angle_version (str) – Angle representations. Defaults to ‘le90’.
use_hbbox_loss (bool) – If true, use horizontal bbox loss and loss_angle should not be None. Defaults to False.
scale_angle (bool) – If true, add scale to angle pred branch. Defaults to False.
angle_coder (ConfigDict or dict) – Config of angle coder.
h_bbox_coder (dict) – Config of horizontal bbox coder, only used when use_hbbox_loss is True.
bbox_coder (ConfigDict or dict) – Config of bbox coder. Defaults to ‘DistanceAnglePointCoder’.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
loss_centerness (ConfigDict or dict) – Config of centerness loss.
loss_angle (ConfigDict or dict, Optional) – Config of angle loss.
loss_bbox_ss (ConfigDict or dict) – Config of consistency loss.
rotation_agnostic_classes (list) – Ids of rotation agnostic category.
weak_supervised (bool) – If true, horizontal gtbox is input. Defaults to True.
square_classes (list) – Ids of the square category.
crop_size (tuple[int]) – Crop size from image center. Defaults to (768, 768).
Example
>>> self = H2RBoxHead(11, 7)
>>> feats = [torch.rand(1, 7, s, s) for s in [4, 8, 16, 32, 64]]
>>> cls_score, bbox_pred, angle_pred, centerness = self.forward(feats)
>>> assert len(cls_score) == len(self.scales)
- get_targets(points: List[Tensor], batch_gt_instances: List[InstanceData]) Tuple[List[Tensor], List[Tensor], List[Tensor]][source]¶
Compute regression, classification and centerness targets for points in multiple images.
- Parameters:
points (list[Tensor]) – Points of each fpn level, each has shape (num_points, 2).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
- Returns:
Targets of each level.
concat_lvl_labels (list[Tensor]): Labels of each level.
concat_lvl_bbox_targets (list[Tensor]): BBox targets of each level.
concat_lvl_angle_targets (list[Tensor]): Angle targets of each level.
- Return type:
tuple
- loss_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], angle_preds: List[Tensor], centernesses: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None) Dict[str, Tensor][source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters:
cls_scores (list[Tensor]) – Box scores for each scale level in the weakly supervised branch, each is a 4D-tensor, the channel number is num_points * num_classes.
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level in the weakly supervised branch, each is a 4D-tensor, the channel number is num_points * 4.
angle_preds (list[Tensor]) – Box angle for each scale level in the weakly supervised branch, each is a 4D-tensor, the channel number is num_points * encode_size.
centernesses (list[Tensor]) – Centerness for each scale level in the weakly supervised branch, each is a 4D-tensor, the channel number is num_points * 1.
bbox_preds_ss (list[Tensor]) – Box energies / deltas for each scale level in the self-supervised branch, each is a 4D-tensor, the channel number is num_points * 4.
angle_preds_ss (list[Tensor]) – Box angle for each scale level in the self-supervised branch, each is a 4D-tensor, the channel number is num_points * encode_size.
rot (float) – Angle of view rotation.
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns:
A dictionary of loss components.
- Return type:
dict[str, Tensor]
- class mmrotate.models.dense_heads.OrientedRPNHead(in_channels: int, num_classes: int = 1, init_cfg: ConfigDict | dict | List[ConfigDict | dict] = {'layer': 'Conv2d', 'std': 0.01, 'type': 'Normal'}, num_convs: int = 1, **kwargs)[source]¶
Oriented RPN head for Oriented R-CNN.
- class mmrotate.models.dense_heads.OrientedRepPointsHead(*args, loss_spatial_init: ConfigDict | dict = {'loss_weight': 0.05, 'type': 'SpatialBorderLoss'}, loss_spatial_refine: ConfigDict | dict = {'loss_weight': 0.1, 'type': 'SpatialBorderLoss'}, top_ratio: float = 0.4, init_qua_weight: float = 0.2, ori_qua_weight: float = 0.3, poc_qua_weight: float = 0.1, **kwargs)[source]¶
Oriented RepPoints head <https://arxiv.org/pdf/2105.11111v4.pdf>. The head contains initial and refined stages based on RepPoints. The initial stage regresses coarse point sets, and the refine stage further regresses the fine point sets. The APAA scheme from the paper, which is based on the quality of point set samples, is employed in the refined stage.
- Parameters:
loss_spatial_init (ConfigDict or dict) – Config of initial spatial loss.
loss_spatial_refine (ConfigDict or dict) – Config of refine spatial loss.
top_ratio (float) – Ratio of top high-quality point sets. Defaults to 0.4.
init_qua_weight (float) – Quality weight of initial stage. Defaults to 0.2.
ori_qua_weight (float) – Orientation quality weight. Defaults to 0.3.
poc_qua_weight (float) – Point-wise correlation quality weight. Defaults to 0.1.
- dynamic_pointset_samples_selection(quality: Tensor, label: Tensor, label_weight: Tensor, bbox_weight: Tensor, pos_inds: Tensor, pos_gt_inds: Tensor, num_proposals_each_level: List[int] | None = None, num_level: int | None = None) tuple[source]¶
The dynamic top k selection of point set samples based on the quality assessment values.
- Parameters:
quality (Tensor) – the quality values of positive point set samples
label (Tensor) – gt label with shape (N)
label_weight (Tensor) – label weight with shape (N)
bbox_weight (Tensor) – box weight with shape (N)
pos_inds (Tensor) – the inds of positive point set samples
pos_gt_inds (Tensor) – the inds of positive ground truth
num_proposals_each_level (list[int]) – proposals number of each level
num_level (int) – the level number
- Returns:
label: gt label with shape (N)
label_weight: label weight with shape (N)
bbox_weight: box weight with shape (N)
num_pos (int): the number of selected positive point samples with high-quality
pos_normalize_term (Tensor): the corresponding positive normalize term
- Return type:
tuple
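A ratio-based top-k selection of this kind can be sketched in plain Python. The sketch assumes that a lower quality value means a better sample and that selection happens independently per assigned ground truth; select_top_ratio is a hypothetical helper, and it ignores the per-level bookkeeping (num_proposals_each_level, num_level) as well as the weight updates that the real method returns.

```python
import math
from collections import defaultdict

def select_top_ratio(quality, pos_gt_inds, top_ratio=0.4):
    """Keep the best `top_ratio` share of positive samples for each ground truth.

    quality: one value per positive sample (assumed: lower is better).
    pos_gt_inds: index of the ground truth assigned to each positive sample.
    Returns the sorted indices of the samples that are kept.
    """
    groups = defaultdict(list)
    for sample_idx, gt_idx in enumerate(pos_gt_inds):
        groups[gt_idx].append(sample_idx)

    keep = []
    for members in groups.values():
        # Always keep at least one sample per ground truth.
        k = max(1, math.ceil(len(members) * top_ratio))
        keep.extend(sorted(members, key=lambda i: quality[i])[:k])
    return sorted(keep)

kept = select_top_ratio([0.9, 0.1, 0.5, 0.3, 0.7, 0.2], [0, 0, 0, 0, 0, 1])
```

Samples that are not kept would then be treated as non-positive when the label and box weights are rebuilt.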
- feature_cosine_similarity(points_features: Tensor) Tensor[source]¶
Compute the points features similarity for points-wise correlation.
- Parameters:
points_features (Tensor) – sampling point feature with shape (N_pointsets, N_points, C)
- Returns:
max feature similarity in each point set with shape (N_points_set, N_points, C)
- Return type:
max_correlation (Tensor)
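As a rough illustration of a point-wise correlation measure, the sketch below compares each point feature in a set with the mean feature of that set using cosine similarity, then reports the largest dissimilarity. The reduction used here and the helper names are assumptions for this example; the library's exact computation may differ.

```python
import math

def cosine_similarity(a, b, eps=1e-6):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)

def pointset_dissimilarity(point_set):
    """Largest (1 - cosine similarity) between any point and the set mean.

    point_set: list of N_points feature vectors of length C.
    Hypothetical helper for illustration only.
    """
    n = len(point_set)
    mean = [sum(column) / n for column in zip(*point_set)]
    return max(1.0 - cosine_similarity(p, mean) for p in point_set)

aligned = pointset_dissimilarity([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
spread = pointset_dissimilarity([[1.0, 0.0], [0.0, 1.0]])
```

Point sets whose features all point in the same direction score near zero, while sets with diverging features score higher.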
- get_adaptive_points_feature(features: Tensor, pt_locations: Tensor, stride: int) Tensor[source]¶
Get the points features from the locations of predicted points.
- Parameters:
features (Tensor) – base feature with shape (B,C,W,H)
pt_locations (Tensor) – locations of points in each point set with shape (B, N_points_set(number of point set), N_points(number of points in each point set) *2)
stride (int) – points stride
- Returns:
sampling features with (B, C, N_points_set, N_points)
- Return type:
Tensor
- get_targets(proposals_list: List[Tensor], valid_flag_list: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None, stage: str = 'init', unmap_outputs: bool = True) tuple[source]¶
Compute corresponding GT box and classification targets for proposals.
- Parameters:
proposals_list (list[Tensor]) – Multi level points/bboxes of each image.
valid_flag_list (list[Tensor]) – Multi level valid flags of each image.
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
stage (str) – ‘init’ or ‘refine’. Generate target for init stage or refine stage.
unmap_outputs (bool) – Whether to map outputs back to the original set of anchors.
- Returns:
labels_list (list[Tensor]): Labels of each level.
label_weights_list (list[Tensor]): Label weights of each level.
bbox_gt_list (list[Tensor]): Ground truth bbox of each level.
proposals_list (list[Tensor]): Proposals(points/bboxes) of each level.
proposal_weights_list (list[Tensor]): Proposal weights of each level.
avg_factor (int): Average factor that is used to average the loss. When using sampling method, avg_factor is usually the sum of positive and negative priors. When using PseudoSampler, avg_factor is usually equal to the number of positive priors.
- Return type:
tuple
- init_loss_single(pts_pred_init: Tensor, bbox_gt_init: Tensor, bbox_weights_init: Tensor, stride: int) Tuple[Tensor, Tensor][source]¶
Single initial stage loss function.
- Parameters:
pts_pred_init (Tensor) – Initial point sets prediction with shape (N, 9*2)
bbox_gt_init (Tensor) – BBox regression targets in the init stage of shape (batch_size, h_i * w_i, 8).
bbox_weights_init (Tensor) – BBox regression loss weights in the init stage of shape (batch_size, h_i * w_i, 8).
stride (int) – Point stride.
- Returns:
loss_pts_init (Tensor): Initial bbox loss.
loss_border_init (Tensor): Initial spatial border loss.
- Return type:
tuple
- loss_by_feat(cls_scores: List[Tensor], pts_preds_init: List[Tensor], pts_preds_refine: List[Tensor], base_feat: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None) Dict[str, Tensor][source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters:
cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, of shape (batch_size, num_classes, h, w).
pts_preds_init (list[Tensor]) – Points for each scale level, each is a 3D-tensor, of shape (batch_size, h_i * w_i, num_points * 2).
pts_preds_refine (list[Tensor]) – Points refined for each scale level, each is a 3D-tensor, of shape (batch_size, h_i * w_i, num_points * 2).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns:
A dictionary of loss components.
- Return type:
dict[str, Tensor]
- loss_by_feat_single(cls_score: Tensor, pts_pred_init: Tensor, pts_pred_refine: Tensor, labels: Tensor, label_weights, bbox_gt_init: Tensor, bbox_weights_init: Tensor, bbox_gt_refine: Tensor, bbox_weights_refine: Tensor, stride: int, avg_factor_init: int, avg_factor_refine: int) Tuple[Tensor][source]¶
Calculate the loss of a single scale level based on the features extracted by the detection head.
- Parameters:
cls_score (Tensor) – Box scores for each scale level, has shape (N, num_classes, h_i, w_i).
pts_pred_init (Tensor) – Points of shape (batch_size, h_i * w_i, num_points * 2).
pts_pred_refine (Tensor) – Points refined of shape (batch_size, h_i * w_i, num_points * 2).
labels (Tensor) – Ground truth class indices with shape (batch_size, h_i * w_i).
label_weights (Tensor) – Label weights of shape (batch_size, h_i * w_i).
bbox_gt_init (Tensor) – BBox regression targets in the init stage of shape (batch_size, h_i * w_i, 8).
bbox_weights_init (Tensor) – BBox regression loss weights in the init stage of shape (batch_size, h_i * w_i, 8).
bbox_gt_refine (Tensor) – BBox regression targets in the refine stage of shape (batch_size, h_i * w_i, 8).
bbox_weights_refine (Tensor) – BBox regression loss weights in the refine stage of shape (batch_size, h_i * w_i, 8).
stride (int) – Point stride.
avg_factor_init (int) – Average factor that is used to average the loss in the init stage.
avg_factor_refine (int) – Average factor that is used to average the loss in the refine stage.
- Returns:
loss components.
- Return type:
Tuple[Tensor]
- pointsets_quality_assessment(pts_features: Tensor, cls_score: Tensor, pts_pred_init: Tensor, pts_pred_refine: Tensor, label: Tensor, bbox_gt: Tensor, label_weight: Tensor, bbox_weight: Tensor, pos_inds: Tensor) Tensor[source]¶
Assess the quality of each point set from the classification, localization, orientation, and point-wise correlation based on the assigned point sets samples.
- Parameters:
pts_features (Tensor) – points features with shape (N, 9, C)
cls_score (Tensor) – classification scores with shape (N, class_num)
pts_pred_init (Tensor) – initial point sets prediction with shape (N, 9*2)
pts_pred_refine (Tensor) – refined point sets prediction with shape (N, 9*2)
label (Tensor) – gt label with shape (N)
bbox_gt (Tensor) – gt bbox of polygon with shape (N, 8)
label_weight (Tensor) – label weight with shape (N)
bbox_weight (Tensor) – box weight with shape (N)
pos_inds (Tensor) – the inds of positive point set samples
- Returns:
weighted quality values for positive point set samples.
- Return type:
qua (Tensor)
- sampling_points(polygons: Tensor, points_num: int, device: str) Tensor[source]¶
Sample edge points for polygon.
- Parameters:
polygons (Tensor) – polygons with shape (N, 8)
points_num (int) – number of sampling points for each polygon edge. 10 by default.
device (str) – The device the tensor will be put on. Defaults to cuda.
- Returns:
sampling points with shape (N, points_num*4, 2)
- Return type:
sampling_points (Tensor)
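Edge sampling of this kind can be sketched without tensors. The example below takes one polygon as eight coordinates (x1, y1, ..., x4, y4) and places points_num evenly spaced points on each of its four edges, giving points_num * 4 points in total. The spacing rule (start corner included, end corner excluded) and the helper name sample_edge_points are assumptions for this sketch.

```python
def sample_edge_points(polygon, points_num=10):
    """Sample `points_num` points on each edge of a 4-corner polygon.

    polygon: flat list [x1, y1, x2, y2, x3, y3, x4, y4].
    Returns a list of points_num * 4 (x, y) tuples.
    Hypothetical helper; the spacing rule is an assumption.
    """
    corners = [(polygon[i], polygon[i + 1]) for i in range(0, 8, 2)]
    samples = []
    for k in range(4):
        (x0, y0), (x1, y1) = corners[k], corners[(k + 1) % 4]
        for j in range(points_num):
            t = j / points_num  # fraction of the way along the edge
            samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return samples

pts = sample_edge_points([0.0, 0.0, 4.0, 0.0, 4.0, 2.0, 0.0, 2.0], points_num=2)
```

Applying the same routine to every polygon in a batch yields the (N, points_num*4, 2) layout described above.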
- class mmrotate.models.dense_heads.R3Head(*args, loss_bbox_type: str = 'normal', **kwargs)[source]¶
An anchor-based head used in R3Det.
- filter_bboxes(cls_scores: List[Tensor], bbox_preds: List[Tensor]) List[List[Tensor]][source]¶
Filter predicted bounding boxes at each position of the feature maps. Only the bounding box with the highest score is kept at each position. This filter is used in R3Det prior to the first feature refinement stage.
- Parameters:
cls_scores (list[Tensor]) – Box scores for each scale level, has shape (N, num_anchors * num_classes, H, W)
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_anchors * 5, H, W)
- Returns:
best or refined rbboxes of each level of each image.
- Return type:
list[list[Tensor]]
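The per-position filtering can be sketched with nested lists instead of tensors. For every position, the example takes the best class score of each anchor and keeps the box of the anchor whose best score is highest. keep_best_anchor is a hypothetical helper, and the nested-list layout is an assumption chosen for readability.

```python
def keep_best_anchor(scores, boxes):
    """Keep one box per position: the one from the highest-scoring anchor.

    scores: [num_positions][num_anchors][num_classes] class scores.
    boxes:  [num_positions][num_anchors] boxes as (cx, cy, w, h, t) tuples.
    Hypothetical helper for illustration only.
    """
    best = []
    for pos_scores, pos_boxes in zip(scores, boxes):
        # Best class score of each anchor at this position.
        anchor_scores = [max(class_scores) for class_scores in pos_scores]
        best_anchor = max(range(len(anchor_scores)), key=anchor_scores.__getitem__)
        best.append(pos_boxes[best_anchor])
    return best

scores = [[[0.1, 0.9], [0.8, 0.2]], [[0.3, 0.1], [0.2, 0.6]]]
boxes = [
    [(0, 0, 4, 2, 0.0), (0, 0, 2, 4, 0.5)],
    [(8, 0, 4, 2, 0.0), (8, 0, 2, 4, 0.5)],
]
best_boxes = keep_best_anchor(scores, boxes)
```

After this step each position carries a single rotated box, which is what the refinement stage expects as its rois.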
- class mmrotate.models.dense_heads.R3RefineHead(num_classes: int, in_channels: int, frm_cfg: dict | None = None, **kwargs)[source]¶
An anchor-based head used in R3Det.
- Parameters:
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
frm_cfg (dict) – Config of the feature refine module.
- feature_refine(x: List[Tensor], rois: List[List[Tensor]]) List[Tensor][source]¶
Refine the input features using the feature refine module.
- Parameters:
x (list[Tensor]) – feature maps of multiple scales.
rois (list[list[Tensor]]) – input rbboxes of multiple scales of multiple images, output by former stages and are to be refined.
- Returns:
refined feature maps of multiple scales.
- Return type:
list[Tensor]
- get_anchors(featmap_sizes: List[tuple], batch_img_metas: List[dict], device: device | str = 'cuda') Tuple[List[List[Tensor]], List[List[Tensor]]][source]¶
Get anchors according to feature map sizes.
- Parameters:
featmap_sizes (list[tuple]) – Multi-level feature map sizes.
batch_img_metas (list[dict]) – Image meta info.
device (torch.device | str) – Device for returned tensors. Defaults to cuda.
- Returns:
anchor_list (list[list[Tensor]]): Anchors of each image.
valid_flag_list (list[list[Tensor]]): Valid flags of each image.
- Return type:
tuple
- loss_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None, rois: List[Tensor] | None = None) dict[source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters:
cls_scores (list[Tensor]) – Box scores for each scale level, has shape (N, num_anchors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
rois (list[Tensor])
- Returns:
A dictionary of loss components.
- Return type:
dict
- predict_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], score_factors: List[Tensor] | None = None, rois: List[Tensor] | None = None, batch_img_metas: List[dict] | None = None, cfg: ConfigDict | None = None, rescale: bool = False, with_nms: bool = True) List[InstanceData][source]¶
Transform a batch of output features extracted from the head into bbox results.
Note: When score_factors is not None, the cls_scores are usually multiplied by it to obtain the real score used in NMS, such as CenterNess in FCOS, IoU branch in ATSS.
- Parameters:
cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).
score_factors (list[Tensor], optional) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, num_priors * 1, H, W). Defaults to None.
rois (list[Tensor])
batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.
cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.
rescale (bool) – If True, return boxes in original image space. Defaults to False.
with_nms (bool) – If True, do nms before return boxes. Defaults to True.
- Returns:
Object detection results of each image after the post process. Each item usually contains following keys.
scores (Tensor): Classification scores, has a shape (num_instances, ).
labels (Tensor): Labels of bboxes, has a shape (num_instances, ).
bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arrange as (x1, y1, x2, y2).
- Return type:
list[InstanceData]
- refine_bboxes(cls_scores: List[Tensor], bbox_preds: List[Tensor], rois: List[List[Tensor]]) List[List[Tensor]][source]¶
Refine predicted bounding boxes at each position of the feature maps. This method will be used in R3Det in refinement stages.
- Parameters:
cls_scores (list[Tensor]) – Box scores for each scale level, has shape (N, num_classes, H, W)
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, 5, H, W)
rois (list[list[Tensor]]) – input rbboxes of each level of each image. rois output by former stages and are to be refined
- Returns:
best or refined rbboxes of each level of each image.
- Return type:
list[list[Tensor]]
- class mmrotate.models.dense_heads.RotatedATSSHead(num_classes: int, in_channels: int, pred_kernel_size: int = 3, stacked_convs: int = 4, conv_cfg: ConfigDict | dict | None = None, norm_cfg: ConfigDict | dict = {'num_groups': 32, 'requires_grad': True, 'type': 'GN'}, reg_decoded_bbox: bool = True, loss_centerness: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, init_cfg: ConfigDict | dict | List[ConfigDict | dict] = {'layer': 'Conv2d', 'override': {'bias_prob': 0.01, 'name': 'atss_cls', 'std': 0.01, 'type': 'Normal'}, 'std': 0.01, 'type': 'Normal'}, **kwargs)[source]¶
Detection Head of ATSS.
ATSS head structure is similar to FCOS, however ATSS uses anchor boxes and assigns labels by Adaptive Training Sample Selection instead of max-iou.
- Parameters:
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
pred_kernel_size (int) – Kernel size of nn.Conv2d.
stacked_convs (int) – Number of stacking convs of the head.
conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.
norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='GN', num_groups=32, requires_grad=True).
reg_decoded_bbox (bool) – If true, the regression loss would be applied directly on decoded bounding boxes, converting both the predicted boxes and regression targets to absolute coordinates format. Defaults to True. It should be True when using IoULoss, GIoULoss, or DIoULoss in the bbox head.
loss_centerness (ConfigDict or dict) – Config of centerness loss. Defaults to dict(type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0).
init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict]) – Initialization config dict.
- centerness_target(anchors: Tensor, gts: Tensor) Tensor[source]¶
Calculate the centerness between anchors and gts.
Only calculate pos centerness targets, otherwise there may be nan.
- Parameters:
anchors (Tensor) – Anchors with shape (N, 5), <cx, cy, w, h, t> format.
gts (Tensor) – Ground truth bboxes with shape (N, 5), <cx, cy, w, h, t> format.
- Returns:
Centerness between anchors and gts.
- Return type:
Tensor
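A FCOS-style centerness for rotated boxes can be sketched as follows. The example projects the anchor center into the ground-truth box's own rotated frame, measures the distances to its four sides, and combines the min/max ratios under a square root. The helper name rotated_centerness and the exact projection convention are assumptions for this sketch. It is only meaningful when the anchor center lies inside the ground-truth box, which matches the remark that only positive samples are evaluated.

```python
import math

def rotated_centerness(anchor, gt):
    """Centerness of an anchor center relative to a rotated ground-truth box.

    anchor, gt: boxes in (cx, cy, w, h, t) format, t in radians.
    Hypothetical helper; assumes the anchor center lies inside the gt box.
    """
    ax, ay = anchor[0], anchor[1]
    gx, gy, gw, gh, g_angle = gt
    dx, dy = ax - gx, ay - gy
    cos_t, sin_t = math.cos(g_angle), math.sin(g_angle)
    # Offset of the anchor center expressed in the gt box's local axes.
    off_x = cos_t * dx + sin_t * dy
    off_y = -sin_t * dx + cos_t * dy
    left, right = gw / 2 + off_x, gw / 2 - off_x
    top, bottom = gh / 2 + off_y, gh / 2 - off_y
    ratio_lr = min(left, right) / max(left, right)
    ratio_tb = min(top, bottom) / max(top, bottom)
    return math.sqrt(ratio_lr * ratio_tb)

centered = rotated_centerness((0.0, 0.0, 1.0, 1.0, 0.0), (0.0, 0.0, 4.0, 2.0, 0.0))
shifted = rotated_centerness((1.0, 0.0, 1.0, 1.0, 0.0), (0.0, 0.0, 4.0, 2.0, 0.0))
```

An anchor center outside the box makes one of the side distances negative, so the product under the square root can become negative, which is one way the nan mentioned above can arise.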
- get_targets(anchor_list: List[List[Tensor]], valid_flag_list: List[List[Tensor]], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None, unmap_outputs: bool = True) tuple[source]¶
Get targets for ATSS head.
This method is almost the same as AnchorHead.get_targets(). Besides returning the targets as the parent method does, it also returns the anchors as the first element of the returned tuple.
- loss_by_feat_single(anchors: Tensor, cls_score: Tensor, bbox_pred: Tensor, centerness: Tensor, labels: Tensor, label_weights: Tensor, bbox_targets: Tensor, avg_factor: float) dict[source]¶
Calculate the loss of a single scale level based on the features extracted by the detection head.
- Parameters:
cls_score (Tensor) – Box scores for each scale level, has shape (N, num_anchors * num_classes, H, W).
bbox_pred (Tensor) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).
anchors (Tensor) – Box reference for each scale level with shape (N, num_total_anchors, 4).
labels (Tensor) – Labels of each anchor with shape (N, num_total_anchors).
label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors)
bbox_targets (Tensor) – BBox regression targets of each anchor with shape (N, num_total_anchors, 4).
avg_factor (float) – Average factor that is used to average the loss. When using sampling method, avg_factor is usually the sum of positive and negative priors. When using PseudoSampler, avg_factor is usually equal to the number of positive priors.
- Returns:
A dictionary of loss components.
- Return type:
dict[str, Tensor]
- class mmrotate.models.dense_heads.RotatedFCOSHead(num_classes: int, in_channels: int, angle_version: str = 'le90', use_hbbox_loss: bool = False, scale_angle: bool = True, angle_coder: ConfigDict | dict = {'type': 'PseudoAngleCoder'}, h_bbox_coder: ConfigDict | dict = {'type': 'mmdet.DistancePointBBoxCoder'}, bbox_coder: ConfigDict | dict = {'type': 'DistanceAnglePointCoder'}, loss_cls: ConfigDict | dict = {'alpha': 0.25, 'gamma': 2.0, 'loss_weight': 1.0, 'type': 'mmdet.FocalLoss', 'use_sigmoid': True}, loss_bbox: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'RotatedIoULoss'}, loss_centerness: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_angle: ConfigDict | dict | None = None, **kwargs)[source]¶
Anchor-free head used in FCOS.
Compared with the FCOS head, the Rotated FCOS head adds an angle branch to support rotated object detection.
- Parameters:
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
angle_version (str) – Angle representations. Defaults to ‘le90’.
use_hbbox_loss (bool) – If true, use horizontal bbox loss and loss_angle should not be None. Defaults to False.
scale_angle (bool) – If true, add scale to angle pred branch. Defaults to True.
angle_coder (ConfigDict or dict) – Config of angle coder.
h_bbox_coder (dict) – Config of horizontal bbox coder, only used when use_hbbox_loss is True.
bbox_coder (ConfigDict or dict) – Config of bbox coder. Defaults to ‘DistanceAnglePointCoder’.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
loss_centerness (ConfigDict or dict) – Config of centerness loss.
loss_angle (ConfigDict or dict, Optional) – Config of angle loss.
Example
>>> self = RotatedFCOSHead(11, 7)
>>> feats = [torch.rand(1, 7, s, s) for s in [4, 8, 16, 32, 64]]
>>> cls_score, bbox_pred, angle_pred, centerness = self.forward(feats)
>>> assert len(cls_score) == len(self.scales)
- forward_single(x: Tensor, scale: Scale, stride: int) Tuple[Tensor, Tensor, Tensor, Tensor][source]¶
Forward features of a single scale level.
- Parameters:
x (Tensor) – FPN feature maps of the specified stride.
scale (mmcv.cnn.Scale) – Learnable scale module to resize the bbox prediction.
stride (int) – The corresponding stride for feature maps, only used to normalize the bbox prediction when self.norm_on_bbox is True.
- Returns:
scores for each class, bbox predictions, angle predictions and centerness predictions of input feature maps.
- Return type:
tuple
- get_targets(points: List[Tensor], batch_gt_instances: List[InstanceData]) Tuple[List[Tensor], List[Tensor], List[Tensor]][source]¶
Compute regression, classification and centerness targets for points in multiple images.
- Parameters:
points (list[Tensor]) – Points of each fpn level, each has shape (num_points, 2).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
- Returns:
Targets of each level.
concat_lvl_labels (list[Tensor]): Labels of each level.
concat_lvl_bbox_targets (list[Tensor]): BBox targets of each level.
concat_lvl_angle_targets (list[Tensor]): Angle targets of each level.
- Return type:
tuple
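In the original FCOS formulation, the centerness target of a point is derived from its distances to the four sides of the ground-truth box. The following is a minimal sketch of that formula, assuming it carries over unchanged to this head; `centerness_target` is a hypothetical helper and not part of mmrotate.

```python
import math


def centerness_target(left, top, right, bottom):
    """FCOS-style centerness of one point (hypothetical helper).

    All four distances are assumed to be positive, i.e. the point lies
    strictly inside the box.
    """
    lr = min(left, right) / max(left, right)
    tb = min(top, bottom) / max(top, bottom)
    return math.sqrt(lr * tb)
```

A point at the exact center of a box gets a target of 1.0, and the value decays towards 0 as the point approaches any side.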
- loss_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], angle_preds: List[Tensor], centernesses: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None) Dict[str, Tensor][source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters:
cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * 4.
angle_preds (list[Tensor]) – Box angle for each scale level, each is a 4D-tensor, the channel number is num_points * encode_size.
centernesses (list[Tensor]) – centerness for each scale level, each is a 4D-tensor, the channel number is num_points * 1.
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns:
A dictionary of loss components.
- Return type:
dict[str, Tensor]
- predict_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], angle_preds: List[Tensor], score_factors: List[Tensor] | None = None, batch_img_metas: List[dict] | None = None, cfg: ConfigDict | None = None, rescale: bool = False, with_nms: bool = True) List[InstanceData][source]¶
Transform a batch of output features extracted from the head into bbox results.
Note: When score_factors is not None, the cls_scores are usually multiplied by it to obtain the real score used in NMS, such as CenterNess in FCOS, IoU branch in ATSS.
- Parameters:
cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).
angle_preds (list[Tensor]) – Box angle for each scale level with shape (N, num_points * encode_size, H, W)
score_factors (list[Tensor], optional) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, num_priors * 1, H, W). Defaults to None.
batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.
cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.
rescale (bool) – If True, return boxes in original image space. Defaults to False.
with_nms (bool) – If True, do nms before return boxes. Defaults to True.
- Returns:
Object detection results of each image after the post process. Each item usually contains the following keys.
scores (Tensor): Classification scores, has a shape (num_instances, ).
labels (Tensor): Labels of bboxes, has a shape (num_instances, ).
bboxes (Tensor): Has a shape (num_instances, 5), the last dimension 5 arranged as (x, y, w, h, t).
- Return type:
list[InstanceData]
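The note on score_factors can be illustrated with plain Python. This is a sketch only: it assumes both the classification outputs and the score factors are raw logits passed through a sigmoid, as in FCOS with use_sigmoid=True, and `fuse_scores` is a hypothetical name rather than an mmrotate function.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_scores(cls_logits, factor_logits):
    """Multiply per-prediction class scores by their score factors.

    Hypothetical helper: the product is the score that would be
    ranked and thresholded during NMS.
    """
    return [sigmoid(c) * sigmoid(f) for c, f in zip(cls_logits, factor_logits)]
```

A confident class score paired with a low centerness therefore ends up with a low final score, which suppresses boxes predicted far from object centers.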
- class mmrotate.models.dense_heads.RotatedRTMDetHead(num_classes: int, in_channels: int, angle_version: str = 'le90', use_hbbox_loss: bool = False, scale_angle: bool = True, angle_coder: ConfigDict | dict = {'type': 'PseudoAngleCoder'}, loss_angle: ConfigDict | dict | None = None, **kwargs)[source]¶
Detection Head of Rotated RTMDet.
- Parameters:
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
angle_version (str) – Angle representations. Defaults to ‘le90’.
use_hbbox_loss (bool) – If True, use the horizontal bbox loss; loss_angle must not be None in that case. Defaults to False.
scale_angle (bool) – If True, add a scale to the angle prediction branch. Defaults to True.
angle_coder (ConfigDict or dict) – Config of angle coder.
loss_angle (ConfigDict or dict, Optional) – Config of angle loss.
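The angle_version parameter selects the angle representation. As a sketch, and assuming that ‘le90’ denotes the long-edge definition with angles in radians restricted to [-π/2, π/2), a box can be brought into that form as follows. Both function names are hypothetical and are not taken from mmrotate.

```python
import math


def norm_angle_le90(angle):
    """Wrap an angle in radians into [-pi/2, pi/2) (assumed le90 range)."""
    return (angle + math.pi / 2) % math.pi - math.pi / 2


def to_le90(w, h, angle):
    """Make the width the long edge, then wrap the angle.

    Swapping width and height is compensated by a quarter turn, so the
    box covers the same region before and after.
    """
    if w < h:
        w, h = h, w
        angle += math.pi / 2
    return w, h, norm_angle_le90(angle)
```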
- forward(feats: Tuple[Tensor, ...]) tuple[source]¶
Forward features from the upstream network.
- Parameters:
feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns:
Usually a tuple of classification scores, bbox predictions and angle predictions.
cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.
bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.
angle_preds (list[Tensor]): Angle prediction for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * angle_dim.
- Return type:
tuple
- loss_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], angle_preds: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None)[source]¶
Compute losses of the head.
- Parameters:
cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_anchors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box predictions for each scale level, with shape (N, num_anchors * 4, H, W), in [t, b, l, r] format.
angle_preds (list[Tensor]) – Angle predictions for each scale level, with shape (N, num_anchors * angle_dim, H, W).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns:
A dictionary of loss components.
- Return type:
dict[str, Tensor]
- loss_by_feat_single(cls_score: Tensor, bbox_pred: Tensor, angle_pred: Tensor, labels: Tensor, label_weights: Tensor, bbox_targets: Tensor, assign_metrics: Tensor, stride: List[int])[source]¶
Compute loss of a single scale level.
- Parameters:
cls_score (Tensor) – Box scores for each scale level, with shape (N, num_anchors * num_classes, H, W).
bbox_pred (Tensor) – Decoded bboxes for each scale level with shape (N, num_anchors * 5, H, W) for rbox loss or (N, num_anchors * 4, H, W) for hbox loss.
angle_pred (Tensor) – Angle predictions for each scale level with shape (N, num_anchors * angle_dim, H, W).
labels (Tensor) – Labels of each anchors with shape (N, num_total_anchors).
label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors).
bbox_targets (Tensor) – BBox regression targets of each anchor with shape (N, num_total_anchors, 4).
assign_metrics (Tensor) – Assign metrics with shape (N, num_total_anchors).
stride (List[int]) – Downsample stride of the feature map.
- Returns:
A dictionary of loss components.
- Return type:
dict[str, Tensor]
- predict_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], angle_preds: List[Tensor], score_factors: List[Tensor] | None = None, batch_img_metas: List[dict] | None = None, cfg: ConfigDict | None = None, rescale: bool = False, with_nms: bool = True) List[InstanceData][source]¶
Transform a batch of output features extracted from the head into bbox results.
Note: When score_factors is not None, the cls_scores are usually multiplied by it to obtain the real score used in NMS, such as CenterNess in FCOS, IoU branch in ATSS.
- Parameters:
cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).
angle_preds (list[Tensor]) – Box angle for each scale level with shape (N, num_points * angle_dim, H, W)
score_factors (list[Tensor], optional) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, num_priors * 1, H, W). Defaults to None.
batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.
cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.
rescale (bool) – If True, return boxes in original image space. Defaults to False.
with_nms (bool) – If True, do nms before return boxes. Defaults to True.
- Returns:
Object detection results of each image after the post process. Each item usually contains the following keys.
scores (Tensor): Classification scores, has a shape (num_instances, ).
labels (Tensor): Labels of bboxes, has a shape (num_instances, ).
bboxes (Tensor): Has a shape (num_instances, 5), the last dimension 5 arranged as (x, y, w, h, t).
- Return type:
list[InstanceData]
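To make the (x, y, w, h, t) box layout concrete, the sketch below converts one such box into its four corner points. It assumes that (x, y) is the box center and that t is a rotation in radians; `rbox_to_corners` is a hypothetical helper, not an mmrotate API.

```python
import math


def rbox_to_corners(x, y, w, h, t):
    """Return the four corners of a rotated box as (px, py) tuples.

    Assumes (x, y) is the center and t the rotation in radians.
    Corners are listed starting from the (-w/2, -h/2) offset and
    proceeding around the box.
    """
    cos_t, sin_t = math.cos(t), math.sin(t)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [
        (x + dx * cos_t - dy * sin_t, y + dx * sin_t + dy * cos_t)
        for dx, dy in offsets
    ]
```

With t = 0 the result reduces to the corners of an ordinary axis-aligned box.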
- class mmrotate.models.dense_heads.RotatedRTMDetSepBNHead(num_classes: int, in_channels: int, share_conv: bool = True, scale_angle: bool = False, norm_cfg: ConfigDict | dict = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: ConfigDict | dict = {'type': 'SiLU'}, pred_kernel_size: int = 1, exp_on_reg: bool = False, **kwargs)[source]¶
Rotated RTMDetHead with separated BN layers and shared conv layers.
- Parameters:
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
share_conv (bool) – Whether to share conv layers between stages. Defaults to True.
scale_angle (bool) – Not supported in RotatedRTMDetSepBNHead. Defaults to False.
norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU').
pred_kernel_size (int) – Kernel size of prediction layer. Defaults to 1.
exp_on_reg (bool) – Whether to apply exponential on bbox_pred. Defaults to False.
- forward(feats: Tuple[Tensor, ...]) tuple[source]¶
Forward features from the upstream network.
- Parameters:
feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns:
Usually a tuple of classification scores, bbox predictions and angle predictions.
cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.
bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.
angle_preds (list[Tensor]): Angle prediction for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * angle_dim.
- Return type:
tuple
- class mmrotate.models.dense_heads.RotatedRepPointsHead(*args, **kwargs)[source]¶
RotatedRepPoint head.
- Parameters:
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
point_feat_channels (int) – Number of channels of points features.
num_points (int) – Number of points.
gradient_mul (float) – The multiplier to gradients from points refinement and recognition.
point_strides (Sequence[int]) – points strides.
point_base_scale (int) – bbox scale for assigning labels.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox_init (ConfigDict or dict) – Config of initial points loss.
loss_bbox_refine (ConfigDict or dict) – Config of points loss in refinement.
transform_method (str) – The method used to transform RepPoints to qbbox; ‘moment’ is not supported here.
init_cfg (ConfigDict or dict or list[ConfigDict or dict]) – Initialization config dict.
- loss_by_feat(cls_scores: List[Tensor], pts_preds_init: List[Tensor], pts_preds_refine: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None) Dict[str, Tensor][source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters:
cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, of shape (batch_size, num_classes, h, w).
pts_preds_init (list[Tensor]) – Points for each scale level, each is a 3D-tensor, of shape (batch_size, h_i * w_i, num_points * 2).
pts_preds_refine (list[Tensor]) – Points refined for each scale level, each is a 3D-tensor, of shape (batch_size, h_i * w_i, num_points * 2).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns:
A dictionary of loss components.
- Return type:
dict[str, Tensor]
- loss_by_feat_single(cls_score: Tensor, pts_pred_init: Tensor, pts_pred_refine: Tensor, labels: Tensor, label_weights, bbox_gt_init: Tensor, bbox_weights_init: Tensor, bbox_gt_refine: Tensor, bbox_weights_refine: Tensor, stride: int, avg_factor_init: int, avg_factor_refine: int) Tuple[Tensor][source]¶
Calculate the loss of a single scale level based on the features extracted by the detection head.
- Parameters:
cls_score (Tensor) – Box scores for each scale level, with shape (N, num_classes, h_i, w_i).
pts_pred_init (Tensor) – Points of shape (batch_size, h_i * w_i, num_points * 2).
pts_pred_refine (Tensor) – Points refined of shape (batch_size, h_i * w_i, num_points * 2).
labels (Tensor) – Ground truth class indices with shape (batch_size, h_i * w_i).
label_weights (Tensor) – Label weights of shape (batch_size, h_i * w_i).
bbox_gt_init (Tensor) – BBox regression targets in the init stage of shape (batch_size, h_i * w_i, 8).
bbox_weights_init (Tensor) – BBox regression loss weights in the init stage of shape (batch_size, h_i * w_i, 8).
bbox_gt_refine (Tensor) – BBox regression targets in the refine stage of shape (batch_size, h_i * w_i, 8).
bbox_weights_refine (Tensor) – BBox regression loss weights in the refine stage of shape (batch_size, h_i * w_i, 8).
stride (int) – Point stride.
avg_factor_init (int) – Average factor that is used to average the loss in the init stage.
avg_factor_refine (int) – Average factor that is used to average the loss in the refine stage.
- Returns:
loss components.
- Return type:
Tuple[Tensor]
- class mmrotate.models.dense_heads.RotatedRetinaHead(*args, loss_bbox_type: str = 'normal', **kwargs)[source]¶
Rotated retina head.
- Parameters:
loss_bbox_type (str) – Set the input type of loss_bbox. Defaults to ‘normal’.
- loss_by_feat_single(cls_score: Tensor, bbox_pred: Tensor, anchors: Tensor, labels: Tensor, label_weights: Tensor, bbox_targets: Tensor, bbox_weights: Tensor, avg_factor: int) tuple[source]¶
Calculate the loss of a single scale level based on the features extracted by the detection head.
- Parameters:
cls_score (Tensor) – Box scores for each scale level, with shape (N, num_anchors * num_classes, H, W).
bbox_pred (Tensor) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).
anchors (Tensor) – Box reference for each scale level with shape (N, num_total_anchors, 4).
labels (Tensor) – Labels of each anchor with shape (N, num_total_anchors).
label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors).
bbox_targets (Tensor) – BBox regression targets of each anchor with shape (N, num_total_anchors, 4).
bbox_weights (Tensor) – BBox regression loss weights of each anchor with shape (N, num_total_anchors, 4).
avg_factor (int) – Average factor that is used to average the loss.
- Returns:
loss components.
- Return type:
tuple
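The role of avg_factor can be sketched in a few lines: per-element losses are weighted, summed, and divided by avg_factor instead of by the element count. This is an illustration under that assumption only; `average_loss` is a hypothetical helper and the actual reduction inside the configured loss modules may differ in detail.

```python
def average_loss(losses, weights, avg_factor):
    """Weighted sum of per-element losses, normalised by avg_factor.

    Hypothetical helper: `losses` and `weights` are equally long
    sequences of floats, `avg_factor` is a positive number.
    """
    total = sum(loss * weight for loss, weight in zip(losses, weights))
    return total / avg_factor
```

Elements with zero weight, for example ignored anchors, contribute nothing to the numerator, while avg_factor keeps the scale of the loss tied to the number of samples that actually matter.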
- class mmrotate.models.dense_heads.S2AHead(*args, loss_bbox_type: str = 'normal', **kwargs)[source]¶
An anchor-based head used in S2A-Net.
- filter_bboxes(cls_scores: List[Tensor], bbox_preds: List[Tensor]) List[List[Tensor]][source]¶
This function will be used in S2ANet, whose num_anchors=1.
- Parameters:
cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, 5, H, W)
- Returns:
refined rbboxes of each level of each image.
- Return type:
list[list[Tensor]]
- class mmrotate.models.dense_heads.S2ARefineHead(num_classes: int, in_channels: int, frm_cfg: dict | None = None, **kwargs)[source]¶
Rotated Anchor-based refine head. It’s a part of the Oriented Detection Module (ODM), which produces orientation-sensitive features for classification and orientation-invariant features for localization.
- Parameters:
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
frm_cfg (dict) – Config of the feature refine module.
- feature_refine(x: List[Tensor], rois: List[List[Tensor]]) List[Tensor][source]¶
Refine the input features using the feature refine module.
- Parameters:
x (list[Tensor]) – feature maps of multiple scales.
rois (list[list[Tensor]]) – input rbboxes of multiple scales of multiple images, output by former stages and are to be refined.
- Returns:
refined feature maps of multiple scales.
- Return type:
list[Tensor]
- forward_single(x: Tensor) Tuple[Tensor, Tensor][source]¶
Forward feature of a single scale level.
- Parameters:
x (Tensor) – Features of a single scale level.
- Returns:
cls_score (Tensor): Cls scores for a single scale level, the channels number is num_anchors * num_classes.
bbox_pred (Tensor): Box energies / deltas for a single scale level, the channels number is num_anchors * 4.
- Return type:
tuple
- get_anchors(featmap_sizes: List[tuple], batch_img_metas: List[dict], device: device | str = 'cuda') Tuple[List[List[Tensor]], List[List[Tensor]]][source]¶
Get anchors according to feature map sizes.
- Parameters:
featmap_sizes (list[tuple]) – Multi-level feature map sizes.
batch_img_metas (list[dict]) – Image meta info.
device (torch.device | str) – Device for returned tensors. Defaults to cuda.
- Returns:
anchor_list (list[list[Tensor]]): Anchors of each image.
valid_flag_list (list[list[Tensor]]): Valid flags of each image.
- Return type:
tuple
- loss_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None, rois: List[Tensor] | None = None) dict[source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters:
cls_scores (list[Tensor]) – Box scores for each scale level has shape (N, num_anchors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
rois (list[Tensor])
- Returns:
A dictionary of loss components.
- Return type:
dict
- predict_by_feat(cls_scores: List[Tensor], bbox_preds: List[Tensor], score_factors: List[Tensor] | None = None, rois: List[Tensor] | None = None, batch_img_metas: List[dict] | None = None, cfg: ConfigDict | None = None, rescale: bool = False, with_nms: bool = True) List[InstanceData][source]¶
Transform a batch of output features extracted from the head into bbox results.
Note: When score_factors is not None, the cls_scores are usually multiplied by it then obtain the real score used in NMS, such as CenterNess in FCOS, IoU branch in ATSS.
- Parameters:
cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).
score_factors (list[Tensor], optional) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, num_priors * 1, H, W). Defaults to None.
rois (list[Tensor])
batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.
cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.
rescale (bool) – If True, return boxes in original image space. Defaults to False.
with_nms (bool) – If True, do nms before return boxes. Defaults to True.
- Returns:
Object detection results of each image after the post process. Each item usually contains the following keys.
scores (Tensor): Classification scores, has a shape (num_instances, ).
labels (Tensor): Labels of bboxes, has a shape (num_instances, ).
bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arranged as (x1, y1, x2, y2).
- Return type:
list[InstanceData]
- refine_bboxes(cls_scores: List[Tensor], bbox_preds: List[Tensor], rois: List[List[Tensor]]) List[List[Tensor]][source]¶
Refine predicted bounding boxes at each position of the feature maps. This method will be used in R3Det in refinement stages.
- Parameters:
cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level, with shape (N, 5, H, W).
rois (list[list[Tensor]]) – Input rbboxes of each level of each image. These rois are output by former stages and are to be refined.
- Returns:
best or refined rbboxes of each level of each image.
- Return type:
list[list[Tensor]]
- class mmrotate.models.dense_heads.SAMRepPointsHead(*args, **kwargs)[source]¶
SAM RepPoints head.
- get_targets(proposals_list: List[Tensor], valid_flag_list: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None, stage: str = 'init', unmap_outputs: bool = True) tuple[source]¶
Compute corresponding GT box and classification targets for proposals.
- Parameters:
proposals_list (list[Tensor]) – Multi level points/bboxes of each image.
valid_flag_list (list[Tensor]) – Multi level valid flags of each image.
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
stage (str) – ‘init’ or ‘refine’. Generate target for init stage or refine stage. Defaults to ‘init’.
unmap_outputs (bool) – Whether to map outputs back to the original set of anchors. Defaults to True.
- Returns:
labels_list (list[Tensor]): Labels of each level.
label_weights_list (list[Tensor]): Label weights of each level.
bbox_gt_list (list[Tensor]): Ground truth bbox of each level.
proposals_list (list[Tensor]): Proposals (points/bboxes) of each level.
proposal_weights_list (list[Tensor]): Proposal weights of each level.
avg_factor (int): Average factor that is used to average the loss. When using a sampling method, avg_factor is usually the sum of positive and negative priors. When using PseudoSampler, avg_factor is usually equal to the number of positive priors.
- Return type:
tuple
- loss_by_feat(cls_scores: List[Tensor], pts_preds_init: List[Tensor], pts_preds_refine: List[Tensor], batch_gt_instances: List[InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: List[InstanceData] | None = None) Dict[str, Tensor][source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters:
cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, of shape (batch_size, num_classes, h, w).
pts_preds_init (list[Tensor]) – Points for each scale level, each is a 3D-tensor, of shape (batch_size, h_i * w_i, num_points * 2).
pts_preds_refine (list[Tensor]) – Points refined for each scale level, each is a 3D-tensor, of shape (batch_size, h_i * w_i, num_points * 2).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns:
A dictionary of loss components.
- Return type:
dict[str, Tensor]
- loss_by_feat_single(cls_score: Tensor, pts_pred_init: Tensor, pts_pred_refine: Tensor, labels: Tensor, label_weights, bbox_gt_init: Tensor, bbox_weights_init: Tensor, sam_weights_init: Tensor, bbox_gt_refine: Tensor, bbox_weights_refine: Tensor, sam_weights_refine: Tensor, stride: int, avg_factor_refine: int) Tuple[Tensor][source]¶
Calculate the loss of a single scale level based on the features extracted by the detection head.
- Parameters:
cls_score (Tensor) – Box scores for each scale level, with shape (N, num_classes, h_i, w_i).
pts_pred_init (Tensor) – Points of shape (batch_size, h_i * w_i, num_points * 2).
pts_pred_refine (Tensor) – Points refined of shape (batch_size, h_i * w_i, num_points * 2).
labels (Tensor) – Ground truth class indices with shape (batch_size, h_i * w_i).
label_weights (Tensor) – Label weights of shape (batch_size, h_i * w_i).
bbox_gt_init (Tensor) – BBox regression targets in the init stage of shape (batch_size, h_i * w_i, 8).
bbox_weights_init (Tensor) – BBox regression loss weights in the init stage of shape (batch_size, h_i * w_i, 8).
sam_weights_init (Tensor)
bbox_gt_refine (Tensor) – BBox regression targets in the refine stage of shape (batch_size, h_i * w_i, 8).
bbox_weights_refine (Tensor) – BBox regression loss weights in the refine stage of shape (batch_size, h_i * w_i, 8).
sam_weights_refine (Tensor)
stride (int) – Point stride.
avg_factor_refine (int) – Average factor that is used to average the loss in the refine stage.
- Returns:
loss components.
- Return type:
Tuple[Tensor]
roi_heads¶
- class mmrotate.models.roi_heads.GVRatioRoIHead(bbox_roi_extractor: ConfigDict | dict | List[ConfigDict | dict] | None = None, bbox_head: ConfigDict | dict | List[ConfigDict | dict] | None = None, mask_roi_extractor: ConfigDict | dict | List[ConfigDict | dict] | None = None, mask_head: ConfigDict | dict | List[ConfigDict | dict] | None = None, shared_head: ConfigDict | dict | None = None, train_cfg: ConfigDict | dict | None = None, test_cfg: ConfigDict | dict | None = None, init_cfg: ConfigDict | dict | List[ConfigDict | dict] | None = None)[source]¶
Gliding vertex roi head including one bbox head and one mask head.
- bbox_loss(x: Tuple[Tensor], sampling_results: List[SamplingResult]) dict[source]¶
Perform forward propagation and loss calculation of the bbox head on the features of the upstream network.
- Parameters:
x (tuple[Tensor]) – List of multi-level img features.
sampling_results (list[SamplingResult]) – Sampling results.
- Returns:
Usually returns a dictionary with keys:
cls_score (Tensor): Classification scores.
bbox_pred (Tensor): Box energies / deltas.
fix_pred (Tensor): fix / deltas.
ratio_pred (Tensor): ratio / deltas.
bbox_feats (Tensor): Extract bbox RoI features.
loss_bbox (dict): A dictionary of bbox loss components.
- Return type:
dict[str, Tensor]
- forward(x: Tuple[Tensor], rpn_results_list: List[InstanceData]) tuple[source]¶
Network forward process. Usually includes backbone, neck and head forward without any post-processing.
- Parameters:
x (List[Tensor]) – Multi-level features that may have different resolutions.
rpn_results_list (list[InstanceData]) – List of region proposals.
- Returns:
A tuple of features from bbox_head and mask_head forward.
- Return type:
tuple
- predict_bbox(x: Tuple[Tensor], batch_img_metas: List[dict], rpn_results_list: List[InstanceData], rcnn_test_cfg: ConfigDict | dict, rescale: bool = False) List[InstanceData][source]¶
Perform forward propagation of the bbox head and predict detection results on the features of the upstream network.
- Parameters:
x (tuple[Tensor]) – Feature maps of all scale level.
batch_img_metas (list[dict]) – List of image information.
rpn_results_list (list[InstanceData]) – List of region proposals.
rcnn_test_cfg (ConfigDict) – test_cfg of R-CNN.
rescale (bool) – If True, return boxes in original image space. Defaults to False.
- Returns:
Detection results of each image after the post process. Each item usually contains the following keys.
scores (Tensor): Classification scores, has a shape (num_instances, ).
labels (Tensor): Labels of bboxes, has a shape (num_instances, ).
bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arranged as (x1, y1, x2, y2).
- Return type:
list[InstanceData]
Rotated Shared2FC RBBox head.
- Parameters:
loss_bbox_type (str) – Set the input type of loss_bbox. Defaults to ‘normal’.
Calculate the loss based on the network predictions and targets.
- Parameters:
cls_score (Tensor) – Classification prediction results of all class, has shape (batch_size * num_proposals_single_image, num_classes)
bbox_pred (Tensor) – Regression prediction results, has shape (batch_size * num_proposals_single_image, 4), the last dimension 4 represents [tl_x, tl_y, br_x, br_y].
rois (Tensor) – RoIs with the shape (batch_size * num_proposals_single_image, 5) where the first column indicates batch id of each RoI.
labels (Tensor) – Gt_labels for all proposals in a batch, has shape (batch_size * num_proposals_single_image, ).
label_weights (Tensor) – Labels_weights for all proposals in a batch, has shape (batch_size * num_proposals_single_image, ).
bbox_targets (Tensor) – Regression target for all proposals in a batch, has shape (batch_size * num_proposals_single_image, 4), the last dimension 4 represents [tl_x, tl_y, br_x, br_y].
bbox_weights (Tensor) – Regression weights for all proposals in a batch, has shape (batch_size * num_proposals_single_image, 4).
reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Options are “none”, “mean” and “sum”. Defaults to None.
- Returns:
A dictionary of loss.
- Return type:
dict
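The three reduction options and the effect of reduction_override can be sketched as below. This is an illustrative stand-in that assumes an override simply replaces the configured reduction when it is not None; `reduce_loss` is a hypothetical helper, not the implementation used by the head.

```python
def reduce_loss(losses, reduction="mean", reduction_override=None):
    """Reduce a list of per-sample losses.

    Hypothetical helper: `reduction_override`, when given, takes
    precedence over the configured `reduction`.
    """
    mode = reduction if reduction_override is None else reduction_override
    if mode == "none":
        return list(losses)
    if mode == "sum":
        return sum(losses)
    if mode == "mean":
        return sum(losses) / len(losses)
    raise ValueError(f"unsupported reduction: {mode!r}")
```

For example, a loss configured with "mean" can be made to return unreduced per-sample values for a single call by passing reduction_override="none".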
- class mmrotate.models.roi_heads.RotatedSingleRoIExtractor(roi_layer, out_channels, featmap_strides, finest_scale=56, init_cfg=None)[source]¶
Extract RoI features from a single level feature map.
If there are multiple input feature levels, each RoI is mapped to a level according to its scale. The mapping rule is proposed in FPN.
- Parameters:
roi_layer (dict) – Specify RoI layer type and arguments.
out_channels (int) – Output channels of RoI layers.
featmap_strides (List[int]) – Strides of input feature maps.
finest_scale (int) – Scale threshold of mapping to level 0. Default: 56.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
- build_roi_layers(layer_cfg, featmap_strides)[source]¶
Build RoI operator to extract feature from each level feature map.
- Parameters:
layer_cfg (dict) – Dictionary to construct and config RoI layer operation. Options are modules under mmcv/ops such as RoIAlign.
featmap_strides (List[int]) – The stride of input feature map w.r.t. the original image size, which would be used to scale RoI coordinate (original image coordinate system) to feature coordinate system.
- Returns:
The RoI extractor modules for each level feature map.
- Return type:
nn.ModuleList
- forward(feats, rois, roi_scale_factor=None)[source]¶
Forward function.
- Parameters:
feats (torch.Tensor) – Input features.
rois (torch.Tensor) – Input RoIs, shape (k, 5).
roi_scale_factor (float, optional) – Scale factor that RoI will be multiplied by. Defaults to None.
- Returns:
Scaled RoI features.
- Return type:
torch.Tensor
- map_roi_levels(rois, num_levels)[source]¶
Map rois to corresponding feature levels by scales.
scale < finest_scale * 2: level 0
finest_scale * 2 <= scale < finest_scale * 4: level 1
finest_scale * 4 <= scale < finest_scale * 8: level 2
scale >= finest_scale * 8: level 3
- Parameters:
rois (torch.Tensor) – Input RoIs, shape (k, 5).
num_levels (int) – Total level number.
- Returns:
Level index (0-based) of each RoI, shape (k, )
- Return type:
Tensor
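The thresholds listed for map_roi_levels amount to taking floor(log2(scale / finest_scale)) and clamping the result to the valid level range. The following is a minimal sketch of that rule only, assuming the per-RoI scale has already been computed (how the scale is derived from an RoI is not stated here). The helper name `map_scale_to_level` is hypothetical and not part of mmrotate.

```python
import math


def map_scale_to_level(scale, finest_scale=56, num_levels=4):
    """Map one RoI scale to a 0-based feature level (illustrative helper)."""
    # A tiny epsilon keeps log2 defined when scale is 0.
    level = math.floor(math.log2(scale / finest_scale + 1e-6))
    # Clamp so small RoIs go to level 0 and large ones to the last level.
    return min(max(level, 0), num_levels - 1)


levels = [map_scale_to_level(s) for s in (40, 112, 300, 1000)]
print(levels)  # [0, 1, 2, 3]
```

With the default finest_scale of 56, the level boundaries fall at scales 112, 224 and 448.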
losses¶
- class mmrotate.models.losses.BCConvexGIoULoss(reduction='mean', loss_weight=1.0)[source]¶
BCConvex GIoU loss.
Computing the BCConvex GIoU loss between a set of predicted convexes and target convexes.
- Parameters:
reduction (str, optional) – The reduction method of the loss. Defaults to ‘mean’.
loss_weight (float, optional) – The weight of loss. Defaults to 1.0.
- Returns:
Loss tensor.
- Return type:
torch.Tensor
- forward(pred, target, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]¶
Forward function.
- Parameters:
pred (torch.Tensor) – Predicted convexes.
target (torch.Tensor) – Corresponding gt convexes.
weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.
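The weight, avg_factor and reduction_override arguments recur across the loss classes in this section. The sketch below shows one common way such arguments interact; it is an assumption about the convention, not a copy of the mmrotate implementation. Both helper names (`reduce_loss`, `forward_sketch`) are hypothetical, and plain Python lists stand in for tensors.

```python
def reduce_loss(losses, weight=None, reduction="mean", avg_factor=None):
    """Apply element-wise weights, then reduce (illustrative helper)."""
    if weight is not None:
        losses = [l * w for l, w in zip(losses, weight)]
    if reduction == "none":
        return losses
    if reduction == "sum":
        if avg_factor is not None:
            raise ValueError("avg_factor cannot be combined with reduction='sum'")
        return sum(losses)
    if reduction == "mean":
        # avg_factor, when given, replaces the element count as the divisor.
        divisor = avg_factor if avg_factor is not None else len(losses)
        return sum(losses) / divisor
    raise ValueError(f"unknown reduction: {reduction}")


def forward_sketch(losses, weight=None, avg_factor=None,
                   reduction_override=None, default_reduction="mean"):
    # reduction_override, when set, wins over the reduction chosen at construction.
    reduction = reduction_override if reduction_override else default_reduction
    return reduce_loss(losses, weight, reduction, avg_factor)


print(forward_sketch([1.0, 2.0, 3.0]))                            # 2.0
print(forward_sketch([1.0, 2.0, 3.0], reduction_override="sum"))  # 6.0
```

Passing avg_factor is typically useful when the divisor should be the number of positive samples rather than the number of loss elements.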
- class mmrotate.models.losses.ConvexGIoULoss(reduction='mean', loss_weight=1.0)[source]¶
Convex GIoU loss.
Computing the Convex GIoU loss between a set of predicted convexes and target convexes.
- Parameters:
reduction (str, optional) – The reduction method of the loss. Defaults to ‘mean’.
loss_weight (float, optional) – The weight of loss. Defaults to 1.0.
- Returns:
Loss tensor.
- Return type:
torch.Tensor
- forward(pred, target, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]¶
Forward function.
- Parameters:
pred (torch.Tensor) – Predicted convexes.
target (torch.Tensor) – Corresponding gt convexes.
weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.
- class mmrotate.models.losses.GDLoss(loss_type, representation='xy_wh_r', fun='log1p', tau=0.0, alpha=1.0, reduction='mean', loss_weight=1.0, **kwargs)[source]¶
Gaussian based loss.
- Parameters:
loss_type (str) – Type of loss.
representation (str, optional) – Coordinate system. Defaults to ‘xy_wh_r’.
fun (str, optional) – The function applied to distance. Defaults to ‘log1p’.
tau (float, optional) – Defaults to 0.0.
alpha (float, optional) – Defaults to 1.0.
reduction (str, optional) – The reduction method of the loss. Defaults to ‘mean’.
loss_weight (float, optional) – The weight of loss. Defaults to 1.0.
- Returns:
loss (torch.Tensor)
- forward(pred, target, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]¶
Forward function.
- Parameters:
pred (torch.Tensor) – Predicted convexes.
target (torch.Tensor) – Corresponding gt convexes.
weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.
- class mmrotate.models.losses.GDLoss_v1(loss_type, fun='sqrt', tau=1.0, reduction='mean', loss_weight=1.0, **kwargs)[source]¶
Gaussian based loss.
- Parameters:
loss_type (str) – Type of loss.
fun (str, optional) – The function applied to distance. Defaults to ‘sqrt’.
tau (float, optional) – Defaults to 1.0.
reduction (str, optional) – The reduction method of the loss. Defaults to ‘mean’.
loss_weight (float, optional) – The weight of loss. Defaults to 1.0.
- Returns:
loss (torch.Tensor)
- forward(pred, target, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]¶
Forward function.
- Parameters:
pred (torch.Tensor) – Predicted convexes.
target (torch.Tensor) – Corresponding gt convexes.
weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.
- class mmrotate.models.losses.H2RBoxConsistencyLoss(center_loss_cfg: ConfigDict | dict = {'loss_weight': 0.0, 'type': 'mmdet.L1Loss'}, shape_loss_cfg: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'mmdet.IoULoss'}, angle_loss_cfg: ConfigDict | dict = {'loss_weight': 1.0, 'type': 'mmdet.L1Loss'}, reduction: str = 'mean', loss_weight: float = 1.0)[source]¶
- forward(pred: Tensor, target: Tensor, weight: Tensor, avg_factor: int | None = None, reduction_override: str | None = None) Tensor[source]¶
Forward function.
- Parameters:
pred (Tensor) – Predicted boxes.
target (Tensor) – Corresponding gt boxes.
weight (Tensor) – The weight of loss for each prediction.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.
- Returns:
Calculated loss (Tensor)
- class mmrotate.models.losses.H2RBoxV2ConsistencyLoss(loss_rot: ConfigDict | dict = {'beta': 0.1, 'loss_weight': 1.0, 'type': 'mmdet.SmoothL1Loss'}, loss_flp: ConfigDict | dict = {'beta': 0.1, 'loss_weight': 0.05, 'type': 'mmdet.SmoothL1Loss'}, use_snap_loss: bool = True, reduction: str = 'mean')[source]¶
- forward(pred_ori: Tensor, pred_rot: Tensor, pred_flp: Tensor, target_ori: Tensor, target_rot: Tensor, agnostic_mask: Tensor | None = None, avg_factor: int | None = None, reduction_override: str | None = None) Tensor[source]¶
Forward function.
- Parameters:
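pred_ori (Tensor) – Predictions from the original view.
pred_rot (Tensor) – Predictions from the rotated view.
pred_flp (Tensor) – Predictions from the flipped view.
target_ori (Tensor) – Targets for the original view.
target_rot (Tensor) – Targets for the rotated view.
agnostic_mask (Tensor, optional) – Defaults to None.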
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.
- Returns:
Calculated loss (Tensor)
- class mmrotate.models.losses.KFLoss(fun='none', reduction='mean', loss_weight=1.0, **kwargs)[source]¶
Kalman filter based loss.
- Parameters:
fun (str, optional) – The function applied to distance. Defaults to ‘none’.
reduction (str, optional) – The reduction method of the loss. Defaults to ‘mean’.
loss_weight (float, optional) – The weight of loss. Defaults to 1.0.
- Returns:
loss (torch.Tensor)
- forward(pred, target, weight=None, avg_factor=None, pred_decode=None, targets_decode=None, reduction_override=None, **kwargs)[source]¶
Forward function.
- Parameters:
pred (torch.Tensor) – Predicted convexes.
target (torch.Tensor) – Corresponding gt convexes.
weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
pred_decode (torch.Tensor) – Predicted decode bboxes.
targets_decode (torch.Tensor) – Corresponding gt decode bboxes.
reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.
- Returns:
loss (torch.Tensor)
- class mmrotate.models.losses.RotatedIoULoss(linear=False, eps=1e-06, reduction='mean', loss_weight=1.0, mode='log')[source]¶
RotatedIoULoss.
Computing the IoU loss between a set of predicted rbboxes and target rbboxes.
- Parameters:
linear (bool) – If True, use linear scale of loss, else determined by mode. Default: False.
eps (float) – Eps to avoid log(0).
reduction (str) – Options are “none”, “mean” and “sum”.
loss_weight (float) – Weight of loss.
mode (str) – Loss scaling mode, including “linear”, “square”, and “log”. Default: ‘log’
- forward(pred, target, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]¶
Forward function.
- Parameters:
pred (torch.Tensor) – The prediction.
target (torch.Tensor) – The learning target of the prediction.
weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None. Options are “none”, “mean” and “sum”.
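The three mode values of RotatedIoULoss can be illustrated with the usual IoU-loss scalings. This is a sketch under the assumption that the modes follow the common convention (1 - IoU, 1 - IoU², and -log(IoU)); the exact formulas are not given in this reference, and `iou_to_loss` is a hypothetical helper operating on a single precomputed IoU value.

```python
import math


def iou_to_loss(iou, mode="log", eps=1e-6):
    """Turn one IoU value into a loss value (illustrative helper)."""
    iou = max(iou, eps)  # eps keeps log(0) from occurring
    if mode == "linear":
        return 1.0 - iou
    if mode == "square":
        return 1.0 - iou ** 2
    if mode == "log":
        return -math.log(iou)
    raise ValueError(f"unknown mode: {mode}")


for mode in ("linear", "square", "log"):
    print(mode, iou_to_loss(0.5, mode))
```

All three scalings reach zero at a perfect overlap (IoU = 1); the log mode penalizes poor overlaps most strongly.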
- class mmrotate.models.losses.SmoothFocalLoss(gamma=2.0, alpha=0.25, reduction='mean', loss_weight=1.0)[source]¶
Smooth Focal Loss. Implementation of Circular Smooth Label (CSL).
- Parameters:
gamma (float, optional) – The gamma for calculating the modulating factor. Defaults to 2.0.
alpha (float, optional) – A balanced form for Focal Loss. Defaults to 0.25.
reduction (str, optional) – The method used to reduce the loss into a scalar. Defaults to ‘mean’. Options are “none”, “mean” and “sum”.
loss_weight (float, optional) – Weight of loss. Defaults to 1.0.
- Returns:
loss (torch.Tensor)
- forward(pred, target, weight=None, avg_factor=None, reduction_override=None)[source]¶
Forward function.
- Parameters:
pred (torch.Tensor) – The prediction.
target (torch.Tensor) – The learning label of the prediction.
weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Options are “none”, “mean” and “sum”.
- Returns:
The calculated loss
- Return type:
torch.Tensor
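The roles of gamma and alpha can be seen in a scalar version of focal loss with soft labels. The sketch below assumes the standard focal-loss formulation extended to targets in [0, 1]; whether SmoothFocalLoss matches it term by term is an assumption, and `focal_term` is a hypothetical helper.

```python
import math


def focal_term(logit, target, gamma=2.0, alpha=0.25):
    """Focal-weighted binary cross-entropy for one logit (illustrative helper).

    `target` is a soft label in [0, 1].
    """
    p = 1.0 / (1.0 + math.exp(-logit))
    # Gap between prediction and label; large for badly classified elements.
    pt = (1.0 - p) * target + p * (1.0 - target)
    # alpha balances positives against negatives, gamma sharpens the modulation.
    focal_weight = (alpha * target + (1.0 - alpha) * (1.0 - target)) * pt ** gamma
    bce = -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    return bce * focal_weight


print(focal_term(0.0, 1.0))  # uncertain prediction
print(focal_term(4.0, 1.0))  # confident, correct prediction: much smaller loss
```

A larger gamma pushes the contribution of well-classified elements further toward zero.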
- class mmrotate.models.losses.SpatialBorderLoss(loss_weight=1.0)[source]¶
Spatial Border loss for learning points in Oriented RepPoints.
- Parameters:
pts (torch.Tensor) – Point sets with shape (N, 9*2). The default number of points in each point set is 9.
gt_bboxes (torch.Tensor) – Ground-truth boxes in polygon form, with shape (N, 8).
- Returns:
spatial border loss.
- Return type:
torch.Tensor
- forward(pts, gt_bboxes, weight, *args, **kwargs)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
utils¶
- class mmrotate.models.utils.ORConv2d(in_channels, out_channels, kernel_size=3, arf_config=None, stride=1, padding=0, dilation=1, groups=1, bias=True)[source]¶
Oriented 2-D convolution.
- Parameters:
in_channels (List[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale).
kernel_size (int, optional) – The size of kernel.
arf_config (tuple, optional) – A tuple consisting of nOrientation and nRotation.
stride (int, optional) – Stride of the convolution. Default: 1.
padding (int or tuple) – Zero-padding added to both sides of the input. Default: 0.
dilation (int or tuple) – Spacing between kernel elements. Default: 1.
groups (int) – Number of blocked connections from input channels to output channels. Default: 1.
bias (bool) – If True, adds a learnable bias to the output. Default: True.
- class mmrotate.models.utils.RotationInvariantPooling(nInputPlane, nOrientation=8)[source]¶
Rotation-invariant pooling module.
- Parameters:
nInputPlane (int) – The number of Input plane.
nOrientation (int, optional) – The number of oriented channels.
- mmrotate.models.utils.convex_overlaps(gt_bboxes, points)[source]¶
Compute overlaps between polygons and points.
- Parameters:
gt_bboxes (torch.Tensor) – Groundtruth polygons, shape (k, 8).
points (torch.Tensor) – Points to be assigned, shape (n, 18).
- Returns:
Overlaps between k gt_bboxes and n bboxes, shape (k, n).
- Return type:
torch.Tensor
- mmrotate.models.utils.get_num_level_anchors_inside(num_level_anchors, inside_flags)[source]¶
Get number of every level anchors inside.
- Parameters:
num_level_anchors (List[int]) – List of number of every level’s anchors.
inside_flags (torch.Tensor) – Flags of all anchors.
- Returns:
List of number of inside anchors.
- Return type:
List[int]
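The per-level counting can be pictured as slicing one flat flag sequence into consecutive chunks and counting the set flags in each. The sketch below assumes that anchors are stored level by level in a single flat sequence; `count_inside_per_level` is a hypothetical helper using plain Python lists in place of tensors.

```python
def count_inside_per_level(num_level_anchors, inside_flags):
    """Count flagged anchors within each level's slice (illustrative helper)."""
    counts, start = [], 0
    for num in num_level_anchors:
        # Each level owns the next `num` entries of the flat flag sequence.
        counts.append(sum(1 for flag in inside_flags[start:start + num] if flag))
        start += num
    return counts


print(count_inside_per_level([4, 2], [True, False, True, True, False, True]))
# [3, 1]
```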
- mmrotate.models.utils.levels_to_images(mlvl_tensor, flatten=False)[source]¶
Concat multi-level feature maps by image.
[feature_level0, feature_level1…] -> [feature_image0, feature_image1…]
Convert the shape of each element in mlvl_tensor from (N, C, H, W) to (N, H*W, C), then split the element to N elements with shape (H*W, C), and concat elements in same image of all level along first dimension.
- Parameters:
mlvl_tensor (list[torch.Tensor]) – list of Tensor which collect from corresponding level. Each element is of shape (N, C, H, W)
flatten (bool, optional) – if shape of mlvl_tensor is (N, C, H, W) set False, if shape of mlvl_tensor is (N, H, W, C) set True.
- Returns:
A list that contains N tensors and each tensor is of shape (num_elements, C)
- Return type:
list[torch.Tensor]
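The regrouping described for levels_to_images can be sketched with NumPy arrays standing in for tensors. This covers only the flatten=False case, where every level has shape (N, C, H, W). The function name `levels_to_images_np` is hypothetical, and the sketch is an illustration of the described steps rather than the library code.

```python
import numpy as np


def levels_to_images_np(mlvl_arrays):
    """Regroup per-level (N, C, H, W) arrays into per-image (sum(H*W), C) arrays."""
    batch_size = mlvl_arrays[0].shape[0]
    per_image = [[] for _ in range(batch_size)]
    for feat in mlvl_arrays:
        channels = feat.shape[1]
        # (N, C, H, W) -> (N, H, W, C) -> (N, H*W, C)
        flat = feat.transpose(0, 2, 3, 1).reshape(batch_size, -1, channels)
        for img_idx in range(batch_size):
            per_image[img_idx].append(flat[img_idx])
    # Stack all levels of the same image along the first dimension.
    return [np.concatenate(parts, axis=0) for parts in per_image]


feats = [np.zeros((2, 3, 4, 4)), np.zeros((2, 3, 2, 2))]
out = levels_to_images_np(feats)
print(len(out), out[0].shape)  # 2 (20, 3)
```

Here two levels with 4×4 and 2×2 feature maps yield 16 + 4 = 20 rows per image.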
- mmrotate.models.utils.points_center_pts(RPoints, y_first=True)[source]¶
Compute center point of Pointsets.
- Parameters:
RPoints (torch.Tensor) – the lists of Pointsets, shape (k, 18).
y_first (bool, optional) – if True, the sequence of Pointsets is (y,x).
- Returns:
The mean center coordinates of Pointsets, shape (k, 18).
- Return type:
torch.Tensor
mmrotate.utils¶
- mmrotate.utils.get_multiscale_patch(sizes, steps, ratios)[source]¶
Get multiscale patch sizes and steps.
- Parameters:
sizes (list) – A list of patch sizes.
steps (list) – A list of steps to slide patches.
ratios (list) – Multiscale ratios. Each size and step is divided by each ratio to generate patches at new scales.
- Returns:
A tuple (new_sizes, new_steps): new_sizes (list) is a list of multiscale patch sizes; new_steps (list) is a list of steps corresponding to new_sizes.
- Return type:
tuple
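The multiscale expansion can be sketched as dividing every (size, step) pair by every ratio. The sketch below is illustrative only: the integer truncation and the ordering of the output are assumptions, and `multiscale_patch_sketch` is a hypothetical name rather than the library function.

```python
from itertools import product


def multiscale_patch_sketch(sizes, steps, ratios):
    """Divide every (size, step) pair by every ratio (illustrative helper)."""
    assert len(sizes) == len(steps), "sizes and steps must pair up"
    new_sizes, new_steps = [], []
    for (size, step), ratio in product(zip(sizes, steps), ratios):
        new_sizes.append(int(size / ratio))
        new_steps.append(int(step / ratio))
    return new_sizes, new_steps


print(multiscale_patch_sketch([1024], [824], [1.0, 0.5]))
# ([1024, 2048], [824, 1648])
```

A ratio below 1 enlarges the patch in source-image pixels, which corresponds to detecting on a downscaled image.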
- mmrotate.utils.get_test_pipeline_cfg(cfg: str | ConfigDict) ConfigDict[source]¶
Get the test dataset pipeline from entire config.
- Parameters:
cfg (str or ConfigDict) – the entire config. Can be a config file or a ConfigDict.
- Returns:
the config of test dataset.
- Return type:
ConfigDict
- mmrotate.utils.merge_results_by_nms(results: List[DetDataSample], offsets: ndarray, img_shape: Tuple[int, int], nms_cfg: dict) DetDataSample[source]¶
Merge patch results by nms.
- Parameters:
results (List[DetDataSample]) – A list of patches results.
offsets (np.ndarray) – Positions of the left top points of patches.
img_shape (Tuple[int, int]) – A tuple of the huge image’s width and height.
nms_cfg (dict) – it should specify nms type and other parameters like iou_threshold.
- Returns:
merged results.
- Return type:
DetDataSample
- mmrotate.utils.register_all_modules(init_default_scope: bool = True) None[source]¶
Register all modules in mmrotate into the registries.
- Parameters:
init_default_scope (bool) – Whether to initialize the mmrotate default scope. When init_default_scope=True, the global default scope will be set to mmrotate, and all registries will build modules from mmrotate’s registry node. To understand more about the registry, please refer to https://github.com/vbti-development/onedl-mmengine/blob/main/docs/en/tutorials/registry.md Defaults to True.
- mmrotate.utils.slide_window(width, height, sizes, steps, img_rate_thr=0.6)[source]¶
Slide windows in images and get window position.
- Parameters:
width (int) – The width of the image.
height (int) – The height of the image.
sizes (list) – List of window’s sizes.
steps (list) – List of window’s steps.
img_rate_thr (float) – Threshold of window area divided by image area.
- Returns:
Information of valid windows.
- Return type:
np.ndarray
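The placement of sliding windows along one axis can be sketched as follows. This is a simplified illustration under stated assumptions: windows are square, the last window is pulled back to end at the image border, and the img_rate_thr filtering is left out. `window_starts` and `windows_for` are hypothetical helpers, not mmrotate functions.

```python
from itertools import product
from math import ceil


def window_starts(length, size, step):
    """Start offsets of windows of `size` along one axis of `length`."""
    num = 1 if length <= size else ceil((length - size) / step + 1)
    starts = [step * i for i in range(num)]
    # Pull the last window back so it ends at the image border.
    if len(starts) > 1 and starts[-1] + size > length:
        starts[-1] = length - size
    return starts


def windows_for(width, height, size, step):
    """All (x1, y1, x2, y2) windows for one size/step pair."""
    return [(x, y, x + size, y + size)
            for x, y in product(window_starts(width, size, step),
                                window_starts(height, size, step))]


print(window_starts(2000, 1024, 824))  # [0, 824, 976]
```

When the image is smaller than the window along an axis, a single window starting at 0 is produced and extends past the border.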