mmtrack.apis¶
mmtrack.core¶
anchor¶
evaluation¶
motion¶
optimizer¶
track¶
utils¶
mmtrack.datasets¶
datasets¶
parsers¶
pipelines¶
samplers¶
- class mmtrack.datasets.samplers.DistributedVideoSampler(dataset, num_replicas=None, rank=None, shuffle=False)[source]¶
Distribute videos across multiple GPUs during testing.
- Parameters
dataset (Dataset) – Test dataset that must have a data_infos attribute. Each data_info in data_infos records the information of one frame, and each video must have exactly one data_info with data_info['frame_id'] == 0.
num_replicas (int) – The number of GPUs. Defaults to None.
rank (int) – GPU rank id. Defaults to None.
shuffle (bool) – If True, shuffle the dataset. Defaults to False.
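The described behavior, keeping every frame of a video on a single GPU, can be sketched in plain Python. This is a simplified illustration, not the actual mmtrack implementation: it assigns whole videos round-robin, while the real sampler may partition them contiguously. Only the data_infos and frame_id names come from the documentation above.

```python
def split_videos_across_gpus(data_infos, num_replicas):
    """Assign whole videos to GPU ranks (illustrative sketch).

    Each video starts at a data_info with frame_id == 0, so the frame
    indices between consecutive zero markers belong to one video.
    """
    # Indices where a new video begins.
    starts = [i for i, info in enumerate(data_infos) if info['frame_id'] == 0]
    ends = starts[1:] + [len(data_infos)]
    videos = [list(range(s, e)) for s, e in zip(starts, ends)]

    # Distribute whole videos over the GPUs so no video is split
    # across ranks (round-robin here, for simplicity).
    per_rank = [[] for _ in range(num_replicas)]
    for vid_idx, frames in enumerate(videos):
        per_rank[vid_idx % num_replicas].extend(frames)
    return per_rank

# Three videos of lengths 3, 2, and 4, distributed over 2 GPUs.
infos = [{'frame_id': f} for f in (0, 1, 2, 0, 1, 0, 1, 2, 3)]
chunks = split_videos_across_gpus(infos, 2)
```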
mmtrack.models¶
mot¶
sot¶
vid¶
aggregators¶
- class mmtrack.models.aggregators.EmbedAggregator(num_convs=1, channels=256, kernel_size=3, norm_cfg=None, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]¶
Embedding convs to aggregate multiple feature maps.
This module is proposed in “Flow-Guided Feature Aggregation for Video Object Detection” (FGFA).
- Parameters
num_convs (int) – Number of embedding convs. Defaults to 1.
channels (int) – Channels of embedding convs. Defaults to 256.
kernel_size (int) – Kernel size of embedding convs. Defaults to 3.
norm_cfg (dict) – Configuration of the normalization method after each conv. Defaults to None.
act_cfg (dict) – Configuration of the activation method after each conv. Defaults to dict(type='ReLU').
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
- forward(x, ref_x)[source]¶
Aggregate reference feature maps ref_x.
The aggregation mainly contains two steps: 1. Compute the cosine similarity between x and ref_x. 2. Use the normalized (i.e. softmax) cosine similarity as weights to compute a weighted sum of ref_x.
- Parameters
x (Tensor) – of shape [1, C, H, W]
ref_x (Tensor) – of shape [N, C, H, W]. N is the number of reference feature maps.
- Returns
The aggregated feature map with shape [1, C, H, W].
- Return type
Tensor
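The two-step aggregation described above can be sketched with NumPy. This is a hand-rolled illustration of the math only, not the module's actual implementation, which computes the similarity on learned embedding-conv features rather than on the raw inputs:

```python
import numpy as np

def aggregate(x, ref_x, eps=1e-6):
    """Weighted sum of ref_x using softmax-normalized cosine similarity.

    x:     key feature map of shape (1, C, H, W)
    ref_x: reference feature maps of shape (N, C, H, W)
    """
    # Step 1: per-location cosine similarity between x and each ref_x.
    x_norm = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    ref_norm = ref_x / (np.linalg.norm(ref_x, axis=1, keepdims=True) + eps)
    cos_sim = (x_norm * ref_norm).sum(axis=1, keepdims=True)  # (N, 1, H, W)

    # Step 2: softmax over the N references, then weighted sum.
    weights = np.exp(cos_sim)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * ref_x).sum(axis=0, keepdims=True)  # (1, C, H, W)

x = np.random.rand(1, 8, 4, 4)
ref_x = np.random.rand(3, 8, 4, 4)
out = aggregate(x, ref_x)
```

Because the softmax weights sum to one over the N references, each output location is a convex combination of the corresponding reference features.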
- class mmtrack.models.aggregators.SelsaAggregator(in_channels, num_attention_blocks=16, init_cfg=None)[source]¶
SELSA aggregator module.
This module is proposed in “Sequence Level Semantics Aggregation for Video Object Detection” (SELSA).
- Parameters
in_channels (int) – The number of channels of the proposal features.
num_attention_blocks (int) – The number of attention blocks used in the SELSA aggregator module. Defaults to 16.
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
- forward(x, ref_x)[source]¶
Aggregate the features ref_x of reference proposals.
The aggregation mainly contains two steps: 1. Use multi-head attention to compute the weights between x and ref_x. 2. Use the normalized (i.e. softmax) weights to compute a weighted sum of ref_x.
- Parameters
x (Tensor) – of shape [N, C]. N is the number of key frame proposals.
ref_x (Tensor) – of shape [M, C]. M is the number of reference frame proposals.
- Returns
The aggregated features of key frame proposals with shape [N, C].
- Return type
Tensor
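A minimal single-head sketch of the weight-then-sum pattern described above. The real module uses num_attention_blocks attention heads and learned projections of the proposal features; this only illustrates the attention math on raw features:

```python
import numpy as np

def selsa_like_aggregate(x, ref_x):
    """x: (N, C) key-frame proposals; ref_x: (M, C) reference proposals."""
    # Attention scores between every key and reference proposal,
    # scaled by sqrt(C) as in standard dot-product attention.
    scores = x @ ref_x.T / np.sqrt(x.shape[1])   # (N, M)

    # Softmax-normalize over the M reference proposals.
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)

    # Weighted sum of reference features for each key proposal.
    return weights @ ref_x                       # (N, C)

x = np.random.rand(4, 16)
ref_x = np.random.rand(10, 16)
out = selsa_like_aggregate(x, ref_x)
```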