mmtrack.apis¶
mmtrack.core¶
anchor¶
evaluation¶
motion¶
optimizer¶
track¶
utils¶
mmtrack.datasets¶
datasets¶
parsers¶
pipelines¶
samplers¶
- class mmtrack.datasets.samplers.DistributedVideoSampler(dataset, num_replicas=None, rank=None, shuffle=False)[source]¶
Distribute videos across multiple GPUs during testing.
- Parameters
dataset (Dataset) – Test dataset that must have a data_infos attribute. Each data_info in data_infos records the information of one frame, and each video must have exactly one data_info with data_info['frame_id'] == 0.
num_replicas (int) – The number of GPUs. Defaults to None.
rank (int) – GPU rank id. Defaults to None.
shuffle (bool) – If True, shuffle the dataset. Defaults to False.
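The described behavior, keeping every frame of a video on a single GPU, can be sketched in plain Python. This is a simplified illustration, not the actual mmtrack implementation: it assigns whole videos round-robin, while the real sampler may partition them contiguously. Only the data_infos and frame_id names come from the documentation above.

```python
def split_videos_across_gpus(data_infos, num_replicas):
    """Assign whole videos to GPU ranks (illustrative sketch).

    Each video starts at a data_info with frame_id == 0, so the frame
    indices between consecutive zero markers belong to one video.
    """
    # Indices where a new video begins.
    starts = [i for i, info in enumerate(data_infos) if info['frame_id'] == 0]
    ends = starts[1:] + [len(data_infos)]
    videos = [list(range(s, e)) for s, e in zip(starts, ends)]

    # Distribute whole videos over the GPUs so no video is split
    # across ranks (round-robin here, for simplicity).
    per_rank = [[] for _ in range(num_replicas)]
    for vid_idx, frames in enumerate(videos):
        per_rank[vid_idx % num_replicas].extend(frames)
    return per_rank

# Three videos of lengths 3, 2, and 4, distributed over 2 GPUs.
infos = [{'frame_id': f} for f in (0, 1, 2, 0, 1, 0, 1, 2, 3)]
chunks = split_videos_across_gpus(infos, 2)
```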
mmtrack.models¶
mot¶
sot¶
vid¶
aggregators¶
- class mmtrack.models.aggregators.EmbedAggregator(num_convs=1, channels=256, kernel_size=3, norm_cfg=None, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]¶
Embedding convs to aggregate multiple feature maps.
This module is proposed in “Flow-Guided Feature Aggregation for Video Object Detection” (FGFA).
- Parameters
num_convs (int) – Number of embedding convs. Defaults to 1.
channels (int) – Channels of embedding convs. Defaults to 256.
kernel_size (int) – Kernel size of embedding convs. Defaults to 3.
norm_cfg (dict) – Configuration of the normalization method after each conv. Defaults to None.
act_cfg (dict) – Configuration of the activation method after each conv. Defaults to dict(type='ReLU').
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
- forward(x, ref_x)[source]¶
Aggregate reference feature maps ref_x.
The aggregation mainly contains two steps: 1. Compute the cosine similarity between x and ref_x. 2. Use the normalized (i.e. softmax) cosine similarity as weights to compute a weighted sum of ref_x.
- Parameters
x (Tensor) – of shape [1, C, H, W]
ref_x (Tensor) – of shape [N, C, H, W]. N is the number of reference feature maps.
- Returns
The aggregated feature map with shape [1, C, H, W].
- Return type
Tensor
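The two-step aggregation described above can be sketched with NumPy. This is a hand-rolled illustration of the math only, not the module's actual implementation, which computes the similarity on learned embedding-conv features rather than on the raw inputs:

```python
import numpy as np

def aggregate(x, ref_x, eps=1e-6):
    """Weighted sum of ref_x using softmax-normalized cosine similarity.

    x:     key feature map of shape (1, C, H, W)
    ref_x: reference feature maps of shape (N, C, H, W)
    """
    # Step 1: per-location cosine similarity between x and each ref_x.
    x_norm = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    ref_norm = ref_x / (np.linalg.norm(ref_x, axis=1, keepdims=True) + eps)
    cos_sim = (x_norm * ref_norm).sum(axis=1, keepdims=True)  # (N, 1, H, W)

    # Step 2: softmax over the N references, then weighted sum.
    weights = np.exp(cos_sim)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * ref_x).sum(axis=0, keepdims=True)  # (1, C, H, W)

x = np.random.rand(1, 8, 4, 4)
ref_x = np.random.rand(3, 8, 4, 4)
out = aggregate(x, ref_x)
```

Because the softmax weights sum to one over the N references, each output location is a convex combination of the corresponding reference features.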
- class mmtrack.models.aggregators.SelsaAggregator(in_channels, num_attention_blocks=16, init_cfg=None)[source]¶
SELSA aggregator module.
This module is proposed in “Sequence Level Semantics Aggregation for Video Object Detection” (SELSA).
- Parameters
in_channels (int) – The number of channels of the proposal features.
num_attention_blocks (int) – The number of attention blocks used in the SELSA aggregator module. Defaults to 16.
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
- forward(x, ref_x)[source]¶
Aggregate the features ref_x of reference proposals.
The aggregation mainly contains two steps: 1. Use multi-head attention to compute the weights between x and ref_x. 2. Use the normalized (i.e. softmax) weights to compute a weighted sum of ref_x.
- Parameters
x (Tensor) – of shape [N, C]. N is the number of key frame proposals.
ref_x (Tensor) – of shape [M, C]. M is the number of reference frame proposals.
- Returns
The aggregated features of key frame proposals with shape [N, C].
- Return type
Tensor
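A minimal single-head sketch of the weight-then-sum pattern described above. The real module uses num_attention_blocks attention heads and learned projections of the proposal features; this only illustrates the attention math on raw features:

```python
import numpy as np

def selsa_like_aggregate(x, ref_x):
    """x: (N, C) key-frame proposals; ref_x: (M, C) reference proposals."""
    # Attention scores between every key and reference proposal,
    # scaled by sqrt(C) as in standard dot-product attention.
    scores = x @ ref_x.T / np.sqrt(x.shape[1])   # (N, M)

    # Softmax-normalize over the M reference proposals.
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)

    # Weighted sum of reference features for each key proposal.
    return weights @ ref_x                       # (N, C)

x = np.random.rand(4, 16)
ref_x = np.random.rand(10, 16)
out = selsa_like_aggregate(x, ref_x)
```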