This section will show how to train existing models on supported datasets. The following training environments are supported:

- CPU
- single GPU
- single node multiple GPUs
- multiple nodes

You can also manage jobs with Slurm.
Important:

- You can change the evaluation interval during training by modifying train_cfg, as in train_cfg = dict(val_interval=10). That means evaluating the model every 10 epochs.
- The default learning rate in the config files is set for 8 GPUs. If you use a different number of GPUs or images per GPU, set the learning rate proportionally to the total batch size, e.g., lr=0.01 for 8 GPUs * 1 img/gpu and lr=0.04 for 16 GPUs * 2 imgs/gpu (see the config sketch after this list).
- Training logs and checkpoints are saved to the working directory, which is specified by the CLI argument --work-dir. It uses ./work_dirs/CONFIG_NAME by default.
- If you want mixed precision training, simply specify the CLI argument --amp.
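The first two notes can be combined in a small user config that inherits from an existing one. The following is only a sketch: the file name is hypothetical, the _base_ path assumes the new config sits next to the QDTrack config, and the optim_wrapper/optimizer keys are assumed to match the base config.

# my_qdtrack_16xb2.py (hypothetical user config); adjust _base_ to your directory layout.
_base_ = ['./qdtrack_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py']

# Evaluate the model every 10 epochs instead of the interval set in the base config.
train_cfg = dict(val_interval=10)

# Linearly scaled learning rate for 16 GPUs * 2 imgs/gpu.
optim_wrapper = dict(optimizer=dict(lr=0.04))

You can then pass this file to tools/train.py, adding --work-dir and --amp on the command line if needed.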
The model is put on the CUDA device by default and falls back to the CPU only if no CUDA device is available. So if you want to train the model on CPU, you need to export CUDA_VISIBLE_DEVICES=-1 to disable GPU visibility first. More details in MMEngine.
CUDA_VISIBLE_DEVICES=-1 python tools/train.py ${CONFIG_FILE} [optional arguments]
An example of training the MOT model QDTrack on CPU:
CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/qdtrack/qdtrack_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py
If you want to train the model on a single GPU, you can directly use tools/train.py as follows.
python tools/train.py ${CONFIG_FILE} [optional arguments]
You can use export CUDA_VISIBLE_DEVICES=$GPU_ID
to select the GPU.
An example of training the MOT model QDTrack on a single GPU:
CUDA_VISIBLE_DEVICES=2 python tools/train.py configs/qdtrack/qdtrack_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py
We provide tools/dist_train.sh
to launch training on multiple GPUs.
The basic usage is as follows.
bash ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
If you would like to launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, you need to specify different ports (29500 by default) for each job to avoid communication conflict.
For example, you can set the port in commands as follows.
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4
An example of training the MOT model QDTrack on a single node with multiple GPUs:
bash ./tools/dist_train.sh configs/qdtrack/qdtrack_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py 8
If you launch with multiple machines connected only by ethernet, you can simply run the following commands:
On the first machine:
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR bash tools/dist_train.sh $CONFIG $GPUS
On the second machine:
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR bash tools/dist_train.sh $CONFIG $GPUS
Training across machines is usually slow if you do not have high-speed networking like InfiniBand.
Slurm is a good job scheduling system for computing clusters.
On a cluster managed by Slurm, you can use slurm_train.sh
to spawn training jobs.
It supports both single-node and multi-node training.
The basic usage is as follows.
bash ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} ${GPUS}
An example of training the MOT model QDTrack with Slurm:
PORT=29501 \
GPUS_PER_NODE=8 \
SRUN_ARGS="--quotatype=reserved" \
bash ./tools/slurm_train.sh \
mypartition \
mottrack \
configs/qdtrack/qdtrack_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py \
./work_dirs/QDTrack \
8
This section will show how to test existing models on supported datasets. The following testing environments are supported:

- CPU
- single GPU
- single node multiple GPUs
- multiple nodes

You can also manage jobs with Slurm.
Important:

- DeepSORT, SORT and StrongSORT need to load the weight of the reid model and the weight of the detector separately. Other algorithms such as ByteTrack, OCSORT and QDTrack don't need to. So we provide --checkpoint, --detector and --reid to load weights.
- StrongSORT and Mask2former only support video_based test. If your GPU memory can't fit the entire video, you can switch the test mode by setting the sampler type (see the sketch after this list). For example:
  video_based test: sampler=dict(type='DefaultSampler', shuffle=False, round_up=False)
  image_based test: sampler=dict(type='TrackImgSampler')
- You can set the path for saving the results by specifying outfile_prefix in the evaluator, e.g., val_evaluator = dict(outfile_prefix='results/sort_mot17'). Otherwise, a temporary file will be created and removed after evaluation.
- If you just want the formatted results without evaluation, you can set format_only=True, e.g., test_evaluator = dict(type='MOTChallengeMetric', metric=['HOTA', 'CLEAR', 'Identity'], outfile_prefix='sort_mot17_results', format_only=True)
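As a minimal sketch, the test-time notes above can be written as config overrides appended to the tracking config you test with. The test_dataloader, val_evaluator and test_evaluator key layout is assumed to match the standard base configs; pick only the overrides you need.

# Image-based testing, for when a whole video does not fit into GPU memory.
# (Video-based testing would instead keep
#  sampler=dict(type='DefaultSampler', shuffle=False, round_up=False).)
test_dataloader = dict(sampler=dict(type='TrackImgSampler'))

# Save the results under a fixed prefix; otherwise a temporary file is created
# and removed after evaluation.
val_evaluator = dict(outfile_prefix='results/sort_mot17')

# Only dump the formatted results, without computing metrics.
test_evaluator = dict(
    type='MOTChallengeMetric',
    metric=['HOTA', 'CLEAR', 'Identity'],
    outfile_prefix='sort_mot17_results',
    format_only=True)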
The model is put on the CUDA device by default and falls back to the CPU only if no CUDA device is available. So if you want to test the model on CPU, you need to export CUDA_VISIBLE_DEVICES=-1 to disable GPU visibility first. More details in MMEngine.
CUDA_VISIBLE_DEVICES=-1 python tools/test_tracking.py ${CONFIG_FILE} [optional arguments]
An example of testing the MOT model SORT on CPU:
CUDA_VISIBLE_DEVICES=-1 python tools/test_tracking.py configs/sort/sort_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py --detector ${CHECKPOINT_FILE}
If you want to test the model on a single GPU, you can directly use tools/test_tracking.py as follows.
python tools/test_tracking.py ${CONFIG_FILE} [optional arguments]
You can use export CUDA_VISIBLE_DEVICES=$GPU_ID
to select the GPU.
An example of testing the MOT model QDTrack on a single GPU:
CUDA_VISIBLE_DEVICES=2 python tools/test_tracking.py configs/qdtrack/qdtrack_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py --detector ${CHECKPOINT_FILE}
We provide tools/dist_test_tracking.sh
to launch testing on multiple GPUs.
The basic usage is as follows.
bash ./tools/dist_test_tracking.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
An example of testing the MOT model DeepSORT on a single node with multiple GPUs:
bash ./tools/dist_test_tracking.sh configs/deepsort/deepsort_faster-rcnn_r50_fpn_8xb2-4e_mot17halftrain_test-mot17halfval.py 8 --detector ${CHECKPOINT_FILE} --reid ${CHECKPOINT_FILE}
You can test on multiple nodes, which is similar to training on multiple nodes.
On a cluster managed by Slurm, you can use slurm_test_tracking.sh
to spawn testing jobs.
It supports both single-node and multi-node testing.
The basic usage is as follows.
[GPUS=${GPUS}] bash tools/slurm_test_tracking.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} [optional arguments]
An example of testing the VIS model Mask2former with Slurm:
GPUS=8 \
bash tools/slurm_test_tracking.sh \
mypartition \
vis \
configs/mask2former_vis/mask2former_r50_8xb2-8e_youtubevis2021.py \
--checkpoint ${CHECKPOINT_FILE}