The official PyTorch example code is as follows:
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# For training
images, boxes = torch.rand(4, 3, 600, 1200), torch.rand(4, 11, 4)
# Add the last two values to the first two so that x2 >= x1 and y2 >= y1
boxes[:, :, 2:4] = boxes[:, :, 0:2] + boxes[:, :, 2:4]
labels = torch.randint(1, 91, (4, 11))
images = list(image for image in images)
targets = []
for i in range(len(images)):
    # The target dict keys are the strings 'boxes' and 'labels'
    d = {'boxes': boxes[i], 'labels': labels[i]}
    targets.append(d)
output = model(images, targets)  # training mode: returns a dict of losses

# For inference
model.eval()
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
predictions = model(x)

# optionally, if you want to export the model to ONNX:
torch.onnx.export(model, x, "faster_rcnn.onnx", opset_version=11)
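The training half of the example does two things that are easy to miss. First, it turns random numbers into valid `[x1, y1, x2, y2]` boxes by adding the last two values of each box to the first two, which guarantees x2 >= x1 and y2 >= y1. Second, it packs each image's boxes and labels into a dict keyed by the strings `'boxes'` and `'labels'`. The sketch below reproduces both steps in plain Python, using lists instead of tensors so it runs without torch. `offsets_to_xyxy` and `build_targets` are hypothetical helper names chosen for illustration and are not torchvision APIs.

```python
import random

def offsets_to_xyxy(box):
    """Turn [x1, y1, dw, dh] into [x1, y1, x1 + dw, y1 + dh].

    Mirrors boxes[:, :, 2:4] = boxes[:, :, 0:2] + boxes[:, :, 2:4]
    for a single box, so x2 >= x1 and y2 >= y1 whenever dw, dh >= 0.
    """
    x1, y1, dw, dh = box
    return [x1, y1, x1 + dw, y1 + dh]

def build_targets(boxes_per_image, labels_per_image):
    """Pack per-image boxes and labels into the list-of-dicts target format."""
    targets = []
    for boxes, labels in zip(boxes_per_image, labels_per_image):
        # One dict per image, keyed by the strings "boxes" and "labels".
        targets.append({"boxes": [offsets_to_xyxy(b) for b in boxes],
                        "labels": list(labels)})
    return targets

random.seed(0)
num_images, boxes_per_img = 4, 11
raw_boxes = [[[random.random() for _ in range(4)] for _ in range(boxes_per_img)]
             for _ in range(num_images)]
# torch.randint(1, 91, ...) excludes 91, so labels fall in [1, 90];
# label 0 is left for the background class counted in num_classes=91.
raw_labels = [[random.randint(1, 90) for _ in range(boxes_per_img)]
              for _ in range(num_images)]
targets = build_targets(raw_boxes, raw_labels)
```

Note that `random.randint` includes its upper bound while `torch.randint` excludes it. That is why the sketch uses 90 where the original code uses 91.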
The rest of this post walks through the example code in detail.
First, initialize the Faster R-CNN model.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
This is a Faster R-CNN with a ResNet-50-FPN backbone. Next, we step into the internal code with the debugger.
def fasterrcnn_resnet50_fpn(pretrained=False, progress=True,
                            num_classes=91, pretrained_backbone=True, trainable_backbone_layers=3, **kwargs):
    """
    Constructs a Faster R-CNN model with a ResNet-50-FPN backbone.

    The input to the model is expected to be a list of tensors, each of shape ``[C, H, W]``, one for each image,
    and should be in ``0-1`` range. Different images can have different sizes.

    The behavior of the model changes depending if it is in training or evaluation mode.

    During training, the model expects both the input tensors, as well as a targets (list of dictionary),
    containing:
        - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x1, y1, x2, y2]`` format, with values of ``x``
          between ``0`` and ``W`` and values of ``y`` between ``0`` and ``H``
        - labels (``Int64Tensor[N]``): the class label for each ground-truth box

    The model returns a ``Dict[Tensor]`` during training, containing the classification and regression
    losses for both the RPN and the R-CNN.

    During inference, the model requires only the input tensors, and returns the post-processed
    predictions as a ``List[Dict[Tensor]]``, one for each input image. The fields of the ``Dict`` are as
    follows:
        - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x1, y1, x2, y2]`` format, with values of ``x``
          between ``0`` and ``W`` and values of ``y`` between ``0`` and ``H``
        - labels (``Int64Tensor[N]``): the predicted labels for each image
        - scores (``Tensor[N]``): the scores of each prediction

    Faster R-CNN is exportable to ONNX for a fixed batch size with inputs images of fixed size.

    Arguments:
        pretrained (bool): If True, returns a model pre-trained on COCO train2017
        progress (bool): If True, displays a progress bar of the download to stderr
        pretrained_backbone (bool): If True, returns a model with backbone pre-trained on Imagenet
        num_classes (int): number of output classes of the model (including the background)
        trainable_backbone_layers (int): number of trainable (not frozen) resnet layers starting from final block.
            Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable.
    """
    # Check that trainable_backbone_layers is in the valid range
    assert trainable_backbone_layers <= 5 and trainable_backbone_layers >= 0
    # don't freeze any layers if pretrained model or backbone is not used
    if not (pretrained or pretrained_backbone):
        trainable_backbone_layers = 5
    if pretrained:
        # no need to download the backbone if pretrained is set
        pretrained_backbone = False
    # Build the ResNet-FPN backbone
    backbone = resnet_fpn_backbone('resnet50', pretrained_backbone, trainable_layers=trainable_backbone_layers)
    # Build the Faster R-CNN model
    model = FasterRCNN(backbone, num_classes, **kwargs)
    if pretrained:
        # With pretrained=True, download the COCO-pretrained weights
        state_dict = load_state_dict_from_url(model_urls['fasterrcnn_resnet50_fpn_coco'],
                                              progress=progress)
        # Load the weights into the model
        model.load_state_dict(state_dict)
    return model  # Return the model
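The argument handling at the top of `fasterrcnn_resnet50_fpn` comes down to a few rules. The layer count must lie in [0, 5]. With no pretrained weights at all, every backbone layer stays trainable, since freezing random weights is pointless. With the full COCO checkpoint, the separate ImageNet backbone download is skipped. The pure-Python sketch below reproduces those rules. It also shows which ResNet stages count as "trainable, starting from the final block." That part is an assumption about `resnet_fpn_backbone`, whose body this section does not show. As far as we know, the torchvision version discussed here keeps the first `trainable_layers` entries of `['layer4', 'layer3', 'layer2', 'layer1', 'conv1']` trainable and freezes every parameter whose name matches none of them. Check this against your installed source. The helper names are hypothetical.

```python
from typing import List, Tuple

# Assumed stage order, from the final block backwards; verify against the
# resnet_fpn_backbone source of your torchvision version.
RESNET_STAGES = ["layer4", "layer3", "layer2", "layer1", "conv1"]

def resolve_backbone_args(pretrained: bool, pretrained_backbone: bool,
                          trainable_backbone_layers: int) -> Tuple[bool, int]:
    """Replicate the argument checks at the top of fasterrcnn_resnet50_fpn."""
    assert 0 <= trainable_backbone_layers <= 5
    # Nothing pretrained: freezing would only lock in random weights.
    if not (pretrained or pretrained_backbone):
        trainable_backbone_layers = 5
    # The full COCO checkpoint already contains the backbone weights.
    if pretrained:
        pretrained_backbone = False
    return pretrained_backbone, trainable_backbone_layers

def trainable_stages(trainable_layers: int) -> List[str]:
    """Stages left trainable, counted from the final block."""
    return RESNET_STAGES[:trainable_layers]

def is_frozen(param_name: str, trainable_layers: int) -> bool:
    """A parameter is frozen unless its name starts with a trainable stage."""
    return not any(param_name.startswith(stage)
                   for stage in trainable_stages(trainable_layers))
```

With the defaults used in the example (`pretrained=True`, `trainable_backbone_layers=3`), `resolve_backbone_args(True, True, 3)` returns `(False, 3)`. Only `layer4`, `layer3` and `layer2` are then fine-tuned, while `layer1` and the stem stay frozen.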
Next, step into the code that builds the ResNet-FPN backbone.
def resnet_fpn_backbone(
    backbone_name,
    pretrained,
    norm_layer=misc_nn_ops.FrozenBatchNorm2d,
    trainable_layers=3,
    returned_layers=None,
    extra_blocks=None
):
    """
    Constructs a specified ResNet backbone with FPN on top. Freezes the specified number of layers in the backbone.