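The link below collects all of my notes on YOLO V3. If you spot any mistakes, please point them out and I will correct them right away. If you are interested, add me on WeChat (17575010159) to discuss the technical details. Object Detection 0-00: YOLO V3 Table of Contents (the most complete ever)
1: Bounding box prediction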
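On each of its three feature maps, YOLOv3 makes its predictions with (4+1+c)×k convolution kernels of size 1×1. Here k is the number of prior boxes (bounding box priors) per cell, 3 by default, and c is the number of object classes. Of these outputs, 4k parameters predict the offsets of the bounding boxes, k parameters predict the probability that each box contains an object, and ck parameters predict the probabilities of the c classes for each of the k prior boxes. The figure below shows how a bounding box is predicted (I redrew this figure myself; it differs a little from the one in the paper, and I find my version easier to follow). The dashed rectangle is the prior box, and the solid rectangle is the predicted box computed from the offsets that the network outputs. (Cx, Cy) is the position of the prior box on the feature map (in the code below, this is the index of the grid cell that contains it), (Pw, Ph) are the width and height of the prior box on the feature map, (tx, ty, tw, th) are the center offsets (tx, ty) and the width/height scaling factors (tw, th) predicted by the network, and (bx, by, bw, bh) is the final predicted bounding box. The conversion from the prior box to the final predicted box is given by the formulas on the right of the figure. There, σ is the sigmoid function, which squashes the predicted offsets into the range 0 to 1 (this keeps the predicted center inside a single cell, which the author says speeds up convergence).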
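Since the figure is not reproduced here, the conversion it describes can be sketched in NumPy. This is a minimal sketch, assuming the standard YOLOv3 formulas bx = σ(tx) + Cx, by = σ(ty) + Cy, bw = Pw·exp(tw), bh = Ph·exp(th); the helper name `decode_box` and the choice of 80 classes (COCO) are illustrative assumptions, not part of the original code.

```python
import numpy as np

def sigmoid(x):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(t, cell_xy, prior_wh):
    """Hypothetical helper: map raw offsets (tx, ty, tw, th) to
    (bx, by, bw, bh), all expressed in feature-map (grid) units."""
    tx, ty, tw, th = t
    cx, cy = cell_xy
    pw, ph = prior_wh
    bx = sigmoid(tx) + cx   # center stays inside the cell [cx, cx + 1)
    by = sigmoid(ty) + cy
    bw = pw * np.exp(tw)    # prior width scaled by exp(tw)
    bh = ph * np.exp(th)
    return np.array([bx, by, bw, bh])

# Number of output channels per feature map: (4 + 1 + c) * k
num_classes = 80            # assumption: COCO
k = 3                       # prior boxes per cell
channels = (4 + 1 + num_classes) * k

# Zero offsets: the center lands in the middle of the cell and the
# size equals the prior box.
box = decode_box((0.0, 0.0, 0.0, 0.0), cell_xy=(3, 4), prior_wh=(2.0, 5.0))

# Even extreme offsets cannot push the center out of cell (3, 4).
far = decode_box((10.0, -10.0, 0.0, 0.0), cell_xy=(3, 4), prior_wh=(2.0, 5.0))
```

With zero offsets the box is simply the prior centered in its cell, which is a convenient sanity check when debugging a decoder.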
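The figure below gives the feature map size of each of the three prediction layers and the sizes of the prior boxes on each feature map (the author obtained these prior box sizes by clustering on the COCO dataset):
If the description above is still hard to follow, don't worry: the code makes it easier to understand. In the core/yolov3 file, find the following: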
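The figure itself is not reproduced here, so the numbers can be sketched instead. The snippet assumes a 416×416 input and the nine COCO anchors from the commonly published YOLOv3 configuration; both are assumptions about what the missing figure showed, and the strides [8, 16, 32] are the ones mentioned in the code comments of this post.

```python
import numpy as np

input_size = 416                      # assumption: a common YOLOv3 input size
strides = np.array([8, 16, 32])       # downsampling factor of each prediction layer
grid_sizes = input_size // strides    # feature map side length per layer

# Assumed COCO anchors (width, height) in input-image pixels,
# grouped as small / medium / large objects.
anchors_px = np.array([
    [[10, 13], [16, 30], [33, 23]],
    [[30, 61], [62, 45], [59, 119]],
    [[116, 90], [156, 198], [373, 326]],
], dtype=np.float32)

# Dividing by the stride expresses each anchor in grid-cell units.
anchors_grid = anchors_px / strides[:, np.newaxis, np.newaxis]

# Total number of predicted boxes over all three layers (3 per cell).
total_boxes = int(np.sum(grid_sizes ** 2 * 3))

for s, g, a in zip(strides, grid_sizes, anchors_px):
    print(f"stride {s:2d}: {g}x{g} cells, anchors {a.tolist()}")
```

The finest feature map (stride 8) gets the smallest anchors, since each of its cells covers the smallest region of the input image.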
# Excerpt from the model class in core/yolov3. It assumes `import numpy as np`,
# `import tensorflow as tf` (TensorFlow 1.x), and the project's own `utils` and `cfg`.
def __init__(self, input_data, trainable):
    self.trainable = trainable
    # class names
    self.classes = utils.read_class_names(cfg.YOLO.CLASSES)
    # number of classes
    self.num_class = len(self.classes)
    # the three downsampling strides [8, 16, 32]
    self.strides = np.array(cfg.YOLO.STRIDES)
    # prior (anchor) boxes
    self.anchors = utils.get_anchors(cfg.YOLO.ANCHORS)
    # anchor_per_scale = 3: every grid cell makes 3 predictions per scale
    self.anchor_per_scale = cfg.YOLO.ANCHOR_PER_SCALE
    # IoU threshold used in the loss (0.5)
    self.iou_loss_thresh = cfg.YOLO.IOU_LOSS_THRESH
    # upsampling method: resize
    self.upsample_method = cfg.YOLO.UPSAMPLE_METHOD

    try:
        self.conv_lbbox, self.conv_mbbox, self.conv_sbbox = self.__build_nework(input_data)
    except:
        raise NotImplementedError("Can not build up yolov3 network!")

    # decode the raw outputs of the small / medium / large branches
    with tf.variable_scope('pred_sbbox'):
        self.pred_sbbox = self.decode(self.conv_sbbox, self.anchors[0], self.strides[0])

    with tf.variable_scope('pred_mbbox'):
        self.pred_mbbox = self.decode(self.conv_mbbox, self.anchors[1], self.strides[1])

    with tf.variable_scope('pred_lbbox'):
        self.pred_lbbox = self.decode(self.conv_lbbox, self.anchors[2], self.strides[2])
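As we can see, self.__build_nework(input_data) returns three downsampled feature maps. In the previous section, to keep things easy to follow, I said that the network directly outputs box center coordinates + width/height + confidence + class probabilities.
From the bounding box prediction described above, however, the box[4] part of the feature maps returned by __build_nework is actually (tx, ty, tw, th): the predicted center offsets (tx, ty) and the width/height scaling factors (tw, th). We therefore have to convert them into the (bx, by, bw, bh) that we need. The core function for this conversion is in the core/yolov3 file: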
def decode(self, conv_output, anchors, stride):
    """
    return tensor of shape [batch_size, output_size, output_size, anchor_per_scale, 5 + num_classes]
           contains (x, y, w, h, score, probability)
    """
    # shape of the input tensor
    conv_shape = tf.shape(conv_output)
    batch_size = conv_shape[0]
    output_size = conv_shape[1]
    # each grid cell predicts 3 boxes (only the cell that contains a box center is responsible for that box)
    anchor_per_scale = len(anchors)

    conv_output = tf.reshape(conv_output, (batch_size, output_size, output_size, anchor_per_scale, 5 + self.num_class))

    # raw center offsets of the box predicted by each grid cell
    conv_raw_dxdy = conv_output[:, :, :, :, 0:2]
    # raw width/height scaling factors
    conv_raw_dwdh = conv_output[:, :, :, :, 2:4]
    # raw objectness confidence of each predicted box
    conv_raw_conf = conv_output[:, :, :, :, 4:5]
    # raw probability of each class
    conv_raw_prob = conv_output[:, :, :, :, 5: ]

    # row index of every cell; with output_size = 5 this gives
    # [[0 0 0 0 0]
    #  [1 1 1 1 1]
    #  [2 2 2 2 2]
    #  [3 3 3 3 3]
    #  [4 4 4 4 4]]
    y = tf.tile(tf.range(output_size, dtype=tf.int32)[:, tf.newaxis], [1, output_size])
    # column index of every cell; with output_size = 5 this gives
    # [[0 1 2 3 4]
    #  [0 1 2 3 4]
    #  [0 1 2 3 4]
    #  [0 1 2 3 4]
    #  [0 1 2 3 4]]
    x = tf.tile(tf.range(output_size, dtype=tf.int32)[tf.newaxis, :], [output_size, 1])

    # (x, y) coordinate of every grid cell, x first:
    # [(0,0),(1,0),(2,0),(3,0),(4,0)]
    # ......
    # ......
    # [(0,4),(1,4),(2,4),(3,4),(4,4)]
    xy_grid = tf.concat([x[:, :, tf.newaxis], y[:, :, tf.newaxis]], axis=-1)
    # add the batch_size and anchor_per_scale dimensions, so that xy_grid holds the cell
    # coordinates for every image in the batch and for each of the 3 anchors per cell
    xy_grid = tf.tile(xy_grid[tf.newaxis, :, :, tf.newaxis, :], [batch_size, 1, 1, anchor_per_scale, 1])
    xy_grid = tf.cast(xy_grid, tf.float32)

    # sigmoid(conv_raw_dxdy) is the offset inside the grid cell; * stride maps it back to the input image
    pred_xy = (tf.sigmoid(conv_raw_dxdy) + xy_grid) * stride
    # exp(conv_raw_dwdh) scales the anchor width/height; * stride maps it back to the input image
    pred_wh = (tf.exp(conv_raw_dwdh) * anchors) * stride
    # merge center and size
    pred_xywh = tf.concat([pred_xy, pred_wh], axis=-1)

    # confidence
    pred_conf = tf.sigmoid(conv_raw_conf)
    # class probabilities
    pred_prob = tf.sigmoid(conv_raw_prob)

    return tf.concat([pred_xywh, pred_conf, pred_prob], axis=-1)
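To see what decode produces on concrete numbers, here is a NumPy re-implementation that can run without TensorFlow. It is only a sketch that mirrors the steps above; the function name `decode_np`, the toy sizes (a 5×5 grid, 2 classes), and the anchor values in grid units are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_np(conv_output, anchors, stride, num_class):
    """NumPy sketch of decode: raw network output -> (x, y, w, h, conf, probs)."""
    batch_size, output_size = conv_output.shape[0], conv_output.shape[1]
    anchor_per_scale = len(anchors)
    conv_output = conv_output.reshape(
        batch_size, output_size, output_size, anchor_per_scale, 5 + num_class)

    raw_dxdy = conv_output[..., 0:2]
    raw_dwdh = conv_output[..., 2:4]
    raw_conf = conv_output[..., 4:5]
    raw_prob = conv_output[..., 5:]

    # y holds the row index and x the column index of every cell.
    y = np.tile(np.arange(output_size)[:, np.newaxis], [1, output_size])
    x = np.tile(np.arange(output_size)[np.newaxis, :], [output_size, 1])
    xy_grid = np.stack([x, y], axis=-1).astype(np.float32)   # (S, S, 2), x first
    # Broadcasting over batch and anchor axes replaces the explicit tf.tile.
    xy_grid = xy_grid[np.newaxis, :, :, np.newaxis, :]

    pred_xy = (sigmoid(raw_dxdy) + xy_grid) * stride
    pred_wh = np.exp(raw_dwdh) * anchors * stride
    return np.concatenate(
        [pred_xy, pred_wh, sigmoid(raw_conf), sigmoid(raw_prob)], axis=-1)

# Assumed anchors in grid units for the stride-8 layer.
anchors = np.array([[1.25, 1.625], [2.0, 3.75], [4.125, 2.875]], dtype=np.float32)
raw = np.zeros((1, 5, 5, 3 * (5 + 2)), dtype=np.float32)     # all-zero raw output
out = decode_np(raw, anchors, stride=8, num_class=2)

# Cell in row 2, column 3, first anchor: center (3.5, 2.5) * 8, size = anchor * 8.
print(out.shape, out[0, 2, 3, 0, :4])
```

With an all-zero raw output every box sits at the center of its cell with exactly the anchor's size, and every confidence and class probability is 0.5, which makes indexing mistakes (for example swapping x and y) easy to spot.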
That completes the core conversion. Once we have (bx, by, bw, bh), we can of course compute the loss function.
The loss function is in the core/yolov3 file:
def bbox_iou(self, boxes1, boxes2):

    # boxes are (x_center, y_center, w, h)
    boxes1_area = boxes1[..., 2] * boxes1[..., 3]
    boxes2_area = boxes2[..., 2] * boxes2[..., 3]

    # convert to (x_min, y_min, x_max, y_max)
    boxes1 = tf.concat([boxes1[..., :2] - boxes1[..., 2:] * 0.5,
                        boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
    boxes2 = tf.concat([boxes2[..., :2] - boxes2[..., 2:] * 0.5,
                        boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)

    left_up = tf.maximum(boxes1[..., :2], boxes2[..., :2])
    right_down = tf.minimum(boxes1[..., 2:], boxes2[..., 2:])

    inter_section = tf.maximum(right_down - left_up, 0.0)
    inter_area = inter_section[..., 0] * inter_section[..., 1]
    union_area = boxes1_area + boxes2_area - inter_area
    iou = 1.0 * inter_area / union_area

    return iou

def loss_layer(self, conv, pred, label, bboxes, anchors, stride):
    # debug output: static shapes of the inputs
    print(conv.shape)
    print(pred.shape)
    print(label.shape)
    print(bboxes.shape)
    conv_shape = tf.shape(conv)
    batch_size = conv_shape[0]
    output_size = conv_shape[1]
    input_size = stride * output_size
    conv = tf.reshape(conv, (batch_size, output_size, output_size,
                             self.anchor_per_scale, 5 + self.num_class))
    conv_raw_conf = conv[:, :, :, :, 4:5]  # raw confidence logits (not decoded)
    conv_raw_prob = conv[:, :, :, :, 5:]   # raw class logits (not decoded)

    pred_xywh = pred[:, :, :, :, 0:4]      # decoded center coordinates and width/height
    pred_conf = pred[:, :, :, :, 4:5]      # decoded confidence

    label_xywh = label[:, :, :, :, 0:4]    # ground-truth center and width/height for the 3 boxes of each cell
    respond_bbox = label[:, :, :, :, 4:5]  # ground-truth confidence for the 3 boxes of each cell
    label_prob = label[:, :, :, :, 5:]     # ground-truth class probabilities for the 3 boxes of each cell

    giou = tf.expand_dims(self.bbox_giou(pred_xywh, label_xywh), axis=-1)
    print('giou', giou.shape)
    input_size = tf.cast(input_size, tf.float32)

    # weight between 1 and 2: the smaller the ground-truth box, the larger the weight
    bbox_loss_scale = 2.0 - 1.0 * label_xywh[:, :, :, :, 2:3] * label_xywh[:, :, :, :, 3:4] / (input_size ** 2)
    giou_loss = respond_bbox * bbox_loss_scale * (1 - giou)

    # IoU of every predicted box with every ground-truth box of the image
    iou = self.bbox_iou(pred_xywh[:, :, :, :, np.newaxis, :], bboxes[:, np.newaxis, np.newaxis, np.newaxis, :, :])
    max_iou = tf.expand_dims(tf.reduce_max(iou, axis=-1), axis=-1)

    # background: no object assigned and best IoU below the threshold
    respond_bgd = (1.0 - respond_bbox) * tf.cast(max_iou < self.iou_loss_thresh, tf.float32)

    conf_focal = self.focal(respond_bbox, pred_conf)

    conf_loss = conf_focal * (
            respond_bbox * tf.nn.sigmoid_cross_entropy_with_logits(labels=respond_bbox, logits=conv_raw_conf)
            +
            respond_bgd * tf.nn.sigmoid_cross_entropy_with_logits(labels=respond_bbox, logits=conv_raw_conf)
    )

    prob_loss = respond_bbox * tf.nn.sigmoid_cross_entropy_with_logits(labels=label_prob, logits=conv_raw_prob)

    # sum over cells, anchors and channels, then average over the batch
    giou_loss = tf.reduce_mean(tf.reduce_sum(giou_loss, axis=[1,2,3,4]))
    conf_loss = tf.reduce_mean(tf.reduce_sum(conf_loss, axis=[1,2,3,4]))
    prob_loss = tf.reduce_mean(tf.reduce_sum(prob_loss, axis=[1,2,3,4]))

    return giou_loss, conf_loss, prob_loss

def compute_loss(self, label_sbbox, label_mbbox, label_lbbox, true_sbbox, true_mbbox, true_lbbox):

    with tf.name_scope('smaller_box_loss'):
        loss_sbbox = self.loss_layer(self.conv_sbbox, self.pred_sbbox, label_sbbox, true_sbbox,
                                     anchors = self.anchors[0], stride = self.strides[0])

    with tf.name_scope('medium_box_loss'):
        loss_mbbox = self.loss_layer(self.conv_mbbox, self.pred_mbbox, label_mbbox, true_mbbox,
                                     anchors = self.anchors[1], stride = self.strides[1])

    with tf.name_scope('bigger_box_loss'):
        loss_lbbox = self.loss_layer(self.conv_lbbox, self.pred_lbbox, label_lbbox, true_lbbox,
                                     anchors = self.anchors[2], stride = self.strides[2])

    # add up each loss term over the three scales
    with tf.name_scope('giou_loss'):
        giou_loss = loss_sbbox[0] + loss_mbbox[0] + loss_lbbox[0]

    with tf.name_scope('conf_loss'):
        conf_loss = loss_sbbox[1] + loss_mbbox[1] + loss_lbbox[1]

    with tf.name_scope('prob_loss'):
        prob_loss = loss_sbbox[2] + loss_mbbox[2] + loss_lbbox[2]

    return giou_loss, conf_loss, prob_loss
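To make bbox_iou concrete, the same computation can be tried on a few hand-checkable boxes in NumPy. This is a sketch that follows the steps of bbox_iou; the name `bbox_iou_np` and the sample boxes are illustrative assumptions.

```python
import numpy as np

def bbox_iou_np(boxes1, boxes2):
    """IoU of boxes given as (x_center, y_center, w, h); NumPy sketch of bbox_iou."""
    boxes1 = np.asarray(boxes1, dtype=np.float64)
    boxes2 = np.asarray(boxes2, dtype=np.float64)

    area1 = boxes1[..., 2] * boxes1[..., 3]
    area2 = boxes2[..., 2] * boxes2[..., 3]

    # Convert to corner format (x_min, y_min, x_max, y_max).
    b1 = np.concatenate([boxes1[..., :2] - boxes1[..., 2:] * 0.5,
                         boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
    b2 = np.concatenate([boxes2[..., :2] - boxes2[..., 2:] * 0.5,
                         boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)

    left_up = np.maximum(b1[..., :2], b2[..., :2])
    right_down = np.minimum(b1[..., 2:], b2[..., 2:])

    # Clamp at zero so that disjoint boxes get an empty intersection.
    inter = np.maximum(right_down - left_up, 0.0)
    inter_area = inter[..., 0] * inter[..., 1]
    union_area = area1 + area2 - inter_area
    return inter_area / union_area

iou_same = bbox_iou_np([10, 10, 4, 4], [10, 10, 4, 4])   # identical boxes
iou_half = bbox_iou_np([10, 10, 4, 4], [12, 10, 4, 4])   # shifted by half a width
iou_none = bbox_iou_np([10, 10, 4, 4], [20, 20, 4, 4])   # disjoint boxes
print(iou_same, iou_half, iou_none)
```

In loss_layer this IoU is taken between every predicted box and every ground-truth box of the image; a cell without an assigned object only counts as background when its best IoU stays below iou_loss_thresh.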
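I won't go through this in detail here. In short: the loss for xywh is computed from the IoU (in the code it is the GIoU variant, used as 1 - giou), while the confidence and the class probabilities use sigmoid cross-entropy loss (the confidence term is additionally weighted by the focal factor).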
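The bbox_giou method called in loss_layer is not shown in this post. As a rough sketch, assuming it follows the standard GIoU definition (IoU minus the fraction of the smallest enclosing box not covered by the union), the box term of the loss for a single pair of boxes could look as follows. The names `giou_np` and `box_loss_np` and the 416 input size are assumptions for illustration.

```python
import numpy as np

def giou_np(box1, box2):
    """Standard GIoU for boxes given as (x_center, y_center, w, h)."""
    box1 = np.asarray(box1, dtype=np.float64)
    box2 = np.asarray(box2, dtype=np.float64)

    b1_min, b1_max = box1[..., :2] - box1[..., 2:] * 0.5, box1[..., :2] + box1[..., 2:] * 0.5
    b2_min, b2_max = box2[..., :2] - box2[..., 2:] * 0.5, box2[..., :2] + box2[..., 2:] * 0.5

    area1 = box1[..., 2] * box1[..., 3]
    area2 = box2[..., 2] * box2[..., 3]

    inter_wh = np.maximum(np.minimum(b1_max, b2_max) - np.maximum(b1_min, b2_min), 0.0)
    inter_area = inter_wh[..., 0] * inter_wh[..., 1]
    union_area = area1 + area2 - inter_area
    iou = inter_area / union_area

    # Smallest axis-aligned box that encloses both boxes.
    enclose_wh = np.maximum(b1_max, b2_max) - np.minimum(b1_min, b2_min)
    enclose_area = enclose_wh[..., 0] * enclose_wh[..., 1]
    return iou - (enclose_area - union_area) / enclose_area

def box_loss_np(pred_xywh, label_xywh, input_size):
    """Box term for one responsible prediction: bbox_loss_scale * (1 - giou)."""
    label_xywh = np.asarray(label_xywh, dtype=np.float64)
    scale = 2.0 - label_xywh[2] * label_xywh[3] / input_size ** 2
    return scale * (1.0 - giou_np(pred_xywh, label_xywh))

label = [10, 10, 4, 4]
loss_perfect = box_loss_np([10, 10, 4, 4], label, input_size=416)
loss_overlap = box_loss_np([12, 10, 4, 4], label, input_size=416)
loss_disjoint = box_loss_np([20, 20, 4, 4], label, input_size=416)
print(loss_perfect, loss_overlap, loss_disjoint)
```

Unlike plain IoU, GIoU keeps decreasing as disjoint boxes move further apart, so the loss still distinguishes between a near miss and a far miss when the boxes do not overlap at all.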