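After several days of continuous testing, YOLO runs quite well on the TX1 and meets our lab's requirements. However, the Darknet framework that YOLO depends on is still too primitive and is not as convenient to work with as TensorFlow or Caffe. In addition, the object detection part I am responsible for has to be integrated with the new framework written by Boss Mei, which makes it all the more necessary to port YOLO to a mature framework.

Fortunately, others have already ported YOLO to the major frameworks, for example darktf and caffe-yolo. Today, taking caffe-yolo as the example, I will walk through training it on our own dataset.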

Reformat our dataset in PASCAL VOC style

For convenience later on, we first convert our dataset to the standard PASCAL VOC directory layout.

Structure of PASCAL VOC dataset

Its directory structure is as follows:

.
├── VOC2007
│   ├── Annotations
│   ├── ImageSets
│   ├── JPEGImages
│   ├── SegmentationClass
│   └── SegmentationObject
└── VOC2012
    ├── Annotations
    ├── ImageSets
    ├── JPEGImages
    ├── SegmentationClass
    └── SegmentationObject

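The Annotations directory holds the .xml files, and the JPEGImages directory holds the corresponding .jpg images. Since we do not do semantic segmentation, SegmentationClass and SegmentationObject are of no use to us.

The ImageSets directory is structured as follows. The files that matter most are the four in the Main folder: trainval.txt, train.txt, val.txt, and test.txt.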
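To make the role of the Annotations files concrete, here is a minimal sketch of reading one PASCAL VOC style .xml annotation with the Python standard library. The function name `parse_voc_annotation` and the sample annotation are made up for illustration; only the tag names (`filename`, `object`, `name`, `bndbox`, `xmin`, `ymin`, `xmax`, `ymax`) follow the usual VOC layout.

```python
import xml.etree.ElementTree as ET


def parse_voc_annotation(xml_text):
    """Return (filename, [(class_name, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # Some tools write coordinates as floats, so go through float() first.
        coords = tuple(int(float(bndbox.findtext(tag)))
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name,) + coords)
    return filename, boxes


# Hypothetical annotation, made up for illustration only.
SAMPLE = """<annotation>
  <filename>image00001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>ball</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>180</xmax><ymax>200</ymax></bndbox>
  </object>
</annotation>"""

if __name__ == "__main__":
    print(parse_voc_annotation(SAMPLE))
```

Each `object` element yields one labeled box, which is the information the later conversion step needs.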

.
├── Layout
│   ├── test.txt
│   ├── train.txt
│   ├── trainval.txt
│   └── val.txt
├── Main
│   ├── aeroplane_test.txt
│   ├── aeroplane_train.txt
│   ├── aeroplane_trainval.txt
│   ├── aeroplane_val.txt
│   ├── ...
│   ├── test.txt
│   ├── train.txt
│   ├── trainval.txt
│   └── val.txt
└── Segmentation
    ├── test.txt
    ├── train.txt
    ├── trainval.txt
    └── val.txt

Reformat our dataset

First, tidy up the previously messy image file names, as shown below:

.
├── image00001.jpg
├── image00002.jpg
├── image00012.jpg
├── ...
├── image04524.jpg
├── image04525.jpg
└── image04526.jpg
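The renaming step could look roughly like the sketch below. The post does not show how the files were actually renamed, so the helper names `plan_renames` and `copy_renamed`, and the choice to copy into a new directory rather than rename in place, are assumptions. Only the `imageNNNNN.jpg` pattern is taken from the listing above.

```python
import os
import shutil


def plan_renames(names, prefix="image", width=5, ext=".jpg"):
    """Map each original .jpg file name to a zero-padded sequential name."""
    jpgs = sorted(n for n in names if n.lower().endswith(ext))
    return {old: "{}{:0{}d}{}".format(prefix, i, width, ext)
            for i, old in enumerate(jpgs, start=1)}


def copy_renamed(src_dir, dst_dir):
    """Copy images from src_dir into dst_dir under their new names."""
    os.makedirs(dst_dir, exist_ok=True)
    mapping = plan_renames(os.listdir(src_dir))
    for old, new in mapping.items():
        shutil.copy2(os.path.join(src_dir, old), os.path.join(dst_dir, new))
    return mapping
```

Copying into a separate directory avoids name collisions when an existing file already uses one of the target names, and leaves the originals untouched.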

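Then re-annotate these images with labelImg. Once annotation is complete, set up the directory structure for our own dataset and put the images and annotations into the corresponding folders: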

.
├── ROB2017
│   ├── Annotations
│   ├── ImageSets
│   ├── JPEGImages
│   └── JPEGImages_original
└── scripts
    ├── clean.py
    ├── conf.json
    ├── convert_png2jpg.py
    └── split_dataset.py

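After that I wrote a few scripts: clean.py removes images that have no annotation, and split_dataset.py splits the data into training, validation, and test sets and saves the result to ImageSets/Main.

At this point, converting our dataset to the standard PASCAL VOC directory layout is done, and we can move on to training.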
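The contents of clean.py and split_dataset.py are not shown in this post, so the following is only a sketch of what such scripts might do. The function names, the 10% validation and test ratios, and the fixed seed are all assumptions. `find_unannotated` only reports files and deletes nothing.

```python
import os
import random


def find_unannotated(jpeg_dir, xml_dir):
    """Return .jpg file names in jpeg_dir with no matching .xml in xml_dir."""
    annotated = {os.path.splitext(n)[0]
                 for n in os.listdir(xml_dir) if n.endswith(".xml")}
    return sorted(n for n in os.listdir(jpeg_dir)
                  if n.endswith(".jpg")
                  and os.path.splitext(n)[0] not in annotated)


def split_ids(ids, val_ratio=0.1, test_ratio=0.1, seed=0):
    """Shuffle ids deterministically and split into train / val / test."""
    ids = sorted(ids)
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_ratio)
    n_val = int(len(ids) * val_ratio)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


def write_image_sets(main_dir, train, val, test):
    """Write train/val/test/trainval .txt files, one image id per line."""
    os.makedirs(main_dir, exist_ok=True)
    splits = {"train": train, "val": val, "test": test,
              "trainval": train + val}
    for name, items in splits.items():
        with open(os.path.join(main_dir, name + ".txt"), "w") as fp:
            fp.writelines(item + "\n" for item in sorted(items))
```

The ids here are file names without extension (for example `image00001`), which is the form the list files in ImageSets/Main use.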

Train YOLO on Caffe

Clone & Make

$ git clone https://github.com/yeahkun/caffe-yolo.git
$ cd caffe-yolo
$ cp Makefile.config.example Makefile.config
$ make -j8

If you hit the error src/caffe/net.cpp:8:18: fatal error: hdf5.h: No such file or directory, modify Makefile.config as follows:

diff --git a/Makefile.config b/Makefile.config
index a873502..88828cc 100644
--- a/Makefile.config.example
+++ b/Makefile.config.example
@@ -69,8 +69,8 @@ PYTHON_LIB := /usr/lib
 # WITH_PYTHON_LAYER := 1

 # Whatever else you find you need goes here.
-INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
-LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
+INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/
+LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial/

 # If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
 # INCLUDE_DIRS += $(shell brew --prefix)/include
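Before editing INCLUDE_DIRS it can help to confirm where hdf5.h actually lives on your machine. The sketch below simply checks a few candidate directories. `find_header` is a hypothetical helper, and the candidate list is the directories from the diff above plus /usr/include, so your system may need other entries.

```python
import os

# Directories from the INCLUDE_DIRS lines in the diff, plus /usr/include.
CANDIDATE_DIRS = [
    "/usr/local/include",
    "/usr/include",
    "/usr/include/hdf5/serial",
]


def find_header(name, dirs):
    """Return the directories in dirs that contain a file called name."""
    return [d for d in dirs if os.path.isfile(os.path.join(d, name))]


if __name__ == "__main__":
    hits = find_header("hdf5.h", CANDIDATE_DIRS)
    print(hits if hits else "hdf5.h not found in the candidate directories")
```

Whichever directory is reported is the one that needs to appear in INCLUDE_DIRS.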

You can also enable cuDNN and adjust the compute capability settings to get the most out of the GTX1080:

## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!

# cuDNN acceleration switch (uncomment to build with cuDNN).
-# USE_CUDNN := 1
+USE_CUDNN := 1

# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1
...
# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
                -gencode arch=compute_20,code=sm_21 \
                -gencode arch=compute_30,code=sm_30 \
                -gencode arch=compute_35,code=sm_35 \
                -gencode arch=compute_50,code=sm_50 \
-                -gencode arch=compute_50,code=compute_50
+                -gencode arch=compute_50,code=compute_50 \
+                -gencode arch=compute_61,code=compute_61

Data preparation

 $ cd data/yolo
 $ ln -s /your/path/to/VOCdevkit/ .
 $ python ./get_list.py
 # change related path in script convert.sh
 $ sudo rm -r lmdb
 $ mkdir lmdb
 $ ./convert.sh 

A few things to note:

  • Remember to replace /your/path/to/VOCdevkit/ in ln -s /your/path/to/VOCdevkit/ . with the path to your own dataset, e.g. ln -s ~/data/ROBdevkit/ .

  • Modify ./get_list.py:

  diff --git a/data/yolo/get_list.py b/data/yolo/get_list.py
  index f519f1a..73b9858 100755
  --- a/data/yolo/get_list.py
  +++ b/data/yolo/get_list.py
  @@ -3,12 +3,15 @@ import os

   trainval_jpeg_list = []
   trainval_xml_list = []
  -test07_jpeg_list = []
  -test07_xml_list = []
  -test12_jpeg_list = []
  -
  -for name in ["VOC2007", "VOC2012"]:
  -  voc_dir = os.path.join("VOCdevkit", name)
  +test_jpeg_list = []
  +test_xml_list = []
  +
  +for name in ['ROB2017']:
  +  # voc_dir = os.path.join("VOCdevkit", name)
  +  voc_dir = os.path.join('ROBdevkit', name)
     txt_fold = os.path.join(voc_dir, "ImageSets/Main")
     jpeg_fold = os.path.join(voc_dir, "JPEGImages")
     xml_fold = os.path.join(voc_dir, "Annotations")
  @@ -23,35 +26,49 @@ for name in ["VOC2007", "VOC2012"]:
             print trainval_jpeg_list[-1], "not exist"
           if not os.path.exists(trainval_xml_list[-1]):
             print trainval_xml_list[-1], "not exist"
  -  if name == "VOC2007":
  -    file_path = os.path.join(txt_fold, "test.txt")
  -    with open(file_path, 'r') as fp:
  -      for line in fp:
  -        line = line.strip()
  -        test07_jpeg_list.append(os.path.join(jpeg_fold, "{}.jpg".format(line)))
  -        test07_xml_list.append(os.path.join(xml_fold, "{}.xml".format(line)))
  -        if not os.path.exists(test07_jpeg_list[-1]):
  -          print test07_jpeg_list[-1], "not exist"
  -        if not os.path.exists(test07_xml_list[-1]):
  -          print test07_xml_list[-1], "not exist"
  -  elif name == "VOC2012":
  +  if name == "ROB2017":
       file_path = os.path.join(txt_fold, "test.txt")
       with open(file_path, 'r') as fp:
         for line in fp:
           line = line.strip()
  -        test12_jpeg_list.append(os.path.join(jpeg_fold, "{}.jpg".format(line)))
  -        if not os.path.exists(test12_jpeg_list[-1]):
  -          print test12_jpeg_list[-1], "not exist"
  +        test_jpeg_list.append(os.path.join(jpeg_fold, "{}.jpg".format(line)))
  +        test_xml_list.append(os.path.join(xml_fold, "{}.xml".format(line)))
  +        if not os.path.exists(test_jpeg_list[-1]):
  +          print test_jpeg_list[-1], "not exist"
  +        if not os.path.exists(test_xml_list[-1]):
  +          print test_xml_list[-1], "not exist"

   with open("trainval.txt", "w") as wr:
     for i in range(len(trainval_jpeg_list)):
       wr.write("{} {}\n".format(trainval_jpeg_list[i], trainval_xml_list[i]))

  -with open("test_2007.txt", "w") as wr:
  -  for i in range(len(test07_jpeg_list)):
  -    wr.write("{} {}\n".format(test07_jpeg_list[i], test07_xml_list[i]))
  -
  -with open("test_2012.txt", "w") as wr:
  -  for i in range(len(test12_jpeg_list)):
  -    wr.write("{}\n".format(test12_jpeg_list[i]))
  +with open("test.txt", "w") as wr:
  +  for i in range(len(test_jpeg_list)):
  +    wr.write("{} {}\n".format(test_jpeg_list[i], test_xml_list[i]))
  • Modify convert.sh:
  diff --git a/data/yolo/convert.sh b/data/yolo/convert.sh
  index 8a52525..a06eb69 100755
  --- a/data/yolo/convert.sh
  +++ b/data/yolo/convert.sh
  @@ -1,7 +1,7 @@
   #!/usr/bin/env sh

   CAFFE_ROOT=../..
  -ROOT_DIR=/your/path/to/vocroot/
  +ROOT_DIR=/home/m/data/
   LABEL_FILE=$CAFFE_ROOT/data/yolo/label_map.txt

   # 2007 + 2012 trainval
  @@ -10,13 +10,15 @@ LMDB_DIR=./lmdb/trainval_lmdb
   SHUFFLE=true

   # 2007 test
  -# LIST_FILE=$CAFFE_ROOT/data/yolo/test_2007.txt
  -# LMDB_DIR=./lmdb/test2007_lmdb
  +# LIST_FILE=$CAFFE_ROOT/data/yolo/test.txt
  +# LMDB_DIR=./lmdb/test_lmdb
   # SHUFFLE=false

   RESIZE_W=448
   RESIZE_H=448

   $CAFFE_ROOT/build/tools/convert_box_data --resize_width=$RESIZE_W --resize_height=$RESIZE_H \
     --label_file=$LABEL_FILE $ROOT_DIR $LIST_FILE $LMDB_DIR --encoded=true --encode_type=jpg --shuffle=$SHUFFLE
  • Modify label_map.txt:
  diff --git a/data/yolo/label_map.txt b/data/yolo/label_map.txt
  index 1fe873a..bee8f82 100644
  --- a/data/yolo/label_map.txt
  +++ b/data/yolo/label_map.txt
  @@ -1,20 +1,3 @@
  -aeroplane 0
  -bicycle 1
  -bird 2
  -boat 3
  -bottle 4
  -bus 5
  -car 6
  -cat 7
  -chair 8
  -cow 9
  -diningtable 10
  -dog 11
  -horse 12
  -motorbike 13
  -person 14
  -pottedplant 15
  -sheep 16
  -sofa 17
  -train 18
  -tvmonitor 19
  +ball 0
  +goal 1
  +robot 2
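The three changes above have to agree with each other: the list files must point at existing image/annotation pairs, and every class name used in the annotations must appear in label_map.txt. The sketch below shows one way to check this before running convert.sh. It is written for Python 3 (the original get_list.py uses Python 2 print statements), and the helper names `load_label_map`, `list_lines`, and `unknown_labels` are made up for illustration. The "jpeg_path xml_path" line format and the "name index" label format are taken from the diffs above.

```python
import os
import xml.etree.ElementTree as ET


def load_label_map(text):
    """Parse 'name index' lines, as in label_map.txt, into a dict."""
    label_map = {}
    for line in text.splitlines():
        if line.strip():
            name, index = line.split()
            label_map[name] = int(index)
    return label_map


def list_lines(devkit, dataset, ids):
    """Build 'jpeg_path xml_path' lines in the format get_list.py writes."""
    base = os.path.join(devkit, dataset)
    return ["{} {}".format(os.path.join(base, "JPEGImages", i + ".jpg"),
                           os.path.join(base, "Annotations", i + ".xml"))
            for i in ids]


def unknown_labels(xml_text, label_map):
    """Return object names in one annotation that label_map does not know."""
    root = ET.fromstring(xml_text)
    names = {obj.findtext("name") for obj in root.iter("object")}
    return sorted(names - set(label_map))
```

Running `unknown_labels` over every annotation and getting only empty lists back is a cheap sanity check that no stray class name slipped in during labeling.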

Train

  cd examples/yolo
  # change related path in script train.sh
  mkdir models
  nohup ./train.sh &

Again, a few things to note:

  • Modify gnet_train.prototxt:
  diff --git a/examples/yolo/gnet_train.prototxt b/examples/yolo/gnet_train.prototxt
  index 8483a32..da01daf 100644
  --- a/examples/yolo/gnet_train.prototxt
  +++ b/examples/yolo/gnet_train.prototxt
  @@ -36,7 +36,7 @@ layer {
       mean_value: 123
     }
     data_param {
  -    source: "../../data/yolo/lmdb/test2007_lmdb"
  +    source: "../../data/yolo/lmdb/test_lmdb"
       batch_size: 1
       side: 7
       backend: LMDB
  • Modify train.sh:
  diff --git a/examples/yolo/train.sh b/examples/yolo/train.sh
  index 416e2b0..ecd0872 100755
  --- a/examples/yolo/train.sh
  +++ b/examples/yolo/train.sh
  @@ -3,8 +3,7 @@
   CAFFE_HOME=../..

   SOLVER=./gnet_solver.prototxt
  -WEIGHTS=/your/path/to/bvlc_googlenet.caffemodel
  +WEIGHTS=/home/m/workspace/caffe-yolo/models/bvlc_googlenet/bvlc_googlenet.caffemodel

   $CAFFE_HOME/build/tools/caffe train \
  -    --solver=$SOLVER --weights=$WEIGHTS --gpu=0,1
  +    --solver=$SOLVER --weights=$WEIGHTS --gpu=0
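  • Also remember to download the pretrained GoogLeNet weights in advance and put them in caffe-yolo/models/bvlc_googlenet/ (they can of course go anywhere, as long as you update the corresponding path in train.sh).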

Test

  # if everything goes well, the mAP of gnet_yolo_iter_32000.caffemodel may reach ~56.
  cd examples/yolo
  ./test.sh model_path gpu_id

(To be continued)