Train Caffe-YOLO on our own dataset

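After several days of testing, YOLO runs quite well on the TX1 and meets our lab's requirements. However, the Darknet framework that YOLO depends on is still too primitive, and it is not as comfortable to work with as TensorFlow or Caffe. In addition, the object detection part I am responsible for has to be integrated with the new framework written by Boss Mei, so porting YOLO to a mature framework is all the more necessary.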

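Fortunately, YOLO has already been ported to various frameworks by others, for example darktf and caffe-yolo. Today I take caffe-yolo as an example and describe how to train it on our own dataset.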

Reformat our dataset as PASCAL VOC style

For convenience later on, we first convert our dataset to the standard PASCAL VOC directory layout.

Structure of PASCAL VOC dataset

The directory structure is as follows:

.
├── VOC2007
│   ├── Annotations
│   ├── ImageSets
│   ├── JPEGImages
│   ├── SegmentationClass
│   └── SegmentationObject
└── VOC2012
    ├── Annotations
    ├── ImageSets
    ├── JPEGImages
    ├── SegmentationClass
    └── SegmentationObject

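The Annotations directory holds the .xml files, and the JPEGImages directory holds the corresponding .jpg images. Since we do not do semantic segmentation, SegmentationClass and SegmentationObject are of no use to us.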

The ImageSets directory is structured as follows. The files that matter are trainval.txt, train.txt, val.txt, and test.txt in the Main folder.

.
├── Layout
│   ├── test.txt
│   ├── train.txt
│   ├── trainval.txt
│   └── val.txt
├── Main
│   ├── aeroplane_test.txt
│   ├── aeroplane_train.txt
│   ├── aeroplane_trainval.txt
│   ├── aeroplane_val.txt
│   ├── ...
│   ├── test.txt
│   ├── train.txt
│   ├── trainval.txt
│   └── val.txt
└── Segmentation
    ├── test.txt
    ├── train.txt
    ├── trainval.txt
    └── val.txt
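
As a rough illustration of how these list files tie the layout together, the sketch below reads one split file from ImageSets/Main and maps each image ID to its image and annotation paths. It assumes one image ID per line, as in the standard VOC Main lists; the function name `list_pairs` is made up for this example.

```python
import os


def list_pairs(voc_dir, split):
    """Return (jpeg_path, xml_path) pairs for one split, e.g. "trainval"."""
    list_file = os.path.join(voc_dir, "ImageSets", "Main", split + ".txt")
    pairs = []
    with open(list_file) as fp:
        for line in fp:
            image_id = line.strip()
            if not image_id:
                continue  # skip blank lines
            pairs.append((
                os.path.join(voc_dir, "JPEGImages", image_id + ".jpg"),
                os.path.join(voc_dir, "Annotations", image_id + ".xml"),
            ))
    return pairs
```

Calling `list_pairs("VOCdevkit/VOC2007", "trainval")` would return the paths only; it does not check that the files exist.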

Reformat our dataset

First, rename the previously messy image files, as shown below:

.
├── image00001.jpg
├── image00002.jpg
├── image00012.jpg
├── ...
├── image04524.jpg
├── image04525.jpg
└── image04526.jpg
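
The renaming step could be done along these lines. This is only a sketch, not the script actually used: the helper names `plan_renames` and `copy_renamed` are hypothetical, and the files are copied into a separate directory so that a new name can never collide with an existing one.

```python
import os
import shutil


def plan_renames(filenames, prefix="image", digits=5):
    """Map each original file name to a sequential name such as image00001.jpg."""
    plan = {}
    for index, name in enumerate(sorted(filenames), start=1):
        ext = os.path.splitext(name)[1].lower()
        plan[name] = prefix + str(index).zfill(digits) + ext
    return plan


def copy_renamed(src_dir, dst_dir, plan):
    """Copy every file in the plan from src_dir to dst_dir under its new name."""
    os.makedirs(dst_dir, exist_ok=True)
    for old_name, new_name in plan.items():
        shutil.copy2(os.path.join(src_dir, old_name),
                     os.path.join(dst_dir, new_name))
```

The numbering follows the sorted order of the original names, so the mapping is reproducible if the script is run again on the same input.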

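Then re-annotate these images with labelImg. Once annotation is done, create the directory structure for our own dataset and put the images and annotations into the corresponding folders: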

.
├── ROB2017
│   ├── Annotations
│   ├── ImageSets
│   ├── JPEGImages
│   └── JPEGImages_original
└── scripts
    ├── clean.py
    ├── conf.json
    ├── convert_png2jpg.py
    └── split_dataset.py

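I then wrote a few scripts: clean.py removes images that have no annotation, and split_dataset.py splits the data into training, validation, and test sets and saves the lists to ImageSets/Main.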
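
The scripts themselves are not reproduced here, so the following is only a minimal sketch of what the two steps might look like. The function names, the split ratios, and the fixed random seed are all assumptions for illustration.

```python
import os
import random


def find_unlabeled(image_dir, annotation_dir):
    """Return the .jpg file names that have no matching .xml annotation."""
    labeled = {os.path.splitext(name)[0]
               for name in os.listdir(annotation_dir)
               if name.endswith(".xml")}
    return sorted(name for name in os.listdir(image_dir)
                  if name.endswith(".jpg")
                  and os.path.splitext(name)[0] not in labeled)


def split_ids(image_ids, val_ratio=0.1, test_ratio=0.1, seed=0):
    """Shuffle the IDs reproducibly and cut them into train/val/test."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)
    n_test = int(round(len(ids) * test_ratio))
    n_val = int(round(len(ids) * val_ratio))
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return {"train": train, "val": val, "trainval": train + val, "test": test}


def write_splits(splits, main_dir):
    """Write train.txt, val.txt, trainval.txt and test.txt, one ID per line."""
    os.makedirs(main_dir, exist_ok=True)
    for name, ids in splits.items():
        with open(os.path.join(main_dir, name + ".txt"), "w") as fp:
            for image_id in sorted(ids):
                fp.write(image_id + "\n")
```

The files returned by `find_unlabeled` would be deleted or moved aside first, and `write_splits(splits, "ROB2017/ImageSets/Main")` would then produce the four list files that the later steps read.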

With that, the conversion of our dataset to the standard PASCAL VOC layout is complete, and we can move on to training.

Train YOLO on Caffe

Clone & Make

$ git clone https://github.com/yeahkun/caffe-yolo.git
$ cd caffe-yolo
$ cp Makefile.config.example Makefile.config
$ make -j8

If the error src/caffe/net.cpp:8:18: fatal error: hdf5.h: No such file or directory appears, modify Makefile.config as follows:

diff --git a/Makefile.config b/Makefile.config
index a873502..88828cc 100644
--- a/Makefile.config.example
+++ b/Makefile.config.example
@@ -69,8 +69,8 @@ PYTHON_LIB := /usr/lib
# WITH_PYTHON_LAYER := 1
# Whatever else you find you need goes here.
-INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
-LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
+INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/
+LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial/
# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
# INCLUDE_DIRS += $(shell brew --prefix)/include
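
The directory that holds hdf5.h can differ between systems, so before editing INCLUDE_DIRS it may help to check where the header actually lives. The sketch below is just a convenience helper; the candidate list is an assumption based on the paths in the diff above, and `dirs_containing` is a made-up name.

```python
import os


def dirs_containing(header, candidates):
    """Return the candidate directories that contain the given header file."""
    return [d for d in candidates if os.path.isfile(os.path.join(d, header))]


# Assumed candidates; adjust for your distribution.
CANDIDATES = [
    "/usr/include",
    "/usr/local/include",
    "/usr/include/hdf5/serial",
]

if __name__ == "__main__":
    print(dirs_containing("hdf5.h", CANDIDATES))
```

Whichever directory is reported is the one to append to INCLUDE_DIRS; an empty list suggests the HDF5 development package is not installed at all.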

You can also enable cuDNN and adjust the compute capability setting to make full use of the GTX 1080:

## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!
# cuDNN acceleration switch (uncomment to build with cuDNN).
-# USE_CUDNN := 1
+USE_CUDNN := 1
# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1
...
# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
-gencode arch=compute_20,code=sm_21 \
-gencode arch=compute_30,code=sm_30 \
-gencode arch=compute_35,code=sm_35 \
-gencode arch=compute_50,code=sm_50 \
- -gencode arch=compute_50,code=compute_50
+ -gencode arch=compute_50,code=compute_50 \
+ -gencode arch=compute_61,code=compute_61
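
The GTX 1080 has compute capability 6.1, which is why the added line targets compute_61. In nvcc's -gencode syntax, code=sm_XX embeds machine code for that architecture, while code=compute_XX embeds PTX that the driver compiles at load time. The small sketch below only illustrates that naming pattern; `gencode_flags` is a hypothetical helper and not part of Caffe's build system.

```python
def gencode_flags(real_archs, ptx_arch=None):
    """Build nvcc -gencode flags.

    real_archs: architectures to embed machine code for (sm_XX).
    ptx_arch:   optional architecture to embed PTX for (compute_XX).
    """
    flags = ["-gencode arch=compute_{0},code=sm_{0}".format(arch)
             for arch in real_archs]
    if ptx_arch is not None:
        flags.append(
            "-gencode arch=compute_{0},code=compute_{0}".format(ptx_arch))
    return flags


if __name__ == "__main__":
    # Machine code for 5.0 and 6.1, plus PTX for 6.1.
    print(" \\\n".join(gencode_flags(["50", "61"], ptx_arch="61")))
```

The diff above adds only the PTX entry for 6.1; adding a code=sm_61 entry as well would presumably avoid the load-time compilation, at the cost of a slightly longer build.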

Data preparation

$ cd data/yolo
$ ln -s /your/path/to/VOCdevkit/ .
$ python ./get_list.py
# change related path in script convert.sh
$ sudo rm -r lmdb
$ mkdir lmdb
$ ./convert.sh

A few things to note:

  • Remember to replace /your/path/to/VOCdevkit/ in ln -s /your/path/to/VOCdevkit/ . with the path to your own dataset, for example ln -s ~/data/ROBdevkit/ .

  • Modify ./get_list.py:

    diff --git a/data/yolo/get_list.py b/data/yolo/get_list.py
    index f519f1a..73b9858 100755
    --- a/data/yolo/get_list.py
    +++ b/data/yolo/get_list.py
    @@ -3,12 +3,15 @@ import os
     trainval_jpeg_list = []
     trainval_xml_list = []
    -test07_jpeg_list = []
    -test07_xml_list = []
    -test12_jpeg_list = []
    -
    -for name in ["VOC2007", "VOC2012"]:
    -    voc_dir = os.path.join("VOCdevkit", name)
    +test_jpeg_list = []
    +test_xml_list = []
    +
    +for name in ['ROB2017']:
    +    # voc_dir = os.path.join("VOCdevkit", name)
    +    voc_dir = os.path.join('ROBdevkit', name)
         txt_fold = os.path.join(voc_dir, "ImageSets/Main")
         jpeg_fold = os.path.join(voc_dir, "JPEGImages")
         xml_fold = os.path.join(voc_dir, "Annotations")
    @@ -23,35 +26,49 @@ for name in ["VOC2007", "VOC2012"]:
                         print trainval_jpeg_list[-1], "not exist"
                     if not os.path.exists(trainval_xml_list[-1]):
                         print trainval_xml_list[-1], "not exist"
    -    if name == "VOC2007":
    -        file_path = os.path.join(txt_fold, "test.txt")
    -        with open(file_path, 'r') as fp:
    -            for line in fp:
    -                line = line.strip()
    -                test07_jpeg_list.append(os.path.join(jpeg_fold, "{}.jpg".format(line)))
    -                test07_xml_list.append(os.path.join(xml_fold, "{}.xml".format(line)))
    -                if not os.path.exists(test07_jpeg_list[-1]):
    -                    print test07_jpeg_list[-1], "not exist"
    -                if not os.path.exists(test07_xml_list[-1]):
    -                    print test07_xml_list[-1], "not exist"
    -    elif name == "VOC2012":
    +    if name == "ROB2017":
             file_path = os.path.join(txt_fold, "test.txt")
             with open(file_path, 'r') as fp:
                 for line in fp:
                     line = line.strip()
    -                test12_jpeg_list.append(os.path.join(jpeg_fold, "{}.jpg".format(line)))
    -                if not os.path.exists(test12_jpeg_list[-1]):
    -                    print test12_jpeg_list[-1], "not exist"
    +                test_jpeg_list.append(os.path.join(jpeg_fold, "{}.jpg".format(line)))
    +                test_xml_list.append(os.path.join(xml_fold, "{}.xml".format(line)))
    +                if not os.path.exists(test_jpeg_list[-1]):
    +                    print test_jpeg_list[-1], "not exist"
    +                if not os.path.exists(test_xml_list[-1]):
    +                    print test_xml_list[-1], "not exist"
     with open("trainval.txt", "w") as wr:
         for i in range(len(trainval_jpeg_list)):
             wr.write("{} {}\n".format(trainval_jpeg_list[i], trainval_xml_list[i]))
    -with open("test_2007.txt", "w") as wr:
    -    for i in range(len(test07_jpeg_list)):
    -        wr.write("{} {}\n".format(test07_jpeg_list[i], test07_xml_list[i]))
    -
    -with open("test_2012.txt", "w") as wr:
    -    for i in range(len(test12_jpeg_list)):
    -        wr.write("{}\n".format(test12_jpeg_list[i]))
    +with open("test.txt", "w") as wr:
    +    for i in range(len(test_jpeg_list)):
    +        wr.write("{} {}\n".format(test_jpeg_list[i], test_xml_list[i]))
  • Modify convert.sh:

    diff --git a/data/yolo/convert.sh b/data/yolo/convert.sh
    index 8a52525..a06eb69 100755
    --- a/data/yolo/convert.sh
    +++ b/data/yolo/convert.sh
    @@ -1,7 +1,7 @@
    #!/usr/bin/env sh
    CAFFE_ROOT=../..
    -ROOT_DIR=/your/path/to/vocroot/
    +ROOT_DIR=/home/m/data/
    LABEL_FILE=$CAFFE_ROOT/data/yolo/label_map.txt
    # 2007 + 2012 trainval
    @@ -10,13 +10,15 @@ LMDB_DIR=./lmdb/trainval_lmdb
    SHUFFLE=true
    # 2007 test
    -# LIST_FILE=$CAFFE_ROOT/data/yolo/test_2007.txt
    -# LMDB_DIR=./lmdb/test2007_lmdb
    +# LIST_FILE=$CAFFE_ROOT/data/yolo/test.txt
    +# LMDB_DIR=./lmdb/test_lmdb
    # SHUFFLE=false
    RESIZE_W=448
    RESIZE_H=448
    $CAFFE_ROOT/build/tools/convert_box_data --resize_width=$RESIZE_W --resize_height=$RESIZE_H \
    --label_file=$LABEL_FILE $ROOT_DIR $LIST_FILE $LMDB_DIR --encoded=true --encode_type=jpg --shuffle=$SHUFFLE
  • Modify label_map.txt:

    diff --git a/data/yolo/label_map.txt b/data/yolo/label_map.txt
    index 1fe873a..bee8f82 100644
    --- a/data/yolo/label_map.txt
    +++ b/data/yolo/label_map.txt
    @@ -1,20 +1,3 @@
    -aeroplane 0
    -bicycle 1
    -bird 2
    -boat 3
    -bottle 4
    -bus 5
    -car 6
    -cat 7
    -chair 8
    -cow 9
    -diningtable 10
    -dog 11
    -horse 12
    -motorbike 13
    -person 14
    -pottedplant 15
    -sheep 16
    -sofa 17
    -train 18
    -tvmonitor 19
    +ball 0
    +goal 1
    +robot 2
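
Before running convert.sh, it can be worth checking that every class name used in the annotations also appears in label_map.txt, since a typo made in labelImg would otherwise only surface during conversion or training. The sketch below assumes the "name id" line format shown in the diff above and the standard VOC annotation layout (an object element with a name child); all function names are made up for this example.

```python
import xml.etree.ElementTree as ET


def load_label_map(path):
    """Parse lines of the form "<name> <id>" into a dict."""
    labels = {}
    with open(path) as fp:
        for line in fp:
            parts = line.split()
            if len(parts) == 2:
                labels[parts[0]] = int(parts[1])
    return labels


def object_names(xml_path):
    """Return the class names of all objects in one VOC annotation file."""
    root = ET.parse(xml_path).getroot()
    names = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        if name:
            names.append(name.strip())
    return names


def unknown_labels(xml_paths, labels):
    """Return the sorted class names that are missing from the label map."""
    unknown = set()
    for path in xml_paths:
        unknown.update(n for n in object_names(path) if n not in labels)
    return sorted(unknown)
```

An empty result from `unknown_labels` means every annotated class is covered by label_map.txt.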

Train

cd examples/yolo
# change related path in script train.sh
mkdir models
nohup ./train.sh &

There are also a few things to note here:

  • Modify gnet_train.prototxt:

    diff --git a/examples/yolo/gnet_train.prototxt b/examples/yolo/gnet_train.prototxt
    index 8483a32..da01daf 100644
    --- a/examples/yolo/gnet_train.prototxt
    +++ b/examples/yolo/gnet_train.prototxt
    @@ -36,7 +36,7 @@ layer {
    mean_value: 123
    }
    data_param {
    - source: "../../data/yolo/lmdb/test2007_lmdb"
    + source: "../../data/yolo/lmdb/test_lmdb"
    batch_size: 1
    side: 7
    backend: LMDB
  • Modify train.sh:

    diff --git a/examples/yolo/train.sh b/examples/yolo/train.sh
    index 416e2b0..ecd0872 100755
    --- a/examples/yolo/train.sh
    +++ b/examples/yolo/train.sh
    @@ -3,8 +3,7 @@
    CAFFE_HOME=../..
    SOLVER=./gnet_solver.prototxt
    -WEIGHTS=/your/path/to/bvlc_googlenet.caffemodel
    +WEIGHTS=/home/m/workspace/caffe-yolo/models/bvlc_googlenet/bvlc_googlenet.caffemodel
    $CAFFE_HOME/build/tools/caffe train \
    - --solver=$SOLVER --weights=$WEIGHTS --gpu=0,1
    + --solver=$SOLVER --weights=$WEIGHTS --gpu=0
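  • Also remember to download the pretrained GoogLeNet weights beforehand and put them in caffe-yolo/models/bvlc_googlenet/ (you can put them anywhere, as long as you update the corresponding path in train.sh).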
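
One more thing that may be worth checking when the number of classes changes from 20 to 3: in the original YOLO formulation, the final layer predicts S × S × (B × 5 + C) values, where S is the grid side, B the number of boxes per cell, and C the number of classes. The sketch below just evaluates that formula. S = 7 matches the side: 7 in the prototxt excerpt above, while B = 2 is the default from the YOLO paper and only an assumption here.

```python
def yolo_output_size(side, num_object, num_class):
    """Number of values predicted by a YOLO v1 style output layer."""
    return side * side * (num_object * 5 + num_class)


if __name__ == "__main__":
    print(yolo_output_size(7, 2, 20))  # 20 PASCAL VOC classes
    print(yolo_output_size(7, 2, 3))   # ball, goal, robot
```

If the output layer and loss parameters in gnet_train.prototxt are sized for 20 classes, they would presumably need to be adjusted to match; the diffs above do not show this, so verify it against the file itself.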

Test

# if everything goes well, the mAP of gnet_yolo_iter_32000.caffemodel may reach ~56.
cd examples/yolo
./test.sh model_path gpu_id

(To be continued)