Surgical Scene Segmentation Using Semantic Image Synthesis with a Virtual Surgery Environment

This work was done by Jihun Yoon¹ (yjh2020@hutom.io), SeulGi Hong¹ (sghong@hutom.io), Seungbum Hong¹, Jiwon Lee¹, Soyeon Shin¹, Bokyung Park¹, Nakjun Sung¹, Hayeong Yu¹, Sungjae Kim¹, SungHyun Park², Woo Jin Hyung¹,², and Min-Kook Choi¹† (mkchoi@hutom.io), from (1) Hutom, Seoul, Republic of Korea, and (2) the Department of Surgery, Yonsei University College of Medicine.

Figure: Schematic diagram of surgical scene segmentation using semantic image synthesis with a virtual surgery environment.


Baseline Model Inference on a Gastrectomy Test Video

Abstract

Several researchers have studied image synthesis for surgical vision to reduce the cost of generating training data. However, previous works have limited applicability to real-world settings: they rely on simple simulators that include only a few organs and surgical tools, and they evaluate image quality with outdated segmentation models. Furthermore, none of them releases a complete data set to the public to enable further study. Therefore, we provide novel methods and extensive experiments for surgical scene segmentation using semantic image synthesis with a complex virtual surgery environment. In addition, we release our data set to encourage further study. First, we created three cross-validation sets of real image data for baselines while alleviating the class-imbalance problem. Second, we created a virtual surgery environment in the Unity engine with five organs modeled from real patient CT data and 22 da Vinci surgical instruments modeled from actual measurements. Third, we convert this environment into photo-realistic images with two representative semantic image synthesis models, SEAN and SPADE. Lastly, we evaluate the synthesized images with various state-of-the-art instance and semantic segmentation models and show that the synthetic training data substantially improves our segmentation models.

How is this work different from existing works?

| Research | How many classes (#) to translate? | Recognition models (#) | Real image data | Do they provide real image data with annotations? |
|---|---|---|---|---|
| [11] | Liver class only (1) | Semantic segmentation (1) | Re-annotated Cholec80 | X |
| [12] | Liver class only (1) | Semantic segmentation (1) | Re-annotated Cholec80 | X |
| [13] | Laparoscopic tools (5) | Semantic segmentation (1) | Own laparoscopic data | X |
| Our work | Laparoscopic/Robotic tools (22) + Organs (5) | Instance segmentation (3) + Semantic segmentation (2) | Own robotic gastrectomy data | O |

Data Description

Original Data

Example images

Data Type Image
Real(R)
Manual Synthetic(MS)
Domain Randomized Synthetic(DRS)

Semantic Image Synthesis Data

Example images

Data Type Original SEAN SPADE
MS
DRS

Data Statistics(Count)

| Class | R1 | MS | DRS | SIS(DRS) |
|---|---|---|---|---|
| Harmonic Ace Head | 1313 | 289 | 591 | 539 |
| Harmonic Ace Body | 1267 | 297 | 766 | 559 |
| Maryland Bipolar Forceps Head | 1454 | 297 | 580 | 593 |
| Maryland Bipolar Forceps Wrist | 1092 | 286 | 577 | 466 |
| Maryland Bipolar Forceps Body | 672 | 273 | 770 | 396 |
| Cadiere Forceps Head | 1083 | 515 | 0 | 404 |
| Cadiere Forceps Wrist | 892 | 441 | 0 | 323 |
| Cadiere Forceps Body | 850 | 407 | 0 | 303 |
| Curved Atraumatic Grasper Head | 700 | 592 | 887 | 230 |
| Curved Atraumatic Grasper Body | 787 | 591 | 1053 | 267 |
| Stapler Head | 328 | 293 | 646 | 247 |
| Stapler Body | 305 | 298 | 879 | 291 |
| Medium Large Clip Applier Head | 287 | 300 | 607 | 212 |
| Medium Large Clip Applier Wrist | 230 | 299 | 608 | 190 |
| Medium Large Clip Applier Body | 140 | 287 | 778 | 240 |
| Small Clip Applier Head | 277 | 300 | 540 | 198 |
| Small Clip Applier Wrist | 260 | 300 | 544 | 191 |
| Small Clip Applier Body | 183 | 299 | 742 | 203 |
| Suction | 286 | 298 | 779 | 238 |
| Needle | 286 | 299 | 609 | 256 |
| Endotip | 298 | 300 | 820 | 249 |
| Specimenbag | 506 | 0 | 0 | 201 |
| DrainTube | 304 | 300 | 794 | 246 |
| Liver | 2779 | 3143 | 349 | 1047 |
| Stomach | 2252 | 3299 | 355 | 821 |
| Pancreas | 1450 | 3165 | 301 | 529 |
| Spleen | 274 | 3016 | 328 | 95 |
| Gallbladder | 815 | 2159 | 246 | 300 |
| Gauze | 2701 | 0 | 0 | 1007 |
| The Other Instruments | 1435 | 0 | 0 | 523 |
| The Other Tissues | 3367 | 0 | 0 | 1236 |
| Background | 3375 | 3300 | 1228 | 1236 |

Top-8 classwise Performance(Real vs Synthetic)

Relative performance of the synthetic data set compared to the real data set. The performance is calculated by

Instance Segmentation

Semantic Segmentation

Data Download

Model Zoo

You can download models and test them.

Semantic Segmentation(link)

| Algorithm | Backbone | Data | mIoU / mAcc / aAcc |
|---|---|---|---|
| DeepLab V3+ | ResNeSt | R1 | 74.68 / 82.99 / 87.72 |
| DeepLab V3+ | ResNeSt | R1+SEAN(MS) | 75.58 (+0.9) / 83.81 (+0.82) / 88.09 (+0.37) |

Instance Segmentation(link)

| Algorithm | Backbone | Data | bboxAP / maskAP |
|---|---|---|---|
| Hybrid Task Cascade for Instance Segmentation | ResNet101-FPN | R1 | 53.9 / 55.0 |
| Hybrid Task Cascade for Instance Segmentation | ResNet101-FPN | R1+SEAN(MS) | 54.3 (+0.4) / 57.2 (+2.2) |
| Cascade Mask R-CNN | ResNet101-FPN | R1 | 51.2 / 51.0 |
| Cascade Mask R-CNN | ResNet101-FPN | R1+SPADE(MS+DRS) | 52.5 (+1.3) / 53.6 (+2.6) |

Baseline Models

Semantic Segmentation

Installation

Test

Evaluation Metrics
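The semantic segmentation results in the Model Zoo are reported as mIoU/mAcc/aAcc. As a rough illustration of what these three numbers measure, the sketch below computes them from a pixel-level confusion matrix using the commonly used definitions (per-class IoU and per-class accuracy averaged over classes, plus overall pixel accuracy). It is an assumption that the released evaluation code follows exactly these definitions; the function names, the toy labels, and the choice to skip classes that never occur are all hypothetical.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(gt: Sequence[int], pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows are ground-truth classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        matrix[g][p] += 1
    return matrix


def segmentation_metrics(matrix: List[List[int]]) -> Tuple[float, float, float]:
    """Return (mIoU, mAcc, aAcc) as fractions in [0, 1]."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[c][c] for c in range(n))
    ious, accs = [], []
    for c in range(n):
        tp = matrix[c][c]
        gt_count = sum(matrix[c])                      # pixels labeled c
        pred_count = sum(matrix[r][c] for r in range(n))  # pixels predicted c
        union = gt_count + pred_count - tp
        if union > 0:        # assumption: ignore classes absent everywhere
            ious.append(tp / union)
        if gt_count > 0:     # assumption: ignore classes absent from labels
            accs.append(tp / gt_count)
    miou = sum(ious) / len(ious) if ious else 0.0
    macc = sum(accs) / len(accs) if accs else 0.0
    aacc = correct / total if total else 0.0
    return miou, macc, aacc


# Toy example: six flattened pixels with three hypothetical class ids.
gt_pixels = [0, 0, 1, 1, 2, 2]
pred_pixels = [0, 1, 1, 1, 2, 0]
miou, macc, aacc = segmentation_metrics(
    confusion_matrix(gt_pixels, pred_pixels, num_classes=3))
print(f"mIoU={miou:.4f} mAcc={macc:.4f} aAcc={aacc:.4f}")
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so mIoU is 0.5, while mAcc and aAcc are both 2/3. The Model Zoo table reports the same quantities as percentages.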

Instance Segmentation

Installation

Test

Evaluation Metrics
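The instance segmentation results in the Model Zoo are reported as bboxAP/maskAP. Both average-precision scores are built on an intersection-over-union (IoU) test that decides whether a predicted instance matches a ground-truth instance. The sketch below shows only that IoU step for boxes and for binary masks. It assumes the usual convention of axis-aligned boxes given as (x1, y1, x2, y2) and masks given as equally sized 0/1 grids; the function names and toy inputs are hypothetical, and the full AP computation (ranking by score, sweeping IoU thresholds) is not shown.

```python
from typing import List, Tuple

Mask = List[List[int]]                   # 0/1 grid, one row per image row
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def mask_iou(a: Mask, b: Mask) -> float:
    """IoU of two binary masks with identical shape."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            if pa and pb:
                inter += 1
            if pa or pb:
                union += 1
    return inter / union if union else 0.0


def box_iou(a: Box, b: Box) -> float:
    """IoU of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Toy example: a 2x2 instance shifted one pixel to the right.
gt_mask = [[1, 1, 0],
           [1, 1, 0],
           [0, 0, 0]]
pred_mask = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
m_iou = mask_iou(gt_mask, pred_mask)
b_iou = box_iou((0, 0, 2, 2), (1, 0, 3, 2))
print(f"mask IoU={m_iou:.4f} box IoU={b_iou:.4f}")
```

Here both IoUs equal 1/3, so this prediction would not count as a match at an IoU threshold of 0.5.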