# Converting other dataset formats to a manifest file
<a name="md-converting-to-sm-format"></a>

You can use the following information to create Amazon SageMaker AI format manifest files from a variety of source dataset formats. After you create the manifest file, use it to create a dataset. For more information, see [Using a manifest file to import images](md-create-dataset-ground-truth.md).

**Topics**
+ [Transforming a COCO dataset to a manifest file format](md-transform-coco.md)
+ [Transforming multi-label SageMaker AI Ground Truth manifest files](md-gt-cl-transform.md)
+ [Creating a manifest file from a CSV file](ex-csv-manifest.md)

# Transforming a COCO dataset to a manifest file format
<a name="md-transform-coco"></a>

[COCO](http://cocodataset.org/#home) is a format for specifying large-scale object detection, segmentation, and captioning datasets. This Python [example](md-coco-transform-example.md) shows how to transform a COCO object detection format dataset into an Amazon Rekognition Custom Labels [bounding box format manifest file](md-create-manifest-file-object-detection.md). This section also includes information that you can use to write your own code.

A COCO format JSON file consists of five sections that provide information about the entire dataset. For more information, see [The COCO dataset format](md-coco-overview.md).
+ `info`: General information about the dataset.
+ `licenses`: License information for the images in the dataset.
+ [`images`](md-coco-overview.md#md-coco-images): A list of the images in the dataset.
+ [`annotations`](md-coco-overview.md#md-coco-annotations): A list of the annotations (including bounding boxes) that are present in all images in the dataset.
+ [`categories`](md-coco-overview.md#md-coco-categories): A list of label categories.

You need information from the `images`, `annotations`, and `categories` lists to create an Amazon Rekognition Custom Labels manifest file.

An Amazon Rekognition Custom Labels manifest file is in JSON Lines format. Each line contains the bounding box and label information for one or more objects in a single image. For more information, see [Object localization in manifest files](md-create-manifest-file-object-detection.md).
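
Because the manifest is JSON Lines, each line can be parsed on its own. The following minimal sketch illustrates that; the helper name `read_manifest` and the two shortened sample lines are hypothetical and are not part of the service documentation.

```python
import io
import json


def read_manifest(lines):
    """Parse JSON Lines text: one JSON object (one image) per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


# Two hypothetical, heavily shortened JSON lines standing in for a manifest file.
sample = io.StringIO(
    '{"source-ref": "s3://custom-labels-bucket/images/a.jpg"}\n'
    '{"source-ref": "s3://custom-labels-bucket/images/b.jpg"}\n'
)
entries = read_manifest(sample)
```

Each element of `entries` is the dictionary for one image, so a converter can build and write one dictionary per image.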

## Mapping COCO objects to a Custom Labels JSON line
<a name="md-mapping-coco"></a>

To transform a COCO format dataset, you map the COCO dataset to an Amazon Rekognition Custom Labels manifest file for object localization. For more information, see [Object localization in manifest files](md-create-manifest-file-object-detection.md). To build a JSON line for each image, the manifest file needs to map the COCO dataset `image`, `annotation`, and `category` object field IDs.

The following is an example COCO manifest file. For more information, see [The COCO dataset format](md-coco-overview.md).

```
{
    "info": {
        "description": "COCO 2017 Dataset","url": "http://cocodataset.org","version": "1.0","year": 2017,"contributor": "COCO Consortium","date_created": "2017/09/01"
    },
    "licenses": [
        {"url": "http://creativecommons.org/licenses/by/2.0/","id": 4,"name": "Attribution License"}
    ],
    "images": [
        {"id": 242287, "license": 4, "coco_url": "http://images.cocodataset.org/val2017/xxxxxxxxxxxx.jpg", "flickr_url": "http://farm3.staticflickr.com/2626/xxxxxxxxxxxx.jpg", "width": 426, "height": 640, "file_name": "xxxxxxxxx.jpg", "date_captured": "2013-11-15 02:41:42"},
        {"id": 245915, "license": 4, "coco_url": "http://images.cocodataset.org/val2017/nnnnnnnnnnnn.jpg", "flickr_url": "http://farm1.staticflickr.com/88/xxxxxxxxxxxx.jpg", "width": 640, "height": 480, "file_name": "nnnnnnnnnn.jpg", "date_captured": "2013-11-18 02:53:27"}
    ],
    "annotations": [
        {"id": 125686, "category_id": 0, "iscrowd": 0, "segmentation": [[164.81, 417.51,......167.55, 410.64]], "image_id": 242287, "area": 42061.80340000001, "bbox": [19.23, 383.18, 314.5, 244.46]},
        {"id": 1409619, "category_id": 0, "iscrowd": 0, "segmentation": [[376.81, 238.8,........382.74, 241.17]], "image_id": 245915, "area": 3556.2197000000015, "bbox": [399, 251, 155, 101]},
        {"id": 1410165, "category_id": 1, "iscrowd": 0, "segmentation": [[486.34, 239.01,..........495.95, 244.39]], "image_id": 245915, "area": 1775.8932499999994, "bbox": [86, 65, 220, 334]}
    ],
    "categories": [
        {"supercategory": "speaker","id": 0,"name": "echo"},
        {"supercategory": "speaker","id": 1,"name": "echo dot"}
    ]
}
```

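The following diagram shows how the COCO dataset lists for a dataset map to the Amazon Rekognition Custom Labels JSON lines for an image. Every JSON line for an image has source reference, job, and job metadata fields. Matching colors indicate information for a single image. In the manifest, an individual image can have multiple annotations and metadata/categories.

![\[Diagram showing the structure of a COCO manifest with images, annotations, and categories.\]](http://docs.aws.amazon.com/ko_kr/rekognition/latest/customlabels-dg/images/coco-transform.png)


**To get the COCO objects for a single JSON line**

1. For each image in the images list, get the annotations from the annotations list where the value of the annotation field `image_id` matches the image `id` field.

1. For each annotation matched in step 1, read through the `categories` list and get each `category` where the value of the `category` field `id` matches the `annotation` object `category_id` field.

1. Create a JSON line for the image by using the matched `image`, `annotation`, and `category` objects. To map the fields, see [Mapping COCO object fields to Custom Labels JSON line object fields](#md-mapping-fields-coco).

1. Repeat steps 1-3 until you have created a JSON line for each `image` object in the `images` list.

For example code, see [Transforming a COCO dataset](md-coco-transform-example.md).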
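
The matching steps above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `group_coco_objects` is hypothetical, and the sample data is a trimmed copy of the COCO example shown earlier that keeps just the fields needed for matching.

```python
from collections import defaultdict


def group_coco_objects(coco):
    """Yield (image, [(annotation, category), ...]) for every image in a COCO dict."""
    # Index categories by id so each lookup is a dictionary access.
    categories_by_id = {c["id"]: c for c in coco["categories"]}

    # Step 1: collect the annotations whose image_id matches each image id.
    annotations_by_image = defaultdict(list)
    for annotation in coco["annotations"]:
        annotations_by_image[annotation["image_id"]].append(annotation)

    # Steps 2-4: pair every annotation with its category, one image at a time.
    for image in coco["images"]:
        matched = [
            (a, categories_by_id[a["category_id"]])
            for a in annotations_by_image[image["id"]]
        ]
        yield image, matched


coco = {
    "images": [{"id": 242287}, {"id": 245915}],
    "annotations": [
        {"id": 125686, "category_id": 0, "image_id": 242287},
        {"id": 1409619, "category_id": 0, "image_id": 245915},
        {"id": 1410165, "category_id": 1, "image_id": 245915},
    ],
    "categories": [{"id": 0, "name": "echo"}, {"id": 1, "name": "echo dot"}],
}

grouped = {image["id"]: matched for image, matched in group_coco_objects(coco)}
```

Indexing the annotations once avoids rescanning the whole annotations list for every image, which matters for datasets with many images.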

## Mapping COCO object fields to Custom Labels JSON line object fields
<a name="md-mapping-fields-coco"></a>

After you identify the COCO objects for an Amazon Rekognition Custom Labels JSON line, you need to map the COCO object fields to the corresponding Amazon Rekognition Custom Labels JSON line object fields. The following example Amazon Rekognition Custom Labels JSON line maps one image (`id`=`000000245915`) to the preceding COCO JSON example. Note the following information.
+ `source-ref` is the location of the image in an Amazon S3 bucket. If your COCO images aren't stored in an Amazon S3 bucket, you need to move them to an Amazon S3 bucket.
+ The `annotations` list contains an `annotation` object for each object on the image. An `annotation` object includes bounding box information (`top`, `left`, `width`, `height`) and a label identifier (`class_id`).
+ The label identifier (`class_id`) maps to the `class-map` list in the metadata, which lists the labels used on the image.

```
{
	"source-ref": "s3://custom-labels-bucket/images/000000245915.jpg",
	"bounding-box": {
		"image_size": {
			"width": 640,
			"height": 480,
			"depth": 3
		},
		"annotations": [{
			"class_id": 0,
			"top": 251,
			"left": 399,
			"width": 155,
			"height": 101
		}, {
			"class_id": 1,
			"top": 65,
			"left": 86,
			"width": 220,
			"height": 334
		}]
	},
	"bounding-box-metadata": {
		"objects": [{
			"confidence": 1
		}, {
			"confidence": 1
		}],
		"class-map": {
			"0": "Echo",
			"1": "Echo Dot"
		},
		"type": "groundtruth/object-detection",
		"human-annotated": "yes",
		"creation-date": "2018-10-18T22:18:13.527256",
		"job-name": "my job"
	}
}
```

You can use the following information to map Amazon Rekognition Custom Labels manifest file fields to COCO dataset JSON fields.

### source-ref
<a name="md-source-ref-coco"></a>

The S3 format URL for the location of the image. The image must be stored in an S3 bucket. For more information, see [source-ref](md-create-manifest-file-object-detection.md#cd-manifest-source-ref). If the `coco_url` COCO field points to an S3 bucket location, you can use the value of `coco_url` for the value of `source-ref`. Alternatively, you can map `source-ref` to the `file_name` (COCO) field and, in your transform code, add the required S3 path to where the image is stored.
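
As a hedged illustration of the two options, the following sketch uses `coco_url` when it already is an S3 URL and otherwise prefixes `file_name` with an S3 path. The function name `build_source_ref` and the `s3_prefix` parameter are assumptions made for this example.

```python
def build_source_ref(image, s3_prefix):
    """Return the S3 URL to use as source-ref for a COCO image object."""
    coco_url = image.get("coco_url", "")
    # Option 1: coco_url already points at an S3 location.
    if coco_url.startswith("s3://"):
        return coco_url
    # Option 2: prepend the S3 path where the images are stored to file_name.
    return s3_prefix.rstrip("/") + "/" + image["file_name"]


image = {
    "coco_url": "http://images.cocodataset.org/val2017/nnnnnnnnnnnn.jpg",
    "file_name": "000000245915.jpg",
}
source_ref = build_source_ref(image, "s3://custom-labels-bucket/images/")
```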

### *bounding-box*
<a name="md-label-attribute-id-coco"></a>

A label attribute name of your choosing. For more information, see [*bounding-box*](md-create-manifest-file-object-detection.md#md-manifest-source-bounding-box).

#### image\_size
<a name="md-image-size-coco"></a>

The size of the image in pixels. Maps to an `image` object in the [images](md-coco-overview.md#md-coco-images) list.
+ `height` -> `image.height`
+ `width` -> `image.width`
+ `depth` -> Not used by Amazon Rekognition Custom Labels, but a value must be supplied.

#### annotations
<a name="md-annotations-coco"></a>

A list of `annotation` objects. There is one `annotation` for each object on the image.

#### annotation
<a name="md-annotation-coco"></a>

Contains bounding box information for one instance of an object on the image.
+ `class_id` -> Numeric ID that maps to the Custom Labels `class-map` list.
+ `top` -> `bbox[1]`
+ `left` -> `bbox[0]`
+ `width` -> `bbox[2]`
+ `height` -> `bbox[3]`
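
The field mapping above translates directly into code. The following is a small sketch with a hypothetical function name, `to_cl_annotation`, that applies the mapping to one COCO annotation from the earlier example.

```python
def to_cl_annotation(coco_annotation):
    """Map one COCO annotation to a Custom Labels annotation object."""
    # COCO stores the box as [left, top, width, height].
    left, top, width, height = coco_annotation["bbox"]
    return {
        "class_id": coco_annotation["category_id"],
        "top": top,
        "left": left,
        "width": width,
        "height": height,
    }


cl_annotation = to_cl_annotation(
    {"id": 1410165, "category_id": 1, "image_id": 245915, "bbox": [86, 65, 220, 334]}
)
```

The result matches the second entry of the `annotations` list in the example JSON line.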

### *bounding-box*-metadata
<a name="md-metadata-coco"></a>

Metadata for the label attribute. Includes the labels and label identifiers. For more information, see [*bounding-box*-metadata](md-create-manifest-file-object-detection.md#md-manifest-source-bounding-box-metadata).

#### Objects
<a name="cd-metadata-objects-coco"></a>

An array of objects in the image. Maps to the `annotations` list by index.

##### Object
<a name="cd-metadata-object-coco"></a>
+ `confidence` -> Not used by Amazon Rekognition Custom Labels, but a value (1) is required.

#### class-map
<a name="md-metadata-class-map-coco"></a>

A map of the labels (classes) that apply to objects detected in the image. Maps to category objects in the [categories](md-coco-overview.md#md-coco-categories) list.
+ `id` -> `category.id`
+ `id value` -> `category.name`
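
One possible way to build the `class-map` for a single image is sketched below. The function name `build_class_map` is hypothetical. The keys are written as strings because JSON object keys are always strings.

```python
def build_class_map(image_annotations, categories):
    """Map each category id used on an image to its category name."""
    names_by_id = {c["id"]: c["name"] for c in categories}
    return {
        str(a["category_id"]): names_by_id[a["category_id"]]
        for a in image_annotations
    }


categories = [
    {"supercategory": "speaker", "id": 0, "name": "echo"},
    {"supercategory": "speaker", "id": 1, "name": "echo dot"},
]
class_map = build_class_map([{"category_id": 0}, {"category_id": 1}], categories)
```

Only the categories that actually appear on the image end up in the map, which matches the description of `class-map` as the labels used on the image.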

#### type
<a name="md-type-coco"></a>

Must be `groundtruth/object-detection`.

#### human-annotated
<a name="md-human-annotated-coco"></a>

Specify `yes` or `no`. For more information, see [*bounding-box*-metadata](md-create-manifest-file-object-detection.md#md-manifest-source-bounding-box-metadata).

#### creation-date -> [image](md-coco-overview.md#md-coco-images).date\_captured
<a name="md-creation-date-coco"></a>

The date and time that the image was created. Maps to the [image](md-coco-overview.md#md-coco-images).date\_captured field of an image in the COCO images list. Amazon Rekognition Custom Labels expects the value of `creation-date` to be in the format *Y-M-DTH:M:S*.
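
The COCO example stores `date_captured` with a space between the date and the time, so it needs reformatting. The following sketch assumes that input format; the function name `to_creation_date` is hypothetical.

```python
from datetime import datetime


def to_creation_date(date_captured):
    """Convert a COCO date_captured string to the creation-date format."""
    parsed = datetime.strptime(date_captured, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%dT%H:%M:%S")


creation_date = to_creation_date("2013-11-18 02:53:27")
```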

#### job-name
<a name="md-job-name-coco"></a>

A job name of your choosing.

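# The COCO dataset format
<a name="md-coco-overview"></a>

A COCO dataset consists of five sections that provide information about the entire dataset. The format for a COCO object detection dataset is documented at [COCO Data Format](http://cocodataset.org/#format-data).
+ info: General information about the dataset.
+ licenses: License information for the images in the dataset.
+ [images](#md-coco-images): A list of the images in the dataset.
+ [annotations](#md-coco-annotations): A list of the annotations (including bounding boxes) that are present in all images in the dataset.
+ [categories](#md-coco-categories): A list of label categories.

To create a Custom Labels manifest, you use the `images`, `annotations`, and `categories` lists from the COCO manifest file. The other sections (`info`, `licenses`) aren't required. The following is an example COCO manifest file.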

```
{
    "info": {
        "description": "COCO 2017 Dataset","url": "http://cocodataset.org","version": "1.0","year": 2017,"contributor": "COCO Consortium","date_created": "2017/09/01"
    },
    "licenses": [
        {"url": "http://creativecommons.org/licenses/by/2.0/","id": 4,"name": "Attribution License"}
    ],
    "images": [
        {"id": 242287, "license": 4, "coco_url": "http://images.cocodataset.org/val2017/xxxxxxxxxxxx.jpg", "flickr_url": "http://farm3.staticflickr.com/2626/xxxxxxxxxxxx.jpg", "width": 426, "height": 640, "file_name": "xxxxxxxxx.jpg", "date_captured": "2013-11-15 02:41:42"},
        {"id": 245915, "license": 4, "coco_url": "http://images.cocodataset.org/val2017/nnnnnnnnnnnn.jpg", "flickr_url": "http://farm1.staticflickr.com/88/xxxxxxxxxxxx.jpg", "width": 640, "height": 480, "file_name": "nnnnnnnnnn.jpg", "date_captured": "2013-11-18 02:53:27"}
    ],
    "annotations": [
        {"id": 125686, "category_id": 0, "iscrowd": 0, "segmentation": [[164.81, 417.51,......167.55, 410.64]], "image_id": 242287, "area": 42061.80340000001, "bbox": [19.23, 383.18, 314.5, 244.46]},
        {"id": 1409619, "category_id": 0, "iscrowd": 0, "segmentation": [[376.81, 238.8,........382.74, 241.17]], "image_id": 245915, "area": 3556.2197000000015, "bbox": [399, 251, 155, 101]},
        {"id": 1410165, "category_id": 1, "iscrowd": 0, "segmentation": [[486.34, 239.01,..........495.95, 244.39]], "image_id": 245915, "area": 1775.8932499999994, "bbox": [86, 65, 220, 334]}
    ],
    "categories": [
        {"supercategory": "speaker","id": 0,"name": "echo"},
        {"supercategory": "speaker","id": 1,"name": "echo dot"}
    ]
}
```

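## Images list
<a name="md-coco-images"></a>

The images referenced by a COCO dataset are listed in the images array. Each image object contains information about the image, such as the image file name. In the following example image object, note the following information and which fields are required to create an Amazon Rekognition Custom Labels manifest file.
+ `id`: (Required) A unique identifier for the image. The `id` field maps to the `image_id` field in the annotations array (where bounding box information is stored).
+ `license`: (Not required) Maps to the license array.
+ `coco_url`: (Optional) The location of the image.
+ `flickr_url`: (Not required) The location of the image on Flickr.
+ `width`: (Required) The width of the image.
+ `height`: (Required) The height of the image.
+ `file_name`: (Required) The image file name. In this example, `file_name` and `id` match, but this is not a requirement for COCO datasets.
+ `date_captured`: (Required) The date and time that the image was captured.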

```
{
    "id": 245915,
    "license": 4,
    "coco_url": "http://images.cocodataset.org/val2017/nnnnnnnnnnnn.jpg",
    "flickr_url": "http://farm1.staticflickr.com/88/nnnnnnnnnnnnnnnnnnn.jpg",
    "width": 640,
    "height": 480,
    "file_name": "000000245915.jpg",
    "date_captured": "2013-11-18 02:53:27"
}
```

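## Annotations (bounding boxes) list
<a name="md-coco-annotations"></a>

Bounding box information for all objects on all images is stored in the annotations list. A single annotation object contains the bounding box information for a single object and the object's label on an image. There is one annotation object for each instance of an object on an image.

In the following example, note the following information and which fields are required to create an Amazon Rekognition Custom Labels manifest file.
+ `id`: (Not required) The identifier for the annotation.
+ `image_id`: (Required) Corresponds to the image `id` in the images array.
+ `category_id`: (Required) The identifier for the label that identifies the object within a bounding box. It maps to the `id` field of the categories array.
+ `iscrowd`: (Not required) Specifies whether the image includes a crowd of objects.
+ `segmentation`: (Not required) Segmentation information for objects on an image. Amazon Rekognition Custom Labels doesn't support segmentation.
+ `area`: (Not required) The area of the annotation.
+ `bbox`: (Required) Contains the coordinates, in pixels, of a bounding box around an object on the image, in the order left, top, width, height.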

```
{
    "id": 1409619,
    "category_id": 1,
    "iscrowd": 0,
    "segmentation": [
        [86.0, 238.8,..........382.74, 241.17]
    ],
    "image_id": 245915,
    "area": 3556.2197000000015,
    "bbox": [86, 65, 220, 334]
}
```
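
Before converting, it can help to check that each annotation carries the required fields. The following sketch is one hypothetical way to do that; `REQUIRED_ANNOTATION_FIELDS` and `annotation_problems` are names invented for this example.

```python
REQUIRED_ANNOTATION_FIELDS = ("image_id", "category_id", "bbox")


def annotation_problems(annotation):
    """Return a list of human-readable problems found in a COCO annotation."""
    problems = [
        f"missing field: {field}"
        for field in REQUIRED_ANNOTATION_FIELDS
        if field not in annotation
    ]
    # A bounding box needs exactly four values: left, top, width, height.
    if "bbox" in annotation and len(annotation["bbox"]) != 4:
        problems.append("bbox must contain four values")
    return problems


good = {"id": 1409619, "category_id": 1, "image_id": 245915, "bbox": [86, 65, 220, 334]}
bad = {"id": 1, "image_id": 245915, "bbox": [86, 65, 220]}
```

Calling `annotation_problems(good)` returns an empty list, while `annotation_problems(bad)` reports the missing `category_id` and the short `bbox`.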

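## Categories list
<a name="md-coco-categories"></a>

Label information is stored in the categories array. In the following example category object, note the following information and which fields are required to create an Amazon Rekognition Custom Labels manifest file.
+ `supercategory`: (Not required) The parent category for a label.
+ `id`: (Required) The label identifier. The `id` field maps to the `category_id` field in an `annotation` object. In the following example, the identifier for an echo dot is 2.
+ `name`: (Required) The label name.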

```
        {"supercategory": "speaker","id": 2,"name": "echo dot"}
```

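# Transforming a COCO dataset
<a name="md-coco-transform-example"></a>

Use the following Python example to transform bounding box information from a COCO format dataset into an Amazon Rekognition Custom Labels manifest file. The code uploads the created manifest file to your Amazon S3 bucket. The code also prints an AWS CLI command that you can use to upload your images.

**To transform a COCO dataset (SDK)**

1. If you haven't already done so:

   1. Make sure that you have `AmazonS3FullAccess` permissions. For more information, see [Set up SDK permissions](su-sdk-permissions.md).

   1. Install and configure the AWS CLI and the AWS SDKs. For more information, see [Step 4: Set up the AWS CLI and AWS SDKs](su-awscli-sdk.md).

1. Use the following Python code to transform a COCO dataset. Set the following values.
   + `s3_bucket`: The name of the S3 bucket in which you want to store the images and the Amazon Rekognition Custom Labels manifest file.
   + `s3_key_path_images`: The path within the S3 bucket (`s3_bucket`) where you want to place the images.
   + `s3_key_path_manifest_file`: The path within the S3 bucket (`s3_bucket`) where you want to place the Custom Labels manifest file.
   + `local_path`: The local path where the example opens the input COCO dataset and also saves the new Custom Labels manifest file.
   + `local_images_path`: The local path to the images that you want to use for training.
   + `coco_manifest`: The file name of the input COCO dataset.
   + `cl_manifest_file`: The name of the manifest file that the example creates. The file is saved at the location specified by `local_path`. By convention, the file has the extension `.manifest`, but this is not required.
   + `job_name`: A name for the Custom Labels job.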

   ```
   import json
   import datetime
   import boto3
   
   #S3 location for images
   s3_bucket = 'bucket'
   s3_key_path_manifest_file = 'path to custom labels manifest file/'
   s3_key_path_images = 'path to images/'
   s3_path='s3://' + s3_bucket  + '/' + s3_key_path_images
   s3 = boto3.resource('s3')
   
   #Local file information
   local_path='path to input COCO dataset and output Custom Labels manifest/'
   local_images_path='path to COCO images/'
   coco_manifest = 'COCO dataset JSON file name'
   coco_json_file = local_path + coco_manifest
   job_name='Custom Labels job name'
   cl_manifest_file = 'custom_labels.manifest'
   
   label_attribute ='bounding-box'
   
   open(local_path + cl_manifest_file, 'w').close()
   
   # class representing a Custom Label JSON line for an image
   class cl_json_line:  
       def __init__(self,job, img):  
   
           #Get image info. Annotations are dealt with separately
           sizes=[]
           image_size={}
           image_size["width"] = img["width"]
           image_size["depth"] = 3
           image_size["height"] = img["height"]
           sizes.append(image_size)
   
           bounding_box={}
           bounding_box["annotations"] = []
           bounding_box["image_size"] = sizes
   
           self.__dict__["source-ref"] = s3_path + img['file_name']
           self.__dict__[job] = bounding_box
   
           #get metadata
           metadata = {}
           metadata['job-name'] = job_name
           metadata['class-map'] = {}
           metadata['human-annotated']='yes'
           metadata['objects'] = [] 
           date_time_obj = datetime.datetime.strptime(img['date_captured'], '%Y-%m-%d %H:%M:%S')
           metadata['creation-date']= date_time_obj.strftime('%Y-%m-%dT%H:%M:%S') 
           metadata['type']='groundtruth/object-detection'
           
           self.__dict__[job + '-metadata'] = metadata
   
   
   print("Getting image, annotations, and categories from COCO file...")
   
   with open(coco_json_file) as f:
   
       #Get custom label compatible info    
       js = json.load(f)
       images = js['images']
       categories = js['categories']
       annotations = js['annotations']
   
       print('Images: ' + str(len(images)))
       print('annotations: ' + str(len(annotations)))
       print('categories: ' + str(len (categories)))
   
   
   print("Creating CL JSON lines...")
       
   images_dict = {image['id']: cl_json_line(label_attribute, image) for image in images}
   
   print('Parsing annotations...')
   for annotation in annotations:
   
       image=images_dict[annotation['image_id']]
   
       cl_annotation = {}
       cl_class_map={}
   
       # get bounding box information
       cl_bounding_box={}
       cl_bounding_box['left'] = annotation['bbox'][0]
       cl_bounding_box['top'] = annotation['bbox'][1]
    
       cl_bounding_box['width'] = annotation['bbox'][2]
       cl_bounding_box['height'] = annotation['bbox'][3]
       cl_bounding_box['class_id'] = annotation['category_id']
   
       getattr(image, label_attribute)['annotations'].append(cl_bounding_box)
   
   
       for category in categories:
            if annotation['category_id'] == category['id']:
               getattr(image, label_attribute + '-metadata')['class-map'][category['id']]=category['name']
           
       
       cl_object={}
       cl_object['confidence'] = int(1)  #not currently used by Custom Labels
       getattr(image, label_attribute + '-metadata')['objects'].append(cl_object)
   
   print('Done parsing annotations')
   
   # Create manifest file.
   print('Writing Custom Labels manifest...')
   
   # Open the manifest once and write one JSON line per image.
   with open(local_path + cl_manifest_file, 'w') as outfile:
       for im in images_dict.values():
           json.dump(im.__dict__, outfile)
           outfile.write('\n')
   
   # Upload manifest file to S3 bucket.
   print ('Uploading Custom Labels manifest file to S3 bucket')
   print('Uploading ' + local_path + cl_manifest_file + ' to ' + s3_key_path_manifest_file)
   print(s3_bucket)
   s3 = boto3.resource('s3')
   s3.Bucket(s3_bucket).upload_file(local_path + cl_manifest_file, s3_key_path_manifest_file + cl_manifest_file)
   
   # Print S3 URL to manifest file,
   print ('S3 URL Path to manifest file. ')
   print('\033[1m s3://' + s3_bucket + '/' + s3_key_path_manifest_file + cl_manifest_file + '\033[0m') 
   
   # Display aws s3 sync command.
   print ('\nAWS CLI s3 sync command to upload your images to S3 bucket. ')
   print ('\033[1m aws s3 sync ' + local_images_path + ' ' + s3_path + '\033[0m')
   ```

1. Run the code.

1. In the program output, note the `s3 sync` command. You need it in the next step.

1. At the command prompt, run the `s3 sync` command. Your images are uploaded to the S3 bucket. If the command fails during upload, run it again until your local images are synchronized with the S3 bucket.

1. In the program output, note the S3 URL path to the manifest file. You need it in the next step.

1. Follow the instructions at [Creating a dataset with a SageMaker AI Ground Truth manifest file (Console)](md-create-dataset-ground-truth.md#md-create-dataset-ground-truth-console) to create a dataset with the uploaded manifest file. For step 8, in **.manifest file location**, enter the Amazon S3 URL that you noted in the previous step. If you are using the AWS SDK, see [Creating a dataset with a SageMaker AI Ground Truth manifest file (SDK)](md-create-dataset-ground-truth.md#md-create-dataset-ground-truth-sdk).

# Transforming multi-label SageMaker AI Ground Truth manifest files
<a name="md-gt-cl-transform"></a>

This topic shows you how to transform a multi-label Amazon SageMaker AI Ground Truth manifest file to an Amazon Rekognition Custom Labels format manifest file.

SageMaker AI Ground Truth manifest files for multi-label jobs are formatted differently from Amazon Rekognition Custom Labels format manifest files. Multi-label classification is when an image is classified into a set of classes but can belong to several of those classes at once. In this case, the image can have multiple labels (multi-label), such as *football* and *ball*.

For information about multi-label SageMaker AI Ground Truth jobs, see [Image Classification (Multi-label)](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-image-classification-multilabel.html). For information about multi-label format Amazon Rekognition Custom Labels manifest files, see [Adding multiple image-level labels to an image](md-create-manifest-file-classification.md#md-dataset-purpose-classification-multiple-labels).

## Getting the manifest file for a SageMaker AI Ground Truth job
<a name="md-get-gt-manifest"></a>

The following procedure shows you how to get the output manifest file (`output.manifest`) for an Amazon SageMaker AI Ground Truth job. You use `output.manifest` as input to the next procedure.

**To download a SageMaker AI Ground Truth job manifest file**

1. Open [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

1. In the navigation pane, choose **Ground Truth** and then choose **Labeling Jobs**.

1. Choose the labeling job that contains the manifest file that you want to use.

1. On the details page, choose the link under **Output dataset location**. The Amazon S3 console opens at the dataset location.

1. Choose `Manifests`, `output`, and then `output.manifest`.

1. To download the manifest file, choose **Object Actions** and then choose **Download**.

## Transforming a multi-label SageMaker AI manifest file
<a name="md-transform-ml-gt"></a>

The following procedure creates a multi-label format Amazon Rekognition Custom Labels manifest file from an existing multi-label format SageMaker AI Ground Truth manifest file.

**Note**  
You need Python version 3 or later to run the code.<a name="md-procedure-multi-label-transform"></a>

**To transform a multi-label SageMaker AI manifest file**

1. Run the following Python code. Supply the name of the manifest file that you downloaded in [Getting the manifest file for a SageMaker AI Ground Truth job](#md-get-gt-manifest) as a command line argument.

   ```
   # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
   # SPDX-License-Identifier:  Apache-2.0
   """
   Purpose
   Shows how to create an Amazon Rekognition Custom Labels format
   manifest file from an Amazon SageMaker Ground Truth Image
   Classification (Multi-label) format manifest file.
   """
   import json
   import logging
   import argparse
   import os.path
   
   logger = logging.getLogger(__name__)
   
   def create_manifest_file(ground_truth_manifest_file):
       """
       Creates an Amazon Rekognition Custom Labels format manifest file from
       an Amazon SageMaker Ground Truth Image Classification (Multi-label) format
       manifest file.
       :param: ground_truth_manifest_file: The name of the Ground Truth manifest file,
       including the relative path.
       :return: The name of the new Custom Labels manifest file.
       """
   
       logger.info('Creating manifest file from %s', ground_truth_manifest_file)
       new_manifest_file = f'custom_labels_{os.path.basename(ground_truth_manifest_file)}'
   
       # Read the SageMaker Ground Truth manifest file into memory.
       with open(ground_truth_manifest_file) as gt_file:
           lines = gt_file.readlines()
   
       #Iterate through the lines one at a time to generate the
       #new lines for the Custom Labels manifest file.
       with open(new_manifest_file, 'w') as the_new_file:
           for line in lines:
               #job_name - The name of the Amazon SageMaker Ground Truth job.
               job_name = ''
               # Load in the old json item from the Ground Truth manifest file
               old_json = json.loads(line)
   
               # Get the job name
               keys = old_json.keys()
               for key in keys:
                   if 'source-ref' not in key and '-metadata' not in key:
                       job_name = key
   
               new_json = {}
               # Set the location of the image
               new_json['source-ref'] = old_json['source-ref']
   
               # Temporarily store the list of labels
               labels = old_json[job_name]
   
               # Iterate through the labels and reformat to Custom Labels format
               for index, label in enumerate(labels):
                   new_json[f'{job_name}{index}'] = index
                   metadata = {}
                   metadata['class-name'] = old_json[f'{job_name}-metadata']['class-map'][str(label)]
                   metadata['confidence'] = old_json[f'{job_name}-metadata']['confidence-map'][str(label)]
                   metadata['type'] = 'groundtruth/image-classification'
                   metadata['job-name'] = old_json[f'{job_name}-metadata']['job-name']
                   metadata['human-annotated'] = old_json[f'{job_name}-metadata']['human-annotated']
                   metadata['creation-date'] = old_json[f'{job_name}-metadata']['creation-date']
                   # Add the metadata to new json line
                   new_json[f'{job_name}{index}-metadata'] = metadata
               # Write the current line to the json file
               the_new_file.write(json.dumps(new_json))
               the_new_file.write('\n')
   
       logger.info('Created %s', new_manifest_file)
       return  new_manifest_file
   
   def add_arguments(parser):
       """
       Adds command line arguments to the parser.
       :param parser: The command line parser.
       """
   
       parser.add_argument(
           "manifest_file", help="The Amazon SageMaker Ground Truth manifest file "
           "that you want to use."
       )
   
   
   def main():
       logging.basicConfig(level=logging.INFO,
                           format="%(levelname)s: %(message)s")
       try:
           # get command line arguments
           parser = argparse.ArgumentParser(usage=argparse.SUPPRESS)
           add_arguments(parser)
           args = parser.parse_args()
           # Create the manifest file
           manifest_file = create_manifest_file(args.manifest_file)
           print(f'Manifest file created: {manifest_file}')
       except FileNotFoundError as err:
           logger.exception('File not found: %s', err)
           print(f'File not found: {err}. Check your manifest file.')
   
   if __name__ == "__main__":
       main()
   ```

1. Note the name of the new manifest file that the script displays. You use it in the next step.

1. [Upload your manifest file](https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html) to the Amazon S3 bucket that you want to use for storing the manifest file.
**Note**  
Make sure that Amazon Rekognition Custom Labels has access to the Amazon S3 bucket referenced in the `source-ref` field of the manifest file JSON lines. For more information, see [Accessing external Amazon S3 buckets](su-console-policy.md#su-external-buckets). If your Ground Truth job stores images in the Amazon Rekognition Custom Labels console bucket, you don't need to add permissions.

1. Follow the instructions at [Creating a dataset with a SageMaker AI Ground Truth manifest file (Console)](md-create-dataset-ground-truth.md#md-create-dataset-ground-truth-console) to create a dataset with the uploaded manifest file. For step 8, in **.manifest file location**, enter the Amazon S3 URL for the location of the manifest file. If you are using the AWS SDK, see [Creating a dataset with a SageMaker AI Ground Truth manifest file (SDK)](md-create-dataset-ground-truth.md#md-create-dataset-ground-truth-sdk).

# Creating a manifest file from a CSV file
<a name="ex-csv-manifest"></a>

This example Python script simplifies the creation of a manifest file by using a Comma Separated Values (CSV) file to label images. You create the CSV file. The manifest file is suitable for [Multi-label image classification](getting-started.md#gs-multi-label-image-classification-example) or [Image classification](getting-started.md#gs-image-classification-example). For more information, see [Find objects, scenes, and concepts](understanding-custom-labels.md#tm-classification).

**Note**  
This script doesn't create a manifest file suitable for finding [object locations](understanding-custom-labels.md#tm-object-localization) or [brand locations](understanding-custom-labels.md#tm-brand-detection-localization).

A manifest file describes the images used to train a model, for example, the image locations and the labels assigned to the images. A manifest file is made up of one or more JSON lines. Each JSON line describes a single image. For more information, see [Importing image-level labels in manifest files](md-create-manifest-file-classification.md).

A CSV file represents tabular data over multiple rows in a text file. Fields on a row are separated by commas. For more information, see [comma separated values](https://en.wikipedia.org/wiki/Comma-separated_values). For this script, each row in your CSV file represents a single image and maps to a JSON line in the manifest file. To create a CSV file for a manifest file that supports [Multi-label image classification](getting-started.md#gs-multi-label-image-classification-example), add one or more image-level labels to each row. To create a manifest file suitable for [Image classification](getting-started.md#gs-image-classification-example), add a single image-level label to each row.

For example, the following CSV file describes the images in the [Multi-label image classification](getting-started.md#gs-multi-label-image-classification-example) (Flowers) Getting started project.

```
camellia1.jpg,camellia,with_leaves
camellia2.jpg,camellia,with_leaves
camellia3.jpg,camellia,without_leaves
helleborus1.jpg,helleborus,without_leaves,not_fully_grown
helleborus2.jpg,helleborus,with_leaves,fully_grown
helleborus3.jpg,helleborus,with_leaves,fully_grown
jonquil1.jpg,jonquil,with_leaves
jonquil2.jpg,jonquil,with_leaves
jonquil3.jpg,jonquil,with_leaves
jonquil4.jpg,jonquil,without_leaves
mauve_honey_myrtle1.jpg,mauve_honey_myrtle,without_leaves
mauve_honey_myrtle2.jpg,mauve_honey_myrtle,with_leaves
mauve_honey_myrtle3.jpg,mauve_honey_myrtle,with_leaves
mediterranean_spurge1.jpg,mediterranean_spurge,with_leaves
mediterranean_spurge2.jpg,mediterranean_spurge,without_leaves
```

The script generates a JSON line for each row. For example, the following is the JSON line for the first row (`camellia1.jpg,camellia,with_leaves`).

```
{"source-ref": "s3://bucket/flowers/train/camellia1.jpg","camellia": 1,"camellia-metadata":{"confidence": 1,"job-name": "labeling-job/camellia","class-name": "camellia","human-annotated": "yes","creation-date": "2022-01-21T14:21:05","type": "groundtruth/image-classification"},"with_leaves": 1,"with_leaves-metadata":{"confidence": 1,"job-name": "labeling-job/with_leaves","class-name": "with_leaves","human-annotated": "yes","creation-date": "2022-01-21T14:21:05","type": "groundtruth/image-classification"}}
```
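
To make the row-to-line mapping concrete, the following sketch rebuilds the JSON line above from one CSV row. It is an illustration under stated assumptions and not necessarily how the script builds its output: the function name `csv_row_to_json_line` is hypothetical, the `labeling-job/` prefix is copied from the example line, and the creation date is passed in rather than taken from the clock so that the output is reproducible.

```python
import json


def csv_row_to_json_line(row, s3_path, creation_date):
    """Build a classification JSON line from one CSV row: image,label,label,..."""
    json_line = {"source-ref": s3_path + row[0]}
    for label in row[1:]:
        label = label.strip()
        if not label:
            continue  # Ignore empty trailing fields.
        json_line[label] = 1
        json_line[label + "-metadata"] = {
            "confidence": 1,
            "job-name": "labeling-job/" + label,
            "class-name": label,
            "human-annotated": "yes",
            "creation-date": creation_date,
            "type": "groundtruth/image-classification",
        }
    return json.dumps(json_line)


line = csv_row_to_json_line(
    ["camellia1.jpg", "camellia", "with_leaves"],
    "s3://bucket/flowers/train/",
    "2022-01-21T14:21:05",
)
```

Each label on the row becomes a pair of keys in the JSON line: the label itself and a matching `-metadata` entry.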

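The example CSV doesn't include the Amazon S3 path to the images. If your CSV file doesn't include the Amazon S3 path for the images, use the `--s3_path` command line argument to specify the Amazon S3 path to the images.

The script records the first entry for each image in a deduplicated image CSV file. The deduplicated image CSV file contains a single instance of each image found in the input CSV file. Further occurrences of an image in the input CSV file are recorded in a duplicate image CSV file. If the script finds duplicate images, review the duplicate image CSV file and update the deduplicated image CSV file as necessary. Then rerun the script with the deduplicated file. If no duplicates are found in the input CSV file, the script deletes the deduplicated image CSV file and the duplicate image CSV file, because they are empty.

In this procedure, you create the CSV file and run the Python script to create the manifest file.

**To create a manifest file from a CSV file**

1. Create a CSV file with the following fields in each row (one row per image). Don't add a header row to the CSV file.    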
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/ko_kr/rekognition/latest/customlabels-dg/ex-csv-manifest.html)

   For example, `camellia1.jpg,camellia,with_leaves` or `s3://my-bucket/flowers/train/camellia1.jpg,camellia,with_leaves` 

1. Save the CSV file.

1. Run the following Python script. Supply the following arguments.
   + `csv_file`: The CSV file that you created in step 1.
   + `manifest_file`: The name of the manifest file that you want to create.
   + (Optional) `--s3_path s3://path_to_folder/`: The Amazon S3 path to add to the image file names (field 1). Use `--s3_path` if the images in field 1 don't already contain an S3 path.

   ```
   # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
   # SPDX-License-Identifier:  Apache-2.0
   
   from datetime import datetime, timezone
   import argparse
   import logging
   import csv
   import os
   import json
   
   """
   Purpose
   Amazon Rekognition Custom Labels model example used in the service documentation.
   Shows how to create an image-level (classification) manifest file from a CSV file.
   You can specify multiple image level labels per image.
   CSV file format is
   image,label,label,..
   If necessary, use the --s3_path argument to specify the S3 bucket folder for the images.
   https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/md-gt-cl-transform.html
   """
   
   logger = logging.getLogger(__name__)
   
   
   def check_duplicates(csv_file, deduplicated_file, duplicates_file):
       """
       Checks for duplicate images in a CSV file. If duplicate images
       are found, deduplicated_file is the deduplicated CSV file - only the first
       occurrence of a duplicate is recorded. Other duplicates are recorded in duplicates_file.
       :param csv_file: The source CSV file.
       :param deduplicated_file: The deduplicated CSV file to create. If no duplicates are found
       this file is removed.
       :param duplicates_file: The duplicate images CSV file to create. If no duplicates are found
       this file is removed.
       :return: True if duplicates are found, otherwise false.
       """
   
       logger.info("Deduplicating %s", csv_file)
   
       duplicates_found = False
   
       # Find duplicates.
       with open(csv_file, 'r', newline='', encoding="UTF-8") as f,\
               open(deduplicated_file, 'w', newline='', encoding="UTF-8") as dedup,\
               open(duplicates_file, 'w', newline='', encoding="UTF-8") as duplicates:
   
           reader = csv.reader(f, delimiter=',')
           dedup_writer = csv.writer(dedup)
           duplicates_writer = csv.writer(duplicates)
   
           entries = set()
           for row in reader:
               # Skip empty lines.
               if not ''.join(row).strip():
                   continue
   
               key = row[0]
               if key not in entries:
                   dedup_writer.writerow(row)
                   entries.add(key)
               else:
                   duplicates_writer.writerow(row)
                   duplicates_found = True
   
       if duplicates_found:
           logger.info("Duplicates found. Check %s", duplicates_file)
   
       else:
           os.remove(duplicates_file)
           os.remove(deduplicated_file)
   
       return duplicates_found
   
   
   def create_manifest_file(csv_file, manifest_file, s3_path):
       """
       Reads a CSV file and creates a Custom Labels classification manifest file.
       :param csv_file: The source CSV file.
       :param manifest_file: The name of the manifest file to create.
       :param s3_path: The S3 path to the folder that contains the images.
       """
       logger.info("Processing CSV file %s", csv_file)
   
       image_count = 0
       label_count = 0
   
       with open(csv_file, newline='', encoding="UTF-8") as csvfile,\
               open(manifest_file, "w", encoding="UTF-8") as output_file:
   
           image_classifications = csv.reader(
               csvfile, delimiter=',', quotechar='|')
   
           # Process each row (image) in CSV file.
           for row in image_classifications:
               # Skip empty lines, as check_duplicates does.
               if not ''.join(row).strip():
                   continue
   
               source_ref = str(s3_path)+row[0]
   
               image_count += 1
   
               # Create JSON for image source ref.
               json_line = {}
               json_line['source-ref'] = source_ref
   
               # Process each image level label.
               for index in range(1, len(row)):
                   image_level_label = row[index]
   
                   # Skip empty columns.
                   if image_level_label == '':
                       continue
                   label_count += 1
   
                   # Create the JSON line metadata.
                   json_line[image_level_label] = 1
                   metadata = {}
                   metadata['confidence'] = 1
                   metadata['job-name'] = 'labeling-job/' + image_level_label
                   metadata['class-name'] = image_level_label
                   metadata['human-annotated'] = "yes"
                   metadata['creation-date'] = \
                       datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%S.%f')
                   metadata['type'] = "groundtruth/image-classification"
   
                   json_line[f'{image_level_label}-metadata'] = metadata
   
               # Write the image JSON Line.
               output_file.write(json.dumps(json_line))
               output_file.write('\n')
   
       logger.info("Finished creating manifest file %s\nImages: %s\nLabels: %s",
                   manifest_file, image_count, label_count)
   
       return image_count, label_count
   
   
   def add_arguments(parser):
       """
       Adds command line arguments to the parser.
       :param parser: The command line parser.
       """
   
       parser.add_argument(
           "csv_file", help="The CSV file that you want to process."
       )
   
       parser.add_argument(
           "--s3_path", help="The S3 bucket and folder path for the images."
           " If not supplied, column 1 is assumed to include the S3 path.", required=False
       )
   
   
   def main():
   
       logging.basicConfig(level=logging.INFO,
                           format="%(levelname)s: %(message)s")
   
       try:
   
           # Get command line arguments
           parser = argparse.ArgumentParser(usage=argparse.SUPPRESS)
           add_arguments(parser)
           args = parser.parse_args()
   
           s3_path = args.s3_path
           if s3_path is None:
               s3_path = ''
   
           # Create file names.
           csv_file = args.csv_file
           file_name = os.path.splitext(csv_file)[0]
           manifest_file = f'{file_name}.manifest'
           duplicates_file = f'{file_name}-duplicates.csv'
           deduplicated_file = f'{file_name}-deduplicated.csv'
   
           # Create manifest file, if there are no duplicate images.
           if check_duplicates(csv_file, deduplicated_file, duplicates_file):
               print(f"Duplicates found. Use {duplicates_file} to view duplicates "
                     f"and then update {deduplicated_file}. ")
               print(f"{deduplicated_file} contains the first occurrence of a duplicate. "
                     "Update as necessary with the correct label information.")
               print(f"Re-run the script with {deduplicated_file}")
           else:
               print("No duplicates found. Creating manifest file.")
   
               image_count, label_count = create_manifest_file(csv_file,
                                                               manifest_file,
                                                               s3_path)
   
               print(f"Finished creating manifest file: {manifest_file} \n"
                     f"Images: {image_count}\nLabels: {label_count}")
   
       except FileNotFoundError as err:
           logger.exception("File not found: %s", err)
           print(f"File not found: {err}. Check your input CSV file.")
   
   
   if __name__ == "__main__":
       main()
   ```

1. If you plan to use a test dataset, repeat steps 1 through 3 to create a manifest file for the test dataset.

1. If necessary, copy the images to the Amazon S3 bucket path that you specified in column 1 of the CSV file (or with the `--s3_path` command line argument). You can use the following AWS S3 command.

   ```
   aws s3 cp --recursive your-local-folder s3://your-target-S3-location
   ```

1. [Upload your manifest files](https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html) to the Amazon S3 bucket that you want to use for storing the manifest file.
**Note**  
Make sure that Amazon Rekognition Custom Labels can access the Amazon S3 bucket referenced in the `source-ref` field of the manifest file JSON lines. For more information, see [Accessing external Amazon S3 Buckets](su-console-policy.md#su-external-buckets). If your Ground Truth job stores images in the Amazon Rekognition Custom Labels console bucket, you don't need to add permissions.

1. Follow the instructions at [Creating a dataset with a SageMaker AI Ground Truth manifest file (Console)](md-create-dataset-ground-truth.md#md-create-dataset-ground-truth-console) to create a dataset with the uploaded manifest file. For step 8, in **.manifest file location**, enter the Amazon S3 URL for the location of the manifest file. If you are using the AWS SDK, follow [Creating a dataset with a SageMaker AI Ground Truth manifest file (SDK)](md-create-dataset-ground-truth.md#md-create-dataset-ground-truth-sdk).
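
Before you upload a manifest file created by the script above, it can help to confirm that every line parses as JSON and follows the layout that the script writes: a `source-ref` field, plus a label field and a matching `<label>-metadata` field for each image-level label. The following is a minimal sketch and is not part of the AWS example. The function name `check_manifest` is hypothetical, and the checks only mirror what the script above emits; they are not a full validation of the manifest format.

```python
import json


def check_manifest(manifest_file):
    """
    Hypothetical helper: returns a list of (line_number, problem) tuples
    for a classification manifest file. An empty list means that none of
    the checks below failed.
    """
    problems = []

    with open(manifest_file, encoding="UTF-8") as manifest:
        for line_number, line in enumerate(manifest, start=1):
            # Skip empty lines.
            if not line.strip():
                continue

            # Each line must be a standalone JSON object (JSON Lines).
            try:
                json_line = json.loads(line)
            except json.JSONDecodeError as err:
                problems.append((line_number, f"Invalid JSON: {err}"))
                continue

            if not isinstance(json_line, dict):
                problems.append((line_number, "Line is not a JSON object"))
                continue

            # Assumption: the images are stored in Amazon S3, so source-ref
            # should be an s3:// path.
            source_ref = str(json_line.get("source-ref", ""))
            if not source_ref.startswith("s3://"):
                problems.append(
                    (line_number, "source-ref is not an s3:// path"))

            # Every label field should have a <label>-metadata field.
            labels = [key for key in json_line
                      if key != "source-ref"
                      and not key.endswith("-metadata")]
            if not labels:
                problems.append((line_number, "No image-level labels"))
            for label in labels:
                if f"{label}-metadata" not in json_line:
                    problems.append(
                        (line_number, f"Missing metadata for label {label}"))

    return problems
```

Calling `check_manifest` with the path to your manifest file returns an empty list when no problems are found. A `source-ref` problem usually means that field 1 of the CSV file had no S3 path and `--s3_path` was not supplied.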