# Converting other dataset formats to a manifest file
<a name="md-converting-to-sm-format"></a>

You can use the following information to create Amazon SageMaker AI format manifest files from a variety of source dataset formats. After you create the manifest file, use it to create a dataset. For more information, see [Using a manifest file to import images](md-create-dataset-ground-truth.md).

**Topics**
+ [Transforming a COCO dataset to a manifest file format](md-transform-coco.md)
+ [Transforming multi-label SageMaker AI Ground Truth manifest files](md-gt-cl-transform.md)
+ [Creating a manifest file from a CSV file](ex-csv-manifest.md)

# Transforming a COCO dataset to a manifest file format
<a name="md-transform-coco"></a>

[COCO](http://cocodataset.org/#home) is a format for specifying large-scale object detection, segmentation, and captioning datasets. This Python [example](md-coco-transform-example.md) shows you how to transform a COCO object detection format dataset into an Amazon Rekognition Custom Labels [bounding box format manifest file](md-create-manifest-file-object-detection.md). This section also includes information that you can use to write your own code.

A COCO format JSON file consists of five sections that provide information for an *entire* dataset. For more information, see [COCO dataset format](md-coco-overview.md).
+ `info` - general information about the dataset.
+ `licenses` - license information for the images in the dataset.
+ [`images`](md-coco-overview.md#md-coco-images) - a list of images in the dataset.
+ [`annotations`](md-coco-overview.md#md-coco-annotations) - a list of annotations (including bounding boxes) that are present in all images in the dataset.
+ [`categories`](md-coco-overview.md#md-coco-categories) - a list of label categories.

You need information from the `images`, `annotations`, and `categories` lists to create an Amazon Rekognition Custom Labels manifest file.

An Amazon Rekognition Custom Labels manifest file is in JSON Lines format, where each line has the bounding box and label information for one or more objects on *an image*. For more information, see [Object localization in manifest files](md-create-manifest-file-object-detection.md).

## Mapping COCO objects to a Custom Labels JSON line
<a name="md-mapping-coco"></a>

To transform a COCO format dataset, you map the COCO dataset to an Amazon Rekognition Custom Labels manifest file for object localization. For more information, see [Object localization in manifest files](md-create-manifest-file-object-detection.md). To build a JSON line for each image, you need to match the COCO dataset `image`, `annotation`, and `category` objects by their ID fields.

The following is an example COCO manifest file. For more information, see [COCO dataset format](md-coco-overview.md).

```
{
    "info": {
        "description": "COCO 2017 Dataset","url": "http://cocodataset.org","version": "1.0","year": 2017,"contributor": "COCO Consortium","date_created": "2017/09/01"
    },
    "licenses": [
        {"url": "http://creativecommons.org/licenses/by/2.0/","id": 4,"name": "Attribution License"}
    ],
    "images": [
        {"id": 242287, "license": 4, "coco_url": "http://images.cocodataset.org/val2017/xxxxxxxxxxxx.jpg", "flickr_url": "http://farm3.staticflickr.com/2626/xxxxxxxxxxxx.jpg", "width": 426, "height": 640, "file_name": "xxxxxxxxx.jpg", "date_captured": "2013-11-15 02:41:42"},
        {"id": 245915, "license": 4, "coco_url": "http://images.cocodataset.org/val2017/nnnnnnnnnnnn.jpg", "flickr_url": "http://farm1.staticflickr.com/88/xxxxxxxxxxxx.jpg", "width": 640, "height": 480, "file_name": "nnnnnnnnnn.jpg", "date_captured": "2013-11-18 02:53:27"}
    ],
    "annotations": [
        {"id": 125686, "category_id": 0, "iscrowd": 0, "segmentation": [[164.81, 417.51,......167.55, 410.64]], "image_id": 242287, "area": 42061.80340000001, "bbox": [19.23, 383.18, 314.5, 244.46]},
        {"id": 1409619, "category_id": 0, "iscrowd": 0, "segmentation": [[376.81, 238.8,........382.74, 241.17]], "image_id": 245915, "area": 3556.2197000000015, "bbox": [399, 251, 155, 101]},
        {"id": 1410165, "category_id": 1, "iscrowd": 0, "segmentation": [[486.34, 239.01,..........495.95, 244.39]], "image_id": 245915, "area": 1775.8932499999994, "bbox": [86, 65, 220, 334]}
    ],
    "categories": [
        {"supercategory": "speaker","id": 0,"name": "echo"},
        {"supercategory": "speaker","id": 1,"name": "echo dot"}
    ]
}
```

The following diagram shows how the COCO dataset lists for a *dataset* map to Amazon Rekognition Custom Labels JSON lines for an *image*. Each JSON line for an image has source-ref, job, and job metadata fields. Matching colors indicate information for a single image. Note that in the manifest, an individual image may have multiple annotations and metadata/categories.

![\[Diagram showing the structure of a COCO manifest, with the images, annotations, and categories contained within it.\]](http://docs.aws.amazon.com/id_id/rekognition/latest/customlabels-dg/images/coco-transform.png)


**To get the COCO objects for a single JSON line**

1. For each image in the images list, get the annotations from the annotations list where the value of the annotation field `image_id` matches the image `id` field.

1. For each annotation matched in step 1, read through the `categories` list and get each `category` where the value of the `category` field `id` matches the `annotation` object `category_id` field.

1. Create a JSON line for the image using the matched `image`, `annotation`, and `category` objects. To map the fields, see [Mapping COCO object fields to Custom Labels JSON line object fields](#md-mapping-fields-coco).

1. Repeat steps 1-3 until you have created JSON lines for each `image` object in the `images` list.

For example code, see [Transforming a COCO dataset](md-coco-transform-example.md).
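
The matching procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than the full transform: the helper name `coco_objects_for_image` and the trimmed in-line dataset are assumptions made for this example, and only the `images`, `annotations`, and `categories` lists described earlier are used.

```python
def coco_objects_for_image(image, annotations, categories):
    """Return (annotation, category) pairs for one COCO image.

    Hypothetical helper: matches annotation.image_id to image.id (step 1)
    and annotation.category_id to category.id (step 2).
    """
    categories_by_id = {category["id"]: category for category in categories}
    return [
        (annotation, categories_by_id[annotation["category_id"]])
        for annotation in annotations
        if annotation["image_id"] == image["id"]
    ]


# Trimmed-down version of the example COCO dataset (illustration only).
coco = {
    "images": [{"id": 242287}, {"id": 245915}],
    "annotations": [
        {"id": 125686, "image_id": 242287, "category_id": 0},
        {"id": 1409619, "image_id": 245915, "category_id": 0},
        {"id": 1410165, "image_id": 245915, "category_id": 1},
    ],
    "categories": [{"id": 0, "name": "echo"}, {"id": 1, "name": "echo dot"}],
}

# Step 4: repeat the matching for every image in the images list.
for image in coco["images"]:
    matches = coco_objects_for_image(image, coco["annotations"], coco["categories"])
    print(image["id"], [category["name"] for _, category in matches])
```

Building the `categories_by_id` dictionary once avoids rescanning the categories list for every annotation, which may matter for large datasets.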

## Mapping COCO object fields to Custom Labels JSON line object fields
<a name="md-mapping-fields-coco"></a>

After you identify the COCO objects for an Amazon Rekognition Custom Labels JSON line, you need to map the COCO object fields to the respective Amazon Rekognition Custom Labels JSON line object fields. The following example Amazon Rekognition Custom Labels JSON line corresponds to one image (`id`=`000000245915`) in the preceding COCO JSON example. Note the following information.
+ `source-ref` is the location of the image in an Amazon S3 bucket. If your COCO images aren't stored in an Amazon S3 bucket, you need to move them to an Amazon S3 bucket.
+ The `annotations` list contains an `annotation` object for each object on the image. An `annotation` object includes bounding box information (`top`, `left`, `width`, `height`) and a label identifier (`class_id`).
+ The label identifier (`class_id`) maps to the `class-map` list in the metadata. It lists the labels used on the image.

```
{
	"source-ref": "s3://custom-labels-bucket/images/000000245915.jpg",
	"bounding-box": {
		"image_size": {
			"width": 640,
			"height": 480,
			"depth": 3
		},
		"annotations": [{
			"class_id": 0,
			"top": 251,
			"left": 399,
			"width": 155,
			"height": 101
		}, {
			"class_id": 1,
			"top": 65,
			"left": 86,
			"width": 220,
			"height": 334
		}]
	},
	"bounding-box-metadata": {
		"objects": [{
			"confidence": 1
		}, {
			"confidence": 1
		}],
		"class-map": {
			"0": "Echo",
			"1": "Echo Dot"
		},
		"type": "groundtruth/object-detection",
		"human-annotated": "yes",
		"creation-date": "2018-10-18T22:18:13.527256",
		"job-name": "my job"
	}
}
```

Use the following information to map Amazon Rekognition Custom Labels manifest file fields to COCO dataset JSON fields.

### source-ref
<a name="md-source-ref-coco"></a>

The S3 format URL for the location of the image. The image must be stored in an S3 bucket. For more information, see [source-ref](md-create-manifest-file-object-detection.md#cd-manifest-source-ref). If the `coco_url` COCO field points to an S3 bucket location, you can use the value of `coco_url` for the value of `source-ref`. Alternatively, you can map `source-ref` to the `file_name` (COCO) field and, in your transform code, add the required S3 path to where the image is stored.
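
As a small illustration of the second option, the sketch below builds a `source-ref` value from the COCO `file_name` field. The function name `build_source_ref` and the bucket and prefix values are hypothetical placeholders, not part of any AWS API.

```python
def build_source_ref(s3_bucket, s3_key_prefix, file_name):
    """Build an S3-format URL for an image from its COCO file_name.

    Hypothetical helper: strips stray slashes so that the prefix and
    the file name join cleanly.
    """
    prefix = s3_key_prefix.strip("/")
    key = f"{prefix}/{file_name}" if prefix else file_name
    return f"s3://{s3_bucket}/{key}"


# Placeholder bucket and prefix; replace them with your own values.
print(build_source_ref("custom-labels-bucket", "images/", "000000245915.jpg"))
```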

### *bounding-box*
<a name="md-label-attribute-id-coco"></a>

A label attribute name of your choosing. For more information, see [*bounding-box*](md-create-manifest-file-object-detection.md#md-manifest-source-bounding-box).

#### image_size
<a name="md-image-size-coco"></a>

The size of the image in pixels. Maps to an `image` object in the [images](md-coco-overview.md#md-coco-images) list.
+ `height` -> `image.height`
+ `width` -> `image.width`
+ `depth` -> Not used by Amazon Rekognition Custom Labels, but a value must be supplied.

#### annotations
<a name="md-annotations-coco"></a>

A list of `annotation` objects. There's one `annotation` for each object on the image.

#### annotation
<a name="md-annotation-coco"></a>

Contains bounding box information for one instance of an object on the image.
+ `class_id` -> numerical id that maps to the Custom Labels `class-map` list.
+ `top` -> `bbox[1]`
+ `left` -> `bbox[0]`
+ `width` -> `bbox[2]`
+ `height` -> `bbox[3]`
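
The field mapping for a single annotation can be sketched as follows. The function name `coco_bbox_to_annotation` is an assumption made for this example; the mapping itself follows the list above, relying on the COCO convention that `bbox` is `[x, y, width, height]` with `(x, y)` as the top-left corner.

```python
def coco_bbox_to_annotation(coco_annotation):
    """Convert one COCO annotation into a Custom Labels annotation object.

    COCO stores bbox as [x, y, width, height], so x becomes `left`
    and y becomes `top`.
    """
    left, top, width, height = coco_annotation["bbox"]
    return {
        "class_id": coco_annotation["category_id"],
        "top": top,
        "left": left,
        "width": width,
        "height": height,
    }


# Annotation values taken from the example COCO dataset.
example = {"id": 1410165, "category_id": 1, "image_id": 245915, "bbox": [86, 65, 220, 334]}
print(coco_bbox_to_annotation(example))
```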

### *bounding-box*-metadata
<a name="md-metadata-coco"></a>

Metadata for the label attribute. Includes the labels and label identifiers. For more information, see [*bounding-box*-metadata](md-create-manifest-file-object-detection.md#md-manifest-source-bounding-box-metadata).

#### objects
<a name="cd-metadata-objects-coco"></a>

An array of objects in the image. Maps to the `annotations` list by index.

##### object
<a name="cd-metadata-object-coco"></a>
+ `confidence` -> Not used by Amazon Rekognition Custom Labels, but a value (1) is required.

#### class-map
<a name="md-metadata-class-map-coco"></a>

A map of the labels (classes) that apply to objects detected in the image. Maps to category objects in the [categories](md-coco-overview.md#md-coco-categories) list.
+ `id` -> `category.id`
+ `id value` -> `category.name`
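
A minimal sketch of building the `class-map` for one image is shown below. The helper name `build_class_map` is hypothetical, and the sketch assumes you have already collected the annotations that belong to the image.

```python
def build_class_map(image_annotations, categories):
    """Build the class-map for one image from its COCO annotations.

    Only categories actually used by the image's annotations are included.
    Keys are written as strings because JSON object keys are strings.
    """
    names_by_id = {category["id"]: category["name"] for category in categories}
    return {
        str(annotation["category_id"]): names_by_id[annotation["category_id"]]
        for annotation in image_annotations
    }


# Values taken from the example COCO dataset.
categories = [
    {"supercategory": "speaker", "id": 0, "name": "echo"},
    {"supercategory": "speaker", "id": 1, "name": "echo dot"},
]
image_annotations = [
    {"image_id": 245915, "category_id": 0},
    {"image_id": 245915, "category_id": 1},
]
print(build_class_map(image_annotations, categories))
```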

#### type
<a name="md-type-coco"></a>

Must be `groundtruth/object-detection`.

#### human-annotated
<a name="md-human-annotated-coco"></a>

Specify `yes` or `no`. For more information, see [*bounding-box*-metadata](md-create-manifest-file-object-detection.md#md-manifest-source-bounding-box-metadata).

#### creation-date -> [image](md-coco-overview.md#md-coco-images).date_captured
<a name="md-creation-date-coco"></a>

The creation date and time of the image. Maps to the [image](md-coco-overview.md#md-coco-images).date_captured field of an image in the COCO images list. Amazon Rekognition Custom Labels expects the format of `creation-date` to be *Y-M-DTH:M:S*.
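
The date conversion can be sketched with the standard `datetime` module. The function name `coco_date_to_creation_date` is an assumption made for this example, and the sketch assumes `date_captured` uses the `YYYY-MM-DD HH:MM:SS` layout shown in the example COCO dataset.

```python
from datetime import datetime


def coco_date_to_creation_date(date_captured):
    """Convert a COCO date_captured string to the creation-date format."""
    parsed = datetime.strptime(date_captured, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%dT%H:%M:%S")


# Date taken from the example COCO dataset.
print(coco_date_to_creation_date("2013-11-18 02:53:27"))
```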

#### job-name
<a name="md-job-name-coco"></a>

A job name of your choosing.

# COCO dataset format
<a name="md-coco-overview"></a>

A COCO dataset consists of five sections that provide information for the entire dataset. The format for a COCO object detection dataset is documented at [COCO Data Format](http://cocodataset.org/#format-data).
+ info - general information about the dataset.
+ licenses - license information for the images in the dataset.
+ [images](#md-coco-images) - a list of images in the dataset.
+ [annotations](#md-coco-annotations) - a list of annotations (including bounding boxes) that are present in all images in the dataset.
+ [categories](#md-coco-categories) - a list of label categories.

To create a Custom Labels manifest, you use the `images`, `annotations`, and `categories` lists from the COCO manifest file. The other sections (`info`, `licenses`) aren't required. The following is an example COCO manifest file.

```
{
    "info": {
        "description": "COCO 2017 Dataset","url": "http://cocodataset.org","version": "1.0","year": 2017,"contributor": "COCO Consortium","date_created": "2017/09/01"
    },
    "licenses": [
        {"url": "http://creativecommons.org/licenses/by/2.0/","id": 4,"name": "Attribution License"}
    ],
    "images": [
        {"id": 242287, "license": 4, "coco_url": "http://images.cocodataset.org/val2017/xxxxxxxxxxxx.jpg", "flickr_url": "http://farm3.staticflickr.com/2626/xxxxxxxxxxxx.jpg", "width": 426, "height": 640, "file_name": "xxxxxxxxx.jpg", "date_captured": "2013-11-15 02:41:42"},
        {"id": 245915, "license": 4, "coco_url": "http://images.cocodataset.org/val2017/nnnnnnnnnnnn.jpg", "flickr_url": "http://farm1.staticflickr.com/88/xxxxxxxxxxxx.jpg", "width": 640, "height": 480, "file_name": "nnnnnnnnnn.jpg", "date_captured": "2013-11-18 02:53:27"}
    ],
    "annotations": [
        {"id": 125686, "category_id": 0, "iscrowd": 0, "segmentation": [[164.81, 417.51,......167.55, 410.64]], "image_id": 242287, "area": 42061.80340000001, "bbox": [19.23, 383.18, 314.5, 244.46]},
        {"id": 1409619, "category_id": 0, "iscrowd": 0, "segmentation": [[376.81, 238.8,........382.74, 241.17]], "image_id": 245915, "area": 3556.2197000000015, "bbox": [399, 251, 155, 101]},
        {"id": 1410165, "category_id": 1, "iscrowd": 0, "segmentation": [[486.34, 239.01,..........495.95, 244.39]], "image_id": 245915, "area": 1775.8932499999994, "bbox": [86, 65, 220, 334]}
    ],
    "categories": [
        {"supercategory": "speaker","id": 0,"name": "echo"},
        {"supercategory": "speaker","id": 1,"name": "echo dot"}
    ]
}
```

## images list
<a name="md-coco-images"></a>

The images referenced by a COCO dataset are listed in the images array. Each image object contains information about the image, such as the image file name. In the following example image object, note the following information and which fields are required to create an Amazon Rekognition Custom Labels manifest file.
+ `id` - (Required) A unique identifier for the image. The `id` field maps to the `image_id` field in the annotations array (where bounding box information is stored).
+ `license` - (Not required) Maps to the licenses array.
+ `coco_url` - (Optional) The location of the image.
+ `flickr_url` - (Not required) The location of the image on Flickr.
+ `width` - (Required) The width of the image.
+ `height` - (Required) The height of the image.
+ `file_name` - (Required) The image file name. In this example, `file_name` and `id` match, but this is not a requirement for COCO datasets.
+ `date_captured` - (Required) The date and time the image was captured.

```
{
    "id": 245915,
    "license": 4,
    "coco_url": "http://images.cocodataset.org/val2017/nnnnnnnnnnnn.jpg",
    "flickr_url": "http://farm1.staticflickr.com/88/nnnnnnnnnnnnnnnnnnn.jpg",
    "width": 640,
    "height": 480,
    "file_name": "000000245915.jpg",
    "date_captured": "2013-11-18 02:53:27"
}
```

## annotations (bounding boxes) list
<a name="md-coco-annotations"></a>

Bounding box information for all objects on all images is stored in the annotations list. A single annotation object contains bounding box information for a single object and the object's label on an image. There is an annotation object for each instance of an object on an image.

In the following example, note the following information and which fields are required to create an Amazon Rekognition Custom Labels manifest file.
+ `id` - (Not required) The identifier for the annotation.
+ `image_id` - (Required) Corresponds to the image `id` in the images array.
+ `category_id` - (Required) The identifier for the label that identifies the object within a bounding box. It maps to the `id` field of the categories array.
+ `iscrowd` - (Not required) Specifies whether the image contains a crowd of objects.
+ `segmentation` - (Not required) Segmentation information for objects on an image. Amazon Rekognition Custom Labels doesn't support segmentation.
+ `area` - (Not required) The area of the annotation.
+ `bbox` - (Required) Contains the coordinates, in pixels, of a bounding box around an object on the image.

```
{
    "id": 1409619,
    "category_id": 1,
    "iscrowd": 0,
    "segmentation": [
        [86.0, 238.8,..........382.74, 241.17]
    ],
    "image_id": 245915,
    "area": 3556.2197000000015,
    "bbox": [86, 65, 220, 334]
}
```

## categories list
<a name="md-coco-categories"></a>

Label information is stored in the categories array. In the following example category object, note the following information and which fields are required to create an Amazon Rekognition Custom Labels manifest file.
+ `supercategory` - (Not required) The parent category for a label.
+ `id` - (Required) The label identifier. The `id` field maps to the `category_id` field in an `annotation` object. In the following example, the identifier for an echo dot is 2.
+ `name` - (Required) The label name.

```
        {"supercategory": "speaker","id": 2,"name": "echo dot"}
```

# Transforming a COCO dataset
<a name="md-coco-transform-example"></a>

Use the following Python example to transform bounding box information from a COCO format dataset into an Amazon Rekognition Custom Labels manifest file. The code uploads the created manifest file to your Amazon S3 bucket. The code also provides an AWS CLI command that you can use to upload your images.

**To transform a COCO dataset (SDK)**

1. If you haven't already:

   1. Make sure you have `AmazonS3FullAccess` permissions. For more information, see [Set up SDK permissions](su-sdk-permissions.md).

   1. Install and configure the AWS CLI and the AWS SDKs. For more information, see [Step 4: Set up the AWS CLI and AWS SDKs](su-awscli-sdk.md).

1. Use the following Python code to transform a COCO dataset. Set the following values.
   + `s3_bucket` - The name of the S3 bucket in which you want to store the images and the Amazon Rekognition Custom Labels manifest file.
   + `s3_key_path_images` - The path to where you want to place the images within the S3 bucket (`s3_bucket`).
   + `s3_key_path_manifest_file` - The path to where you want to place the Custom Labels manifest file within the S3 bucket (`s3_bucket`).
   + `local_path` - The local path to where the example opens the input COCO dataset and also saves the new Custom Labels manifest file.
   + `local_images_path` - The local path to the images that you want to use for training.
   + `coco_manifest` - The file name of the input COCO dataset.
   + `cl_manifest_file` - A name for the manifest file created by the example. The file is saved at the location specified by `local_path`. By convention, the file has the extension `.manifest`, but this is not required.
   + `job_name` - A name for the Custom Labels job.

   ```
   import json
   import datetime
   import boto3
   
   #S3 location for images
   s3_bucket = 'bucket'
   s3_key_path_manifest_file = 'path to custom labels manifest file/'
   s3_key_path_images = 'path to images/'
   s3_path='s3://' + s3_bucket  + '/' + s3_key_path_images
   s3 = boto3.resource('s3')
   
   #Local file information
   local_path='path to input COCO dataset and output Custom Labels manifest/'
   local_images_path='path to COCO images/'
   coco_manifest = 'COCO dataset JSON file name'
   coco_json_file = local_path + coco_manifest
   job_name='Custom Labels job name'
   cl_manifest_file = 'custom_labels.manifest'
   
   label_attribute ='bounding-box'
   
   open(local_path + cl_manifest_file, 'w').close()
   
   # class representing a Custom Label JSON line for an image
   class cl_json_line:  
       def __init__(self,job, img):  
   
           #Get image info. Annotations are dealt with separately
           sizes=[]
           image_size={}
           image_size["width"] = img["width"]
           image_size["depth"] = 3
           image_size["height"] = img["height"]
           sizes.append(image_size)
   
           bounding_box={}
           bounding_box["annotations"] = []
           bounding_box["image_size"] = sizes
   
           self.__dict__["source-ref"] = s3_path + img['file_name']
           self.__dict__[job] = bounding_box
   
           #get metadata
           metadata = {}
           metadata['job-name'] = job_name
           metadata['class-map'] = {}
           metadata['human-annotated']='yes'
           metadata['objects'] = [] 
           date_time_obj = datetime.datetime.strptime(img['date_captured'], '%Y-%m-%d %H:%M:%S')
           metadata['creation-date']= date_time_obj.strftime('%Y-%m-%dT%H:%M:%S') 
           metadata['type']='groundtruth/object-detection'
           
           self.__dict__[job + '-metadata'] = metadata
   
   
   print("Getting image, annotations, and categories from COCO file...")
   
   with open(coco_json_file) as f:
   
       #Get custom label compatible info    
       js = json.load(f)
       images = js['images']
       categories = js['categories']
       annotations = js['annotations']
   
       print('Images: ' + str(len(images)))
       print('annotations: ' + str(len(annotations)))
       print('categories: ' + str(len (categories)))
   
   
   print("Creating CL JSON lines...")
       
   images_dict = {image['id']: cl_json_line(label_attribute, image) for image in images}
   
   print('Parsing annotations...')
   for annotation in annotations:
   
       image=images_dict[annotation['image_id']]
   
       # Get bounding box information. COCO bbox is [x, y, width, height],
       # where (x, y) is the top-left corner of the box.
       cl_bounding_box={}
       cl_bounding_box['left'] = annotation['bbox'][0]
       cl_bounding_box['top'] = annotation['bbox'][1]
    
       cl_bounding_box['width'] = annotation['bbox'][2]
       cl_bounding_box['height'] = annotation['bbox'][3]
       cl_bounding_box['class_id'] = annotation['category_id']
   
       getattr(image, label_attribute)['annotations'].append(cl_bounding_box)
   
   
       for category in categories:
           if annotation['category_id'] == category['id']:
               getattr(image, label_attribute + '-metadata')['class-map'][category['id']]=category['name']
           
       
       cl_object={}
       cl_object['confidence'] = int(1)  #not currently used by Custom Labels
       getattr(image, label_attribute + '-metadata')['objects'].append(cl_object)
   
   print('Done parsing annotations')
   
   # Create manifest file.
   print('Writing Custom Labels manifest...')
   
   # Open the manifest once and write one JSON line per image.
   with open(local_path+cl_manifest_file, 'w') as outfile:
       for im in images_dict.values():
           json.dump(im.__dict__,outfile)
           outfile.write('\n')
   
   # Upload manifest file to S3 bucket.
   print ('Uploading Custom Labels manifest file to S3 bucket')
   print('Uploading '  + local_path + cl_manifest_file + ' to ' + s3_key_path_manifest_file)
   print(s3_bucket)
   s3 = boto3.resource('s3')
   s3.Bucket(s3_bucket).upload_file(local_path + cl_manifest_file, s3_key_path_manifest_file + cl_manifest_file)
   
   # Print S3 URL to manifest file,
   print ('S3 URL Path to manifest file. ')
   print('\033[1m s3://' + s3_bucket + '/' + s3_key_path_manifest_file + cl_manifest_file + '\033[0m') 
   
   # Display aws s3 sync command.
   print ('\nAWS CLI s3 sync command to upload your images to S3 bucket. ')
   print ('\033[1m aws s3 sync ' + local_images_path + ' ' + s3_path + '\033[0m')
   ```

1. Run the code.

1. In the program output, note the `s3 sync` command. You need it in the next step.

1. At the command prompt, run the `s3 sync` command. Your images are uploaded to the S3 bucket. If the command fails during upload, run it again until your local images are synchronized with the S3 bucket.

1. In the program output, note the S3 URL path to the manifest file. You need it in the next step.

1. Follow the instructions at [Creating a dataset with a SageMaker AI Ground Truth manifest file (Console)](md-create-dataset-ground-truth.md#md-create-dataset-ground-truth-console) to create a dataset with the uploaded manifest file. For step 8, in **.manifest file location**, enter the Amazon S3 URL you noted in the previous step. If you are using the AWS SDK, follow [Creating a dataset with a SageMaker AI Ground Truth manifest file (SDK)](md-create-dataset-ground-truth.md#md-create-dataset-ground-truth-sdk).

# Transforming multi-label SageMaker AI Ground Truth manifest files
<a name="md-gt-cl-transform"></a>

This topic shows you how to transform a multi-label Amazon SageMaker AI Ground Truth manifest file to an Amazon Rekognition Custom Labels format manifest file.

SageMaker AI Ground Truth manifest files for multi-label jobs are formatted differently than Amazon Rekognition Custom Labels format manifest files. Multi-label classification is when an image is classified into a set of classes, but might belong to multiple classes at once. In this case, the image can potentially have multiple labels (multi-label), such as *football* and *ball*.

For information about multi-label SageMaker AI Ground Truth jobs, see [Image Classification (Multi-label)](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-image-classification-multilabel.html). For information about multi-label format Amazon Rekognition Custom Labels manifest files, see [Adding multiple image-level labels to an image](md-create-manifest-file-classification.md#md-dataset-purpose-classification-multiple-labels).

## Getting the manifest file for a SageMaker AI Ground Truth job
<a name="md-get-gt-manifest"></a>

The following procedure shows you how to get the output manifest file (`output.manifest`) for an Amazon SageMaker AI Ground Truth job. You use `output.manifest` as input to the next procedure.

**To download a SageMaker AI Ground Truth job manifest file**

1. Open [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

1. In the navigation pane, choose **Ground Truth** and then choose **Labeling Jobs**.

1. Choose the labeling job that contains the manifest file that you want to use.

1. On the details page, choose the link under **Output dataset location**. The Amazon S3 console opens at the dataset location.

1. Choose `Manifests`, `output`, and then `output.manifest`.

1. Choose **Object Actions** and then choose **Download** to download the manifest file.

## Transforming a multi-label SageMaker AI manifest file
<a name="md-transform-ml-gt"></a>

The following procedure creates a multi-label format Amazon Rekognition Custom Labels manifest file from an existing multi-label format SageMaker AI Ground Truth manifest file.

**Note**  
To run the code, you need Python version 3 or higher.<a name="md-procedure-multi-label-transform"></a>

**To transform a multi-label SageMaker AI manifest file**

1. Run the following Python code. Supply the name of the manifest file that you obtained in [Getting the manifest file for a SageMaker AI Ground Truth job](#md-get-gt-manifest) as a command line argument.

   ```
   # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
   # SPDX-License-Identifier:  Apache-2.0
   """
   Purpose
   Shows how to create an Amazon Rekognition Custom Labels format
   manifest file from an Amazon SageMaker Ground Truth Image
   Classification (Multi-label) format manifest file.
   """
   import json
   import logging
   import argparse
   import os.path
   
   logger = logging.getLogger(__name__)
   
   def create_manifest_file(ground_truth_manifest_file):
       """
       Creates an Amazon Rekognition Custom Labels format manifest file from
       an Amazon SageMaker Ground Truth Image Classification (Multi-label) format
       manifest file.
       :param: ground_truth_manifest_file: The name of the Ground Truth manifest file,
       including the relative path.
       :return: The name of the new Custom Labels manifest file.
       """
   
       logger.info('Creating manifest file from %s', ground_truth_manifest_file)
       new_manifest_file = f'custom_labels_{os.path.basename(ground_truth_manifest_file)}'
   
       # Read the SageMaker Ground Truth manifest file into memory.
       with open(ground_truth_manifest_file) as gt_file:
           lines = gt_file.readlines()
   
       #Iterate through the lines one at a time to generate the
       #new lines for the Custom Labels manifest file.
       with open(new_manifest_file, 'w') as the_new_file:
           for line in lines:
               #job_name - The name of the Amazon SageMaker Ground Truth job.
               job_name = ''
               # Load in the old json item from the Ground Truth manifest file
               old_json = json.loads(line)
   
               # Get the job name
               keys = old_json.keys()
               for key in keys:
                   if 'source-ref' not in key and '-metadata' not in key:
                       job_name = key
   
               new_json = {}
               # Set the location of the image
               new_json['source-ref'] = old_json['source-ref']
   
               # Temporarily store the list of labels
               labels = old_json[job_name]
   
               # Iterate through the labels and reformat to Custom Labels format
               for index, label in enumerate(labels):
                   new_json[f'{job_name}{index}'] = index
                   metadata = {}
                   metadata['class-name'] = old_json[f'{job_name}-metadata']['class-map'][str(label)]
                   metadata['confidence'] = old_json[f'{job_name}-metadata']['confidence-map'][str(label)]
                   metadata['type'] = 'groundtruth/image-classification'
                   metadata['job-name'] = old_json[f'{job_name}-metadata']['job-name']
                   metadata['human-annotated'] = old_json[f'{job_name}-metadata']['human-annotated']
                   metadata['creation-date'] = old_json[f'{job_name}-metadata']['creation-date']
                   # Add the metadata to new json line
                   new_json[f'{job_name}{index}-metadata'] = metadata
               # Write the current line to the json file
               the_new_file.write(json.dumps(new_json))
               the_new_file.write('\n')
   
       logger.info('Created %s', new_manifest_file)
       return  new_manifest_file
   
   def add_arguments(parser):
       """
       Adds command line arguments to the parser.
       :param parser: The command line parser.
       """
   
       parser.add_argument(
           "manifest_file", help="The Amazon SageMaker Ground Truth manifest file "
           "that you want to use."
       )
   
   
   def main():
       logging.basicConfig(level=logging.INFO,
                           format="%(levelname)s: %(message)s")
       try:
           # get command line arguments
           parser = argparse.ArgumentParser(usage=argparse.SUPPRESS)
           add_arguments(parser)
           args = parser.parse_args()
           # Create the manifest file
           manifest_file = create_manifest_file(args.manifest_file)
           print(f'Manifest file created: {manifest_file}')
       except FileNotFoundError as err:
           logger.exception('File not found: %s', err)
           print(f'File not found: {err}. Check your manifest file.')
   
   if __name__ == "__main__":
       main()
   ```

1. Note the name of the new manifest file that the script displays. You use it in the next step.

1. [Upload your manifest file](https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html) to the Amazon S3 bucket that you want to use for storing the manifest file.
**Note**  
Make sure Amazon Rekognition Custom Labels has access to the Amazon S3 bucket referenced in the `source-ref` field of the manifest file JSON lines. For more information, see [Accessing external Amazon S3 Buckets](su-console-policy.md#su-external-buckets). If your Ground Truth job stores images in the Amazon Rekognition Custom Labels console bucket, you don't need to add permissions.

1. Follow the instructions at [Creating a dataset with a SageMaker AI Ground Truth manifest file (Console)](md-create-dataset-ground-truth.md#md-create-dataset-ground-truth-console) to create a dataset with the uploaded manifest file. For step 8, in **.manifest file location**, enter the Amazon S3 URL for the location of the manifest file. If you are using the AWS SDK, follow [Creating a dataset with a SageMaker AI Ground Truth manifest file (SDK)](md-create-dataset-ground-truth.md#md-create-dataset-ground-truth-sdk).
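
To see what the script in step 1 does to a single line, the following sketch applies the same reshaping to one made-up Ground Truth JSON line. The sample values are assumptions for illustration only: the label attribute name `my-job`, the labels, the confidence values, and the S3 path are hypothetical placeholders. The field names mirror the ones the script reads (`class-map`, `confidence-map`, `job-name`, `human-annotated`, and `creation-date`).

```python
import json

# Made-up Ground Truth multi-label line; all values are placeholders.
gt_line = json.dumps({
    "source-ref": "s3://amzn-s3-demo-bucket/images/image1.jpg",
    "my-job": [0, 2],
    "my-job-metadata": {
        "job-name": "labeling-job/my-job",
        "class-map": {"0": "football", "2": "ball"},
        "confidence-map": {"0": 0.9, "2": 0.8},
        "human-annotated": "yes",
        "creation-date": "2021-01-01T00:00:00.000000",
    },
})


def transform_line(line, label_attribute):
    """Reshape one Ground Truth line the same way the script does."""
    old_json = json.loads(line)
    old_metadata = old_json[f"{label_attribute}-metadata"]
    new_json = {"source-ref": old_json["source-ref"]}
    # Each label becomes its own attribute plus a matching -metadata entry.
    for index, label in enumerate(old_json[label_attribute]):
        new_json[f"{label_attribute}{index}"] = index
        new_json[f"{label_attribute}{index}-metadata"] = {
            "class-name": old_metadata["class-map"][str(label)],
            "confidence": old_metadata["confidence-map"][str(label)],
            "type": "groundtruth/image-classification",
            "job-name": old_metadata["job-name"],
            "human-annotated": old_metadata["human-annotated"],
            "creation-date": old_metadata["creation-date"],
        }
    return new_json


print(json.dumps(transform_line(gt_line, "my-job"), indent=2))
```

The single list of labels in the input is split into one attribute per label (`my-job0`, `my-job1`), each with its own metadata entry.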

# Creating a manifest file from a CSV file
<a name="ex-csv-manifest"></a>

This example Python script simplifies the creation of a manifest file by using a Comma Separated Values (CSV) file to label images. You create the CSV file. The manifest file is suitable for [Multi-label image classification](getting-started.md#gs-multi-label-image-classification-example) or [Image classification](getting-started.md#gs-image-classification-example). For more information, see [Find objects, scenes, and concepts](understanding-custom-labels.md#tm-classification).

**Note**  
This script doesn't create a manifest file suitable for finding [object locations](understanding-custom-labels.md#tm-object-localization) or for finding [brand locations](understanding-custom-labels.md#tm-brand-detection-localization).

A manifest file describes the images used to train a model, for example, the image locations and the labels assigned to the images. A manifest file is made up of one or more JSON Lines. Each JSON Line describes a single image. For more information, see [Importing image-level labels in manifest files](md-create-manifest-file-classification.md).

A CSV file represents tabular data over multiple rows in a text file. Fields on a row are separated by commas. For more information, see [comma separated values](https://en.wikipedia.org/wiki/Comma-separated_values). For this script, each row in your CSV file represents a single image and maps to a JSON Line in the manifest file. To create a CSV file for a manifest file that supports [Multi-label image classification](getting-started.md#gs-multi-label-image-classification-example), you add one or more image-level labels to each row. To create a manifest file suitable for [Image classification](getting-started.md#gs-image-classification-example), you add a single image-level label to each row.

For example, the following CSV file describes the images in the [Multi-label image classification](getting-started.md#gs-multi-label-image-classification-example) (Flowers) *Getting started* project.

```
camellia1.jpg,camellia,with_leaves
camellia2.jpg,camellia,with_leaves
camellia3.jpg,camellia,without_leaves
helleborus1.jpg,helleborus,without_leaves,not_fully_grown
helleborus2.jpg,helleborus,with_leaves,fully_grown
helleborus3.jpg,helleborus,with_leaves,fully_grown
jonquil1.jpg,jonquil,with_leaves
jonquil2.jpg,jonquil,with_leaves
jonquil3.jpg,jonquil,with_leaves
jonquil4.jpg,jonquil,without_leaves
mauve_honey_myrtle1.jpg,mauve_honey_myrtle,without_leaves
mauve_honey_myrtle2.jpg,mauve_honey_myrtle,with_leaves
mauve_honey_myrtle3.jpg,mauve_honey_myrtle,with_leaves
mediterranean_spurge1.jpg,mediterranean_spurge,with_leaves
mediterranean_spurge2.jpg,mediterranean_spurge,without_leaves
```
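
To see whether a CSV file describes single-label or multi-label images, you can count the non-empty labels on each row. The following is a minimal sketch under that assumption. The helper name `rows_by_label_count` is hypothetical, and the sample text reuses two rows from the preceding example.

```python
import csv
import io


def rows_by_label_count(csv_text):
    """Map each image (field 1) to its number of non-empty labels.

    (Hypothetical helper for illustration only.)
    """
    counts = {}
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip empty lines.
        if not ''.join(row).strip():
            continue
        counts[row[0]] = sum(1 for label in row[1:] if label != '')
    return counts


sample = (
    "camellia1.jpg,camellia,with_leaves\n"
    "helleborus1.jpg,helleborus,without_leaves,not_fully_grown\n"
)
counts = rows_by_label_count(sample)

# A single-label (image classification) CSV has exactly one label per row.
is_single_label = all(count == 1 for count in counts.values())
print(counts, is_single_label)
```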

The script generates a JSON Line for each row. For example, the following is the JSON Line for the first row (`camellia1.jpg,camellia,with_leaves`).

```
{"source-ref": "s3://bucket/flowers/train/camellia1.jpg","camellia": 1,"camellia-metadata":{"confidence": 1,"job-name": "labeling-job/camellia","class-name": "camellia","human-annotated": "yes","creation-date": "2022-01-21T14:21:05","type": "groundtruth/image-classification"},"with_leaves": 1,"with_leaves-metadata":{"confidence": 1,"job-name": "labeling-job/with_leaves","class-name": "with_leaves","human-annotated": "yes","creation-date": "2022-01-21T14:21:05","type": "groundtruth/image-classification"}}
```
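
To check a generated JSON Line, you can read it back and list its image-level labels. The following sketch assumes that a label is any key that has a matching `<label>-metadata` key, which is the shape shown in the preceding example. The function name `labels_in_json_line` is hypothetical, and the sample line is shortened for readability.

```python
import json


def labels_in_json_line(json_line):
    """Return the image-level label names found in one manifest JSON Line.

    (Hypothetical helper for illustration only.)
    """
    entry = json.loads(json_line)
    return [
        key for key in entry
        if key != "source-ref"
        and not key.endswith("-metadata")
        and f"{key}-metadata" in entry
    ]


# Shortened version of the example JSON Line (metadata fields omitted).
line = (
    '{"source-ref": "s3://bucket/flowers/train/camellia1.jpg",'
    '"camellia": 1, "camellia-metadata": {"class-name": "camellia"},'
    '"with_leaves": 1, "with_leaves-metadata": {"class-name": "with_leaves"}}'
)
print(labels_in_json_line(line))
```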

In the example CSV, the Amazon S3 path to the image isn't present. If your CSV file doesn't include the Amazon S3 path for the images, use the `--s3_path` command line argument to specify the Amazon S3 path to the images.
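
The script in this topic builds `source-ref` by concatenating the `--s3_path` value and field 1 directly, so the path that you supply needs its trailing `/`. The following sketch shows a slightly more forgiving variation of that step. It is an assumption, not the script's behavior: the helper `build_source_ref` is hypothetical, adds a missing `/`, and leaves values that already start with `s3://` unchanged.

```python
def build_source_ref(image, s3_path=""):
    """Return the S3 URL for an image named in field 1 of the CSV file.

    (Hypothetical helper for illustration only.)
    """
    # Field 1 already contains a full S3 path, so use it as is.
    if image.startswith("s3://"):
        return image
    # Add the separator if the supplied folder path lacks one.
    if s3_path and not s3_path.endswith("/"):
        s3_path += "/"
    return s3_path + image


print(build_source_ref("camellia1.jpg", "s3://bucket/flowers/train"))
print(build_source_ref("s3://my-bucket/flowers/train/camellia1.jpg"))
```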

The script records the first entry for each image in a deduplicated image CSV file. The deduplicated image CSV file contains a single instance of each image found in the input CSV file. Further occurrences of an image in the input CSV file are recorded in a duplicate image CSV file. If the script finds duplicate images, review the duplicate image CSV file and update the deduplicated image CSV file as necessary. Then rerun the script with the deduplicated file. If no duplicates are found in the input CSV file, the script deletes the deduplicated image CSV file and the duplicate image CSV file, because they are not needed.
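
When you review duplicates, the rows that matter most are those where the same image appears with different labels. The following sketch is one possible way to find them and is separate from the script's own duplicate check. The helper name `conflicting_duplicates` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def conflicting_duplicates(csv_text):
    """Return images that appear more than once with different label sets.

    (Hypothetical helper for illustration only.)
    """
    labels_by_image = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip empty lines.
        if not ''.join(row).strip():
            continue
        labels = frozenset(label for label in row[1:] if label != '')
        labels_by_image[row[0]].append(labels)
    # Keep only images whose rows disagree about the labels.
    return {
        image: label_sets
        for image, label_sets in labels_by_image.items()
        if len(set(label_sets)) > 1
    }


sample = (
    "camellia1.jpg,camellia,with_leaves\n"
    "jonquil1.jpg,jonquil,with_leaves\n"
    "camellia1.jpg,camellia,without_leaves\n"
    "jonquil1.jpg,jonquil,with_leaves\n"
)
conflicts = conflicting_duplicates(sample)
print(sorted(conflicts))
```

In this sample, `jonquil1.jpg` is duplicated with identical labels, so only `camellia1.jpg` needs a decision about which labels are correct.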

In this procedure, you create the CSV file and run the Python script to create the manifest file.

**To create a manifest file from a CSV file**

1. Create a CSV file with the following fields in each row (one row per image). Don't add a header row to the CSV file.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/id_id/rekognition/latest/customlabels-dg/ex-csv-manifest.html)

   For example, `camellia1.jpg,camellia,with_leaves` or `s3://my-bucket/flowers/train/camellia1.jpg,camellia,with_leaves`.

1. Save the CSV file.

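1. Run the following Python script. Supply the following arguments:
   + `csv_file`: The CSV file that you created in step 1.
   + (Optional) `--s3_path s3://path_to_folder/`: The Amazon S3 path to add to the image file names (field 1). Use `--s3_path` if the images in field 1 don't already contain an S3 path. Include the trailing `/`, because the script concatenates the path and the file name directly.

   The script doesn't take a manifest file name as an argument. It names the manifest file after the CSV file, with a `.manifest` extension.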

   ```
   # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
   # SPDX-License-Identifier:  Apache-2.0
   
   from datetime import datetime, timezone
   import argparse
   import logging
   import csv
   import os
   import json
   
   """
   Purpose
   Amazon Rekognition Custom Labels model example used in the service documentation.
   Shows how to create an image-level (classification) manifest file from a CSV file.
   You can specify multiple image level labels per image.
   CSV file format is
   image,label,label,..
   If necessary, use the --s3_path argument to specify the S3 bucket folder for the images.
   https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/ex-csv-manifest.html
   """
   
   logger = logging.getLogger(__name__)
   
   
   def check_duplicates(csv_file, deduplicated_file, duplicates_file):
       """
       Checks for duplicate images in a CSV file. If duplicate images
       are found, deduplicated_file is the deduplicated CSV file - only the first
       occurrence of a duplicate is recorded. Other duplicates are recorded in duplicates_file.
       :param csv_file: The source CSV file.
       :param deduplicated_file: The deduplicated CSV file to create. If no duplicates are found
       this file is removed.
       :param duplicates_file: The duplicate images CSV file to create. If no duplicates are found
       this file is removed.
       :return: True if duplicates are found, otherwise false.
       """
   
       logger.info("Deduplicating %s", csv_file)
   
       duplicates_found = False
   
       # Find duplicates.
       with open(csv_file, 'r', newline='', encoding="UTF-8") as f,\
               open(deduplicated_file, 'w', newline='', encoding="UTF-8") as dedup,\
               open(duplicates_file, 'w', newline='', encoding="UTF-8") as duplicates:
   
           reader = csv.reader(f, delimiter=',')
           dedup_writer = csv.writer(dedup)
           duplicates_writer = csv.writer(duplicates)
   
           entries = set()
           for row in reader:
               # Skip empty lines.
               if not ''.join(row).strip():
                   continue
   
               key = row[0]
               if key not in entries:
                   dedup_writer.writerow(row)
                   entries.add(key)
               else:
                   duplicates_writer.writerow(row)
                   duplicates_found = True
   
       if duplicates_found:
           logger.info("Duplicates found check %s", duplicates_file)
   
       else:
           os.remove(duplicates_file)
           os.remove(deduplicated_file)
   
       return duplicates_found
   
   
   def create_manifest_file(csv_file, manifest_file, s3_path):
       """
       Reads a CSV file and creates a Custom Labels classification manifest file.
       :param csv_file: The source CSV file.
       :param manifest_file: The name of the manifest file to create.
       :param s3_path: The S3 path to the folder that contains the images.
       """
       logger.info("Processing CSV file %s", csv_file)
   
       image_count = 0
       label_count = 0
   
       with open(csv_file, newline='', encoding="UTF-8") as csvfile,\
               open(manifest_file, "w", encoding="UTF-8") as output_file:
   
           image_classifications = csv.reader(
               csvfile, delimiter=',', quotechar='|')
   
               # Process each row (image) in CSV file.
               for row in image_classifications:
                   # Skip empty lines, as check_duplicates does.
                   if not ''.join(row).strip():
                       continue
   
                   source_ref = str(s3_path) + row[0]
   
               image_count += 1
   
               # Create JSON for image source ref.
               json_line = {}
               json_line['source-ref'] = source_ref
   
               # Process each image level label.
               for index in range(1, len(row)):
                   image_level_label = row[index]
   
                   # Skip empty columns.
                   if image_level_label == '':
                       continue
                   label_count += 1
   
                       # Create the JSON line metadata.
                   json_line[image_level_label] = 1
                   metadata = {}
                   metadata['confidence'] = 1
                   metadata['job-name'] = 'labeling-job/' + image_level_label
                   metadata['class-name'] = image_level_label
                   metadata['human-annotated'] = "yes"
                   metadata['creation-date'] = \
                       datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%S.%f')
                   metadata['type'] = "groundtruth/image-classification"
   
                   json_line[f'{image_level_label}-metadata'] = metadata
   
               # Write the image JSON Line.
               output_file.write(json.dumps(json_line))
               output_file.write('\n')
   
       logger.info("Finished creating manifest file %s\nImages: %s\nLabels: %s",
                   manifest_file, image_count, label_count)
   
       return image_count, label_count
   
   
   def add_arguments(parser):
       """
       Adds command line arguments to the parser.
       :param parser: The command line parser.
       """
   
       parser.add_argument(
           "csv_file", help="The CSV file that you want to process."
       )
   
       parser.add_argument(
           "--s3_path", help="The S3 bucket and folder path for the images."
           " If not supplied, column 1 is assumed to include the S3 path.", required=False
       )
   
   
   def main():
   
       logging.basicConfig(level=logging.INFO,
                           format="%(levelname)s: %(message)s")
   
       try:
   
           # Get command line arguments
           parser = argparse.ArgumentParser(usage=argparse.SUPPRESS)
           add_arguments(parser)
           args = parser.parse_args()
   
           s3_path = args.s3_path
           if s3_path is None:
               s3_path = ''
   
           # Create file names.
           csv_file = args.csv_file
           file_name = os.path.splitext(csv_file)[0]
           manifest_file = f'{file_name}.manifest'
           duplicates_file = f'{file_name}-duplicates.csv'
           deduplicated_file = f'{file_name}-deduplicated.csv'
   
           # Create manifest file, if there are no duplicate images.
           if check_duplicates(csv_file, deduplicated_file, duplicates_file):
               print(f"Duplicates found. Use {duplicates_file} to view duplicates "
                     f"and then update {deduplicated_file}. ")
               print(f"{deduplicated_file} contains the first occurrence of a duplicate. "
                     "Update as necessary with the correct label information.")
               print(f"Re-run the script with {deduplicated_file}")
           else:
               print("No duplicates found. Creating manifest file.")
   
               image_count, label_count = create_manifest_file(csv_file,
                                                               manifest_file,
                                                               s3_path)
   
               print(f"Finished creating manifest file: {manifest_file} \n"
                     f"Images: {image_count}\nLabels: {label_count}")
   
       except FileNotFoundError as err:
           logger.exception("File not found: %s", err)
           print(f"File not found: {err}. Check your input CSV file.")
   
   
   if __name__ == "__main__":
       main()
   ```

1. If you plan to use a test dataset, repeat steps 1–3 to create a manifest file for your test dataset.

1. If necessary, copy the images to the Amazon S3 bucket path that you specified in column 1 of the CSV file (or specified with the `--s3_path` command line argument). You can use the following AWS S3 command.

   ```
   aws s3 cp --recursive your-local-folder s3://your-target-S3-location
   ```

1. [Upload your manifest file](https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html) to the Amazon S3 bucket that you want to use for storing the manifest file.
**Note**  
Make sure that Amazon Rekognition Custom Labels has access to the Amazon S3 bucket referenced in the `source-ref` field of the manifest file JSON Lines. For more information, see [Accessing external Amazon S3 Buckets](su-console-policy.md#su-external-buckets). If your Ground Truth job stores images in the Amazon Rekognition Custom Labels Console Bucket, you don't need to add permissions.

1. Follow the instructions at [Creating a dataset with a SageMaker AI Ground Truth manifest file (Console)](md-create-dataset-ground-truth.md#md-create-dataset-ground-truth-console) to create a dataset with the uploaded manifest file. For step 8, in **.manifest file location**, enter the Amazon S3 URL for the location of the manifest file. If you are using the AWS SDK, see [Creating a dataset with a SageMaker AI Ground Truth manifest file (SDK)](md-create-dataset-ground-truth.md#md-create-dataset-ground-truth-sdk).
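
If you create the dataset with the AWS SDK, the request needs the bucket and object key of the uploaded manifest file rather than its full S3 URL. The following sketch builds the request parameters for the Amazon Rekognition `CreateDataset` operation from an S3 URL. The helper name `build_create_dataset_request`, the project ARN, and the S3 URL are placeholders for illustration. Treat the linked SDK topic as the reference for the full call, including waiting for dataset creation to finish.

```python
from urllib.parse import urlparse


def build_create_dataset_request(project_arn, manifest_s3_url, dataset_type="TRAIN"):
    """Build CreateDataset parameters for a manifest file stored in Amazon S3.

    (Hypothetical helper for illustration only.)
    """
    parsed = urlparse(manifest_s3_url)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Not an S3 object URL: {manifest_s3_url}")
    if dataset_type not in ("TRAIN", "TEST"):
        raise ValueError("dataset_type must be TRAIN or TEST")
    return {
        "ProjectArn": project_arn,
        "DatasetType": dataset_type,
        "DatasetSource": {
            "GroundTruthManifest": {
                "S3Object": {"Bucket": parsed.netloc, "Name": key}
            }
        },
    }


# Placeholder values. Replace them with your own project ARN and manifest URL.
request = build_create_dataset_request(
    "arn:aws:rekognition:us-east-1:111122223333:project/my-project/1234567890123",
    "s3://my-bucket/manifests/flowers.manifest",
)
print(request["DatasetSource"])

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("rekognition").create_dataset(**request)
```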