


# Supported Frameworks, AWS Regions, Instance Types, and Tested Models
<a name="training-compiler-support"></a>

**Important**  
Amazon Web Services (AWS) announces that there will be no new releases or versions of SageMaker Training Compiler. You can continue to use SageMaker Training Compiler through the existing AWS Deep Learning Containers (DLCs) for SageMaker Training. Note that although the existing DLCs remain accessible, they no longer receive patches or updates from AWS, in accordance with the [AWS Deep Learning Containers Framework Support Policy](https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/support-policy.html).

Before using SageMaker Training Compiler, check that your framework of choice is supported, that the instance types are available in your AWS account, and that your AWS account is in one of the supported AWS Regions.

**Note**  
SageMaker Training Compiler is available in the SageMaker Python SDK v2.70.0 or later.
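The version requirement above can be checked programmatically before a training job is configured. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `supports_training_compiler` are hypothetical and are not part of the SageMaker Python SDK.

```python
# Minimum SageMaker Python SDK version stated in the note above.
MIN_SDK_VERSION = (2, 70, 0)


def parse_version(version: str) -> tuple:
    """Turn a string such as '2.70.0' into (2, 70, 0).

    Only the leading digits of each of the first three dot-separated
    pieces are used, so a suffix such as 'rc1' is ignored.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_training_compiler(sdk_version: str) -> bool:
    """Return True if the given SDK version is v2.70.0 or later."""
    return parse_version(sdk_version) >= MIN_SDK_VERSION
```

In an environment where the SDK is installed, the installed version string (for example, the value of `sagemaker.__version__`) can be passed to `supports_training_compiler`.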

## Supported Frameworks
<a name="training-compiler-supported-frameworks"></a>

SageMaker Training Compiler supports the following deep learning frameworks and is available through AWS Deep Learning Containers.

**Topics**
+ [PyTorch](#training-compiler-supported-frameworks-pytorch)
+ [TensorFlow](#training-compiler-supported-frameworks-tensorflow)

### PyTorch
<a name="training-compiler-supported-frameworks-pytorch"></a>



- **PyTorch**
  - **Framework version:** PyTorch v1.13.1 / **Deep Learning Container URI:** `763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-trcomp-training:1.13.1-gpu-py39-cu117-ubuntu20.04-sagemaker` / **Extendable for Docker customization:** No
  - **Framework version:** PyTorch v1.12.0 / **Deep Learning Container URI:** `763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-trcomp-training:1.12.0-gpu-py38-cu113-ubuntu20.04-sagemaker` / **Extendable for Docker customization:** No

- **PyTorch with Hugging Face Transformers**
  - **Framework version:** Transformers v4.21.1<br />PyTorch v1.11.0 / **Deep Learning Container URI:** `763104351884.dkr.ecr.<region>.amazonaws.com/huggingface-pytorch-trcomp-training:1.11.0-transformers4.21.1-gpu-py38-cu113-ubuntu20.04` / **Extendable for Docker customization:** No
  - **Framework version:** Transformers v4.17.0<br />PyTorch v1.10.2 / **Deep Learning Container URI:** `763104351884.dkr.ecr.<region>.amazonaws.com/huggingface-pytorch-trcomp-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04` / **Extendable for Docker customization:** No
  - **Framework version:** Transformers v4.11.0<br />PyTorch v1.9.0 / **Deep Learning Container URI:** `763104351884.dkr.ecr.<region>.amazonaws.com/huggingface-pytorch-training-comp:1.9.0-transformers4.11.0-gpu-py38-cu111-ubuntu20.04` / **Extendable for Docker customization:** No



### TensorFlow
<a name="training-compiler-supported-frameworks-tensorflow"></a>



- **TensorFlow**
  - **Framework version:** TensorFlow v2.11.0 / **Deep Learning Container URI:** `763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.11.0-gpu-py39-cu112-ubuntu20.04-sagemaker` / **Extendable for Docker customization:** Yes
  - **Framework version:** TensorFlow v2.10.0 / **Deep Learning Container URI:** `763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.10.0-gpu-py39-cu112-ubuntu20.04-sagemaker` / **Extendable for Docker customization:** Yes
  - **Framework version:** TensorFlow v2.9.1 / **Deep Learning Container URI:** `763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.9.1-gpu-py39-cu112-ubuntu20.04-sagemaker` / **Extendable for Docker customization:** Yes

- **TensorFlow with Hugging Face Transformers**
  - **Framework version:** Transformers v4.17.0<br />TensorFlow v2.6.3 / **Deep Learning Container URI:** `763104351884.dkr.ecr.<region>.amazonaws.com/huggingface-tensorflow-trcomp-training:2.6.3-transformers4.17.0-gpu-py38-cu112-ubuntu20.04` / **Extendable for Docker customization:** No
  - **Framework version:** Transformers v4.11.0<br />TensorFlow v2.5.1 / **Deep Learning Container URI:** `763104351884.dkr.ecr.<region>.amazonaws.com/huggingface-tensorflow-training-comp:2.5.1-transformers4.11.0-gpu-py37-cu112-ubuntu18.04` / **Extendable for Docker customization:** No



For more information, see [Available Images](https://github.com/aws/deep-learning-containers/blob/master/available_images.md) in the *AWS Deep Learning Containers GitHub repository*.
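The container URIs listed above all follow the same pattern: a registry account ID, an AWS Region, a repository name, and an image tag. The following minimal sketch assembles a URI from those parts. The function name `training_compiler_image_uri` is hypothetical, and the Region `us-west-2` in the usage note is only an example value.

```python
# Registry account ID that appears in every container URI listed above.
DLC_ACCOUNT_ID = "763104351884"


def training_compiler_image_uri(region: str, repository: str, tag: str) -> str:
    """Assemble a Deep Learning Container URI from its parts.

    The pattern mirrors the URIs in the list above:
    <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
    """
    if not region or not repository or not tag:
        raise ValueError("region, repository, and tag must all be non-empty")
    return f"{DLC_ACCOUNT_ID}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"
```

For example, calling the function with `"us-west-2"`, `"tensorflow-training"`, and `"2.11.0-gpu-py39-cu112-ubuntu20.04-sagemaker"` yields the TensorFlow v2.11.0 URI from the list with the Region filled in.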

## AWS Regions
<a name="training-compiler-availablity-zone"></a>

The [SageMaker Training Compiler Containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#sagemaker-training-compiler-containers) are available in the AWS Regions where [AWS Deep Learning Containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md) are in service, except the China Regions.

## Supported Instance Types
<a name="training-compiler-supported-instance-types"></a>

SageMaker Training Compiler is tested on and supports the following ML instance types.
+ P4 instances
+ P3 instances
+ G4dn instances
+ G5 instances

For specifications of the instance types, see the **Accelerated Computing** section on the [Amazon EC2 Instance Types page](https://aws.amazon.com/ec2/instance-types/). For information about instance pricing, see [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/).
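A quick pre-flight check against the instance families listed above might look like the following sketch. The function names are hypothetical, and matching by family prefix is an assumption made for illustration: it treats any family that starts with `p4`, `p3`, `g4dn`, or `g5` as belonging to the tested set, which may be broader than what was actually tested.

```python
# Family prefixes for the P4, P3, G4dn, and G5 instances listed above.
# Prefix matching is an assumption of this sketch, not an official rule.
SUPPORTED_FAMILY_PREFIXES = ("p4", "p3", "g4dn", "g5")


def instance_family(instance_type: str) -> str:
    """Extract the family from a type such as 'ml.p3.2xlarge' -> 'p3'."""
    name = instance_type.lower()
    if name.startswith("ml."):
        name = name[3:]
    return name.split(".", 1)[0]


def is_tested_instance_type(instance_type: str) -> bool:
    """Return True if the instance family matches one of the listed prefixes."""
    return instance_family(instance_type).startswith(SUPPORTED_FAMILY_PREFIXES)
```

A type such as `ml.c5.xlarge` returns `False`, while `ml.g5.2xlarge` and `ml.p4d.24xlarge` return `True`.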

If you encounter an error message similar to the following, follow the instructions at [Request a service quota increase for SageMaker AI resources](https://docs.aws.amazon.com/sagemaker/latest/dg/regions-quotas.html#service-limit-increase-request-procedure).

```
ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling
the CreateTrainingJob operation: The account-level service limit 'ml.p3dn.24xlarge
for training job usage' is 0 Instances, with current utilization of 0 Instances
and a request delta of 1 Instances.
Please contact AWS support to request an increase for this limit.
```
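When handling this error in a script, it can help to pull the quota name and the numbers out of the message before filing a quota increase request. The following sketch parses a message shaped like the sample above. The regular expression is derived only from that sample, so the wording it expects is an assumption and not a documented, stable format; the function name is hypothetical.

```python
import re

# Pattern derived from the sample error message above (an assumption,
# not a documented format).
_QUOTA_PATTERN = re.compile(
    r"service limit '(?P<quota>[^']+)' is (?P<limit>\d+) Instances, "
    r"with current utilization of (?P<used>\d+) Instances "
    r"and a request delta of (?P<delta>\d+) Instances"
)


def parse_resource_limit_error(message: str):
    """Return the quota name, limit, utilization, and delta, or None."""
    # Collapse line breaks and repeated spaces so wrapped messages still match.
    normalized = " ".join(message.split())
    match = _QUOTA_PATTERN.search(normalized)
    if match is None:
        return None
    return {
        "quota": match.group("quota"),
        "limit": int(match.group("limit")),
        "used": int(match.group("used")),
        "delta": int(match.group("delta")),
    }
```

The returned values show how far the request exceeds the quota: `used + delta - limit` instances.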

## Tested Models
<a name="training-compiler-tested-models"></a>

The following tables list the models that have been tested with SageMaker Training Compiler. For reference, the largest batch size that fits into memory is included alongside other training parameters. SageMaker Training Compiler can change the memory footprint of the model training process; as a result, a larger batch size can often be used during training, which further reduces total training time. In some cases, SageMaker Training Compiler intelligently promotes caching, which decreases the largest batch size that can fit on the GPU. You must retune your model hyperparameters and find an optimal batch size for your case. To save time, use the following reference tables to look up a batch size that can be a good starting point for your use case.

**Note**  
The batch sizes are local batch sizes that fit on each individual GPU in the respective instance type. You should also adjust the learning rate when changing the batch size.
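Because the listed batch sizes are per GPU, the effective (global) batch size depends on the number of GPUs and instances in the job. The following sketch shows that arithmetic, together with the linear scaling heuristic for the learning rate. The linear rule is a common starting point and an assumption of this sketch, not a recommendation taken from this page; the function names are hypothetical.

```python
def global_batch_size(local_batch_size: int, gpus_per_instance: int,
                      instance_count: int = 1) -> int:
    """Effective batch size across all GPUs in a data-parallel job."""
    return local_batch_size * gpus_per_instance * instance_count


def scaled_learning_rate(base_lr: float, base_batch_size: int,
                         new_batch_size: int) -> float:
    """Linear scaling heuristic: scale the learning rate with the batch size.

    This is only a starting point; the learning rate should still be
    retuned for the specific model and dataset.
    """
    return base_lr * new_batch_size / base_batch_size
```

For example, a local batch size of 64 on one instance with 8 GPUs gives a global batch size of 512, and raising the local batch size from 80 to 192 scales a base learning rate of 5e-5 to roughly 1.2e-4 under the linear rule.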

### PyTorch 1.13.1
<a name="training-compiler-tested-models-pt1131"></a>

**Natural language processing (NLP) models**

The following models are tested for training jobs for all combinations of single-node and multi-node with single or multiple GPU cores and Automatic Mixed Precision (AMP) as indicated.


<table>
<thead>
  <tr><th colspan="7">Single-node/multi-node single-GPU/multi-GPU</th></tr>
  <tr><th>Model</th><th>Dataset</th><th>Instance type</th><th>Precision</th><th>Sequence Length</th><th>Batch size for native frameworks</th><th>Batch size for SageMaker Training Compiler</th></tr>
</thead>
<tbody>
  <tr><td>albert-base-v2</td><td>wikitext-2-raw-v1</td><td>g4dn.16xlarge</td><td>float16</td><td>128</td><td>80</td><td>192</td></tr>
  <tr><td>albert-base-v2</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>128</td><td>128</td><td>332</td></tr>
  <tr><td>albert-base-v2</td><td>wikitext-2-raw-v1</td><td>p3.2xlarge</td><td>float16</td><td>128</td><td>80</td><td>224</td></tr>
  <tr><td>bert-base-uncased</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>128</td><td>160</td><td>288</td></tr>
  <tr><td>camembert-base</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>128</td><td>160</td><td>280</td></tr>
  <tr><td>distilbert-base-uncased</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>128</td><td>240</td><td>472</td></tr>
  <tr><td>distilgpt2</td><td>wikitext-2-raw-v1</td><td>g4dn.16xlarge</td><td>float16</td><td>128</td><td>77</td><td>128</td></tr>
  <tr><td>distilgpt2</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>128</td><td>138</td><td>390</td></tr>
  <tr><td>distilgpt2</td><td>wikitext-2-raw-v1</td><td>p3.2xlarge</td><td>float16</td><td>128</td><td>96</td><td>256</td></tr>
  <tr><td>distilroberta-base</td><td>wikitext-2-raw-v1</td><td>g4dn.16xlarge</td><td>float16</td><td>128</td><td>96</td><td>192</td></tr>
  <tr><td>distilroberta-base</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>128</td><td>171</td><td>380</td></tr>
  <tr><td>distilroberta-base</td><td>wikitext-2-raw-v1</td><td>p3.2xlarge</td><td>float16</td><td>128</td><td>112</td><td>256</td></tr>
  <tr><td>gpt2</td><td>wikitext-2-raw-v1</td><td>g4dn.16xlarge</td><td>float16</td><td>128</td><td>52</td><td>152</td></tr>
  <tr><td>gpt2</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>128</td><td>84</td><td>240</td></tr>
  <tr><td>gpt2</td><td>wikitext-2-raw-v1</td><td>p3.2xlarge</td><td>float16</td><td>128</td><td>58</td><td>164</td></tr>
  <tr><td>microsoft/deberta-base</td><td>wikitext-2-raw-v1</td><td>g4dn.16xlarge</td><td>float16</td><td>128</td><td>48</td><td>128</td></tr>
  <tr><td>microsoft/deberta-base</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>128</td><td>84</td><td>207</td></tr>
  <tr><td>microsoft/deberta-base</td><td>wikitext-2-raw-v1</td><td>p3.2xlarge</td><td>float16</td><td>128</td><td>53</td><td>133</td></tr>
  <tr><td>roberta-base</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>128</td><td>125</td><td>224</td></tr>
  <tr><td>xlm-roberta-base</td><td>wikitext-2-raw-v1</td><td>g4dn.16xlarge</td><td>float16</td><td>128</td><td>16</td><td>31</td></tr>
  <tr><td>xlm-roberta-base</td><td>wikitext-2-raw-v1</td><td>p3.2xlarge</td><td>float16</td><td>128</td><td>18</td><td>50</td></tr>
  <tr><td>xlnet-base-cased</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>128</td><td>128</td><td>240</td></tr>
  <tr><td>bert-base-uncased</td><td>wikitext-103-v1</td><td>g5.48xlarge</td><td>float16</td><td>512</td><td>29</td><td>50</td></tr>
  <tr><td>distilbert-base-uncased</td><td>wikitext-103-v1</td><td>g5.48xlarge</td><td>float16</td><td>512</td><td>45</td><td>64</td></tr>
  <tr><td>gpt2</td><td>wikitext-103-v1</td><td>g5.48xlarge</td><td>float16</td><td>512</td><td>18</td><td>45</td></tr>
  <tr><td>roberta-base</td><td>wikitext-103-v1</td><td>g5.48xlarge</td><td>float16</td><td>512</td><td>23</td><td>44</td></tr>
  <tr><td>gpt2</td><td>wikitext-103-v1</td><td>p4d.24xlarge</td><td>float16</td><td>512</td><td>36</td><td>64</td></tr>
</tbody>
</table>


**Computer Vision (CV) models**

Tested using [TensorFlow Model Garden](https://github.com/tensorflow/models) with Automatic Mixed Precision (AMP) as indicated.


<table>
<thead>
  <tr><th colspan="6">Single/multi-node single/multi-GPU</th></tr>
  <tr><th>Model</th><th>Dataset</th><th>Instance type</th><th>Precision</th><th>Batch size for native frameworks</th><th>Batch size for SageMaker Training Compiler</th></tr>
</thead>
<tbody>
  <tr><td>ResNet152</td><td>food101</td><td>g4dn.16xlarge</td><td>float16</td><td>128</td><td>144</td></tr>
  <tr><td>ResNet152</td><td>food101</td><td>g5.4xlarge</td><td>float16</td><td>128</td><td>192</td></tr>
  <tr><td>ResNet152</td><td>food101</td><td>p3.2xlarge</td><td>float16</td><td>152</td><td>156</td></tr>
  <tr><td>ViT</td><td>food101</td><td>g4dn.16xlarge</td><td>float16</td><td>512</td><td>512</td></tr>
  <tr><td>ViT</td><td>food101</td><td>g5.4xlarge</td><td>float16</td><td>992</td><td>768</td></tr>
  <tr><td>ViT</td><td>food101</td><td>p3.2xlarge</td><td>float16</td><td>848</td><td>768</td></tr>
</tbody>
</table>


### PyTorch 1.12.0
<a name="training-compiler-tested-models-pt1120"></a>

**Natural language processing (NLP) models**

The following models are tested for training jobs for all combinations of single-node and multi-node with single or multiple GPU cores and Automatic Mixed Precision (AMP) as indicated.


<table>
<thead>
  <tr><th colspan="7">Single-node/multi-node single-GPU/multi-GPU</th></tr>
  <tr><th>Model</th><th>Dataset</th><th>Instance type</th><th>Precision</th><th>Sequence Length</th><th>Batch size for native frameworks</th><th>Batch size for SageMaker Training Compiler</th></tr>
</thead>
<tbody>
  <tr><td>albert-base-v2</td><td>wikitext-2-raw-v1</td><td>ml.g5.2xlarge</td><td>float16</td><td>128</td><td>128</td><td>248</td></tr>
  <tr><td>bert-base-uncased</td><td>wikitext-2-raw-v1</td><td>ml.g5.2xlarge</td><td>float16</td><td>128</td><td>160</td><td>288</td></tr>
  <tr><td>camembert-base</td><td>wikitext-2-raw-v1</td><td>ml.g5.2xlarge</td><td>float16</td><td>128</td><td>160</td><td>279</td></tr>
  <tr><td>camembert-base</td><td>wikitext-2-raw-v1</td><td>ml.p3.2xlarge</td><td>float16</td><td>128</td><td>105</td><td>164</td></tr>
  <tr><td>distilgpt2</td><td>wikitext-2-raw-v1</td><td>ml.g5.2xlarge</td><td>float16</td><td>128</td><td>136</td><td>256</td></tr>
  <tr><td>distilgpt2</td><td>wikitext-2-raw-v1</td><td>ml.p3.2xlarge</td><td>float16</td><td>128</td><td>80</td><td>118</td></tr>
  <tr><td>gpt2</td><td>wikitext-2-raw-v1</td><td>ml.g5.2xlarge</td><td>float16</td><td>128</td><td>84</td><td>240</td></tr>
  <tr><td>gpt2</td><td>wikitext-2-raw-v1</td><td>ml.p3.2xlarge</td><td>float16</td><td>128</td><td>80</td><td>119</td></tr>
  <tr><td>microsoft/deberta-base</td><td>wikitext-2-raw-v1</td><td>ml.g5.2xlarge</td><td>float16</td><td>128</td><td>93</td><td>197</td></tr>
  <tr><td>microsoft/deberta-base</td><td>wikitext-2-raw-v1</td><td>ml.p3.2xlarge</td><td>float16</td><td>128</td><td>113</td><td>130</td></tr>
  <tr><td>roberta-base</td><td>wikitext-2-raw-v1</td><td>ml.g5.2xlarge</td><td>float16</td><td>128</td><td>125</td><td>224</td></tr>
  <tr><td>roberta-base</td><td>wikitext-2-raw-v1</td><td>ml.p3.2xlarge</td><td>float16</td><td>128</td><td>78</td><td>112</td></tr>
  <tr><td>xlnet-base-cased</td><td>wikitext-2-raw-v1</td><td>ml.g5.2xlarge</td><td>float16</td><td>128</td><td>138</td><td>240</td></tr>
  <tr><td>bert-base-uncased</td><td>wikitext-103-v1</td><td>ml.p4d.24xlarge</td><td>float16</td><td>512</td><td></td><td>52</td></tr>
  <tr><td>distilbert-base-uncased</td><td>wikitext-103-v1</td><td>ml.p4d.24xlarge</td><td>float16</td><td>512</td><td></td><td>160</td></tr>
  <tr><td>gpt2</td><td>wikitext-103-v1</td><td>ml.p4d.24xlarge</td><td>float16</td><td>512</td><td></td><td>25</td></tr>
  <tr><td>roberta-base</td><td>wikitext-103-v1</td><td>ml.p4d.24xlarge</td><td>float16</td><td>512</td><td></td><td>64</td></tr>
</tbody>
</table>


### TensorFlow 2.11.0
<a name="training-compiler-tested-models-tf2110"></a>

**Computer Vision (CV) models**

Tested using [TensorFlow Model Garden](https://github.com/tensorflow/models) with Automatic Mixed Precision (AMP) as indicated.


<table>
<thead>
  <tr><th colspan="6">Single/multi-node single/multi-GPU</th></tr>
  <tr><th>Model</th><th>Dataset</th><th>Instance type</th><th>Precision</th><th>Batch size for native frameworks</th><th>Batch size for SageMaker Training Compiler</th></tr>
</thead>
<tbody>
  <tr><td>MaskRCNN-ResNet50-FPN</td><td>COCO-2017</td><td>ml.g5.2xlarge</td><td>float16</td><td>6</td><td>8</td></tr>
  <tr><td>MaskRCNN-ResNet50-FPN</td><td>COCO-2017</td><td>ml.p3.2xlarge</td><td>float16</td><td>4</td><td>6</td></tr>
  <tr><td>ResNet50</td><td>ImageNet</td><td>ml.g5.2xlarge</td><td>float16</td><td>192</td><td>256</td></tr>
  <tr><td>ResNet50</td><td>ImageNet</td><td>ml.p3.2xlarge</td><td>float16</td><td>256</td><td>256</td></tr>
  <tr><td>ResNet101</td><td>ImageNet</td><td>ml.g5.2xlarge</td><td>float16</td><td>128</td><td>256</td></tr>
  <tr><td>ResNet101</td><td>ImageNet</td><td>ml.p3.2xlarge</td><td>float16</td><td>128</td><td>128</td></tr>
  <tr><td>ResNet152</td><td>ImageNet</td><td>ml.g5.2xlarge</td><td>float16</td><td>128</td><td>224</td></tr>
  <tr><td>ResNet152</td><td>ImageNet</td><td>ml.p3.2xlarge</td><td>float16</td><td>128</td><td>128</td></tr>
  <tr><td>VisionTransformer</td><td>ImageNet</td><td>ml.g5.2xlarge</td><td>float16</td><td>112</td><td>144</td></tr>
  <tr><td>VisionTransformer</td><td>ImageNet</td><td>ml.p3.2xlarge</td><td>float16</td><td>96</td><td>128</td></tr>
</tbody>
</table>


**Natural Language Processing (NLP) models**

Tested using [Transformer models](https://github.com/huggingface/transformers) with `Sequence_Len=128` and Automatic Mixed Precision (AMP) as indicated.


<table>
<thead>
  <tr><th colspan="6">Single/multi-node single/multi-GPU</th></tr>
  <tr><th>Model</th><th>Dataset</th><th>Instance type</th><th>Precision</th><th>Batch size for native frameworks</th><th>Batch size for SageMaker Training Compiler</th></tr>
</thead>
<tbody>
  <tr><td>albert-base-v2</td><td>wikitext-2-raw-v1</td><td>ml.g5.2xlarge</td><td>float16</td><td>160</td><td>197</td></tr>
  <tr><td>albert-base-v2</td><td>wikitext-2-raw-v1</td><td>ml.p3.2xlarge</td><td>float16</td><td>95</td><td>127</td></tr>
  <tr><td>bert-base-uncased</td><td>wikitext-2-raw-v1</td><td>ml.g5.2xlarge</td><td>float16</td><td>160</td><td>128</td></tr>
  <tr><td>bert-base-uncased</td><td>wikitext-2-raw-v1</td><td>ml.p3.2xlarge</td><td>float16</td><td>104</td><td>111</td></tr>
  <tr><td>bert-large-uncased</td><td>wikitext-2-raw-v1</td><td>ml.g5.2xlarge</td><td>float16</td><td>65</td><td>48</td></tr>
  <tr><td>bert-large-uncased</td><td>wikitext-2-raw-v1</td><td>ml.p3.2xlarge</td><td>float16</td><td>40</td><td>35</td></tr>
  <tr><td>camembert-base</td><td>wikitext-2-raw-v1</td><td>ml.g5.2xlarge</td><td>float16</td><td>128</td><td>162</td></tr>
  <tr><td>camembert-base</td><td>wikitext-2-raw-v1</td><td>ml.p3.2xlarge</td><td>float16</td><td>105</td><td>111</td></tr>
  <tr><td>distilbert-base-uncased</td><td>wikitext-2-raw-v1</td><td>ml.g5.2xlarge</td><td>float16</td><td>256</td><td>264</td></tr>
  <tr><td>distilbert-base-uncased</td><td>wikitext-2-raw-v1</td><td>ml.p3.2xlarge</td><td>float16</td><td>128</td><td>169</td></tr>
  <tr><td>gpt2</td><td>wikitext-2-raw-v1</td><td>ml.g5.2xlarge</td><td>float16</td><td>128</td><td>120</td></tr>
  <tr><td>gpt2</td><td>wikitext-2-raw-v1</td><td>ml.p3.2xlarge</td><td>float16</td><td>80</td><td>83</td></tr>
  <tr><td>jplu/tf-xlm-roberta-base</td><td>wikitext-2-raw-v1</td><td>ml.g5.2xlarge</td><td>float16</td><td>32</td><td>32</td></tr>
  <tr><td>jplu/tf-xlm-roberta-base</td><td>wikitext-2-raw-v1</td><td>ml.p3.2xlarge</td><td>float16</td><td>32</td><td>36</td></tr>
  <tr><td>microsoft/mpnet-base</td><td>wikitext-2-raw-v1</td><td>ml.g5.2xlarge</td><td>float16</td><td>144</td><td>160</td></tr>
  <tr><td>microsoft/mpnet-base</td><td>wikitext-2-raw-v1</td><td>ml.p3.2xlarge</td><td>float16</td><td>106</td><td>110</td></tr>
  <tr><td>roberta-base</td><td>wikitext-2-raw-v1</td><td>ml.g5.2xlarge</td><td>float16</td><td>128</td><td>128</td></tr>
  <tr><td>roberta-base</td><td>wikitext-2-raw-v1</td><td>ml.p3.2xlarge</td><td>float16</td><td>72</td><td>98</td></tr>
  <tr><td>albert-base-v2</td><td>wikitext-2-raw-v1</td><td>ml.g5.48xlarge</td><td>float16</td><td>128</td><td>192</td></tr>
  <tr><td>albert-base-v2</td><td>wikitext-2-raw-v1</td><td>ml.p3.16xlarge</td><td>float16</td><td>95</td><td>96</td></tr>
  <tr><td>distilbert-base-uncased</td><td>wikitext-2-raw-v1</td><td>ml.g5.48xlarge</td><td>float16</td><td>256</td><td>256</td></tr>
  <tr><td>distilbert-base-uncased</td><td>wikitext-2-raw-v1</td><td>ml.p3.16xlarge</td><td>float16</td><td>140</td><td>184</td></tr>
  <tr><td>google/electra-small-discriminator</td><td>wikitext-2-raw-v1</td><td>ml.g5.48xlarge</td><td>float16</td><td>256</td><td>384</td></tr>
  <tr><td>google/electra-small-discriminator</td><td>wikitext-2-raw-v1</td><td>ml.p3.16xlarge</td><td>float16</td><td>256</td><td>268</td></tr>
  <tr><td>gpt2</td><td>wikitext-2-raw-v1</td><td>ml.g5.48xlarge</td><td>float16</td><td>116</td><td>116</td></tr>
  <tr><td>gpt2</td><td>wikitext-2-raw-v1</td><td>ml.p3.16xlarge</td><td>float16</td><td>85</td><td>83</td></tr>
  <tr><td>gpt2</td><td>wikitext-2-raw-v1</td><td>ml.p4d.24xlarge</td><td>float16</td><td>94</td><td>110</td></tr>
  <tr><td>microsoft/mpnet-base</td><td>wikitext-2-raw-v1</td><td>ml.g5.48xlarge</td><td>float16</td><td>187</td><td>164</td></tr>
  <tr><td>microsoft/mpnet-base</td><td>wikitext-2-raw-v1</td><td>ml.p3.16xlarge</td><td>float16</td><td>106</td><td>111</td></tr>
</tbody>
</table>


### TensorFlow 2.10.0
<a name="training-compiler-tested-models-tf2100"></a>

**Computer Vision (CV) models**

Tested using [TensorFlow Model Garden](https://github.com/tensorflow/models) with Automatic Mixed Precision (AMP) as indicated.


<table>
<thead>
  <tr><th colspan="6">Single-node single-GPU/multi-GPU</th></tr>
  <tr><th>Model</th><th>Dataset</th><th>Instance type</th><th>Precision</th><th>Batch size for native frameworks</th><th>Batch size for SageMaker Training Compiler</th></tr>
</thead>
<tbody>
  <tr><td>DetectionTransformer-ResNet50</td><td>COCO-2017</td><td>ml.g4dn.2xlarge</td><td>float32</td><td>2</td><td>4</td></tr>
  <tr><td>DetectionTransformer-ResNet50</td><td>COCO-2017</td><td>ml.g5.2xlarge</td><td>float32</td><td>3</td><td>6</td></tr>
  <tr><td>DetectionTransformer-ResNet50</td><td>COCO-2017</td><td>ml.p3.2xlarge</td><td>float32</td><td>2</td><td>4</td></tr>
  <tr><td>MaskRCNN-ResNet50-FPN</td><td>COCO-2017</td><td>ml.g4dn.2xlarge</td><td>float16</td><td>4</td><td>6</td></tr>
  <tr><td>MaskRCNN-ResNet50-FPN</td><td>COCO-2017</td><td>ml.g5.2xlarge</td><td>float16</td><td>6</td><td>8</td></tr>
  <tr><td>MaskRCNN-ResNet50-FPN</td><td>COCO-2017</td><td>ml.g5.48xlarge</td><td>float16</td><td>48</td><td>64</td></tr>
  <tr><td>MaskRCNN-ResNet50-FPN</td><td>COCO-2017</td><td>ml.p3.2xlarge</td><td>float16</td><td>4</td><td>6</td></tr>
  <tr><td>ResNet50</td><td>ImageNet</td><td>ml.g4dn.2xlarge</td><td>float16</td><td>224</td><td>256</td></tr>
  <tr><td>ResNet50</td><td>ImageNet</td><td>ml.g5.2xlarge</td><td>float16</td><td>192</td><td>160</td></tr>
  <tr><td>ResNet50</td><td>ImageNet</td><td>ml.g5.48xlarge</td><td>float16</td><td>2048</td><td>2048</td></tr>
  <tr><td>ResNet50</td><td>ImageNet</td><td>ml.p3.2xlarge</td><td>float16</td><td>224</td><td>160</td></tr>
  <tr><td>ResNet101</td><td>ImageNet</td><td>ml.g4dn.2xlarge</td><td>float16</td><td>160</td><td>128</td></tr>
  <tr><td>ResNet101</td><td>ImageNet</td><td>ml.g5.2xlarge</td><td>float16</td><td>192</td><td>256</td></tr>
  <tr><td>ResNet101</td><td>ImageNet</td><td>ml.g5.48xlarge</td><td>float16</td><td>2048</td><td>2048</td></tr>
  <tr><td>ResNet101</td><td>ImageNet</td><td>ml.p3.2xlarge</td><td>float16</td><td>160</td><td>224</td></tr>
  <tr><td>ResNet152</td><td>ImageNet</td><td>ml.g4dn.2xlarge</td><td>float16</td><td>128</td><td>128</td></tr>
  <tr><td>ResNet152</td><td>ImageNet</td><td>ml.g5.2xlarge</td><td>float16</td><td>192</td><td>224</td></tr>
  <tr><td>ResNet152</td><td>ImageNet</td><td>ml.g5.48xlarge</td><td>float16</td><td>1536</td><td>1792</td></tr>
  <tr><td>ResNet152</td><td>ImageNet</td><td>ml.p3.2xlarge</td><td>float16</td><td>128</td><td>160</td></tr>
  <tr><td>VisionTransformer</td><td>ImageNet</td><td>ml.g4dn.2xlarge</td><td>float16</td><td>80</td><td>128</td></tr>
  <tr><td>VisionTransformer</td><td>ImageNet</td><td>ml.g5.2xlarge</td><td>float16</td><td>112</td><td>144</td></tr>
  <tr><td>VisionTransformer</td><td>ImageNet</td><td>ml.g5.48xlarge</td><td>float16</td><td>896</td><td>1152</td></tr>
  <tr><td>VisionTransformer</td><td>ImageNet</td><td>ml.p3.2xlarge</td><td>float16</td><td>80</td><td>128</td></tr>
</tbody>
</table>


**Natural Language Processing (NLP) models**

Tested using [Transformer models](https://github.com/huggingface/transformers) with `Sequence_Len=128` and Automatic Mixed Precision (AMP) as indicated.


<table>
<thead>
  <tr><th colspan="6">Single-node single-GPU/multi-GPU</th></tr>
  <tr><th>Model</th><th>Dataset</th><th>Instance type</th><th>Precision</th><th>Batch size for native frameworks</th><th>Batch size for SageMaker Training Compiler</th></tr>
</thead>
<tbody>
  <tr><td>albert-base-v2</td><td>wikitext-2-raw-v1</td><td>g4dn.16xlarge</td><td>float16</td><td>128</td><td>112</td></tr>
  <tr><td>albert-base-v2</td><td>wikitext-2-raw-v1</td><td>p3.2xlarge</td><td>float16</td><td>128</td><td>128</td></tr>
  <tr><td>albert-base-v2</td><td>wikitext-2-raw-v1</td><td>p3.8xlarge</td><td>float16</td><td>128</td><td>135</td></tr>
  <tr><td>albert-base-v2</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>128</td><td>191</td></tr>
  <tr><td>bert-base-uncased</td><td>wikitext-2-raw-v1</td><td>g4dn.16xlarge</td><td>float16</td><td>64</td><td>94</td></tr>
  <tr><td>bert-base-uncased</td><td>wikitext-2-raw-v1</td><td>p3.2xlarge</td><td>float16</td><td>96</td><td>101</td></tr>
  <tr><td>bert-base-uncased</td><td>wikitext-2-raw-v1</td><td>p3.8xlarge</td><td>float16</td><td>96</td><td>96</td></tr>
  <tr><td>bert-base-uncased</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>128</td><td>128</td></tr>
  <tr><td>bert-large-uncased</td><td>wikitext-2-raw-v1</td><td>g4dn.16xlarge</td><td>float16</td><td>35</td><td>21</td></tr>
  <tr><td>bert-large-uncased</td><td>wikitext-2-raw-v1</td><td>p3.2xlarge</td><td>float16</td><td>39</td><td>26</td></tr>
  <tr><td>bert-large-uncased</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>60</td><td>50</td></tr>
  <tr><td>camembert-base</td><td>wikitext-2-raw-v1</td><td>g4dn.16xlarge</td><td>float16</td><td>96</td><td>90</td></tr>
  <tr><td>camembert-base</td><td>wikitext-2-raw-v1</td><td>p3.2xlarge</td><td>float16</td><td>96</td><td>98</td></tr>
  <tr><td>camembert-base</td><td>wikitext-2-raw-v1</td><td>p3.8xlarge</td><td>float16</td><td>96</td><td>96</td></tr>
  <tr><td>camembert-base</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>128</td><td>128</td></tr>
  <tr><td>distilbert-base-uncased</td><td>wikitext-2-raw-v1</td><td>g4dn.16xlarge</td><td>float16</td><td>256</td><td>160</td></tr>
  <tr><td>distilbert-base-uncased</td><td>wikitext-2-raw-v1</td><td>p3.2xlarge</td><td>float16</td><td>128</td><td>176</td></tr>
  <tr><td>distilbert-base-uncased</td><td>wikitext-2-raw-v1</td><td>p3.8xlarge</td><td>float16</td><td>128</td><td>160</td></tr>
  <tr><td>distilbert-base-uncased</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>256</td><td>258</td></tr>
  <tr><td>google\_electra-small-discriminator</td><td>wikitext-2-raw-v1</td><td>g4dn.16xlarge</td><td>float16</td><td>256</td><td>216</td></tr>
  <tr><td>google\_electra-small-discriminator</td><td>wikitext-2-raw-v1</td><td>p3.2xlarge</td><td>float16</td><td>256</td><td>230</td></tr>
  <tr><td>google\_electra-small-discriminator</td><td>wikitext-2-raw-v1</td><td>p3.8xlarge</td><td>float16</td><td>256</td><td>224</td></tr>
  <tr><td>google\_electra-small-discriminator</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>256</td><td>320</td></tr>
  <tr><td>gpt2</td><td>wikitext-2-raw-v1</td><td>g4dn.16xlarge</td><td>float16</td><td>80</td><td>64</td></tr>
  <tr><td>gpt2</td><td>wikitext-2-raw-v1</td><td>p3.2xlarge</td><td>float16</td><td>80</td><td>77</td></tr>
  <tr><td>gpt2</td><td>wikitext-2-raw-v1</td><td>p3.8xlarge</td><td>float16</td><td>80</td><td>72</td></tr>
  <tr><td>gpt2</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>128</td><td>120</td></tr>
  <tr><td>jplu\_tf-xlm-roberta-base</td><td>wikitext-2-raw-v1</td><td>g4dn.16xlarge</td><td>float16</td><td>28</td><td>24</td></tr>
  <tr><td>jplu\_tf-xlm-roberta-base</td><td>wikitext-2-raw-v1</td><td>p3.2xlarge</td><td>float16</td><td>32</td><td>24</td></tr>
  <tr><td>jplu\_tf-xlm-roberta-base</td><td>wikitext-2-raw-v1</td><td>p3.8xlarge</td><td>float16</td><td>32</td><td>26</td></tr>
  <tr><td>jplu\_tf-xlm-roberta-base</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>66</td><td>52</td></tr>
  <tr><td>microsoft\_mpnet-base</td><td>wikitext-2-raw-v1</td><td>g4dn.16xlarge</td><td>float16</td><td>96</td><td>92</td></tr>
  <tr><td>microsoft\_mpnet-base</td><td>wikitext-2-raw-v1</td><td>p3.2xlarge</td><td>float16</td><td>96</td><td>101</td></tr>
  <tr><td>microsoft\_mpnet-base</td><td>wikitext-2-raw-v1</td><td>p3.8xlarge</td><td>float16</td><td>96</td><td>101</td></tr>
  <tr><td>microsoft\_mpnet-base</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>128</td><td>152</td></tr>
  <tr><td>roberta-base</td><td>wikitext-2-raw-v1</td><td>g4dn.16xlarge</td><td>float16</td><td>64</td><td>72</td></tr>
  <tr><td>roberta-base</td><td>wikitext-2-raw-v1</td><td>p3.2xlarge</td><td>float16</td><td>64</td><td>84</td></tr>
  <tr><td>roberta-base</td><td>wikitext-2-raw-v1</td><td>p3.8xlarge</td><td>float16</td><td>64</td><td>86</td></tr>
  <tr><td>roberta-base</td><td>wikitext-2-raw-v1</td><td>g5.4xlarge</td><td>float16</td><td>128</td><td>128</td></tr>
</tbody>
</table>


### TensorFlow 2.9.1
<a name="training-compiler-tested-models-tf291"></a>

Tested using [TensorFlow Model Garden](https://github.com/tensorflow/models) with Automatic Mixed Precision (AMP).


<table>
<thead>
  <tr><th colspan="5">Single-node single-GPU/multi-GPU</th></tr>
  <tr><th>Model</th><th>Dataset</th><th>Instance type</th><th>Batch size for native frameworks</th><th>Batch size for SageMaker Training Compiler</th></tr>
</thead>
<tbody>
  <tr><td>ResNet50</td><td>ImageNet</td><td>ml.g4dn.2xlarge</td><td>192</td><td>256\*</td></tr>
  <tr><td rowspan="3">ResNet101</td><td rowspan="3">ImageNet</td><td>ml.g4dn.2xlarge</td><td>128</td><td>160</td></tr>
  <tr><td>ml.g5.2xlarge</td><td>224</td><td>256\*</td></tr>
  <tr><td>ml.p3.16xlarge</td><td>1536</td><td>1792</td></tr>
  <tr><td rowspan="3">ResNet152</td><td rowspan="3">ImageNet</td><td>ml.g5.2xlarge</td><td>192</td><td>224</td></tr>
  <tr><td>ml.p3.2xlarge</td><td>160</td><td>160</td></tr>
  <tr><td>ml.p3.16xlarge</td><td>1024</td><td>1280</td></tr>
  <tr><td rowspan="4">VisionTransformer</td><td rowspan="4">ImageNet</td><td>ml.g4dn.2xlarge</td><td>80</td><td>128\*</td></tr>
  <tr><td>ml.g5.2xlarge</td><td>112</td><td>128\*</td></tr>
  <tr><td>ml.p3.2xlarge</td><td>56</td><td>128\*</td></tr>
  <tr><td>ml.p3.16xlarge</td><td>640</td><td>1024\*</td></tr>
  <tr><td rowspan="4">DetectionTransformer-ResNet50</td><td rowspan="4">COCO-2017</td><td>ml.g4dn.2xlarge</td><td>2</td><td>2</td></tr>
  <tr><td>ml.g5.2xlarge</td><td>3</td><td>6</td></tr>
  <tr><td>ml.p3.2xlarge</td><td>2</td><td>4</td></tr>
  <tr><td>ml.p3.16xlarge</td><td>8</td><td>32</td></tr>
  <tr><td rowspan="3">MaskRCNN-ResNet50-FPN</td><td rowspan="3">COCO-2017</td><td>ml.g4dn.2xlarge</td><td>4</td><td>4</td></tr>
  <tr><td>ml.g5.2xlarge</td><td>6</td><td>8</td></tr>
  <tr><td>ml.p3.2xlarge</td><td>4</td><td>6</td></tr>
</tbody>
</table>


\* Batch sizes marked with an asterisk (\*) are the largest batch sizes tested by the SageMaker Training Compiler developer team. For the marked cells, the instance may be able to fit a larger batch size than the one shown.
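
To read the table as relative gains rather than absolute numbers, the following is a minimal sketch that computes how much larger the Training Compiler batch size is than the native one. The rows are copied from the TensorFlow 2.9.1 table; the names `ROWS`, `batch_size_gain`, and `describe` are hypothetical helpers for illustration and are not part of any SageMaker API. Rows marked with an asterisk are reported as lower bounds, because a larger batch size might still fit.

```python
# Rows copied from the TensorFlow 2.9.1 table:
# (model, instance type, native batch size, Training Compiler batch size, starred)
# "starred" is True when the table marks the compiler batch size with an asterisk.
ROWS = [
    ("ResNet50", "ml.g4dn.2xlarge", 192, 256, True),
    ("ResNet101", "ml.p3.16xlarge", 1536, 1792, False),
    ("VisionTransformer", "ml.p3.2xlarge", 56, 128, True),
    ("DetectionTransformer-ResNet50", "ml.p3.16xlarge", 8, 32, False),
]


def batch_size_gain(native: int, compiled: int) -> float:
    """Return the ratio of the compiler batch size to the native batch size."""
    return compiled / native


def describe(row) -> str:
    """Format one row; starred rows are lower bounds on the achievable gain."""
    model, instance, native, compiled, starred = row
    prefix = "at least " if starred else ""
    return f"{model} on {instance}: {prefix}{batch_size_gain(native, compiled):.2f}x"


if __name__ == "__main__":
    for row in ROWS:
        print(describe(row))
```

A ratio of 1.00 means that the compiler did not increase the largest batch size that fits on that instance type.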

### Transformers 4.21.1 with PyTorch 1.11.0
<a name="training-compiler-tested-models-hf421-pt111"></a>

Tested with `Sequence_Len=512` and Automatic Mixed Precision (AMP).


<table>
<thead>
  <tr><th colspan="6">Single-node single-GPU</th></tr>
  <tr><th>Model</th><th>Dataset</th><th>Instance type</th><th>Instance count</th><th>Batch size for native frameworks</th><th>Batch size for Training Compiler</th></tr>
</thead>
<tbody>
  <tr><td rowspan="3">albert-base-v2</td><td rowspan="3">wikitext-2</td><td>ml.g4dn.2xlarge</td><td>1</td><td>14</td><td>28</td></tr>
  <tr><td>ml.g5.2xlarge</td><td>1</td><td>18</td><td>40</td></tr>
  <tr><td>ml.p3.2xlarge</td><td>1</td><td>14</td><td>32</td></tr>
  <tr><td rowspan="3">bert-base-cased</td><td rowspan="3">wikitext-2</td><td>ml.g4dn.2xlarge</td><td>1</td><td>12</td><td>24</td></tr>
  <tr><td>ml.g5.2xlarge</td><td>1</td><td>28</td><td>44</td></tr>
  <tr><td>ml.p3.2xlarge</td><td>1</td><td>16</td><td>20</td></tr>
  <tr><td rowspan="3">camembert-base</td><td rowspan="3">wikitext-2</td><td>ml.g4dn.2xlarge</td><td>1</td><td>16</td><td>28</td></tr>
  <tr><td>ml.g5.2xlarge</td><td>1</td><td>24</td><td>40</td></tr>
  <tr><td>ml.p3.2xlarge</td><td>1</td><td>16</td><td>24</td></tr>
  <tr><td rowspan="4">distilbert-base-uncased</td><td rowspan="3">wikitext-2</td><td>ml.g4dn.2xlarge</td><td>1</td><td>28</td><td>52</td></tr>
  <tr><td>ml.g5.2xlarge</td><td>1</td><td>40</td><td>76</td></tr>
  <tr><td>ml.p3.2xlarge</td><td>1</td><td>32</td><td>48</td></tr>
  <tr><td>wikitext-103-v1</td><td>ml.p4d.24xlarge</td><td>4</td><td>82</td><td>160</td></tr>
  <tr><td rowspan="3">distilgpt2</td><td rowspan="3">wikitext-2</td><td>ml.g4dn.2xlarge</td><td>1</td><td>6</td><td>18</td></tr>
  <tr><td>ml.g5.2xlarge</td><td>1</td><td>12</td><td>28</td></tr>
  <tr><td>ml.p3.2xlarge</td><td>1</td><td>6</td><td>16</td></tr>
  <tr><td rowspan="3">distilroberta-base</td><td rowspan="3">wikitext-2</td><td>ml.g4dn.2xlarge</td><td>1</td><td>20</td><td>40</td></tr>
  <tr><td>ml.g5.2xlarge</td><td>1</td><td>28</td><td>56</td></tr>
  <tr><td>ml.p3.2xlarge</td><td>1</td><td>24</td><td>40</td></tr>
  <tr><td rowspan="3">EleutherAI/gpt-neo-125M</td><td rowspan="3">wikitext-2</td><td>ml.g4dn.2xlarge</td><td>1</td><td>4</td><td>8</td></tr>
  <tr><td>ml.g5.2xlarge</td><td>1</td><td>6</td><td>14</td></tr>
  <tr><td>ml.p3.2xlarge</td><td>1</td><td>4</td><td>10</td></tr>
  <tr><td rowspan="4">gpt2</td><td rowspan="3">wikitext-2</td><td>ml.g4dn.2xlarge</td><td>1</td><td>4</td><td>8</td></tr>
  <tr><td>ml.g5.2xlarge</td><td>1</td><td>6</td><td>16</td></tr>
  <tr><td>ml.p3.2xlarge</td><td>1</td><td>4</td><td>10</td></tr>
  <tr><td>wikitext-103-v1</td><td>ml.p4d.24xlarge</td><td>4</td><td>13</td><td>25</td></tr>
  <tr><td rowspan="4">roberta-base</td><td rowspan="3">wikitext-2</td><td>ml.g4dn.2xlarge</td><td>1</td><td>12</td><td>20</td></tr>
  <tr><td>ml.g5.2xlarge</td><td>1</td><td>24</td><td>36</td></tr>
  <tr><td>ml.p3.2xlarge</td><td>1</td><td>12</td><td>20</td></tr>
  <tr><td>wikitext-103-v1</td><td>ml.p4d.24xlarge</td><td>4</td><td>36</td><td>64</td></tr>
  <tr><td rowspan="3">xlnet-base-cased</td><td rowspan="3">wikitext-2</td><td>ml.g4dn.2xlarge</td><td>1</td><td>2</td><td>6</td></tr>
  <tr><td>ml.g5.2xlarge</td><td>1</td><td>2</td><td>10</td></tr>
  <tr><td>ml.p3.2xlarge</td><td>1</td><td>2</td><td>8</td></tr>
  <tr><td rowspan="4">bert-base-uncased</td><td rowspan="4">wikitext-103-v1</td><td rowspan="4">ml.p4d.24xlarge</td><td>2</td><td>32</td><td>64</td></tr>
  <tr><td>4</td><td>32</td><td>64</td></tr>
  <tr><td>8</td><td>32</td><td>64</td></tr>
  <tr><td>16</td><td>32</td><td>64</td></tr>
  <tr><td>roberta-large</td><td>wikitext-103-v1</td><td>ml.p4d.24xlarge</td><td>4</td><td>16</td><td>24</td></tr>
  <tr><td>microsoft/deberta-v3-base</td><td>wikitext-103-v1</td><td>ml.p4d.24xlarge</td><td>16</td><td>9</td><td>23</td></tr>
</tbody>
</table>


### Transformers 4.17.0 with PyTorch 1.10.2
<a name="training-compiler-tested-models-hf417-pt110"></a>

Tested with `Sequence_Len=512` and Automatic Mixed Precision (AMP).


<table>
<thead>
  <tr><th colspan="4">Single-node single-GPU</th></tr>
  <tr><th>Model</th><th>Instance type</th><th>Batch size for native frameworks</th><th>Batch size for Training Compiler</th></tr>
</thead>
<tbody>
  <tr><td rowspan="2">albert-base-v2</td><td>ml.p3.2xlarge</td><td>14</td><td>28</td></tr>
  <tr><td>ml.g4dn.2xlarge</td><td>14</td><td>24</td></tr>
  <tr><td rowspan="2">bert-base-cased</td><td>ml.p3.2xlarge</td><td>16</td><td>24</td></tr>
  <tr><td>ml.g4dn.2xlarge</td><td>12</td><td>24</td></tr>
  <tr><td rowspan="2">bert-base-uncased</td><td>ml.p3.2xlarge</td><td>16</td><td>24</td></tr>
  <tr><td>ml.g4dn.2xlarge</td><td>12</td><td>28</td></tr>
  <tr><td rowspan="2">camembert-base</td><td>ml.p3.2xlarge</td><td>12</td><td>24</td></tr>
  <tr><td>ml.g4dn.2xlarge</td><td>12</td><td>28</td></tr>
  <tr><td rowspan="2">distilbert-base-uncased</td><td>ml.p3.2xlarge</td><td>28</td><td>48</td></tr>
  <tr><td>ml.g4dn.2xlarge</td><td>24</td><td>52</td></tr>
  <tr><td rowspan="2">distilgpt2</td><td>ml.p3.2xlarge</td><td>6</td><td>12</td></tr>
  <tr><td>ml.g4dn.2xlarge</td><td>6</td><td>14</td></tr>
  <tr><td rowspan="2">distilroberta-base</td><td>ml.p3.2xlarge</td><td>20</td><td>40</td></tr>
  <tr><td>ml.g4dn.2xlarge</td><td>12</td><td>40</td></tr>
  <tr><td rowspan="2">EleutherAI/gpt-neo-125M</td><td>ml.p3.2xlarge</td><td>2</td><td>10</td></tr>
  <tr><td>ml.g4dn.2xlarge</td><td>2</td><td>8</td></tr>
  <tr><td rowspan="2">facebook/bart-base</td><td>ml.p3.2xlarge</td><td>2</td><td>6</td></tr>
  <tr><td>ml.g4dn.2xlarge</td><td>2</td><td>6</td></tr>
  <tr><td rowspan="2">gpt2</td><td>ml.p3.2xlarge</td><td>4</td><td>8</td></tr>
  <tr><td>ml.g4dn.2xlarge</td><td>2</td><td>8</td></tr>
  <tr><td rowspan="2">roberta-base</td><td>ml.p3.2xlarge</td><td>12</td><td>20</td></tr>
  <tr><td>ml.g4dn.2xlarge</td><td>12</td><td>20</td></tr>
  <tr><td rowspan="2">xlnet-base-cased</td><td>ml.p3.2xlarge</td><td>2</td><td>8</td></tr>
  <tr><td>ml.g4dn.2xlarge</td><td>4</td><td>6</td></tr>
</tbody>
</table>


### Transformers 4.11.0 with PyTorch 1.9.0
<a name="training-compiler-tested-models-hf411-pt190"></a>

Tested with `Sequence_Len=512` and Automatic Mixed Precision (AMP).


<table>
<thead>
  <tr><th colspan="4">Single-node single-GPU</th></tr>
  <tr><th>Model</th><th>Instance type</th><th>Batch size for native</th><th>Batch size for Training Compiler</th></tr>
</thead>
<tbody>
  <tr><td>albert-base-v2 </td><td>ml.p3.2xlarge</td><td>12</td><td>32</td></tr>
  <tr><td>bert-base-cased </td><td>ml.p3.2xlarge</td><td>14</td><td>24</td></tr>
  <tr><td>bert-base-chinese</td><td>ml.p3.2xlarge</td><td>16</td><td>24</td></tr>
  <tr><td>bert-base-multilingual-cased </td><td>ml.p3.2xlarge</td><td>4</td><td>16</td></tr>
  <tr><td>bert-base-multilingual-uncased </td><td>ml.p3.2xlarge</td><td>8</td><td>16</td></tr>
  <tr><td>bert-base-uncased </td><td>ml.p3.2xlarge</td><td>12</td><td>24</td></tr>
  <tr><td>cl-tohoku/bert-base-japanese-whole-word-masking</td><td>ml.p3.2xlarge</td><td>12</td><td>24</td></tr>
  <tr><td>cl-tohoku/bert-base-japanese </td><td>ml.p3.2xlarge</td><td>12</td><td>24</td></tr>
  <tr><td>distilbert-base-uncased </td><td>ml.p3.2xlarge</td><td>28</td><td>32</td></tr>
  <tr><td>distilbert-base-uncased-finetuned-sst-2-english</td><td>ml.p3.2xlarge</td><td>28</td><td>32</td></tr>
  <tr><td>distilgpt2 </td><td>ml.p3.2xlarge</td><td>16</td><td>32</td></tr>
  <tr><td>facebook/bart-base </td><td>ml.p3.2xlarge</td><td>4</td><td>8</td></tr>
  <tr><td>gpt2</td><td>ml.p3.2xlarge</td><td>6</td><td>20</td></tr>
  <tr><td>nreimers/MiniLMv2-L6-H384-distilled-from-RoBERTa-Large </td><td>ml.p3.2xlarge</td><td>20</td><td>32</td></tr>
  <tr><td>roberta-base </td><td>ml.p3.2xlarge</td><td>12</td><td>20</td></tr>
</tbody>
</table>



<table>
<thead>
  <tr><th colspan="4">Single-node multi-GPU</th></tr>
  <tr><th>Model</th><th>Instance type</th><th>Batch size for native</th><th>Batch size for Training Compiler</th></tr>
</thead>
<tbody>
  <tr><td>bert-base-chinese </td><td>ml.p3.8xlarge</td><td>16</td><td>26</td></tr>
  <tr><td>bert-base-multilingual-cased </td><td>ml.p3.8xlarge</td><td>6</td><td>16</td></tr>
  <tr><td>bert-base-multilingual-uncased</td><td>ml.p3.8xlarge</td><td>6</td><td>16</td></tr>
  <tr><td>bert-base-uncased </td><td>ml.p3.8xlarge</td><td>14</td><td>24</td></tr>
  <tr><td>distilbert-base-uncased </td><td>ml.p3.8xlarge</td><td>14</td><td>32</td></tr>
  <tr><td>distilgpt2</td><td>ml.p3.8xlarge</td><td>6</td><td>32</td></tr>
  <tr><td>facebook/bart-base</td><td>ml.p3.8xlarge</td><td>8</td><td>16</td></tr>
  <tr><td>gpt2 </td><td>ml.p3.8xlarge</td><td>8</td><td>20</td></tr>
  <tr><td>roberta-base </td><td>ml.p3.8xlarge</td><td>12</td><td>20</td></tr>
</tbody>
</table>


### Transformers 4.17.0 with TensorFlow 2.6.3
<a name="training-compiler-tested-models-hf417-tf263"></a>

Tested with `Sequence_Len=128` and Automatic Mixed Precision (AMP).


| Model | Instance type | Batch size for native frameworks | Batch size for Training Compiler | 
| --- | --- | --- | --- | 
| albert-base-v2 | ml.g4dn.16xlarge | 136 | 208 | 
| albert-base-v2 | ml.g5.4xlarge | 219 | 312 | 
| albert-base-v2 | ml.p3.2xlarge | 152 | 208 | 
| albert-base-v2 | ml.p3.8xlarge | 152 | 192 | 
| bert-base-uncased | ml.g4dn.16xlarge | 120 | 101 | 
| bert-base-uncased | ml.g5.4xlarge | 184 | 160 | 
| bert-base-uncased | ml.p3.2xlarge | 128 | 108 | 
| bert-large-uncased | ml.g4dn.16xlarge | 37 | 28 | 
| bert-large-uncased | ml.g5.4xlarge | 64 | 55 | 
| bert-large-uncased | ml.p3.2xlarge | 40 | 32 | 
| camembert-base | ml.g4dn.16xlarge | 96 | 100 | 
| camembert-base | ml.g5.4xlarge | 190 | 160 | 
| camembert-base | ml.p3.2xlarge | 129 | 108 | 
| camembert-base | ml.p3.8xlarge | 128 | 104 | 
| distilbert-base-uncased | ml.g4dn.16xlarge | 210 | 160 | 
| distilbert-base-uncased | ml.g5.4xlarge | 327 | 288 | 
| distilbert-base-uncased | ml.p3.2xlarge | 224 | 196 | 
| distilbert-base-uncased | ml.p3.8xlarge | 192 | 182 | 
| google\_electra-small-discriminator | ml.g4dn.16xlarge | 336 | 288 | 
| google\_electra-small-discriminator | ml.g5.4xlarge | 504 | 384 | 
| google\_electra-small-discriminator | ml.p3.2xlarge | 352 | 323 | 
| gpt2 | ml.g4dn.16xlarge | 89 | 64 | 
| gpt2 | ml.g5.4xlarge | 140 | 146 | 
| gpt2 | ml.p3.2xlarge | 94 | 96 | 
| gpt2 | ml.p3.8xlarge | 96 | 88 | 
| jplu\_tf-xlm-roberta-base | ml.g4dn.16xlarge | 52 | 16 | 
| jplu\_tf-xlm-roberta-base | ml.g5.4xlarge | 64 | 44 | 
| microsoft\_mpnet-base | ml.g4dn.16xlarge | 120 | 100 | 
| microsoft\_mpnet-base | ml.g5.4xlarge | 192 | 160 | 
| microsoft\_mpnet-base | ml.p3.2xlarge | 128 | 104 | 
| microsoft\_mpnet-base | ml.p3.8xlarge | 130 | 92 | 
| roberta-base | ml.g4dn.16xlarge | 108 | 64 | 
| roberta-base | ml.g5.4xlarge | 176 | 142 | 
| roberta-base | ml.p3.2xlarge | 118 | 100 | 
| roberta-base | ml.p3.8xlarge | 112 | 88 | 

### Transformers 4.11.0 with TensorFlow 2.5.1
<a name="training-compiler-tested-models-hf411-tf251"></a>

Tested with `Sequence_Len=128` and Automatic Mixed Precision (AMP).


<table>
<thead>
  <tr><th colspan="4">Single-node single-GPU</th></tr>
  <tr><th>Model</th><th>Instance type</th><th>Batch size for native</th><th>Batch size for Training Compiler</th></tr>
</thead>
<tbody>
  <tr><td>albert-base-v2 </td><td>ml.p3.2xlarge</td><td>128</td><td>128</td></tr>
  <tr><td>bart-base </td><td>ml.p3.2xlarge</td><td>12</td><td>64</td></tr>
  <tr><td>bart-large </td><td>ml.p3.2xlarge</td><td>4</td><td>28</td></tr>
  <tr><td>bert-base-cased </td><td>ml.p3.2xlarge</td><td>16</td><td>128</td></tr>
  <tr><td>bert-base-chinese</td><td>ml.p3.2xlarge</td><td>16</td><td>128</td></tr>
  <tr><td>bert-base-multilingual-cased </td><td>ml.p3.2xlarge</td><td>12</td><td>64</td></tr>
  <tr><td>bert-base-multilingual-uncased </td><td>ml.p3.2xlarge</td><td>16</td><td>96</td></tr>
  <tr><td>bert-base-uncased</td><td>ml.p3.2xlarge</td><td>16</td><td>96</td></tr>
  <tr><td>bert-large-uncased </td><td>ml.p3.2xlarge</td><td>4</td><td>24</td></tr>
  <tr><td>cl-tohoku/bert-base-japanese </td><td>ml.p3.2xlarge</td><td>16</td><td>128</td></tr>
  <tr><td>cl-tohoku/bert-base-japanese-whole-word-masking </td><td>ml.p3.2xlarge</td><td>16</td><td>128</td></tr>
  <tr><td>distilbert-base-sst2 </td><td>ml.p3.2xlarge</td><td>32</td><td>128</td></tr>
  <tr><td>distilbert-base-uncased </td><td>ml.p3.2xlarge</td><td>32</td><td>128</td></tr>
  <tr><td>distilgpt2</td><td>ml.p3.2xlarge</td><td>32</td><td>128</td></tr>
  <tr><td>gpt2 </td><td>ml.p3.2xlarge</td><td>12</td><td>64</td></tr>
  <tr><td>gpt2-large </td><td>ml.p3.2xlarge</td><td>2</td><td>24</td></tr>
  <tr><td>jplu/tf-xlm-roberta-base </td><td>ml.p3.2xlarge</td><td>12</td><td>32</td></tr>
  <tr><td>roberta-base </td><td>ml.p3.2xlarge</td><td>4</td><td>64</td></tr>
  <tr><td>roberta-large </td><td>ml.p3.2xlarge</td><td>4</td><td>64</td></tr>
  <tr><td>t5-base </td><td>ml.p3.2xlarge</td><td>64</td><td>64</td></tr>
  <tr><td>t5-small </td><td>ml.p3.2xlarge</td><td>128</td><td>128</td></tr>
</tbody>
</table>
