# Cloud Instances
<a name="neo-cloud-instances"></a>

Amazon SageMaker Neo supports compilation for popular machine learning frameworks such as TensorFlow, PyTorch, and MXNet. You can deploy your compiled model to cloud instances and AWS Inferentia instances. For a full list of supported frameworks, see [Supported Instance Types and Frameworks](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-supported-cloud.html).

You can compile your model in one of three ways: with the AWS CLI, the SageMaker AI console, or the SageMaker AI SDK for Python. For more information, see [Compile a Model with Neo](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-job-compilation.html). The compiled model artifacts are stored in the Amazon S3 bucket URI that you specified for the compilation job. You can deploy the compiled model to cloud instances and AWS Inferentia instances with the SageMaker AI SDK for Python, the AWS SDK for Python (Boto3), the AWS CLI, or the AWS console.

If you deploy your model with the AWS CLI, the console, or Boto3, you must select a Docker image Amazon ECR URI for your primary container. For a list of Amazon ECR URIs, see [Neo Inference Container Images](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html).

**Topics**
+ [Supported Instance Types and Frameworks](neo-supported-cloud.md)
+ [Deploy a Model](neo-deployment-hosting-services.md)
+ [Inference Requests With a Deployed Service](neo-requests.md)
+ [Inference Container Images](neo-deployment-hosting-services-container-images.md)

# Supported Instance Types and Frameworks
<a name="neo-supported-cloud"></a>

Amazon SageMaker Neo supports popular deep learning frameworks for both compilation and deployment. You can deploy your model to cloud instances and to AWS Inferentia instance types.

The following sections describe the frameworks that SageMaker Neo supports and the target cloud instances that you can compile for and deploy to. For information on how to deploy a compiled model to a cloud or Inferentia instance, see [Deploy a Model with Cloud Instances](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services.html).

## Cloud Instances
<a name="neo-supported-cloud-instances"></a>

SageMaker Neo supports the following deep learning frameworks for CPU and GPU cloud instances.


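| Framework | Framework version | Model version | Models | Model format (packaged in \*.tar.gz) | Toolkit | 
| --- | --- | --- | --- | --- | --- | 
| MXNet | 1.8.0 | Supports 1.8.0 or earlier | Image classification, object detection, semantic segmentation, pose estimation, activity recognition | One symbol file (.json) and one parameter file (.params) | GluonCV v0.8.0 | 
| ONNX | 1.7.0 | Supports 1.7.0 or earlier | Image classification, SVM | One model file (.onnx) |  | 
| Keras | 2.2.4 | Supports 2.2.4 or earlier | Image classification | One model definition file (.h5) |  | 
| PyTorch | 1.4, 1.5, 1.6, 1.7, 1.8, 1.12, 1.13, 2.0 | Supports 1.4, 1.5, 1.6, 1.7, 1.8, 1.12, 1.13, 2.0 | Image classification. Versions 1.13 and 2.0 also support object detection, vision transformers, and HuggingFace. | One model definition file (.pt or .pth) with input dtype float32 |  | 
| TensorFlow | 1.15.3 or 2.9 | Supports 1.15.3 and 2.9 | Image classification | For saved models, one .pb or .pbtxt file and a variables directory that contains the variables. For frozen models, only one .pb or .pbtxt file. |  | 
| XGBoost | 1.3.3 | Supports 1.3.3 or earlier | Decision trees | One XGBoost model file (.model) in which each tree has fewer than 2^31 nodes |  | 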

**Note**  
"Model version" is the version of the framework used to train and export the model.
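
Every model format in the table is uploaded as a `*.tar.gz` archive. A minimal sketch of that packaging step with Python's standard `tarfile` module follows. The helper name `package_model` is hypothetical, and storing each artifact at the root of the archive is an assumption to check against the format your framework requires.

```python
import os
import tarfile


def package_model(paths, archive_path):
    """Pack model artifacts into a gzip-compressed tar archive.

    Each path is stored at the root of the archive under its base name.
    tarfile.add() recurses into directories, so a TensorFlow SavedModel's
    variables directory can be passed as a single path.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            # arcname drops the local directory prefix from the stored name.
            tar.add(path, arcname=os.path.basename(os.path.normpath(path)))
    return archive_path
```

For example, for an MXNet model you would pass the symbol file and the parameter file, then upload the resulting archive to the S3 location you give the compilation job.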

## Instance Types
<a name="neo-supported-cloud-instances-types"></a>

You can deploy a SageMaker AI compiled model to one of the cloud instances listed below.


| Instance | Compute type | 
| --- | --- | 
| `ml_c4` | Standard | 
| `ml_c5` | Standard | 
| `ml_m4` | Standard | 
| `ml_m5` | Standard | 
| `ml_p2` | Accelerated computing | 
| `ml_p3` | Accelerated computing | 
| `ml_g4dn` | Accelerated computing | 

For the vCPUs, memory, and hourly price of each instance type, see [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/).

**Note**  
When you compile a PyTorch model for `ml_*` instances, use the **Compiler options** field in **Output configuration** to provide the correct data type (`dtype`) of the model's input.  
The default is `"float32"`.
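
In the `CreateCompilationJob` API, the **Compiler options** field corresponds to `OutputConfig.CompilerOptions`, which takes a JSON-encoded string. The sketch below builds such an `OutputConfig` dictionary with the standard `json` module. The helper name is hypothetical, the S3 location is a placeholder, and the `"int64"` entry in the allowed list is an assumption to verify against the API reference for your framework version.

```python
import json

# "float32" is the documented default. "int64" is included as an
# assumption; check the CreateCompilationJob API reference.
SUPPORTED_DTYPES = ("float32", "int64")


def pytorch_output_config(s3_output_location, target_device, dtype="float32"):
    """Build an OutputConfig dict for compiling a PyTorch model for ml_* targets."""
    if not target_device.startswith("ml_"):
        raise ValueError("this sketch only covers ml_* cloud targets")
    if dtype not in SUPPORTED_DTYPES:
        raise ValueError(f"unsupported dtype: {dtype}")
    return {
        "S3OutputLocation": s3_output_location,
        "TargetDevice": target_device,
        # CompilerOptions is passed to the API as a JSON-encoded string.
        "CompilerOptions": json.dumps({"dtype": dtype}),
    }
```

You could then pass the result as `OutputConfig=` to Boto3's `create_compilation_job`, together with the job's other required parameters.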

## AWS Inferentia
<a name="neo-supported-inferentia"></a>

SageMaker Neo supports the following deep learning frameworks for Inf1.


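| Framework | Framework version | Model version | Models | Model format (packaged in \*.tar.gz) | Toolkit | 
| --- | --- | --- | --- | --- | --- | 
| MXNet | 1.5 or 1.8 | Supports 1.8, 1.5, and earlier | Image classification, object detection, semantic segmentation, pose estimation, activity recognition | One symbol file (.json) and one parameter file (.params) | GluonCV v0.8.0 | 
| PyTorch | 1.7, 1.8, or 1.9 | Supports 1.9 and earlier | Image classification | One model definition file (.pt or .pth) with input dtype float32 |  | 
| TensorFlow | 1.15 or 2.5 | Supports 2.5, 1.15, and earlier | Image classification | For saved models, one .pb or .pbtxt file and a variables directory that contains the variables. For frozen models, only one .pb or .pbtxt file. |  | 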

**Note**  
"Model version" is the version of the framework used to train and export the model.

You can deploy a SageMaker Neo-compiled model to AWS Inferentia-based Amazon EC2 Inf1 instances. AWS Inferentia is Amazon's first custom silicon chip designed to accelerate deep learning. Currently, you can use the `ml_inf1` instance to deploy your compiled models.

### AWS Inferentia2 and AWS Trainium
<a name="neo-supported-inferentia-trainium"></a>

Currently, you can deploy a SageMaker Neo-compiled model to AWS Inferentia2-based Amazon EC2 Inf2 instances (in the US East (Ohio) Region) and to AWS Trainium-based Amazon EC2 Trn1 instances (in the US East (N. Virginia) Region). For more information about the models supported on these instances, see [Model Architecture Fit Guidelines](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/arch/model-architecture-fit.html) in the AWS Neuron documentation and the examples in the [Neuron GitHub repository](https://github.com/aws-neuron/aws-neuron-sagemaker-samples).

# Deploy a Model
<a name="neo-deployment-hosting-services"></a>

To deploy an Amazon SageMaker Neo-compiled model to an HTTPS endpoint, you must configure and create the endpoint for the model with Amazon SageMaker AI hosting services. Currently, developers can use Amazon SageMaker APIs to deploy models onto ml.c5, ml.c4, ml.m5, ml.m4, ml.p3, ml.p2, and ml.inf1 instances.

For [Inferentia](https://aws.amazon.com/machine-learning/inferentia/) and [Trainium](https://aws.amazon.com/machine-learning/trainium/) instances, models must be compiled specifically for those instances. Models compiled for other instance types are not guaranteed to work on Inferentia or Trainium instances.

When you deploy a compiled model, you must use the same instance type that you used as the compilation target. This creates a SageMaker AI endpoint that you can use to perform inference. You can deploy a Neo-compiled model with any of the following: the [Amazon SageMaker AI SDK for Python](https://sagemaker.readthedocs.io/en/stable/), the [SDK for Python (Boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html), the [AWS Command Line Interface](https://docs.aws.amazon.com/cli/latest/reference/), or the [SageMaker console](https://console.aws.amazon.com/sagemaker).
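
The requirement that the endpoint instance match the compilation target can be checked before you call any deployment API. The helpers below are hypothetical (they are not part of any AWS SDK) and assume that instance types follow the `ml.<family>.<size>` pattern and that the matching Neo target device is `ml_<family>`, as in the instance tables above.

```python
def target_device_for(instance_type):
    """Map an endpoint instance type such as 'ml.c5.4xlarge' to 'ml_c5'."""
    parts = instance_type.split(".")
    if len(parts) != 3 or parts[0] != "ml":
        raise ValueError(f"unexpected instance type: {instance_type!r}")
    return f"ml_{parts[1]}"


def check_deployable(instance_type, compiled_target):
    """Raise ValueError if the endpoint instance family differs from the compilation target."""
    actual = target_device_for(instance_type)
    if actual != compiled_target:
        raise ValueError(
            f"model was compiled for {compiled_target}, "
            f"but the endpoint uses {instance_type} ({actual})"
        )
```

For example, `check_deployable("ml.c5.4xlarge", "ml_c5")` passes silently, while `check_deployable("ml.c5.xlarge", "ml_p3")` raises before any endpoint is created.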

**Note**  
To deploy a model with the AWS CLI, the console, or Boto3, see [Neo Inference Container Images](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html) to select the inference image URI for your primary container.

**Topics**
+ [Prerequisites](neo-deployment-hosting-services-prerequisites.md)
+ [Deploy a Compiled Model Using SageMaker SDK](neo-deployment-hosting-services-sdk.md)
+ [Deploy a Compiled Model Using Boto3](neo-deployment-hosting-services-boto3.md)
+ [Deploy a Compiled Model Using the AWS CLI](neo-deployment-hosting-services-cli.md)
+ [Deploy a Compiled Model Using the Console](neo-deployment-hosting-services-console.md)

# Prerequisites
<a name="neo-deployment-hosting-services-prerequisites"></a>

**Note**  
Follow the instructions in this section if you compiled your model with the AWS SDK for Python (Boto3), the AWS CLI, or the SageMaker AI console.

To create a SageMaker Neo-compiled model, you need the following:

1. A Docker image Amazon ECR URI. You can select one that meets your needs from [this list](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html).

1. An entry point script file:

   1. **For PyTorch and MXNet models:**

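      *If you trained your model with SageMaker AI*, the training script must implement the functions described below. The training script serves as the entry point script during inference. In the example detailed in [MNIST Training, Compilation and Deployment with MXNet Module and SageMaker Neo](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_neo_compilation_jobs/mxnet_mnist/mxnet_mnist_neo.html), the training script (`mnist.py`) implements the required functions.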

      *If you did not train your model with SageMaker AI*, you must provide an entry point script (`inference.py`) file that can be used at inference time. Depending on the framework (MXNet or PyTorch), the location of the inference script must follow the SageMaker Python SDK [Model Directory Structure for MXNet](https://sagemaker.readthedocs.io/en/stable/frameworks/mxnet/using_mxnet.html#model-directory-structure) or [Model Directory Structure for PyTorch](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#model-directory-structure).

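      When you use Neo inference-optimized container images with **PyTorch** and **MXNet** on CPU and GPU instance types, the inference script must implement the following functions:
      + `model_fn`: Loads the model. (Optional)
      + `input_fn`: Converts the incoming request payload into a numpy array.
      + `predict_fn`: Performs the prediction.
      + `output_fn`: Converts the prediction output into the response payload.
      + Alternatively, you can define `transform_fn` to combine `input_fn`, `predict_fn`, and `output_fn`.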

      The following are examples of an `inference.py` script inside a directory named `code` (`code/inference.py`) for **PyTorch and MXNet (Gluon and Module)**. The examples first load the model and then run it on image data on a GPU.

------
#### [ MXNet Module ]

      ```
      import numpy as np
      import json
      import mxnet as mx
      import neomx  # noqa: F401
      from collections import namedtuple
      
      Batch = namedtuple('Batch', ['data'])
      
      # Change the context to mx.cpu() if deploying to a CPU endpoint
      ctx = mx.gpu()
      
      def model_fn(model_dir):
          # The compiled model artifacts are saved with the prefix 'compiled'
          sym, arg_params, aux_params = mx.model.load_checkpoint('compiled', 0)
          mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
          exe = mod.bind(for_training=False,
                         data_shapes=[('data', (1,3,224,224))],
                         label_shapes=mod._label_shapes)
          mod.set_params(arg_params, aux_params, allow_missing=True)
          
          # Run warm-up inference on empty data during model load (required for GPU)
          data = mx.nd.empty((1,3,224,224), ctx=ctx)
          mod.forward(Batch([data]))
          return mod
      
      
      def transform_fn(mod, image, input_content_type, output_content_type):
          # pre-processing
          decoded = mx.image.imdecode(image)
          resized = mx.image.resize_short(decoded, 224)
          cropped, crop_info = mx.image.center_crop(resized, (224, 224))
          normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                        mean=mx.nd.array([0.485, 0.456, 0.406]),
                                        std=mx.nd.array([0.229, 0.224, 0.225]))
          transposed = normalized.transpose((2, 0, 1))
          batchified = transposed.expand_dims(axis=0)
          casted = batchified.astype(dtype='float32')
          processed_input = casted.as_in_context(ctx)
      
          # prediction/inference
          mod.forward(Batch([processed_input]))
      
          # post-processing
          prob = mod.get_outputs()[0].asnumpy().tolist()
          prob_json = json.dumps(prob)
          return prob_json, output_content_type
      ```

------
#### [ MXNet Gluon ]

      ```
      import numpy as np
      import json
      import mxnet as mx
      import neomx  # noqa: F401
      
      # Change the context to mx.cpu() if deploying to a CPU endpoint
      ctx = mx.gpu()
      
      def model_fn(model_dir):
          # The compiled model artifacts are saved with the prefix 'compiled'
          block = mx.gluon.nn.SymbolBlock.imports('compiled-symbol.json',['data'],'compiled-0000.params', ctx=ctx)
          
          # Hybridize the model & pass required options for Neo: static_alloc=True & static_shape=True
          block.hybridize(static_alloc=True, static_shape=True)
          
          # Run warm-up inference on empty data during model load (required for GPU)
          data = mx.nd.empty((1,3,224,224), ctx=ctx)
          warm_up = block(data)
          return block
      
      
      def input_fn(image, input_content_type):
          # pre-processing
          decoded = mx.image.imdecode(image)
          resized = mx.image.resize_short(decoded, 224)
          cropped, crop_info = mx.image.center_crop(resized, (224, 224))
          normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                        mean=mx.nd.array([0.485, 0.456, 0.406]),
                                        std=mx.nd.array([0.229, 0.224, 0.225]))
          transposed = normalized.transpose((2, 0, 1))
          batchified = transposed.expand_dims(axis=0)
          casted = batchified.astype(dtype='float32')
          processed_input = casted.as_in_context(ctx)
          return processed_input
      
      
      def predict_fn(processed_input_data, block):
          # prediction/inference
          prediction = block(processed_input_data)
          return prediction
      
      def output_fn(prediction, output_content_type):
          # post-processing
          prob = prediction.asnumpy().tolist()
          prob_json = json.dumps(prob)
          return prob_json, output_content_type
      ```

------
#### [ PyTorch 1.4 and Older ]

      ```
      import os
      import torch
      import torch.nn.parallel
      import torch.optim
      import torch.utils.data
      import torch.utils.data.distributed
      import torchvision.transforms as transforms
      from PIL import Image
      import io
      import json
      import pickle
      
      
      def model_fn(model_dir):
          """Load the model and return it.
          Providing this function is optional.
          There is a default model_fn available which will load the model
          compiled using SageMaker Neo. You can override it here.
      
          Keyword arguments:
          model_dir -- the directory path where the model artifacts are present
          """
      
          # The compiled model is saved as "compiled.pt"
          model_path = os.path.join(model_dir, 'compiled.pt')
          with torch.neo.config(model_dir=model_dir, neo_runtime=True):
              model = torch.jit.load(model_path)
              device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
              model = model.to(device)
      
          # We recommend that you run warm-up inference during model load
          sample_input_path = os.path.join(model_dir, 'sample_input.pkl')
          with open(sample_input_path, 'rb') as input_file:
              model_input = pickle.load(input_file)
          if torch.is_tensor(model_input):
              model_input = model_input.to(device)
              model(model_input)
          elif isinstance(model_input, tuple):
              model_input = tuple(inp.to(device) for inp in model_input if torch.is_tensor(inp))
              model(*model_input)
          else:
              print("Only supports a torch tensor or a tuple of torch tensors")
          # Return the model in every case, not only when warm-up input is unsupported
          return model
      
      
      def transform_fn(model, request_body, request_content_type,
                       response_content_type):
          """Run prediction and return the output.
          The function
          1. Pre-processes the input request
          2. Runs prediction
          3. Post-processes the prediction output.
          """
          # preprocess
          decoded = Image.open(io.BytesIO(request_body))
          preprocess = transforms.Compose([
              transforms.Resize(256),
              transforms.CenterCrop(224),
              transforms.ToTensor(),
              transforms.Normalize(
                  mean=[
                      0.485, 0.456, 0.406], std=[
                      0.229, 0.224, 0.225]),
          ])
          normalized = preprocess(decoded)
          batchified = normalized.unsqueeze(0)
          # predict
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          batchified = batchified.to(device)
          output = model.forward(batchified)
      
          return json.dumps(output.cpu().numpy().tolist()), response_content_type
      ```

------
#### [ PyTorch 1.5 and Newer ]

      ```
      import os
      import torch
      import torch.nn.parallel
      import torch.optim
      import torch.utils.data
      import torch.utils.data.distributed
      import torchvision.transforms as transforms
      from PIL import Image
      import io
      import json
      import pickle
      
      
      def model_fn(model_dir):
          """Load the model and return it.
          Providing this function is optional.
          There is a default_model_fn available, which will load the model
          compiled using SageMaker Neo. You can override the default here.
          The model_fn only needs to be defined if your model needs extra
          steps to load, and can otherwise be left undefined.
      
          Keyword arguments:
          model_dir -- the directory path where the model artifacts are present
          """
      
          # The compiled model is saved as "model.pt"
          model_path = os.path.join(model_dir, 'model.pt')
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          model = torch.jit.load(model_path, map_location=device)
          model = model.to(device)
      
          return model
      
      
      def transform_fn(model, request_body, request_content_type,
                          response_content_type):
          """Run prediction and return the output.
          The function
          1. Pre-processes the input request
          2. Runs prediction
          3. Post-processes the prediction output.
          """
          # preprocess
          decoded = Image.open(io.BytesIO(request_body))
          preprocess = transforms.Compose([
                                      transforms.Resize(256),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize(
                                          mean=[
                                              0.485, 0.456, 0.406], std=[
                                              0.229, 0.224, 0.225]),
                                          ])
          normalized = preprocess(decoded)
          batchified = normalized.unsqueeze(0)
          
          # predict
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          batchified = batchified.to(device)
          output = model.forward(batchified)
          return json.dumps(output.cpu().numpy().tolist()), response_content_type
      ```

------

   1. **For Inf1 instances or ONNX, XGBoost, and Keras container images**

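      For all other Neo inference-optimized container images, or for Inferentia instance types, the entry point script must implement the following functions for the Neo Deep Learning Runtime:
      + `neo_preprocess`: Converts the incoming request payload into a numpy array.
      + `neo_postprocess`: Converts the prediction output from the Neo Deep Learning Runtime into the response body.

      **Note**  
      These two functions do not use any functionality of MXNet, PyTorch, or TensorFlow.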

      For examples of how to use these functions, see [Neo Model Compilation Sample Notebooks](https://docs.aws.amazon.com//sagemaker/latest/dg/neo.html#neo-sample-notebooks).

   1. **For TensorFlow models**

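      If your model requires custom pre- and post-processing logic before data is sent to it, you must specify an entry point script file, `inference.py`, that can be used at inference time. The script must implement either a pair of `input_handler` and `output_handler` functions or a single `handler` function.

      **Note**  
      If a `handler` function is implemented, `input_handler` and `output_handler` are ignored.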

      The following is an example `inference.py` script that you can use with the compiled model to perform custom pre- and post-processing for an image classification model. The SageMaker AI client sends the image file with the `application/x-image` content type to the `input_handler` function, which converts it to JSON. The converted image is then sent to the [TensorFlow Model Server (TFX)](https://www.tensorflow.org/tfx/serving/api_rest) through its REST API.

      ```
      import io
      import json
      import numpy as np
      from PIL import Image
      
      def input_handler(data, context):
          """ Pre-process request input before it is sent to TensorFlow Serving REST API
          
          Args:
          data (obj): the request data, in format of dict or string
          context (Context): an object containing request and configuration details
          
          Returns:
          (dict): a JSON-serializable dict that contains request body and headers
          """
          f = data.read()
          f = io.BytesIO(f)
          image = Image.open(f).convert('RGB')
          batch_size = 1
          image = np.asarray(image.resize((512, 512)))
          image = np.concatenate([image[np.newaxis, :, :]] * batch_size)
          body = json.dumps({"signature_name": "serving_default", "instances": image.tolist()})
          return body
      
      def output_handler(data, context):
          """Post-process TensorFlow Serving output before it is returned to the client.
          
          Args:
          data (obj): the TensorFlow serving response
          context (Context): an object containing request and configuration details
          
          Returns:
          (bytes, string): data to return to client, response content type
          """
          if data.status_code != 200:
              raise ValueError(data.content.decode('utf-8'))
      
          response_content_type = context.accept_header
          prediction = data.content
          return prediction, response_content_type
      ```

      If there is no custom pre- or post-processing, the SageMaker AI client converts the image file to JSON in a similar way before sending it to the SageMaker AI endpoint.

      For more information, see [Deploying to TensorFlow Serving Endpoints in the SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/deploying_tensorflow_serving.html#providing-python-scripts-for-pre-pos-processing).

1. The Amazon S3 bucket URI that contains the compiled model artifacts.
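
The PyTorch and MXNet function list in step 2 above does not spell out how the container chains those functions. The sketch below is a simplified, hypothetical dispatcher, not the actual SageMaker inference toolkit code. It assumes, as that list describes, that a defined `transform_fn` replaces the `input_fn`, `predict_fn`, `output_fn` chain, and it uses a stand-in model that doubles its inputs so that it runs without MXNet or PyTorch.

```python
import json


def model_fn(model_dir):
    # Stand-in "model" that doubles each input value. A real script would
    # load the compiled artifacts from model_dir here.
    return lambda values: [2 * v for v in values]


def input_fn(request_body, content_type):
    # This sketch only accepts JSON lists.
    if content_type != "application/json":
        raise ValueError(f"unsupported content type: {content_type}")
    return json.loads(request_body)


def predict_fn(data, model):
    return model(data)


def output_fn(prediction, accept):
    return json.dumps(prediction), accept


def handle_request(model, request_body, content_type, accept, transform_fn=None):
    """Hypothetical dispatcher: transform_fn, if given, replaces the 3-step chain."""
    if transform_fn is not None:
        return transform_fn(model, request_body, content_type, accept)
    data = input_fn(request_body, content_type)
    prediction = predict_fn(data, model)
    return output_fn(prediction, accept)


if __name__ == "__main__":
    model = model_fn("/opt/ml/model")
    print(handle_request(model, "[1, 2, 3]", "application/json", "application/json"))
    # → ('[2, 4, 6]', 'application/json')
```

The argument orders mirror the MXNet Gluon example above (`predict_fn(data, model)`, `output_fn(prediction, content_type)`), which is why the same example can also be written as a single `transform_fn`.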

# Deploy a Compiled Model Using SageMaker SDK
<a name="neo-deployment-hosting-services-sdk"></a>

If you compiled your model with the AWS SDK for Python (Boto3), the AWS CLI, or the Amazon SageMaker AI console, you must first satisfy the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section. Follow the use case below that matches how you compiled your model to deploy a SageMaker Neo-compiled model.

**Topics**
+ [If you compiled your model using the SageMaker SDK](#neo-deployment-hosting-services-sdk-deploy-sm-sdk)
+ [If you compiled your model using MXNet or PyTorch](#neo-deployment-hosting-services-sdk-deploy-sm-boto3)
+ [If you compiled your model using Boto3, the SageMaker console, or the CLI for TensorFlow](#neo-deployment-hosting-services-sdk-deploy-sm-boto3-tensorflow)

## If you compiled your model using the SageMaker SDK
<a name="neo-deployment-hosting-services-sdk-deploy-sm-sdk"></a>

The [sagemaker.Model](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html?highlight=sagemaker.Model) object handle for the compiled model provides the [deploy()](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html?highlight=sagemaker.Model#sagemaker.model.Model.deploy) function, which creates an endpoint that serves inference requests. The function lets you set the number and type of instances used for the endpoint. You must choose the instance type that you compiled the model for. For example, for the job compiled in the [Compile a Model (Amazon SageMaker SDK)](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-job-compilation-sagemaker-sdk.html) section, this is `ml_c5`.

```
predictor = compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.c5.4xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)
```

## If you compiled your model using MXNet or PyTorch
<a name="neo-deployment-hosting-services-sdk-deploy-sm-boto3"></a>

Create the SageMaker AI model and deploy it with the deploy() API of the framework-specific model class: [MXNetModel](https://sagemaker.readthedocs.io/en/stable/frameworks/mxnet/sagemaker.mxnet.html?highlight=MXNetModel#mxnet-model) for MXNet and [PyTorchModel](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/sagemaker.pytorch.html?highlight=PyTorchModel#sagemaker.pytorch.model.PyTorchModel) for PyTorch. When you create and deploy the SageMaker AI model, set the `MMS_DEFAULT_RESPONSE_TIMEOUT` environment variable to `500`, set the `entry_point` parameter to the inference script (`inference.py`), and set the `source_dir` parameter to the directory that contains the inference script (`code`). To prepare the inference script (`inference.py`), follow the prerequisites steps.

The following examples show how to use these classes to deploy a compiled model with the SageMaker AI SDK for Python.

------
#### [ MXNet ]

```
from sagemaker.mxnet import MXNetModel

# Create SageMaker model and deploy an endpoint
sm_mxnet_compiled_model = MXNetModel(
    model_data='insert S3 path of compiled MXNet model archive',
    role='AmazonSageMaker-ExecutionRole',
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.8.0',
    py_version='py3',
    image_uri='insert appropriate ECR Image URI for MXNet',
    env={'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'},
)

# Replace the example instance_type below with your preferred instance_type
predictor = sm_mxnet_compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.p3.2xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)
```

------
#### [ PyTorch 1.4 and Older ]

```
from sagemaker.pytorch import PyTorchModel

# Create SageMaker model and deploy an endpoint
sm_pytorch_compiled_model = PyTorchModel(
    model_data='insert S3 path of compiled PyTorch model archive',
    role='AmazonSageMaker-ExecutionRole',
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.4.0',
    py_version='py3',
    image_uri='insert appropriate ECR Image URI for PyTorch',
    env={'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'},
)

# Replace the example instance_type below with your preferred instance_type
predictor = sm_pytorch_compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.p3.2xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)
```

------
#### [ PyTorch 1.5 and Newer ]

```
from sagemaker.pytorch import PyTorchModel

# Create SageMaker model and deploy an endpoint
sm_pytorch_compiled_model = PyTorchModel(
    model_data='insert S3 path of compiled PyTorch model archive',
    role='AmazonSageMaker-ExecutionRole',
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.5',
    py_version='py3',
    image_uri='insert appropriate ECR Image URI for PyTorch',
)

# Replace the example instance_type below with your preferred instance_type
predictor = sm_pytorch_compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.p3.2xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)
```

------

**Note**  
The `AmazonSageMakerFullAccess` and `AmazonS3ReadOnlyAccess` policies must be attached to the `AmazonSageMaker-ExecutionRole` IAM role.

## If you compiled your model using Boto3, the SageMaker console, or the CLI for TensorFlow
<a name="neo-deployment-hosting-services-sdk-deploy-sm-boto3-tensorflow"></a>

Construct a `TensorFlowModel` object, then call deploy:

```
from sagemaker.tensorflow import TensorFlowModel

role = 'AmazonSageMaker-ExecutionRole'
model_path = 'S3 path for model file'
framework_image = 'inference container image URI'
tf_model = TensorFlowModel(model_data=model_path,
                           framework_version='1.15.3',
                           role=role,
                           image_uri=framework_image)
instance_type = 'ml.c5.xlarge'
predictor = tf_model.deploy(instance_type=instance_type,
                            initial_instance_count=1)
```

For more information, see [Deploying directly from model artifacts](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/deploying_tensorflow_serving.html#deploying-directly-from-model-artifacts).

You can select a Docker image Amazon ECR URI that meets your needs from [this list](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html).

For more information on how to construct a `TensorFlowModel` object, see the [SageMaker SDK](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/sagemaker.tensorflow.html#tensorflow-serving-model).

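**Note**  
If you deploy your model to a GPU, the first inference request can have high latency, because the optimized compute kernels are built during the first inference request. We recommend that you create a warm-up file of inference requests and store it alongside your model file before sending it to TFX. This is known as "warming up" the model.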

The following code snippet shows how to create the warm-up file for the image classification example in the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section:

```
import tensorflow as tf
from tensorflow_serving.apis import classification_pb2
from tensorflow_serving.apis import inference_pb2
from tensorflow_serving.apis import model_pb2
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_log_pb2
from tensorflow_serving.apis import regression_pb2
import numpy as np

with tf.io.TFRecordWriter("tf_serving_warmup_requests") as writer:
    img = np.random.uniform(0, 1, size=[224, 224, 3]).astype(np.float32)
    img = np.expand_dims(img, axis=0)
    test_data = np.repeat(img, 1, axis=0)
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'compiled_models'
    request.model_spec.signature_name = 'serving_default'
    request.inputs['Placeholder:0'].CopyFrom(tf.compat.v1.make_tensor_proto(test_data, shape=test_data.shape, dtype=tf.float32))
    log = prediction_log_pb2.PredictionLog(
        predict_log=prediction_log_pb2.PredictLog(request=request))
    writer.write(log.SerializeToString())
```

For more information on how to warm up your model, see the [TensorFlow TFX page](https://www.tensorflow.org/tfx/serving/saved_model_warmup).

# Deploy a Compiled Model Using Boto3
<a name="neo-deployment-hosting-services-boto3"></a>

If you compiled your model with the AWS SDK for Python (Boto3), the AWS CLI, or the Amazon SageMaker AI console, you must first satisfy the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section. Follow the steps below to create and deploy a SageMaker Neo-compiled model with the [Amazon Web Services SDK for Python (Boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html).

**Topics**
+ [Deploy the Model](#neo-deployment-hosting-services-boto3-steps)

## Deploy the Model
<a name="neo-deployment-hosting-services-boto3-steps"></a>

After you satisfy the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites), use the `create_model`, `create_endpoint_config`, and `create_endpoint` APIs.

The following example shows how to use these APIs to deploy a model compiled with Neo:

```
import boto3
client = boto3.client('sagemaker')

# create sagemaker model
create_model_api_response = client.create_model(
                                    ModelName='my-sagemaker-model',
                                    PrimaryContainer={
                                        'Image': 'insert the ECR Image URI',
                                        'ModelDataUrl': 's3://path/to/model/artifact/model.tar.gz',
                                        'Environment': {}
                                    },
                                    ExecutionRoleArn='ARN for AmazonSageMaker-ExecutionRole'
                            )

print ("create_model API response", create_model_api_response)

# create sagemaker endpoint config
create_endpoint_config_api_response = client.create_endpoint_config(
                                            EndpointConfigName='sagemaker-neomxnet-endpoint-configuration',
                                            ProductionVariants=[
                                                {
                                                    'VariantName': 'provide your variant name',
                                                    'ModelName': 'my-sagemaker-model',
                                                    'InitialInstanceCount': 1,
                                                    'InstanceType': 'provide your instance type here'
                                                },
                                            ]
                                       )

print ("create_endpoint_config API response", create_endpoint_config_api_response)

# create sagemaker endpoint
create_endpoint_api_response = client.create_endpoint(
                                    EndpointName='provide your endpoint name',
                                    EndpointConfigName='sagemaker-neomxnet-endpoint-configuration',  # must match the config created above
                                )

print ("create_endpoint API response", create_endpoint_api_response)
```

**Note**  
The `AmazonSageMakerFullAccess` and `AmazonS3ReadOnlyAccess` policies must be attached to the `AmazonSageMaker-ExecutionRole` IAM role.

For the full syntax of the `create_model`, `create_endpoint_config`, and `create_endpoint` APIs, see [create_model](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model), [create_endpoint_config](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config), and [create_endpoint](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint), respectively.
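
The endpoint configuration refers to the model by name, and the endpoint refers to the configuration by name, so a mismatch between those names makes the later call fail. One way to keep them consistent is to derive all three names from a single base name, as in the hypothetical helper below, which only builds the keyword-argument dictionaries and makes no AWS calls. The `-model`, `-config`, and `-endpoint` suffixes and the `AllTraffic` variant name are arbitrary choices for this sketch.

```python
def build_deployment_requests(name, image_uri, model_data_url, role_arn,
                              instance_type, environment=None):
    """Return kwargs for create_model, create_endpoint_config, and create_endpoint."""
    model_name = f"{name}-model"
    config_name = f"{name}-config"
    model_req = {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,
            "ModelDataUrl": model_data_url,
            "Environment": dict(environment or {}),
        },
        "ExecutionRoleArn": role_arn,
    }
    config_req = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,  # must match ModelName above
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
        }],
    }
    endpoint_req = {
        "EndpointName": f"{name}-endpoint",
        "EndpointConfigName": config_name,  # must match EndpointConfigName above
    }
    return model_req, config_req, endpoint_req
```

With real values, the three dictionaries can be unpacked into the Boto3 client in order: `client.create_model(**model_req)`, then `client.create_endpoint_config(**config_req)`, then `client.create_endpoint(**endpoint_req)`.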

If you did not train your model with SageMaker AI, specify the following environment variables:

------
#### [ MXNet and PyTorch ]

```
"Environment": {
    "SAGEMAKER_PROGRAM": "inference.py",
    "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
    "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
    "SAGEMAKER_REGION": "insert your region",
    "MMS_DEFAULT_RESPONSE_TIMEOUT": "500"
}
```

------
#### [ TensorFlow ]

```
"Environment": {
    "SAGEMAKER_PROGRAM": "inference.py",
    "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
    "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
    "SAGEMAKER_REGION": "insert your region"
}
```

------

If you trained your model using SageMaker AI, set the `SAGEMAKER_SUBMIT_DIRECTORY` environment variable to the full Amazon S3 bucket URI that contains the training script.
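The environment-variable variants above differ only in whether the model was trained with SageMaker AI and, for models trained elsewhere, whether `MMS_DEFAULT_RESPONSE_TIMEOUT` is set. The following is a minimal sketch that assembles the `Environment` map for `create_model`. The function name and the lowercase framework identifiers are hypothetical; the keys and values are copied from the examples in this section.

```python
def build_neo_environment(framework, region=None, trained_with_sagemaker=False,
                          submit_directory=None):
    """Return the PrimaryContainer "Environment" map for a Neo-compiled model.

    framework: "mxnet", "pytorch", or "tensorflow" (hypothetical identifiers).
    region: AWS Region, required when the model was NOT trained in SageMaker AI.
    submit_directory: full S3 URI of the *.tar.gz containing the training
        script, required when the model WAS trained in SageMaker AI.
    """
    framework = framework.lower()
    if framework not in ("mxnet", "pytorch", "tensorflow"):
        raise ValueError(f"unsupported framework: {framework}")

    if trained_with_sagemaker:
        # Only the S3 location of the training script is needed.
        if not submit_directory:
            raise ValueError("submit_directory is required for SageMaker-trained models")
        return {"SAGEMAKER_SUBMIT_DIRECTORY": submit_directory}

    if not region:
        raise ValueError("region is required for models not trained in SageMaker AI")
    env = {
        "SAGEMAKER_PROGRAM": "inference.py",
        "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
        "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
        "SAGEMAKER_REGION": region,
    }
    # Only the MXNet and PyTorch examples set the model-server response timeout.
    if framework in ("mxnet", "pytorch"):
        env["MMS_DEFAULT_RESPONSE_TIMEOUT"] = "500"
    return env
```

For example, `build_neo_environment("pytorch", region="us-west-2")` could be passed as the `Environment` value of `PrimaryContainer` in the `create_model` call.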

# Deploy a Compiled Model Using the AWS CLI
<a name="neo-deployment-hosting-services-cli"></a>

If your model was compiled using the AWS SDK for Python (Boto3), the AWS CLI, or the Amazon SageMaker AI console, you must first satisfy the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites). Follow the steps below to create and deploy a SageMaker Neo-compiled model using the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/).

**Topics**
+ [Deploy the model](#neo-deploy-cli)

## Deploy the model
<a name="neo-deploy-cli"></a>

After you satisfy the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites), use the `create-model`, `create-endpoint-config`, and `create-endpoint` AWS CLI commands. The following examples show how to use these commands to deploy a model compiled with Neo.


### Create a model
<a name="neo-deployment-hosting-services-cli-create-model"></a>

Select an inference image URI from [Neo Inference Container Images](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html), and then use the `create-model` API to create a SageMaker AI model. You can do this in two steps:

1. Create a `create_model.json` file. In the file, specify the model name, the image URI, the path to the `model.tar.gz` file in your Amazon S3 bucket, and your SageMaker AI execution role.

   ```
   {
       "ModelName": "insert model name",
       "PrimaryContainer": {
           "Image": "insert the ECR Image URI",
           "ModelDataUrl": "insert S3 archive URL",
           "Environment": {"See details below"}
       },
       "ExecutionRoleArn": "ARN for AmazonSageMaker-ExecutionRole"
   }
   ```

   If you trained your model using SageMaker AI, specify the following environment variable:

   ```
   "Environment": {
       "SAGEMAKER_SUBMIT_DIRECTORY" : "[Full S3 path for *.tar.gz file containing the training script]"
   }
   ```

   If you did not train your model using SageMaker AI, specify the following environment variables:

------
#### [ MXNet and PyTorch ]

   ```
   "Environment": {
       "SAGEMAKER_PROGRAM": "inference.py",
       "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
       "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
       "SAGEMAKER_REGION": "insert your region",
       "MMS_DEFAULT_RESPONSE_TIMEOUT": "500"
   }
   ```

------
#### [ TensorFlow ]

   ```
   "Environment": {
       "SAGEMAKER_PROGRAM": "inference.py",
       "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
       "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
       "SAGEMAKER_REGION": "insert your region"
   }
   ```

------
**Note**  
The `AmazonSageMakerFullAccess` and `AmazonS3ReadOnlyAccess` policies must be attached to the `AmazonSageMaker-ExecutionRole` IAM role.

1. Run the following command:

   ```
   aws sagemaker create-model --cli-input-json file://create_model.json
   ```

   For the full syntax of the `create-model` API, see [`create-model`](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-model.html).
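Because `--cli-input-json` requires valid JSON, the placeholder `"Environment"` entry in the template above must be replaced with a real map before you run the command. The following minimal sketch produces the file content with Python's standard `json` module. The function name is hypothetical, and any names, image URIs, S3 locations, or role ARNs you pass in are your own values.

```python
import json


def create_model_request_json(model_name, image_uri, model_data_url,
                              execution_role_arn, environment):
    """Return the JSON text for `aws sagemaker create-model --cli-input-json`."""
    request = {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,              # Neo inference image URI
            "ModelDataUrl": model_data_url,  # S3 URI of the compiled model.tar.gz
            "Environment": environment,      # one of the Environment maps shown above
        },
        "ExecutionRoleArn": execution_role_arn,
    }
    return json.dumps(request, indent=4)
```

Write the returned text to `create_model.json` (for example, with `open("create_model.json", "w").write(text)`), and then run the `create-model` command shown above.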

### Create an endpoint configuration
<a name="neo-deployment-hosting-services-cli-create-endpoint-config"></a>

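After you create the SageMaker AI model, use the `create-endpoint-config` API to create an endpoint configuration. To do this, create a JSON file that contains your endpoint configuration specification. For example, you can use the following template and save it as `create_config.json`. Set `ModelName` to the model name you used in `create_model.json`.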

```
{
    "EndpointConfigName": "<provide your endpoint config name>",
    "ProductionVariants": [
        {
            "VariantName": "<provide your variant name>",
            "ModelName": "my-sagemaker-model",
            "InitialInstanceCount": 1,
            "InstanceType": "<provide your instance type here>",
            "InitialVariantWeight": 1.0
        }
    ]
}
```

Now run the following AWS CLI command to create the endpoint configuration:

```
aws sagemaker create-endpoint-config --cli-input-json file://create_config.json
```

For the full syntax of the `create-endpoint-config` API, see [`create-endpoint-config`](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-endpoint-config.html).

### Create an endpoint
<a name="neo-deployment-hosting-services-cli-create-endpoint"></a>

After you create the endpoint configuration, use the `create-endpoint` API to create an endpoint:

```
aws sagemaker create-endpoint --endpoint-name '<provide your endpoint name>' --endpoint-config-name '<insert your endpoint config name>'
```

For the full syntax of the `create-endpoint` API, see [`create-endpoint`](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-endpoint.html).
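Endpoint creation is asynchronous: `create-endpoint` returns immediately, and the inference examples later on this page assume that the endpoint has reached the `InService` status. The following is a minimal polling sketch that works with any client exposing `describe_endpoint`, such as `boto3.client("sagemaker")`. The function name, poll interval, and attempt limit are hypothetical choices.

```python
import time


def wait_until_in_service(sagemaker_client, endpoint_name,
                          poll_seconds=30, max_attempts=120):
    """Poll DescribeEndpoint until the endpoint is InService.

    Raises RuntimeError if the endpoint reaches the Failed status, and
    TimeoutError if it is still not InService after max_attempts polls.
    """
    for _ in range(max_attempts):
        description = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            # DescribeEndpoint includes FailureReason when creation fails.
            reason = description.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        time.sleep(poll_seconds)  # still Creating (or Updating); check again later
    raise TimeoutError(
        f"Endpoint {endpoint_name} was not InService after {max_attempts} polls"
    )
```

Boto3 also provides a built-in waiter for this, `client.get_waiter("endpoint_in_service")`, and the AWS CLI offers `aws sagemaker wait endpoint-in-service --endpoint-name ...`. The hand-written loop only makes the status transitions explicit.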

# Deploy a Compiled Model Using the Console
<a name="neo-deployment-hosting-services-console"></a>

If your model was compiled using the AWS SDK for Python (Boto3), the AWS CLI, or the Amazon SageMaker AI console, you must first satisfy the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites). Follow the steps below to create and deploy a SageMaker AI Neo-compiled model using the SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

**Topics**
+ [Deploy the model](#deploy-the-model-console-steps)

## Deploy the model
<a name="deploy-the-model-console-steps"></a>

After you satisfy the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites), use the following steps to deploy a model compiled with Neo.

1. Under the **Inference** group, choose **Models**, and then choose **Create model**. On the **Create model** page, fill in the **Model name** and **IAM role** fields and, if needed, the optional **VPC** field.  
![\[Create a Neo model for inference\]](http://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/images/create-pipeline-model.png)

1. To add information about the container used to deploy the model, choose **Add container**, and then choose **Next**. Fill in the **Container input options**, **Location of inference code image**, and **Location of model artifacts** fields, and optionally the **Container host name** and **Environment variables** fields.  
![\[Create a Neo model for inference\]](http://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/images/neo-deploy-console-container-definition.png)

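1. To deploy a Neo-compiled model, enter the following:
   + **Container input options**: Choose **Provide model artifacts and inference image**.
   + **Location of inference code image**: Choose the inference image URI from [Neo Inference Container Images](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html), depending on the AWS Region and the kind of application.
   + **Location of model artifacts**: Enter the Amazon S3 bucket URI of the compiled model artifact generated by the Neo compilation API.
   + **Environment variables**:
     + Leave this field blank for **SageMaker XGBoost**.
     + If you trained your model using SageMaker AI, set the `SAGEMAKER_SUBMIT_DIRECTORY` environment variable to the Amazon S3 bucket URI that contains the training script.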
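     + If you did not train your model using SageMaker AI, specify the following environment variables:

       | Key | Value for MXNet and PyTorch | Value for TensorFlow |
       | --- | --- | --- |
       | `SAGEMAKER_PROGRAM` | `inference.py` | `inference.py` |
       | `SAGEMAKER_SUBMIT_DIRECTORY` | `/opt/ml/model/code` | `/opt/ml/model/code` |
       | `SAGEMAKER_CONTAINER_LOG_LEVEL` | `20` | `20` |
       | `SAGEMAKER_REGION` | `insert your region` | `insert your region` |
       | `MMS_DEFAULT_RESPONSE_TIMEOUT` | `500` | (not set) |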

1. Confirm that the container information is accurate, and then choose **Create model**. On the **Create model landing page**, choose **Create endpoint**.  
![\[Create model landing page\]](http://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/images/neo-deploy-console-create-model-land-page.png)

1. In the **Create and configure endpoint** diagram, specify the **Endpoint name**. For **Attach endpoint configuration**, choose **Create a new endpoint configuration**.  
![\[Neo console create and configure endpoint UI.\]](http://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/images/neo-deploy-console-config-endpoint.png)

1. On the **New endpoint configuration** page, specify the **Endpoint configuration name**.  
![\[Neo console new endpoint configuration UI.\]](http://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/images/neo-deploy-console-new-endpoint-config.png)

1. Choose **Edit** next to the model name, and on the **Edit Production Variant** page specify the correct **Instance type**. The **Instance type** value must match the one specified in your compilation job.  
![\[Neo console edit production variant UI.\]](http://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/images/neo-deploy-console-edit-production-variant.png)

1. Choose **Save**.

1. On the **New endpoint configuration** page, choose **Create endpoint configuration**, and then choose **Create endpoint**.
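The instance type you choose for the production variant must match the target of your compilation job. The following is a minimal sketch of such a check. It assumes that Neo cloud targets are named like `ml_c5` (as returned in `OutputConfig.TargetDevice` by `DescribeCompilationJob`) and that instance types are named like `ml.c5.xlarge`. The helper name is hypothetical, and the check compares only the instance family, not the size.

```python
def instance_type_matches_target(instance_type, target_device):
    """Return True if instance_type (e.g. "ml.c5.xlarge") belongs to the
    family of the compilation target_device (e.g. "ml_c5")."""
    # "ml.c5.xlarge" -> ["ml", "c5"]; the size suffix is ignored.
    instance_family = instance_type.split(".")[:2]
    # "ml_c5" -> ["ml", "c5"]
    target_family = target_device.split("_", 1)
    return instance_family == target_family
```

For example, you could pass `sm.describe_compilation_job(CompilationJobName=...)["OutputConfig"]["TargetDevice"]` as `target_device`, where `sm` is a Boto3 SageMaker client.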

# Inference Requests With a Deployed Service
<a name="neo-requests"></a>

If you followed the instructions in [Deploy a Model](neo-deployment-hosting-services.md), you should have a SageMaker AI endpoint set up and running. Regardless of how you deployed your Neo-compiled model, you can submit inference requests in three ways:

**Topics**
+ [Request Inferences from a Deployed Service (Amazon SageMaker SDK)](neo-requests-sdk.md)
+ [Request Inferences from a Deployed Service (Boto3)](neo-requests-boto3.md)
+ [Request Inferences from a Deployed Service (AWS CLI)](neo-requests-cli.md)

# Request Inferences from a Deployed Service (Amazon SageMaker SDK)
<a name="neo-requests-sdk"></a>

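Use the following code examples to request inferences from your deployed service, based on the framework you used to train your model. The examples for the different frameworks are similar; the main difference is that TensorFlow requires `application/json` as the content type.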

 
## PyTorch and MXNet
<a name="neo-requests-sdk-py-mxnet"></a>

If you are using **PyTorch v1.4 or later** or **MXNet 1.7.0 or later** and have an Amazon SageMaker AI endpoint that is `InService`, you can make inference requests using the `predictor` module of the SageMaker AI SDK for Python.

**Note**  
The API to use depends on your version of the SageMaker AI SDK for Python:  
For version 1.x, use the [`RealTimePredictor`](https://sagemaker.readthedocs.io/en/v1.72.0/api/inference/predictors.html#sagemaker.predictor.RealTimePredictor) and [`RealTimePredictor.predict`](https://sagemaker.readthedocs.io/en/v1.72.0/api/inference/predictors.html#sagemaker.predictor.RealTimePredictor.predict) APIs.  
For version 2.x, use the [`Predictor`](https://sagemaker.readthedocs.io/en/stable/api/inference/predictors.html#sagemaker.predictor.Predictor) and [`Predictor.predict`](https://sagemaker.readthedocs.io/en/stable/api/inference/predictors.html#sagemaker.predictor.Predictor.predict) APIs.

The following code examples show how to use these APIs to send an image for inference.

------
#### [ SageMaker Python SDK v1.x ]

```
from sagemaker.predictor import RealTimePredictor

endpoint = 'insert name of your endpoint here'

# Read image into memory
payload = None
with open("image.jpg", 'rb') as f:
    payload = f.read()

predictor = RealTimePredictor(endpoint=endpoint, content_type='application/x-image')
inference_response = predictor.predict(data=payload)
print (inference_response)
```

------
#### [ SageMaker Python SDK v2.x ]

```
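from sagemaker.predictor import Predictor
from sagemaker.serializers import IdentitySerializer

endpoint = 'insert name of your endpoint here'

# Read image into memory
with open("image.jpg", 'rb') as f:
    payload = f.read()

# Send the raw image bytes with the same content type as the v1.x example
predictor = Predictor(endpoint, serializer=IdentitySerializer(content_type='application/x-image'))
inference_response = predictor.predict(data=payload)
print(inference_response)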
```

------

## TensorFlow
<a name="neo-requests-sdk-py-tf"></a>

The following code example shows how to use the SageMaker Python SDK API to send an image for inference.

```
from sagemaker.predictor import Predictor
from sagemaker.serializers import IdentitySerializer
from PIL import Image
import numpy as np
import json

endpoint = 'insert the name of your endpoint here'
input_file = 'path/to/image'

# Read the image, resize it, scale pixels to about [-1, 1], and add a batch axis
image = Image.open(input_file)
batch_size = 1
image = np.asarray(image.resize((224, 224)))
image = image / 128 - 1
image = np.concatenate([image[np.newaxis, :, :]] * batch_size)
body = json.dumps({"instances": image.tolist()})

# TensorFlow endpoints expect application/json
predictor = Predictor(endpoint, serializer=IdentitySerializer(content_type='application/json'))
inference_response = predictor.predict(data=body)
print(inference_response)
```
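The preprocessing above (resize to 224x224, scale pixel values from [0, 255] to roughly [-1, 1], add a batch axis, wrap in `{"instances": ...}`) is specific to this example model; your model may expect different input. The following minimal sketch factors out the array-to-JSON step so that it can be checked without an endpoint. It assumes that the image has already been resized and converted to an `H x W x C` array; the function name is hypothetical.

```python
import json

import numpy as np


def build_tf_payload(image_array, batch_size=1):
    """Return the JSON request body for a TensorFlow Neo endpoint.

    image_array: H x W x C array of uint8 pixel values (already resized).
    """
    # Scale [0, 255] to roughly [-1, 1], as in the example above.
    image = np.asarray(image_array) / 128 - 1
    # Repeat the image batch_size times along a new leading batch axis.
    batch = np.repeat(image[np.newaxis, ...], batch_size, axis=0)
    return json.dumps({"instances": batch.tolist()})
```

With PIL, for example: `build_tf_payload(Image.open(input_file).resize((224, 224)))`.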

# Request Inferences from a Deployed Service (Boto3)
<a name="neo-requests-boto3"></a>

If you have a SageMaker AI endpoint that is `InService`, you can submit inference requests using the AWS SDK for Python (Boto3) `sagemaker-runtime` client and the [`invoke_endpoint`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime.html#SageMakerRuntime.Client.invoke_endpoint) API. The following code examples show how to send an image for inference.

------
#### [ PyTorch and MXNet ]

```
import boto3
import json

endpoint = 'insert name of your endpoint here'
image = 'path/to/image.jpg'

runtime = boto3.Session().client('sagemaker-runtime')

# Read image into memory
with open(image, 'rb') as f:
    payload = f.read()
# Send image via InvokeEndpoint API
response = runtime.invoke_endpoint(EndpointName=endpoint, ContentType='application/x-image', Body=payload)

# Unpack response
result = json.loads(response['Body'].read().decode())
```

------
#### [ TensorFlow ]

For TensorFlow, submit the input with `application/json` as the content type.

```
from PIL import Image
import numpy as np
import json
import boto3

client = boto3.client('sagemaker-runtime')
input_file = 'path/to/image'

# Resize the image, scale pixels to about [-1, 1], and add a batch axis
image = Image.open(input_file)
batch_size = 1
image = np.asarray(image.resize((224, 224)))
image = image / 128 - 1
image = np.concatenate([image[np.newaxis, :, :]] * batch_size)
body = json.dumps({"instances": image.tolist()})

ioc_predictor_endpoint_name = 'insert name of your endpoint here'
content_type = 'application/json'
ioc_response = client.invoke_endpoint(
    EndpointName=ioc_predictor_endpoint_name,
    Body=body,
    ContentType=content_type
)

# Unpack response
result = json.loads(ioc_response['Body'].read().decode())
```

------
#### [ XGBoost ]

For XGBoost applications, submit CSV text instead.

```
import boto3
import json
 
endpoint = 'insert your endpoint name here'
 
runtime = boto3.Session().client('sagemaker-runtime')
 
csv_text = '1,-1.0,1.0,1.5,2.6'
# Send CSV text via InvokeEndpoint API
response = runtime.invoke_endpoint(EndpointName=endpoint, ContentType='text/csv', Body=csv_text)
# Unpack response
result = json.loads(response['Body'].read().decode())
```

------

BYOM (bring your own model) allows custom content types. For more information, see [`InvokeEndpoint`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html).
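The Boto3 examples above differ mainly in the `ContentType` they send. The following minimal sketch picks the content type per framework, using the values from those examples, and lets you override it with a custom BYOM content type. It accepts any client with an `invoke_endpoint` method, such as `boto3.client("sagemaker-runtime")`, and, like the examples above, assumes that the endpoint returns a JSON response body. The dictionary and function names are hypothetical.

```python
import json

# Content types used in the examples above; a BYOM container may accept others.
CONTENT_TYPES = {
    "pytorch": "application/x-image",
    "mxnet": "application/x-image",
    "tensorflow": "application/json",
    "xgboost": "text/csv",
}


def invoke_neo_endpoint(runtime_client, endpoint_name, framework, body,
                        content_type=None):
    """Send body to a Neo endpoint and decode its JSON response.

    Pass content_type explicitly to override the framework default.
    """
    content_type = content_type or CONTENT_TYPES[framework.lower()]
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        Body=body,
    )
    # response["Body"] is a stream; read it once and decode it as UTF-8 JSON.
    return json.loads(response["Body"].read().decode("utf-8"))
```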

# Request Inferences from a Deployed Service (AWS CLI)
<a name="neo-requests-cli"></a>

If you have an Amazon SageMaker AI endpoint that is `InService`, you can make inference requests with the AWS Command Line Interface (AWS CLI) using the [`sagemaker-runtime invoke-endpoint`](https://docs.aws.amazon.com/cli/latest/reference/sagemaker-runtime/invoke-endpoint.html) command. The following example shows how to send an image for inference:

```
aws sagemaker-runtime invoke-endpoint --endpoint-name 'insert name of your endpoint here' --body fileb://image.jpg --content-type=application/x-image output_file.txt
```

If the inference succeeds, an `output_file.txt` file containing the inference response is created.

For TensorFlow, submit the input with `application/json` as the content type.

```
aws sagemaker-runtime invoke-endpoint --endpoint-name 'insert name of your endpoint here' --body fileb://input.json --content-type=application/json output_file.txt
```
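The TensorFlow command above reads its request body from `input.json`. The following minimal sketch writes that file in the same `{"instances": ...}` shape that the Python examples use, and reads the response back from `output_file.txt`, assuming that the endpoint returns JSON. The function names are hypothetical.

```python
import json


def write_instances_file(path, instances):
    """Write a {"instances": [...]} request body for `--body fileb://<path>`."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"instances": instances}, f)


def read_inference_output(path):
    """Parse the response body that invoke-endpoint saved to path (assumed JSON)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```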

# Inference Container Images
<a name="neo-deployment-hosting-services-container-images"></a>

SageMaker Neo now provides inference image URI information for `ml_*` targets. For more information, see [DescribeCompilationJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeCompilationJob.html#sagemaker-DescribeCompilationJob-response-InferenceImage).

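Depending on your use case, replace the placeholders (*aws_account_id*, *aws_region*, *fx_version*, and *instance_type*) in the inference image URI templates below with the appropriate values.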

## Amazon SageMaker AI XGBoost
<a name="inference-container-collapse-xgboost"></a>

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/xgboost-neo:latest
```

Replace *aws_account_id* with the value from the table at the end of this page for the *aws_region* you used.

## Keras
<a name="inference-container-collapse-keras"></a>

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-neo-keras:fx_version-instance_type-py3
```

Replace *aws_account_id* with the value from the table at the end of this page for the *aws_region* you used.

Replace *fx_version* with `2.2.4`.

Replace *instance_type* with `cpu` or `gpu`.

## MXNet
<a name="inference-container-collapse-mxnet"></a>

------
#### [ CPU or GPU instance types ]

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-inference-mxnet:fx_version-instance_type-py3
```

Replace *aws_account_id* with the value from the table at the end of this page for the *aws_region* you used.

Replace *fx_version* with `1.8.0`.

Replace *instance_type* with `cpu` or `gpu`.

------
#### [ Inferentia1 ]

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-neo-mxnet:fx_version-instance_type-py3
```

Replace *aws_region* with either `us-east-1` or `us-west-2`.

Replace *aws_account_id* with the value from the table at the end of this page for the *aws_region* you used.

Replace *fx_version* with `1.5.1`.

Replace *instance_type* with `inf`.

------

## ONNX
<a name="inference-container-collapse-onnx"></a>

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-neo-onnx:fx_version-instance_type-py3
```

Replace *aws_account_id* with the value from the table at the end of this page for the *aws_region* you used.

Replace *fx_version* with `1.5.0`.

Replace *instance_type* with `cpu` or `gpu`.

## PyTorch
<a name="inference-container-collapse-pytorch"></a>

------
#### [ CPU or GPU instance types ]

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-inference-pytorch:fx_version-instance_type-py3
```

Replace *aws_account_id* with the value from the table at the end of this page for the *aws_region* you used.

Replace *fx_version* with `1.4`, `1.5`, `1.6`, `1.7`, `1.8`, `1.12`, `1.13`, or `2.0`.

Replace *instance_type* with `cpu` or `gpu`.

------
#### [ Inferentia1 ]

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-neo-pytorch:fx_version-instance_type-py3
```

Replace *aws_region* with either `us-east-1` or `us-west-2`.

Replace *aws_account_id* with the value from the table at the end of this page for the *aws_region* you used.

Replace *fx_version* with `1.5.1`.

Replace *instance_type* with `inf`.

------
#### [ Inferentia2 and Trainium1 ]

```
763104351884.dkr.ecr.aws_region.amazonaws.com/pytorch-inference-neuronx:1.13.1-neuronx-py38-sdk2.10.0-ubuntu20.04
```

Replace *aws_region* with `us-east-2` for Inferentia2 or `us-east-1` for Trainium1.

------

## TensorFlow
<a name="inference-container-collapse-tf"></a>

------
#### [ CPU or GPU instance types ]

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-inference-tensorflow:fx_version-instance_type-py3
```

Replace *aws_account_id* with the value from the table at the end of this page for the *aws_region* you used.

Replace *fx_version* with `1.15.3` or `2.9`.

Replace *instance_type* with `cpu` or `gpu`.

------
#### [ Inferentia1 ]

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-neo-tensorflow:fx_version-instance_type-py3
```

Replace *aws_account_id* with the value from the table at the end of this page for the *aws_region* you used. Note that only `us-east-1` and `us-west-2` are supported for the `inf` instance type.

Replace *fx_version* with `1.15.0`.

Replace *instance_type* with `inf`.

------
#### [ Inferentia2 and Trainium1 ]

```
763104351884.dkr.ecr.aws_region.amazonaws.com/tensorflow-inference-neuronx:2.10.1-neuronx-py38-sdk2.10.0-ubuntu20.04
```

Replace *aws_region* with `us-east-2` for Inferentia2 or `us-east-1` for Trainium1.

------

The following table maps each *aws_region* to its *aws_account_id*. Use it to build the correct inference image URI for your application.


| aws_account_id | aws_region | 
| --- | --- | 
| 785573368785 | us-east-1 | 
| 007439368137 | us-east-2 | 
| 710691900526 | us-west-1 | 
| 301217895009 | us-west-2 | 
| 802834080501 | eu-west-1 | 
| 205493899709 | eu-west-2 | 
| 254080097072 | eu-west-3 | 
| 601324751636 | eu-north-1 | 
| 966458181534 | eu-south-1 | 
| 746233611703 | eu-central-1 | 
| 110948597952 | ap-east-1 | 
| 763008648453 | ap-south-1 | 
| 941853720454 | ap-northeast-1 | 
| 151534178276 | ap-northeast-2 | 
| 925152966179 | ap-northeast-3 | 
| 324986816169 | ap-southeast-1 | 
| 355873309152 | ap-southeast-2 | 
| 474822919863 | cn-northwest-1 | 
| 472730292857 | cn-north-1 | 
| 756306329178 | sa-east-1 | 
| 464438896020 | ca-central-1 | 
| 836785723513 | me-south-1 | 
| 774647643957 | af-south-1 | 
| 275950707576 | il-central-1 |
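Putting the templates and the table together, the following minimal sketch fills in an image URI for the CPU, GPU, and Inferentia1 templates. It does not check *fx_version* against the versions listed above, and it does not cover the Inferentia2 and Trainium1 images, which use the fixed account `763104351884`. The `.amazonaws.com.cn` suffix for China Regions is the standard Amazon ECR domain there and is not shown in the templates above. All names in the sketch are hypothetical; for `ml_*` targets, the `InferenceImage` field returned by `DescribeCompilationJob` remains the authoritative value.

```python
# aws_account_id for each aws_region, copied from the table above.
NEO_ACCOUNT_IDS = {
    "us-east-1": "785573368785", "us-east-2": "007439368137",
    "us-west-1": "710691900526", "us-west-2": "301217895009",
    "eu-west-1": "802834080501", "eu-west-2": "205493899709",
    "eu-west-3": "254080097072", "eu-north-1": "601324751636",
    "eu-south-1": "966458181534", "eu-central-1": "746233611703",
    "ap-east-1": "110948597952", "ap-south-1": "763008648453",
    "ap-northeast-1": "941853720454", "ap-northeast-2": "151534178276",
    "ap-northeast-3": "925152966179", "ap-southeast-1": "324986816169",
    "ap-southeast-2": "355873309152", "cn-northwest-1": "474822919863",
    "cn-north-1": "472730292857", "sa-east-1": "756306329178",
    "ca-central-1": "464438896020", "me-south-1": "836785723513",
    "af-south-1": "774647643957", "il-central-1": "275950707576",
}

# (repository for cpu/gpu, repository for inf) from the templates above;
# None means no Inferentia1 template is listed for that framework.
NEO_REPOSITORIES = {
    "keras": ("sagemaker-neo-keras", None),
    "mxnet": ("sagemaker-inference-mxnet", "sagemaker-neo-mxnet"),
    "onnx": ("sagemaker-neo-onnx", None),
    "pytorch": ("sagemaker-inference-pytorch", "sagemaker-neo-pytorch"),
    "tensorflow": ("sagemaker-inference-tensorflow", "sagemaker-neo-tensorflow"),
}

# Inferentia1 images are only listed for these Regions.
INF1_REGIONS = {"us-east-1", "us-west-2"}


def neo_image_uri(framework, aws_region, fx_version=None, instance_type="cpu"):
    """Fill in the Neo inference image URI template for one framework."""
    framework = framework.lower()
    account_id = NEO_ACCOUNT_IDS[aws_region]
    # China Regions use the amazonaws.com.cn domain for Amazon ECR.
    domain = "amazonaws.com.cn" if aws_region.startswith("cn-") else "amazonaws.com"
    registry = f"{account_id}.dkr.ecr.{aws_region}.{domain}"

    if framework == "xgboost":
        return f"{registry}/xgboost-neo:latest"

    cloud_repo, inf_repo = NEO_REPOSITORIES[framework]
    if instance_type in ("cpu", "gpu"):
        repo = cloud_repo
    elif instance_type == "inf":
        if inf_repo is None or aws_region not in INF1_REGIONS:
            raise ValueError(f"No Inferentia1 image for {framework} in {aws_region}")
        repo = inf_repo
    else:
        raise ValueError(f"instance_type must be 'cpu', 'gpu', or 'inf', not {instance_type!r}")
    if not fx_version:
        raise ValueError("fx_version is required for this framework")
    return f"{registry}/{repo}:{fx_version}-{instance_type}-py3"
```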