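# Troubleshooting Container Insights

The following sections can help if you run into problems with Container Insights.

## Failed deployment on Amazon EKS or Kubernetes

If the agent doesn't deploy correctly on a Kubernetes cluster, try the following (replace `pod-name` with the name of the affected pod):

+ Run the following command to get the list of pods.

  ```
  kubectl get pods -n amazon-cloudwatch
  ```

+ Run the following command and check the events at the bottom of the output.

  ```
  kubectl describe pod pod-name -n amazon-cloudwatch
  ```

+ Run the following command to check the logs.

  ```
  kubectl logs pod-name -n amazon-cloudwatch
  ```

## Unauthorized panic: Cannot retrieve cadvisor data from kubelet

If your deployment fails with the error `Unauthorized panic: Cannot retrieve cadvisor data from kubelet`, your kubelet might not have Webhook authorization mode enabled. Container Insights requires this mode. For more information, see [Verifying prerequisites for Container Insights in CloudWatch](Container-Insights-prerequisites.md).

## Deploying Container Insights on a deleted and re-created cluster on Amazon ECS

If you delete an existing Amazon ECS cluster that doesn't have Container Insights enabled and then re-create it with the same name, you can't enable Container Insights on the new cluster at the time you re-create it. Instead, re-create the cluster first, and then enable Container Insights by entering the following command:

```
aws ecs update-cluster-settings --cluster myCICluster --settings name=containerInsights,value=enabled
```

## Invalid endpoint error

If you see an error message similar to the following, make sure that you replaced all the placeholders in the commands that you are using, such as *cluster-name* and *region-name*, with the correct information for your deployment. In this sample, the unreplaced `{{region_name}}` placeholder is still visible in the endpoint URL.

```
"log": "2020-04-02T08:36:16Z E! cloudwatchlogs: code: InvalidEndpointURL, message: invalid endpoint uri, original error: &url.Error{Op:\"parse\", URL:\"https://logs.{{region_name}}.amazonaws.com/\", Err:\"{\"}, &awserr.baseError{code:\"InvalidEndpointURL\", message:\"invalid endpoint uri\", errs:[]error{(*url.Error)(0xc0008723c0)}}\n",
```

## Metrics don't appear in the console

If you don't see any Container Insights metrics in the AWS Management Console, make sure that you have completed the setup of Container Insights. Metrics don't appear until Container Insights has been set up completely. For more information, see [Setting up Container Insights](deploy-container-insights.md).

## Pod metrics missing on Amazon EKS or Kubernetes after upgrading the cluster

This section might help if all or some pod metrics are missing after you deploy the CloudWatch agent as a daemonset on a new or upgraded cluster, or if you see an error log with the message `W! No pod metric collected`.

These errors can be caused by changes in the container runtime, such as containerd or the docker systemd cgroup driver. You can usually solve this by updating your deployment manifest so that the containerd socket from the host is mounted into the container. See the following example:

```
# For full example see https://github.com/aws-samples/amazon-cloudwatch-container-insights/blob/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cwagent/cwagent-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cloudwatch-agent
  namespace: amazon-cloudwatch
spec:
  template:
    spec:
      containers:
        - name: cloudwatch-agent
          # ...
          # Don't change the mountPath
          volumeMounts:
            # ...
            - name: dockersock
              mountPath: /var/run/docker.sock
              readOnly: true
            - name: varlibdocker
              mountPath: /var/lib/docker
              readOnly: true
            - name: containerdsock # NEW mount
              mountPath: /run/containerd/containerd.sock
              readOnly: true
            # ...
      volumes:
        # ...
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
        - name: varlibdocker
          hostPath:
            path: /var/lib/docker
        - name: containerdsock # NEW volume
          hostPath:
            path: /run/containerd/containerd.sock
```

## No pod metrics when using Bottlerocket for Amazon EKS

Bottlerocket is a Linux-based open source operating system that is purpose-built by AWS for running containers.

Bottlerocket uses a different `containerd` path on the host, so you need to change the volume to point to its location. If you don't, the logs show an error that includes `W! No pod metric collected`. See the following example.

```
volumes:
  # ...
  - name: containerdsock
    hostPath:
      # path: /run/containerd/containerd.sock
      # bottlerocket does not mount containerd sock at normal place
      # https://github.com/bottlerocket-os/bottlerocket/commit/91810c85b83ff4c3660b496e243ef8b55df0973b
      path: /run/dockershim.sock
```

## No container filesystem metrics when using the containerd runtime for Amazon EKS or Kubernetes

This is a known issue that community contributors are working on. For more information, see [Disk usage metric for containerd](https://github.com/google/cadvisor/issues/2785) and [container file system metrics is not supported by cadvisor for containerd](https://github.com/aws/amazon-cloudwatch-agent/issues/192) on GitHub.

## Unexpected log volume increase from the CloudWatch agent when collecting Prometheus metrics

This was a regression introduced in version 1.247347.6b250880 of the CloudWatch agent. It has been fixed in more recent versions of the agent. Its impact was limited to scenarios where customers collected the logs of the CloudWatch agent itself and were also using Prometheus. For more information, see [[prometheus] agent is printing all the scraped metrics in log](https://github.com/aws/amazon-cloudwatch-agent/issues/209) on GitHub.

## Latest docker image mentioned in release notes not found on Dockerhub

We update the release notes and tag on GitHub before we start the actual release internally. After the version number is bumped on GitHub, it usually takes 1 to 2 weeks for the latest docker image to appear on registries. There is no nightly release of the CloudWatch agent container image. You can build the image directly from source at the following location: [https://github.com/aws/amazon-cloudwatch-agent/tree/main/amazon-cloudwatch-container-insights/cloudwatch-agent-dockerfile](https://github.com/aws/amazon-cloudwatch-agent/tree/main/amazon-cloudwatch-container-insights/cloudwatch-agent-dockerfile)

## CrashLoopBackOff error on the CloudWatch agent

If you see a `CrashLoopBackOff` error for the CloudWatch agent, make sure that your IAM permissions are set correctly. For more information, see [Verifying prerequisites for Container Insights in CloudWatch](Container-Insights-prerequisites.md).

## CloudWatch agent or Fluentd pod stuck in pending

If a CloudWatch agent or Fluentd pod is stuck in the `Pending` state or reports a `FailedScheduling` error, determine whether your nodes have enough compute resources, based on the number of cores and the amount of RAM that the agents require. Enter the following command to describe the pod:

```
kubectl describe pod cloudwatch-agent-85ppg -n amazon-cloudwatch
```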
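
To compare what a pending pod requests against what the nodes can offer, a small helper such as the following might be useful. This is a minimal sketch, not part of Container Insights: the function name `diagnose_pending_pod` and its default namespace argument are illustrative assumptions, and it presumes that `kubectl` is already configured for the affected cluster.

```shell
# Sketch: collect the facts needed to judge why an agent pod is Pending.
# The function name and the default namespace are illustrative assumptions.
diagnose_pending_pod() {
  local pod="$1"
  local namespace="${2:-amazon-cloudwatch}"

  if [ -z "$pod" ]; then
    echo "usage: diagnose_pending_pod POD_NAME [NAMESPACE]" >&2
    return 1
  fi

  # CPU and memory requests declared by each container in the pod.
  echo "== Resource requests declared by pod ${pod} =="
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests}{"\n"}{end}'

  # Events for the pod, oldest first; FailedScheduling messages appear here.
  echo "== Events for pod ${pod} =="
  kubectl get events -n "$namespace" \
    --field-selector "involvedObject.name=${pod}" \
    --sort-by=.lastTimestamp

  # Allocatable CPU and memory of every node in the cluster.
  echo "== Allocatable CPU and memory per node =="
  kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}
```

For example, `diagnose_pending_pod cloudwatch-agent-85ppg` prints the three sections for the pod used in the command above. Allocatable values describe total node capacity available to pods, not what is currently free, so `kubectl describe node` is still the place to check how much of that capacity other pods have already requested.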