

# Validation rules for manifest files
<a name="md-create-manifest-file-validation-rules"></a>

 When you import a manifest file, Amazon Rekognition Custom Labels applies validation rules for limits, syntax, and semantics. The SageMaker AI Ground Truth schema enforces syntax validation. For more information, see [Output](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-data-output.html). The following are the validation rules for limits and semantics.

**Note**  
The 20% invalidity rules apply cumulatively across all validation rules. If the import exceeds the 20% limit due to any combination, such as 15% invalid JSON and 15% invalid images, the import fails. 
Each dataset object is a line in the manifest. Blank/invalid lines are also counted as dataset objects.
Overlaps are (common labels between test and train)/(train labels).

**Topics**
+ [Limits](#md-validation-rules-limits)
+ [Semantics](#md-validation-rules-semantics)

## Limits
<a name="md-validation-rules-limits"></a>


| Validation | Limit | Error raised | 
| --- | --- | --- | 
|  Manifest file size  |  Maximum 1 GB  |  Error  | 
|  Maximum line count for a manifest file  |  Maximum of 250,000 dataset objects as lines in a manifest.   |  Error  | 
|  Lower boundary on total number of valid dataset objects per label   |  >= 1  |  Error  | 
|  Lower boundary on labels  |  >=2  |  Error  | 
|  Upper bound on labels  |  <= 250  |  Error  | 
|  Minimum bounding boxes per image  |  0  |  None  | 
|  Maximum bounding boxes per image  |  50  |  None  | 

## Semantics
<a name="md-validation-rules-semantics"></a>




| Validation | Limit | Error raised | 
| --- | --- | --- | 
|  Empty manifest  |    |  Error  | 
|  Missing/in-accessible source-ref object  |  Number of objects less than 20%  |  Warning  | 
|  Missing/in-accessible source-ref object  |  Number of objects > 20%  |  Error  | 
|  Test labels not present in training dataset   |  At least 50% overlap in the labels  |  Error  | 
|  Mix of label vs. object examples for same label in a dataset. Classification and detection for the same class in a dataset object.   |    |  No error or warning  | 
|  Overlapping assets between test and train   |  There should not be an overlap between test and training datasets.   |    | 
|  Images in a dataset must be from same bucket   |  Error if the objects are in a different bucket  |  Error  | 