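Prepare video training data for object tracking

This page describes how to prepare video training data for use in a Vertex AI dataset to train a video object tracking model.

The following sections describe the data requirements, the schema file, and the data import file formats (JSONL and CSV) that the schema defines.

Alternatively, you can import unannotated videos and annotate them later in the Google Cloud console (see "Labeling using the Google Cloud console").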

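Data requirements

The following requirements apply to datasets used to train AutoML or custom-trained models.

Vertex AI supports the following video formats for training a model or requesting a prediction (annotating a video):
- .MOV
- .MPEG4
- .MP4
- .AVI
To view the video content in the web console or to annotate a video, the video must be in a format that your browser supports natively. Because not all browsers handle .MOV or .AVI content natively, we recommend using the .MPEG4 or .MP4 video format.
The maximum file size is 50 GB (up to 3 hours in duration). Individual video files with malformed or empty timestamps in the container aren't supported.
Each dataset can have at most 1,000 labels.
You can assign ML_USE labels to the videos in the import files. At training time, you can choose to use those labels to split the videos and their corresponding annotations into "training" or "test" sets.
For video object tracking, note the following:
- Each dataset can have at most 150,000 labeled video frames.
- Each dataset can have at most 1,000,000 annotated bounding boxes in total.
- Each annotation set can have at most 1,000 labels.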
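
The limits above can be checked before import. The following is a minimal sketch, not part of Vertex AI: the function name `check_dataset_limits` and the `(video_uri, label, time_offset)` tuple layout are hypothetical, and it assumes that one labeled frame corresponds to one distinct `(video, time offset)` pair.

```python
import posixpath

# Limits quoted from the requirements above.
MAX_LABELED_FRAMES = 150_000
MAX_BOUNDING_BOXES = 1_000_000
MAX_LABELS = 1_000
SUPPORTED_EXTENSIONS = {".mov", ".mpeg4", ".mp4", ".avi"}


def check_dataset_limits(annotations):
    """Return a list of human-readable limit violations.

    `annotations` is an iterable of (video_uri, label, time_offset) tuples,
    one per bounding box. This layout is an assumption of the sketch.
    """
    labels, frames, bad_videos = set(), set(), set()
    box_count = 0
    for video_uri, label, time_offset in annotations:
        box_count += 1
        labels.add(label)
        # Assumption: a labeled frame is a distinct (video, time offset) pair.
        frames.add((video_uri, time_offset))
        extension = posixpath.splitext(video_uri)[1].lower()
        if extension not in SUPPORTED_EXTENSIONS:
            bad_videos.add(video_uri)

    problems = [f"unsupported video format: {uri}" for uri in sorted(bad_videos)]
    if len(labels) > MAX_LABELS:
        problems.append(f"{len(labels)} labels exceeds {MAX_LABELS}")
    if len(frames) > MAX_LABELED_FRAMES:
        problems.append(f"{len(frames)} labeled frames exceeds {MAX_LABELED_FRAMES}")
    if box_count > MAX_BOUNDING_BOXES:
        problems.append(f"{box_count} bounding boxes exceeds {MAX_BOUNDING_BOXES}")
    return problems
```

An empty list means none of the checked limits is exceeded; file size and duration are not checked because they require inspecting the video files themselves.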

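Best practices for video data used to train AutoML models

The following practices apply to datasets used to train AutoML models.

The training data should be as close as possible to the data on which predictions are to be made. For example, if your use case involves blurry, low-resolution videos (such as footage from a security camera), your training data should also contain blurry, low-resolution videos. In general, you should also consider providing multiple angles, resolutions, and backgrounds for your training videos.
Vertex AI models generally can't predict labels that humans can't assign. If a human can't be trained to assign a label after watching a video for 1-2 seconds, the model likely can't be trained to do so either.
The model works best when the most common label has at most 100 times as many videos as the least common label. We recommend removing labels that occur very rarely.
For object tracking:
- The minimum bounding box size is 10 x 10 pixels.
- For video frames with a resolution much larger than 1024 x 1024 pixels, some image quality can be lost during the frame normalization that AutoML object tracking applies.
- Each label must appear in at least three distinct video frames. In addition, each label must have at least ten annotations.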
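
These guidelines can be turned into a quick pre-flight check. The sketch below is illustrative only: the helper names and the `(video_uri, label, time_offset)` tuple layout are hypothetical, and the frame width and height must be supplied by the caller because relative coordinates alone don't give a pixel size.

```python
from collections import defaultdict

# Thresholds quoted from the best practices above.
MIN_FRAMES_PER_LABEL = 3
MIN_ANNOTATIONS_PER_LABEL = 10
MAX_LABEL_RATIO = 100
MIN_BOX_PIXELS = 10


def box_is_large_enough(x_min, y_min, x_max, y_max, frame_width, frame_height):
    """True if a relative-coordinate box covers at least 10 x 10 pixels."""
    width_px = (x_max - x_min) * frame_width
    height_px = (y_max - y_min) * frame_height
    return width_px >= MIN_BOX_PIXELS and height_px >= MIN_BOX_PIXELS


def check_label_coverage(annotations):
    """Return warnings for labels that miss the recommended minimums.

    `annotations` is an iterable of (video_uri, label, time_offset) tuples,
    one per bounding box (an assumed layout for this sketch).
    """
    frames = defaultdict(set)
    videos = defaultdict(set)
    counts = defaultdict(int)
    for video_uri, label, time_offset in annotations:
        frames[label].add((video_uri, time_offset))
        videos[label].add(video_uri)
        counts[label] += 1

    warnings = []
    for label in sorted(counts):
        if len(frames[label]) < MIN_FRAMES_PER_LABEL:
            warnings.append(f"{label}: fewer than {MIN_FRAMES_PER_LABEL} distinct frames")
        if counts[label] < MIN_ANNOTATIONS_PER_LABEL:
            warnings.append(f"{label}: fewer than {MIN_ANNOTATIONS_PER_LABEL} annotations")
    if videos:
        videos_per_label = [len(uris) for uris in videos.values()]
        if max(videos_per_label) > MAX_LABEL_RATIO * min(videos_per_label):
            warnings.append("most common label has over 100x the videos of the least common")
    return warnings
```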

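Schema file

Use the following publicly accessible schema file when you create the JSONL file for importing annotations. This schema file dictates the format of the data input files. The structure of the file follows the OpenAPI schema.

Object tracking schema file:

gs://google-cloud-aiplatform/schema/dataset/ioformat/object_tracking_io_format_1.0.0.yaml

Full schema file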

   title: VideoObjectTracking
   version: 1.0.0
   description: >
     Import and export format for importing/exporting videos together with
     temporal bounding box annotations.
   type: object
   required:
   - videoGcsUri
   properties:
     videoGcsUri:
       type: string
       description: >
         A Cloud Storage URI pointing to a video. Up to 50 GB in size and
         up to 3 hours in duration. Supported file mime types: `video/mp4`,
         `video/avi`, `video/quicktime`.
     TemporalBoundingBoxAnnotations:
       type: array
       description: Multiple temporal bounding box annotations. Each on a frame of the video.
       items:
         type: object
         description: >
           Temporal bounding box annotation on video. `xMin`, `xMax`, `yMin`, and
           `yMax` are relative to the video frame size, and the point 0,0 is in the
           top left of the frame.
         properties:
           displayName:
             type: string
             description: >
               It will be imported as/exported from AnnotationSpec's display name,
               i.e., the name of the label/class.
           xMin:
             description: The leftmost coordinate of the bounding box.
             type: number
             format: double
           xMax:
             description: The rightmost coordinate of the bounding box.
             type: number
             format: double
           yMin:
             description: The topmost coordinate of the bounding box.
             type: number
             format: double
           yMax:
             description: The bottommost coordinate of the bounding box.
             type: number
             format: double
           timeOffset:
             type: string
             description: >
               A time offset of a video in which the object has been detected.
               Expressed as a number of seconds as measured from the
               start of the video, with fractions up to a microsecond precision, and
               with "s" appended at the end.
           instanceId:
             type: number
             format: integer
             description: >
               The instance of the object, expressed as a positive integer. Used to
               tell apart objects of the same type when multiple are present on a
               single video.
           annotationResourceLabels:
             description: Resource labels on the Annotation.
             type: object
             additionalProperties:
               type: string
     dataItemResourceLabels:
       description: Resource labels on the DataItem.
       type: object
       additionalProperties:
         type: string
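
The schema describes `timeOffset` as a number of seconds with up to microsecond precision and a trailing "s". As a small sketch of that string format, the two helpers below (hypothetical names, not part of any Google library) convert between a float number of seconds and the schema's representation.

```python
def format_time_offset(seconds):
    """Format seconds as the schema's timeOffset string, e.g. 4 -> '4.000000s'."""
    if seconds < 0:
        raise ValueError("time offset is measured from the start of the video")
    # Six decimal places correspond to microsecond precision.
    return f"{seconds:.6f}s"


def parse_time_offset(value):
    """Parse a timeOffset string such as '71.000000s' back into seconds."""
    if not value.endswith("s"):
        raise ValueError(f"expected a trailing 's': {value!r}")
    return float(value[:-1])
```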

Input files

The format of your training data for video object tracking is as follows.

To import your data, create either a JSONL or a CSV file.

JSONL

Each line of the file is one JSON object (shown here across several lines for readability).
For details, see the object tracking YAML file.

   {
     "videoGcsUri": "gs://bucket/filename.ext",
     "TemporalBoundingBoxAnnotations": [{
       "displayName": "LABEL",
       "xMin": "leftmost_coordinate_of_the_bounding_box",
       "xMax": "rightmost_coordinate_of_the_bounding_box",
       "yMin": "topmost_coordinate_of_the_bounding_box",
       "yMax": "bottommost_coordinate_of_the_bounding_box",
       "timeOffset": "timeframe_object-detected",
       "instanceId": "instance_of_object",
       "annotationResourceLabels": "resource_labels"
     }],
     "dataItemResourceLabels": {
       "aiplatform.googleapis.com/ml_use": "train|test"
     }
   }

Example JSONL - Video object tracking:

   {"videoGcsUri": "gs://demo-data/video1.mp4", "temporal_bounding_box_annotations": [{"displayName": "horse", "instance_id": "-1", "time_offset": "4.000000s", "xMin": "0.668912", "yMin": "0.560642", "xMax": "1.000000", "yMax": "1.000000"}], "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "training"}}
   {"videoGcsUri": "gs://demo-data/video2.mp4", "temporal_bounding_box_annotations": [{"displayName": "horse", "instance_id": "-1", "time_offset": "71.000000s", "xMin": "0.679056", "yMin": "0.070957", "xMax": "0.801716", "yMax": "0.290358"}], "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "test"}}
   ...
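
A JSONL import file can also be generated programmatically. The sketch below uses only the Python standard library; `make_record` and `to_jsonl` are hypothetical helpers, and the input dictionary keys (`label`, `x_min`, and so on) are illustrative. It assumes the field names given in the schema file (`TemporalBoundingBoxAnnotations`, `timeOffset`, `instanceId`); the example above uses snake_case keys instead, so confirm which spelling your import accepts before relying on this.

```python
import json

ML_USE_KEY = "aiplatform.googleapis.com/ml_use"


def make_record(video_uri, boxes, ml_use=None):
    """Build one import record (a dict) for a video and its bounding boxes.

    Each item of `boxes` is a dict with the illustrative keys label,
    instance_id, time_offset (seconds), x_min, y_min, x_max, y_max.
    """
    annotations = []
    for box in boxes:
        coords = (box["x_min"], box["y_min"], box["x_max"], box["y_max"])
        if not all(0.0 <= c <= 1.0 for c in coords):
            raise ValueError(f"coordinates must be relative (0 to 1): {coords}")
        annotations.append({
            "displayName": box["label"],
            "xMin": box["x_min"],
            "xMax": box["x_max"],
            "yMin": box["y_min"],
            "yMax": box["y_max"],
            # Seconds with microsecond precision and a trailing "s".
            "timeOffset": f"{box['time_offset']:.6f}s",
            "instanceId": box["instance_id"],
        })
    record = {
        "videoGcsUri": video_uri,
        # Key name taken from the schema file (an assumption, see lead-in).
        "TemporalBoundingBoxAnnotations": annotations,
    }
    if ml_use is not None:
        record["dataItemResourceLabels"] = {ML_USE_KEY: ml_use}
    return record


def to_jsonl(records):
    """Serialize records as JSON Lines: one compact JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)
```

Because `json.dumps` never emits raw newlines, each record stays on a single line, which is what the JSONL format requires.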

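CSV

Format of a row in the CSV file:

[ML_USE,]VIDEO_URI,LABEL,[INSTANCE_ID],TIME_OFFSET,BOUNDING_BOX

List of columns

ML_USE (Optional). Used to split the data when training a model. Use TRAINING or TEST.
VIDEO_URI. This field contains the Cloud Storage URI for the video. Cloud Storage URIs are case-sensitive.
LABEL. Labels must start with a letter and contain only letters, numbers, and underscores. You can specify multiple labels for a video by adding multiple rows to the CSV file that each identify the same video segment, with a different label in each row.
INSTANCE_ID (Optional). An instance ID that identifies an object instance across the video frames of a video. If provided, AutoML object tracking uses it for object tracking tuning, training, and evaluation. Bounding boxes of the same object instance that appear in different video frames are labeled with the same instance ID. An instance ID is unique only within a video, not across the dataset. For example, if two objects from two different videos have the same instance ID, that does not mean they are the same object instance.
TIME_OFFSET. The video frame, given as the time offset from the beginning of the video. The time offset is a floating-point number in seconds.
BOUNDING_BOX. A bounding box for an object in the video frame. Specifying a bounding box takes more than one column.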

A. x_relative_min,y_relative_min
B. x_relative_max,y_relative_min
C. x_relative_max,y_relative_max
D. x_relative_min,y_relative_max

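Each vertex is specified by x, y coordinate values. Each coordinate must be a floating-point number between 0 and 1, where 0 represents the minimum x or y value and 1 represents the maximum x or y value.
For example, (0,0) represents the top-left corner and (1,1) represents the bottom-right corner; a bounding box for the entire image is expressed as (0,0,,,1,1,,) or (0,0,1,0,1,1,0,1).
AutoML object tracking does not require a specific vertex ordering. Additionally, if the four specified vertices don't form a rectangle parallel to the image edges, Vertex AI specifies vertices that do form such a rectangle.
The bounding box for an object can be specified in two ways:
1. Two vertices, each an x,y coordinate pair, that are diagonally opposite corners of the rectangle:
  A. x_relative_min,y_relative_min
  C. x_relative_max,y_relative_max
  as shown in this example:
  x_relative_min,y_relative_min,,,x_relative_max,y_relative_max,,
2. All four vertices, as shown in the following:
  x_relative_min,y_relative_min,x_relative_max,y_relative_min,x_relative_max,y_relative_max,x_relative_min,y_relative_max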
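
The two bounding box forms can be produced with a small helper. This is a sketch with hypothetical function names: `normalize_vertices` takes the axis-aligned box that encloses the given vertices, which is one way to pre-normalize your own data and is not a description of what Vertex AI does internally. Values are formatted with Python's `g` format, which keeps at most six significant digits.

```python
def normalize_vertices(vertices):
    """Return (x_min, y_min, x_max, y_max) for 2 or 4 (x, y) vertices in any order."""
    if len(vertices) not in (2, 4):
        raise ValueError("expected two diagonal vertices or all four vertices")
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    if not all(0.0 <= value <= 1.0 for value in xs + ys):
        raise ValueError("coordinates must be between 0 and 1")
    # Assumption: use the enclosing axis-aligned rectangle.
    return min(xs), min(ys), max(xs), max(ys)


def bounding_box_columns(x_min, y_min, x_max, y_max, all_vertices=False):
    """Render the eight BOUNDING_BOX columns of a CSV row as one string."""
    def fmt(value):
        return f"{value:g}"

    if all_vertices:
        # Order A, B, C, D: top-left, top-right, bottom-right, bottom-left.
        cells = [x_min, y_min, x_max, y_min, x_max, y_max, x_min, y_max]
        return ",".join(fmt(cell) for cell in cells)
    # Two-vertex form: vertices B and D are left empty.
    return f"{fmt(x_min)},{fmt(y_min)},,,{fmt(x_max)},{fmt(y_max)},,"
```

For example, `bounding_box_columns(*normalize_vertices([(0.9, 0.3), (0.8, 0.2)]))` yields the two-vertex columns for a box given in reverse order.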

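Examples of rows in dataset files

The following rows demonstrate how to specify data in a dataset. The example includes a path to a video on Cloud Storage, a label for the object, a time offset at which to begin tracking, and two diagonal vertices.

VIDEO_URI,LABEL,INSTANCE_ID,TIME_OFFSET,x_relative_min,y_relative_min,x_relative_max,y_relative_min,x_relative_max,y_relative_max,x_relative_min,y_relative_max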

gs://folder/video1.avi,car,,12.90,0.8,0.2,,,0.9,0.3,,
gs://folder/video1.avi,bike,,12.50,0.45,0.45,,,0.55,0.55,,
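where, for the first row:

VIDEO_URI is gs://folder/video1.avi,
LABEL is car,
INSTANCE_ID is not specified,
TIME_OFFSET is 12.90,
x_relative_min,y_relative_min are 0.8,0.2,
x_relative_max,y_relative_min are not specified,
x_relative_max,y_relative_max are 0.9,0.3,
x_relative_min,y_relative_max are not specified.

As stated previously, you can also specify a bounding box by providing all four vertices, as shown in the following examples.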

gs://folder/video1.avi,car,,12.10,0.8,0.8,0.9,0.8,0.9,0.9,0.8,0.9
gs://folder/video1.avi,car,,12.90,0.4,0.8,0.5,0.8,0.5,0.9,0.4,0.9
gs://folder/video1.avi,car,,12.10,0.4,0.2,0.5,0.2,0.5,0.3,0.4,0.3
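
To sanity-check annotated rows like these before import, they can be parsed with Python's standard `csv` module. The following is a sketch under stated assumptions: `parse_row` and `parse_csv` are hypothetical names, a 13-column row is taken to start with ML_USE and a 12-column row is taken to omit it, and rows without a label or bounding box are not handled.

```python
import csv
import io


def parse_row(cells):
    """Parse one annotated CSV row into a dict; raise ValueError if malformed."""
    cells = [cell.strip() for cell in cells]
    ml_use = None
    if len(cells) == 13:  # assumption: the extra leading column is ML_USE
        ml_use, cells = cells[0], cells[1:]
    if len(cells) != 12:
        raise ValueError(f"expected 12 or 13 columns, got {len(cells)}")

    video_uri, label, instance_id, time_offset = cells[:4]
    # Columns 4..11 hold vertices A, B, C, D as (x, y) pairs; pairs may be empty.
    pairs = [(cells[i], cells[i + 1]) for i in range(4, 12, 2)]
    vertices = [(float(x), float(y)) for x, y in pairs if x and y]
    if len(vertices) not in (2, 4):
        raise ValueError("expected two diagonal vertices or all four vertices")
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]

    return {
        "ml_use": ml_use,
        "video_uri": video_uri,
        "label": label,
        "instance_id": int(instance_id) if instance_id else None,
        "time_offset": float(time_offset),
        "x_min": min(xs),
        "y_min": min(ys),
        "x_max": max(xs),
        "y_max": max(ys),
    }


def parse_csv(text):
    """Parse CSV text into a list of row dicts, skipping blank lines."""
    reader = csv.reader(io.StringIO(text))
    return [parse_row(cells) for cells in reader if any(cells)]
```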

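Example CSV - no label:

You can also provide videos in the data file without specifying any labels. You must then use the Google Cloud console to apply labels to your data before you train your model. To do so, you only need to provide the Cloud Storage URI for the video followed by eleven commas, as shown in the following example.

Example without ml_use assigned:

   gs://folder/video1.avi
   ...

Example with ml_use assigned:

   TRAINING,gs://folder/video1.avi
   TEST,gs://folder/video2.avi
   ...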

Create a dataset