From da73a94e0736c2001befb7c54a71041f0d9299e0 Mon Sep 17 00:00:00 2001 From: gopalakp <72235203+gopalakp@users.noreply.github.com> Date: Fri, 16 Oct 2020 11:56:23 -0700 Subject: [PATCH] Add Ground Truth Streaming notebooks (#1617) * Add Ground Truth Streaming notebooks * Made below changes * Replace .format with f-strings * Added pip sagemaker isntall * Download image from public url * Minor comments * Minor f-string updates to chained notebook Co-authored-by: Gopalakrishna, Priyanka --- ...reate_chained_streaming_labeling_job.ipynb | 1745 +++++++++++++++++ ..._truth_create_streaming_labeling_job.ipynb | 992 ++++++++++ 2 files changed, 2737 insertions(+) create mode 100644 ground_truth_labeling_jobs/ground_truth_streaming_labeling_jobs/ground_truth_create_chained_streaming_labeling_job.ipynb create mode 100644 ground_truth_labeling_jobs/ground_truth_streaming_labeling_jobs/ground_truth_create_streaming_labeling_job.ipynb diff --git a/ground_truth_labeling_jobs/ground_truth_streaming_labeling_jobs/ground_truth_create_chained_streaming_labeling_job.ipynb b/ground_truth_labeling_jobs/ground_truth_streaming_labeling_jobs/ground_truth_create_chained_streaming_labeling_job.ipynb new file mode 100644 index 0000000000..bf6b0aa37e --- /dev/null +++ b/ground_truth_labeling_jobs/ground_truth_streaming_labeling_jobs/ground_truth_create_chained_streaming_labeling_job.ipynb @@ -0,0 +1,1745 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chaining using Ground Truth Streaming Labeling Jobs\n", + "\n", + "You can use a streaming labeling job to perpetually send new data objects to Amazon SageMaker Ground Truth to be labeled. Ground Truth streaming labeling jobs remain active until they are manually stopped or have been idle for more than 10 days. You can intermittently send new data objects to workers while the labeling job is active. 
\n", + "\n", + "Use this notebook to create a Ground Truth streaming labeling job using any of the [built-in task types](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-task-types.html). You can also change the relevant parameters to run a custom labeling workflow. You can either configure the notebook to create a labeling job using your own input data, or run the notebook in *default* mode and use the provided image input data. **To use your own input data, set `DEFAULT` to `False`**.\n", + "\n", + "Chaining is a powerful feature that you can use to send the output of one streaming labeling job to another streaming labeling job. This lets you set up a sequence of jobs in which data from Job 1 can flow to Job 2, data from Job 2 can flow on to Job n-1, and data from Job n-1 can flow to Job n, all in real time.\n", + "\n", + "In this notebook, we show how you can use two such streaming jobs in a chained fashion. If you set `DEFAULT` to `True`, Job 1 is set up as an \"Object Detection\" job in which workers draw bounding boxes around objects, and Job 2 is set up as an \"Object Detection Adjustment\" job in which workers adjust the bounding boxes drawn in Job 1." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "DEFAULT=True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To read more about streaming labeling jobs, see the Amazon SageMaker documentation on [Ground Truth Streaming Labeling Jobs](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-streaming-labeling-job.html). \n", + "\n", + "To learn more about each step in this notebook, refer to [Create a Streaming Labeling Job](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-streaming-create-job.html)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Get latest version of AWS python SDK" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install -q --upgrade pip\n", + "!pip install awscli -q --upgrade\n", + "!pip install botocore -q --upgrade\n", + "!pip install boto3 -q --upgrade\n", + "!pip install sagemaker -q --upgrade\n", + "\n", + "# NOTE: Restart Kernel after the above command" + ] + }, + { + "cell_type": "code", + "execution_count": 254, + "metadata": {}, + "outputs": [], + "source": [ + "import boto3\n", + "import botocore\n", + "import json\n", + "import time\n", + "import sagemaker\n", + "import re\n", + "import os" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "\n", + "You will create some of the resources you need to launch a Ground Truth streaming labeling job in this notebook. You must create the following resources before executing this notebook:\n", + "\n", + "* A work team. A work team is a group of workers that complete labeling tasks. If you want to preview the worker UI and execute the labeling task you will need to create a private work team, add yourself as a worker to this team, and provide the work team ARN below. If you do not want to use a private or vendor work team ARN, set `private_work_team` to `False` to use the Amazon Mechanical Turk workforce. To learn more about private, vendor, and Amazon Mechanical Turk workforces, see [Create and Manage Workforces\n", + "](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-management.html).\n", + " * **IMPORTANT**: 3D point cloud and video frame labeling jobs only support private and vendor workforces. If you plan to use 3D point cloud or video frame input data, specify a private or vendor workforce below for WORKTEAM_ARN. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 255, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "This notebook will use the work team ARN: <>\n" + ] + } + ], + "source": [ + "private_work_team = True # Set it to false if using Amazon Mechanical Turk Workforce\n", + "\n", + "if(private_work_team): \n", + " WORKTEAM_ARN = '<>'\n", + "else :\n", + " region = boto3.session.Session().region_name\n", + " WORKTEAM_ARN = f'arn:aws:sagemaker:{region}:394669845002:workteam/public-crowd/default'\n", + "print(f'This notebook will use the work team ARN: {WORKTEAM_ARN}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Make sure workteam arn is populated if private work team is chosen\n", + "assert (WORKTEAM_ARN != '<>')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* The IAM execution role you used to create this notebook instance must have the following permissions: \n", + " * AWS managed policy [AmazonSageMakerGroundTruthExecution](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonSageMakerGroundTruthExecution). Run the following code-block to see your IAM execution role name. This [GIF](add-policy.gif) demonstrates how to add this policy to an IAM role in the IAM console. You can also find instructions in the IAM User Guide: [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#add-policies-console).\n", + " * When you create your role, you specify Amazon S3 permissions. Make sure that your IAM role has access to the S3 bucket that you plan to use in this example. If you do not specify an S3 bucket in this notebook, the default bucket in the AWS region you are running this notebook instance will be used. 
If you do not require granular permissions, you can attach [AmazonS3FullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonS3FullAccess) to your role." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "role = sagemaker.get_execution_role()\n", + "role_name = role.split('/')[-1]\n", + "print('IMPORTANT: Make sure this execution role has the AWS managed policy AmazonSageMakerGroundTruthExecution attached.')\n", + "print('********************************************************************************')\n", + "print('The IAM execution role name:', role_name)\n", + "print('The IAM execution role ARN:', role)\n", + "print('********************************************************************************')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Make sure the bucket is in the same region as this notebook.\n", + "BUCKET = '<< YOUR S3 BUCKET NAME >>'\n", + "\n", + "sess = sagemaker.Session()\n", + "s3 = boto3.client('s3')\n", + "\n", + "if(BUCKET=='<< YOUR S3 BUCKET NAME >>'):\n", + " BUCKET=sess.default_bucket()\n", + "region = boto3.session.Session().region_name\n", + "bucket_region = s3.head_bucket(Bucket=BUCKET)['ResponseMetadata']['HTTPHeaders']['x-amz-bucket-region']\n", + "assert bucket_region == region, f'Your S3 bucket {BUCKET} and this notebook need to be in the same region.'\n", + "print(f'IMPORTANT: make sure the role {role_name} has access to read from and write to this bucket.')\n", + "print('********************************************************************************************************')\n", + "print(f'This notebook will use the following S3 bucket: {BUCKET}')\n", + "print('********************************************************************************************************')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## SNS topics for Input and 
Output\n", + "\n", + "You can send data objects to your streaming labeling job using Amazon Simple Notification Service (Amazon SNS). Amazon SNS is a web service that coordinates and manages the delivery of messages to and from endpoints (for example, an email address or AWS Lambda function). An Amazon SNS topic acts as a communication channel between two or more endpoints. You use Amazon SNS to send, or publish, new data objects to the topic specified in the CreateLabelingJob parameter SnsTopicArn in InputConfig.\n", + "\n", + "The following cells will create a name for your labeling job and use this name to create Amazon SNS input and output topics. This labeling job name and these topics will be used in your CreateLabelingJob request later in this notebook.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Job Name\n", + "LABELING_JOB_NAME = 'GroundTruth-streaming-' + str(int(time.time()))\n", + "\n", + "print('Your labeling job name will be :', LABELING_JOB_NAME)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Make sure role has \"Sns:CreateTopic\" access\n", + "sns = boto3.client('sns')\n", + "\n", + "# Create Input Topic\n", + "input_response = sns.create_topic(Name= LABELING_JOB_NAME + '-Input')\n", + "INPUT_SNS_TOPIC = input_response['TopicArn']\n", + "print('input_sns_topic :', INPUT_SNS_TOPIC)\n", + "\n", + "# Create Output Topic\n", + "output_response = sns.create_topic(Name= LABELING_JOB_NAME + '-Output')\n", + "OUTPUT_SNS_TOPIC = output_response['TopicArn']\n", + "print('output_sns_topic :', OUTPUT_SNS_TOPIC)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Choose Labeling Job Type\n", + "\n", + "Ground Truth supports a variety of built-in task types which streamline the process of creating image, text, video, video frame, and 3D point cloud labeling jobs. 
You can use this notebook in *default* mode if you do not want to bring your own input data and input manifest file.\n", + "\n", + "If you have input data and an input manifest file in an S3 bucket, set `DEFAULT` to `False`, choose the **Labeling Job Task Type** you want to use, and specify the S3 URI of your input manifest file below. The S3 URI looks similar to `s3://your-bucket/path-to-input-manifest/input-manifest.manifest`. To learn more about each task type, see [Built-in Task Types](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-task-types.html)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Choose Labeling Job Built-In Task Type\n", + "\n", + "Copy one of the following task types and use it to set the value for `task_type`. If you set **`DEFAULT`** to `True` at the beginning of this notebook, the image bounding box task type is used. \n", + "\n", + "To create a custom labeling workflow, set `CUSTOM` to `True` and specify the custom Lambda functions that pre-process your input data and post-process your output data in the section **Create Custom Labeling Workflow** below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "## Choose from the following:\n", + "## Bounding Box\n", + "## Image Classification (Single Label)\n", + "## Image Classification (Multi-label)\n", + "## Image Semantic Segmentation\n", + "## Text Classification (Single Label)\n", + "## Text Classification (Multi-label)\n", + "## Named Entity Recognition\n", + "## Video Classification\n", + "## Video Frame Object Detection\n", + "## Video Frame Object Tracking\n", + "## 3D Point Cloud Object Detection\n", + "## 3D Point Cloud Object Tracking\n", + "## 3D Point Cloud Semantic Segmentation\n", + "\n", + "task_type = \"<>\"\n", + "if(DEFAULT):\n", + " task_type = \"Bounding Box\"\n", + "print(f'Your task type: {task_type}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "task_type_map = {\n", + "\"Bounding Box\" : \"BoundingBox\",\n", + "\"Image Classification (Single Label)\" : \"ImageMultiClass\",\n", + "\"Image Classification (Multi-label)\" : \"ImageMultiClassMultiLabel\",\n", + "\"Image Semantic Segmentation\" : \"SemanticSegmentation\",\n", + "\"Text Classification (Single Label)\" : \"TextMultiClass\",\n", + "\"Text Classification (Multi-label)\" : \"TextMultiClassMultiLabel\",\n", + "\"Named Entity Recognition\" : \"NamedEntityRecognition\",\n", + "\"Video Classification\" : \"VideoMultiClass\",\n", + "\"Video Frame Object Detection\" : \"VideoObjectDetection\",\n", + "\"Video Frame Object Tracking\" : \"VideoObjectTracking\",\n", + "\"3D Point Cloud Object Detection\" : \"3DPointCloudObjectDetection\",\n", + "\"3D Point Cloud Object Tracking\" : \"3DPointCloudObjectTracking\",\n", + "\"3D Point Cloud Semantic Segmentation\" : \"3DPointCloudSemanticSegmentation\"\n", + "}\n", + "\n", + "\n", + "arn_region_map = {'us-west-2': '081040173940',\n", + " 'us-east-1': '432418664414',\n", + " 'us-east-2': '266458841044',\n", + " 'eu-west-1': 
'568282634449',\n", + " 'eu-west-2': '487402164563',\n", + " 'ap-northeast-1': '477331159723',\n", + " 'ap-northeast-2': '845288260483',\n", + " 'ca-central-1': '918755190332',\n", + " 'eu-central-1': '203001061592',\n", + " 'ap-south-1': '565803892007',\n", + " 'ap-southeast-1': '377565633583',\n", + " 'ap-southeast-2': '454466003867'\n", + " }" + ] + }, + { + "cell_type": "code", + "execution_count": 252, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "arn:aws:lambda:us-west-2:081040173940:function:PRE-BoundingBox\n", + "arn:aws:lambda:us-west-2:081040173940:function:ACS-BoundingBox\n" + ] + } + ], + "source": [ + "task_type_suffix = task_type_map[task_type]\n", + "region_account = arn_region_map[region]\n", + "PRE_HUMAN_TASK_LAMBDA = f'arn:aws:lambda:{region}:{region_account}:function:PRE-{task_type_suffix}'\n", + "POST_ANNOTATION_LAMBDA = f'arn:aws:lambda:{region}:{region_account}:function:ACS-{task_type_suffix}' \n", + "print(PRE_HUMAN_TASK_LAMBDA)\n", + "print(POST_ANNOTATION_LAMBDA)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "3D point cloud and video frame task types have special requirements. The following variables will be used to configure your labeling job for these task types. 
To learn more, see the following topics in the documentation:\n", + "* [3D Point Cloud Labeling Jobs Overview](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud-general-information.html)\n", + "* [Video Frame Labeling Job Overview](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-video-overview.html)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "point_cloud_task = re.search(r'Point Cloud', task_type) is not None\n", + "video_frame_task = re.search(r'Video Frame', task_type) is not None" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Custom Labeling Workflow\n", + "\n", + "If you want to create a custom labeling workflow, you can create your own AWS Lambda functions to pre-process your input data and post-process the labels returned from workers. To learn more, see [Step 3: Processing with AWS Lambda](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-custom-templates-step3.html).\n", + "\n", + "To use this notebook to run a custom flow, set `CUSTOM` to `True` and specify your pre- and post-processing Lambda functions below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "CUSTOM = False\n", + "if(CUSTOM):\n", + " PRE_HUMAN_TASK_LAMBDA = ''\n", + " POST_ANNOTATION_LAMBDA = ''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Specify Labels\n", + "\n", + "You specify the labels that you want workers to use to annotate your data in a label category configuration file. When you create a 3D point cloud or video frame labeling job, you can add label category attributes to your label category configuration file. Workers can assign one or more attributes to an annotation to give more information about the annotated object. \n", + "\n", + "For all task types, you can use the following cell to identify the labels you use for your labeling job. 
To create a label category configuration file with label category attributes, see [Create a Labeling Category Configuration File with Label Category Attributes\n", + "](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-label-cat-config-attributes.html) in the Amazon SageMaker Developer Guide. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Add label categories of your choice\n", + "LABEL_CATEGORIES = []\n", + "\n", + "if(DEFAULT):\n", + " LABEL_CATEGORIES = ['Pedestrian', 'Street Car', 'Biker']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following cell will create a label category configuration file using the labels specified above. \n", + "\n", + "**IMPORTANT**: Make sure you have added label categories above and they appear under `labels` when you run the following cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Build a label category configuration file from the labels above; a later cell uploads it to S3. 
\n", + "json_body = {\n", + " \"document-version\": \"2018-11-28\",\n", + " 'labels': [{'label': label} for label in LABEL_CATEGORIES]\n", + "}\n", + "with open('class_labels.json', 'w') as f:\n", + " json.dump(json_body, f)\n", + " \n", + "print(\"Your label category configuration file:\")\n", + "print(\"\\n\",json.dumps(json_body, indent=2))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s3.upload_file('class_labels.json', BUCKET, 'class_labels.json')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "LABEL_CATEGORIES_S3_URI = f's3://{BUCKET}/class_labels.json'\n", + "print(f'You should now see class_labels.json in {LABEL_CATEGORIES_S3_URI}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create A Worker Task Template\n", + "\n", + "Part or all of your images will be annotated by human annotators. It is essential to provide good instructions. Good instructions are:\n", + "\n", + "1. Concise. We recommend limiting verbal/textual instruction to two sentences and focusing on clear visuals.\n", + "2. Visual. In the case of object detection, we recommend providing several labeled examples with different numbers of boxes.\n", + "3. When used through the AWS Console, Ground Truth helps you create the instructions using a visual wizard. When using the API, you need to create an HTML template for your instructions. \n", + "\n", + "NOTE: If you use any images in your template (as we do), they need to be publicly accessible. You can enable public access to files in your S3 bucket through the S3 Console, as described in S3 Documentation.\n", + "\n", + "### Specify Resources Used for Human Task UI\n", + "\n", + "The human task user interface (UI) is the interface that human workers use to label your data. 
Depending on the type of labeling job you create, you will specify a resource that is used to generate the human task UI in the `UiConfig` parameter of `CreateLabelingJob`. \n", + "\n", + "For 3D point cloud and video frame labeling tasks, you will specify a pre-defined `HumanTaskUiArn`. For all other labeling job task types, you will specify a `UiTemplateS3Uri`. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Bounding Box Image Labeling Job (Default) \n", + "\n", + "If you set `DEFAULT` to `True`, use the following to create a worker task template and upload it to your S3 bucket. Ground Truth uses this template to generate your human task UI. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import HTML, display\n", + "\n", + "def make_template(save_fname='instructions.template'):\n", + " template = r\"\"\"\n", + " \n", + " \n", + " \n", + "\n", + "
    \n", + "
  1. Inspect the image
  2. \n", + "
  3. Determine whether the specified label is visible in the picture.
  4. \n", + "
  5. Outline each instance of the specified label in the image using the provided “Box” tool.
  6. \n", + "
\n", + " \n", + "\n", + "
\n", + " \n", + "
    \n", + "
  • Boxes should fit tightly around each object
  • \n", + "
  • Do not include parts of the object that are overlapping or that cannot be seen, even if you think you can interpolate the whole shape.
  • \n", + "
  • Avoid including shadows.
  • \n", + "
  • If the target is off screen, draw the box up to the edge of the image.
  • \n", + "
\n", + "
\n", + " \n", + "
\n", + "\n", + " \"\"\".format()\n", + " with open(save_fname, 'w') as f:\n", + " f.write(template)\n", + "if(DEFAULT): \n", + " make_template(save_fname='instructions.template')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if(DEFAULT):\n", + " result = s3.upload_file('instructions.template', BUCKET, 'instructions.template')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Image, Text, and Custom Labeling Jobs (Non Default) \n", + "\n", + "For all image and text based built-in task types, you can find a sample worker task template on that task type page. Find the page for your task type on [Built-in Task Types](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-task-types.html). You will see an example template under the section **Create a {Insert-Task-Type} Job (API)**. \n", + "\n", + "Update `` and ``. Add your template to the following code block and run the code blocks below to generate your worker task template and upload it to your S3 bucket.\n", + "\n", + "For custom labeling workflows, you can provide a custom HTML worker task template using Crowd HTML Elements. To learn more, see [Step 2: Creating your custom labeling task template](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-custom-templates-step2.html).\n", + "\n", + "Ground Truth uses this template to generate your human task UI. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import HTML, display\n", + "\n", + "def make_template(save_fname='instructions.template'):\n", + " template = r\"\"\"\n", + " <<>>\n", + " \"\"\".format()\n", + " with open(save_fname, 'w') as f:\n", + " f.write(template)\n", + "\n", + "#This will upload your template to S3 if you are not running in DEFAULT mode, and if your task type\n", + "#does not use video frames or 3D point clouds. 
\n", + "if(not DEFAULT and not video_frame_task and not point_cloud_task):\n", + " make_template(save_fname='instructions.template')\n", + " s3.upload_file('instructions.template', BUCKET, 'instructions.template')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 3D Point Cloud and Video Frame Task Types\n", + "\n", + "If you are creating a 3D point cloud or video frame task type, your worker UI is configured by Ground Truth. If you chose one of these task types above, the following cell will specify the correct template. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import re\n", + "\n", + "if(not DEFAULT):\n", + " if (point_cloud_task):\n", + " task_type_suffix_humanuiarn = task_type_suffix.split('3D')[-1]\n", + " HUMAN_UI_ARN = f'arn:aws:sagemaker:{region}:394669845002:human-task-ui/{task_type_suffix_humanuiarn}'\n", + " if (video_frame_task):\n", + " HUMAN_UI_ARN = f'arn:aws:sagemaker:{region}:394669845002:human-task-ui/{task_type_suffix}'\n", + " # HUMAN_UI_ARN is only defined for 3D point cloud and video frame task types\n", + " if (point_cloud_task or video_frame_task):\n", + " print(f'The Human Task UI ARN is: {HUMAN_UI_ARN}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (Optional) Create an Input Manifest File\n", + "\n", + "You can optionally specify an input manifest file Amazon S3 URI in ManifestS3Uri when you create the streaming labeling job. Ground Truth sends each data object in the manifest file to workers for labeling as soon as the labeling job starts.\n", + "\n", + "Each line in an input manifest file is an entry containing an object, or a reference to an object, to label. An entry can also contain labels from previous jobs and for some task types, additional information.\n", + "\n", + "To learn how to create an input manifest file, see [Use an Input Manifest File](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-input-data-input-manifest.html). Copy the S3 URI of the file below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# [Optional] The path in Amazon S3 to your input manifest file. \n", + "INPUT_MANIFEST = ''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Specify Parameters for Labeling Job\n", + "\n", + "If you set `DEFAULT` to `False`, you must specify the following parameters. These will be used to configure and create your labeling job. If you set `DEFAULT` to `True`, default parameters will be used.\n", + "\n", + "To learn more about these parameters, use the following documentation:\n", + "* [TaskTitle](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-TaskTitle)\n", + "* [TaskDescription](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-TaskDescription)\n", + "* [TaskKeywords](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-TaskKeywords)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "TASK_TITLE = '<>'\n", + "\n", + "if(DEFAULT):\n", + " TASK_TITLE = 'Add bounding boxes to detect objects in an image'\n", + " \n", + "TASK_DESCRIPTION = '<>'\n", + "if(DEFAULT):\n", + " TASK_DESCRIPTION = 'Categorize images into classes using bounding boxes' \n", + "\n", + "# Keywords for your task, in a string-array. 
e.g. ['image classification', 'image dataset']\n", + "TASK_KEYWORDS = ['<>']\n", + "if(DEFAULT):\n", + " TASK_KEYWORDS = ['bounding box', 'image dataset']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# The path in Amazon S3 to your worker task template or human task UI\n", + "HUMAN_UI = []\n", + "if(point_cloud_task or video_frame_task):\n", + " HUMAN_TASK_UI_ARN = HUMAN_UI_ARN\n", + " HUMAN_UI.append(HUMAN_TASK_UI_ARN)\n", + " UI_CONFIG_PARAM = 'HumanTaskUiArn'\n", + "else:\n", + " UI_TEMPLATE_S3_URI = f's3://{BUCKET}/instructions.template'\n", + " HUMAN_UI.append(UI_TEMPLATE_S3_URI)\n", + " UI_CONFIG_PARAM = 'UiTemplateS3Uri'\n", + " \n", + "print(f'{UI_CONFIG_PARAM} resource that will be used: {HUMAN_UI[0]}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# The ARN for your SNS input topic.\n", + "INPUT_TOPIC_ARN = INPUT_SNS_TOPIC\n", + "\n", + "# The ARN for your SNS output topic.\n", + "OUTPUT_TOPIC_ARN = OUTPUT_SNS_TOPIC\n", + "\n", + "# If you want to store your output manifest in a different folder, change OUTPUT_FOLDER_PREFIX. \n", + "OUTPUT_FOLDER_PREFIX = '/gt-streaming-demo-output'\n", + "OUTPUT_BUCKET = 's3://' + BUCKET + OUTPUT_FOLDER_PREFIX\n", + "print(\"Your output data will be stored in:\", OUTPUT_BUCKET)\n", + "\n", + "# An IAM role with the AmazonSageMakerGroundTruthExecution policy attached.\n", + "# This must be the same role that you used to create this notebook instance. 
\n", + "ROLE_ARN = role" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use the CreateLabelingJob API to create a streaming labeling job [Job 1]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if(re.search(r'Semantic Segmentation', task_type) is not None or re.search(r'Object Tracking', task_type) is not None or video_frame_task):\n", + " LABEL_ATTRIBUTE_NAME = LABELING_JOB_NAME + '-ref'\n", + "else:\n", + " LABEL_ATTRIBUTE_NAME = LABELING_JOB_NAME\n", + "\n", + "human_task_config = {\n", + " \"PreHumanTaskLambdaArn\": PRE_HUMAN_TASK_LAMBDA,\n", + " \"MaxConcurrentTaskCount\": 100, # Maximum of 100 objects will be available to the workteam at any time\n", + " \"NumberOfHumanWorkersPerDataObject\": 1, # We will obtain 1 human annotation for each image.\n", + " \"TaskAvailabilityLifetimeInSeconds\": 21600, # Your workteam has 6 hours to complete all pending tasks.\n", + " \"TaskDescription\": TASK_DESCRIPTION,\n", + " # If using public workforce, specify \"PublicWorkforceTaskPrice\"\n", + " \"WorkteamArn\": WORKTEAM_ARN,\n", + " \"AnnotationConsolidationConfig\": { \n", + " \"AnnotationConsolidationLambdaArn\": POST_ANNOTATION_LAMBDA\n", + " },\n", + " \"TaskKeywords\": TASK_KEYWORDS,\n", + " \"TaskTimeLimitInSeconds\": 600, # Each image must be labeled within 10 minutes.\n", + " \"TaskTitle\": TASK_TITLE,\n", + " \"UiConfig\": {\n", + " UI_CONFIG_PARAM : HUMAN_UI[0]\n", + " }\n", + "}\n", + "\n", + "#If you are using the Amazon Mechanical Turk workforce, specify the amount you want to pay a \n", + "#worker to label a data object. See https://aws.amazon.com/sagemaker/groundtruth/pricing/ for recommendations. 
\n", + "if (not private_work_team):\n", + " human_task_config[\"PublicWorkforceTaskPrice\"] = {\n", + " \"AmountInUsd\": {\n", + " \"Dollars\": 0,\n", + " \"Cents\": 3,\n", + " \"TenthFractionsOfACent\": 6,\n", + " }\n", + " } \n", + " human_task_config[\"WorkteamArn\"] = WORKTEAM_ARN\n", + "else:\n", + " human_task_config[\"WorkteamArn\"] = WORKTEAM_ARN\n", + "\n", + "ground_truth_request = {\n", + " \"InputConfig\": {\n", + " \"DataSource\": {\n", + " \"SnsDataSource\": {\n", + " \"SnsTopicArn\": INPUT_TOPIC_ARN\n", + " }\n", + " }\n", + " },\n", + " \"HumanTaskConfig\" : human_task_config,\n", + " \"LabelAttributeName\": LABEL_ATTRIBUTE_NAME,\n", + " \"LabelCategoryConfigS3Uri\" : LABEL_CATEGORIES_S3_URI,\n", + " \"LabelingJobName\": LABELING_JOB_NAME,\n", + " \"OutputConfig\": {\n", + " \"S3OutputPath\": OUTPUT_BUCKET,\n", + " \"SnsTopicArn\": OUTPUT_TOPIC_ARN\n", + " },\n", + " \"RoleArn\": ROLE_ARN\n", + "}\n", + "\n", + "# Compare by value: 'is not' tests object identity and is unreliable for strings\n", + "if(INPUT_MANIFEST != ''):\n", + " ground_truth_request[\"InputConfig\"][\"DataSource\"][\"S3DataSource\"] = {\"ManifestS3Uri\": INPUT_MANIFEST}\n", + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### DataAttributes\n", + "You should not share explicit, confidential, or personal information or protected health information with the Amazon Mechanical Turk workforce. \n", + "\n", + "If you are using the Amazon Mechanical Turk workforce, you must declare that your data is free of personal, confidential, and explicit content and protected health information using the following code cell. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if (not private_work_team):\n", + " ground_truth_request[\"InputConfig\"][\"DataAttributes\"]={\"ContentClassifiers\": [\"FreeOfPersonallyIdentifiableInformation\",\"FreeOfAdultContent\"]}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"Your create labeling job request:\\n\",json.dumps(ground_truth_request,indent=4))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sagemaker_client = boto3.client('sagemaker')\n", + "sagemaker_client.create_labeling_job(**ground_truth_request)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use the DescribeLabelingJob API to describe a streaming labeling job" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sagemaker_client.describe_labeling_job(LabelingJobName=LABELING_JOB_NAME)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Wait until the labeling job status equals InProgress before moving forward in this notebook" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sagemaker_client.describe_labeling_job(LabelingJobName=LABELING_JOB_NAME)['LabelingJobStatus']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Check for LabelingJobStatus and interpreting describe response\n", + "\n", + "* If you specified \"S3DataSource.ManifestS3Uri\" in the above request, the objects in the S3 file will automatically make their way to the labeling job. You will see counters incrementing from the objects from the file. \n", + "* Streaming jobs create a SQS queue in your account. 
You can check that the queue exists, under the name \"GroundTruth-LABELING_JOB_NAME\" in lowercase, in the Amazon SQS console or with the following command." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sqs = boto3.client('sqs')\n", + "response = sqs.get_queue_url(QueueName='GroundTruth-' + LABELING_JOB_NAME.lower())\n", + "print(\"Queue url is :\", response['QueueUrl'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Job 2 Setup\n", + "\n", + "Use the following section to set up your second labeling job. This labeling job will be chained to the first job that you set up above. This means the output data from the first labeling job will be sent to this labeling job as input data. \n", + "\n", + "Bounding box, semantic segmentation, and all video frame and 3D point cloud labeling job types support an *adjustment* task, which you can use to have workers modify and add to the annotations created in the first labeling job for that task type. You can select one of these adjustment task types below. \n", + "\n", + "If you do not choose an adjustment task type, the output data from this second job will contain any new labels that workers add, as well as the labels added in the first labeling job. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Job Name\n", + "LABELING_JOB_NAME2 = 'GroundTruth-streaming-' + str(int(time.time()))\n", + "\n", + "print('Your labeling job 2 name will be :', LABELING_JOB_NAME2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## SNS topics for Input and Output for Job 2\n", + "\n", + "The input SNS topic for Job 2 is the same as the output SNS topic of Job 1. 
This is what chains the two jobs together.\n", + "The following cell also creates a new output SNS topic for Job 2.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Reuse the output topic of Job 1 as the input topic of Job 2\n", + "INPUT_SNS_TOPIC2 = OUTPUT_SNS_TOPIC\n", + "print('input_sns_topic of Job 2:', INPUT_SNS_TOPIC2)\n", + "\n", + "# Create Output Topic\n", + "output_response = sns.create_topic(Name= LABELING_JOB_NAME2 + '-Output')\n", + "OUTPUT_SNS_TOPIC2 = output_response['TopicArn']\n", + "print('output_sns_topic of Job 2:', OUTPUT_SNS_TOPIC2)\n", + "\n", + "# The ARN for your SNS input topic.\n", + "INPUT_TOPIC_ARN2 = INPUT_SNS_TOPIC2\n", + "\n", + "# The ARN for your SNS output topic.\n", + "OUTPUT_TOPIC_ARN2 = OUTPUT_SNS_TOPIC2\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "## Choose Labeling Job Type [Job2]\n", + "\n", + "Ground Truth supports a variety of built-in task types which streamline the process of creating image, text, video, video frame, and 3D point cloud labeling jobs. You can use this notebook in *default* mode if you do not want to bring your own input data and input manifest file.\n", + "\n", + "If you have input data and an input manifest file in an S3 bucket, set `DEFAULT` to `False`, choose the **Labeling Job Task Type** you want to use, and specify the S3 URI of your input manifest file below. The S3 URI looks similar to `s3://your-bucket/path-to-input-manifest/input-manifest.manifest`. To learn more about each task type, see [Built-in Task Types](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-task-types.html)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Choose Labeling Job Built-In Task Type\n", + "\n", + "Copy one of the following task types and use it to set the value for `task_type2`. If you set **`DEFAULT`** to `True` at the beginning of this notebook, the bounding box adjustment task type will be used by default. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "## Choose from following:\n", + "## Bounding Box\n", + "## Image Classification (Single Label)\n", + "## Image Classification (Multi-label)\n", + "## Image Semantic Segmentation\n", + "## Text Classification (Single Label)\n", + "## Text Classification (Multi-label)\n", + "## Named Entity Recognition\n", + "## Video Classification\n", + "## Video Frame Object Detection\n", + "## Video Frame Object Tracking\n", + "## 3D Point Cloud Object Detection\n", + "## 3D Point Cloud Object Detection\n", + "## 3D Point Cloud Semantic Segmentation\n", + "## 3D Point Cloud Semantic Segmentation\n", + "## Adjustment Semantic Segmentation\n", + "## Verification Semantic Segmentation\n", + "## Verification Bounding Box\n", + "## Adjustment Bounding Box\n", + "## Adjustment Video Object Detection\n", + "## Adjustment Video Object Tracking\n", + "## Adjustment 3D Point Cloud Object Detection\n", + "## Adjustment 3D Point Cloud Object Tracking\n", + "## Adjustment 3D Point Cloud Semantic Segmentation\n", + "\n", + "task_type2 = \"<>\"\n", + "if(DEFAULT):\n", + " task_type2 = \"Adjustment Bounding Box\"\n", + "print(f'Your task type: {task_type2}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following cells will configure the lambda functions Ground Truth uses to pre-process your input data and output data. 
These cells will configure your PreHumanTaskLambdaArn and AnnotationConsolidationLambdaArn.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "task_type_map2 = {\n", + "\"Bounding Box\" : \"BoundingBox\",\n", + "\"Image Classification (Single Label)\" : \"ImageMultiClass\",\n", + "\"Image Classification (Multi-label)\" : \"ImageMultiClassMultiLabel\",\n", + "\"Image Semantic Segmentation\" : \"SemanticSegmentation\",\n", + "\"Text Classification (Single Label)\" : \"TextMultiClass\",\n", + "\"Text Classification (Multi-label)\" : \"TextMultiClassMultiLabel\",\n", + "\"Named Entity Recognition\" : \"NamedEntityRecognition\",\n", + "\"Video Classification\" : \"VideoMultiClass\",\n", + "\"Video Frame Object Detection\" : \"VideoObjectDetection\",\n", + "\"Video Frame Object Tracking\" : \"VideoObjectTracking\",\n", + "\"3D Point Cloud Object Detection\" : \"3DPointCloudObjectDetection\",\n", + "\"3D Point Cloud Object Tracking\" : \"3DPointCloudObjectTracking\",\n", + "\"3D Point Cloud Semantic Segmentation\" : \"3DPointCloudSemanticSegmentation\",\n", + "\"Adjustment Semantic Segmentation\" : \"AdjustmentSemanticSegmentation\",\n", + "\"Verification Semantic Segmentation\" : \"VerificationSemanticSegmentation\",\n", + "\"Verification Bounding Box\" : \"VerificationBoundingBox\",\n", + "\"Adjustment Bounding Box\" : \"AdjustmentBoundingBox\",\n", + "\"Adjustment Video Object Detection\" : \"AdjustmentVideoObjectDetection\",\n", + "\"Adjustment Video Object Tracking\" : \"AdjustmentVideoObjectTracking\",\n", + "\"Adjustment 3D Point Cloud Object Detection\" : \"Adjustment3DPointCloudObjectDetection\",\n", + "\"Adjustment 3D Point Cloud Object Tracking\" : \"Adjustment3DPointCloudObjectTracking\",\n", + "\"Adjustment 3D Point Cloud Semantic Segmentation\" : \"Adjustment3DPointCloudSemanticSegmentation\",\n", + " \n", + "}\n", + "\n", + "\n", + "arn_region_map = {'us-west-2': '081040173940',\n", + " 
'us-east-1': '432418664414',\n", + " 'us-east-2': '266458841044',\n", + " 'eu-west-1': '568282634449',\n", + " 'eu-west-2': '487402164563',\n", + " 'ap-northeast-1': '477331159723',\n", + " 'ap-northeast-2': '845288260483',\n", + " 'ca-central-1': '918755190332',\n", + " 'eu-central-1': '203001061592',\n", + " 'ap-south-1': '565803892007',\n", + " 'ap-southeast-1': '377565633583',\n", + " 'ap-southeast-2': '454466003867'\n", + " }" + ] + }, + { + "cell_type": "code", + "execution_count": 253, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "arn:aws:lambda:us-west-2:081040173940:function:PRE-AdjustmentBoundingBox\n", + "arn:aws:lambda:us-west-2:081040173940:function:ACS-AdjustmentBoundingBox\n" + ] + } + ], + "source": [ + "task_type_suffix2 = task_type_map2[task_type2]\n", + "region_account = arn_region_map[region]\n", + "PRE_HUMAN_TASK_LAMBDA2 = f'arn:aws:lambda:{region}:{region_account}:function:PRE-{task_type_suffix2}'\n", + "POST_ANNOTATION_LAMBDA2 = f'arn:aws:lambda:{region}:{region_account}:function:ACS-{task_type_suffix2}' \n", + "print(PRE_HUMAN_TASK_LAMBDA2)\n", + "print(POST_ANNOTATION_LAMBDA2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "3D point cloud and video frame task types have special requirements. The following variables will be used to configure your labeling job for these task types. 
To learn more, see the following topics in the documentation:\n", + "* [3D Point Cloud Labeling Jobs Overview](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud-general-information.html)\n", + "* [Video Frame Labeling Job Overview](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-video-overview.html)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Use the task type chosen for Job 2 (task_type2), not the task type of Job 1.\n", + "point_cloud_task = re.search(r'Point Cloud', task_type2) is not None\n", + "video_frame_task = re.search(r'Video Frame', task_type2) is not None" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Custom Labeling Workflow\n", + "\n", + "If you want to create a custom labeling workflow, you can create your own lambda functions to pre-process your input data and post-process the labels returned from workers. To learn more, see [Step 3: Processing with AWS Lambda](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-custom-templates-step3.html).\n", + "\n", + "To use this notebook to run a custom flow, set `CUSTOM` to `True` and specify your pre- and post-processing lambdas below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "CUSTOM = False\n", + "if(CUSTOM):\n", + "    PRE_HUMAN_TASK_LAMBDA2 = ''\n", + "    POST_ANNOTATION_LAMBDA2 = ''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Specify Labels for Job 2\n", + "\n", + "You specify the labels that you want workers to use to annotate your data in a label category configuration file. When you create a 3D point cloud or video frame labeling job, you can add label category attributes to your label category configuration file. Workers can assign one or more attributes to an annotation to give more information about the annotated object. \n", + "\n", + "For all task types, you can use the following cell to identify the labels you use for your labeling job. 
To create a label category configuration file with label category attributes, see [Create a Labeling Category Configuration File with Label Category Attributes\n", + "](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-label-cat-config-attributes.html) in the Amazon SageMaker developer guide. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Add label categories of your choice\n", + "LABEL_CATEGORIES = []\n", + "if(DEFAULT):\n", + "    LABEL_CATEGORIES = ['Pedestrian', 'Street Car', 'Biker']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following cell will create a label category configuration file using the labels specified above. \n", + "\n", + "**IMPORTANT**: Make sure you have added label categories above and they appear under `labels` when you run the following cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Specify labels and this notebook will create and upload a label category configuration file to S3. 
\n", + "json_body = {\n", + " \"document-version\": \"2018-11-28\",\n", + " 'labels': [{'label': label} for label in LABEL_CATEGORIES]\n", + "}\n", + "with open('class_labels2.json', 'w') as f:\n", + " json.dump(json_body, f)\n", + " \n", + "print(\"Your label category configuration file:\")\n", + "print(\"\\n\",json.dumps(json_body, indent=2))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s3.upload_file('class_labels2.json', BUCKET, 'class_labels2.json')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "LABEL_CATEGORIES_S3_URI2 = f's3://{BUCKET}/class_labels2.json'\n", + "print(f'You should now see class_labels2.json in {LABEL_CATEGORIES_S3_URI2}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create A Worker Task Template for Job 2\n", + "\n", + "Part or all of your images will be annotated by human annotators. It is essential to provide good instructions. Good instructions are:\n", + "\n", + "1. Concise. We recommend limiting verbal/textual instruction to two sentences and focusing on clear visuals.\n", + "2. Visual. In the case of object detection, we recommend providing several labeled examples with different numbers of boxes.\n", + "3. When used through the AWS Console, Ground Truth helps you create the instructions using a visual wizard. When using the API, you need to create an HTML template for your instructions. \n", + "\n", + "NOTE: If you use any images in your template (as we do), they need to be publicly accessible. You can enable public access to files in your S3 bucket through the S3 Console, as described in S3 Documentation.\n", + "\n", + "### Specify Resources Used for Human Task UI\n", + "\n", + "The human task user interface (UI) is the interface that human workers use to label your data. 
Depending on the type of labeling job you create, you will specify a resource that is used to generate the human task UI in the `UiConfig` parameter of `CreateLabelingJob`. \n", + "\n", + "For 3D point cloud and video frame labeling tasks, you will specify a pre-defined `HumanTaskUiArn`. For all other labeling job task types, you will specify a `UiTemplateS3Uri`. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Bounding Box Adjustment Labeling Job (Default) \n", + "\n", + "If you set `DEFAULT` to `True`, use the following to create a worker task template and upload it to your S3 bucket. Ground Truth uses this template to generate your human task UI. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.core.display import HTML, display\n", + "\n", + "def make_template(save_fname='instructions2.template'):\n", + "    template = r\"\"\"\n", + " \n", + " \n", + " \n", + "\n", + "
    \n", + "
  1. Inspect the image
  2. \n", + "
  3. Determine if the specified label is/are visible in the picture.
  4. \n", + "
  5. Outline each instance of the specified label in the image using the provided “Box” tool.
  6. \n", + "
\n", + " \n", + "\n", + "
\n", + " \n", + "
    \n", + "
  • Boxes should fit tightly around each object
  • \n", + "
  • Do not include parts of the object that are overlapping or that cannot be seen, even if you think you can interpolate the whole shape.
  • \n", + "
  • Avoid including shadows.
  • \n", + "
  • If the target is off screen, draw the box up to the edge of the image.
  • \n", + "
\n", + "
\n", + " \n", + "
 \n", + "\n", + " \"\"\".format(label_attribute_name_from_prior_job=LABEL_ATTRIBUTE_NAME)\n", + "    with open(save_fname, 'w') as f:\n", + "        f.write(template)\n", + "if(DEFAULT): \n", + "    make_template(save_fname='instructions2.template')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if(DEFAULT):\n", + "    result = s3.upload_file('instructions2.template', BUCKET, 'instructions2.template')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Image, Text, and Custom Labeling Jobs (Non Default) \n", + "\n", + "For all image and text based built-in task types, you can find a sample worker task template on the documentation page for that task type. Find the page for your task type on [Built-in Task Types](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-task-types.html). You will see an example template under the section **Create a {Insert-Task-Type} Job (API)**. \n", + "\n", + "The following template shows an example of a Semantic Segmentation adjustment job template. This template can be used to render segmentation masks from a previous labeling job so that workers can adjust or add to the mask in the new labeling job. \n", + "\n", + "For custom labeling workflows, you can provide a custom HTML worker task template using Crowd HTML Elements. To learn more, see [Step 2: Creating your custom labeling task template](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-custom-templates-step2.html).\n", + "\n", + "Ground Truth uses this template to generate your human task UI. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.core.display import HTML, display\n", + "\n", + "def make_template(save_fname='instructions2.template'):\n", + "    template = r\"\"\"\n", + "\n", + " \n", + " \n", + "
  1. Read the task carefully and inspect the image.
  2. Read the options and review the examples provided to understand more about the labels.
  3. Choose the appropriate label that best suits the image.
\n", + "
\n", + " \n", + "

Good example

Enter description to explain a correctly done segmentation

Bad example

Enter description of an incorrectly done segmentation

\n", + "
\n", + "
\n", + "
\"\"\".format(label_attribute_name_from_prior_job=LABEL_ATTRIBUTE_NAME)\n", + "    with open(save_fname, 'w') as f:\n", + "        f.write(template)\n", + "\n", + "#This will upload your template to S3 if you are not running on DEFAULT mode, and if your task type\n", + "#does not use video frames or 3D point clouds. \n", + "if(not DEFAULT and not video_frame_task and not point_cloud_task):\n", + "    make_template(save_fname='instructions2.template')\n", + "    s3.upload_file('instructions2.template', BUCKET, 'instructions2.template')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 3D Point Cloud and Video Frame Task Types\n", + "\n", + "If you are creating a 3D point cloud or video frame task type, your worker UI is configured by Ground Truth. If you chose one of these task types above, the following cell will specify the correct template. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import re\n", + "\n", + "if(not DEFAULT):\n", + "    # Use the Job 2 task type suffix (task_type_suffix2) to build the human task UI ARN.\n", + "    if (point_cloud_task):\n", + "        task_type_suffix_humanuiarn = task_type_suffix2.split('3D')[-1]\n", + "        HUMAN_UI_ARN2 = f'arn:aws:sagemaker:{region}:394669845002:human-task-ui/{task_type_suffix_humanuiarn}'\n", + "    if (video_frame_task):\n", + "        HUMAN_UI_ARN2 = f'arn:aws:sagemaker:{region}:394669845002:human-task-ui/{task_type_suffix2}'\n", + "    # HUMAN_UI_ARN2 is only defined for point cloud and video frame task types.\n", + "    if (point_cloud_task or video_frame_task):\n", + "        print(f'The Human Task UI ARN is: {HUMAN_UI_ARN2}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (Optional) Create an Input Manifest File\n", + "\n", + "You can optionally specify an input manifest file Amazon S3 URI in ManifestS3Uri when you create the streaming labeling job. Ground Truth sends each data object in the manifest file to workers for labeling as soon as the labeling job starts.\n", + "\n", + "Each line in an input manifest file is an entry containing an object, or a reference to an object, to label. 
An entry can also contain labels from previous jobs and, for some task types, additional information.\n", + "\n", + "To learn how to create an input manifest file, see [Use an Input Manifest File](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-input-data-input-manifest.html). Copy the S3 URI of the file below.\n", + "\n", + "Because Job 2 is a chained job, you can also pass the output manifest of Job 1 to Job 2 as its input manifest. In this case, Job 2 may receive the same objects from both the output SNS topic and the output S3 manifest of Job 1. These will be considered duplicates and ignored if the idempotency key is the same.\n", + "For simplicity, leave this field blank unless you really need it!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# [Optional] The path in Amazon S3 to your input manifest file. \n", + "INPUT_MANIFEST = ''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Specify Parameters for Labeling Job 2\n", + "\n", + "If you set `DEFAULT` to `False`, you must specify the following parameters. These will be used to configure and create your labeling job. 
If you set `DEFAULT` to `True`, default parameters will be used.\n", + "\n", + "To learn more about these parameters, use the following documentation:\n", + "* [TaskTitle](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-TaskTitle)\n", + "* [TaskDescription](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-TaskDescription)\n", + "* [TaskKeywords](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-TaskKeywords)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "TASK_TITLE2 = '<>'\n", + "\n", + "if(DEFAULT):\n", + " TASK_TITLE2 = 'Adjust Bounding boxes around objects'\n", + " \n", + "TASK_DESCRIPTION2 = '<>'\n", + "\n", + "if(DEFAULT):\n", + " TASK_DESCRIPTION2 = 'Adjust bounding boxes around specified objects in your images' \n", + "\n", + "# Keywords for your task, in a string-array. 
ex) ['image classification', 'image dataset']\n", + "TASK_KEYWORDS2 = ['<>']\n", + "\n", + "if(DEFAULT):\n", + " TASK_KEYWORDS2 = ['bounding box', 'image dataset']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# The path in Amazon S3 to your worker task template or human task UI\n", + "HUMAN_UI2 = []\n", + "if(point_cloud_task or video_frame_task):\n", + " HUMAN_TASK_UI_ARN2 = HUMAN_UI_ARN2\n", + " HUMAN_UI2.append(HUMAN_TASK_UI_ARN2)\n", + " UI_CONFIG_PARAM = 'HumanTaskUiArn'\n", + "else:\n", + " UI_TEMPLATE_S3_URI2 = f's3://{BUCKET}/instructions2.template'\n", + " HUMAN_UI2.append(UI_TEMPLATE_S3_URI2)\n", + " UI_CONFIG_PARAM = 'UiTemplateS3Uri'\n", + " \n", + "print(f'{UI_CONFIG_PARAM} resource that will be used: {HUMAN_UI2[0]}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If you want to store your output manifest in a different folder, provide an OUTPUT_PATH. \n", + "OUTPUT_FOLDER_PREFIX = '/gt-streaming-demo-output'\n", + "OUTPUT_BUCKET = 's3://' + BUCKET + OUTPUT_FOLDER_PREFIX\n", + "print(\"Your output data will be stored in:\", OUTPUT_BUCKET)\n", + "\n", + "# An IAM role with AmazonGroundTruthExecution policies attached.\n", + "# This must be the same role that you used to create this notebook instance. 
 \n", + "ROLE_ARN = role" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use the CreateLabelingJob API to Create a 2nd Streaming Labeling Job" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check the Job 2 task type (task_type2); re.search is used because 'Object Tracking' never appears at the start of a task type name.\n", + "if(re.search(r'Semantic Segmentation', task_type2) is not None or re.search(r'Object Tracking', task_type2) is not None or video_frame_task):\n", + "    LABEL_ATTRIBUTE_NAME2 = LABELING_JOB_NAME2 + '-ref'\n", + "else:\n", + "    LABEL_ATTRIBUTE_NAME2 = LABELING_JOB_NAME2\n", + "\n", + "human_task_config = {\n", + "    \"PreHumanTaskLambdaArn\": PRE_HUMAN_TASK_LAMBDA2,\n", + "    \"MaxConcurrentTaskCount\": 100, # Maximum of 100 objects will be available to the workteam at any time\n", + "    \"NumberOfHumanWorkersPerDataObject\": 1, # We will obtain and consolidate 1 human annotation for each image.\n", + "    \"TaskAvailabilityLifetimeInSeconds\": 21600, # Your workteam has 6 hours to complete all pending tasks.\n", + "    \"TaskDescription\": TASK_DESCRIPTION2,\n", + "    # If using public workforce, specify \"PublicWorkforceTaskPrice\"\n", + "    \"WorkteamArn\": WORKTEAM_ARN,\n", + "    \"AnnotationConsolidationConfig\": { \n", + "        \"AnnotationConsolidationLambdaArn\": POST_ANNOTATION_LAMBDA2\n", + "    },\n", + "    \"TaskKeywords\": TASK_KEYWORDS2,\n", + "    \"TaskTimeLimitInSeconds\": 600, # Each image must be labeled within 10 minutes.\n", + "    \"TaskTitle\": TASK_TITLE2,\n", + "    \"UiConfig\": {\n", + "        UI_CONFIG_PARAM : HUMAN_UI2[0]\n", + "    }\n", + "}\n", + "\n", + "#if you are using the Amazon Mechanical Turk workforce, specify the amount you want to pay a \n", + "#worker to label a data object. See https://aws.amazon.com/sagemaker/groundtruth/pricing/ for recommendations. 
 \n", + "if (not private_work_team):\n", + "    human_task_config[\"PublicWorkforceTaskPrice\"] = {\n", + "        \"AmountInUsd\": {\n", + "            \"Dollars\": 0,\n", + "            \"Cents\": 3,\n", + "            \"TenthFractionsOfACent\": 6,\n", + "        }\n", + "    } \n", + "    human_task_config[\"WorkteamArn\"] = WORKTEAM_ARN\n", + "else:\n", + "    human_task_config[\"WorkteamArn\"] = WORKTEAM_ARN\n", + "\n", + "ground_truth_request2 = {\n", + "    \"InputConfig\": {\n", + "        \"DataSource\": {\n", + "            \"SnsDataSource\": {\n", + "                \"SnsTopicArn\": INPUT_TOPIC_ARN2\n", + "            }\n", + "        }\n", + "    },\n", + "    \"HumanTaskConfig\" : human_task_config,\n", + "    \"LabelAttributeName\": LABEL_ATTRIBUTE_NAME2,\n", + "    \"LabelCategoryConfigS3Uri\" : LABEL_CATEGORIES_S3_URI2,\n", + "    \"LabelingJobName\": LABELING_JOB_NAME2,\n", + "    \"OutputConfig\": {\n", + "        \"S3OutputPath\": OUTPUT_BUCKET,\n", + "        \"SnsTopicArn\": OUTPUT_TOPIC_ARN2\n", + "    },\n", + "    \"RoleArn\": ROLE_ARN\n", + "}\n", + "\n", + "# Compare by value: an identity check ('is not') against a string literal is unreliable.\n", + "if(INPUT_MANIFEST != ''):\n", + "    ground_truth_request2[\"InputConfig\"][\"DataSource\"][\"S3DataSource\"] = {\"ManifestS3Uri\": INPUT_MANIFEST}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### DataAttributes\n", + "You should not share explicit, confidential, or personal information or protected health information with the Amazon Mechanical Turk workforce. \n", + "\n", + "If you are using the Amazon Mechanical Turk workforce, you must verify that your data is free of personal, confidential, and explicit content and protected health information using this code cell. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if (not private_work_team):\n", + " ground_truth_request2[\"InputConfig\"][\"DataAttributes\"]={\"ContentClassifiers\": [\"FreeOfPersonallyIdentifiableInformation\",\"FreeOfAdultContent\"]}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"Your create labeling job request:\\n\",json.dumps(ground_truth_request2,indent=4))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sagemaker_client = boto3.client('sagemaker')\n", + "sagemaker_client.create_labeling_job(**ground_truth_request2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use the DescribeLabelingJob API to describe 2nd Streaming Labeling Job" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sagemaker_client.describe_labeling_job(LabelingJobName=LABELING_JOB_NAME2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Publish a new object to your first labeling job [Job 1] once it has started\n", + "\n", + "Once you start a labeling job, you an publish a new request to it using Amazon SNS. \n", + "\n", + "### Configure your Request\n", + "\n", + "You will need to specify `REQUEST` in the following format: \n", + "\n", + "**For non-text objects**\n", + "\n", + "First, make sure that your object is located in `s3_bucket_location`\n", + "\n", + "`{\"source-ref\": \"s3_bucket_location\"}`\n", + "\n", + "**For text objects**\n", + "\n", + "`{\"source\": \"Lorem ipsum dolor sit amet\"}`\n", + "\n", + "Modify one of these examples to specify your request in the next cell. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "REQUEST = '' " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you set `Default` to `True` use the following cell upload a sample-image to your S3 bucket and send that image to labeling job. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if(DEFAULT): \n", + " !wget https://aws-ml-blog.s3.amazonaws.com/artifacts/gt-labeling-job-resources/example-image.jpg\n", + " s3.upload_file('example-image.jpg', BUCKET, 'example-image.jpg')\n", + " REQUEST = str({\"source-ref\": f\"s3://{BUCKET}/example-image.jpg\"})\n", + "print(f'Your request: {REQUEST}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Publish Your Request\n", + "\n", + "First, check the `LabelCounters` variable for your labeling job using `DescribeLabelingJob`. After you publish your request, you'll see `Unlabeled` increases to `1` (or the number of objects you send to your labeling job)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sagemaker_client.describe_labeling_job(LabelingJobName=LABELING_JOB_NAME)['LabelCounters']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following will publish your request to your Amazon SNS input topic." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TopicArn is of the first job [Job 1]\n", + "print(f'Your Request: {REQUEST}\\n')\n", + "if(REQUEST != ''):\n", + " published_message = sns.publish(TopicArn=INPUT_TOPIC_ARN,Message=REQUEST)\n", + " print(f'Published Message: {published_message}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You may need to wait 1 to 2 minutes for your request to appear in `LabelCounters`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "sagemaker_client.describe_labeling_job(LabelingJobName=LABELING_JOB_NAME)['LabelCounters']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After your first job finishes, check the status of your chained job. You should see your request appear in `LabelCounters`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sagemaker_client.describe_labeling_job(LabelingJobName=LABELING_JOB_NAME2)['LabelCounters']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Call StopLabelingJob for your previously launched jobs\n", + "\n", + "To stop your Streaming job, call StopLabelingJob twice: with the `LABELING_JOB_NAME` and `LABELING_JOB_NAME2`.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sagemaker_client.stop_labeling_job(LabelingJobName=LABELING_JOB_NAME)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sagemaker_client.stop_labeling_job(LabelingJobName=LABELING_JOB_NAME2)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "conda_python3", + "language": "python", + "name": "conda_python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/ground_truth_labeling_jobs/ground_truth_streaming_labeling_jobs/ground_truth_create_streaming_labeling_job.ipynb b/ground_truth_labeling_jobs/ground_truth_streaming_labeling_jobs/ground_truth_create_streaming_labeling_job.ipynb new file mode 100644 index 
0000000000..7e0ded2a42 --- /dev/null +++ b/ground_truth_labeling_jobs/ground_truth_streaming_labeling_jobs/ground_truth_create_streaming_labeling_job.ipynb @@ -0,0 +1,992 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Create a Ground Truth Streaming Labeling Job\n", + "\n", + "You can use a streaming labeling job to perpetually send new data objects to Amazon SageMaker Ground Truth to be labeled. Ground Truth streaming labeling jobs remain active until they are manually stopped or have been idle for more than 10 days. You can intermittently send new data objects to workers while the labeling job is active. \n", + "\n", + "Use this notebook to create a Ground Truth streaming labeling job using any of the [built-in task types](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-task-types.html). You can make necessary parameter changes for the custom workflow. You can either configure the notebook to create a labeling job using your own input data, or run the notebook on *default* mode and use provided, image input data. **To use your own input data, set `DEFAULT` to `False`**." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "DEFAULT=True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To read more about streaming labeling jobs, see the Amazon SageMaker documentation on [Ground Truth Streaming Labeling Jobs](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-streaming-labeling-job.html). \n", + "\n", + "To learn more about each step in this notebook, refer to [Create a Streaming Labeling Job](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-streaming-create-job.html)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Get latest version of AWS python SDK" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install -q --upgrade pip\n", + "!pip install awscli -q --upgrade\n", + "!pip install botocore -q --upgrade\n", + "!pip install boto3 -q --upgrade\n", + "!pip install sagemaker -q --upgrade\n", + "\n", + "# NOTE: Restart Kernel after the above command" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import boto3\n", + "import botocore\n", + "import json\n", + "import time\n", + "import sagemaker\n", + "import re" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "\n", + "You will create some of the resources you need to launch a Ground Truth streaming labeling job in this notebook. You must create the following resources before executing this notebook:\n", + "\n", + "* A work team. A work team is a group of workers that complete labeling tasks. If you want to preview the worker UI and execute the labeling task you will need to create a private work team, add yourself as a worker to this team, and provide the work team ARN below. If you do not want to use a private or vendor work team ARN, set `private_work_team` to `False` to use the Amazon Mechanical Turk workforce. To learn more about private, vendor, and Amazon Mechanical Turk workforces, see [Create and Manage Workforces\n", + "](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-management.html).\n", + " * **IMPORTANT**: 3D point cloud and video frame labeling jobs only support private and vendor workforces. If you plan to use 3D point cloud or video frame input data, specify a private or vendor workforce below for WORKTEAM_ARN. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "private_work_team = True # Set it to false if using Amazon Mechanical Turk Workforce\n", + "\n", + "if(private_work_team): \n", + " WORKTEAM_ARN = '<>'\n", + "else :\n", + " region = boto3.session.Session().region_name\n", + " WORKTEAM_ARN = f'arn:aws:sagemaker:{region}:394669845002:workteam/public-crowd/default'\n", + "print(f'This notebook will use the work team ARN: {WORKTEAM_ARN}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Make sure workteam arn is populated if private work team is chosen\n", + "assert (WORKTEAM_ARN != '<>')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* The IAM execution role you used to create this notebook instance must have the following permissions: \n", + " * AWS managed policy [AmazonSageMakerGroundTruthExecution](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonSageMakerGroundTruthExecution). Run the following code-block to see your IAM execution role name. This [GIF](add-policy.gif) demonstrates how to add this policy to an IAM role in the IAM console. You can also find instructions in the IAM User Guide: [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#add-policies-console).\n", + " * When you create your role, you specify Amazon S3 permissions. Make sure that your IAM role has access to the S3 bucket that you plan to use in this example. If you do not specify an S3 bucket in this notebook, the default bucket in the AWS region you are running this notebook instance will be used. If you do not require granular permissions, you can attach [AmazonS3FullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonS3FullAccess) to your role." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "role = sagemaker.get_execution_role()\n", + "role_name = role.split('/')[-1]\n", + "print('IMPORTANT: Make sure this execution role has the AWS managed policy AmazonSageMakerGroundTruthExecution attached.')\n", + "print('********************************************************************************')\n", + "print('The IAM execution role name:', role_name)\n", + "print('The IAM execution role ARN:', role)\n", + "print('********************************************************************************')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sess = sagemaker.Session()\n", + "BUCKET = '<< YOUR S3 BUCKET NAME >>'\n", + "if(BUCKET=='<< YOUR S3 BUCKET NAME >>'):\n", + " BUCKET=sess.default_bucket()\n", + "region = boto3.session.Session().region_name\n", + "s3 = boto3.client('s3')\n", + "# Make sure the bucket is in the same region as this notebook.\n", + "bucket_region = s3.head_bucket(Bucket=BUCKET)['ResponseMetadata']['HTTPHeaders']['x-amz-bucket-region']\n", + "assert bucket_region == region, f'Your S3 bucket {BUCKET} and this notebook need to be in the same region.'\n", + "print(f'IMPORTANT: make sure the role {role_name} has access to read and write to this bucket.')\n", + "print('********************************************************************************************************')\n", + "print(f'This notebook will use the following S3 bucket: {BUCKET}')\n", + "print('********************************************************************************************************')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create SNS Topics for Input and Output Data " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can send data objects to your streaming labeling job using Amazon Simple Notification 
Service (Amazon SNS). Amazon SNS is a web service that coordinates and manages the delivery of messages to and from endpoints (for example, an email address or AWS Lambda function). An Amazon SNS topic acts as a communication channel between two or more endpoints. You use Amazon SNS to send, or publish, new data objects to the topic specified in the CreateLabelingJob parameter SnsTopicArn in InputConfig. \n", + "\n", + "The following cells will create a name for your labeling job and use this name to create Amazon SNS input and output topics. This labeling job name and these topics will be used in your `CreateLabelingJob` request later in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Job Name\n", + "LABELING_JOB_NAME = 'GroundTruth-streaming-' + str(int(time.time()))\n", + "print('Your labeling job name will be :', LABELING_JOB_NAME)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Make sure role has \"Sns:CreateTopic\" access\n", + "sns = boto3.client('sns')\n", + "\n", + "# Create Input Topic\n", + "input_response = sns.create_topic(Name= LABELING_JOB_NAME + '-Input')\n", + "INPUT_SNS_TOPIC_ARN = input_response['TopicArn']\n", + "print('input_sns_topic :', INPUT_SNS_TOPIC_ARN)\n", + "\n", + "# Create Output Topic\n", + "output_response = sns.create_topic(Name= LABELING_JOB_NAME + '-Output')\n", + "OUTPUT_SNS_TOPIC_ARN = output_response['TopicArn']\n", + "print('output_sns_topic :', OUTPUT_SNS_TOPIC_ARN)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Choose Labeling Job Type\n", + "\n", + "Ground Truth supports a variety of built-in task types which streamline the process of creating image, text, video, video frame, and 3D point cloud labeling jobs. 
You can use this notebook in *default* mode if you do not want to bring your own input data.\n", + "\n", + "If you have input data and an input manifest file in an S3 bucket, set `DEFAULT` to `False` and, optionally, choose the **Labeling Job Task Type** you want to use below. To learn more about each task type, see [Built-in Task Types](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-task-types.html)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Choose Labeling Job Built-In Task Type\n", + "\n", + "Copy one of the following task types and use it to set the value for `task_type`. If you set **`DEFAULT`** to `True` at the beginning of this notebook, the image bounding box task type will be used by default. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "## Choose from the following:\n", + "## Bounding Box\n", + "## Image Classification (Single Label)\n", + "## Image Classification (Multi-label)\n", + "## Image Semantic Segmentation\n", + "## Text Classification (Single Label)\n", + "## Text Classification (Multi-label)\n", + "## Named Entity Recognition\n", + "## Video Classification\n", + "## Video Frame Object Detection\n", + "## Video Frame Object Tracking\n", + "## 3D Point Cloud Object Detection\n", + "## 3D Point Cloud Object Tracking\n", + "## 3D Point Cloud Semantic Segmentation\n", + "\n", + "task_type = \"<>\"\n", + "if(DEFAULT):\n", + " task_type = \"Bounding Box\"\n", + "print(f'Your task type: {task_type}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following cells will configure the Lambda functions Ground Truth uses to pre-process your input data and post-process your output data. 
These cells will configure your [PreHumanTaskLambdaArn](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-PreHumanTaskLambdaArn) and [AnnotationConsolidationLambdaArn](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AnnotationConsolidationConfig.html#sagemaker-Type-AnnotationConsolidationConfig-AnnotationConsolidationLambdaArn)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "task_type_map = {\n", + "\"Bounding Box\" : \"BoundingBox\",\n", + "\"Image Classification (Single Label)\" : \"ImageMultiClass\",\n", + "\"Image Classification (Multi-label)\" : \"ImageMultiClassMultiLabel\",\n", + "\"Image Semantic Segmentation\" : \"SemanticSegmentation\",\n", + "\"Text Classification (Single Label)\" : \"TextMultiClass\",\n", + "\"Text Classification (Multi-label)\" : \"TextMultiClassMultiLabel\",\n", + "\"Named Entity Recognition\" : \"NamedEntityRecognition\",\n", + "\"Video Classification\" : \"VideoMultiClass\",\n", + "\"Video Frame Object Detection\" : \"VideoObjectDetection\",\n", + "\"Video Frame Object Tracking\" : \"VideoObjectTracking\",\n", + "\"3D Point Cloud Object Detection\" : \"3DPointCloudObjectDetection\",\n", + "\"3D Point Cloud Object Tracking\" : \"3DPointCloudObjectTracking\",\n", + "\"3D Point Cloud Semantic Segmentation\" : \"3DPointCloudSemanticSegmentation\"\n", + "}\n", + "\n", + "\n", + "arn_region_map = {'us-west-2': '081040173940',\n", + " 'us-east-1': '432418664414',\n", + " 'us-east-2': '266458841044',\n", + " 'eu-west-1': '568282634449',\n", + " 'eu-west-2': '487402164563',\n", + " 'ap-northeast-1': '477331159723',\n", + " 'ap-northeast-2': '845288260483',\n", + " 'ca-central-1': '918755190332',\n", + " 'eu-central-1': '203001061592',\n", + " 'ap-south-1': '565803892007',\n", + " 'ap-southeast-1': '377565633583',\n", + " 'ap-southeast-2': '454466003867'\n", + " }" + ] + }, + { + "cell_type": 
"code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "task_type_suffix = task_type_map[task_type]\n", + "region_account = arn_region_map[region]\n", + "PRE_HUMAN_TASK_LAMBDA = f'arn:aws:lambda:{region}:{region_account}:function:PRE-{task_type_suffix}'\n", + "POST_ANNOTATION_LAMBDA = f'arn:aws:lambda:{region}:{region_account}:function:ACS-{task_type_suffix}'\n", + "print(PRE_HUMAN_TASK_LAMBDA)\n", + "print(POST_ANNOTATION_LAMBDA)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "3D point cloud and video frame task types have special requirements. The following variables will be used to configure your labeling job for these task types. To learn more, see the following topics in the documentation:\n", + "* [3D Point Cloud Labeling Jobs Overview](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud-general-information.html)\n", + "* [Video Frame Labeling Job Overview](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-video-overview.html)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "point_cloud_task = re.search(r'Point Cloud', task_type) is not None\n", + "video_frame_task = re.search(r'Video Frame', task_type) is not None" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Custom Labeling Workflow\n", + "\n", + "If you want to create a custom labeling workflow, you can create your own lambda functions to pre-process your input data and post-process the labels returned from workers. To learn more, see [Step 3: Processing with AWS Lambda](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-custom-templates-step3.html).\n", + "\n", + "To use this notebook to run a custom flow, set `CUSTOM` to `True` and specify your pre- and post-processing lambdas below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "CUSTOM = False\n", + "if(CUSTOM):\n", + " PRE_HUMAN_TASK_LAMBDA = ''\n", + " POST_ANNOTATION_LAMBDA = ''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Specify Labels\n", + "\n", + "You specify the labels that you want workers to use to annotate your data in a label category configuration file. When you create a 3D point cloud or video frame labeling job, you can add label category attributes to your label category configuration file. Workers can assign one or more attributes to annotations to give more information about that object. \n", + "\n", + "For all task types, you can use the following cell to identify the labels you use for your labeling job. To create a label category configuration file with label category attributes, see [Create a Labeling Category Configuration File with Label Category Attributes\n", + "](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-label-cat-config-attributes.html) in the Amazon SageMaker developer guide. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Add label categories of your choice\n", + "LABEL_CATEGORIES = []\n", + "if(DEFAULT):\n", + " LABEL_CATEGORIES = ['Pedestrian', 'Street Car', 'Biker']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following cell will create a label category configuration file using the labels specified above. \n", + "\n", + "**IMPORTANT**: Make sure you have added label categories above and they appear under `labels` when you run the following cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Specify labels and this notebook will create a label category configuration file and upload it to S3. 
\n", + "json_body = {\n", + " \"document-version\": \"2018-11-28\",\n", + " 'labels': [{'label': label} for label in LABEL_CATEGORIES]\n", + "}\n", + "with open('class_labels.json', 'w') as f:\n", + " json.dump(json_body, f)\n", + " \n", + "print(\"Your label category configuration file:\")\n", + "print(\"\\n\",json.dumps(json_body, indent=2))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s3.upload_file('class_labels.json', BUCKET, 'class_labels.json')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "LABEL_CATEGORIES_S3_URI = f's3://{BUCKET}/class_labels.json'\n", + "print(f'You should now see class_labels.json in {LABEL_CATEGORIES_S3_URI}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create A Worker Task Template\n", + "\n", + "Part or all of your images will be annotated by human annotators. It is essential to provide good instructions. Good instructions are:\n", + "\n", + "1. Concise. We recommend limiting verbal/textual instruction to two sentences and focusing on clear visuals.\n", + "2. Visual. In the case of object detection, we recommend providing several labeled examples with different numbers of boxes.\n", + "3. When used through the AWS Console, Ground Truth helps you create the instructions using a visual wizard. When using the API, you need to create an HTML template for your instructions. \n", + "\n", + "NOTE: If you use any images in your template (as we do), they need to be publicly accessible. You can enable public access to files in your S3 bucket through the S3 Console, as described in S3 Documentation.\n", + "\n", + "### Specify Resources Used for Human Task UI\n", + "\n", + "The human task user interface (UI) is the interface that human workers use to label your data. 
Depending on the type of labeling job you create, you will specify a resource that is used to generate the human task UI in the `UiConfig` parameter of `CreateLabelingJob`. \n", + "\n", + "For 3D point cloud and video frame labeling tasks, you will specify a pre-defined `HumanTaskUiArn`. For all other labeling job task types, you will specify a `UiTemplateS3Uri`. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Bounding Box Image Labeling Job (Default) \n", + "\n", + "If you set `DEFAULT` to `True`, use the following to create a worker task template and upload it to your S3 bucket. Ground Truth uses this template to generate your human task UI. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.core.display import HTML, display\n", + "\n", + "def make_template(save_fname='instructions.template'):\n", + " template = r\"\"\"\n", + " \n", + " \n", + " \n", + "\n", + "
    \n", + "
  1. Inspect the image
  2. \n", + "
  3. Determine if the specified label is/are visible in the picture.
  4. \n", + "
  5. Outline each instance of the specified label in the image using the provided “Box” tool.
  6. \n", + "
\n", + " \n", + "\n", + "
\n", + " \n", + "
    \n", + "
  • Boxes should fit tightly around each object
  • \n", + "
  • Do not include parts of the object that are overlapping or that cannot be seen, even though you think you can interpolate the whole shape.
  • \n", + "
  • Avoid including shadows.
  • \n", + "
  • If the target is off screen, draw the box up to the edge of the image.
  • \n", + "
\n", + "
\n", + " \n", + "
\n", + "\n", + " \"\"\".format()\n", + " with open(save_fname, 'w') as f:\n", + " f.write(template)\n", + "if(DEFAULT): \n", + " make_template(save_fname='instructions.template')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if(DEFAULT):\n", + " result = s3.upload_file('instructions.template', BUCKET, 'instructions.template')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Image, Text, and Custom Labeling Jobs (Non Default) \n", + "\n", + "For all image and text based built-in task types, you can find a sample worker task template on that task type's page. Find the page for your task type on [Built-in Task Types](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-task-types.html). You will see an example template under the section **Create a {Insert-Task-Type} Job (API)**. \n", + "\n", + "Update `` and ``. Add your template to the following code block and run the code blocks below to generate your worker task template and upload it to your S3 bucket.\n", + "\n", + "For custom labeling workflows, you can provide a custom HTML worker task template using Crowd HTML Elements. To learn more, see [Step 2: Creating your custom labeling task template](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-custom-templates-step2.html).\n", + "\n", + "Ground Truth uses this template to generate your human task UI. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.core.display import HTML, display\n", + "\n", + "def make_template(save_fname='instructions.template'):\n", + " template = r\"\"\"\n", + " <<>>\n", + " \"\"\".format()\n", + " with open(save_fname, 'w') as f:\n", + " f.write(template)\n", + "\n", + "#This will upload your template to S3 if you are not running in DEFAULT mode, and if your task type\n", + "#does not use video frames or 3D point clouds. 
\n", + "if(not DEFAULT and not video_frame_task and not point_cloud_task):\n", + " make_template(save_fname='instructions.template')\n", + " s3.upload_file('instructions.template', BUCKET, 'instructions.template')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 3D Point Cloud and Video Frame Task Types\n", + "\n", + "If you are creating a 3D point cloud or video frame task type, your worker UI is configured by Ground Truth. If you chose one of these task types above, the following cell will specify the correct template. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "import re\n", + "\n", + "if(not DEFAULT):\n", + " if (point_cloud_task):\n", + " task_type_suffix_humanuiarn = task_type_suffix.split('3D')[-1]\n", + " HUMAN_UI_ARN = f'arn:aws:sagemaker:{region}:394669845002:human-task-ui/{task_type_suffix_humanuiarn}'\n", + " if (video_frame_task):\n", + " HUMAN_UI_ARN = f'arn:aws:sagemaker:{region}:394669845002:human-task-ui/{task_type_suffix}'\n", + " print(f'The Human Task UI ARN is: {HUMAN_UI_ARN}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (Optional) Create an Input Manifest File\n", + "\n", + "You can optionally specify an input manifest file Amazon S3 URI in ManifestS3Uri when you create the streaming labeling job. Ground Truth sends each data object in the manifest file to workers for labeling as soon as the labeling job starts.\n", + "\n", + "Each line in an input manifest file is an entry containing an object, or a reference to an object, to label. An entry can also contain labels from previous jobs and, for some task types, additional information.\n", + "\n", + "To learn how to create an input manifest file, see [Use an Input Manifest File](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-input-data-input-manifest.html). Copy the S3 URI of the file below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# [Optional] The path in Amazon S3 to your input manifest file. \n", + "INPUT_MANIFEST = ''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Specify Parameters for Labeling Job\n", + "\n", + "If you set `DEFAULT` to `False`, you must specify the following parameters. These will be used to configure and create your labeling job. If you set `DEFAULT` to `True`, default parameters will be used.\n", + "\n", + "To learn more about these parameters, use the following documentation:\n", + "* [TaskTitle](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-TaskTitle)\n", + "* [TaskDescription](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-TaskDescription)\n", + "* [TaskKeywords](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-TaskKeywords)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "TASK_TITLE = '<>'\n", + "if(DEFAULT):\n", + " TASK_TITLE = 'Add bounding boxes to detect objects in an image'\n", + " \n", + "TASK_DESCRIPTION = '<>'\n", + "if(DEFAULT):\n", + " TASK_DESCRIPTION = 'Categorize images into classes using bounding boxes' \n", + "\n", + "# Keywords for your task, as a list of strings, e.g. ['image classification', 'image dataset']\n", + "TASK_KEYWORDS = ['<>']\n", + "if(DEFAULT):\n", + " TASK_KEYWORDS = ['bounding box', 'image dataset']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Run the following to specify the rest of the parameters required to configure your labeling job. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# The path in Amazon S3 to your worker task template or human task UI\n", + "HUMAN_UI = []\n", + "if(point_cloud_task or video_frame_task):\n", + " HUMAN_TASK_UI_ARN = HUMAN_UI_ARN\n", + " HUMAN_UI.append(HUMAN_TASK_UI_ARN)\n", + " UI_CONFIG_PARAM = 'HumanTaskUiArn'\n", + "else:\n", + " UI_TEMPLATE_S3_URI = f's3://{BUCKET}/instructions.template'\n", + " HUMAN_UI.append(UI_TEMPLATE_S3_URI)\n", + " UI_CONFIG_PARAM = 'UiTemplateS3Uri'\n", + " \n", + "print(f'{UI_CONFIG_PARAM} resource that will be used: {HUMAN_UI[0]}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# If you want to store your output manifest in a different folder, change OUTPUT_FOLDER_PREFIX. \n", + "OUTPUT_FOLDER_PREFIX = '/gt-streaming-demo-output'\n", + "OUTPUT_BUCKET = 's3://' + BUCKET + OUTPUT_FOLDER_PREFIX\n", + "print(\"Your output data will be stored in:\", OUTPUT_BUCKET)\n", + "\n", + "# An IAM role with the AmazonSageMakerGroundTruthExecution policy attached.\n", + "# This must be the same role that you used to create this notebook instance. 
\n", + "ROLE_ARN = role" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use the CreateLabelingJob API to create a streaming labeling job" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if(re.search(r'Semantic Segmentation', task_type) is not None or re.search(r'Object Tracking', task_type) is not None or video_frame_task):\n", + " LABEL_ATTRIBUTE_NAME = LABELING_JOB_NAME + '-ref'\n", + "else:\n", + " LABEL_ATTRIBUTE_NAME = LABELING_JOB_NAME\n", + "\n", + "human_task_config = {\n", + " \"PreHumanTaskLambdaArn\": PRE_HUMAN_TASK_LAMBDA,\n", + " \"MaxConcurrentTaskCount\": 100, # Maximum of 100 objects will be available to the workteam at any time\n", + " \"NumberOfHumanWorkersPerDataObject\": 1, # We will obtain and consolidate 1 human annotation for each image.\n", + " \"TaskAvailabilityLifetimeInSeconds\": 21600, # Your workteam has 6 hours to complete all pending tasks.\n", + " \"TaskDescription\": TASK_DESCRIPTION,\n", + " # If using public workforce, specify \"PublicWorkforceTaskPrice\"\n", + " \"WorkteamArn\": WORKTEAM_ARN,\n", + " \"AnnotationConsolidationConfig\": { \n", + " \"AnnotationConsolidationLambdaArn\": POST_ANNOTATION_LAMBDA\n", + " },\n", + " \"TaskKeywords\": TASK_KEYWORDS,\n", + " \"TaskTimeLimitInSeconds\": 600, # Each image must be labeled within 10 minutes.\n", + " \"TaskTitle\": TASK_TITLE,\n", + " \"UiConfig\": {\n", + " UI_CONFIG_PARAM : HUMAN_UI[0]\n", + " }\n", + "}\n", + "\n", + "# If you are using the Amazon Mechanical Turk workforce, specify the amount you want to pay a \n", + "# worker to label a data object. See https://aws.amazon.com/sagemaker/groundtruth/pricing/ for recommendations. 
\n", + "if (not private_work_team):\n", + " human_task_config[\"PublicWorkforceTaskPrice\"] = {\n", + " \"AmountInUsd\": {\n", + " \"Dollars\": 0,\n", + " \"Cents\": 3,\n", + " \"TenthFractionsOfACent\": 6,\n", + " }\n", + " } \n", + " human_task_config[\"WorkteamArn\"] = WORKTEAM_ARN\n", + "else:\n", + " human_task_config[\"WorkteamArn\"] = WORKTEAM_ARN\n", + "\n", + "ground_truth_request = {\n", + " \"InputConfig\": {\n", + " \"DataSource\": {\n", + " \"SnsDataSource\": {\n", + " \"SnsTopicArn\": INPUT_SNS_TOPIC_ARN\n", + " }\n", + " }\n", + " },\n", + " \"HumanTaskConfig\" : human_task_config,\n", + " \"LabelAttributeName\": LABEL_ATTRIBUTE_NAME,\n", + " \"LabelCategoryConfigS3Uri\" : LABEL_CATEGORIES_S3_URI,\n", + " \"LabelingJobName\": LABELING_JOB_NAME,\n", + " \"OutputConfig\": {\n", + " \"S3OutputPath\": OUTPUT_BUCKET,\n", + " \"SnsTopicArn\": OUTPUT_SNS_TOPIC_ARN\n", + " },\n", + " \"RoleArn\": ROLE_ARN\n", + "}\n", + "\n", + "if(INPUT_MANIFEST != ''):\n", + " ground_truth_request[\"InputConfig\"][\"DataSource\"][\"S3DataSource\"] = {\"ManifestS3Uri\": INPUT_MANIFEST}\n", + " \n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### DataAttributes\n", + "You should not share explicit, confidential, or personal information or protected health information with the Amazon Mechanical Turk workforce. \n", + "\n", + "If you are using the Amazon Mechanical Turk workforce, you must verify that your data is free of personal, confidential, and explicit content and protected health information using this code cell. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if (not private_work_team):\n", + " ground_truth_request[\"InputConfig\"][\"DataAttributes\"]={\"ContentClassifiers\": [\"FreeOfPersonallyIdentifiableInformation\",\"FreeOfAdultContent\"]}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"Your create labeling job request:\\n\",json.dumps(ground_truth_request,indent=4))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sagemaker_client = boto3.client('sagemaker')\n", + "sagemaker_client.create_labeling_job(**ground_truth_request)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use the DescribeLabelingJob API to describe a streaming labeling job" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sagemaker_client.describe_labeling_job(LabelingJobName=LABELING_JOB_NAME)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Wait until the labeling job status equals `InProgress` before moving forward in this notebook" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "sagemaker_client.describe_labeling_job(LabelingJobName=LABELING_JOB_NAME)['LabelingJobStatus']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Check for LabelingJobStatus and interpreting describe response\n", + "\n", + "* If you specified \"S3DataSource.ManifestS3Uri\" in the above request, the objects in the S3 file will automatically make their way to the labeling job. You will see counters incrementing from the objects from the file. \n", + "* Streaming jobs create a SQS queue in your account. 
You can check that the queue exists, under the name \"GroundTruth-LABELING_JOB_NAME\" (with the job name in lowercase), in the console or with the command below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sqs = boto3.client('sqs')\n", + "response = sqs.get_queue_url(QueueName='GroundTruth-' + LABELING_JOB_NAME.lower())\n", + "print(\"Queue URL is:\", response['QueueUrl'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Publish a new object to your labeling job once it has started\n", + "\n", + "Once you start a labeling job, you can publish a new request to it using Amazon SNS. \n", + "\n", + "### Configure your Request\n", + "\n", + "You will need to specify `REQUEST` in one of the following formats: \n", + "\n", + "**For non-text objects**\n", + "\n", + "First, make sure that your object is located in `s3_bucket_location`.\n", + "\n", + "`{\"source-ref\": \"s3_bucket_location\"}`\n", + "\n", + "**For text objects**\n", + "\n", + "`{\"source\": \"Lorem ipsum dolor sit amet\"}`\n", + "\n", + "Modify one of these examples to specify your request in the next cell. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "REQUEST = '' " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you set `DEFAULT` to `True`, use the following cell to upload a sample image to your S3 bucket and send that image to the labeling job. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "if(DEFAULT): \n", + " !wget https://aws-ml-blog.s3.amazonaws.com/artifacts/gt-labeling-job-resources/example-image.jpg\n", + " s3.upload_file('example-image.jpg', BUCKET, 'example-image.jpg')\n", + " REQUEST = json.dumps({\"source-ref\": f\"s3://{BUCKET}/example-image.jpg\"}) # json.dumps emits valid JSON; str() would emit a single-quoted Python repr\n", + "print(f'Your request: {REQUEST}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Publish Your Request\n", + "\n", + "First, check the `LabelCounters` field for your labeling job using `DescribeLabelingJob`. After you publish your request, you'll see `Unlabeled` increase to `1` (or the number of objects you send to your labeling job)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "sagemaker_client.describe_labeling_job(LabelingJobName=LABELING_JOB_NAME)['LabelCounters']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following cell publishes your request to your Amazon SNS input topic." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(f'Your Request: {REQUEST}\\n')\n", + "if(REQUEST != ''):\n", + " published_message = sns.publish(TopicArn=INPUT_SNS_TOPIC_ARN, Message=REQUEST)\n", + " print(f'Published Message: {published_message}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You may need to wait 1 to 2 minutes for your request to appear in `LabelCounters`."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sagemaker_client.describe_labeling_job(LabelingJobName=LABELING_JOB_NAME)['LabelCounters']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Call StopLabelingJob for your previously launched job\n", + "\n", + "To stop your streaming job, call `StopLabelingJob` with the `LABELING_JOB_NAME`.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sagemaker_client.stop_labeling_job(LabelingJobName=LABELING_JOB_NAME)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "conda_python3", + "language": "python", + "name": "conda_python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}