Working with the dataset

The first thing to do is to filter the data, keeping only engine one, and to save the new dataset in the S3 bucket that we just created. First, we need to install the dependencies. From the command console, install the AWS Python libraries:

$ pip install boto3
$ pip install sagemaker
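If you are working in a locally installed Jupyter Notebook rather than on a SageMaker notebook instance, the data-handling libraries used in the following steps may also need to be installed (SageMaker notebook instances already include them):

$ pip install pandas numpy matplotlib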

We can run the following steps either in the online SageMaker Jupyter notebook or in a locally installed Jupyter Notebook:

  1. The code to download the S3 files is as follows:
import boto3
import sagemaker

#role = 'your username if you run jupyter offline'
role = sagemaker.get_execution_role()
region = boto3.Session().region_name

print(region)
print(role)

# Test bucket name immersionday-sagemaker-test
bucket_name = 'iiot-book-data'
file_name_train = 'train_FD001.txt'
file_name_test = 'test_FD001.txt'

# download the training and test files from the S3 bucket
s3 = boto3.resource('s3')
s3.Bucket(bucket_name).download_file(file_name_train, file_name_train)
s3.Bucket(bucket_name).download_file(file_name_test, file_name_test)
  2. We can now filter just for engine 1 using pandas (a quick visual check of the filtered data is sketched at the end of this section):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# step 1: read the dataset
columns = ['unitid', 'time', 'set_1','set_2','set_3']
columns.extend(['sensor_' + str(i) for i in range(1,22)])
df = pd.read_csv(file_name_train, delim_whitespace=True, names=columns)
df_test = pd.read_csv(file_name_test, delim_whitespace=True, names=columns)
# only engine 1
i = 1

# sensor features that will be used to prepare the model
columns_feature = ['sensor_4', 'sensor_7']

# file names for the filtered datasets
file_name_train = 'train.csv'
file_name_test = 'test.csv'

dataset_train = df[df.unitid == i]
dataset_test = df_test[df_test.unitid == i]
  3. Finally, we can save the filtered data as CSV files and upload them to the S3 bucket, as follows (an optional check of the upload is sketched after this list):
import boto3

s3 = boto3.resource('s3')
target_bucket = s3.Bucket(bucket_name)

# write the filtered datasets to local CSV files
np.savetxt(file_name_train, dataset_train, delimiter=",")
np.savetxt(file_name_test, dataset_test, delimiter=",")

# upload the CSV files to the S3 bucket
with open(file_name_train, 'rb') as data:
    target_bucket.upload_fileobj(data,
        'input/data/training/{}'.format(file_name_train))

with open(file_name_test, 'rb') as data:
    target_bucket.upload_fileobj(data, file_name_test)
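
As an optional check (not part of the original steps), we can list the objects in the bucket to confirm that both files arrived; this is a minimal sketch, assuming the bucket_name variable defined earlier:

import boto3

# optional check: list the objects now stored in the target bucket
s3 = boto3.resource('s3')
for obj in s3.Bucket(bucket_name).objects.all():
    print(obj.key, obj.size)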

We have now completed our first exercise with SageMaker, working with the S3 repository and the SageMaker notebook.
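
Before moving on, an optional visual check of the filtered training data can confirm that engine 1's sensor readings were selected correctly; this is a minimal sketch, assuming the dataset_train and columns_feature variables from the previous steps are still in scope:

import matplotlib.pyplot as plt

# optional: plot the two selected sensor features for engine 1 over time
dataset_train.plot(x='time', y=columns_feature, title='Engine 1 - selected sensors')
plt.show()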
