Ticket

The values of the Ticket feature are not immediately interpretable, but we can make some guesses and try to group them. After looking at the Ticket feature, you may notice these clues:

  • Almost a quarter of the tickets begin with a letter prefix, while the rest consist only of digits.
  • The number part of the ticket code seems to indicate the passenger's class. For example, numbers starting with 1 are usually first-class tickets, those starting with 2 are usually second class, and those starting with 3 are usually third class. I say usually because this holds for the majority of examples, but not all. Ticket numbers starting with 4-9 also exist; they are rare and almost exclusively third class.
  • Several people can share a ticket number, which might indicate a family or close friends traveling together and acting like a family.
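Before building features, the three clues can be checked quickly. The sketch below uses only the Python standard library and a handful of made-up (ticket, class) records; these records are hypothetical and only mimic the format of the Ticket and Pclass columns, so on the real data you would feed in those columns instead. The helper names has_letter_prefix and number_part are illustrative, and number_part assumes the numeric part is the last whitespace-separated token.

```python
from collections import Counter, defaultdict

# Hypothetical (ticket, passenger class) records; they only mimic the
# format of the Titanic Ticket and Pclass columns and are not real rows.
records = [
    ("A/5 21001", 3), ("PC 17001", 1), ("STON/O2. 3101001", 3),
    ("110001", 1), ("110001", 1), ("310002", 3),
    ("210003", 2), ("340004", 3), ("340004", 3), ("240005", 2),
]


def has_letter_prefix(ticket):
    # Clue 1: the ticket code starts with a letter instead of a digit
    return ticket[0].isalpha()


def number_part(ticket):
    # Assumption: the numeric part is the last whitespace-separated token,
    # e.g. "A/5 21001" -> "21001"
    return ticket.split()[-1]


# Clue 1: share of tickets that carry a letter prefix
prefix_share = sum(has_letter_prefix(t) for t, _ in records) / len(records)

# Clue 2: count passenger classes per leading digit of the number part
class_by_first_digit = defaultdict(Counter)
for ticket, pclass in records:
    number = number_part(ticket)
    if number.isdigit():  # skip letters-only tickets that have no number
        class_by_first_digit[number[0]][pclass] += 1

# Clue 3: ticket numbers shared by more than one passenger
ticket_counts = Counter(t for t, _ in records)
shared = {t: n for t, n in ticket_counts.items() if n > 1}

print(f"letter-prefixed share: {prefix_share:.0%}")
for digit in sorted(class_by_first_digit):
    print(digit, dict(class_by_first_digit[digit]))
print("shared tickets:", shared)
```

The per-digit tallies are what you would inspect to judge how often "starts with 1" really means first class; the shared-ticket counts are a natural starting point for a group-size feature.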

The following code analyzes the Ticket feature and turns the preceding clues into new features:

import re

import pandas as pd
from sklearn import preprocessing

# df_titanic_data, keep_binary and keep_scaled are module-level globals set up elsewhere


# Helper function for constructing features from the ticket variable
def process_ticket():
    global df_titanic_data

    # Extract the letter prefix ('U' if there is none), drop dots and slashes,
    # and rewrite the 'STON' spelling as 'SOTON'
    df_titanic_data['TicketPrefix'] = df_titanic_data['Ticket'].map(lambda y: get_ticket_prefix(y.upper()))
    df_titanic_data['TicketPrefix'] = df_titanic_data['TicketPrefix'].map(lambda y: re.sub(r'[./]', '', y))
    df_titanic_data['TicketPrefix'] = df_titanic_data['TicketPrefix'].map(lambda y: re.sub('STON', 'SOTON', y))

    # Give each distinct prefix an integer id
    df_titanic_data['TicketPrefixId'] = pd.factorize(df_titanic_data['TicketPrefix'])[0]

    # Binarizing features for each ticket prefix
    if keep_binary:
        prefixes = pd.get_dummies(df_titanic_data['TicketPrefix']).rename(columns=lambda y: 'TicketPrefix_' + str(y))
        df_titanic_data = pd.concat([df_titanic_data, prefixes], axis=1)

    df_titanic_data.drop(['TicketPrefix'], axis=1, inplace=True)

    # Numeric part of the ticket, its number of digits, and its leading digit
    df_titanic_data['TicketNumber'] = df_titanic_data['Ticket'].map(lambda y: get_ticket_num(y))
    df_titanic_data['TicketNumberDigits'] = df_titanic_data['TicketNumber'].map(lambda y: len(y)).astype(int)
    df_titanic_data['TicketNumberStart'] = df_titanic_data['TicketNumber'].map(lambda y: y[0:1]).astype(int)

    df_titanic_data['TicketNumber'] = df_titanic_data.TicketNumber.astype(int)

    if keep_scaled:
        scaler_processing = preprocessing.StandardScaler()
        # StandardScaler expects 2-D input, so pass a one-column DataFrame and flatten the result
        df_titanic_data['TicketNumber_scaled'] = scaler_processing.fit_transform(
            df_titanic_data[['TicketNumber']]).ravel()


def get_ticket_prefix(ticket_value):
    # searching for the letters (plus dots and slashes) in the ticket alphanumerical value
    match_letter = re.compile(r"([a-zA-Z./]+)").search(ticket_value)
    if match_letter:
        return match_letter.group()
    else:
        return 'U'


def get_ticket_num(ticket_value):
    # searching for the trailing digits in the ticket alphanumerical value
    match_number = re.compile(r"(\d+)$").search(ticket_value)
    if match_number:
        return match_number.group()
    else:
        return '0'
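To see what the parsing helpers produce for individual tickets, here is a pandas-free sketch that applies the same two regular expressions and the same prefix clean-up to a few made-up ticket strings. The function name ticket_features is hypothetical; it simply bundles into one tuple what process_ticket spreads across several columns.

```python
import re

# Same patterns as get_ticket_prefix and get_ticket_num
PREFIX_RE = re.compile(r"[a-zA-Z./]+")
NUMBER_RE = re.compile(r"\d+$")


def ticket_features(ticket):
    """Return (prefix, number, digit count, first digit) for one ticket string."""
    match_prefix = PREFIX_RE.search(ticket.upper())
    prefix = match_prefix.group() if match_prefix else "U"
    prefix = re.sub(r"[./]", "", prefix)      # drop dots and slashes
    prefix = prefix.replace("STON", "SOTON")  # rewrite the STON spelling as SOTON
    match_number = NUMBER_RE.search(ticket)
    number = match_number.group() if match_number else "0"
    return prefix, int(number), len(number), int(number[0])


for ticket in ["A/5 21001", "STON/O2. 3101001", "SOTON/O.Q. 3101002",
               "PC 17001", "110001", "LINE"]:
    print(f"{ticket!r:22} -> {ticket_features(ticket)}")
```

Because digits end the prefix match, "A/5 21001" yields the prefix "A" rather than "A5", and "STON/O2. 3101001" becomes "SOTONO" while "SOTON/O.Q. 3101002" becomes "SOTONOQ", so the STON-to-SOTON rewrite does not collapse every Southampton variant into one prefix. A purely numeric ticket such as "110001" gets the prefix "U", and a letters-only ticket such as "LINE" gets the number 0.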