If a dataset contains a numeric column whose values use a comma or another delimiter as a thousands separator, pandas reads that column as a string (object) by default. The column is really numeric, though, and it has to be read as numeric before it can be used in calculations:
data = pd.read_csv('tmp.txt', sep='|')
We get the following output:
Data with a level column containing thousands-formatted numbers
data.level.dtype returns dtype('O')
To overcome this problem, the thousands parameter can be used while reading:
data = pd.read_csv('tmp.txt', sep='|', thousands=',')
data.level.dtype now returns dtype('int64')
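The difference can be reproduced without the original file. The following is a minimal sketch that assumes a small, made-up pipe-delimited sample in place of tmp.txt; the name column and all the values are hypothetical, and only the level column and the sep and thousands arguments come from the text above.

```python
import io

import pandas as pd

# Hypothetical stand-in for tmp.txt: pipe-delimited, with commas used
# as thousands separators in the level column.
sample = "name|level\na|1,000\nb|2,500\nc|12,345\n"

# Without thousands=, the commas keep the column from parsing as numbers.
raw = pd.read_csv(io.StringIO(sample), sep="|")

# With thousands=",", the separators are stripped and the column is numeric.
parsed = pd.read_csv(io.StringIO(sample), sep="|", thousands=",")

print(raw["level"].dtype)        # a string/object dtype, not numeric
print(parsed["level"].dtype)     # an integer dtype
print(parsed["level"].tolist())
```

The same argument accepts other separator characters, so a file that writes 1.000 for one thousand could be read with thousands='.', provided the decimal marker is set to something else.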