Data Preprocessing

http://d2l.ai/chapter_preliminaries/pandas.html


Are the data preprocessing techniques provided here sufficient for most real datasets, or is external reading required? :thinking:

Hey @gpk2000, great question! We didn’t cover data preprocessing in full, since we are focusing on deep learning :wink: If you are interested in data preprocessing, here is a great resource: https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html


Thank you for the reply. I will look into it. :+1:

While executing this snippet:
inputs = inputs.fillna(inputs.mean())

I get this warning:
/tmp/ipykernel_7756/763590840.py:1: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
  inputs.fillna(inputs.mean())

As a solution, I would suggest adding these parameters, so that the mean is computed over the numeric columns only:
inputs.mean(skipna=True, numeric_only=True)

More info about mean function here
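
For anyone who wants to try this in isolation, here is a minimal sketch. The small DataFrame is a made-up stand-in for `inputs` (the column names and values are assumptions, not the chapter's data); it only needs one numeric column and one string column to reproduce the situation.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `inputs`: one numeric and one string column.
inputs = pd.DataFrame({
    "NumRooms": [np.nan, 2.0, 4.0, np.nan],
    "Alley": ["Pave", np.nan, np.nan, np.nan],
})

# Average only the numeric columns; skipna=True (the default) ignores NaN.
col_means = inputs.mean(skipna=True, numeric_only=True)

# fillna with a Series fills each column whose name matches an index label;
# "Alley" has no entry in col_means, so its NaN values are left untouched.
filled = inputs.fillna(col_means)
print(filled)
```

Note that the string column still contains missing values afterwards, so it needs separate handling (for example, one-hot encoding with a NaN category).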


inputs, outputs = data.iloc[:, 0:2], data.iloc[:, 2]
inputs = inputs.fillna(inputs.mean((skipna=True, numeric_only=True)))
print(inputs)

File “”, line 2
inputs = inputs.fillna(inputs.mean((skipna=True, numeric_only=True)))
^
SyntaxError: invalid syntax

Hi khushboo! You can try
inputs = inputs.fillna(inputs.mean(skipna=True, numeric_only=True))
instead of
inputs = inputs.fillna(inputs.mean((skipna=True, numeric_only=True)))
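There are too many parentheses. With the extra pair, Python tries to read (skipna=True, numeric_only=True) as a parenthesized expression, and keyword arguments are not valid inside one.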
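
If it helps, the difference can be checked without pandas at all, since the error is raised at parse time. This is just an illustrative sketch; the `parses` helper is hypothetical and only wraps the standard library's `ast.parse`.

```python
import ast

good = "inputs.mean(skipna=True, numeric_only=True)"
bad = "inputs.mean((skipna=True, numeric_only=True))"

def parses(src):
    """Return True if `src` is syntactically valid Python."""
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False

print(parses(good))  # single pair of parentheses: keyword arguments to mean()
print(parses(bad))   # extra pair: rejected before any code runs
```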

Thank you for helping me solve the problem!