Data Preprocessing

mli · May 22, 2020, 3:43am

http://d2l.ai/chapter_preliminaries/pandas.html

gpk2000 · August 23, 2020, 1:52pm

Are the data preprocessing techniques provided here sufficient for most real datasets, or is external reading required?

goldpiggy · August 24, 2020, 4:54am

Hey @gpk2000, great question! We didn’t cover data preprocessing in full, since we are focusing on deep learning. If you are interested in data preprocessing, here is a great resource: https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html

gpk2000 · August 24, 2020, 9:40am

Thank you for the reply. I will look into it.

Def255 · July 18, 2021, 2:07pm

While executing this snippet:
inputs = inputs.fillna(inputs.mean())

I get this warning:
/tmp/ipykernel_7756/763590840.py:1: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
inputs.fillna(inputs.mean())

As a solution, I would suggest adding these parameters:
inputs.mean(skipna=True, numeric_only=True)

More info about the mean function is in the pandas documentation for DataFrame.mean.
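
A minimal sketch of the suggestion above, in case it helps. The column names and values are assumptions made up for illustration (one numeric column and one string column with missing values); they are not taken from this thread.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not from the thread):
# a numeric column and a string column, both with missing values.
data = pd.DataFrame({
    "NumRooms": [np.nan, 2.0, 4.0, np.nan],
    "Alley": ["Pave", np.nan, np.nan, np.nan],
    "Price": [127500, 106000, 178100, 140000],
})

inputs, outputs = data.iloc[:, 0:2], data.iloc[:, 2]

# numeric_only=True restricts the mean to numeric columns, so the
# string column "Alley" is skipped instead of triggering the warning.
column_means = inputs.mean(skipna=True, numeric_only=True)

# fillna with a Series fills each column whose name appears in the
# Series index; columns without an entry ("Alley") are left unchanged.
inputs = inputs.fillna(column_means)
print(inputs)
```

Only the numeric column gets filled this way. The missing values in the string column stay as NaN and would need separate handling.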

khushboo · August 4, 2021, 11:33pm

inputs, outputs = data.iloc[:, 0:2], data.iloc[:, 2]
inputs = inputs.fillna(inputs.mean((skipna=True, numeric_only=True)))
print(inputs)

File “”, line 2
inputs = inputs.fillna(inputs.mean((skipna=True, numeric_only=True)))
^
SyntaxError: invalid syntax

Def255 · August 5, 2021, 2:03pm

Hi khushboo! You can try
inputs = inputs.fillna(inputs.mean(skipna=True, numeric_only=True))
instead of
inputs = inputs.fillna(inputs.mean((skipna=True, numeric_only=True)))
There are too many parentheses.
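
To show why the extra pair matters: with doubled parentheses, Python reads the inner `(skipna=True, numeric_only=True)` as a parenthesized expression rather than as keyword arguments, and `=` is not allowed there. A small sketch that checks both spellings without needing pandas (the helper name `parses` is made up for this example):

```python
# Both call expressions are kept as strings so they can be compiled
# without pandas or a real DataFrame.
good = "inputs.mean(skipna=True, numeric_only=True)"
bad = "inputs.mean((skipna=True, numeric_only=True))"


def parses(source):
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


print(parses(good))
print(parses(bad))
```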

sonreikou · June 13, 2023, 2:06am

Thank you for helping me solve the problem!!