Data Preprocessing

Hi, in the exercise I used data.isna() together with max() to find which column has the most missing values. Does anyone have an easier way to do this?

Well, in my case, I used data.isnull().sum() to count the NaN values in each column and max() to find the largest count, then I used dropna() to delete the column I wanted.
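For anyone following along, here is a minimal sketch of that count-then-drop idea. The DataFrame is made up for illustration (the column names and values are hypothetical, not the exercise data). The sketch swaps in idxmax() to get the column label directly, and drop(columns=...) rather than dropna() to remove it. Note that idxmax() returns only the first column if several tie.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the real exercise data will differ.
data = pd.DataFrame({
    "NumRooms": [np.nan, 2.0, 4.0, np.nan],
    "Alley": ["Pave", np.nan, np.nan, np.nan],
    "Price": [127500, 106000, 178100, 140000],
})

# Number of NaN values per column (a Series indexed by column name).
na_counts = data.isnull().sum()

# Label of the column with the largest count (first one on a tie).
worst_col = na_counts.idxmax()

# Return a new DataFrame without that column; `data` itself is unchanged.
data_dropped = data.drop(columns=worst_col)
```

Since drop() returns a new DataFrame by default, the original data stays available for comparison.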

I defined a function to find the name of the column with the most NaN values:

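def get_max_col_name(data):
    # Count the NaN values per column once instead of on every iteration.
    na_counts = data.isnull().sum()
    max_na = na_counts.max()  # named max_na to avoid shadowing the built-in max()
    # Return the first column whose NaN count equals the maximum.
    for name, count in na_counts.items():
        if count == max_na:
            return name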

Then I used the function in DataFrame.drop (axis=1 is redundant once columns= is given):
data_dropna = data.drop(columns=[get_max_col_name(data)])

inputs.isnull().sum().index[inputs.isnull().sum().values == inputs.isnull().sum().max()]

This might work too; it returns every column that ties for the most NaN values.
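The one-liner above recomputes inputs.isnull().sum() three times. A possible tidier variant, sketched on made-up data (the column names "a", "b", "c" are hypothetical), stores the counts once and then drops every tied column:

```python
import numpy as np
import pandas as pd

# Hypothetical data in which columns "a" and "b" tie for the most NaN values.
inputs = pd.DataFrame({
    "a": [np.nan, np.nan, 1.0],
    "b": [np.nan, 2.0, np.nan],
    "c": [1.0, 2.0, np.nan],
})

# Compute the per-column NaN counts once and reuse them.
na_counts = inputs.isnull().sum()

# Keep the labels of all columns whose count equals the maximum.
tied_cols = na_counts[na_counts == na_counts.max()].index

# Drop every tied column in one call.
inputs_dropped = inputs.drop(columns=tied_cols)
```

Whether to drop all tied columns or only the first is a judgment call that depends on the exercise; idxmax() gives the first one only.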