Predicting House Prices on Kaggle

The way to convert an np.ndarray to a torch.Tensor may have changed since the textbook was written, so I use torch.from_numpy for the conversion instead:

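        # astype(float) gives float64 arrays, so cast every tensor to float32 to match the model weights
        train_features = torch.from_numpy(all_features[:n_train].values.astype(float)).to(dtype=torch.float32)
        test_features = torch.from_numpy(all_features[n_train:].values.astype(float)).to(dtype=torch.float32)
        train_labels = torch.from_numpy(self.train_data.SalePrice.values.reshape(-1, 1)).to(dtype=torch.float32)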
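
As a minimal sketch of how this conversion behaves, the snippet below uses a made-up toy DataFrame (toy_features and its column values are hypothetical, not the Kaggle data). torch.from_numpy keeps the NumPy dtype, so an array produced by astype(float) becomes a float64 tensor unless it is cast afterwards.

```python
import pandas as pd
import torch

# Hypothetical stand-in for all_features after one-hot encoding:
# one numeric column and one boolean dummy column.
toy_features = pd.DataFrame(
    {"LotArea": [0.5, -1.2, 0.7], "Street_Pave": [True, False, True]}
)

# astype(float) turns the mixed-type array into float64,
# and torch.from_numpy preserves that dtype.
as_float64 = torch.from_numpy(toy_features.values.astype(float))

# Casting afterwards gives float32, the default dtype of nn.Linear weights.
as_float32 = as_float64.to(dtype=torch.float32)

print(as_float64.dtype, as_float32.dtype)
```

Casting once, right after the conversion, avoids dtype mismatch errors later in the forward pass.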

My exercise answers:

  1. Average validation log mse=0.182, score=0.41115
    for the textbook’s naive linear regression.
  2. No. There can be a selection effect: house data on one side of the price distribution may be harder to collect.
  3. Tuned max_epochs=20 with the other hyperparameters fixed: log mse=0.12, score=0.34241.
    Tuned max_epochs=50: log mse=0.068, score=0.26531. Training for longer did improve the model’s performance.
    Alternatively, tuned lr=0.03, max_epochs=30: log mse=0.057, score=0.22289. Too large an lr decreases the model’s performance, though.
  4. MLP with one hidden layer (num_hiddens=256), lr=0.002, max_epochs=100: log mse=0.078, score=0.27778, better than linear regression with the same lr*max_epochs.
    Added dropout: dropout=0.5, log mse=0.0944, score=0.2935. Does this mean the model is still underfitting?
    Added weight decay: wd=0.1, log mse=0.0797, score=0.27613.
  5. No positive values, so we can’t take the log.
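
For reference, the MLP variant in answer 4 can be sketched roughly as follows. This is not my actual training script: the data is synthetic (X and y are made-up stand-ins), the targets are assumed to be log prices so that the MSE loss is a "log mse", and weight decay is assumed to be applied through the optimizer's weight_decay argument.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in data (hypothetical): 64 houses, 10 standardized features,
# with targets that play the role of log sale prices.
X = torch.randn(64, 10)
y = 12.0 + 0.1 * X[:, :1]

# One hidden layer with 256 units, followed by dropout.
net = nn.Sequential(
    nn.Linear(10, 256),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, 1),
)

loss_fn = nn.MSELoss()
# weight_decay adds an L2 penalty on the parameters (wd=0.1 in answer 4).
optimizer = torch.optim.SGD(net.parameters(), lr=0.002, weight_decay=0.1)

def evaluate(model):
    # Dropout is disabled in eval mode, so this is the deterministic loss.
    model.eval()
    with torch.no_grad():
        return loss_fn(model(X), y).item()

initial_loss = evaluate(net)

for epoch in range(100):
    net.train()  # enable dropout during training
    optimizer.zero_grad()
    loss = loss_fn(net(X), y)
    loss.backward()
    optimizer.step()

final_loss = evaluate(net)
print(f"log mse before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Setting the dropout probability to 0 or weight_decay to 0 in this sketch recovers the plain MLP, which makes it easy to compare the three variants under the same lr and max_epochs.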