Deep Factorization Machines

astonzhang · June 29, 2020, 10:57pm

https://d2l.ai/chapter_recommender-systems/deepfm.html

StevenJokes · August 11, 2020, 4:19pm

Why is my CPU quicker (more examples per second) than your two GPUs?

StevenJokes/d2l-en-read/blob/master/Ch16_Recommender_Systems/16-10.ipynb


Sorry, I misread a number…

goldpiggy · August 11, 2020, 7:54pm

Hi @StevenJokes, great question! It is worth pointing out that a CPU may outperform a GPU on small networks with low-dimensional feature inputs, because the fixed cost of moving data to the GPU and launching kernels dominates when each step does very little computation.

So how do you define “a small network”? A simple example: a 100-unit MLP trained on 10 input features counts as a small network, since its hidden layer has only 100 × 10 = 1,000 weights to train. However, if the input feature dimension increases to 1 million, that layer alone has 100 million weights, and the network is much more likely to need the compute resources a GPU provides.
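To make the arithmetic concrete, here is a small sketch that counts the parameters of a single fully connected hidden layer. The helper `dense_param_count` is hypothetical; the 1,000 figure above counts weights only, and including the 100 biases gives 1,100.

```python
def dense_param_count(input_dim, hidden_units, with_bias=True):
    """Parameters in one fully connected layer: a weight matrix plus optional biases."""
    weights = input_dim * hidden_units
    biases = hidden_units if with_bias else 0
    return weights + biases

print(dense_param_count(10, 100, with_bias=False))  # 1000 (weights only)
print(dense_param_count(10, 100))                   # 1100 (weights + 100 biases)
print(dense_param_count(1_000_000, 100))            # 100000100
```

The work per example grows in proportion to this count, so going from 10 to 1 million input features multiplies the per-example cost by roughly 100,000. That is the kind of workload where a GPU pays off.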

StevenJokes · August 12, 2020, 4:27am

Sorry, I misread a number…
The GPU is quicker.

mengqi_liu · March 11, 2021, 9:52am

I found a problem. When I initialized the parameters your way, I got a result similar to yours. But if I did nothing to the neural network’s parameters, the accuracy was much higher. And for a binary classification problem, shouldn’t the accuracy be at least 0.5?

StevenJokess · March 11, 2021, 4:38pm

I don’t know how doing nothing affects the parameters.
Are all the parameters 0 at the start?

OK! I found it!

MXNet uses its default random initialization: each weight element is sampled uniformly from [-0.07, 0.07], and all bias parameters are initialized to zero.
This isn’t a binary classification problem.
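The default described above can be imitated in a few lines of NumPy. This sketch is not MXNet code. The helper `init_dense` is hypothetical and simply mimics the stated rule (weights ~ U(-0.07, 0.07), biases = 0) for a 10-input, 100-unit dense layer, which shows that "doing nothing" does not leave the weights at zero.

```python
import numpy as np

rng = np.random.default_rng(42)

def init_dense(in_units, out_units, scale=0.07):
    """Imitate the described default: weights ~ U(-scale, scale), biases = 0."""
    weight = rng.uniform(-scale, scale, size=(out_units, in_units))
    bias = np.zeros(out_units)
    return weight, bias

W, b = init_dense(in_units=10, out_units=100)
print(W.min() >= -0.07, W.max() <= 0.07)  # True True
print(np.all(b == 0))                     # True
# The weights are not zero: their standard deviation should be close to
# 0.07 / sqrt(3) (about 0.04), the std of a uniform distribution on [-0.07, 0.07].
print(W.std())
```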

noora_saeed · July 10, 2021, 3:08pm

Please, I need this code in TensorFlow.

JungGun_Lim · October 19, 2021, 11:46am

I’m just curious…
Test accuracy below 0.5 means it is worse than just flipping a coin. So is this accuracy worth anything?
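One way to see why sub-0.5 accuracy is suspicious for a two-class problem: inverting every prediction turns accuracy `a` into `1 - a`. The labels and predictions below are made up for illustration.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return float(np.mean(y_true == y_pred))

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # hypothetical labels
y_pred = np.array([0, 0, 0, 1, 1, 1, 0, 1])  # hypothetical predictions

acc = accuracy(y_true, y_pred)           # 0.25
flipped = accuracy(y_true, 1 - y_pred)   # 0.75, i.e. 1 - acc
print(acc, flipped)
```

So a binary classifier stuck well below 0.5 is usually a sign of something wrong (labels, thresholding, or how accuracy is computed) rather than a genuinely "worse than chance" model. With imbalanced classes, note that even a constant majority-class predictor already scores above 0.5.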

ysw · October 25, 2021, 4:42am

Why use num_inputs = int(sum(field_dims)) as the embedding input size?
I think the right way is this: use each dim in field_dims to embed the corresponding input feature, then use each embedded vector to compute the sum_of_square and square_of_sum (for the FM layer) and feed them to the NN (the MLP layer).
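For comparison, here is a minimal NumPy sketch of the two designs. It assumes the data pipeline adds a per-field offset to each raw feature index, so that field `f` uses rows starting at `sum(field_dims[:f])`. Whether the chapter's dataset class actually does this should be checked in its code. Under that assumption, one table with `int(sum(field_dims))` rows is just the per-field tables stacked, and the FM term is still computed from one embedded vector per field. `field_dims`, `k`, and all helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
field_dims = [3, 5, 2]   # hypothetical number of categories per field
k = 4                    # hypothetical embedding size

# Design B: one embedding table per field.
per_field_tables = [rng.normal(size=(d, k)) for d in field_dims]
# Design A: one shared table with sum(field_dims) rows (the per-field tables stacked).
shared_table = np.concatenate(per_field_tables, axis=0)
offsets = np.concatenate(([0], np.cumsum(field_dims)[:-1]))  # [0, 3, 8]

def embed_shared(x):
    """x: (batch, num_fields) raw per-field indices -> (batch, num_fields, k)."""
    return shared_table[x + offsets]

def embed_per_field(x):
    return np.stack([per_field_tables[f][x[:, f]] for f in range(len(field_dims))], axis=1)

def fm_pairwise(embed_x):
    """FM second-order term via 0.5 * (square_of_sum - sum_of_square)."""
    square_of_sum = np.sum(embed_x, axis=1) ** 2
    sum_of_square = np.sum(embed_x ** 2, axis=1)
    return 0.5 * np.sum(square_of_sum - sum_of_square, axis=1)

def explicit_pairwise(embed_x):
    """Direct sum of <v_i, v_j> over field pairs i < j, for checking."""
    n = embed_x.shape[1]
    return sum(np.sum(embed_x[:, i] * embed_x[:, j], axis=1)
               for i in range(n) for j in range(i + 1, n))

x = np.array([[0, 4, 1], [2, 0, 0]])  # two examples, one index per field
a, b = embed_shared(x), embed_per_field(x)
print(np.allclose(a, b))                            # True
print(np.allclose(fm_pairwise(a), fm_pairwise(b)))  # True
```

The shared table with offsets is mainly a convenience: one lookup call instead of one per field, with identical results as long as the offsets are applied.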

ysw · October 25, 2021, 4:43am

I think the code has a bug. The correct way is to embed each of the sparse input features separately; see my other comment.