This repository has been archived by the owner on Nov 1, 2021. It is now read-only.

Different results due to optimize(true) #122

Open
mys007 opened this issue May 21, 2016 · 3 comments

Comments

@mys007

mys007 commented May 21, 2016

I have encountered weird behavior when turning on optimization.

The following code trains a simple MLP with a cross-entropy loss. The loss can be computed in two equivalent ways: with CrossEntropyCriterion, or with LogSoftMax followed by ClassNLLCriterion. With optimization turned on, the two variants unexpectedly produce different results. When I turn optimization off, the printouts are identical. They are also identical if I keep optimization on but move the definition of df into the loop (i.e. the gradient function is rebuilt on every iteration).

PS: A piggy-backed issue is that loss.crossEntropy doesn't support batch mode, because util.logSumExp doesn't support it.
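For illustration, here is a minimal sketch of a batch-aware logSumExp that reduces over the last dimension (plain torch, not wired into autograd; the helper name is made up):

local function batchLogSumExp(x)
   local lastDim = x:dim()                           -- reduce over the last dimension
   local maxVal = torch.max(x, lastDim)              -- per-row max, for numerical stability
   local shifted = x - maxVal:expandAs(x)
   return torch.log(torch.sum(torch.exp(shifted), lastDim)) + maxVal
end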

Thank you for your help.

t = require 'torch'
grad = require 'autograd'

t.manualSeed(11)
grad.optimize(true) --COMMENT ME OUT

local params = {
   W = {
      t.randn(50,50),
      t.randn(50,10),
   }
}

local ces = grad.nn.CrossEntropyCriterion()
local cnl = grad.nn.ClassNLLCriterion()
local lsm = grad.nn.LogSoftMax()

local f = function(params, x, y)
   local h1 = t.tanh(x * params.W[1])
   local h2 = t.tanh(h1 * params.W[2])
   return ces(h2,y)
end

local g = function(params, x, y)
   local h1 = t.tanh(x * params.W[1])
   local h2 = t.tanh(h1 * params.W[2])
   return cnl(lsm(h2),y)
end

local df = grad(f)  --OR MOVE ME INTO THE LOOP

local dg = grad(g)


local inputs = torch.Tensor(100,50):normal(0,1)   -- 100 samples, 50 features
local targets = torch.Tensor(100):fill(1)         -- every target is class 1

for i=1,10 do
    local graddF, lossdF = df(params, inputs, targets)
    local graddG, lossdG = dg(params, inputs, targets)
    print(lossdF, graddF.W[1]:norm(), graddF.W[2]:norm())
    print(lossdG, graddG.W[1]:norm(), graddG.W[2]:norm())

    params.W[1]:add(-1e-3, graddF.W[1])
    params.W[2]:add(-1e-3, graddF.W[2])
end
@alexbw
Collaborator

alexbw commented May 21, 2016

That's pretty strange. I don't see any of the usual culprits.
cc @luketwitter

@szagoruyko
Contributor

This bugged me again. CrossEntropyCriterion cannot be used with optimize = true; I remember trying to debug it a few months ago without any luck.
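Until this is fixed, a possible workaround (untested here, simply mirroring the g variant from the report above) is to skip CrossEntropyCriterion and compose LogSoftMax with ClassNLLCriterion instead:

local lsm = grad.nn.LogSoftMax()
local cnl = grad.nn.ClassNLLCriterion()

local f = function(params, x, y)
   local h1 = t.tanh(x * params.W[1])
   local h2 = t.tanh(h1 * params.W[2])
   return cnl(lsm(h2), y)   -- instead of ces(h2, y)
end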

@szagoruyko
Contributor

Apparently, starting from the second time the generated code is called, codegen runs the backward pass before the forward pass. The first call is fine.
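If so, the workaround from the original report should hold: rebuild the gradient function on every iteration so the cached generated code is never re-entered, or simply don't call grad.optimize(true). A sketch of the first option, reusing f, params, inputs and targets from the report above:

for i = 1, 10 do
   local df = grad(f)   -- recompiled each iteration, so no stale cached code is reused
   local gradsF, lossF = df(params, inputs, targets)
   params.W[1]:add(-1e-3, gradsF.W[1])
   params.W[2]:add(-1e-3, gradsF.W[2])
end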
