Random number: set.seed(N) in R
I realize that one uses `set.seed()` in R for pseudo-random number generation. I also realize that using the same number, like `set.seed(123)`, ensures you can reproduce results.
But what I don't get is what the values themselves mean. I am playing with several functions, and some use `set.seed(12345)`. What does that number mean (if anything), and when should I use a different one?
For example, in a book I am working through, they use `set.seed(12345)` when creating a training set for decision trees. Then in another chapter, they use `set.seed(300)` when creating a Random Forest.
I just don't get the number.
Does this help? http://stackoverflow.com/questions/14684437/what-does-the-integer-while-setting-the-seed-mean Also, `?set.seed` within R provides pretty good information.
The main point of using the seed is to be able to reproduce a particular sequence of 'random' numbers. Generally speaking, if you don't need to be able to do that, you *wouldn't* set the seed. The seed itself carries no inherent meaning; it is simply a way of telling the random number generator 'where to start'. You might think of it a bit like the relationship between a PIN and your bank account. The PIN is associated with a long string of numbers (your account number), but it's not inherently an interpretable quantity (there is *an* interpretation, but in setting it, you ignore that).
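A quick sketch in base R of what this means in practice, using `runif()` only as an arbitrary example of a random draw: the same seed reproduces the same sequence, a different seed just gives a different (equally 'random') sequence, and there is nothing special about 123 versus 12345. The last lines peek at `.Random.seed`, where R stores the full generator state that the single seed integer gets expanded into, which is the 'long account number' behind the PIN.

```r
# Same seed -> same starting point -> same sequence
set.seed(123)
a <- runif(3)

set.seed(123)
b <- runif(3)

# Different seed -> a different, equally valid sequence
set.seed(12345)
d <- runif(3)

identical(a, b)  # TRUE
identical(a, d)  # FALSE

# The one seed integer is expanded into a much larger internal state vector
set.seed(123)
length(.Random.seed)  # 626 under the default Mersenne-Twister generator
```

So `set.seed(12345)` and `set.seed(300)` in the book are just two arbitrary labels; any integer would do, as long as you reuse the same one when you want to reproduce the results.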
Just a comment: I recommend setting the random seed only (i) when debugging a script, to track down particular errors, etc., or (ii) when sending/publishing results so they can be checked.
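One way to follow that advice is to make seeding optional, so that ordinary runs stay random but a debugging or published run can be replayed exactly. This is only a sketch; `simulate_mean()` is a hypothetical helper made up for illustration, not a function from base R or from the book.

```r
# Hypothetical helper: seed only when a seed is explicitly supplied
simulate_mean <- function(n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # reproducible run (debugging / publishing)
  mean(rnorm(n))                      # otherwise, whatever the current RNG state gives
}

r1 <- simulate_mean(100, seed = 42)
r2 <- simulate_mean(100, seed = 42)
identical(r1, r2)  # TRUE: same seed, same result

simulate_mean(100)  # no seed: varies from run to run
```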
The seed number you choose is the starting point used in the generation of a sequence of random numbers, which is why (provided you use the same pseudo-random number generator) you'll obtain the same results given the same seed number. As far as your second question is concerned, this short snippet from the description of the equivalent functionality in Stata might be helpful:
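The 'same pseudo-random number generator' caveat matters in practice. A sketch in base R: the same seed gives different numbers under a different generator kind (`set.seed()` accepts a `kind` argument for this). Relatedly, R 3.6.0 changed the default method used by `sample()` from "Rounding" to "Rejection", so code like a book's `set.seed(12345)` followed by `sample()` can give different results on newer R versions; `set.seed(12345, sample.kind = "Rounding")` reproduces the old behavior (with a warning).

```r
# Same seed, two different generators
set.seed(123, kind = "Mersenne-Twister")
mt <- runif(3)

set.seed(123, kind = "Wichmann-Hill")
wh <- runif(3)

identical(mt, wh)  # FALSE: a seed only has meaning relative to a generator

# Return to R's default generator (Mersenne-Twister) and check we get mt back
set.seed(123, kind = "default")
identical(runif(3), mt)  # TRUE
```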
> We cannot emphasize this enough: Do not set the seed too often. To see why this is such a bad idea, consider the limiting case: You set the seed, draw one pseudorandom number, reset the seed, draw again, and so continue. The pseudorandom numbers you obtain will be nothing more than the seeds you run through a mathematical function. The results you obtain will not pass for random unless the seeds you choose pass for random. If you already had such numbers, why are you even bothering to use the pseudorandom-number generator?
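The same pitfall is easy to reproduce in R. This sketch takes the most extreme version of the limiting case from the Stata quote: reseeding with the same value before every draw, so every 'random' number is identical. Seeding once and then drawing gives a proper sequence.

```r
# Anti-pattern: reset the seed before every single draw
bad <- vapply(1:5, function(i) {
  set.seed(123)
  runif(1)
}, numeric(1))
length(unique(bad))  # 1: all five "draws" are the same number

# Better: set the seed once, then draw as many numbers as you need
set.seed(123)
good <- runif(5)
length(unique(good))  # 5
```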