Random number: set.seed(N) in R

  • I realize that one uses set.seed() in R for pseudo-random number generation. I also realize that using the same number, like set.seed(123), ensures you can reproduce results.

    But what I don't get is what the values themselves mean. I am playing with several functions, and some use set.seed(1) or set.seed(300) or set.seed(12345). What does that number mean (if anything), and when should I use a different one?

    For example, in a book I am working through, they use set.seed(12345) when creating a training set for decision trees. Then in another chapter, they use set.seed(300) when creating a random forest.

    Just don't get the number.

    The main point of using the seed is to be able to reproduce a particular sequence of 'random' numbers. Generally speaking, if you don't need to be able to do that, you *wouldn't* set the seed. The seed itself carries no inherent meaning beyond being a way of telling the random number generator 'where to start'. You might think of it a bit like the relationship between a PIN and your bank account. The PIN is associated with a long string of numbers (your account number), but it's not inherently an interpretable quantity (there is *an* interpretation, but in setting it, you ignore that).
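
    A minimal sketch of the point above, using only base R. The variable names are illustrative, and runif() stands in for whatever random function you happen to be calling:

```r
# Same seed, same generator: the draws repeat exactly.
set.seed(123)
first_run <- runif(5)

set.seed(123)
second_run <- runif(5)

identical(first_run, second_run)  # TRUE

# A different seed starts the generator somewhere else, so the
# sequence differs, but it is no more or less "random".
set.seed(300)
other_run <- runif(5)

identical(first_run, other_run)   # FALSE
```

    Neither 123 nor 300 is "better"; the number only picks the starting point, which is why books can use a different one in each chapter.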

    For the record, 42 is always the right seed

    Just a comment: I recommend setting the seed of the random number generator only (i) to debug a script, for example to track down a particular error, or (ii) when sending or publishing results, so they can be checked.

  • Corcovado (correct answer, 7 years ago)

    The seed number you choose is the starting point used in the generation of a sequence of random numbers, which is why (provided you use the same pseudo-random number generator) you'll obtain the same results given the same seed number. As far as your second question is concerned, this short snippet from the description of the equivalent functionality in Stata might be helpful:

    We cannot emphasize this enough: Do not set the seed too often. To see why this is such a bad idea, consider the limiting case: You set the seed, draw one pseudorandom number, reset the seed, draw again, and so continue. The pseudorandom numbers you obtain will be nothing more than the seeds you run through a mathematical function. The results you obtain will not pass for random unless the seeds you choose pass for random. If you already had such numbers, why are you even bothering to use the pseudorandom-number generator?
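
    The limiting case described in the quote can be sketched in R as follows. This is an illustration only; the seeds 1:5 and the variable names are made up for the example:

```r
# Anti-pattern: reseed before every single draw.
seeds <- 1:5
reseeded <- sapply(seeds, function(s) {
  set.seed(s)
  runif(1)
})

# Repeating the exercise with the same seeds returns the same values:
# each "random" number is just a fixed function of its seed.
reseeded_again <- sapply(seeds, function(s) {
  set.seed(s)
  runif(1)
})
identical(reseeded, reseeded_again)  # TRUE

# Usual practice: set the seed once, then take all draws from one stream.
set.seed(12345)
one_stream <- runif(5)
```

    In the first version, any randomness has to come from how the seeds were chosen, which is the quote's point.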


    Who knew Stata had such interesting documentation: "Others try to make up a random number, figuring if they include enough digits, the result just has to be random. This is a variation on the five-second rule for dropped food, and we admit to using both of these rules"

Licensed under CC-BY-SA with attribution

Content dated before 6/26/2020 9:53 AM