How do I get the number of rows of a data.frame in R?

  • After reading a dataset:

    dataset <- read.csv("forR.csv")
    
    • How can I get R to give me the number of cases it contains?
    • Also, will the returned value include or exclude cases omitted with na.omit(dataset)?

    I also recommend taking a look at `str()`, as it provides other useful details about your object. It can often explain why a column isn't behaving as it should (factor instead of numeric, etc.).

    Please read the R guide of Owen first (http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf), and if possible, Introduction to R (http://cran.r-project.org/doc/manuals/R-intro.pdf). Both are on the official website of R. You're incredibly lucky you actually get an answer. On the r-help list one would redirect you to the manual in less elegant terms. No offense meant.

    @Joris - Point taken (without offence), but it was my impression that SE sites were designed to foster problem/solution learning in a way not afforded by manuals. Additionally, this question will now be available for other beginners. Thanks for the links though.

    If you're looking for pure code solutions, stackoverflow might be more appropriate. Although, all the R gurus present @ SO are also here (not counting myself). :)

    I disagree with your assertion that this question will be helpful for other beginners, *especially* if they don't skim the manual. They will just create a duplicate question.

    @JorisMeys: thanks for the link to the R guide.. hadn't come across that yet in my learning of R and it's exactly what I'd been looking for.

    And, four years later, this is the second hit I got on Google trying to find an answer to this question. No need for me to create a duplicate (@JoshuaUlrich).

    @Richard Just noticed that (6 years on) this question has 100 upvotes and is consequently well within the top 0.1% of questions on the site. I find this very interesting.

  • dataset will be a data frame. As I don't have forR.csv, I'll make up a small data frame for illustration:

    set.seed(1)
    dataset <- data.frame(A = sample(c(NA, 1:100), 1000, replace = TRUE),
                          B = rnorm(1000))
    
    > head(dataset)
       A           B
    1 26  0.07730312
    2 37 -0.29686864
    3 57 -1.18324224
    4 91  0.01129269
    5 20  0.99160104
    6 90  1.59396745
    

    To get the number of cases, count the number of rows using nrow() or NROW():

    > nrow(dataset)
    [1] 1000
    > NROW(dataset)
    [1] 1000
    

    To count the rows that remain after dropping those that contain NA, use the same tools, but wrap dataset in na.omit():

    > NROW(na.omit(dataset))
    [1] 993
    

    The difference between NROW() and NCOL() and their lowercase variants (nrow() and ncol()) is that the lowercase versions only work for objects that have dimensions (arrays, matrices, data frames); called on a plain vector they return NULL. The uppercase versions also work with vectors, which are treated as if they were a 1-column matrix, so they are robust if you end up subsetting your data such that R drops a dimension and hands back a vector.
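
    A minimal sketch of that difference, using a made-up vector `x` and matrix `m` (both are illustrative names, not part of the question's data):

```r
# A plain numeric vector has no dim attribute.
x <- c(10, 20, 30)
n_lower <- nrow(x)   # NULL: nrow() needs dimensions
n_upper <- NROW(x)   # 3: the vector is treated as a 1-column matrix
c_upper <- NCOL(x)   # 1

# Selecting a single column of a matrix drops it to a vector by default.
m <- matrix(1:6, nrow = 3)
col1 <- m[, 1]
dropped_lower <- nrow(col1)               # NULL
dropped_upper <- NROW(col1)               # 3
kept <- nrow(m[, 1, drop = FALSE])        # 3: drop = FALSE keeps the matrix
```

    If you want to keep using nrow(), passing drop = FALSE when subsetting is one way to avoid losing the dimension in the first place.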

    Alternatively, use complete.cases() and sum the result. complete.cases() returns a logical vector with one element per row: TRUE if the row has no NA in any column, FALSE otherwise. Summing it therefore counts the complete rows.

    > sum(complete.cases(dataset))
    [1] 993
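
    To see how the pieces relate, here is a small sketch on a hand-built data frame (`df` and its values are made up for illustration). nrow() counts every row, NA or not, while the other two approaches agree on the number of complete rows:

```r
# Four rows: row 2 has NA in A, row 3 has NA in B.
df <- data.frame(A = c(1, NA, 3, 4),
                 B = c("x", "y", NA, "z"))

total_rows    <- nrow(df)                  # 4: rows with NA are included
cc            <- complete.cases(df)        # TRUE FALSE FALSE TRUE
complete_rows <- sum(cc)                   # 2
omitted_rows  <- nrow(na.omit(df))         # 2: same count as sum(cc)

# NA count per column, useful for seeing where the rows were lost.
na_per_col <- colSums(is.na(df))           # A = 1, B = 1
```

    So the plain row count includes the incomplete cases; the difference total_rows - complete_rows is the number of rows na.omit() would drop.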
    

Licensed under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM