This is a research project at Wikiversity.
As of 2022-04-14, there were 19,029 active packages on the Comprehensive R Archive Network (CRAN). On July 7, 2017, John Nash[1] noted “There are now over 9000 packages on CRAN, with many more in Bioconductor, on Github, and other repositories. How can or should R users navigate this large and unruly collection of packages to find the tools they need and use them effectively?”
Almost eight years earlier, I had published the “sos” package that allowed users to search for packages, not just help pages as had been available with the previous RSiteSearch{utils} function.[2]
However, “sos” is still a command line solution and has largely been replaced by newer tools like the CRANsearcher addin for RStudio, crantastic, and RDocumentation. Only 2.2 percent of respondents in Julia Silge's recent survey[3] said they used “R packages built for search such as the sos package.” Some “sos” features could be improved, but R users might benefit more from using that effort to improve more popular search capabilities.
The following table summarizes our understanding of search capabilities devoted to R. The "base::readLines, vkR::getURLs" column reports the results of searching for those two functions with each of the existing search alternatives. This benchmarking suggests a strong preference for RDocumentation for most web-based searches, followed by Rseek. The sos package can create an Excel workbook with summary results by package. However, Jonathan Baron plans to stop maintaining his "RSiteSearch" database next year, because other options are better. That will also make the RSiteSearch{utils} function and the sos package obsolete unless someone modifies them to use one of the other existing databases, e.g., RDocumentation.
search capability | introduced | comments | base::readLines, vkR::getURLs | FOSS |
---|---|---|---|---|
Getting Help with R[4] | early | Official overview of various help facilities recognized by the R Core Team, updated to mention help(), vignettes, demo(), apropos(), help.search(), help.start(), CRAN Task Views, FAQs, Stack Overflow, and R Email Lists. | help(), demo(), apropos(), and help.search() access only locally installed documentation.[5] | Y |
R site search[6] | (before 2004) | Search email lists and help pages of contributed packages | Clumsy relative to RDocumentation.org for many things. If you want a URL for a package and function in the R Site Search database, you can often get it similar to http://finzi.psych.upenn.edu/R/library/base/html/readLines.html.[5] | Y |
Rseek[7] | 2007 | Search email lists and R web sites including, e.g., RDocumentation.org | Rseek found these help files quickly. (It found them in RDocumentation.)[8] | ? |
Nabble R Forum[9] | early | Search email lists and R web sites | Searching for "readLines" and "base::readLines" produced nothing useful.[10] | N |
Google Advanced Search[11] | 2010s | The advanced search feature of the Google search engine | A naive search for "readLines", "getURLs", "base::readLines" and "vkR::getURLs" got similar functions in other languages in addition to possibly relevant resources in R.[5] | N |
RSiteSearch[12] | 2004 | R function in the “utils” package that searches a database of email lists plus all the help pages in packages on CRAN and Bioconductor plus a few others. | Same results as with the "R Site Search" web site above, but with the query entered from within R.[5] | Y |
sos[13] | 2009 | R package to search for packages, not just help pages, in the RSiteSearch database.[2] | Found 'readLines' and 'getURLs', but took longer than some of the other options. "???base::readLines" and "???vkR::getURLs" threw errors.[5] | Y |
Metacran[14] | 2010s | Includes "Featured packages", "Most downloaded", "Trending", "Most depended upon", and "Recently updated", but CRAN only. Invites contributions for further development on their GitHub site.[15] | Searching for "base" or "readLines", "base::readLines", "getURLs" and "vkR::getURLs" returned, "did not match any packages."[5] | Y |
crantastic[16] | 2010s | Includes "Most popular packages" and "Recent activity" with user reviews, but CRAN only. | Searches for "readLines" and "getURLs" returned, "no results were found." Searches for "base::readLines" and "vkR::getURLs" returned, "The page you were looking for doesn't exist."[5] | Y |
CRANsearcher[17] | since 2011 | RStudio addin - CRAN only with user reviews. | Searching for "base" or "readLines", "base::readLines", "getURLs" and "vkR::getURLs" returned, "Showing 0 to 0 of 0 entries".[5] | |
RDocumentation[18] | 2010s | Includes "Top 5 packages", "Top 5 authors", and "Newest packages" on CRAN, Bioconductor and GitHub. Invites users to contribute (a) new examples and (b) ideas and code for further development on their GitHub site.[19] | Searching for "readLines" and "getURLs" produced the desired help files, including a URL (Uniform Resource Locator) that I could use in an R Markdown vignette. Similar searches for other packages and functions seemed to produce something useful. This seems to create a strong preference for RDocumentation over other options, especially since Metacran, crantastic, and CRANsearcher all failed to return anything useful for this request. When this fails, try Rseek.[20] | N |
depsy[21] | 2012[22] | Text-mines papers for mentions of software they use, revealing impacts invisible to citation indexes like Google Scholar. Also analyzes code from over half a million GitHub repositories to find how packages are reused by other software projects and assigns fractional credit to contributors based on designated authorship, number of commits, and repo ownership.[23] | Searching for "base" or "readLines", "base::readLines", "getURLs" and "vkR::getURLs" returned nothing.[5] | Y |
rdrr.io[24] | 2016 | An index of R packages and documentation from CRAN, Bioconductor, GitHub and R-Forge by Ian Howson. The site also offers "Snippets" for running R code online: "Run R code online. Over 9,000 packages are preinstalled!" | The "base" package and "readLines" help were easily found on 2018-03-30, but searching for package "vkR" returned, "No results found." (Also, as of 2018-02-26, the "Ecdat" package that has been on CRAN since 2009 was not found in 'rdrr.io'.) | ? |
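
To make the first two rows of the table concrete, here is a minimal sketch in base R. It assumes a standard R installation with help pages present. The calls to apropos() and help.search() consult only locally installed documentation, as noted in the table. The helper site_search_url() is hypothetical (it is not part of any package); it simply follows the URL pattern quoted in the "R site search" row, so the resulting page may not exist for every package and function, and the host may be retired.

```r
# Local help facilities named in the "Getting Help with R" row.
# These consult only packages installed on this machine.

# apropos() matches a regular expression against object names on the search path.
local_hits <- apropos("^readLines$")
print(local_hits)

# help.search() scans installed help pages; here only aliases in package 'base',
# with fuzzy matching switched off.
found <- help.search("readLines", fields = "alias",
                     package = "base", agrep = FALSE)
print(nrow(found$matches))  # number of matching help entries found locally

# Hypothetical helper (an assumption for illustration, not an existing API):
# builds a help-page URL following the R Site Search pattern quoted in the table.
site_search_url <- function(pkg, fn) {
  sprintf("http://finzi.psych.upenn.edu/R/library/%s/html/%s.html", pkg, fn)
}

print(site_search_url("base", "readLines"))
```

A function such as vkR::getURLs would not be found by the first two calls unless the vkR package were installed locally, which is the gap the web-based tools in the table try to fill.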
Key questions from this comparison:
These notes are being published on Wikiversity to invite anyone to add their own thoughts, either directly in this article or in the associated “Discuss” page.
Now it's your turn, dear reader: What would you like to see in a search capability for R?
These notes are posted on Wikiversity precisely to invite others to edit them directly or add comments on the associated “Discuss” page.
This discussion was inspired by the plenary session on "Navigating the R package universe" at the international useR! 2017 conference in Brussels, Belgium, July 4-7, 2017.