Introduction and Setup

Learning Objectives

  • Understand the difference between CRAN and Bioconductor
  • Be able to install bioinformatics packages

Getting Set Up

RStudio provides a useful feature called Projects, which act as a container for your work. As you use R more, you will find it helpful to keep the files and environment for each real-world project together and separate from your other projects.

Let’s create a new project now.

  1. Go to File > New Project
  2. In the Create Project dialog, choose New Directory
  3. For Project Type, choose New Project
  4. Make sure Create project as subdirectory of: points to Desktop (or your preferred location)
  5. Call your new directory r_bioinformatics_lesson
  6. Select the check box that says Open in New Session
  7. Inside your new project, create folders called data and figures
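You can also do step 7 from the R console. The sketch below uses only base R functions. It assumes the working directory is the project root, which is where RStudio starts when it opens a project.

```r
# Run this from the R console inside the new project.
# RStudio sets the working directory to the project root when the project opens.
for (folder in c("data", "figures")) {
  # showWarnings = FALSE stays quiet if the folder already exists
  dir.create(folder, showWarnings = FALSE)
}

# Check that both folders are now present (TRUE for each one that exists)
dir.exists(c("data", "figures"))
```

Because the loop skips folders that already exist, you can safely run it more than once.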

What is Bioconductor?

In this lesson, we’ll work with a number of bioinformatics packages along with the tidyverse family of packages. Many R packages come from CRAN (the Comprehensive R Archive Network). You can install CRAN packages in either of two ways: with the Install button in the Packages tab of RStudio’s lower-right pane, or with the install.packages() function.

Bioconductor is an open-source project that provides tools for the analysis and comprehension of high-throughput biological data, built on the R programming language. It includes a large ecosystem of packages for tasks such as sequence analysis, genomic data visualization, and statistical modeling in bioinformatics. Bioconductor emphasizes reproducibility, interoperability, and the integration of biological metadata, making it especially well-suited for omics research.

To use packages from Bioconductor, we must first install and load the BiocManager package from CRAN. BiocManager provides an interface to the Bioconductor repository.

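# Install BiocManager from CRAN only if it is not already installed
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Load BiocManager into the current session
library(BiocManager)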

Next, let’s install the packages we will use in this session. From CRAN, we’ll install:

Package Name   Purpose
tidyverse      Wrangling and visualizing data
rentrez        Accessing data from NCBI databases
ape            Phylogenetic analysis

install.packages(c("tidyverse", "rentrez", "ape"))

Then we can install our Bioconductor packages with BiocManager:

Package Name   Purpose
Biostrings     Manipulating biological sequences
pwalign        Pairwise alignment
DECIPHER       Multiple sequence alignment

BiocManager::install(c("Biostrings", "pwalign", "DECIPHER"))

NOTE: You only need to install a package once on your system (and again when you update it). However, you must load it with the library() function in every new R session where you want to use it.

library(ape)
library(Biostrings)
library(DECIPHER)
library(pwalign)
library(rentrez)
library(tidyverse)
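If you rerun a setup script, you can avoid reinstalling packages you already have by checking first. This is a minimal sketch. install_if_missing() is a hypothetical helper written for illustration, not a function from CRAN or Bioconductor. It reuses the requireNamespace() check from the BiocManager step above.

```r
# install_if_missing() is a hypothetical helper, not part of any package.
# It installs only the packages that cannot already be loaded, so rerunning
# a setup script does not reinstall everything.
install_if_missing <- function(pkgs, installer = utils::install.packages) {
  # requireNamespace() returns FALSE (quietly) for packages that are not installed
  have <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!have]
  if (length(missing) > 0) {
    installer(missing)
  }
  invisible(missing)  # return the names that needed installing
}

# Usage: CRAN packages with the default installer...
# install_if_missing(c("tidyverse", "rentrez", "ape"))
# ...and Bioconductor packages with BiocManager::install()
# install_if_missing(c("Biostrings", "pwalign", "DECIPHER"),
#                    installer = BiocManager::install)
```

Passing the installer as an argument lets one helper serve both repositories. The trade-off is that requireNamespace() loads each package's namespace while checking, which can be slow for large packages such as tidyverse.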

Don’t worry if you see a lot of text and messages printing in the console during this process. You may be asked to update other packages that these packages depend on. You will also see messages when two loaded packages contain functions with the same name. In that case, the package loaded later masks the earlier package’s function. For example, loading tidyverse reports that dplyr::filter() masks stats::filter().
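When names collide, the package::function() syntax calls a specific package’s version regardless of load order. The base-R sketch below fakes a collision. It defines a hypothetical filter() in the global environment, which R searches before package:stats, so the new function masks the stats version in the same way a newly loaded package would.

```r
# A hypothetical stand-in for a same-named function from a newly loaded package.
# The global environment is searched before package:stats, so this one wins.
filter <- function(x, ...) "the masking version"

masked_result <- filter(1:5)                        # R finds the global version first
stats_result  <- stats::filter(1:5, rep(1 / 3, 3))  # pkg::fun picks stats' version explicitly

print(masked_result)
print(as.numeric(stats_result))  # centered 3-point moving average; the two ends are NA

# conflicts() lists names that are defined in more than one place on the search path
"filter" %in% conflicts()
```

Writing the package name explicitly makes the code behave the same regardless of the order in which the packages were loaded.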