Intro to R

Adapted from simplilearn.com and psychologicalscience.org 

R is an open-source programming language often used as a data analysis and statistical software tool. It’s particularly useful for machine learning operations and data wrangling. R is widely used, but what are its advantages and drawbacks, and is it really worth learning?

Here are some advantages: 

  • It’s open source. No fees or licenses are needed, so developing a new program in R is a low-risk venture.
  • It’s platform-independent, which means that it runs on all major operating systems (Windows, macOS and Linux). Developers can write one program that works on competing systems, which is another reason R is cost-effective.
  • R is great for statistics. It handles everything from regression equations to frequentist and Bayesian statistics. It can bootstrap, simulate, randomize and resample your data, which is why it’s widely used in the data sciences.
  • It’s well suited for machine learning tasks such as regression and classification.
  • R lets you perform data wrangling, which means turning unstructured, messy data into a structured format. This involves merging data sets, cleaning data and identifying important rows or columns. R also creates formatted tables, complete with significance stars. For these reasons, R is often used in the financial technology industry and in academia.
  • It visualizes data well because there are packages, such as ggplot2, dedicated to making polished plots.
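
To make the advantages above more concrete, here is a minimal sketch in base R of merging two data sets, fitting a regression, bootstrapping the result and drawing a plot. The data are simulated, and every name in it (scores, majors, students, hours, grade) is a hypothetical chosen for illustration, not something from a real course or data set.

```r
# Hypothetical toy data: two small tables that share an "id" column.
set.seed(42)  # make the random draws reproducible
scores <- data.frame(id = 1:20, hours = runif(20, min = 0, max = 10))
scores$grade <- 50 + 4 * scores$hours + rnorm(20, sd = 5)
majors <- data.frame(id = 1:20, major = rep(c("STAT", "ECON"), times = 10))

# Data wrangling: merge the two tables on their shared key.
students <- merge(scores, majors, by = "id")

# Statistics: fit a linear regression of grade on hours studied.
fit <- lm(grade ~ hours, data = students)
summary(fit)  # prints a coefficient table, including significance stars

# Resampling: bootstrap the slope by refitting on rows drawn with replacement.
boot_slopes <- replicate(1000, {
  rows <- sample(nrow(students), replace = TRUE)
  coef(lm(grade ~ hours, data = students[rows, ]))["hours"]
})
quantile(boot_slopes, c(0.025, 0.975))  # rough 95% percentile interval

# Visualization: scatter plot with the fitted regression line (base graphics).
plot(students$hours, students$grade, xlab = "Hours studied", ylab = "Grade")
abline(fit)
```

Everything here uses functions that ship with R, so it should run in a fresh R or RStudio session without installing any packages. Dedicated packages can replace each step, but this is only one way to do it.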

Drawbacks include: 

  • It has a steep learning curve. It’s best suited for people who have some previous programming experience. 
  • It’s less secure than many other languages. R lacks basic built-in security measures. Consequently, it’s not a good choice for making web-safe applications, and it can’t be embedded in web browsers.
  • It’s slow. R is often slower than other programming languages such as Python or MATLAB.
  • It has inefficient memory management. R’s data must be stored in physical memory, so it takes up a lot of memory when running code. However, the increasing use of cloud-based memory may eventually lessen the negative effects of this drawback.
  • Its documentation and package quality are inconsistent. Docs and packages can be patchy or incomplete because R doesn’t have official support and is instead maintained and extended by the community.

Some classes at Lawrence, such as STAT 255, CMSC 205 and ECON 380, already teach people how to use R. You can also install R and RStudio yourself, since both are free, and learn with online teaching and coding resources like DataCamp.