Lawrence Berkeley National Laboratory

May 03-04, 2018

9:00am - 4:00pm

Instructors: Nima Hejazi, Adam Orr

Helpers: Nurgul Kaplan, Shi Wang

General Information

Data Carpentry aims to help researchers get their work done in less time and with less pain by teaching them basic research computing skills. This hands-on workshop will cover basic concepts and tools, including program design, version control, data management, and task automation. Participants will be encouraged to help one another and to apply what they have learned to their own research problems.

For more information on what we teach and why, please see our paper "Best Practices for Scientific Computing".

Who: The course is aimed at graduate students and other researchers. You don't need to have any previous knowledge of the tools that will be presented at the workshop.

Where: 1 Cyclotron Road, Berkeley, CA 94720, Building 54, Room 130. Get directions with OpenStreetMap or Google Maps.

When: May 03-04, 2018. Add to your Google Calendar.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) on which they have administrative privileges. They should have a few specific software packages installed (listed below). They are also required to abide by Data Carpentry's Code of Conduct.

Accessibility: We are committed to making this workshop accessible to everybody. The workshop organizers have checked that:

Materials will be provided in advance of the workshop, and large-print handouts are available if needed; please notify the organizers in advance. If we can help make learning easier for you (e.g. sign-language interpreters, lactation facilities), please get in touch (using the contact details below) and we will attempt to provide them.

Contact: Please email the organizers for more information.


Please be sure to complete these surveys before and after the workshop.

Pre-workshop Survey

Post-workshop Survey


Day 1

Early Morning Project organization and management
Coffee Break
Late Morning Introduction to the command line
Lunch Break
Afternoon Command line continued
Coffee Break
Evening Command line continued

Day 2

Early Morning Data wrangling and processing
Coffee Break
Late Morning Data wrangling continued
Lunch Break
Afternoon Data wrangling continued
Coffee Break
Evening Introduction to cloud computing for genomics

We will use this collaborative document for chatting, taking notes, and sharing URLs and bits of code.


Project Organization

  • Data Tidiness
  • Planning for NGS Projects
  • Examining Data on the NCBI SRA Database
  • Reference...

Introduction to the command line

  • Files and directories
  • History and tab completion
  • Pipes and redirection
  • Creating and running shell scripts
  • Reference...

Data Wrangling

  • Assessing Read Quality
  • Trimming and Filtering
  • Variant Calling Workflow
  • Automating a Variant Calling Workflow
  • Reference...

Intro to Cloud Computing

  • Why of cloud computing
  • Logging onto Cloud
  • Fine tuning your Cloud Setup
  • Data roundtripping
  • Which Cloud for my data?
  • Reference...


This workshop is designed to be run on pre-imaged Amazon Web Services (AWS) instances. All the software and data used in the workshop are hosted on an Amazon Machine Image (AMI). For information about how to use the workshop materials, see the setup instructions on the main workshop page.

Windows users should download and install PuTTY.

Spreadsheet Software

The project organization lesson requires a working spreadsheet program. If you don’t have one already, you can use LibreOffice, a free, open-source office suite that includes a spreadsheet program (Calc).


Mac OS X