The Visualization Core Lab will host an Introduction to SQL for Data Science workshop.
Visualization lab staff will provide an introduction to SQL for data science designed for
learners with little or no previous experience with SQL programming.
Topics covered will include the following.
Sorting and removing duplicates
Calculating new values
Creating and modifying data
Programming with databases - Python
Programming with databases - R
Next steps for more advanced SQL training for data science
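As a rough preview of the first few topics, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name readings, its columns, and its values are made up for illustration and are not the workshop's dataset.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Creating and modifying data (hypothetical table and values).
conn.execute("CREATE TABLE readings (person TEXT, quantity TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("lake", "sal", 0.05),
        ("dyer", "rad", 9.82),
        ("dyer", "sal", 0.13),
        ("lake", "sal", 0.21),
    ],
)

# Sorting and removing duplicates: each person appears once, in alphabetical order.
people = [
    row[0]
    for row in conn.execute("SELECT DISTINCT person FROM readings ORDER BY person")
]

# Calculating new values: express each salinity reading as a percentage.
percentages = conn.execute(
    "SELECT person, ROUND(value * 100, 1) FROM readings "
    "WHERE quantity = 'sal' ORDER BY value"
).fetchall()

print(people)
print(percentages)
conn.close()
```

The same queries can be typed directly into the sqlite3 command-line program; wrapping them in Python here only keeps the example self-contained.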
This hands-on lesson is part of the Introduction to Data Science Workshop Series
being offered by the KAUST Research Computing Core Labs as part of our ongoing efforts to build
capacity in core data science skills at KAUST. The workshop curriculum largely follows the
curriculum developed by
Software Carpentry, a volunteer project dedicated to helping
researchers get their work done in less time and with less pain by teaching them basic research
computing skills.
This is a live-coding based workshop and learners are expected to bring their own laptops with
the required software already downloaded and installed.
The course is aimed at graduate students (MSc and PhD), postdocs, faculty, and other research staff at KAUST.
You don't need to have any previous knowledge of the tools that will be presented at the workshop.
Requirements: Participants must bring a laptop running
macOS, Linux, or Windows (not a tablet, Chromebook, etc.) on which they have administrative privileges. They should have a few specific software packages installed (listed below).
Code of Conduct: Everyone who participates in Carpentries activities is required to conform to the Code of Conduct. This document also outlines how to report an incident if needed.
Click on "Next" four times (two times if you've previously
installed Git). You don't need to change anything
in the Information, location, components, and start menu screens.
Select "Use the nano editor by default" and click on "Next".
Keep "Use Git from the Windows Command Prompt" selected and click on "Next".
If you forget to do this, programs that you need for the workshop will not work properly.
If this happens, rerun the installer and select the appropriate option.
Click on "Next".
Keep "Checkout Windows-style, commit Unix-style line endings" selected and click on "Next".
Select "Use Windows' default console window" and click on "Next".
Click on "Install".
Click on "Finish".
If your "HOME" environment variable is not set (or you don't know what this is):
Open command prompt (Open Start Menu then type cmd and press [Enter])
Type the following line into the command prompt window exactly as shown:
setx HOME "%USERPROFILE%"
Press [Enter]. You should see SUCCESS: Specified value was saved.
Quit command prompt by typing exit and then pressing [Enter].
This will provide you with both Git and Bash in the Git Bash program.
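On Windows, one way to confirm that the HOME step took effect is to read the variable back from a freshly opened Command Prompt, since setx only affects windows opened after it runs. The following is a minimal sketch in Python, assuming Python is already installed; the helper name describe_home is hypothetical.

```python
import os


def describe_home(env=None):
    """Return (value, is_set) for HOME in a mapping of environment variables."""
    env = os.environ if env is None else env
    value = env.get("HOME")
    return value, bool(value)


home, is_set = describe_home()
if is_set:
    print(f"HOME is set to {home}")
else:
    print("HOME is not set; rerun the setx command in a new Command Prompt.")
```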
The default shell in some versions of macOS is Bash, and Bash is available in all versions, so there is no
need to install anything. You access Bash from the Terminal.
See the Git installation video tutorial
for an example of how to open the Terminal.
You may want to keep
Terminal in your dock for this workshop.
The default shell is usually Bash, but if your
machine is set up differently you can run it by opening a
terminal and typing bash. There is no need to install anything.
Python is a popular language for research computing, and
great for general-purpose programming as well. While there are many different ways to install
Python, we recommend installing the 64-bit Python 3 version of
We will teach Python using JupyterLab,
a programming environment that runs in a web browser. For this to work you will need a reasonably
up-to-date browser. The current versions of the Chrome, Safari and
Firefox browsers are all supported.
Download the 64-bit Python 3 installer for Linux. The installation requires using
the shell. If you aren't comfortable doing the installation yourself, stop here and
request help at the workshop.
Open a terminal window.
and then press Tab. The name of the file
you just downloaded should appear. If it does not, navigate to the folder where you
downloaded the file, for example with:
Then, try again.
Press Enter, then follow the text-only prompts. To move through
the text, press Spacebar. Type yes and press Enter to approve
the license. Press Enter to approve the default location for the files. Type
yes and press Enter to prepend Miniconda to your PATH
(this makes the Miniconda distribution the default Python).
Close the terminal window.
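After reopening a terminal, one way to check that the default interpreter is now a 64-bit Python 3 is to run a short script with it. This is a minimal sketch using only the standard library; the function name interpreter_summary is illustrative, and the path it prints will depend on where the installer placed the files.

```python
import struct
import sys


def interpreter_summary():
    """Describe the running interpreter: major version, pointer width, path."""
    return {
        "major": sys.version_info.major,
        "bits": struct.calcsize("P") * 8,  # size of a C pointer, in bits
        "executable": sys.executable,
    }


summary = interpreter_summary()
print(summary)
if summary["major"] != 3 or summary["bits"] != 64:
    print("This is not a 64-bit Python 3; ask for help at the workshop.")
```

If the printed executable path does not point into the newly installed distribution, the PATH change has probably not been picked up by the current shell.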
R is a programming language
that is especially powerful for data exploration, visualization, and
statistical analysis. To interact with R, we use
Install R by downloading and running
this .exe file
Also, please install the
Note that if you have separate user and admin accounts, you should run the
installers as administrator (right-click on the .exe file and select "Run as
administrator" instead of double-clicking). Otherwise, problems may occur later,
for example when installing R packages.
You can download the binary files for your distribution
from CRAN. Or
you can use your package manager (e.g. for Debian/Ubuntu
run sudo apt-get install r-base and for Fedora run
sudo dnf install R). Also, please install the
SQL is a specialized programming language used with databases. We
use a simple database manager called
SQLite in our lessons.
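A quick way to confirm that SQLite is usable before the workshop is to query it from Python, whose standard library bundles the SQLite engine. The sketch below is only a sanity check under that assumption; it also reports whether a separate sqlite3 command-line program is on your PATH, which may legitimately print None on some systems.

```python
import shutil
import sqlite3

# Version of the SQLite library bundled with this Python installation.
print("SQLite library version:", sqlite3.sqlite_version)

# Location of the sqlite3 command-line program, or None if it is not on PATH.
print("sqlite3 command:", shutil.which("sqlite3"))

# Round-trip a trivial query through an in-memory database.
conn = sqlite3.connect(":memory:")
answer = conn.execute("SELECT 1 + 1").fetchone()[0]
conn.close()
print("1 + 1 =", answer)
```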