The Visualization Core Lab will host an Introduction to SQL for Data Science workshop.
Visualization lab staff will provide an introduction to SQL for data science designed for
learners with little or no previous experience with SQL programming.
Topics covered will include the following.
Selecting data
Sorting and removing duplicates
Filtering
Calculating new values
Missing data
Aggregation
Combining data
Data hygiene
Creating and modifying data
Programming with databases - Python
Programming with databases - R
Next steps for more advanced SQL training for data science.
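To give a flavor of several of these topics, here is a minimal sketch using Python's built-in sqlite3 module. The readings table and its rows are hypothetical, invented only for illustration; they are not the workshop dataset.

```python
import sqlite3

# Hypothetical in-memory table, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (site TEXT, quantity TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("DR-1", "rad", 9.82),
        ("DR-1", "sal", 0.13),
        ("DR-3", "rad", 7.80),
        ("DR-3", "sal", None),  # a missing measurement, stored as NULL
        ("MSK-4", "rad", 1.46),
    ],
)

# Selecting, removing duplicates, and sorting.
sites = [row[0] for row in conn.execute(
    "SELECT DISTINCT site FROM readings ORDER BY site")]

# Filtering on missing data: NULL must be tested with IS NULL, not = NULL.
missing = conn.execute(
    "SELECT site, quantity FROM readings WHERE value IS NULL").fetchall()

# Aggregation: AVG skips NULL values within each group.
averages = conn.execute(
    "SELECT quantity, AVG(value) FROM readings "
    "GROUP BY quantity ORDER BY quantity").fetchall()

conn.close()

print(sites)
print(missing)
print(averages)
```

Using an in-memory database keeps the sketch self-contained: nothing is written to disk, and the same queries can be typed directly at the SQLite prompt.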
This hands-on lesson is part of the Introduction to Data Science Workshop Series
being offered by the KAUST Research Computing Core Labs as part of our ongoing efforts to build
capacity in core data science skills at KAUST. The workshop curriculum largely follows the
curriculum developed by
Software Carpentry, a volunteer project dedicated to helping
researchers get their work done in less time and with less pain by teaching them basic research
computing skills.
This is a live-coding based workshop and learners are expected to bring their own laptops with
the required software already downloaded and installed.
Who:
The course is aimed at graduate students (MSc and PhD), postdocs, faculty, and other research staff at KAUST.
You don't need to have any previous knowledge of the tools that will be presented at the workshop.
Where:
Auditorium 215 (between bldg. 2-3, level 0).
Get directions with
OpenStreetMap
or
Google Maps.
Requirements: Participants must bring a laptop with a
Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below).
Accessibility: We are committed to making this workshop
accessible to everybody.
The workshop organizers have checked that:
The room is wheelchair / scooter accessible.
Accessible restrooms are available.
Materials will be provided in advance of the workshop and
large-print handouts are available if needed by notifying the
organizers in advance. If we can help make learning easier for
you (e.g. sign-language interpreters, lactation facilities), please
get in touch (using the contact details below) and we will
attempt to provide them.
Everyone who participates in Carpentries activities is required to conform to the Code of Conduct. This document also outlines how to report an incident if needed.
Collaborative Notes
We will use this collaborative document for chatting, taking notes, and sharing URLs and bits of code.
Surveys
Please be sure to complete these surveys before and after the workshop.
To participate in a
Software Carpentry
workshop,
you will need access to the software described below.
In addition, you will need an up-to-date web browser.
Click on "Next" four times (two times if you've previously
installed Git). You don't need to change anything
in the Information, location, components, and start menu screens.
Select "Use the nano editor by default" and click on "Next".
Keep "Use Git from the Windows Command Prompt" selected and click on "Next".
If you forget to do this, programs that you need for the workshop will not work properly.
If this happens, rerun the installer and select the appropriate option.
Click on "Next".
Keep "Checkout Windows-style, commit Unix-style line endings" selected and click on "Next".
Select "Use Windows' default console window" and click on "Next".
Click on "Install".
Click on "Finish".
If your "HOME" environment variable is not set (or you don't know what this is):
Open the command prompt (open the Start Menu, then type cmd and press [Enter]).
Type the following line into the command prompt window exactly as shown:
setx HOME "%USERPROFILE%"
Press [Enter]. You should see SUCCESS: Specified value was saved.
Quit the command prompt by typing exit and then pressing [Enter].
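If Python is already installed, the result can be checked with a short script. This is only a sketch: home_status is a hypothetical helper name, not part of any workshop material. Run it from a newly opened command prompt, since setx does not change windows that are already open.

```python
import os


def home_status(env=None):
    """Return a short message about HOME in the given environment mapping."""
    env = os.environ if env is None else env
    home = env.get("HOME")
    if home:
        return f"HOME is set to {home}"
    return 'HOME is not set; run: setx HOME "%USERPROFILE%"'


print(home_status())
```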
This will provide you with both Git and Bash in the Git Bash program.
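The default shell in macOS versions before Catalina (10.15) is Bash; from Catalina
onward it is Zsh, but Bash is still installed and can be started by typing bash. Either way, there is no
need to install anything. You access the shell from the Terminal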
(found in
/Applications/Utilities).
See the Git installation video tutorial
for an example of how to open the Terminal.
You may want to keep
Terminal in your dock for this workshop.
On Linux, the default shell is usually Bash, but if your
machine is set up differently you can run it by opening a
terminal and typing bash. There is no need to
install anything.
Python (Optional)
Python is a popular language for research computing, and
great for general-purpose programming as well. While there are many different ways to install
Python, we recommend installing the 64-bit Python 3 version of
Miniconda.
We will teach Python using JupyterLab,
a programming environment that runs in a web browser. For this to work you will need a reasonably
up-to-date browser. The current versions of the Chrome, Safari and
Firefox browsers are all supported.
Download the 64-bit Python 3 installer for Linux. The installation requires using
the shell. If you aren't comfortable doing the installation yourself, stop here and
request help at the workshop.
Open a terminal window.
Type
bash Miniconda3-
and then press Tab. The name of the file
you just downloaded should appear. If it does not, navigate to the folder where you
downloaded the file, for example with:
cd ~/Downloads
Then, try again.
Press Enter. You will follow the text-only prompts. To move through
the text, press Spacebar. Type yes and press Enter to approve
the license. Press Enter to approve the default location for the files. Type
yes and press Enter to prepend Miniconda to your PATH
(this makes the Miniconda distribution the default Python).
Close the terminal window.
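After opening a new terminal, one way to confirm which Python is now the default is to start python and run the short check below. It is a sketch, not part of the Miniconda installer; describe_python is a hypothetical helper name. If the installation worked, the reported path would normally point inside your Miniconda folder.

```python
import struct
import sys


def describe_python():
    """Return (major version, pointer size in bits, interpreter path)."""
    major = sys.version_info.major
    bits = struct.calcsize("P") * 8  # 64 on a 64-bit build of Python
    return major, bits, sys.executable


major, bits, path = describe_python()
print(f"Python {major} ({bits}-bit) at {path}")
```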
R (Optional)
R is a programming language
that is especially powerful for data exploration, visualization, and
statistical analysis. To interact with R, we use
RStudio.
Install R by downloading and running
this .exe file
from CRAN.
Also, please install the
RStudio IDE.
Note that if you have separate user and admin accounts, you should run the
installers as administrator (right-click on the .exe file and select "Run as
administrator" instead of double-clicking). Otherwise problems may occur later,
for example when installing R packages.
You can download the binary files for your distribution
from CRAN. Or
you can use your package manager (e.g. for Debian/Ubuntu
run sudo apt-get install r-base and for Fedora run
sudo dnf install R). Also, please install the
RStudio IDE.
SQLite
SQL is a specialized programming language used with databases. We
use a simple database manager called
SQLite in our lessons.
Copy the following command: curl https://kaust-vislab.github.io/2020-03-09-kaust-vislab/getsql.sh | bash
Paste it into the window that Git Bash opened. If you're unsure, ask an instructor for help.
You should see something like 3.27.2 2019-02-25 16:06:06 ...
If you want to do this manually, download sqlite3, make a bin directory in your home directory, unzip sqlite3, move it into the bin directory, and then add the bin directory to your PATH.
SQLite comes pre-installed on macOS.
SQLite comes pre-installed on Linux.
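On any of the three platforms, a quick check from Python can show whether the sqlite3 command is on your PATH. This is a sketch under the assumption that Python is installed; sqlite_report is a hypothetical helper name. Note that the version reported by Python's sqlite3 module is the library bundled with Python, which may differ from the version of the command-line tool.

```python
import shutil
import sqlite3


def sqlite_report():
    """Return (path of the sqlite3 command or None, bundled library version)."""
    cli_path = shutil.which("sqlite3")      # None if sqlite3 is not on PATH
    library_version = sqlite3.sqlite_version  # version bundled with Python
    return cli_path, library_version


cli_path, library_version = sqlite_report()
if cli_path is None:
    print("sqlite3 command not found on PATH")
else:
    print(f"sqlite3 command found at {cli_path}")
print(f"SQLite library bundled with Python: {library_version}")
```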
In case of problems: register for an account at PythonAnywhere.