The Visualization Core Lab will host an Introduction to SQL for Data Science workshop.
Visualization Core Lab staff will introduce SQL for data science to
learners with little or no previous experience with SQL programming.
Topics covered will include the following.
Selecting data
Sorting and removing duplicates
Filtering
Calculating new values
Missing data
Aggregation
Combining data
Data hygiene
Creating and modifying data
Programming with databases (Python)
Next steps for more advanced SQL training for data science.
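To give a flavour of several of the topics above, here is a minimal sketch using Python's built-in sqlite3 module. The tables (person, reading), their columns, and the sample rows are invented for illustration and are not the workshop's dataset; the workshop itself may use different data and tools.

```python
import sqlite3

# Hypothetical in-memory database: table names, columns, and rows are
# made up for this sketch and are not the workshop's own data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE reading (person_id TEXT, quantity TEXT, value REAL);
INSERT INTO person VALUES ('a', 'Amal'), ('b', 'Bilal');
INSERT INTO reading VALUES
    ('a', 'temp', 20.0),
    ('a', 'temp', 22.0),
    ('b', 'temp', NULL),
    ('b', 'sal', 0.5);
""")


def query(sql, params=()):
    """Run one SQL statement and return all rows as a list of tuples."""
    return conn.execute(sql, params).fetchall()


# Selecting, removing duplicates, and sorting
quantities = query("SELECT DISTINCT quantity FROM reading ORDER BY quantity;")

# Filtering and calculating new values (Celsius to Fahrenheit);
# the ? placeholder passes the filter value in safely from Python
fahrenheit = query(
    "SELECT person_id, value * 1.8 + 32 FROM reading "
    "WHERE quantity = ? AND value IS NOT NULL ORDER BY value;",
    ("temp",),
)

# Missing data: NULL must be tested with IS NULL, not with =
missing = query("SELECT person_id, quantity FROM reading WHERE value IS NULL;")

# Aggregation: COUNT(value) and AVG(value) skip NULL values
averages = query(
    "SELECT quantity, COUNT(value), AVG(value) FROM reading "
    "GROUP BY quantity ORDER BY quantity;"
)

# Combining data: join the two tables on the person identifier
per_person = query(
    "SELECT person.name, COUNT(*) FROM person "
    "JOIN reading ON person.id = reading.person_id "
    "GROUP BY person.name ORDER BY person.name;"
)

conn.close()

for label, rows in [
    ("quantities", quantities),
    ("fahrenheit", fahrenheit),
    ("missing", missing),
    ("averages", averages),
    ("per_person", per_person),
]:
    print(label, rows)
```

Using an in-memory database keeps the sketch self-contained: nothing is written to disk, and it can be re-run as often as you like.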
This hands-on lesson is part of the Introduction to Data Science Workshop Series
offered by KVL as part of our ongoing efforts to build
capacity in core data science skills both at KAUST and within the Kingdom of Saudi Arabia (KSA).
The workshop largely follows the curriculum developed by
Software Carpentry, a volunteer project dedicated to helping
researchers get their work done in less time and with less pain by teaching them basic research
computing skills.
This is a live-coding workshop, and learners are expected to work along with the instructor using freely
available cloud resources provided by the Binder project.
Who:
The course is aimed at graduate students (MSc and PhD), post-docs, scientists, faculty, and industry researchers and practitioners at KAUST and within the Kingdom of Saudi Arabia (KSA).
Everyone who participates in Carpentries activities is required to conform to the Code of Conduct. This document also outlines how to report an incident if needed.
Collaborative Notes
We will use this collaborative document for chatting, taking notes, and sharing URLs and bits of code.
Surveys
Please be sure to complete these surveys before and after the workshop.
To participate in a Software Carpentry workshop,
you will need access to the software described below.
In addition, you will need an up-to-date web browser.
Click on "Next" four times (two times if you've previously
installed Git). You don't need to change anything
in the Information, location, components, and start menu screens.
Select "Use the nano editor by default" and click on "Next".
Keep "Use Git from the Windows Command Prompt" selected and click on "Next".
If you forget to do this, programs that you need for the workshop will not work properly.
If this happens, rerun the installer and select the appropriate option.
Click on "Next".
Keep "Checkout Windows-style, commit Unix-style line endings" selected and click on "Next".
Select "Use Windows' default console window" and click on "Next".
Click on "Install".
Click on "Finish".
If your "HOME" environment variable is not set (or you don't know what this is):
Open the command prompt (open the Start Menu, type cmd, and press [Enter]).
Type the following line into the command prompt window exactly as shown:
setx HOME "%USERPROFILE%"
Press [Enter]. You should see SUCCESS: Specified value was saved.
Quit the command prompt by typing exit and then pressing [Enter].
This will provide you with both Git and Bash in the Git Bash program.
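If Python is already installed, the HOME setting described above can be double-checked with a short script. This is only a sketch: the helper name check_home is made up for illustration, and it should be run from a newly opened command prompt or Git Bash window, because setx only affects windows opened after it runs.

```python
import os
from pathlib import Path


def check_home(env=None):
    """Describe the state of the HOME variable (hypothetical helper).

    `env` defaults to the real environment; passing a dict makes the
    function easy to try out without changing any settings.
    """
    env = os.environ if env is None else env
    home = env.get("HOME")
    if not home:
        return "HOME is not set"
    if not Path(home).is_dir():
        return f"HOME is set to {home}, but that folder does not exist"
    return f"HOME is set to {home}"


print(check_home())
```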
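Bash is available in all versions of macOS, so there is no
need to install anything. It is the default shell up to macOS 10.14 (Mojave);
since macOS 10.15 (Catalina) the default shell is Zsh, in which case you can
start Bash by typing bash. You access the shell from the Terminal
(found in /Applications/Utilities).
See the Git installation video tutorial
for an example of how to open the Terminal.
You may want to keep
Terminal in your dock for this workshop.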
The default shell is usually Bash, but if your
machine is set up differently you can run it by opening a
terminal and typing bash. There is no need to
install anything.
Python (Optional)
Python is a popular language for research computing, and
great for general-purpose programming as well. While there are many different ways to install
Python, we recommend installing the 64-bit Python 3 version of
Miniconda.
We will teach Python using JupyterLab,
a programming environment that runs in a web browser. For this to work you will need a reasonably
up-to-date browser. The current versions of the Chrome, Safari, and
Firefox browsers are all supported.
Download the 64-bit Python 3 installer for Linux. The installation requires using
the shell. If you aren't comfortable doing the installation yourself, stop here and
request help at the workshop.
Open a terminal window.
Type
bash Miniconda3-
and then press Tab. The name of the file
you just downloaded should appear. If it does not, navigate to the folder where you
downloaded the file, for example with:
cd ~/Downloads
Then, try again.
Press Enter and follow the text-only prompts. To move through
the text, press Spacebar. Type yes and press Enter to approve
the license. Press Enter to approve the default location for the files. Type
yes and press Enter to prepend Miniconda to your PATH
(this makes the Miniconda distribution the default Python).
Close the terminal window.
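After opening a new terminal window, one way to confirm that a 64-bit Python 3 is the default is to start python and run a short check like the sketch below. The helper name describe_python is made up for illustration; it uses only the standard library and makes no assumption about where Miniconda was installed.

```python
import struct
import sys


def describe_python():
    """Return basic facts about the Python interpreter that is running."""
    return {
        # (major, minor, micro), e.g. a Python 3 release starts with 3
        "version": tuple(sys.version_info[:3]),
        # Pointer size in bits: 64 for a 64-bit build
        "bits": struct.calcsize("P") * 8,
        # Full path of the interpreter; after the install above it would
        # typically sit inside the Miniconda folder
        "executable": sys.executable,
    }


info = describe_python()
print(info)
if info["version"][0] < 3 or info["bits"] != 64:
    print("This does not look like a 64-bit Python 3; ask an instructor for help.")
```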
R (Optional)
R is a programming language
that is especially powerful for data exploration, visualization, and
statistical analysis. To interact with R, we use
RStudio.
Install R by downloading and running this .exe file from CRAN.
Also, please install the RStudio IDE.
Note that if you have separate user and admin accounts, you should run the
installers as administrator (right-click on .exe file and select "Run as
administrator" instead of double-clicking). Otherwise problems may occur later,
for example when installing R packages.
You can download the binary files for your distribution from CRAN,
or you can use your package manager (e.g. for Debian/Ubuntu
run sudo apt-get install r-base and for Fedora run
sudo dnf install R). Also, please install the RStudio IDE.
SQLite
SQL is a specialized programming language used with databases. We
use a simple database manager called SQLite in our lessons.
Copy the following command:
curl https://kaust-vislab.github.io/2021-10-12-kaust-vislab/getsql.sh | bash
Paste it into the window that Git Bash opened and press [Enter]. If you're unsure, ask an instructor for help.
You should see something like 3.27.2 2019-02-25 16:06:06 ...
If you want to do this manually, download sqlite3, make a bin directory in your home directory, unzip sqlite3, move it into the bin directory, and then add the bin directory to your PATH.
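Whichever route you take, a quick way to check the result from Python is sketched below. The helper name sqlite_cli_version is made up for illustration. Note that the SQLite library bundled with Python is separate from the sqlite3 command-line program, so the two version numbers may differ.

```python
import shutil
import sqlite3
import subprocess


def sqlite_cli_version():
    """Return the version line printed by the sqlite3 program, or None.

    None means no sqlite3 executable was found on the PATH (or it
    printed nothing), which suggests the PATH step above is incomplete.
    """
    path = shutil.which("sqlite3")
    if path is None:
        return None
    completed = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    return completed.stdout.strip() or None


cli = sqlite_cli_version()
print("Command-line sqlite3:", cli if cli else "not found on PATH")
print("SQLite library bundled with Python:", sqlite3.sqlite_version)
```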
SQLite comes pre-installed on macOS.
SQLite comes pre-installed on Linux.
In case of problems, register for an account at PythonAnywhere.