diff --git a/_toc.yml b/_toc.yml index 11525b9..1b2fa36 100644 --- a/_toc.yml +++ b/_toc.yml @@ -14,4 +14,6 @@ chapters: title: "Lesson 2: Piping commands" - file: lessons/3/index title: "Lesson 3: qsv and JSON" + - file: lessons/4/index + title: "Lesson 4: Running Polars SQL queries with qsv" - file: appendix diff --git a/lessons/4/buses.csv b/lessons/4/buses.csv new file mode 100644 index 0000000..0d01fe9 --- /dev/null +++ b/lessons/4/buses.csv @@ -0,0 +1,6 @@ +id,primary_color,secondary_color,length,air_conditioner,amenities +001,black,blue,full,true,"wheelchair ramp, tissue boxes, cup holders, USB ports" +002,black,red,full,true,"wheelchair ramp, tissue boxes, USB ports" +003,white,blue,half,true,"wheelchair ramp, tissue boxes" +004,orange,blue,full,false,"wheelchair ramp, tissue boxes, USB ports" +005,black,blue,full,true,"wheelchair ramp, tissue boxes, cup holders, USB ports" diff --git a/lessons/4/exercise.ipynb b/lessons/4/exercise.ipynb new file mode 100644 index 0000000..e6fdcdb --- /dev/null +++ b/lessons/4/exercise.ipynb @@ -0,0 +1,127 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 4: Running Polars SQL queries with qsv\n", + "\n", + "1. Display all of the data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "qsv sqlp --help" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "2. Display the first 2 rows of data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "3. Display all the bus IDs with their lengths and whether they have air conditioning. Then render this output with `qsv table`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "4. Display all bus IDs which have air conditioning. Output the data in JSON format." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "5. Display all bus IDs which have cup holders. Output the data in JSONL format." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "6. Get the count of all buses where the primary color is either black or white." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Bash", + "language": "bash", + "name": "bash" + }, + "language_info": { + "codemirror_mode": "shell", + "file_extension": ".sh", + "mimetype": "text/x-sh", + "name": "bash" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/lessons/4/index.md b/lessons/4/index.md new file mode 100644 index 0000000..e8992be --- /dev/null +++ b/lessons/4/index.md @@ -0,0 +1,147 @@ +--- +jupytext: + text_representation: + extension: .md + format_name: myst +kernelspec: + display_name: Bash + language: bash + name: bash +--- + +# Lesson 4: Running Polars SQL queries with qsv + +![Preview of qsv sqlp help message](./media/sqlp-preview.png) + +The [Polars](https://pola.rs) library is used by qsv to enhance data engineering capabilities. One of the multiple benefits that Polars provides for qsv is the ability to run Polars SQL queries with qsv's `sqlp` command. + +```{code-cell} +:tags: ["scroll-output"] +qsv sqlp -h +``` + +There are plenty of example queries you can copy for your usage in the help message of `qsv sqlp` above. + +Note that when you run a query, you may get the shape of the output data from standard error (`stderr`) after the output. For example you may see `(5, 6)` after the output representing 5 rows and 6 columns. + +We'll hide the shape from the output by adding the `-q` or `--quiet` flag in the exercises. + +## Exercise 4: Running Polars SQL queries with qsv + +[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dathere/100.dathere.com/main?labpath=lessons%2F4%2Fexercise.ipynb) + +Use `qsv sqlp` and its options to complete each of the following tasks on the `buses.csv` file (assume the headers are included in the output, otherwise you may usually pipe the output into `qsv behead` if needed): + +1. Display all of the data. +2. Display the first 2 rows of data. +3. Display all the bus IDs with their lengths and whether they have air conditioning. Then render this output with `qsv table`. +4. Display all bus IDs which have air conditioning. Output the data in JSON format. +5. Display all bus IDs which have cup holders. Output the data in JSONL format. +6. Get the count of all buses where the primary color is either black or white. + +> Here we show the usage text of `qsv sqlp` for your reference. Solve this exercise using [Thebe](exercises-setup:thebe), [Binder](exercises-setup:binder) or [locally](exercises-setup:local). + +```{code-cell} +:tags: ["scroll-output"] +qsv sqlp --help +``` + +::::{admonition} Solution for task 1 +:class: dropdown seealso + +```bash +qsv sqlp buses.csv 'SELECT * FROM buses' -q +``` + +You can also replace `buses` with `_t_1` as per the help message. + +The output should be: + +```csv +id,primary_color,secondary_color,length,air_conditioner,amenities +1,black,blue,full,true,"wheelchair ramp, tissue boxes, cup holders, USB ports" +2,black,red,full,true,"wheelchair ramp, tissue boxes, USB ports" +3,white,blue,half,true,"wheelchair ramp, tissue boxes" +4,orange,blue,full,false,"wheelchair ramp, tissue boxes, USB ports" +5,black,blue,full,true,"wheelchair ramp, tissue boxes, cup holders, USB ports" +``` + +:::: + +::::{admonition} Solution for task 2 +:class: dropdown seealso + +```bash +qsv sqlp buses.csv 'SELECT * FROM buses LIMIT 2' -q +``` + +```csv +id,primary_color,secondary_color,length,air_conditioner,amenities +1,black,blue,full,true,"wheelchair ramp, tissue boxes, cup holders, USB ports" +2,black,red,full,true,"wheelchair ramp, tissue boxes, USB ports" +``` + +:::: + +::::{admonition} Solution for task 3 +:class: dropdown seealso + +```bash +qsv sqlp buses.csv 'SELECT id,length,air_conditioner FROM buses' -q | qsv table +``` + +``` +id length air_conditioner +1 full true +2 full true +3 half true +4 full false +5 full true +``` + +:::: + +::::{admonition} Solution for task 4 +:class: dropdown seealso + +```bash +qsv sqlp buses.csv "SELECT id FROM buses WHERE air_conditioner = 'true'" --format json -q +``` + +```json +[{"id":1},{"id":2},{"id":3},{"id":5}] +``` + +:::: + +::::{admonition} Solution for task 5 +:class: dropdown seealso + +```bash +qsv sqlp buses.csv "SELECT id FROM buses WHERE amenities ILIKE '%cup holders%'" --format jsonl -q +``` + +```json +{"id":1} +{"id":5} +``` + +:::: + +::::{admonition} Solution for task 6 +:class: dropdown seealso + +```bash +qsv sqlp buses.csv "SELECT COUNT(*) FROM buses WHERE primary_color = 'black' OR primary_color = 'white'" -q +``` + +```csv +len +4 +``` + +Notice the output is a table with a single column named `len` and a single record with the count of `4`. How can we get just the count `4` as the output? + +One way is to pipe the command into `qsv behead`. Another way may be to not get the count within the SQL query but rather pipe the output into `qsv count`. There are often many ways to solve the same problem with qsv! + +:::: diff --git a/lessons/4/media/sqlp-preview.png b/lessons/4/media/sqlp-preview.png new file mode 100644 index 0000000..7537723 Binary files /dev/null and b/lessons/4/media/sqlp-preview.png differ