Mirror of https://github.com/dathere/100.dathere.com.git (synced 2025-12-18 16:19:26 +00:00)
feat: add lesson 4, Running Polars SQL queries with qsv
Some checks failed
deploy-book / deploy-book (push) Has been cancelled
parent 8478b127d8
commit 7bea3dd0d2
5 changed files with 282 additions and 0 deletions
_toc.yml (2 lines changed)
@@ -14,4 +14,6 @@ chapters:
  title: "Lesson 2: Piping commands"
- file: lessons/3/index
  title: "Lesson 3: qsv and JSON"
- file: lessons/4/index
  title: "Lesson 4: Running Polars SQL queries with qsv"
- file: appendix
lessons/4/buses.csv (Normal file, 6 lines changed)
@@ -0,0 +1,6 @@
id,primary_color,secondary_color,length,air_conditioner,amenities
001,black,blue,full,true,"wheelchair ramp, tissue boxes, cup holders, USB ports"
002,black,red,full,true,"wheelchair ramp, tissue boxes, USB ports"
003,white,blue,half,true,"wheelchair ramp, tissue boxes"
004,orange,blue,full,false,"wheelchair ramp, tissue boxes, USB ports"
005,black,blue,full,true,"wheelchair ramp, tissue boxes, cup holders, USB ports"
lessons/4/exercise.ipynb (Normal file, 127 lines changed)
@@ -0,0 +1,127 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exercise 4: Running Polars SQL queries with qsv\n",
    "\n",
    "1. Display all of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "qsv sqlp --help"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. Display the first 2 rows of data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3. Display all the bus IDs with their lengths and whether they have air conditioning. Then render this output with `qsv table`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "4. Display all bus IDs which have air conditioning. Output the data in JSON format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "5. Display all bus IDs which have cup holders. Output the data in JSONL format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "6. Get the count of all buses where the primary color is either black or white."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Bash",
   "language": "bash",
   "name": "bash"
  },
  "language_info": {
   "codemirror_mode": "shell",
   "file_extension": ".sh",
   "mimetype": "text/x-sh",
   "name": "bash"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
lessons/4/index.md (Normal file, 147 lines changed)
@@ -0,0 +1,147 @@
---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
kernelspec:
  display_name: Bash
  language: bash
  name: bash
---

# Lesson 4: Running Polars SQL queries with qsv

qsv uses the [Polars](https://pola.rs) library to extend its data engineering capabilities. One of the benefits Polars brings to qsv is the ability to run Polars SQL queries with qsv's `sqlp` command.

```{code-cell}
:tags: ["scroll-output"]
qsv sqlp -h
```

The help message of `qsv sqlp` above includes plenty of example queries that you can copy and adapt.

Note that when you run a query, qsv may print the shape of the output data to standard error (`stderr`) after the output. For example, you may see `(5, 6)` after the output, representing 5 rows and 6 columns.

We'll hide the shape from the output by adding the `-q` or `--quiet` flag in the exercises.
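
As a minimal sketch of what the `-q` flag changes, the commands below run the same query three ways. They assume qsv is installed with the `sqlp` command available and that `buses.csv` is in the current directory. The `2>/dev/null` redirection is ordinary shell syntax, not a qsv option.

```bash
# Without -q: the CSV result goes to stdout and the shape, e.g. (5, 6), goes to stderr.
qsv sqlp buses.csv 'SELECT * FROM buses'

# With -q (or --quiet): only the CSV result is printed.
qsv sqlp buses.csv 'SELECT * FROM buses' -q

# Alternative without -q: discard stderr using the shell.
qsv sqlp buses.csv 'SELECT * FROM buses' 2>/dev/null
```

Because the shape is written to `stderr` rather than `stdout`, it should not end up in data that you pipe into another qsv command.
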

## Exercise 4: Running Polars SQL queries with qsv

[Launch this exercise on Binder](https://mybinder.org/v2/gh/dathere/100.dathere.com/main?labpath=lessons%2F4%2Fexercise.ipynb)

Use `qsv sqlp` and its options to complete each of the following tasks on the `buses.csv` file. The solutions assume that the headers are included in the output; if you don't want them, you can usually pipe the output into `qsv behead`.

1. Display all of the data.
2. Display the first 2 rows of data.
3. Display all the bus IDs with their lengths and whether they have air conditioning. Then render this output with `qsv table`.
4. Display all bus IDs which have air conditioning. Output the data in JSON format.
5. Display all bus IDs which have cup holders. Output the data in JSONL format.
6. Get the count of all buses where the primary color is either black or white.

> Here we show the usage text of `qsv sqlp` for your reference. Solve this exercise using [Thebe](exercises-setup:thebe), [Binder](exercises-setup:binder) or [locally](exercises-setup:local).

```{code-cell}
:tags: ["scroll-output"]
qsv sqlp --help
```

::::{admonition} Solution for task 1
:class: dropdown seealso

```bash
qsv sqlp buses.csv 'SELECT * FROM buses' -q
```

You can also replace `buses` with `_t_1` as per the help message.
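
As a small sketch of that alias (assuming, as the help message describes, that `_t_1` refers to the first input file), the query below should return the same rows as the command above:

```bash
# _t_1 is the positional alias for the first input file, here buses.csv.
qsv sqlp buses.csv 'SELECT * FROM _t_1' -q
```
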

The output should be:

```csv
id,primary_color,secondary_color,length,air_conditioner,amenities
1,black,blue,full,true,"wheelchair ramp, tissue boxes, cup holders, USB ports"
2,black,red,full,true,"wheelchair ramp, tissue boxes, USB ports"
3,white,blue,half,true,"wheelchair ramp, tissue boxes"
4,orange,blue,full,false,"wheelchair ramp, tissue boxes, USB ports"
5,black,blue,full,true,"wheelchair ramp, tissue boxes, cup holders, USB ports"
```

::::

::::{admonition} Solution for task 2
:class: dropdown seealso

```bash
qsv sqlp buses.csv 'SELECT * FROM buses LIMIT 2' -q
```

```csv
id,primary_color,secondary_color,length,air_conditioner,amenities
1,black,blue,full,true,"wheelchair ramp, tissue boxes, cup holders, USB ports"
2,black,red,full,true,"wheelchair ramp, tissue boxes, USB ports"
```

::::

::::{admonition} Solution for task 3
:class: dropdown seealso

```bash
qsv sqlp buses.csv 'SELECT id,length,air_conditioner FROM buses' -q | qsv table
```

```
id  length  air_conditioner
1   full    true
2   full    true
3   half    true
4   full    false
5   full    true
```

::::

::::{admonition} Solution for task 4
:class: dropdown seealso

```bash
qsv sqlp buses.csv "SELECT id FROM buses WHERE air_conditioner = 'true'" --format json -q
```

```json
[{"id":1},{"id":2},{"id":3},{"id":5}]
```

::::

::::{admonition} Solution for task 5
:class: dropdown seealso

```bash
qsv sqlp buses.csv "SELECT id FROM buses WHERE amenities ILIKE '%cup holders%'" --format jsonl -q
```

```json
{"id":1}
{"id":5}
```

::::

::::{admonition} Solution for task 6
:class: dropdown seealso

```bash
qsv sqlp buses.csv "SELECT COUNT(*) FROM buses WHERE primary_color = 'black' OR primary_color = 'white'" -q
```

```csv
len
4
```

Notice the output is a table with a single column named `len` and a single record with the count of `4`. How can we get just the count `4` as the output?

One way is to pipe the command into `qsv behead`. Another way may be to not get the count within the SQL query but rather pipe the output into `qsv count`. There are often many ways to solve the same problem with qsv!
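
The two approaches just described might look like the following sketch, assuming `buses.csv` is in the current directory. The second query selects `id` only to keep the piped rows small; any column would do.

```bash
# Approach 1: count in SQL, then strip the `len` header row with qsv behead.
qsv sqlp buses.csv "SELECT COUNT(*) FROM buses WHERE primary_color = 'black' OR primary_color = 'white'" -q | qsv behead

# Approach 2: select the matching rows and let qsv count tally the records.
qsv sqlp buses.csv "SELECT id FROM buses WHERE primary_color = 'black' OR primary_color = 'white'" -q | qsv count
```

Both pipelines should print `4` for this data set.
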

::::
lessons/4/media/sqlp-preview.png (Normal file, binary)
Binary file not shown (size: 118 KiB).