# Sierra SQL Tips And Tricks

```{contents}

```

TODO: MOAR!

## Tables of Note

### `sierra_view.record_metadata`

* This table is *very* useful as it contains:
  * `record_type_code`: defines record as being bib (b), item (i), patron (p), etc.
  * `record_num`: record number used by the SDA and the webpac / encore
  * `creation_date_gmt`: date the record entered the system (different from a cataloged date)
  * `deletion_date_gmt`: deleted records will have no other information in the system other than info left in this table
  * `record_last_updated_gmt` : when the record was last modified
  * `campus_code`: useful for telling if the record is from ILL
    * **Note** that this field **won't** have a `NULL` value; instead, it's blank or `''` in cases where there is no `campus_code`

* It's useful to JOIN to this table, as it can be used to filter out records that don't "belong" to the library (ILL) using the `campus_code` as shown below (note the use of `campus_code` being blank as the filtering condition in the WHERE clause):

```sql
-- find titles that have been cataloged in the last 7 days
SELECT
r.record_type_code || r.record_num || 'a' as bib_record_num,
r.creation_date_gmt,
b.cataloging_date_gmt
FROM
sierra_view.record_metadata as r
JOIN
sierra_view.bib_record as b
ON
  b.record_id = r.id
WHERE
-- concatinating these together will exclude bib records that are ILL
r.record_type_code || r.campus_code = 'b'
AND r.deletion_date_gmt IS NULL
AND b.is_suppressed IS FALSE
AND AGE(b.cataloging_date_gmt) <= '7 days'::INTERVAL
```

`sierra_view.record_metadata` is also useful for obtaining the `record_num` from a `record_id` (`record_num` is used in the SDA, webpac / encore for identifying a record). We can JOIN `sierra_view.record_metadata`, appending the columns (generic checkdigit character, `a` can be added to the end for cases where a checkdigit is needed):

```sql
sierra_view.record_type_code || sierra_view.record_number || 'a'
```

Example:

```sql
SELECT
r.record_type_code || r.record_num || 'a' as record_number

FROM
sierra_view.record_metadata as r

WHERE
r.record_type_code || r.campus_code = 'p'
and r.deletion_date_gmt is null

limit 5
```


### `sierra_view.*record`, `sierra_view.*property`

* Each record type has one of each table
  * bib_view, bib_record, bib_record_property
* Record table contains majority of fixed fields
* Record_property table contains additional descriptive fields
  * Including useful values such as call_number, title and barcode

### `sierra_view.*view`

* View table combines fields from multiple tables Convenience comes at the expense of efficiency (use them sparingly, if at all if possible!)

Example: `sierra_view.bib_view` combines data from multiple tables, but query speeds can be slower than accessing other tables more directly.

```sql
SELECT
*
FROM
sierra_view.bib_view

LIMIT 1000
```

`(1.6 seconds)`

VS

```sql
SELECT
*
FROM
sierra_view.bib_record as b

JOIN
sierra_view.bib_record_property as p
ON
  p.bib_record_id = b.record_id

LIMIT 1000
```

`(543 milliseconds)`

### `sierra_view.*myuser`

* A `my_user` table exists for each fixed field in the system
* Contains code and name values for their respective field
* Use to provide translations for system codes

```sql
SELECT
p.name
FROM
sierra_view.item_record i
JOIN
sierra_view.itype_property_myuser p
ON i.itype_code_num = p.code
```

### `sierra_view.iii user_desktop_option`

* `desktop_option_id` isn't documented, but you may be able to figure out the option by a combination of `value`, and `iii_user_id` or `user_name` ... good luck!

```sql
DROP TABLE IF EXISTS temp_users;

CREATE TEMP TABLE temp_users as
SELECT
iii_user_id,
user_name
FROM
sierra_view.iii_user_application_myuser

GROUP BY
iii_user_id,
user_name
;

DROP INDEX IF EXISTS temp_users_iii_user_id;
CREATE INDEX temp_users_iii_user_id ON temp_users (iii_user_id);

SELECT
t.user_name,
t.iii_user_id,
o.desktop_option_id,
o.value

FROM
sierra_view.iii_user_desktop_option as o

LEFT OUTER JOIN
temp_users as t
ON
  t.iii_user_id = o.iii_user_id

ORDER BY
t.user_name,
o.desktop_option_id
;
```

### `sierra_view.varfield`

* using `occ_num` is a good method for picking the first value of (possibly) repeated values (especially where you may not be expecting or wanting multiple values)

```sql
--- get patron record id and the first barcode listed on their account
SELECT
r.record_id,
(
    SELECT
    v.field_content
    FROM
    sierra_view.varfield as v
    WHERE
    v.record_id = r.record_id
    AND v.varfield_type_code = 'b'
    ORDER BY
    v.occ_num

    LIMIT 1
) as barcode
FROM
sierra_view.patron_record as r

LIMIT 100
```

### `sierra_view.bool_info`

* This table has a column called `sql_query` that could be useful for seeing the underlying SQL for create list searches.

```sql
--Count the number of owning libraries for each title in a review file
--Pass query a review file of items to be weeded
--Check to see which other locations own those titles and how they've performed system wide prior to deleting records

SELECT
id2reckey(b.id) || 'a' AS "bib_record",
b.title AS "bib_title",
count(bi.id) AS "item_count",
count(distinct(i.agency_code_num)) AS "location_count",
array_to_string(
    ARRAY(
        SELECT 
        loc.location_code
        FROM
        sierra_view.bib_record_location AS loc
        WHERE 
        b.id = loc.bib_record_id
        ORDER BY 1
    ),
    ','
) AS "location codes",
SUM(i.checkout_total) AS "total checkouts",
SUM(i.renewal_total) AS "total renewals"
FROM
sierra_view.bib_view as b 
JOIN sierra_view.bool_set as sb ON b.id = sb.record_metadata_id AND sb.bool_info_id = '369'
JOIN Sierra_view.bib_record_item_record_link as bi ON b.id = bi.bib_record_id
JOIN sierra_view.item_record as i ON bi.item_record_id = i.id
JOIN sierra_view.record_metadata AS rm ON bi.item_record_id = rm.id
group by 1, 2, 5;
```

___

## Account for possible NULL Values

`NULL` is a special value, and you should account for cases where a value can be `NULL`, as it can result in unexpected behavior.

Use `CONCAT()` function, which is NULL safe, or `COALESCE()` function, which returns the **first** non-null value.

```sql
SELECT
-- string concatenatination with operator ||
NULL || 'where did this go?!?!' AS ex_1,

-- string concatenatination with operator ||
COALESCE(NULL, '', 'we don''t get here') || 'remember that COALESCE returns first non-null value!' AS ex_2,

-- string concatenation with CONCAT function (which is NULL safe)
CONCAT(NULL, 'mind your ', NULL, 'values!') AS ex_3
```

Output:

| ex1 | ex2 | ex3 |
| --- | --- | --- |
| `NULL` | "remember that COALESCE returns first non-null value!" | "mind your values!" |

___

## Picking the Correct Key to JOIN On

Picking the correct primary / foreign keys for joins can be tricky.

* Make sure you carefully examine the sierra DNA documentation
* Columns named `id` generally are generally the **primary key**, as is the case in `sierra_view.record_metadata`. This means that it'll appear in other tables as the **foreign key**. For example:

**INCORRECT:**

```sql
SELECT
r.id,
b.id,
r.record_num
FROM
sierra_view.record_metadata as r

JOIN
sierra_view.bib_record as b
ON
  -- THIS IS NOT CORRECT AND WILL RETURN BAD DATA!!!
  b.id = r.id

LIMIT 100
;
```

**CORRECT:**

```sql
SELECT
r.id,
b.id,
r.record_num
FROM
sierra_view.record_metadata as r

JOIN
sierra_view.bib_record as b
ON
  -- THIS IS CORRECT (foreign key, `record_id` JOINed to primary key, `id` )
  b.record_id = r.id

LIMIT 100
;
```

___

## Unique SQL Queries (can't easily be done via the SDA / CreateLists)

```sql
-- identifies order records across accounting units that lack a location code
-- due to downloading incomplete data from a vendor
SELECT
r.record_type_code || r.record_num || 'a' as order_record_num,
r.creation_date_gmt,
o.accounting_unit_code_num

FROM
sierra_view.order_record as o

JOIN
sierra_view.record_metadata as r
ON
  r.id = o.record_id

JOIN
sierra_view.order_record_cmf as c
ON
  c.order_record_id = o.record_id

WHERE
c.location_code = 'none'
;
```

___

## Using Indexes Effectively

Using an index can significantly speed up your query.

We can determine what indexes exist (or don't exist) on table with a query similar to the following:

```sql
-- find indexes on phrase_entry ...
-- or, find all indexes by removing
-- tablename = 'phrase_entry'
-- and uncommenting tablename NOTE LIKE 'pg%'
--
SELECT
*
FROM
pg_indexes
WHERE
tablename = 'phrase_entry'
-- tablename NOT LIKE 'pg%'
ORDER BY
schemaname,
tablename,
indexname
;
```

which tells us that one of the following indexes exist:

```sql
CREATE INDEX idx_phrase_entry ON iiirecord.phrase_entry USING btree ((((index_tag)::text || (index_entry)::text)), type2, insert_title, record_key)
```

The following query will take advantage of the index, and result in a fast, efficient search (this particular example will search for barcode and return the record it is associated with):

```sql
SELECT
e.record_id
FROM
sierra_view.phrase_entry as e
WHERE
e.index_tag || e.index_entry = 'b' || LOWER('A000052469475')
-- 64 msec execution time
-- EXPLAIN:
-- Index Scan using idx_phrase_entry on phrase_entry  (cost=0.69..586.22 rows=143 width=8)
--  Index Cond: (((index_tag)::text || (index_entry)::text) = 'ba000052469475'::text)
```

This is the INCORRECT way (just to demonstrate how drastic the difference is of using an index VS not using one)

```sql
SELECT
e.record_id
FROM
sierra_view.phrase_entry as e
WHERE
e.index_tag = 'b' 
AND e.index_entry = LOWER('A000052469475')
-- 28 secs execution time!!!!
-- EXPLAIN:
-- Seq Scan on phrase_entry  (cost=0.00..1783427.78 rows=20 width=8)
--  Filter: (((index_tag)::text = 'b'::text) AND ((index_entry)::text = 'a000052469475'::text))

--
-- sequential scan = bad!

```

___

## Unexpected One-to-Many Situations

Sometimes we only expect 1 row of data to be returned, but we get multiple rows back.

For example, this query:

```sql
SELECT
p.record_id,
v.field_content

FROM
sierra_view.patron_record as p

JOIN
sierra_view.varfield as v
ON
  v.record_id = p.record_id
  AND v.varfield_type_code = 'b'

WHERE
p.record_id = 481037682097
```

returns the following data:

```text
481037682097;12345679
481037682097;12345678 / PREV ID
```

If we only want one row of data per patron record in our output, we have two options.

1 Grab the first occurrence of the barcode only:

```sql
SELECT
p.record_id,
(
    -- subquery grabs the first occurrence of the barcode
    SELECT
    v.field_content
    FROM
    sierra_view.varfield as v
    WHERE
    v.record_id = p.record_id
    AND v.varfield_type_code = 'b'
    ORDER BY
    v.occ_num
    LIMIT 1
) as field_content

FROM
sierra_view.patron_record as p

WHERE
p.record_id = 481037682097
```

```text
481037682097;12345679
```

2 Aggregate all the barcodes into one column of data:

```sql
SELECT
p.record_id,
(
    -- subquery grabs all barcodes
    SELECT
    string_agg(v.field_content, ',' ORDER BY v.occ_num)
    FROM
    sierra_view.varfield as v
    WHERE
    v.record_id = p.record_id
    AND v.varfield_type_code = 'b'
) as fields_content

FROM
sierra_view.patron_record as p

WHERE
p.record_id = 481037682097
```

```text
481037682097;12345679,12345678 / PREV ID
```

___

## Helpful / useful queries

### Get info about indexes / tables postgres database

```sql
-- find indexes!
-- NOTE that these indexes are defined not on the view,
-- but on the iiirecord. The tables and views are assumed 
-- to be similarly named, but that's not guaranteed
SELECT
*
FROM
pg_indexes
WHERE
tablename = 'phrase_entry'
-- tablename NOT LIKE 'pg%' -- use this to find ALL indexes
ORDER BY
schemaname,
tablename,
indexname
;
```

```sql
-- show detailed view of `sierra_view` table_schema
SELECT
*
FROM
information_schema.columns

WHERE
table_schema = 'sierra_view'

ORDER BY
table_name,
ordinal_position
;

```

```sql
-- show all tables
SELECT
*
FROM
pg_catalog.pg_tables
ORDER BY
tablename
;
```

___

## Additional Resources

* Sierra Specific SQL Collections
  * **Ray Voelker** - Public Library of Cincinnati And Hamilton County Sierra SQL Wiki: [github.com/plch/sierra-sql/wiki](https://github.com/plch/sierra-sql/wiki)
  * **Jeremy Goldstein** - Minuteman Library Network Sierra SQL Wiki: [github.com/jmgold/SQL-Queries/wiki](https://github.com/jmgold/SQL-Queries/wiki)
  * **UNC Libraries** Sierra SQL Wiki: [github.com/UNC-Libraries/III-Sierra-SQL/wiki](https://github.com/UNC-Libraries/III-Sierra-SQL/wiki)
* General SQL Tutorials / Instruction
  * Codecademy: https://www.codecademy.com/catalog/language/sql
  * WiseOwl Training Videos: https://www.wiseowl.co.uk/sql/videos sql-selecting-data/
  * Stanford free online mini-courses in SQL: https://lagunita.stanford.edu/courses/DB/2014/SelfPaced/about