Sierra SQL Tips And Tricks ¶

Contents

Sierra SQL Tips And Tricks

TODO: MOAR!

Tables of Note ¶

`sierra_view.record_metadata`¶

This table is very useful as it contains:
- record_type_code: defines record as being bib (b), item (i), patron (p), etc.
- record_num: record number used by the SDA and the webpac / encore
- creation_date_gmt: date the record entered the system (different from a cataloged date)
- deletion_date_gmt: deleted records will have no other information in the system other than info left in this table
- record_last_updated_gmt : when the record was last modified
- campus_code: useful for telling if the record is from ILL
  - Note that this field won’t have a NULL value; instead, it’s blank or '' in cases where there is no campus_code
It’s useful to JOIN to this table, as it can be used to filter out records that don’t “belong” to the library (ILL) using the campus_code as shown below (note the use of campus_code being blank as the filtering condition in the WHERE clause):

-- find titles that have been cataloged in the last 7 days
SELECT
r.record_type_code || r.record_num || 'a' as bib_record_num,
r.creation_date_gmt,
b.cataloging_date_gmt
FROM
sierra_view.record_metadata as r
JOIN
sierra_view.bib_record as b
ON
  b.record_id = r.id
WHERE
-- concatinating these together will exclude bib records that are ILL
r.record_type_code || r.campus_code = 'b'
AND r.deletion_date_gmt IS NULL
AND b.is_suppressed IS FALSE
AND AGE(b.cataloging_date_gmt) <= '7 days'::INTERVAL

sierra_view.record_metadata is also useful for obtaining the record_num from a record_id (record_num is used in the SDA, webpac / encore for identifying a record). We can JOIN sierra_view.record_metadata, appending the columns (generic checkdigit character, a can be added to the end for cases where a checkdigit is needed):

sierra_view.record_type_code || sierra_view.record_number || 'a'

Example:

SELECT
r.record_type_code || r.record_num || 'a' as record_number

FROM
sierra_view.record_metadata as r

WHERE
r.record_type_code || r.campus_code = 'p'
and r.deletion_date_gmt is null

limit 5

`sierra_view.record`, `sierra_view.property`¶

Each record type has one of each table
- bib_view, bib_record, bib_record_property
Record table contains majority of fixed fields
Record_property table contains additional descriptive fields
- Including useful values such as call_number, title and barcode

`sierra_view.*view`¶

View table combines fields from multiple tables Convenience comes at the expense of efficiency (use them sparingly, if at all if possible!)

Example: sierra_view.bib_view combines data from multiple tables, but query speeds can be slower than accessing other tables more directly.

SELECT
*
FROM
sierra_view.bib_view

LIMIT 1000

(1.6 seconds)

VS

SELECT
*
FROM
sierra_view.bib_record as b

JOIN
sierra_view.bib_record_property as p
ON
  p.bib_record_id = b.record_id

LIMIT 1000

(543 milliseconds)

`sierra_view.*myuser`¶

A my_user table exists for each fixed field in the system
Contains code and name values for their respective field
Use to provide translations for system codes

SELECT
p.name
FROM
sierra_view.item_record i
JOIN
sierra_view.itype_property_myuser p
ON i.itype_code_num = p.code

`sierra_view.iii user_desktop_option`¶

desktop_option_id isn’t documented, but you may be able to figure out the option by a combination of value, and iii_user_id or user_name … good luck!

DROP TABLE IF EXISTS temp_users;

CREATE TEMP TABLE temp_users as
SELECT
iii_user_id,
user_name
FROM
sierra_view.iii_user_application_myuser

GROUP BY
iii_user_id,
user_name
;

DROP INDEX IF EXISTS temp_users_iii_user_id;
CREATE INDEX temp_users_iii_user_id ON temp_users (iii_user_id);

SELECT
t.user_name,
t.iii_user_id,
o.desktop_option_id,
o.value

FROM
sierra_view.iii_user_desktop_option as o

LEFT OUTER JOIN
temp_users as t
ON
  t.iii_user_id = o.iii_user_id

ORDER BY
t.user_name,
o.desktop_option_id
;

`sierra_view.varfield`¶

using occ_num is a good method for picking the first value of (possibly) repeated values (especially where you may not be expecting or wanting multiple values)

--- get patron record id and the first barcode listed on their account
SELECT
r.record_id,
(
    SELECT
    v.field_content
    FROM
    sierra_view.varfield as v
    WHERE
    v.record_id = r.record_id
    AND v.varfield_type_code = 'b'
    ORDER BY
    v.occ_num

    LIMIT 1
) as barcode
FROM
sierra_view.patron_record as r

LIMIT 100

`sierra_view.bool_info`¶

This table has a column called sql_query that could be useful for seeing the underlying SQL for create list searches.

--Count the number of owning libraries for each title in a review file
--Pass query a review file of items to be weeded
--Check to see which other locations own those titles and how they've performed system wide prior to deleting records

SELECT
id2reckey(b.id) || 'a' AS "bib_record",
b.title AS "bib_title",
count(bi.id) AS "item_count",
count(distinct(i.agency_code_num)) AS "location_count",
array_to_string(
    ARRAY(
        SELECT 
        loc.location_code
        FROM
        sierra_view.bib_record_location AS loc
        WHERE 
        b.id = loc.bib_record_id
        ORDER BY 1
    ),
    ','
) AS "location codes",
SUM(i.checkout_total) AS "total checkouts",
SUM(i.renewal_total) AS "total renewals"
FROM
sierra_view.bib_view as b 
JOIN sierra_view.bool_set as sb ON b.id = sb.record_metadata_id AND sb.bool_info_id = '369'
JOIN Sierra_view.bib_record_item_record_link as bi ON b.id = bi.bib_record_id
JOIN sierra_view.item_record as i ON bi.item_record_id = i.id
JOIN sierra_view.record_metadata AS rm ON bi.item_record_id = rm.id
group by 1, 2, 5;

Account for possible NULL Values ¶

NULL is a special value, and you should account for cases where a value can be NULL, as it can result in unexpected behavior.

Use CONCAT() function, which is NULL safe, or COALESCE() function, which returns the first non-null value.

SELECT
-- string concatenatination with operator ||
NULL || 'where did this go?!?!' AS ex_1,

-- string concatenatination with operator ||
COALESCE(NULL, '', 'we don''t get here') || 'remember that COALESCE returns first non-null value!' AS ex_2,

-- string concatenation with CONCAT function (which is NULL safe)
CONCAT(NULL, 'mind your ', NULL, 'values!') AS ex_3

Output:

ex1	ex2	ex3
`NULL`	“remember that COALESCE returns first non-null value!”	“mind your values!”

Picking the Correct Key to JOIN On ¶

Picking the correct primary / foreign keys for joins can be tricky.

Make sure you carefully examine the sierra DNA documentation
Columns named id generally are generally the primary key, as is the case in sierra_view.record_metadata. This means that it’ll appear in other tables as the foreign key. For example:

INCORRECT:

SELECT
r.id,
b.id,
r.record_num
FROM
sierra_view.record_metadata as r

JOIN
sierra_view.bib_record as b
ON
  -- THIS IS NOT CORRECT AND WILL RETURN BAD DATA!!!
  b.id = r.id

LIMIT 100
;

CORRECT:

SELECT
r.id,
b.id,
r.record_num
FROM
sierra_view.record_metadata as r

JOIN
sierra_view.bib_record as b
ON
  -- THIS IS CORRECT (foreign key, `record_id` JOINed to primary key, `id` )
  b.record_id = r.id

LIMIT 100
;

Unique SQL Queries (can’t easily be done via the SDA / CreateLists)¶

-- identifies order records across accounting units that lack a location code
-- due to downloading incomplete data from a vendor
SELECT
r.record_type_code || r.record_num || 'a' as order_record_num,
r.creation_date_gmt,
o.accounting_unit_code_num

FROM
sierra_view.order_record as o

JOIN
sierra_view.record_metadata as r
ON
  r.id = o.record_id

JOIN
sierra_view.order_record_cmf as c
ON
  c.order_record_id = o.record_id

WHERE
c.location_code = 'none'
;

Using Indexes Effectively ¶

Using an index can significantly speed up your query.

We can determine what indexes exist (or don’t exist) on table with a query similar to the following:

-- find indexes on phrase_entry ...
-- or, find all indexes by removing
-- tablename = 'phrase_entry'
-- and uncommenting tablename NOTE LIKE 'pg%'
--
SELECT
*
FROM
pg_indexes
WHERE
tablename = 'phrase_entry'
-- tablename NOT LIKE 'pg%'
ORDER BY
schemaname,
tablename,
indexname
;

which tells us that one of the following indexes exist:

CREATE INDEX idx_phrase_entry ON iiirecord.phrase_entry USING btree ((((index_tag)::text || (index_entry)::text)), type2, insert_title, record_key)

The following query will take advantage of the index, and result in a fast, efficient search (this particular example will search for barcode and return the record it is associated with):

SELECT
e.record_id
FROM
sierra_view.phrase_entry as e
WHERE
e.index_tag || e.index_entry = 'b' || LOWER('A000052469475')
-- 64 msec execution time
-- EXPLAIN:
-- Index Scan using idx_phrase_entry on phrase_entry  (cost=0.69..586.22 rows=143 width=8)
--  Index Cond: (((index_tag)::text || (index_entry)::text) = 'ba000052469475'::text)

This is the INCORRECT way (just to demonstrate how drastic the difference is of using an index VS not using one)

SELECT
e.record_id
FROM
sierra_view.phrase_entry as e
WHERE
e.index_tag = 'b' 
AND e.index_entry = LOWER('A000052469475')
-- 28 secs execution time!!!!
-- EXPLAIN:
-- Seq Scan on phrase_entry  (cost=0.00..1783427.78 rows=20 width=8)
--  Filter: (((index_tag)::text = 'b'::text) AND ((index_entry)::text = 'a000052469475'::text))

--
-- sequential scan = bad!

Unexpected One-to-Many Situations ¶

Sometimes we only expect 1 row of data to be returned, but we get multiple rows back.

For example, this query:

SELECT
p.record_id,
v.field_content

FROM
sierra_view.patron_record as p

JOIN
sierra_view.varfield as v
ON
  v.record_id = p.record_id
  AND v.varfield_type_code = 'b'

WHERE
p.record_id = 481037682097

returns the following data:

481037682097;12345679
481037682097;12345678 / PREV ID

If we only want one row of data per patron record in our output, we have two options.

1 Grab the first occurrence of the barcode only:

SELECT
p.record_id,
(
    -- subquery grabs the first occurrence of the barcode
    SELECT
    v.field_content
    FROM
    sierra_view.varfield as v
    WHERE
    v.record_id = p.record_id
    AND v.varfield_type_code = 'b'
    ORDER BY
    v.occ_num
    LIMIT 1
) as field_content

FROM
sierra_view.patron_record as p

WHERE
p.record_id = 481037682097

481037682097;12345679

2 Aggregate all the barcodes into one column of data:

SELECT
p.record_id,
(
    -- subquery grabs all barcodes
    SELECT
    string_agg(v.field_content, ',' ORDER BY v.occ_num)
    FROM
    sierra_view.varfield as v
    WHERE
    v.record_id = p.record_id
    AND v.varfield_type_code = 'b'
) as fields_content

FROM
sierra_view.patron_record as p

WHERE
p.record_id = 481037682097

481037682097;12345679,12345678 / PREV ID

Helpful / useful queries ¶

Get info about indexes / tables postgres database ¶

-- find indexes!
-- NOTE that these indexes are defined not on the view,
-- but on the iiirecord. The tables and views are assumed 
-- to be similarly named, but that's not guaranteed
SELECT
*
FROM
pg_indexes
WHERE
tablename = 'phrase_entry'
-- tablename NOT LIKE 'pg%' -- use this to find ALL indexes
ORDER BY
schemaname,
tablename,
indexname
;

-- show detailed view of `sierra_view` table_schema
SELECT
*
FROM
information_schema.columns

WHERE
table_schema = 'sierra_view'

ORDER BY
table_name,
ordinal_position
;

-- show all tables
SELECT
*
FROM
pg_catalog.pg_tables
ORDER BY
tablename
;

Additional Resources ¶

Sierra Specific SQL Collections
- Ray Voelker - Public Library of Cincinnati And Hamilton County Sierra SQL Wiki: github.com/plch/sierra-sql/wiki
- Jeremy Goldstein - Minuteman Library Network Sierra SQL Wiki: github.com/jmgold/SQL-Queries/wiki
- UNC Libraries Sierra SQL Wiki: github.com/UNC-Libraries/III-Sierra-SQL/wiki
General SQL Tutorials / Instruction
- Codecademy: https://www.codecademy.com/catalog/language/sql
- WiseOwl Training Videos: https://www.wiseowl.co.uk/sql/videos sql-selecting-data/
- Stanford free online mini-courses in SQL: https://lagunita.stanford.edu/courses/DB/2014/SelfPaced/about