Sierra SQL Tips And Tricks¶
TODO: MOAR!
Tables of Note¶
sierra_view.record_metadata
¶
This table is very useful as it contains:
record_type_code
: defines record as being bib (b), item (i), patron (p), etc.record_num
: record number used by the SDA and the webpac / encorecreation_date_gmt
: date the record entered the system (different from a cataloged date)deletion_date_gmt
: deleted records will have no other information in the system other than info left in this tablerecord_last_updated_gmt
: when the record was last modifiedcampus_code
: useful for telling if the record is from ILLNote that this field won’t have a
NULL
value; instead, it’s blank or''
in cases where there is nocampus_code
It’s useful to JOIN to this table, as it can be used to filter out records that don’t “belong” to the library (ILL) using the
campus_code
as shown below (note the use ofcampus_code
being blank as the filtering condition in the WHERE clause):
-- find titles that have been cataloged in the last 7 days
SELECT
r.record_type_code || r.record_num || 'a' as bib_record_num,
r.creation_date_gmt,
b.cataloging_date_gmt
FROM
sierra_view.record_metadata as r
JOIN
sierra_view.bib_record as b
ON
b.record_id = r.id
WHERE
-- concatinating these together will exclude bib records that are ILL
r.record_type_code || r.campus_code = 'b'
AND r.deletion_date_gmt IS NULL
AND b.is_suppressed IS FALSE
AND AGE(b.cataloging_date_gmt) <= '7 days'::INTERVAL
sierra_view.record_metadata
is also useful for obtaining the record_num
from a record_id
(record_num
is used in the SDA, webpac / encore for identifying a record). We can JOIN sierra_view.record_metadata
, appending the columns (generic checkdigit character, a
can be added to the end for cases where a checkdigit is needed):
sierra_view.record_type_code || sierra_view.record_number || 'a'
Example:
SELECT
r.record_type_code || r.record_num || 'a' as record_number
FROM
sierra_view.record_metadata as r
WHERE
r.record_type_code || r.campus_code = 'p'
and r.deletion_date_gmt is null
limit 5
sierra_view.*record
, sierra_view.*property
¶
Each record type has one of each table
bib_view, bib_record, bib_record_property
Record table contains majority of fixed fields
Record_property table contains additional descriptive fields
Including useful values such as call_number, title and barcode
sierra_view.*view
¶
View table combines fields from multiple tables Convenience comes at the expense of efficiency (use them sparingly, if at all if possible!)
Example: sierra_view.bib_view
combines data from multiple tables, but query speeds can be slower than accessing other tables more directly.
SELECT
*
FROM
sierra_view.bib_view
LIMIT 1000
(1.6 seconds)
VS
SELECT
*
FROM
sierra_view.bib_record as b
JOIN
sierra_view.bib_record_property as p
ON
p.bib_record_id = b.record_id
LIMIT 1000
(543 milliseconds)
sierra_view.*myuser
¶
A
my_user
table exists for each fixed field in the systemContains code and name values for their respective field
Use to provide translations for system codes
SELECT
p.name
FROM
sierra_view.item_record i
JOIN
sierra_view.itype_property_myuser p
ON i.itype_code_num = p.code
sierra_view.iii user_desktop_option
¶
desktop_option_id
isn’t documented, but you may be able to figure out the option by a combination ofvalue
, andiii_user_id
oruser_name
… good luck!
DROP TABLE IF EXISTS temp_users;
CREATE TEMP TABLE temp_users as
SELECT
iii_user_id,
user_name
FROM
sierra_view.iii_user_application_myuser
GROUP BY
iii_user_id,
user_name
;
DROP INDEX IF EXISTS temp_users_iii_user_id;
CREATE INDEX temp_users_iii_user_id ON temp_users (iii_user_id);
SELECT
t.user_name,
t.iii_user_id,
o.desktop_option_id,
o.value
FROM
sierra_view.iii_user_desktop_option as o
LEFT OUTER JOIN
temp_users as t
ON
t.iii_user_id = o.iii_user_id
ORDER BY
t.user_name,
o.desktop_option_id
;
sierra_view.varfield
¶
using
occ_num
is a good method for picking the first value of (possibly) repeated values (especially where you may not be expecting or wanting multiple values)
--- get patron record id and the first barcode listed on their account
SELECT
r.record_id,
(
SELECT
v.field_content
FROM
sierra_view.varfield as v
WHERE
v.record_id = r.record_id
AND v.varfield_type_code = 'b'
ORDER BY
v.occ_num
LIMIT 1
) as barcode
FROM
sierra_view.patron_record as r
LIMIT 100
sierra_view.bool_info
¶
This table has a column called
sql_query
that could be useful for seeing the underlying SQL for create list searches.
--Count the number of owning libraries for each title in a review file
--Pass query a review file of items to be weeded
--Check to see which other locations own those titles and how they've performed system wide prior to deleting records
SELECT
id2reckey(b.id) || 'a' AS "bib_record",
b.title AS "bib_title",
count(bi.id) AS "item_count",
count(distinct(i.agency_code_num)) AS "location_count",
array_to_string(
ARRAY(
SELECT
loc.location_code
FROM
sierra_view.bib_record_location AS loc
WHERE
b.id = loc.bib_record_id
ORDER BY 1
),
','
) AS "location codes",
SUM(i.checkout_total) AS "total checkouts",
SUM(i.renewal_total) AS "total renewals"
FROM
sierra_view.bib_view as b
JOIN sierra_view.bool_set as sb ON b.id = sb.record_metadata_id AND sb.bool_info_id = '369'
JOIN Sierra_view.bib_record_item_record_link as bi ON b.id = bi.bib_record_id
JOIN sierra_view.item_record as i ON bi.item_record_id = i.id
JOIN sierra_view.record_metadata AS rm ON bi.item_record_id = rm.id
group by 1, 2, 5;
Account for possible NULL Values¶
NULL
is a special value, and you should account for cases where a value can be NULL
, as it can result in unexpected behavior.
Use CONCAT()
function, which is NULL safe, or COALESCE()
function, which returns the first non-null value.
SELECT
-- string concatenatination with operator ||
NULL || 'where did this go?!?!' AS ex_1,
-- string concatenatination with operator ||
COALESCE(NULL, '', 'we don''t get here') || 'remember that COALESCE returns first non-null value!' AS ex_2,
-- string concatenation with CONCAT function (which is NULL safe)
CONCAT(NULL, 'mind your ', NULL, 'values!') AS ex_3
Output:
ex1 |
ex2 |
ex3 |
---|---|---|
|
“remember that COALESCE returns first non-null value!” |
“mind your values!” |
Picking the Correct Key to JOIN On¶
Picking the correct primary / foreign keys for joins can be tricky.
Make sure you carefully examine the sierra DNA documentation
Columns named
id
generally are generally the primary key, as is the case insierra_view.record_metadata
. This means that it’ll appear in other tables as the foreign key. For example:
INCORRECT:
SELECT
r.id,
b.id,
r.record_num
FROM
sierra_view.record_metadata as r
JOIN
sierra_view.bib_record as b
ON
-- THIS IS NOT CORRECT AND WILL RETURN BAD DATA!!!
b.id = r.id
LIMIT 100
;
CORRECT:
SELECT
r.id,
b.id,
r.record_num
FROM
sierra_view.record_metadata as r
JOIN
sierra_view.bib_record as b
ON
-- THIS IS CORRECT (foreign key, `record_id` JOINed to primary key, `id` )
b.record_id = r.id
LIMIT 100
;
Unique SQL Queries (can’t easily be done via the SDA / CreateLists)¶
-- identifies order records across accounting units that lack a location code
-- due to downloading incomplete data from a vendor
SELECT
r.record_type_code || r.record_num || 'a' as order_record_num,
r.creation_date_gmt,
o.accounting_unit_code_num
FROM
sierra_view.order_record as o
JOIN
sierra_view.record_metadata as r
ON
r.id = o.record_id
JOIN
sierra_view.order_record_cmf as c
ON
c.order_record_id = o.record_id
WHERE
c.location_code = 'none'
;
Using Indexes Effectively¶
Using an index can significantly speed up your query.
We can determine what indexes exist (or don’t exist) on table with a query similar to the following:
-- find indexes on phrase_entry ...
-- or, find all indexes by removing
-- tablename = 'phrase_entry'
-- and uncommenting tablename NOTE LIKE 'pg%'
--
SELECT
*
FROM
pg_indexes
WHERE
tablename = 'phrase_entry'
-- tablename NOT LIKE 'pg%'
ORDER BY
schemaname,
tablename,
indexname
;
which tells us that one of the following indexes exist:
CREATE INDEX idx_phrase_entry ON iiirecord.phrase_entry USING btree ((((index_tag)::text || (index_entry)::text)), type2, insert_title, record_key)
The following query will take advantage of the index, and result in a fast, efficient search (this particular example will search for barcode and return the record it is associated with):
SELECT
e.record_id
FROM
sierra_view.phrase_entry as e
WHERE
e.index_tag || e.index_entry = 'b' || LOWER('A000052469475')
-- 64 msec execution time
-- EXPLAIN:
-- Index Scan using idx_phrase_entry on phrase_entry (cost=0.69..586.22 rows=143 width=8)
-- Index Cond: (((index_tag)::text || (index_entry)::text) = 'ba000052469475'::text)
This is the INCORRECT way (just to demonstrate how drastic the difference is of using an index VS not using one)
SELECT
e.record_id
FROM
sierra_view.phrase_entry as e
WHERE
e.index_tag = 'b'
AND e.index_entry = LOWER('A000052469475')
-- 28 secs execution time!!!!
-- EXPLAIN:
-- Seq Scan on phrase_entry (cost=0.00..1783427.78 rows=20 width=8)
-- Filter: (((index_tag)::text = 'b'::text) AND ((index_entry)::text = 'a000052469475'::text))
--
-- sequential scan = bad!
Unexpected One-to-Many Situations¶
Sometimes we only expect 1 row of data to be returned, but we get multiple rows back.
For example, this query:
SELECT
p.record_id,
v.field_content
FROM
sierra_view.patron_record as p
JOIN
sierra_view.varfield as v
ON
v.record_id = p.record_id
AND v.varfield_type_code = 'b'
WHERE
p.record_id = 481037682097
returns the following data:
481037682097;12345679
481037682097;12345678 / PREV ID
If we only want one row of data per patron record in our output, we have two options.
1 Grab the first occurrence of the barcode only:
SELECT
p.record_id,
(
-- subquery grabs the first occurrence of the barcode
SELECT
v.field_content
FROM
sierra_view.varfield as v
WHERE
v.record_id = p.record_id
AND v.varfield_type_code = 'b'
ORDER BY
v.occ_num
LIMIT 1
) as field_content
FROM
sierra_view.patron_record as p
WHERE
p.record_id = 481037682097
481037682097;12345679
2 Aggregate all the barcodes into one column of data:
SELECT
p.record_id,
(
-- subquery grabs all barcodes
SELECT
string_agg(v.field_content, ',' ORDER BY v.occ_num)
FROM
sierra_view.varfield as v
WHERE
v.record_id = p.record_id
AND v.varfield_type_code = 'b'
) as fields_content
FROM
sierra_view.patron_record as p
WHERE
p.record_id = 481037682097
481037682097;12345679,12345678 / PREV ID
Helpful / useful queries¶
Get info about indexes / tables postgres database¶
-- find indexes!
-- NOTE that these indexes are defined not on the view,
-- but on the iiirecord. The tables and views are assumed
-- to be similarly named, but that's not guaranteed
SELECT
*
FROM
pg_indexes
WHERE
tablename = 'phrase_entry'
-- tablename NOT LIKE 'pg%' -- use this to find ALL indexes
ORDER BY
schemaname,
tablename,
indexname
;
-- show detailed view of `sierra_view` table_schema
SELECT
*
FROM
information_schema.columns
WHERE
table_schema = 'sierra_view'
ORDER BY
table_name,
ordinal_position
;
-- show all tables
SELECT
*
FROM
pg_catalog.pg_tables
ORDER BY
tablename
;
Additional Resources¶
Sierra Specific SQL Collections
Ray Voelker - Public Library of Cincinnati And Hamilton County Sierra SQL Wiki: github.com/plch/sierra-sql/wiki
Jeremy Goldstein - Minuteman Library Network Sierra SQL Wiki: github.com/jmgold/SQL-Queries/wiki
UNC Libraries Sierra SQL Wiki: github.com/UNC-Libraries/III-Sierra-SQL/wiki
General SQL Tutorials / Instruction
Codecademy: https://www.codecademy.com/catalog/language/sql
WiseOwl Training Videos: https://www.wiseowl.co.uk/sql/videos sql-selecting-data/
Stanford free online mini-courses in SQL: https://lagunita.stanford.edu/courses/DB/2014/SelfPaced/about