Next: Optimization
Up: 4660
Previous: Algorithm to Decompose R
  Contents
Direct
sql_select all from S where 'S.S#=="S3"'
Range
sql_select all from S
where '(S.CITY>="London")&&(S.CITY<="Paris")'
Problem
- How to perform these queries efficiently?
- How to avoid reading the entire file?
Solution
- Direct access: use an index of keys in the table
- Index: B-tree
- Sequential access: use an inverted file for ranges
- Inverted file: sorted on a column
- Index-sequential: B+tree
- DDL: CREATE INDEX ON TABLE S USING KEY S#
B+tree, Inverted File [276]
Index Mechanisms: B-tree, B+tree [277]
B-tree
- balanced tree (all branches have the same depth)
- hierarchy of pages
- each page has a sorted list of keys
- between each pair of keys is a pointer
- pointer is to pages with intermediate key values
- Order of B-tree is
: branching by
pointers
- branching by
yields
depth
- 10,000,000 records with
: depth=7
- find, insert, delete:
- insert maintains balance and sorting
- delete maintains balance and sorting
B+tree
- all keys are at the leaves
- index portion: roadmap to keys at the leaves
- leaves of tree are linked for sequential access
VISUAL Btree [278]
VDB
- VISUAL Btree
- Graphical User Interface (GUI)
- uses Tcl/Tk
Demo
Insert
Overflow
New Root
Delete
Underflow
B-tree: Find, Read, Next [279]
Find(key)
- direct access
- start at root
- binary search of page for closest key
- take associated index pointer and repeat
- if key is found stop (B
-tree: continue to leaf)
- O(log_m(#keys))
Read(data)
- take associated data pointer to get the record
Next()
- sequential access
- use a B+tree
- Find(startkey) for beginning of search
- return next key in page
- or take sequential pointer to next page
- O(1)
B-tree: Insert(key) [280]
Insert(key)
- Find(key) to locate new position
- move right neighbors within page
- if overflow split into 2 pages
- promote middle key to parent page
- if parent overflows, repeat
- tree only grows at the root
- maintains balance
- O(log_m(#keys))
B-tree: Find, Insert [281]
Order: 5 (max 5 children/node; max 4 keys/node; min 2 keys/node)
Find: h, x
Insert: u (OK)
B-tree: Insert (Overflow) [282]
Insert: p (Overflow)
Split; Promote m; Overflow
Split; Promote j; New Root
B-tree: Delete(key) [283]
Delete(key)
- Find(key)
- if not on leaf, swap with successor on leaf
- delete from leaf
- if at least m/2 keys then stop else underflow
- if neighbor has enough keys, borrow keys
- else combine with neighbor
- if parent underflows, repeat
- tree only shrinks at the root
- maintains balance
- O(log_m(#keys))
B-tree: Delete [284]
Delete: h (OK)
Delete: r (Not on leaf; Swap with successor on leaf s; Delete OK)
B-tree: Delete (Underflow) [285]
Delete: p (Underflow)
Borrow from neighbor; Moveleft s t
B-tree: Delete (Combine) [286]
Delete: d (Underflow; Cannot borrow)
Combine a b c e; Remove old page; Underflow
Combine f j m t; Remove old root
Index Mechanisms: Performance [287]
DD
DD has information about indices for the database. SQL uses this information for fast queries.
Direct
sql_select all from S where 'S.S#=="S3"'
B-tree
- Find("S3") in S# index for suppliers S
- Read(data) to get the record
- O(log_m(#records)) vs. O(#records)
Range
sql_select all from S
where '(S.CITY>="London")&&(S.CITY<="Paris")'
B+tree
- Find("London") in CITY index for suppliers S
- Next(); Read(data); Next(); Read(data); ...
- continue until CITY>"Paris" (say
records)
- O(log_m(#records)+x) vs. O(#records)
Access Hierarchy [288]
[DBMS Example [289]
- SELECT * FROM P WHERE COLOR
'Red' ;
- Identify from the Data Dictionary: relation ? tuple ?
- Request the FILE MANAGER to get the 1st tuple from table 'T'
- FILE MANAGER refers to file directory to detect page 'p'
- FILE MANAGER requests DISK MANAGER to read page 'p'
- DISK MANAGER sends a block of storage that has page 'p'
- FILE MANAGER formats the data and passes it to DBMS
- DBMS accept/rejects the tuple (predicate ), requests next tuple
File Manager [290]
- DISK MANAGER allows FILE MANAGER to ignore all details of physical I/O
- FILE MANAGER thinks in terms of logical pages
- FILE MANAGER allows DBMS to ignore details of page I/O
- DBMS thinks in terms of stored relations and tuples
- Function of FILE MANAGER: stored record management
Clustering [291]
- Goal: good performance for retrieval
- Store logically related records together
- P1 and P2 together: read in a single disk access
- P1 and P2 not together: need 2 disk accesses
- DBMS may need sequential access for each relation
- DBMS may need interleaved access for tuples in S and SP
- intra file clustering or inter file clustering
- DBMS can support only one at a given time
- Good DBMS: specify different clustering for different files
Storage Layout [292]
| 0 |
Directory |
S1 |
S2 |
S3 |
S4 |
S5 |
| 6 |
P1 |
P2 |
P3 |
P4 |
P5 |
P6 |
| 12 |
S1/P1 |
S1/P2 |
S1/P3 |
S1/P4 |
S1/P5 |
S1/P6 |
| 18 |
S2/P1 |
S2/P2 |
S3/P2 |
S4/P2 |
S4/P4 |
S4/P5 |
| 24 |
|
|
|
|
|
|
| 30 |
|
|
|
|
|
|
- FILE MANAGER inserts new tuple S6 and DISK MANAGER allots page '24'
- FILE MANAGER deletes S2 and DISK MANAGER releases page '2' as Free
- FILE MANAGER inserts new tuple P7 and DISK MANAGER allots page '2'
- FILE MANAGER deletes S4 and DISK MANAGER releases page '4' as Free
Storage Layout [293]
| 0 |
Directory |
S1 |
P7 |
S3 |
empty |
S5 |
| 6 |
P1 |
P2 |
P3 |
P4 |
P5 |
P6 |
| 12 |
S1/P1 |
S1/P2 |
S1/P3 |
S1/P4 |
S1/P5 |
S1/P6 |
| 18 |
S2/P1 |
S2/P2 |
S3/P2 |
S4/P2 |
S4/P4 |
S4/P5 |
| 24 |
S6 |
|
|
|
|
|
| 30 |
|
|
|
|
|
|
- Logical adjacency is not same as physical adjacency
- Each page has a pointer to the next logical page
- DISK MANAGER manages pointers which are not visible to FILE MANAGER
Next: Optimization
Up: 4660
Previous: Algorithm to Decompose R
  Contents
Ted Billard
2001-10-31