裝配圖網(wǎng) > 圖紙專區(qū) > 課件教案 > 《數(shù)據(jù)庫系統(tǒng)》教學課件

《數(shù)據(jù)庫系統(tǒng)》教學課件

《數(shù)據(jù)庫系統(tǒng)》教學課件,數(shù)據(jù)庫系統(tǒng),數(shù)據(jù)庫,系統(tǒng),教學,課件 Relational Query OptimizationJianlin FengSchool of SoftwareSUN YAT-SEN UNIVERSITYcourtesy of Joe Hellerstein for some slidesReviewnChoice of single-table operationsqDepends on indexes,memory,stats,nJoinsqBlocked nested loops:nsimple,exploits extra memoryqIndexed nested loops:n best if 1 rel small and one indexedqSort/Merge Joinngood with small amount of memory,bad with duplicatesqHash Joinnfast(enough memory),bad with skewed datanThese are“rules of thumb”qOn their way to a more principled approachQuery Optimization OverviewSELECT S.snameFROM Reserves R,Sailors SWHERE R.sid=S.sid AND R.bid=100 AND S.rating5ReservesSailorssid=sidbid=100 rating 5snamenQuery can be converted to relational algebranRelational Algebra converts to tree,joins form branchesnEach operator has implementation choicesnOperators can also be applied in different order!(sname)(bid=100 rating 5)(Reserves Sailors)Query Optimization Overview(cont.)nPlan:Tree of Relation Algebra operations(and some others)with choice of algorithm for each operation.nThree main issues:qFor a given query,what plans are considered?qHow is the cost of a plan estimated?qHow do we“search”in the“plan space”?nIdeally:Want to find best plan.nReality:Avoid worst plans!Cost-based Query Sub-System Query ParserQuery OptimizerPlan GeneratorPlan Cost EstimatorQuery ExecutorCatalog ManagerUsually there is aheuristics-basedrewriting step beforethe cost-based steps.SchemaStatisticsSelect*From Blah BWhere B.blah=blahQueriesSchema for ExamplesnReserves:qEach tuple is 40 bytes long,100 tuples per page,1000 pages.qAssume there are 100 boatsnSailors:qEach tuple is 50 bytes long,80 tuples per page,500 pages.qAssume there are 10 different ratings nAssume we have 5 pages in our buffer pool!Sailors(sid:integer,sname:string,rating:integer,age:real)Reserves(sid:integer,bid:integer,day:dates,rname:string)Motivating ExamplenCost:500+500*1000 I/OsnBy no means the worst plan!nMisses several opportunities:qselections could bepushed downqno use made of indexesnGoal of optimization:Find faster plans that compute the same answer.SELECT S.snameFROM Reserves R,Sailors SWHERE R.sid=S.sid AND R.bid=100 AND S.rating5SailorsReservessid=sidbid=100 rating 5sname(Page-Oriented Nested loops)(On-the-fly)(On-the-fly)Plan:500,500 IOsAlternative Plans Push Selects(No Indexes)SailorsReservessid=sidbid=100 rating 5sname(Page-Oriented Nested loops)(On-the-fly)(On-the-fly)SailorsReservessid=sidrating 5sname(Page-Oriented Nested loops)(On-the-fly)(On-the-fly)bid=100(On-the-fly)250,500 IOsAlternative Plans Push Selects(No Indexes)SailorsReservessid=sidrating 5sname(Page-Oriented Nested loops)(On-the-fly)(On-the-fly)bid=100(On-the-fly)SailorsReservessid=sidbid=100sname(Page-Oriented Nested loops)(On-the-fly)rating 5(On-the-fly)(On-the-fly)500+1000+250+250*10250,500 IOs4250 IOsSailorsReservessid=sidrating 5sname(Page-Oriented Nested loops)(On-the-fly)(On-the-fly)bid=100(On-the-fly)6000 IOsSailorsReservessid=sidrating 5sname(Page-Oriented Nested loops)(On-the-fly)(On-the-fly)bid=100(On-the-fly)250,500 IOsAlternative Plans Push Selects(No Indexes)SailorsReservessid=sidrating 5sname(Page-Oriented Nested loops)(On-the-fly)bid=100(Scan&Write totemp T2)(On-the-fly)6000 IOsSailorsReservessid=sidrating 5sname(Page-Oriented Nested loops)(On-the-fly)(On-the-fly)bid=100(On-the-fly)Alternative Plans Push Selects(No Indexes)4250 IOs1000+500+250+(10*250)ReservesSailorssid=sidbid=100sname(Page-Oriented Nested loops)(On-the-fly)rating5(Scan&Write totemp T2)(On-the-fly)Alternative Plans Push Selects(No Indexes)4010 IOs500+1000+10+(250*10)SailorsReservessid=sidrating 5sname(Page-Oriented Nested loops)(On-the-fly)bid=100(Scan&Write totemp T2)(On-the-fly)4250 IOsMore Alternative Plans(No Indexes)nSort Merge JoinnWith 5 buffers,cost of plan:qScan Reserves(1000)+write temp T1(10 pages)=1010.qScan Sailors(500)+write temp T2(250 pages)=750.qSort T1(2*2*10)+sort T2(2*4*250)+merge(10+250)=2300qTotal:4060 page I/Os.nIf use BNL join,join=10+4*250,total cost=2770.nCan also push projections,but must be careful!qT1 has only sid,T2 only sid,sname:qT1 fits in 3 pgs,cost of BNL under 250 pgs,total 5(Scan;write to temp T1)(Scan;write totemp T2)(Sort-Merge Join)Summing upnThere are lots of plansqEven for a relatively simple querynPeople tend to think they can pick good ones by handqMapReduce is based on that assumptionnNot so clear thats true!qMachines are better at enumerating options than peopleqBut we will see soon how optimizers make simplifying assumptionsWhat is Needed for Optimization?nA closed set of operators qRelational ops(table in,table out)qEncapsulation(e.g.based on iterators)nPlan spaceqBased on relational equivalences,different implementationsnCost Estimation,based onqCost formulasqSize estimation,in turn based on nCatalog information on base tablesnSelectivity(Reduction Factor)estimationnA search algorithm:To sift through the plan space and find lowest cost option!Query OptimizationnWill focus on“System R”(Selinger)style optimizersHighlights of System R OptimizernImpact:qMost widely used currently;works well for 10-15 joins.nCost estimation:qVery inexact,but works OK in practice.qStatistics in system catalogs used to estimate cost of operations and result sizes.qConsiders combination of CPU and I/O costs.qSystem Rs scheme has been improved since that time.Highlights of System R Optimizer(Contd)nPlan Space:Too large,must be pruned.qMany plans share common,“overpriced”subtreesnignore them all!qIn some implementations,only the space of left-deep plans is considered.qCartesian products avoided in some implementations.Query Blocks:Units of OptimizationnBreak query into query blocksnOptimized one block at a timenUncorrelated nested blocks computed oncenCorrelated nested blocks like function callsqBut sometimes can be“decorrelated”qBeyond the scope of introductory course!SELECT S.snameFROM Sailors SWHERE S.age IN (SELECT MAX(S2.age)FROM Sailors S2 GROUP BY S2.rating)Nested blockOuter blockv For each block,the plans considered are:All available access methods,for each relation in FROM clause.All left-deep join trees right branch always a base table consider all join orders and join methodsBACDSchema for ExamplesnReserves:qEach tuple is 40 bytes long,100 tuples per page,1000 pages.100 distinct bids.nSailors:qEach tuple is 50 bytes long,80 tuples per page,500 pages.10 ratings,40,000 sids.Sailors(sid:integer,sname:string,rating:integer,age:real)Reserves(sid:integer,bid:integer,day:dates,rname:string)Translating SQL to Relational AlgebraSELECT S.sid,MIN(R.day)FROM Sailors S,Reserves R,Boats BWHERE S.sid=R.sid AND R.bid=B.bid AND B.color=“red”GROUP BY S.sidHAVING COUNT(*)=2 For each sailor with at least two reservations for red boats,find the sailor id and the earliest date on which the sailor has a reservation for a red boat.pS.sid,MIN(R.day)(HAVING COUNT(*)2(GROUP BY S.Sid(sB.color=“red”(Sailors Reserves Boats)Translating SQL to Relational AlgebraSELECT S.sid,MIN(R.day)FROM Sailors S,Reserves R,Boats BWHERE S.sid=R.sid AND R.bid=B.bid AND B.color=“red”GROUP BY S.sidHAVING COUNT(*)=2 nAllow us to choose different join orders and to push selections and projections ahead of joins.nSelections:c1cn(R)c1(cn(R)(cascade)c1(c2(R)c2(c1(R)(commute)nProjections:a1(R)a1(a1,an(R)(cascade)nCartesian ProductqR (S T)(R S)T (associative)qR S S R (commutative)qThis means we can do joins in any order.nButbeware of cartesian product!Relational Algebra EquivalencesMore EquivalencesnEager projectionqCan cascade and“push”some projections thru selectionqCan cascade and“push”some projections below one side of a joinqRule of thumb:can project anything not needed“downstream”nSelection on a cross-product is equivalent to a join.qIf selection is comparing attributes from each sidenA selection on attributes of R commutes with R S.qi.e.,(R S)(R)Sqbut only if the selection doesnt refer to S!Cost EstimationnFor each plan considered,must estimate total cost:qMust estimate cost of each operation in plan tree.nDepends on input cardinalities.nWeve already discussed this for various operators qsequential scan,index scan,joins,etc.qMust estimate size of result for each operation in tree!nUse information about the input relations.nFor selections and joins,assume independence of predicates.qIn System R,cost is boiled down to a single number consisting of#I/O+CPU-factor*#tuplesnQ:Is“cost”the same as estimated“run time”?Statistics and CatalogsnNeed infomation on relations and indexes involved.nCatalogs typically contain at least:nCatalogs updated periodically.nModern systems do moreqkeep more detailed information on data values,e.g.,histogramsStatisticMeaningNTuples#of tuples in a table(cardinality)NPages#of disk pages in a tableLow/Highmin/max value in a columnNkeys#of distinct values in a columnIHeightthe height of an indexINPages#of disk pages in an indexSize Estimation and SelectivitynMax output cardinality=product of input cardinalitiesnSelectivity(sel)associated with each term qreflects the impact of the term in reducing result size.q|output|/|input|Result cardinality=Max#tuples *seliqBook calls selectivity“Reduction Factor”(RF)nAvoid confusion:q“highly selective”in common English is opposite of a high selectivity value(|output|/|input|high!)SELECT attribute listFROM relation listWHERE term1 AND.AND termkResult Size Estimationn Result cardinality=Max#tuples *product of all RFs.nTerm col=value (given Nkeys(I)on col)RF=1/NKeys(I)nTerm col1=col2 (handy for joins too)RF =1/MAX(NKeys(I1),NKeys(I2)nTerm colvalueRF=(High(I)-value)/(High(I)-Low(I)Implicit assumptions:values are uniformly distributed and terms are independent!nNote,if missing the needed stats,assume 1/10!Enumeration of Alternative PlansnThere are two main cases:qSingle-relation plans (base case)qMultiple-relation plans (induction)nSingle-table queries include selects,projects,and grouping/aggregate operations:qConsider each available access path(file scan/index)nChoose the one with the least estimated costqSelection/Projection done on the flyqResult pipelined into grouping/aggregationCost Estimates for Single-Relation PlansnIndex I on primary key matches selection:qCost is Height(I)+1 for a B+tree.nClustered index I matching one or more selects:q(NPages(I)+NPages(R)*product of RFs of matching selects.nNon-clustered index I matching one or more selects:q(NPages(I)+NTuples(R)*product of RFs of matching selects.nSequential scan of file:qNPages(R).+Recall:Must also charge for duplicate elimination if requiredExamplenIf we have an index on rating:qCardinality=(1/NKeys(I)*NTuples(R)=(1/10)*40000 tuples qClustered index:(1/NKeys(I)*(NPages(I)+NPages(R)=(1/10)*(50+500)=55 pages are retrieved.(This is the cost.)qUnclustered index:(1/NKeys(I)*(NPages(I)+NTuples(R)=(1/10)*(50+40000)=4005 pages are retrieved.nIf we have an index on sid:qWould have to retrieve all tuples/pages.With a clustered index,the cost is 50+500,with unclustered index,50+40000.nDoing a file scan:qWe retrieve all file pages(500).SELECT S.sidFROM Sailors SWHERE S.rating=8Queries Over Multiple RelationsnA System R heuristic:only left-deep join trees considered.qRestricts the search spaceqLeft-deep trees allow us to generate all fully pipelined plans.nIntermediate results not written to temporary files.nNot all left-deep trees are fully pipelined(e.g.,SM join).BACDBACDCDBAEnumeration of Left-Deep PlansnLeft-deep plans differ in qthe order of relationsqthe access method for each relationqthe join method for each join.nEnumerated using N passes(if N relations joined):qPass 1:Find best 1-relation plan for each relation.qPass i:Find best way to join result of an(i-1)-relation plan(as outer)to the ith relation.(i between 2 and N.)nFor each subset of relations,retain only:qCheapest plan overall,plusqCheapest plan for each interesting order of the tuples.The Dynamic Programming TableSubset of tables in FROM clauseInteresting-order columnsBest planCostR,Shashjoin(R,S)1000R,Ssortmerge(R,S)1500A Note on“Interesting Orders”nAn intermediate result has an“interesting order”if it is sorted by any of:qORDER BY attributesqGROUP BY attributesqJoin attributes of yet-to-be-added(downstream)joinsEnumeration of Plans(Contd.)nMatch an i-1 way plan with another table only if a)there is a join condition between them,orb)all predicates in WHERE have been used up.ni.e.,avoid Cartesian products if possible.nORDER BY,GROUP BY,aggregates etc.handled as a final stepqvia interestingly ordered plan if chosen(free!)qor via an additional sort/hash operatornDespite pruning,this is exponential in#tables.ExampleSailors:Hash,B+on sidReserves:Clustered B+tree on bid B+on sidBoats B+on colorSelect S.sid,COUNT(*)AS numberFROM Sailors S,Reserves R,Boats BWHERE S.sid=R.sid AND R.bid=B.bid AND B.color=“red”GROUP BY S.sidReservesSailorssid=sidBoats Sid,COUNT(*)AS numbesGROUPBY sidbid=bidColor=redPass1:Best plan(s)for accessing each relationReserves,Sailors:File ScanQ:What about Clustered B+on Reserves.bid?Boats:B+tree on colorPass 1nFind best plan for each relation in isolation:qReserves,Sailors:File ScanqBoats:B+tree on colorPass 2nFor each plan in pass 1,generate plans joining another relation as the inner,using all join methods(and matching inner access methods)qFile Scan Reserves(outer)with Boats(inner)qFile Scan Reserves(outer)with Sailors(inner)qFile Scan Sailors(outer)with Boats(inner)qFile Scan Sailors(outer)with Reserves(inner)qBoats Btree on color with Sailors(inner)qBoats Btree on color with Reserves(inner)nRetain cheapest plan for each(pair of relations,order)Pass 3 and beyondnUsing Pass 2 plans as outer relations,generate plans for the next joinqE.g.Boats B+-tree on color with Reserves(bid)(sortmerge)inner Sailors(B-tree sid)sort-mergenThen,add cost for groupby/aggregate:qThis is the cost to sort the result by sid,unless it has already been sorted by a previous operator.nThen,choose the cheapest planSummarynOptimization is the reason for the lasting power of the relational systemnBut it is primitive in some ways nNew areas:many!qSmarter summary statistics(fancy histograms and“sketches”)qAuto-tuning statistics,qAdaptive runtime re-optimization(e.g.eddies),qMulti-query optimization,qAnd parallel scheduling issues,etc.

數(shù)據(jù)庫系統(tǒng)教學課件.zip

《數(shù)據(jù)庫系統(tǒng)》教學課件

lec 9 Query Optimization.ppt---(點擊預覽)

lec 9 Physical Design.ppt---(點擊預覽)

lec 8 Query Processing.ppt---(點擊預覽)

lec 7 Hash-Based Inds.ppt---(點擊預覽)

lec 6 External Sorting.ppt---(點擊預覽)

lec 5 TreeInds.ppt---(點擊預覽)

lec 4 SQL.ppt---(點擊預覽)

lec 4 calculus.ppt---(點擊預覽)

lec 4 algebra.ppt---(點擊預覽)

lec 3 Storing Data.ppt---(點擊預覽)

lec 3 File and Indexing.ppt---(點擊預覽)

lec 2 The Relational Model.ppt---(點擊預覽)

lec 2 The Entity-Relationship Model.ppt---(點擊預覽)

lec 15 final review.ppt---(點擊預覽)

lec 14 warehouse.ppt---(點擊預覽)

lec 12 recovery.ppt---(點擊預覽)

lec 11 transaction.ppt---(點擊預覽)

lec 10 norm.ppt---(點擊預覽)

lec 1 overview.ppt---(點擊預覽)

壓縮包目錄

預覽區(qū)

《數(shù)據(jù)庫系統(tǒng)》教學課件
- lec 1 overview.ppt--點擊預覽
- lec 10 norm.ppt--點擊預覽
- lec 11 transaction.ppt--點擊預覽
- lec 12 recovery.ppt--點擊預覽
- lec 14 warehouse.ppt--點擊預覽
- lec 15 final review.ppt--點擊預覽
- lec 2 The Entity-Relationship Model.ppt--點擊預覽
- lec 2 The Relational Model.ppt--點擊預覽
- lec 3 File and Indexing.ppt--點擊預覽
- lec 3 Storing Data.ppt--點擊預覽
- lec 4 algebra.ppt--點擊預覽
- lec 4 calculus.ppt--點擊預覽
- lec 4 SQL.ppt--點擊預覽
- lec 5 TreeInds.ppt--點擊預覽
- lec 6 External Sorting.ppt--點擊預覽
- lec 7 Hash-Based Inds.ppt--點擊預覽
- lec 8 Query Processing.ppt--點擊預覽
- lec 9 Physical Design.ppt--點擊預覽
- lec 9 Query Optimization.ppt--點擊預覽

請點擊導航文件預覽

編號：48634128 類型：共享資源大小：6.17MB 格式：ZIP 上傳時間：2022-01-12

30
積分

舉報

版權申訴 word格式文檔無特別注明外均可編輯修改；預覽文檔經(jīng)過壓縮，下載后原文更清晰！ 立即下載

關鍵詞：: 數(shù)據(jù)庫系統(tǒng) 數(shù)據(jù)庫系統(tǒng) 教學課件

資源描述：: 《數(shù)據(jù)庫系統(tǒng)》教學課件,數(shù)據(jù)庫系統(tǒng),數(shù)據(jù)庫,系統(tǒng),教學,課件

展開閱讀全文

溫馨提示:
1: 本站所有資源如無特殊說明，都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
2: 本站的文檔不包含任何第三方提供的附件圖紙等，如果需要附件，請聯(lián)系上傳者。文件的所有權益歸上傳用戶所有。
3.本站RAR壓縮包中若帶圖紙，網(wǎng)頁內容里面會有圖紙預覽，若沒有圖紙預覽就沒有圖紙。
4. 未經(jīng)權益所有人同意不得將文件中的內容挪作商業(yè)或盈利用途。
5. 裝配圖網(wǎng)僅提供信息存儲空間，僅對用戶上傳內容的表現(xiàn)方式做保護處理，對用戶上傳分享的文檔內容本身不做任何修改或編輯，并不能對任何下載內容負責。
6. 下載文件中如有侵權或不適當內容，請與我們聯(lián)系，我們立即糾正。
7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

裝配圖網(wǎng)所有資源均是用戶自行上傳分享，僅供網(wǎng)友學習交流，未經(jīng)上傳用戶書面授權，請勿作他用。

關于本文

本文標題：《數(shù)據(jù)庫系統(tǒng)》教學課件
鏈接地址：http://m.appdesigncorp.com/article/48634128.html

點擊下載此資源

《數(shù)據(jù)庫系統(tǒng)》教學課件

最新文檔

相關資源

相關搜索