《數(shù)據(jù)庫系統(tǒng)》英文教學課件
《數(shù)據(jù)庫系統(tǒng)》英文教學課件,數(shù)據(jù)庫系統(tǒng),數(shù)據(jù)庫,系統(tǒng),英文,教學,課件
Database SystemsLecture#5Guifeng ZhengSchool of Software,SYSUGuifeng Zheng,DBMS,SS/SYSU2AgendanLast time:FDsnThis time:1.Anomalies2.Normalization:BCNF&3NFnNext time:RA&SQLGuifeng Zheng,DBMS,SS/SYSU3Review:FDsnDefinition:nNotation:nRead as:Ai functionally determines BjIf two tuples agree on the attributes A1,A2,Anthen they must also agree on the attributesB1,B2,BmA1,A2,An B1,B2,BmGuifeng Zheng,DBMS,SS/SYSU4Review:Combining FDsIf some FDs are satisfied,thenothers are satisfied tooIf all these FDs are true:name colorcategory departmentcolor,category priceThen this FD also holds:name,category priceWhy?Guifeng Zheng,DBMS,SS/SYSU5Problem:find all FDsnGiven a relation instance and set of given FDsnFind all FDs satisfied by that instancenUseful if we dont get enough information from our users:need to reverse engineer a data instancenQ:How long does this take?nA:Some time for each subset of attsnQ:How many subsets?qpowersetn exponential time in worst-casenBut can often be smarterGuifeng Zheng,DBMS,SS/SYSU6Closure AlgorithmStart with X=A1,An.Repeat:if B1,Bn C is a FD and B1,Bn are all in X then add C to X.until X didnt changename,category+=name,category,color,department,pricename colorcategory departmentcolor,category priceExample:Guifeng Zheng,DBMS,SS/SYSU7Closure alg e.g.A,B CA,D BB DExample:Compute X+,for every set X(AB is shorthand for A,B):A+=A,B+=BD,C+=C,D+=DAB+=ABCD,AC+=AC,AD+=ABCD,BC+=BC,BD+=BD,CD+=CDABC+=ABD+=ACD+=ABCD(no need to computewhy?)BCD+=BCD,ABCD+=ABCDWhat are the keys?Guifeng Zheng,DBMS,SS/SYSU8Closure alg e.g.Compute A,B+X=A,B,Compute A,F+X=A,F,R(A,B,C,D,E,F)A,B CA,D EB DA,F BIn class:What are the keys?Guifeng Zheng,DBMS,SS/SYSU9Closure alg e.g.nProduct(name,price,category,color)name,category pricecategory colorFDs are:Keys are:name,categorynEnrollment(student,address,course,room,time)student addressroom,time coursestudent,course room,timeFDs are:Keys are:Guifeng Zheng,DBMS,SS/SYSU10Next topic:AnomaliesnIdentify anomalies in existing schematanDecomposition by projectionnBCNFnLossy v.losslessnThird Normal FormGuifeng Zheng,DBMS,SS/SYSU11Types of anomaliesnRedundancyqRepeat info unnecessarily in several tuplesnUpdate anomalies:qChange info in one tuple but not in anothernDeletion anomalies:qDelete some values&lose other values toonInsert anomalies:qInserting row means having to insert other,separate info/null-ing it outGuifeng Zheng,DBMS,SS/SYSU12Examples of anomaliesnRedundancy:name,maddressnUpdate anomaly:Bill movesnDelete anom.:Bill doesnt pay bills,lose phones lose Bill!nInsert anom:cant insert someone without a(non-null)phonenUnderlying cause:SSN-phone is many-manynEffect:partial dependency ssn name,maddress,qWhereas key=ssn,phoneNameSSNMailing-addressPhoneMichael123NY212-111-1111Michael123NY917-111-1111Hilary456DC202-222-2222Hilary456DC914-222-2222Bill789Chappaqua914-222-2222Bill789Chappaqua212-333-3333SSN Name,Mailing-addressSSN PhoneGuifeng Zheng,DBMS,SS/SYSU13Decomposition by projectionnSoln:replace anomalous R with projections of R onto two subsets of attributesnProjection:an operation in Relational AlgebraqCorresponds to SELECT command in SQLnProjecting R onto attributes(A1,An)means removing all other attributesqResult of projection is another relationqYields tuples whose fields are A1,AnqResulting duplicates ignoredGuifeng Zheng,DBMS,SS/SYSU14Projection for decompositionR1=projection of R on A1,.,An,B1,.,Bm R2=projection of R on A1,.,An,C1,.,Cp A1,.,An B1,.,Bm C1,.,Cp=all attributesR1 and R2 may(/not)be reassembled to produce original RR(A1,.,An,B1,.,Bm,C1,.,Cp)R1(A1,.,An,B1,.,Bm)R2(A1,.,An,C1,.,Cp)Guifeng Zheng,DBMS,SS/SYSU15Chappaqua789BillDC456HilaryNY123MichaelMailing-addressSSNNameDecomposition examplenThe anomalies are goneqNo more redundant dataqEasy to for Bill to moveqOkay for Bill to lose all phonesBreak the relation into two:NameSSNMailing-addressPhoneMichael123NY212-111-1111Michael123NY917-111-1111Hilary456DC202-222-2222Hilary456DC914-222-2222Bill789Chappaqua914-222-2222Bill789Chappaqua212-333-3333212-333-3333789914-222-2222789914-222-2222456202-222-2222456917-111-1111123212-111-1111123PhoneSSNGuifeng Zheng,DBMS,SS/SYSU16Thus:high-level strategyPersonbuysProductnamepricenamessnE/R Model:Relational Model:plus FDsNormalization:Eliminates anomaliesGuifeng Zheng,DBMS,SS/SYSU17Using FDs to produce good schemas1.Start with set of relations2.Define FDs(and keys)for them based on real world3.Transform your relations to“normal form”(normalize them)qDo this using“decomposition”nIntuitively,good design meansqNo anomaliesqCan reconstruct all(and only the)original informationGuifeng Zheng,DBMS,SS/SYSU18Decomposition terminologynProjection:eliminating certain atts from relationnDecomposition:separating a relation into two by projectionnJoin:(re)assembling two relationsqWhenever a row from R1 and a row from R2 have the same value for some atts A,join together to form a row of R3nIf exactly the original rows are reproduced by joining the relations,then the decomposition was losslessqWe join on the attributes R1 and R2 have in common(As)nIf it cant,the decomposition was lossyGuifeng Zheng,DBMS,SS/SYSU19 A decomposition is lossless if we can recover:R(A,B,C)R1(A,B)R2(A,C)R(A,B,C)should be the same as R(A,B,C)R is in general larger than R.Must ensure R=RDecomposeRecoverLossless DecompositionsLossless DecompositionsGuifeng Zheng,DBMS,SS/SYSU20Lossless decompositionnSometimes the same set of data is reproduced:n(Word,100)+(Word,WP)(Word,100,WP)n(Oracle,1000)+(Oracle,DB)(Oracle,1000,DB)n(Access,100)+(Access,DB)(Access,100,DB)NamePriceCategoryWord100WPOracle1000DBAccess100DBNamePriceWord100Oracle1000Access100NameCategoryWordWPOracleDBAccessDBGuifeng Zheng,DBMS,SS/SYSU21Lossy decompositionnSometimes its not:n(Word,WP)+(100,WP)(Word,100,WP)n(Oracle,DB)+(1000,DB)(Oracle,1000,DB)n(Oracle,DB)+(100,DB)(Oracle,100,DB)n(Access,DB)+(1000,DB)(Access,1000,DB)n(Access,DB)+(100,DB)(Access,100,DB)NamePriceCategoryWord100WPOracle1000DBAccess100DBCategoryNameWPWordDBOracleDBAccessCategoryPriceWP100DB1000DB100Whatswrong?Guifeng Zheng,DBMS,SS/SYSU22Ensuring lossless decompositionnExamples:nname price,so first decomposition was losslessncategory name and category price,and so second decomposition was lossyR(A1,.,An,B1,.,Bm,C1,.,Cp)If A1,.,An B1,.,Bm or A1,.,An C1,.,Cp Then the decomposition is losslessR1(A1,.,An,B1,.,Bm)R2(A1,.,An,C1,.,Cp)Note:dont need bothGuifeng Zheng,DBMS,SS/SYSU23Quick lossless/lossy examplenAt a glance:can we decompose into R1(Y,X),R2(Y,Z)?nAt a glance:can we decompose into R1(X,Y),R2(X,Z)?XYZ123425Guifeng Zheng,DBMS,SS/SYSU24Next topic:Normal FormsnFirst Normal Form=all attributes are atomicqAs opposed to set-valuedqAssumed all alongnSecond Normal Form(2NF)nThird Normal Form(3NF)nBoyce Codd Normal Form(BCNF)nFourth Normal Form(4NF)nFifth Normal Form(5NF)Guifeng Zheng,DBMS,SS/SYSU25BCNF definitionnA simple condition for removing anomalies from relations:nI.e.:The left side must always contain a keynI.e:If a set of attributes determines other attributes,it must determine all the attributesnSlogan:“In every FD,the left side is a superkey.”A relation R is in BCNF if:If As Bs is a non-trivial dependency in R,then As is a superkey for RGuifeng Zheng,DBMS,SS/SYSU26BCNF decomposition algorithmRepeat choose A1,Am B1,Bn that violates the BNCF condition split R into R1(A1,Am,B1,Bn)and R2(A1,Am,others)continue with both R1 and R2Until no more violationsAsOthersBsR1R2/Heuristic:choose Bs as large as possibleGuifeng Zheng,DBMS,SS/SYSU27Boyce-Codd Normal FormnName/phone example is not BCNF:qssn,phone is key qFD:ssn name,mailing-address holdsnViolates BCNF:ssn is not a superkeynIts decomposition is BCNFqOnly superkeys anything elseNameSSNMailing-addressPhoneMichael123NY212-111-1111Michael123NY917-111-1111NameSSNMailing-addressMichael123NYSSNPhoneNumber123212-111-1111123917-111-1111Guifeng Zheng,DBMS,SS/SYSU28Design/BCNF examplenConsider situation:qEntities:Emp(ssn,name,lot),Dept(id,dname,budg)qRelship:Works(E,D,since)nDraw E/RnNew rule:in each dept,everyone parks in same lotnTranslate to FDnNormalizeGuifeng Zheng,DBMS,SS/SYSU29BCNF DecompositionnLarger example:multiple decompositionsnTitle,Year,Studio,President,Pres-AddressnFDs:qTitle,Year StudioqStudio PresidentqPresident Pres-Addressq=Studio President,Pres-AddressnNo many-many this timenProblem cause:transitive FDs:qTitle,year studio presidentGuifeng Zheng,DBMS,SS/SYSU30BCNF DecompositionnIllegal:As Bs,where As not a superkeynDecompose:Studio President,Pres-AddressqAs=studioqBs=president,pres-addressqCs=title,yearnResult:1.Studios(studio,president,pres-address)2.Movies(studio,title,year)nIs(2)in BCNF?Is(1)in BCNF?qKey:StudioqFD:President Pres-AddressqQ:Does president studio?If so,president is a keyqBut if not,it violates BCNFGuifeng Zheng,DBMS,SS/SYSU31BCNF DecompositionnStudios(studio,president,pres-address)nIllegal:As Bs,where As not a superkeyn Decompose:President Pres-AddressqAs=presidentqBs=pres-addressqCs=studionStudio,President,Pres-Address becomesqPresident,Pres-AddressqStudio,PresidentGuifeng Zheng,DBMS,SS/SYSU32Decomposition algorithm examplenR(N,O,R,P)F=N O,O R,R NnKey:N,PnViolations of BCNF:N O,OR,N ORnPick N OR(on board)nCan we rejoin?(on board)nWhat happens if we pick N O instead?nCan we rejoin?(on board)NameOfficeResidencePhoneGeorgePres.WH202-GeorgePres.WH486-DickVPNO202-DickVPNO307-Guifeng Zheng,DBMS,SS/SYSU33An issue with BCNFnWe could lose FDsnRelation:R(Title,Theater,Neighborhood)nFDs:qTitle,Nhood TheaternAssume a movie shouldnt play twice in same nhoodqTheater NhoodnKeys:qTitle,NhoodqTheater,TitleTitleTheaterNhood阿凡達飛揚天河碟中碟飛揚天河Guifeng Zheng,DBMS,SS/SYSU34Losing FDsnBCNF violation:Theater NhoodnDecompose:qTheater,NHoodqTheater,TitlenResulting relations:天河飛揚NhoodTheaterR1碟中碟飛揚阿凡達飛揚TitleTheaterR2Guifeng Zheng,DBMS,SS/SYSU35Losing FDsnSuppose we add new rows to R1 and R2:nNeither R1 nor R2 enforces FD Title,Nhood Theater碟中碟天河天娛天河天河Nhood阿凡達碟中碟Title飛揚飛揚TheaterRTheaterNhood飛揚天河天娛天河TheaterTitle飛揚碟中碟飛揚阿凡達天娛碟中碟R1R2Guifeng Zheng,DBMS,SS/SYSU36Third normal form:motivationnSometimes qBCNF is not dependency-preserving,andqEfficient checking for FD violation on updates is importantnIn these cases BCNF is too severe a req.q“over-normalization”nSolution:define a weaker normal form,3NFqFDs can be checked on individual relations without performing a join(no inter-relational FDs)qrelations can be converted,preserving both data and FDsGuifeng Zheng,DBMS,SS/SYSU37BCNF lossinessn注意:BCNF decomp.is not data-lossyqResults can be rejoined to obtain the exact originalnBut:it can lose dependenciesqAfter decomp,now legal to add rows whose corresponding rows would be illegal in(rejoined)originalnData-lossy v.FD-lossyGuifeng Zheng,DBMS,SS/SYSU38Third Normal FormnNow define the(weaker)Third Normal FormqTurns out:this example was already in 3NFA relation R is in 3rd normal form if:For every nontrivial dependency A1,A2,.,An Bfor R,A1,A2,.,An is a super-key for R,or B is part of a key,i.e.,B is primeTradeoff:BCNF=no FD anomalies,but may lose some FDs3NF=keeps all FDs,but may have some anomaliesGuifeng Zheng,DBMS,SS/SYSUnConsider a set F of functional dependencies and the functional dependency in F.qAttribute A is extraneous in if A and if A is removed from,the set of functional dependencies implied by F doesnt change.Given AB C and A C then B is extraneous in ABqAttribute A is extraneous in if A and if A is removed from,the set of functional dependencies implied by F doesnt change.Given A BC and A B then B is extraneous in BCnA canonical cover Fc for F is a set of dependencies such that F logically implies all dependencies in Fc and Fc logically implies all dependencies in F,and furtherqNo functional dependency in Fc contains an extraneous attribute.qEach left side of a functional dependency in Fc is unique.n注意:注意:Canonical Cover 與書本的最小函數(shù)依賴集不完全一樣與書本的最小函數(shù)依賴集不完全一樣Canonical Cover From A CI get AB C39Guifeng Zheng,DBMS,SS/SYSUCanonical CovernCompute a canonical over for F:repeat use the union rule to replace any dependencies in F1 1 and 1 2 replaced with 1 12 Find a functional dependency with an extraneous attribute either in or in If an extraneous attribute is found,delete it from until F does not change40Guifeng Zheng,DBMS,SS/SYSUnR=(A,B,C)F=A BC B C A B AB CnCombine A BC and A B into A BCnA is extraneous in AB C because B C logically implies AB C.nC is extraneous in A BC since A BC is logically implied by A B and B C.nThe canonical cover is:A BB CExample of Computing a Canonical CoverA BCB CAB CA BCB CB CA BCB CA BB C41Guifeng Zheng,DBMS,SS/SYSU3NF Decomposition AlgorithmnLet Fc be a canonical cover for F;i:=0;for each functional dependency in Fc do if none of the schemas Rj,1=j=i contains then begini:=i+1;Rj:=;endif none of the schemas Rj,1=j=i contains a candidate key for Rthen begini:=i+1;Ri:=any candidate key for R;endreturn(R1,R2,Ri)42Guifeng Zheng,DBMS,SS/SYSUnRelation schema:Banker-info-schemabranch-name,customer-name,banker-name,office-numbernThe functional dependencies for this relation schema are:banker-name branch-name,office-numbercustomer-name,branch-name banker-namenThe key is:customer-name,branch-nameExample43Guifeng Zheng,DBMS,SS/SYSUApplying 3NF to banker-info-schemanGo through the for loop in the algorithm:banker-name branch-name,office-numberis not in any decomposed relation(no decomposed relation so far)Create a new relation:Banker-office-schema(banker-name,branch-name,office-number)customer-name,branch-name banker-nameis not in any decomposed relation(one decomposed relation so far)Create a new relation:Banker-schema(customer-name,branch-name,banker-name)nSince Banker-schema contains a candidate key for Banker-info-schema,we are done with the decomposition process.44Guifeng Zheng,DBMS,SS/SYSUComparison of BCNF and 3NFnIt is always possible to decompose a relation into relations in 3NF and qthe decomposition is losslessqdependencies are preservednIt is always possible to decompose a relation into relations in BCNF and qthe decomposition is losslessqit may not be possible to preserve dependencies45BOOT vs.PPTGuifeng Zheng,DBMS,SS/SYSU46多對一多對唯一全部參與支撐聯(lián)系、弱聯(lián)系弱實體Guifeng Zheng,DBMS,SS/SYSU47Next weeknRead ch.5.1-2(SQL)
收藏