Database Management Systems
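Unit code: 01
Student No.: 040101086
Classification No.:
Security level: ____
Literature Translation
Overview of a Database Management System
School (department): School of Information Engineering
Major: Computer Science and Technology
Student name:
Supervisor:
April 15, 2008

Translation of the English text

Overview of a Database Management System
Hector Garcia-Molina, Jeff Ullman, Jennifer Widom

1.2 Overview of a Database Management System

Fig. 1.1 gives an outline of a complete DBMS. Single boxes represent system components, while double boxes represent in-memory data structures. Solid lines indicate control and data flow, while dashed lines indicate data flow only. Since the diagram is complicated, we consider the details in several stages. First, at the top, there are two distinct sources of commands to the DBMS:

(1) Conventional users and application programs that ask for data or modify data.
(2) A database administrator: a person or persons responsible for the structure, or schema, of the database.

1.2.1 Data-Definition Language Commands

The second kind of command is the simpler to process, and its path begins at the upper right of Fig. 1.1. For example, the database administrator (DBA) for a university registrar's database might decide that there should be a table, or relation, with columns for a student, a course the student has taken, and the grade the student received in that course. The DBA might also decide that the only allowable grades are A, B, C, D, and F. This structure and constraint information is all part of the schema of the database. Fig. 1.1 shows it as entered by the DBA, who needs special authority to execute schema-altering commands, since these can have profound effects on the database. These schema-altering DDL commands ("DDL" stands for "data-definition language") are parsed by a DDL processor and passed to the execution engine, which then goes through the index/file/record manager to alter the metadata, that is, the schema information for the database.

1.2.2 Overview of Query Processing

The great majority of interactions with the DBMS follow the path on the left side of Fig. 1.1. A user or an application program initiates some action that does not affect the schema of the database, but may affect its content (if the action is a modification command) or will extract data from it (if the action is a query). Recall from Section 1.1 that the language in which these commands are expressed is called a data-manipulation language (DML) or, more colloquially, a query language. Many data-manipulation languages are available, but SQL, which was mentioned in Example 1.1, is by far the most commonly used. DML statements are handled by two separate subsystems, as follows.

Answering the query

The query is parsed and optimized by a query compiler. The resulting query plan, that is, the sequence of actions the DBMS will perform to answer the query, is passed to the execution engine. The execution engine issues a sequence of requests for small pieces of data, typically records or tuples of a relation, to a resource manager that knows about data files (which hold relations), the format and size of the records in those files, and index files, which help find elements of data files quickly.

The requests for data are translated into requests for pages, and these are passed to the buffer manager. We discuss the role of the buffer manager in Section 1.2.3, but briefly, its task is to bring appropriate portions of the data from secondary storage (normally disk), where it is kept permanently, into main-memory buffers. Normally, the page or "disk block" is the unit of transfer between buffers and disk.

The buffer manager communicates with a storage manager to get data from disk. The storage manager might involve operating-system commands, but more typically the DBMS issues commands directly to the disk controller.

Transaction processing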
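Queries and other DML actions are grouped into transactions, which are units that must be executed atomically and in isolation from one another. Often each query or modification action is a transaction by itself. In addition, the execution of transactions must be durable, meaning that the effect of any completed transaction must be preserved even if the system fails right after the transaction completes. We divide the transaction processor into two major parts:

(1) A concurrency-control manager, or scheduler, responsible for assuring atomicity and isolation of transactions.
(2) A logging and recovery manager, responsible for the durability of transactions.

We consider these components further in Section 1.2.4.

1.2.3 Storage and Buffer Management

The data of a database normally resides in secondary storage; in today's computer systems, "secondary storage" generally means magnetic disk. However, to perform any useful operation on data, that data must be in main memory. It is the job of the storage manager to control the placement of data on disk and its movement between disk and main memory.

In a simple database system, the storage manager might be nothing more than the file system of the underlying operating system. For efficiency, however, a DBMS normally controls storage on the disk directly, at least under some circumstances. The storage manager keeps track of the location of files on the disk and, on request from the buffer manager, obtains the block or blocks containing a file. Recall that disks are generally divided into disk blocks, which are regions of contiguous storage containing a large number of bytes, perhaps 2^12 or 2^14 (about 4000 to 16,000 bytes).

The buffer manager is responsible for partitioning the available main memory into buffers, which are page-sized regions into which disk blocks can be transferred. Thus, every DBMS component that needs information from the disk interacts with the buffers and the buffer manager, either directly or through the execution engine. The kinds of information that various components may need include:

(1) Data: the contents of the database itself.
(2) Metadata: the database schema, which describes the structure of, and constraints on, the database.
(3) Statistics: information gathered and stored by the DBMS about data properties such as the sizes of, and values in, various relations or other components of the database.
(4) Indexes: data structures that support efficient access to the data.

A more complete discussion of the buffer manager and its role appears in Section 15.7.

1.2.4 Transaction Processing

It is normal to group one or more database operations into a transaction, which is a unit of work that must be executed atomically and in apparent isolation from other transactions. In addition, a DBMS offers the guarantee of durability: the work of a completed transaction will never be lost. The transaction manager therefore accepts transaction commands from an application, which tell it when transactions begin and end, as well as information about the expectations of the application (some applications may not wish to require atomicity, for example). The transaction processor performs the following tasks:

(1) Logging: To assure durability, every change to the database is logged separately on disk. The log manager follows one of several policies designed to assure that no matter when a system failure, or "crash", occurs, a recovery manager will be able to examine the log of changes and restore the database to some consistent state. The log manager initially writes the log in buffers and negotiates with the buffer manager to make sure that buffers are written to disk (where data can survive a crash) at appropriate times.

(2) Concurrency control: Transactions must appear to execute in isolation, but in most systems there will in truth be many transactions executing at once. Thus, the scheduler (concurrency-control manager) must assure that the individual actions of multiple transactions are executed in such an order that the net effect is the same as if the transactions had executed in their entirety, one at a time. A typical scheduler does its work by maintaining locks on certain pieces of the database. These locks prevent two transactions from accessing the same piece of data in ways that interact badly. Locks are generally stored in a main-memory lock table, as Fig. 1.1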
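suggests. The scheduler affects the execution of queries and other database operations by forbidding the execution engine from accessing locked parts of the database.

(3) Deadlock resolution: As transactions compete for resources through the locks that the scheduler grants, they can get into a situation where none can proceed because each needs something another transaction holds. The transaction manager has the responsibility to intervene and cancel one or more transactions so that the others can proceed.

1.2.5 The Query Processor

The portion of the DBMS that most affects the performance the user sees is the query processor. In Fig. 1.1 the query processor is represented by two components:

1. The query compiler, which translates the query into an internal form called a query plan. The latter is a sequence of operations to be performed on the data. Often the operations in a query plan are implementations of "relational algebra" operations, which are discussed in Section 5.2. The query compiler consists of three major units:

(1) A query parser, which builds a tree structure from the textual form of the query.
(2) A query preprocessor, which performs semantic checks on the query (for example, making sure that all relations mentioned by the query actually exist) and transforms the parse tree into a tree of algebraic operators representing the initial query plan.
(3) A query optimizer, which transforms the initial query plan into the best available sequence of operations on the actual data.

The query compiler uses metadata and statistics about the data to decide which sequence of operations is likely to be the fastest. For example, the existence of an index, which is a specialized data structure that facilitates access to data given values for one or more components of that data, can make one plan much faster than another.

2. The execution engine, which is responsible for executing each of the steps in the chosen query plan. The execution engine interacts with most of the other components of the DBMS, either directly or through the buffers. It must get the data from the database into buffers in order to manipulate that data. It needs to interact with the scheduler to avoid accessing data that is locked, and with the log manager to make sure that all database changes are properly logged.

1.3 Outline of Database-System Studies

Ideas related to database systems can be divided into three broad categories:

(1) Design of databases. How does one develop a useful database? What kinds of information go into the database? How is the information structured? What assumptions are made about the types or values of data items? How do data items connect?
(2) Database programming. How does one express queries and other operations on the database? How does one use other capabilities of a DBMS, such as transactions or constraints, in an application? How is database programming combined with conventional programming?
(3) Database system implementation. How does one build a DBMS, including such matters as query processing, transaction processing, and organizing storage for efficient access?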
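As an illustration of Sections 1.2.1 and 1.2.5, the following is a minimal sketch, not taken from the book, that uses Python's standard `sqlite3` module with an in-memory database. It issues a DDL command for the student/course/grade relation with the A, B, C, D, F constraint, then asks SQLite to describe its query plan before and after an index exists. The table name `takes`, the index name `idx_takes_student`, and the sample rows are assumptions made up for this example, and SQLite stands in for the generic DBMS of Fig. 1.1.

```python
import sqlite3

# In-memory SQLite database; SQLite stands in for the generic DBMS of the text.
conn = sqlite3.connect(":memory:")

# DDL command (Section 1.2.1): a relation with columns for student, course and
# grade, where the schema restricts grades to A, B, C, D and F.
# The names "takes", "student", "course", "grade" are hypothetical.
conn.execute("""
    CREATE TABLE takes (
        student TEXT NOT NULL,
        course  TEXT NOT NULL,
        grade   TEXT CHECK (grade IN ('A', 'B', 'C', 'D', 'F'))
    )
""")

# A DML modification that satisfies the constraint.
conn.execute("INSERT INTO takes VALUES ('alice', 'CS101', 'A')")

# A DML modification that violates the constraint is rejected by the DBMS.
try:
    conn.execute("INSERT INTO takes VALUES ('bob', 'CS101', 'E')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

QUERY = "SELECT course, grade FROM takes WHERE student = 'alice'"


def plan(connection, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


# Without an index the only available plan is a scan of the whole table.
plan_without_index = plan(conn, QUERY)

# Once an index on student exists, the optimizer can choose an index lookup.
conn.execute("CREATE INDEX idx_takes_student ON takes (student)")
plan_with_index = plan(conn, QUERY)

print("invalid grade rejected:", rejected)
print("plan without index:", plan_without_index)
print("plan with index:   ", plan_with_index)
```

The exact wording of the plan descriptions differs between SQLite versions, so the sketch prints them rather than assuming a fixed string. The point is only that the same query receives a different plan once the metadata records an index.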
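1.3.1 Database Design

Chapter 2 begins with a high-level notation for expressing database designs, called the entity-relationship model. Chapter 3 introduces the relational model, which is the model used by the most widely adopted DBMSs and which we touched upon in Section 1.1.2. We show how to translate entity-relationship designs into relational designs, or "relational database schemas". Later, in Section 6.6, we show how to express relational database schemas formally in the data-definition portion of the SQL language.

Chapter 3 also introduces the notion of "dependencies", which are formally stated assumptions about relationships among tuples in a relation. Dependencies allow us to improve relational database designs through a process known as "normalization" of relations.

In Chapter 4 we look at object-oriented approaches to database design. There, we cover the language ODL, which allows one to describe databases in a high-level, object-oriented fashion. We also look at ways in which object-oriented design has been combined with relational modeling to yield the so-called "object-relational" model. Finally, Chapter 4 introduces "semistructured data" as an especially flexible database model, and we see its modern embodiment in the document language XML.

1.3.2 Database Programming

Chapters 5 through 10 cover database programming. Chapter 5 starts with an abstract treatment of queries in the relational model, introducing the family of operators that form "relational algebra".

Chapter 6 introduces the basic ideas regarding queries in SQL and the expression of database schemas in SQL. Chapter 7 covers the aspects of SQL that concern constraints and triggers on the data.

Chapter 8 covers certain advanced aspects of SQL programming. First, while the simplest model of SQL programming is a stand-alone, generic query interface, in practice most SQL programming is embedded in a larger program written in a conventional language, such as C. In Chapter 8 we learn how to connect SQL statements with a surrounding program and how to pass data from the database to the program's variables and vice versa. The chapter also covers how to use SQL features to control transactions, connect clients to servers, and authorize access to the database by users other than its owner.

In Chapter 9 we turn our attention to standards for object-oriented database programming. Here, we consider two directions. First, OQL (Object Query Language) can be seen as an attempt to make C++, or other object-oriented programming languages, compatible with the demands of high-level database programming. Second, the object-oriented features recently adopted in the SQL standard can be viewed as an attempt to make relational databases and SQL compatible with object-oriented programming.

Finally, in Chapter 10, we return to the study of abstract query languages that we began in Chapter 5. Here, we study logic-based languages and see how they have been used to extend the capabilities of modern SQL.

1.3.3 Database System Implementation

The third part of the book concerns how to implement a DBMS. The subject of database system implementation can be divided roughly into three parts:

(1) Storage management: how secondary storage is used effectively to hold data and allow it to be accessed quickly.
(2) Query processing: how queries expressed in a very high-level language such as SQL can be executed efficiently.
(3) Transaction management: how to support transactions with the ACID properties discussed in Section 1.2.4.

Each of these topics is covered by several chapters of the book.

Storage-Management Overview

Chapter 11 introduces the memory hierarchy. However, since secondary storage, especially disk, is central to the way a DBMS manages data, we examine in detail how data is stored and accessed on disk. We then introduce the "block model" for disk-based data, which influences almost everything that is done in a database system.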
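To give a feel for why the block model matters, here is a small worked sketch of the arithmetic, using the block sizes of 2^12 and 2^14 bytes quoted in Section 1.2.3. The relation size (one million records of 100 bytes each) and the helper name `blocks_needed` are hypothetical values chosen for illustration, and the sketch assumes that a record is never split across two blocks.

```python
import math


def blocks_needed(num_records, record_size, block_size):
    """Number of disk blocks needed to hold a relation.

    Assumption (not from the text): records are fixed-size and never span
    two blocks, so leftover space at the end of each block is wasted.
    """
    records_per_block = block_size // record_size
    if records_per_block == 0:
        raise ValueError("a record does not fit in a single block")
    return math.ceil(num_records / records_per_block)


# Hypothetical relation: 1,000,000 records of 100 bytes each.
NUM_RECORDS = 1_000_000
RECORD_SIZE = 100

for exponent in (12, 14):
    block_size = 2 ** exponent
    count = blocks_needed(NUM_RECORDS, RECORD_SIZE, block_size)
    print(f"block size 2^{exponent} = {block_size} bytes -> {count} blocks")
```

Under these assumptions a full scan of the relation costs one transfer per block, which is why algorithms for disk-resident data are usually judged by the number of blocks they read or write rather than by the number of records they touch.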
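Chapter 12 relates the storage of data elements (relations, tuples, attribute values, and their equivalents in other data models) to the requirements of the block model of data. We then look at the important data structures used to construct indexes. An index is a data structure that supports efficient access to data.

Chapter 13 covers the important one-dimensional index structures: indexed-sequential files, B-trees, and hash tables. These indexes are commonly used in a DBMS to support queries in which a value for an attribute is given and the tuples with that value are desired. B-trees are also used for access to a relation sorted by a given attribute.

Chapter 14 discusses multidimensional indexes, which are data structures for specialized applications such as geographic databases, where queries typically ask for the contents of some region. These index structures can also support complex SQL queries that limit the values of two or more attributes, and some of them are beginning to appear in commercial DBMSs.

Query-Processing Overview

Chapter 15 covers the basics of query execution. We learn a number of algorithms for implementing the operations of relational algebra efficiently. These algorithms are designed to be efficient when data is stored on disk, and in some cases they differ considerably from the analogous main-memory algorithms.

In Chapter 16 we consider the architecture of the query compiler and optimizer. We begin with the parsing of queries and their semantic checking. Next, we consider the conversion of queries from SQL to relational algebra and the selection of a logical query plan, that is, an algebraic expression that represents the particular operations to be performed on the data and the necessary constraints on the order of those operations. Finally, we explore the selection of a physical query plan, in which the particular order of operations and the algorithm used to implement each operation are specified.

Transaction-Processing Overview

In Chapter 17 we see how a DBMS supports the durability of transactions. The central idea is to keep a log of all changes to the database. Anything that is in main memory but not on disk can be lost in a crash (for example, if the power supply is interrupted). We must therefore be careful to move data from buffers to disk in the proper order, both the database changes themselves and the log of those changes. Several logging strategies are available, but each limits our freedom of action in some way.

Then, in Chapter 18, we take up concurrency control, which assures atomicity and isolation. We view transactions as sequences of operations that read or write database elements. The major topic of the chapter is how to manage locks on database elements: the different types of locks that may be used, and the ways in which transactions may acquire and release locks. The chapter also studies a number of ways to assure atomicity and isolation without using locks.

Chapter 19 concludes our study of transaction processing. We consider the interaction between the requirements of logging, discussed in Chapter 17, and the requirements of concurrency, discussed in Chapter 18. Handling of deadlocks, another important function of the transaction manager, is covered here as well. The extension of concurrency control to a distributed environment is also considered in Chapter 19.

Finally, we consider the possibility that transactions are "long", taking hours or days rather than milliseconds. A long transaction cannot lock data without causing chaos among other potential users of that data, which forces us to rethink concurrency control for applications that involve long transactions.

1.3.4 Information Integration Overview

Much of the recent evolution of database systems has been toward capabilities that allow different data sources, which may be databases or information resources not managed by a DBMS, to work together in a larger whole. We introduced these issues briefly in Section 1.1.7. We discuss the principal modes of integration, including translated and integrated copies of sources, called "data warehouses", and virtual "views" of a collection of sources, provided through what is called a mediator.

Excerpted from: Hector Garcia-Molina, Jeff Ullman, Jennifer Widom. Database Systems: The Complete Book.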
Appendix: English original

Overview of a Database Management System
Hector Garcia-Molina, Jeff Ullman, Jennifer Widom

1.2 Overview of a Database Management System

In Fig. 1.1 we see an outline of a complete DBMS. Single boxes represent system components, while double boxes represent in-memory data structures. The solid lines indicate control and data flow, while dashed lines indicate data flow only. Since the diagram is complicated, we shall consider the details in several stages. First, at the top, we suggest that there are two distinct sources of commands to the DBMS:

1. Conventional users and application programs that ask for data or modify data.
2. A database administrator: a person or persons responsible for the structure or schema of the database.

1.2.1 Data-Definition Language Commands

The second kind of command is the simpler to process, and we show its trail beginning at the upper right side of Fig. 1.1. For example, the database administrator, or DBA, for a university registrar's database might decide that there should be a table or relation with columns for a student, a course the student has taken, and a grade for that student in that course. The DBA might also decide that the only allowable grades are A, B, C, D, and F. This structure and constraint information is all part of the schema of the database. It is shown in Fig. 1.1 as entered by the DBA, who needs special authority to execute schema-altering commands, since these can have profound effects on the database. These schema-altering DDL commands (“DDL” stands for “data-definition language”) are parsed by a DDL processor and passed to the execution engine, which then goes through the index/file/record manager to alter the metadata, that is, the schema information for the database.

1.2.2 Overview of Query Processing

The great majority of interactions with the DBMS follow the path on the left side of Fig. 1.1.
A user or an application program initiates some action that does not affect the schema of the database, but may affect the content of the database (if the action is a modification command) or will extract data from the database. Remember from Section 1.1 that the language in which these commands are expressed is called a data-manipulation language (DML) or somewhat colloquially a query language. There are many data-manipulation languages available, but SQL, which was mentioned in Example 1.1, is by far the most commonly used. DML statements are handled by two separate subsystems, as follows.

Answering the query

The query is parsed and optimized by a query compiler. The resulting query plan, or sequence of actions the DBMS will perform to answer the query, is passed to the execution engine. The execution engine issues a sequence of requests for small pieces of data, typically records or tuples of a relation, to a resource manager that knows about data files (holding relations), the format and size of records in those files, and index files, which help find elements of data files quickly.

The requests for data are translated into pages and these requests are passed to the buffer manager. We shall discuss the role of the buffer manager in Section 1.2.3, but briefly, its task is to bring appropriate portions of the data from secondary storage (disk, normally) where it is kept permanently, to main memory buffers. Normally, the page or “disk block” is the unit of transfer between buffers and disk.

The buffer manager communicates with a storage manager to get data from disk. The storage manager might involve operating-system commands, but more typically, the DBMS issues commands directly to the disk controller.

Transaction processing

Queries and other DML actions are grouped into transactions, which are units that must be executed atomically and in isolation from one another. Often each query or modification action is a transaction by itself.
In addition, the execution of transactions must be durable, meaning that the effect of any completed transaction must be preserved even if the system fails in some way right after completion of the transaction. We divide the transaction processor into two major parts:

1. A concurrency-control manager, or scheduler, responsible for assuring atomicity and isolation of transactions, and
2. A logging and recovery manager, responsible for the durability of transactions.

We shall consider these components further in Section 1.2.4.

1.2.3 Storage and Buffer Management

The data of a database normally resides in secondary storage; in today's computer systems “secondary storage” generally means magnetic disk. However, to perform any useful operation on data, that data must be in main memory. It is the job of the storage manager to control the placement of data on disk and its movement between disk and main memory.

In a simple database system, the storage manager might be nothing more than the file system of the underlying operating system. However, for efficiency purposes, DBMSs normally control storage on the disk directly, at least under some circumstances. The storage manager keeps track of the location of files on the disk and obtains the block or blocks containing a file on request from the buffer manager. Recall that disks are generally divided into disk blocks, which are regions of contiguous storage containing a large number of bytes, perhaps 2^12 or 2^14 (about 4000 to 16,000 bytes).

The buffer manager is responsible for partitioning the available main memory into buffers, which are page-sized regions into which disk blocks can be transferred. Thus, all DBMS components that need information from the disk will interact with the buffers and the buffer manager, either directly or through the execution engine. The kinds of information that various components may need include:

1. Data: the contents of the database itself.
2.
Metadata: the database schema that describes the structure of the database.
3. Statistics: information gathered and stored by the DBMS about data properties such as the sizes of, and values in, various relations or other components of the database.
4. Indexes: data structures that support efficient access to the data.

A more complete discussion of the buffer manager and its role appears in Section 15.7.

1.2.4 Transaction Processing

It is normal to group one or more database operations into a transaction, which is a unit of work that must be executed atomically and in apparent isolation from other transactions. In addition, a DBMS offers the guarantee of durability: that the work of a completed transaction will never be lost. The transaction manager therefore accepts transaction commands from an application, which tell the transaction manager when transactions begin and end, as well as information about the expectations of the application (some may not wish to require atomicity, for example). The transaction processor performs the following tasks:

1. Logging: In order to assure durability, every change in the database is logged separately on disk. The log manager follows one of several policies designed to assure that no matter when a system failure or “crash” occurs, a recovery manager will be able to examine the log of changes and restore the database to some consistent state. The log manager initially writes the log in buffers and negotiates with the buffer manager to make sure that buffers are written to disk (where data can survive a crash) at appropriate times.

2. Concurrency control: Transactions must appear to execute in isolation. But in most systems, there will in truth be many transactions executing at once.
Thus, the scheduler (concurrency-control manager) must assure that the individual actions of multiple transactions are executed in such an order that the net effect is the same as if the transactions had in fact executed in their entirety, one-at-a-time. A typical scheduler does its work by maintaining locks on certain pieces of the database. These locks prevent two transactions from accessing the same piece of data in ways that interact badly. Locks are generally stored in a main-memory lock table, as suggested by Fig. 1.1. The scheduler affects the execution of queries and other database operations by forbidding the execution engine from accessing locked parts of the database.

3. Deadlock resolution: As transactions compete for resources through the locks that the scheduler grants, they can get into a situation where none can proceed because each needs something another transaction has. The transaction manager has the responsibility to intervene and cancel (“roll-back” or “abort”) one or more transactions to let the others proceed.

1.2.5 The Query Processor

The portion of the DBMS that most affects the performance that the user sees is the query processor. In Fig. 1.1 the query processor is represented by two components:

1. The query compiler, which translates the query into an internal form called a query plan. The latter is a sequence of operations to be performed on the data. Often the operations in a query plan are implementations of “relational algebra” operations, which are discussed in Section 5.2. The query compiler consists of three major units:

(a) A query parser, which builds a tree structure from the textual form of the query.
(b) A query preprocessor, which performs semantic checks on the query (e.g., making sure all relations mentioned by the query actually exist), and performs some tree transformations to turn the parse tree into a tree of algebraic operators representing the initial query plan.
(c) A query optimizer, which transforms the initial query plan into the best available sequence of operations on the actual data.

The query compiler uses metadata and statistics about the data to decide which sequence of operations is likely to be the fastest. For example, the existence of an index, which is a specialized data structure that facilitates access to data, given values for one or more components of that data, can make one plan much faster than another.

2. The execution engine, which has the responsibility for executing each of the steps in the chosen query plan. The execution engine interacts with most of the other components of the DBMS, either directly or through the buffers. It must get the data from the database into buffers in order to manipulate that data. It needs to interact with the scheduler to avoid accessing data that is locked, and with the log manager to make sure that all database changes are properly logged.

1.3 Outline of Database-System Studies

Ideas related to database systems can be divided into three broad categories:

1. Design of databases. How does one develop a useful database? What kinds of information go into the database? How is the information structured? What assumptions are made about types or values of data items? How do data items connect?
2. Database programming. How does one express queries and other operations on the database? How does one use other capabilities of a DBMS, such as transactions or constraints, in an application? How is database programming combined with conventional programming?
3. Database system implementation. How does one build a DBMS, including such matters as query processing, transaction processing and organizing storage for efficient access?

1.3.1 Database Design

Chapter 2 begins with a high-level notation for expressing database designs, called the entity-relationship model.
We introduce in Chapter 3 the relational model, which is the model used by the most widely adopted DBMSs, and which we touched upon briefly in Section 1.1.2. We show how to translate entity-relationship designs into relational designs, or “relational database schemas”. Later, in Section 6.6, we show how to render relational database schemas formally in the data-definition portion of the SQL language.

Chapter 3 also introduces the reader to the notion of “dependencies”, which are formally stated assumptions about relationships among tuples in a relation. Dependencies allow us to improve relational database designs, through a process known as “normalization” of relations.

In Chapter 4 we look at object-oriented approaches to database design. There, we cover the language ODL, which allows one to describe databases in a high-level, object-oriented fashion. We also look at ways in which object-oriented design has been combined with relational modeling, to yield the so-called “object-relational” model. Finally, Chapter 4 also introduces “semistructured data” as an especially flexible database model, and we see its modern embodiment in the document language XML.

1.3.2 Database Programming

Chapters 5 through 10 cover database programming. We start in Chapter 5 with an abstract treatment of queries in the relational model, introducing the family of operators on relations that form “relational algebra”.

Chapters 6 through 8 are devoted to SQL programming. As we mentioned, SQL is the dominant query language of the day. Chapter 6 introduces basic ideas regarding queries in SQL and the expression of database schemas in SQL. Chapter 7 covers aspects of SQL concerning constraints and triggers on the data.

Chapter 8 covers certain advanced aspects of SQL programming. First, while the simplest model of SQL programming is a stand-alone, generic query interface, in practice most SQL programming is embedded in a larger program that is written in a conventional language, such as C.
In Chapter 8 we learn how to connect SQL statements with a surrounding program and to pass data from the database to the program's variables and vice versa. This chapter also covers how one uses SQL features th