Today目标–如虎添翼(虎指suffix下载_Word模板_16

is_995397

暂无简介

Today目标–如虎添翼(虎指suffixToday目标–如虎添翼(虎指suffix Algorithms in Bioinformatics91/10/30筆記 Today目標–如虎添翼(虎指suffix tree) * An fundamental query that significantly strengthens suffix tree --Range Minima Query (RMQ) -->>前翼: RMQ for ?sequences. -->>後翼: RMQ for general sequences. *今天介紹內容的Hierarc...

Today目标–如虎添翼(虎指suffix Algorithms in Bioinformatics91/10/30筆記 Today目標–如虎添翼(虎指suffix tree) * An fundamental query that significantly strengthens suffix tree --Range Minima Query (RMQ) -->>前翼: RMQ for ?sequences. -->>後翼: RMQ for general sequences. *今天介紹內容的Hierarchy --+/-RMQ(最底層) --LCA(第二層) --LCE、RMQ(第三層分兩邊) --Wildcard matching、Fuzzy matching (建構在LCE的第四層) --Document listing (建構在RMQ的第四層) Part 0: 簡介RMQ 1. RMQ: Range Minima Query *S: a sequence of numbers. *小(S, i, j) = k if.(k是位置i到位置j中子字串最小值的位置) --i ? k ? j, and --S[k] = min(S[i], S[i+1], …, S[j]). *例題: 123456789 S = 340141932 --小(S, 2, 6) = 3 --小(S, 4, 9) = 4 (or 6). 2. The RMQ challenge *Input: a sequence S of numbers. *Output: a data structure D for S.(建造一個D,使其可以滿足時間複雜度的要求) *Time complexity(要求要達成的目標) --Constant query time -->>Each query 小(S, i, j) for S can be answered from D and S in O(1) time. --Linear preprocessing time -->>D can be computed in O(|S|) time. 3. Naive approach *Storing the answer of 小(S, i, j) in a table for all index pairs i and j with 1 ? i ? j ? |S|.(直覺想法:將所有的”小(S, i, j)”算出來存著) *Query time = O(1). *Preprocessing time = Ω(|S|?).(不符合要求) 4. Faster Preprocessing(讓Preprocessing time縮短一點) *Assumption (without loss of generality).--|S| = 2^k for some positive integer k.(Sequence的長度必須為2的次方) *Idea: --Precomputing the values of 小(S, i, j) only for those indices i and j with j–i + 1 = 1, 2, 4, 8, …, 2^k = |S|.(不算出全部的答案,只求出2的次方長度的小(S, i, j)來儲存) *Preprocessing time --O(|S| log |S|).(還沒有達到目標,但是比Ω(|S|?)短了一些) 5. 小(S, i, j) still in O(1) time *Let k be the (unique) integer that satisfies 2^k ? j – i + 1 < 2^ (k+1). *Then, 小(S, i, j) is --x = 小(S, i, i + 2^k–1) or --y = 小(S, j–2^k + 1, j).(因為j – i + 1在2^k和2^ (k+1)之間,所以由i開始算到i+2^k-1的”小”和由j往回算2^k-1的”小”都已經在Preprocessing被算出來了,所以只要比較兩者誰小,就是i到j的小(S, i, j)了,且這兩個小所比較的範圍會將整個i到j 中所有的數值全部包括下去) --因為所有2^k的小(S, i, j)已經記住,所以query可以在constant time完成. Part 1: 前翼 *The RMQ Challenge for ?sequeneces(只針對特殊的sequences) 1. ?sequeneces *S is a ?sequence if S[i] – S[i – 1] = ?1 for each index i with 2 ? i ? |S|.(前後數值的關係不是1就是-1) *For example,(允許負數在sequence裡面) --S = 5 6 5 4 3 2 3 2 3 4 5 6 5 6 7 -- + - - - - + - + + + + - + + --S = 3 4 3 2 1 0 -1 -2 -1 0 1 2 1 -- + - - - - - - + + + + - 2. 前翼: The RMQ Challenge for ?sequeneces *Input: a ?sequence S of numbers *Output: a data structure D for S *Time complexity --Constant query time -->>Each query 小(S, i, j) for S can be answered from D and S in O(1) time. --Linear preprocessing time -->>D can be computed in O(|S|) time under the unit-cost RAM model. -->>全部假設在Unit-Cost RAM model下才能達到此time complexity的目標. 3. Unit-Cost RAM model *Operations such as add, minus, comparison on consecutive O(log n) bits can be performed in O(1) time. --在排序演算法中需要O(n*log n)的時間才能排好,但若把bit=log n的時間考慮下去,則需要O(n*(log n)?)的時間,所以一般都是省略bit的時間. 4. Idea: compression *Breaking S into blocks of length L = ? log |S|. (any constant c < 1 is OK,L的長度要為log|S|*(小於1的數值)才可以) --There are B = 2|S|/log |S| blocks.(將S切開成B個blocks,每個blocks長度為? log |S|) *Let 縮[t] be the minimum of the i-th block of S. --縮[t] = min {S[j] | (t–1) L < j ? tL} for t = 1, 2, …, B.(縮[t]記住第t個block最小的數值) --Computable in O(|S|) time. *RMQ on縮: 小(縮, x, y) --O(1) query time. --O(|S|) preprocessing time. (Why?)(所有的縮有(|S|/log |S|)個,比較彼此之間的大小需要花log(|S|/log |S|)的時間,所以處理所花的時間為O{(|S|/log |S|)*log(|S|/log |S|)}會在O(|S|)之中). 5.小(S, i, j) via 小(縮, s, t) *Suppose S[i] is in the α-th block of S. --(α–1) L>If (++error > k) then return “no”;(進入while loop則error要加1,若error大於k則表示P在此沒有fussy occurrences) -->>j += 1 + 延(i + j + 1, j + 2);(error沒有超過k,則繼續比對下去) --return “yes”. --loop只花k次的時間,所以全部只花k*|S|的時間 3. O (k|S|) time *O(|P|+|S|) = O(|S|) time: preprocessing for supporting each 延(i, j) query in O(1) time. *O(|S|) iterations, each takes time O(k). Part 6: 後翼: The RMQ (i.e., 小(S, i, j)) challenge for general sequences *Another application of lowest common ancestor 1. The RMQ challenge *Input: a sequence S of numbers *Output: a data structure D for S *Time complexity --Constant query time -->>Each query 小(S, i, j) for S can be answered from D and S in O(1) time. --Linear preprocessing time -->>D can be computed in O(|S|) time. 2. Idea: Minima Tree *建構Minima Tree 123456789 S=432417363 --先找整個string中最小值的當root,由這個位置分左半邊和右半邊,再分別尋找最小值來當root,依此類推將整個tree建構起來. *小(S,i,j)=祖(i,j). --若i和j為不同邊,不管怎麼挑i和j,則root一定是它們的最小值,若在同一邊,根據每次都挑最小的當root,則一定會找到最小的值. 3. Homework 1 (Optional)(兩週之內交) *Show how to construct a minima tree for any sequence S of numbers in O(|S|) time. *Due in two weeks. --Email your typesetted solutions to TAs, or hand in your hand-written solutions to me before the end of next lecture (in two weeks.) Part 7: Listing source strings that contains a pattern string [Muthukrishnan, SODA’02] *An application of RMQ for general sequences 1. The problem *Input: --Strings S1, S2, …, Sm, which can be preprocessed in linear time. --A string P. *Output: (問的問題可以有許多種,這只是其中的一種) --The index j of each Sj that contains P.(P若出現在Sj中,則將j output出來) 2. Preliminary attempts *Obtaining the suffix tree for S1#S2#…#Sm$. --Find all occurrences of P. -->>i.e., exact string matching for S1#S2#…#Sm$ and P. -->>Time = O(|P| + total number of occurences of P).(加上P在這個tree中出現的次數,但是這樣的時間複雜度不佳,因為若P只出現在S2中,但卻重複出現了100次,則時間的複雜度要為O(|P| + 100),但output卻只有”2”一個index,實在是太浪費時間了) *Obtaining the suffix tree for each Si.--Determining whether P occurs in Si. -->>i.e., substring problem for each pair Si and P. -->>Time = O(|P| * m).(每個S都去建一個tree,到每個tree去詢問P是否在裡面,則需要O(|P| * m)的時間) 3. The challenge *Input: --Strings S1, S2, …, Sm, which can be preprocessed in linear time. --A string P. *Output: --The index j of each Sj that contains P. *Objective --O(|P| + 現(P)) time, where 現(P) is the number of output indices.(P出現在幾個S中,則為現(P)的值) 4. The second attempt *Constructing the suffix tree for S1#S2#…#Sm$. *Keeping the distinct descendant leaf colors for each internal node.(在internal node中記住自這個點以下所包含的葉子顏色) *Query time? 符合我們所要求的目標 *Preprocessing time? 會花太多的時間 *Each query takes O(|P|+現(P)) time. (Why?) --因為P出現在哪裡,代表以下所有包含的葉子顏色都出現了P. *The preprocessing may need Ω(m|S1#S2#…#Sm$|) time. (Why?) --因為internal node要記住的顏色太多. *Q: Any suggestions for resolving this problem? --像之前一樣,將所有葉子的顏色由左至右排列,以方便查詢. 5. Compact Representation *Keeping the list 彩 of leaf colors from left to right.(將全部的葉子由左至右排列並用一個”彩”來記住所有葉子的顏色) *Each internal keeps the indices of leftmost and rightmost descendant leaves.(internal node只要記住葉子最左邊和最右邊的兩個index) *例題中共有8個葉子,所以彩也有8個,但是顏色只有3種,output時間中的現(P)希望和出現的顏色種類成正比. 6. The challenge of listing distinct colors *Input: a sequence 彩 of colors. *Output: a data structure D for 彩 such that --D is computable in O(|彩|) time. --Each 顏(i, j) = {彩(i), …, 彩(j)} query can be answered from D in O(|顏(i, j)|) time.(顏(i, j)是指在彩(i)到彩(j)中出現幾種顏色) 7. An auxiliary index array *Let 前[i] = 0 if 彩[j] ?彩[i] for all j < i. (多一個array來幫忙,若第i個彩在之前的i-1個彩中都沒有出現過,則前[i]的值給0) *Let 前[i] be the largest index j with j < i such that 彩[i] = 彩[j].(若在i之前已經出現過這個顏色,則前[i]為前一個出現這個顏色的位置) 8. An observation *A color c is in 顏(i, j) if and only there is an index k in [i, j] such that 彩[k] = c, and 前[k] < i. --在i到j中,若有一個顏色c在位置k,要output出去,則表示這個彩[k]=c且前[k]會小於i,因為若前[k]大於i則表示在i到j之間已經出現過顏色c,所以不應該再output一次. 9. The algorithm 解(i, j) *Just recursively call 破(i, j, i); *Subroutine 破(p, q, 左界): (左界為i,且自始至終都沒有改變) --If (p > q) then return; (p ? q才能往下一步走) --Let k = 小(前, p, q);(請問p到q之間最小的前在哪個位置) --If (前[k] ? 左界) then return;(若最小的前值已經在左界的右邊,則表示p到q之間已經沒有新的顏色可以output了) --Output 彩[k];(前[k]<左界表示為第一次出現的顏色) --Call 破(p, k–1, 左界);(因k而分兩邊來繼續找,所以左右邊分別呼叫破) --Call 破(k + 1, q, 左界); 10. Tracing 破(4, 8, 4)… *例題: 1 2 3 4 5 6 7 8 彩黃紅綠紅綠黃綠黃前 0 0 0 2 3 1 5 6 --判斷p是否小於等於q,因為4<8,成立往下計算 --小(前,4,8)=6 --判斷前[6]=1<左界=4,繼續往下計算 --輸出彩[6]=黃色 --call破(4,5,4) (4=p ? q=5成立,找小(前,4,5)=4,且前[4]=2<左界=4,輸出彩[4]=紅色,呼叫破(4,3,4)和破(5,5,4);破(4,3,4)因為4=p > q=3所以return,破(5,5,4) 則因5=p ? q=5成立,找小(前,5,5)=5,且前[5]=3<左界=4,輸出彩[5]=綠色,呼叫破(5,4,4)和破(6,5,4);因這兩個5=p > q=4和6=p > q=5,所以皆return) --call破(7,8,4)(p左界=4,所以return不用再找了) 11. Time = O(|顏(i, j)|) *Why?

本文档为【Today目标–如虎添翼(虎指suffix】，请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑，图片更改请在作品中右键图片并更换，文字修改请直接点击文字进行修改，也可以新增和删除文档中的内容。

[版权声明] 本站所有资料为用户分享产生，若发现您的权利被侵害，请联系客服邮件isharekefu@iask.cn，我们尽快处理。本作品所展示的图片、画像、字体、音乐的版权可能需版权方额外授权，请谨慎使用。网站提供的党政主题相关内容(国旗、国徽、党徽..)目的在于配合国家政策宣传，仅限个人学习分享使用，禁止用于任何广告和商用目的。

相关资料

占星书推介（书摘）

成都市质量技术监督局

宫内刨削系统参数

CT报告书写--胃肠道

广州华测职安门诊部有限公司项可行性研究报告道

祁阳县第十八届中学生体育运动会竞赛规程

红外线灯评分标准

港口生产总结

幼儿园大班美术教案《画彩蛋》教学PPT

煤矿三量及可采期计算标准规定

村干部坐班值班工作记录表

电力系统专业名次缩写及中英文翻译

中国科学院大学外语学院博士学位英语样卷

γ—L—谷氨酰—α—萘胺在乙醇（二氧六环）／水混合介质中的电离常数研究

长发---长上影线缩量十字星

烟感、温感探测范围

企业国有产权登记表

造纸专业英语常用缩写

2023年学校后勤工作计划

《突发公共卫生事件与传染病疫情监测信息报告管理办法》练习题

热门搜索

傣族语言--☆精品word☆

电子产品销售实习周记

卿本无罪.doc

三七工艺

2010版GMP无菌及非无菌要求的制剂生产洁净度要求

部门（单位）票据领用情况表

小学体育韵律操教案

念_六年级想象作文300字

竖向风井风管安装措施方案

入河排污口登记汇总表

政治的逻辑

[word格式] 扬州清曲音乐的审美特征及文化内涵

起重吊装应急预案

盛开在碗里的硬币阅读答案

傣族语言--☆精品word☆

电子产品销售实习周记

卿本无罪.doc

三七工艺

2010版GMP无菌及非无菌要求的制剂生产洁净度要求

部门（单位）票据领用情况表

小学体育韵律操教案

念_六年级想象作文300字

竖向风井风管安装措施方案

入河排污口登记汇总表

政治的逻辑

[word格式] 扬州清曲音乐的审美特征及文化内涵

起重吊装应急预案

盛开在碗里的硬币阅读答案

你可能还喜欢

地铁故障应急处理指南试题(司机、调度)DN

《比较积的大小》教学设计

语言活动秋天来了

当代中国经济期末考试试题及答案2015最新

党日活动请假说明 (4页)

以应用为导向的研究生矩阵论公共课程教学改革的探索

电梯安全管理制度】

(完整版)新老系统迁移及整合方案

盘点高铁上所用的材料

钢结构规格表

【幼师PPT培训课件】D大调儿歌即兴伴奏编配

人教版历史九下第12课《亚非拉的奋起》纳米比亚独立的时间和意义

医疗机构手术分级管理办法(卫办医政发〔2012〕94号)

山东省仿古建筑工程计价定额

Alpha谱仪实验

20xx离散数学A卷20xx离散数学A卷(郑州轻工业学院)

服装安全类别执行标准

服装安全类别执行标准

服装安全类别执行标准

服装安全类别执行标准

最新资料资料动态专题动态

免费

已有人下载

立即下载

关于我们侵权处理投诉反馈帮助中心网站地图
京ICP证000007-6 爱问文库-Copyright © 版权所有
浙公网安备 33021202002483

Today目标–如虎添翼&#40;虎指suffix

热门搜索

历史搜索

Today目标–如虎添翼(虎指suffix