PARALLELISM AND THE TRANSPUTER IN THE AUTOMATIC TRANSLATION OF TEXT TO SPEECH

K.H. CURTIS, P. RACE AND A. ABDUL AZIZ
Department of Electrical and Electronic Engineering
University of Nottingham, England.

ABSTRACT

The automatic translation of text to speech is most useful when not restricted to a pre-stored vocabulary. The potential quality of speech output has improved dramatically as speech synthesis chips have grown more complex and incorporated a greater number of control parameters. This improvement in quality necessitates greater computing resources for handling such processes as text normalisation and the application of an allophone string generation algorithm. Many of the processes involved are ideally suited to be run concurrently. A multi-transputer system programmed in Occam has been constructed to perform this automatic text to speech conversion.

INTRODUCTION

In the past, the simplest approach to unrestricted text to speech translation used a small set of letter to sound rules [1,2,3], each specifying a pronunciation for one or more letters in a given context. Unless this approach yielded sufficient intelligibility, the routine addition of text to speech translation to computer systems was unlikely, since more elaborate approaches embodying large pronunciation dictionaries or linguistic analysis use too much of the available sequential computing resources [4].
The recent introduction of complex speech synthesis chip sets incorporating such features as stress, speed, pitch, amplitude and prosody information has further compounded the processing problems involved in the translation.

A typical method for performing text to speech conversion requires several computational stages, as shown in Figure 1. The text normalisation stage converts the text into a standard form suitable for further processing. This is achieved by expanding abbreviations, handling punctuation and altering non-alphabetic characters. Another stage is required for the generation of a phoneme string for the pronunciation of words that are exceptions to the pronunciation rules. These words require the use of a special dictionary. For words that have a standard pronunciation, a stage is required for the application of a phoneme string generation algorithm using context sensitive rules. The phoneme string produced may require further computation, using syntactic information and stress patterns, to interpret the correct version of each phoneme for conversion into allophones. The parameter generator then converts the selected sounds into the speech parameters for the synthesiser to create speech.

The standard pronunciation algorithm stage comprises several processes. The recognition algorithm is based around one character within a character string. This character string can vary from one to three characters long; also associated with each string is a suffix and a prefix character. The characters can be letters, other characters, voiced consonants, vowel clusters etc. The process involved in the pronunciation algorithm initially requires the identification of a letter character. Associated with each letter character is a set of rules. Once the correct rule set has been identified, the specific rule from within the rule set must be found.
This is achieved by studying the character string lengths and matching the prefix character, second character, third character and suffix character to the entries within the rule set. Associated with each rule entry is a string of allophones; thus, when the relevant rule has been found within the rule set, the correct allophone string can be passed to the sound section.

Figure 1. Typical text to speech conversion system

PARALLELISM IN THE TRANSLATION

Many of the processes involved in the automatic translation of text to speech, such as text normalisation, phoneme string generation for exceptions to the pronunciation rules, the standard pronunciation algorithm and prosody applications, can be carried out concurrently. Indeed, the application of the processes involved within the pronunciation algorithm on individual elements within the text stream can be carried out concurrently. This concurrent nature of the processing involved in the automatic translation of text to speech makes it ideally suited to be programmed in Occam and run upon a transputer system [5,6,7].

SINGLE TRANSPUTER SYSTEM

It was decided to investigate the concurrent nature of the standard pronunciation algorithm and its mapping onto a transputer system. The algorithm investigated is based on rules developed around the 50,000 word standard corpus of present day edited American English [8]. The initial system developed is shown in Figure 2. The system is based upon a B004 transputer development board situated within an IBM compatible host computer. One of the B004 links is utilised to interface via a C011 to the speech synthesis processor and audio amplifier. The SP0256 allows for the selection of 64 allophones and incorporates a digital filter that models the vocal tract.

Figure 2. Single transputer system

The programming of the system was achieved using Occam 2 (β+). The program is modular and menu driven, for easy expansion, good programming practice, ease of use and ease of testing. A flow diagram of the software is shown in Figure 3.
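The rule lookup described in the Introduction (select the rule set by letter, then match the character string together with its prefix and suffix characters) can be sketched sequentially in Python. This is a minimal illustration under stated assumptions, not the authors' Occam program: the `Rule` layout, the context markers `#` (vowel) and `^` (consonant), the tiny `RULES` table and the allophone names are all hypothetical placeholders.

```python
from typing import List, NamedTuple, Optional, Tuple

VOWELS = set("AEIOU")


class Rule(NamedTuple):
    prefix: str                  # required context before the match ("" = any)
    text: str                    # one to three characters to match
    suffix: str                  # required context after the match ("" = any)
    allophones: Tuple[str, ...]  # placeholder allophone names


def in_class(ch: str, cls: str) -> bool:
    """Return True if character ch satisfies the context marker cls."""
    if cls == "":
        return True                               # no constraint
    if cls == "#":
        return ch in VOWELS                       # any vowel
    if cls == "^":
        return ch.isalpha() and ch not in VOWELS  # any consonant
    return ch == cls                              # literal character or space


# Hypothetical rule sets, keyed by the first letter of the match string.
# Within a set, more specific rules come first.
RULES = {
    "A": [Rule("", "A", "", ("AE",))],
    "C": [
        Rule("", "CH", "", ("CH",)),
        Rule("", "C", "E", ("S",)),
        Rule("", "C", "", ("K",)),
    ],
    "S": [
        Rule("#", "S", "#", ("Z",)),
        Rule("", "S", "", ("S",)),
    ],
    "T": [Rule("", "T", "", ("T",))],
}


def find_rule(text: str, i: int) -> Optional[Rule]:
    """Scan the rule set for text[i] and return the first matching rule."""
    for rule in RULES.get(text[i], []):
        end = i + len(rule.text)
        if (text.startswith(rule.text, i)
                and in_class(text[i - 1], rule.prefix)
                and in_class(text[end], rule.suffix)):
            return rule
    return None


def translate(word: str) -> List[str]:
    """Convert a word to a list of allophone names using RULES."""
    text = " " + word.upper() + " "  # pad so every letter has a prefix and suffix
    out: List[str] = []
    i = 1
    while i < len(text) - 1:
        rule = find_rule(text, i)
        if rule is None:
            i += 1                   # no rule for this character: skip it
            continue
        out.extend(rule.allophones)
        i += len(rule.text)
    return out
```

With this toy table, `translate("cat")` selects the general `C` rule, while `translate("asa")` selects the `S` rule whose prefix and suffix are both vowels. The order of rules inside a set matters because the scan stops at the first match.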
The first module of the program performs the user interface. Another module strips out non-valid characters, and a buffer update module keeps a buffer full for the subsequent recognition modules. The recognition modules are based around an algorithm operating upon a letter within a character string [9]. Each letter character has associated with it a unique set of rules, where each of the rules within the set contains a character string to be matched. These character matches are carried out in parallel by the transputer system, as are the matches on the suffix and prefix characters for vowels, consonants, voiced consonants, sibilants, influencing consonants, front vowels and assorted matches. The concurrently running processes sequentially scan the rule set until the match is found. This is implemented by having a jump vector associated with each rule set pointing to the start location of that rule set. Associated with each rule in the set is its particular allophone string representation. The final module in the program passes the allophone string along a link via a channel to the speech synthesis board.

Figure 3. Standard pronunciation algorithm

A timer channel was introduced to time the execution of the program with and without the actual outputting of the allophone string to the speech synthesis board. These timings were carried out for strings of various lengths and types of characters, so as to obtain a quantitative assessment of the time available between allophone outputs. This time could be used for further processing or for performing communication. On average it was found that the processing took 254.9 timer clock periods per character. The time including allophone output averaged 1767.3 timer clock periods per character.

MULTI-TRANSPUTER SYSTEM

The program developed on the B004 system was then mapped onto a multi-transputer system, as shown in Figure 4. The system comprised a B008 transputer development board having four TRAM modules, and the same speech synthesis board as before.
The TRAMs each have a T414 transputer. Associated with two of the TRAM modules is 1 MByte of RAM, and with the other two 32 KByte of RAM. One of the 1 MByte TRAMs is situated in slot 0 of the B008 and runs the TDS. The mapping of the program used an adaptation of the processor farm model [10,11]. Unlike the processor farm model, which by its nature implicitly assumes that a linear arrangement of workers is the optimal topology for a farm of processors, the model employed allows the passing of data to the two workers in parallel.

The transputer in slot 0 (the monitor) performs the interface between the transputer network and the microcomputer host. It also obtains the character strings to be matched by filtering the input text and updating the buffer for subsequent string recognition. This recognition is carried out concurrently by the transputers situated in slot 1 (worker one) and slot 3 (worker two), which match the input string to the rules within the rule set. For speed of execution both modules have a copy of all the rule sets. Worker one attempts to match on consonant/vowel, voiced consonant and sibilant. Worker two attempts to match on front vowels, assorted oddments, influencing consonants and word match. Worker one and worker two then pass the status of their match, i.e. true or false, to the transputer in slot 2 (the manager). The manager then ascertains whether a match has been detected for the suffix, prefix and word, i.e. match true, and signals back to worker one and worker two. If a true is received by worker two, it outputs the allophone string. If a false is signalled back to worker one and worker two, they both try to match on the next entry in the rule set. The programming of the system maximises multiprocessor performance by prioritising the distributed programs communicating via links and by maximising message lengths [12].

Figure 4. Multi-transputer system

A timer channel was again introduced to obtain a quantitative assessment of the processing time.
On average it was found that the processing took 199.78 timer clock periods.

DISCUSSION AND CONCLUSIONS

From the timings obtained for the single transputer system it can be seen that for 86% of the time the transputer system is idle, waiting for the speech synthesis section to finish outputting allophones. In the case of the multi-transputer system this idle time is further increased, as expected, to 89% by the inclusion of more transputers. It was found that the speech quality was good though rather robotic, due to the fixed accessing time of the speech synthesis chip and its lack of inflection. From these results, it can be concluded that a transputer system is ideally suited to this type of application. Work continues in this area using a more sophisticated speech synthesis chip, a more efficient form of parallelism and morphological rules to facilitate the inclusion of stress and inflection. Work is also underway in the related area of speech generation, using transputers to model the vocal system to eradicate the robotic nature of the sound from current speech synthesisers.

REFERENCES

[1] W.A. Ainsworth, "A system for converting English text into speech", IEEE Trans. Audio Electroacoust., Vol. AU-21, pp. 288-290, June 1974.
[2] M.D. McIlroy, "Synthetic English speech by rule", Bell Tele. Lab. Inc., Murray Hill, NJ, Mar. 1974.
[3] F.F. Lee, "Reading machines: From text to speech", IEEE Trans. Audio Electroacoust., Vol. AU-17, pp. 225-232, Dec. 1969.
[4] J. Allen, "Speech synthesis from unrestricted text", IEEE Conventions Digest, 1971, pp. 108-109.
[5] D. Pountain, "Occam Occult, programming in parallel with Occam", PCW, p. 136, March 1986.
[6] I. Barron, P. Cavill and D. May, "Transputer does 5 or more MIPS even when not used in parallel", Electronics, pp. 109-113, November 17, 1983.
[7] P. Wilson, "Occam architecture eases system design", Computer Design, pp. 107-134, November 1983.
[8] H. Kucera and W.N. Francis, "Computational analysis of present day American English", Providence, RI: Brown Univ. Press, 1967.
[9] H.S. Elovitz et al, "Letter to sound rules for automatic translation of English text to phonetics", IEEE Trans. on Acoust., Speech and Sig. Proc., Vol. 24, No. 6, Dec. 1976.
[10] D. May and R. Shepherd, "Communicating Process Computers", Inmos Technical Note 22, Inmos Ltd., Bristol, 1987.
[11] S.A. Green and D.J. Paddon, "An extension of the Processor Farm using a tree architecture", Proceedings 9th Technical Meeting of the Occam User Group, Bristol, 1988.
[12] P. Atkin, "Performance Maximisation", Inmos Technical Note 17, Inmos Ltd., Bristol, 1987.
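NOTE ON THE TIMING FIGURES

The idle-time percentages quoted in the Discussion can be reproduced from the reported averages. The short Python check below is a sketch resting on two assumptions that the paper does not state explicitly: that the 199.78 figure is also measured per character, and that the total time per character including allophone output (1767.3 timer clock periods) is unchanged on the multi-transputer system because it is set by the speech synthesis chip.

```python
# Reported averages, in timer clock periods per character.
SINGLE_PROCESSING = 254.9    # single transputer, processing only
TOTAL_WITH_OUTPUT = 1767.3   # single transputer, including allophone output
MULTI_PROCESSING = 199.78    # multi-transputer, processing only (assumed per character)


def idle_fraction(processing: float, total: float) -> float:
    """Fraction of each character period not spent on processing."""
    return 1.0 - processing / total


single_idle = idle_fraction(SINGLE_PROCESSING, TOTAL_WITH_OUTPUT)
# Assumption: the total period is fixed by the synthesiser, so it is reused here.
multi_idle = idle_fraction(MULTI_PROCESSING, TOTAL_WITH_OUTPUT)
speedup = SINGLE_PROCESSING / MULTI_PROCESSING

print(f"single transputer idle: {single_idle:.1%}")
print(f"multi-transputer idle:  {multi_idle:.1%}")
print(f"processing speed-up:    {speedup:.2f}x")
```

Under these assumptions the idle fractions round to 86% and 89%, in line with the Discussion, and the processing step itself is about 1.28 times faster on the multi-transputer system.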