
Granger_Crosslinguistic_research

Comparable and translation corpora in cross-linguistic research
Design, analysis and applications

Sylviane Granger
Centre for English Corpus Linguistics
Université catholique de Louvain

1. Introduction

The history of Contrastive Linguistics has been characterized by a pattern of success-decline-success. Contrastive Linguistics (CL) was originally a purely applied enterprise, aiming to produce more efficient foreign language teaching methods and tools. Based on the general assumption that difference equals difficulty, CL, which in those days was called Contrastive Analysis (CA), consisted in charting areas of similarity and difference between languages and basing the teaching syllabus on the contrastive findings. Advances in the understanding of Second Language Acquisition (SLA) mechanisms led to a questioning of the very basis of CA. Interlingual factors were found to be less prevalent than other factors, including intralingual mechanisms such as the overgeneralization of target rules, external factors such as the influence of teaching methods, and personal factors like motivation. This led to the decline of CA, but not to its death. At first, it gave rise to some drastic pedagogical decisions, which in some cases culminated in a total ban of the mother tongue in FL teaching. But research (see Odlin 1989, Selinker 1992, James 1998) re-established transfer as a major – if not the major – factor in SLA, which in turn led to a progressive – albeit limited – return of contrastive considerations in teaching.

More importantly, the questioning of the contrastive approach to FL teaching did not impede its extension to other fields. The globalisation of society led to an increased awareness of the importance of interlingual and intercultural communication and played a major role in the revival of CL.
Another factor which helped boost contrastive studies was the emergence and rapid development of corpus linguistics and natural language processing, which are increasingly focusing on cross-linguistic issues. Large bilingual corpora gave contrastive linguists and NLP specialists a much more solid empirical basis than had ever been previously available. Previous research had been largely intuition-based. Vinay & Darbelnet (1958/1995) and Malblanc (1968) are well-known exemplars of this type of approach. As the authors had an excellent knowledge of the languages they compared, these books contain a wealth of interesting contrastive statements. However, intuitions can be misleading and a few striking differences can lead to dangerous over-generalisations. For instance, the absence in English of connectors corresponding to the French ‘or’ or ‘en effet’ has led to the general conclusion that French favours explicit linking while English tends to leave links implicit (Vinay & Darbelnet 1958: 222, Newmark 1988: 59, Hervey & Higgins 1992: 49). Like many others, this contrastive claim still awaits empirical investigation. Contrastive linguists now have a way of testing and quantifying intuition-based contrastive statements in a body of empirical data that is vastly superior – both qualitatively and quantitatively – to the type of contrastive data that had hitherto been available to them.

The domain of Translation Studies (TS) underwent a similar corpus-based trend in the early 90s under the impetus of Mona Baker, who laid down the agenda for corpus-based TS (1993 and 1995) and started collecting corpora of translated texts with a view to uncovering the distinctive patterns of translation. Her investigations brought to light a number of potential ‘translation universals’ (Baker 1993) which further corpus studies are helping to confirm or disprove (see Puurtinen 2007).
Researchers in both CL and TS have thus come to rely on corpora to verify, refine or clarify theories that hitherto had had little or no empirical support and to achieve a higher degree of descriptive adequacy. Section 2 gives an overview of the types of corpus used in cross-linguistic studies and suggests a unified terminology. Section 3 presents the different types of corpus-based comparison and section 4 highlights the respective advantages and disadvantages of bilingual comparable vs translation corpora. Section 5 gives a brief overview of some of the applications of corpus-based cross-linguistic research and the last section offers some concluding remarks.

2. Corpora in cross-linguistic research

In the corpus, scholars of contrastive linguistics and translation studies now have a common resource. Unfortunately, as in many new scientific fields, the terminology has not yet been firmly established, leading to a great deal of confusion. Contrastive linguists distinguish between two main types of corpus for use in cross-linguistic research:

- corpora consisting of original texts in one language and their translations into one or more languages – let us call these translation corpora;
- corpora consisting of original texts in two or more languages, matched by criteria such as the time of composition, text category, intended audience, etc. – let us call these comparable corpora.

(Johansson & Hasselgård 1999).

It should be noted, however, that even among contrastive linguists the terminology is not entirely consistent. The term parallel corpus is sometimes used to refer to a comparable corpus (Aijmer et al 1996: 79, Schmied & Schäffler 1996: 41), a translation corpus (Hartmann 1980: 37) or a combined comparable/translation corpus (Johansson et al 1996). TS researchers, on the other hand, use the terms translation corpus, parallel corpus and comparable corpus to cover different types of texts.
The term comparable corpus is used to refer to ‘two separate collections of texts in the same language: one corpus consists of original texts in the language in question and the other consists of translations in that language from a given source language or languages’ (Baker 1995: 234). The term translation (or translational) corpus is used to refer to the corpus of translated texts (see Baker 1999 and Puurtinen 2007). While in standard CL terminology, comparable corpora are usually multilingual (comparable original texts in different languages), in TS terminology they are usually monolingual (original and translated texts in the same language). Within the TS framework the term parallel corpus usually refers to ‘corpora that contain a series of source texts aligned with their corresponding translations’ (Malmkjaer 1998: 539), in other words what contrastive linguists usually refer to as translation corpora.

Over and above the terminological difference, there is a more fundamental discrepancy between the two cross-linguistic approaches. In the TS framework, translated texts are considered as texts in their own right, which are analysed in order to ‘understand what translation is and how it works’ (Baker 1993: 243). In the CL framework they are often presented as unreliable, as the cross-linguistic similarities and differences that they help establish may be ‘distorted’ by the translation process, i.e. may be the result of interference from the source texts.

Faced with the terminological diversity that characterises current cross-linguistic research, I feel that unified terminology is desirable and would like to suggest the general typology illustrated in Figure 1.
Corpora in cross-linguistic research
- Multilingual
  - Translation: unidirectional; bidirectional
  - Comparable: original texts; translated texts
- Monolingual
  - Comparable: original and translated texts; native and learner texts

Figure 1: Corpora in cross-linguistic research

In this typology, a primary distinction is made between multilingual and monolingual corpora. Multilingual corpora involve more than one language. They may be of two main types: (a) translation corpora (which contain source texts and their translations and may be unidirectional – from language X to language Z – or bi/multidirectional) and (b) comparable corpora (which contain non-translated or translated texts of the same genre). The monolingual corpora relevant for cross-linguistic research are all comparable corpora. They may contain (a) original and translated texts in one and the same language or (b) native and learner texts in one and the same language¹. In this typology, the term parallel corpus is not used in view of its ambiguity in the literature, where it has been used to refer to corpora of source texts and their translations, comparable corpora or as a generic term to refer to any type of multilingual corpus (Teubert 1996: 245). This diagram does not include the many extralinguistic features that influence the data and therefore need to be carefully recorded, such as the translator’s status (professional or student) or the direction of the translation process (into the translator’s mother tongue or not).

¹ For a description of this special type of contrastive research called Contrastive Interlanguage Analysis, see Granger 1996 and Gilquin 2000/2001.

3. Types of corpus-based comparison

With these different corpus types, a variety of comparisons can be undertaken. Table 1 presents an overview of the different types of cross-linguistic comparison and the disciplines within which they are undertaken (see also Johansson 2007a).

Type of comparison    Type of corpus                                                    Discipline
1. OLx ⇔ OLy          Multilingual comparable corpus of original texts                  CL
2. SLx ⇔ TLy          Multilingual translation corpus                                   CL & TS
3. SLx ⇔ TLx          Monolingual comparable corpus of original and translated texts    TS & CL
4. TLx ⇔ TLy          Multilingual comparable corpus of translated texts                TS

OL = original language; SL = source language; TL = translated language

Table 1: Types of corpus-based cross-linguistic comparison

The first type of comparison, between corpora of original texts in different languages (x and y), is the CL domain of expertise par excellence. However, there is a growing awareness among TS researchers of the interest of this type of research for translation studies. The second type of comparison is the most obvious meeting point between CL and TS. Researchers in both fields use the same resource but to different ends: uncovering differences and similarities between two (or more) languages for CL and capturing the distinctive features of the translation process and product for TS. The third type of comparison, which contrasts original and translated varieties of one and the same language, is the ideal method for uncovering the distinctive features of translated texts and hence seems at first sight to fall exclusively within TS. However, this type of comparison is increasingly being used by CL researchers who interpret differences between OL and TL as indirect evidence of differences between the languages involved (see Johansson & Hasselgård 1999 and Johansson 2007a). Finally, the comparison of translated varieties in different languages is quite clearly the prerogative of TS. However, it is essential that contrastive linguists pay attention to this type of study. Failing to properly understand the nature of translated texts might lead them to attribute some difference between OL and TL to interference from OL when in fact the phenomenon may simply be a manifestation of a translation universal.

4. Advantages and disadvantages of bilingual comparable and translation corpora

Table 2 summarizes the advantages and disadvantages of the two main types of multilingual corpus: the comparable corpus and the translation corpus. It appears clearly from the table that what constitutes an advantage for one type of corpus constitutes a disadvantage for the other and vice versa.

+ / -   Translation corpora                                              Comparable corpora
+       Text type comparability                                          Wide availability of texts
        L1-L2 equivalence                                                Original language (reliable frequency and use)
-       Limited availability of texts                                    Text type comparability
        Translated language (translationese & translation universals)   L1-L2 equivalence

Table 2: Bilingual translation vs comparable corpora

AVAILABILITY

The most easily accessible corpora for cross-linguistic research are undoubtedly comparable corpora of original languages. English is particularly well equipped with large balanced corpora such as the British National Corpus or the Bank of English. For other languages, there are electronic text collections, notably newspaper archives, that are regularly used for cross-linguistic research, but they tend to be less representative than the English mega corpora. Less widespread languages may not have any corpus resources at all or access to them may be severely limited.

As regards translation corpora, however, electronic resources are scarce. It is not always possible to find translations of all texts, either because of the text type – letters and e-mail messages, for instance, are not usually translated – or because there are more translations in one direction (English to Chinese, for instance) than in another (Chinese to English). Available translation corpora tend to include older, copyright-free texts (cf. Project Gutenberg², which contains c. 30,000 free books) or alternatively, highly specialised texts such as documents from the European Union or the World Health Organization, the disadvantage of which is that it is often impossible to determine the source and target languages, a major variable for both CL and TS studies. While we are witnessing a rapid growth in the number of bilingual (and multilingual) resources, some of which can even be explored online, many high quality resources remain inaccessible to the academic community. This is the case, for instance, of the excellent English-Norwegian and English-Swedish corpora, which are only available to a limited group of researchers because of copyright restrictions.

TEXT TYPE COMPARABILITY

Translation corpora are an ideal resource for establishing equivalence between languages since they convey the same semantic content and are pragmatically and textually comparable (cf. James 1980: 178). In the case of comparable corpora, however, it is much more difficult to ensure text type comparability. Some types of text are culture-specific and simply have no exact equivalent in other languages. For example, when compiling the Lancaster Corpus of Mandarin Chinese (LCMC), McEnery & Xiao (2004) designed the corpus as an exact replica of the FLOB corpus to ensure comparability of the data. However, they encountered some difficulty, notably with the category of ‘western and adventure fiction’ which has no exact correspondent in Chinese and which they decided to replace by a category of ‘martial arts fiction’.

L1-L2 EQUIVALENCE

Cross-linguistic comparison requires a “common platform of comparison” (Connor & Moreno 2005), a “background of sameness” (James 1980: 169) against which differences can be described. This constant, which is usually referred to as the tertium comparationis³ (TC), is relatively easy to establish in the case of translation corpora but constitutes a major stumbling block in the case of comparable corpora.
In translation corpus studies, the TC is the relationship between a unit in the source language and its translation in the target language, viz. translation equivalence. For example, in Aijmer’s (1999) study of epistemic modality in English and Swedish, the TC is the relationship between the English modal verb may and the corpus-attested equivalents in Swedish (modal verbs, modal adverbs or a combination of the two). With comparable corpora, however, there is no readily available tertium comparationis. And yet, researchers need to establish one if they want to make sure that they will compare like with like. As regards grammar, James (1980: 167) reminds us that “the fact that we use the labels ‘tense’ or ‘articles’ to refer to a certain grammatical category in two different languages should not be taken to mean that we are talking about the same thing”. It is therefore necessary to establish a basis for comparison. However, James (ibid: 168) hastens to point out that “comparability does not presuppose absolute identity, but merely a degree of shared similarity”. In the case of articles, the TC could be “a small class of function words that occur in pronominal position and seem to indicate the specificness or genericness of the noun” (ibid: 168). This is a thorny issue whatever the languages involved but the problem is particularly acute in the case of very different language systems, such as English and Chinese (cf. McEnery & Xiao’s 1999 comparison of aspect marking in English and Chinese). It is all the more important to establish a clear TC in areas such as phraseology where units such as idioms or collocations tend to be ill-defined.

² Cf. http://www.gutenberg.org
³ The term tertium comparationis has been used in a wide range of meanings in the contrastive literature. Connor & Moreno (2005), for instance, use the term TC for all levels of research, including the selection of corpora.
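To make the notion of translation equivalence as a tertium comparationis more concrete, the following minimal sketch tallies how a source item such as English may is rendered in aligned target sentences. It is only an illustration: the sentence pairs, the candidate list and the function names are all invented here and do not come from Aijmer's study or from any real corpus, and a real study would use a properly aligned and annotated translation corpus.

```python
from collections import Counter

# Hypothetical, hand-made aligned sentence pairs (English source, Swedish target).
# Invented for illustration only; not taken from any real corpus.
ALIGNED_PAIRS = [
    ("He may be right.", "Han kan ha rätt."),
    ("It may rain tomorrow.", "Det kanske regnar i morgon."),
    ("She may come later.", "Hon kan komma senare."),
    ("They may have left.", "De kan kanske ha gått."),
    ("We left early.", "Vi gick tidigt."),
]

# Hypothetical candidate equivalents: a modal verb and a modal adverb.
CANDIDATES = ["kan", "kanske"]


def tokenize(sentence):
    """Lowercase, split on whitespace and strip simple punctuation."""
    return [tok.strip(".,;:!?").lower() for tok in sentence.split()]


def equivalents(pairs, source_item, candidates):
    """For each source sentence containing source_item, record which
    candidate(s) occur in its translation. A combination of candidates
    is counted as its own category; no candidate counts as 'other'."""
    counts = Counter()
    for source, target in pairs:
        if source_item not in tokenize(source):
            continue
        target_tokens = set(tokenize(target))
        found = tuple(c for c in candidates if c in target_tokens)
        counts[found or ("other",)] += 1
    return counts


if __name__ == "__main__":
    for pattern, n in equivalents(ALIGNED_PAIRS, "may", CANDIDATES).items():
        print(" + ".join(pattern), n)
```

The design choice of keying the counts on a tuple keeps combined renderings (modal verb plus modal adverb) apart from single ones, which mirrors the three-way distinction mentioned in the text. With comparable corpora no such aligned pairs exist, which is exactly why the TC has to be defined by other means.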
RELIABILITY OF LANGUAGE

Comparable corpora have the major advantage of representing original texts in the two (or more) languages under comparison, i.e. language spontaneously produced by native speakers of those languages. They are therefore in principle free from the influence of other languages⁴ and therefore arguably more reliable, especially to assess frequency and patterns of use. Translation corpora, on the other hand, display two main types of features that mark them off from original texts. On the one hand, they often contain features of what is usually referred to as ‘translationese’, i.e. “deviance in translated texts induced by the source language” (Johansson & Hofland 1994: 26).⁵ On the other hand, they also display universal features, i.e. “features which typically occur in translated text rather than original utterances and which are not the result of interference from specific linguistic systems” (Baker 1993: 243).

Gellerstam (1986) gives ample lexical evidence of translationese in translated Swedish. The main characteristics he lists are: a higher proportion of English loanwords, fewer colloquialisms, a higher frequency of standard ‘press-the-button’ translations of English words, and international words such as lokal, massiv, drastic used with new shades of meaning (for further examples of translationese, see Borin & Prütz 2001, Frankenberg-Garcia 2008, Wang & Qin 2008). In an interesting article, Rayson et al (2008) show how translationese can be detected fully automatically by comparing the frequencies of words and phrases in three ICT (Information and Communications Technology) corpora: a corpus of original Chinese texts, a corpus of translations of these texts into English by a proficient Chinese translator and a corpus of edited English, containing the versions of the Chinese translations corrected by a native speaker of English. The authors focus on multiword units and uncover interesting differences,

⁴ This is obviously not entirely true. Newspaper texts, for example, have often been found to contain traces of the (usually) English texts on which the journalists have based their articles.
⁵ The term ‘translationese’ is used in a range of meanings in contrastive and translation studies. It can be used in a neutral sense to refer to any source language-related feature that distinguishes translated language from original language or in a clearly negative sense to refer to f
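The general idea of comparing word frequencies across two corpora, mentioned above in connection with the automatic detection of translationese, can be sketched as follows. This is a minimal illustration that uses a log-likelihood statistic of the kind commonly applied in corpus comparison; it is an assumption made here for the example and is not presented as the procedure actually followed by Rayson et al (2008). The function names and the token lists in the usage note are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) for one item observed freq_a times in a corpus
    of size_a tokens and freq_b times in a corpus of size_b tokens.
    A value of 0 means identical relative frequency in both corpora."""
    total = size_a + size_b
    expected_a = size_a * (freq_a + freq_b) / total
    expected_b = size_b * (freq_a + freq_b) / total
    g2 = 0.0
    # Zero observed frequencies contribute nothing to the sum.
    if freq_a > 0:
        g2 += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        g2 += freq_b * math.log(freq_b / expected_b)
    return 2 * g2


def rank_by_keyness(tokens_a, tokens_b):
    """Rank every item found in either token list by how strongly its
    frequency differs between the two corpora (largest difference first)."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    size_a, size_b = len(tokens_a), len(tokens_b)
    scores = {
        item: log_likelihood(counts_a[item], size_a, counts_b[item], size_b)
        for item in set(counts_a) | set(counts_b)
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

In use, tokens_a could hold the tokens (or multiword units) of a translated corpus and tokens_b those of a comparable corpus of original texts in the same language; items at the top of the ranking are candidates for closer qualitative inspection, not proof of translationese in themselves.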