
CLEC: Chinese Learner English Corpus

2017-09-26 · 15 pages · doc · 63 KB · 705 views

CLEC collects more than one million words of learner English from five groups of students: secondary school students, College English Band 4 and Band 6 students, and lower-year and upper-year English majors. Errors in the texts are annotated. The aim is to observe the features of each group's English and the errors each group makes, to describe Chinese learner English fairly precisely by quantitative and qualitative methods, and to provide useful feedback for English teaching in China.

Table 1. Distribution of the CLEC data

Type    Tokens
ST2     208088
ST3     209043
ST4     212855
ST5     214510
ST6     226106
Total  1070602

Error annotation

Principles

1. Simple, reasonable, and easy to apply systematically. Many annotators take part, and an overly elaborate scheme would be hard to master. We use a two-level classification. The first level has 11 categories: word form (fm), verb phrase (vp), noun phrase (np), pronoun (pr), adjective phrase (aj), adverb (ad), prepositional phrase (pp), conjunction (cj), vocabulary (wd), collocation (cc), and sentence (sn). Each category is subdivided by number. For example, [cc] marks improper collocation: [cc1] is noun/noun collocation, [cc2] is noun/verb collocation, [cc3] is verb/noun collocation, and so on.

2. A moderate level of granularity. A coarse scheme is easy to apply consistently but carries too little information for analyzing learner errors. A fine scheme is hard to apply consistently, and the same error tends to end up in different classes. Our current approach is to classify common errors finely (vp and np each have 9 subtypes) and rare errors coarsely (cj has only 2 subtypes). The scheme now has 61 error codes, which makes it medium-sized.

3. Sufficient information about each error: the error itself, its type, and its scope. For example, in "In the past, people are [vp6, 4-] kind to each other…", the tag is written in square brackets and placed after the error. [vp6] is the sixth subtype (tense) of vp (verb phrase). 4- is the scope of the error: - stands for the position of the error, and 4 means that four words precede it. Only by taking those four words (In the past, people) into account can one tell that "are" is used wrongly.

4. Openness. Researchers may add error types or subdivide existing ones as needed. For example, [sn8] marks a structurally deficient sentence. A researcher can split this error into finer subtypes by retrieving all sn8 errors and then defining third-level categories such as sn81, sn82, and so on.

5. Register and the source of an error are not annotated for now, because they require considerable subjective judgment from annotators and are even harder to keep consistent.

Error classification table (61 codes in total)

Word form               Verb phrase             Noun phrase             Pronoun
fm1  spelling           vp1  pattern            np1  pattern            pr1  reference
fm2  word building      vp2  set phrase         np2  set phrase         pr2  anticipatory it
fm3  capitalization     vp3  agreement          np3  agreement          pr3  agreement
                        vp4  finite/non-finite  np4  case               pr4  case
                        vp5  non-finite         np5  countability       pr5  wh-
                        vp6  tense              np6  number             pr6  indefinite
                        vp7  voice              np7  article
                        vp8  mood               np8  quantifiers
                        vp9  modal/auxiliary    np9  other determiners

Adjective phrase        Adverb                  Prepositional phrase    Conjunction
aj1  pattern            ad1  order              pp1  pattern            cj1  pattern
aj2  set phrase         ad2  modification       pp2  set phrase         cj2  set phrase
aj3  degree             ad3  degree
aj4  -ed/-ing confusion
aj5  predicative/attributive

Vocabulary              Collocation             Sentence
wd1  order              cc1  noun/noun          sn1  run-on sentence
wd2  part of speech     cc2  noun/verb          sn2  sentence fragment
wd3  substitution       cc3  verb/noun          sn3  dangling modifier
wd4  absence            cc4  adj/noun           sn4  illogical comparison
wd5  redundancy         cc5  verb/adv           sn5  topic prominence
wd6  repetition         cc6  adv/adj            sn6  coordination
wd7  ambiguity                                  sn7  subordination
                                                sn8  structural deficiency
                                                sn9  punctuation

Annotation notes

Code Category  Type: description
fm1  word      spelling: spelling, coinage, abbreviation, apostrophe
fm2  word      word building: derivation, inflection, compounding, plurality (noun), irregularity (verb), 3rd person singular form (verb), syllabification, hyphenation, word division or fusion
fm3  word      capitalization: lower initial letter for upper initial letter or vice versa
vp1  vb phr    pattern: error in transitivity (vi as vt or vice versa), transitive verb pattern/grammatical (cf. Oxford Advanced Learner's Dictionary of Current English, edited by A. S. Hornby)
vp2  vb phr    set phrase: phrasal verb and verbal phrase: error in form or use
vp3  vb phr    agreement: number agreement with its subject (noun or pronoun)
vp4  vb phr    finite/non-finite: finite verb for non-finite verb or vice versa
vp5  vb phr    non-finite: infinitive error: form and use / infinitive for participle or vice versa / -ed participle for -ing participle or vice versa
vp6  vb phr    tense: error in tense use within a sentence / the sequence of tenses between sentences
vp7  vb phr    voice: error in the use of voice: active for passive or vice versa
vp8  vb phr    mood: error in the use of mood: imperative, subjunctive / improper structure of conditional sentences
vp9  vb phr    modal/auxiliary: misuse of modal/auxiliary verbs / wrong form of modal verb (or auxiliary verb) and verb combination (e.g. tense form, voice form, etc.)
np1  nn phr    pattern: error in combination with other words/grammatical
np2  nn phr    set phrase: omission or replacement of a fixed element that goes after a certain noun
np3  nn phr    agreement: number agreement of a noun with its determiner or a word that refers to it
np4  nn phr    case: possessive case error: form or use
np5  nn phr    countability: uncountable noun used as countable noun
np6  nn phr    number: countable noun used with no determiner or -s / a or -s with plural noun
np7  nn phr    article: a/an confusion or definite/indefinite confusion
np8  nn phr    quantifiers: misuse or confusion between many/much, (a) few/(a) little, some/any, etc.
np9  nn phr    other determiners: misuse or confusion of demonstratives, wh- determiners, numerals, etc.
pr1  pron      reference: incorrect/ambiguous pronoun reference/anaphoric
pr2  pron      anticipatory it: improper or wrong use of anticipatory it / it replaced by a demonstrative, etc.
pr3  pron      agreement: number agreement with a noun it refers to
pr4  pron      case: case error of any personal pronoun
pr5  pron      wh-: misuse or confusion of interrogative, relative and conjunctive pronouns
pr6  pron      indefinite: misuse or confusion of indefinite pronouns such as all/both, few/little, some/any, either/neither, etc.
aj1  adj       pattern: error in the combination with other words/grammatical
aj2  adj       set phrase: error in the idiomatic use of an adjectival phrase / omission or replacement of a fixed element that goes after a certain adjective
aj3  adj       degree: adjective degree error: form and use
aj4  adj       -ed/-ing confusion: -ed adjective for -ing adjective or vice versa
aj5  adj       predicative/attributive: predicative adjective used as attributive adjective
ad1  adv       order: improper adverb placement/wrong position
ad2  adv       modification: adjective modifier used as verb modifier / other kinds of confusion
ad3  adv       degree: adverb degree error: form and use
pp1  prep      pattern: unacceptable combination with other words/grammatical
pp2  prep      set phrase: error in the formation or use of an idiomatic prepositional phrase
cj1  conj      pattern: unacceptable combination with other words/grammatical
cj2  conj      set phrase: error in the formation or use of a phrase functioning as a conjunction
wd1  word      order: misplacement of any word other than an adverb
wd2  word      part of speech: error in part of speech: right root but wrong word class
wd3  word      substitution: error in word choice: right word class but wrong selection (any part of speech)
wd4  word      absence: omission of a word (any part of speech)
wd5  word      redundancy: oversuppliance of a word (any part of speech)
wd6  word      repetition: unnecessary repeating of a word
wd7  word      ambiguity: not clear word meaning/semantic
cc1  notional  n/n collocation: improper noun (phrase) and noun (phrase) combination/semantic
cc2  notional  n/v collocation: improper noun (phrase) and verb (phrase) combination/semantic
cc3  notional  v/n collocation: improper verb and noun (phrase) combination/semantic
cc4  notional  a/n collocation: improper adjective and noun (phrase) combination/semantic
cc5  notional  v/ad (or ad/v) collocation: improper verb and adverb combination/semantic
cc6  notional  ad/a collocation: improper adverb and adjective combination/semantic
sn1  sentence  run-on sentence: improper addition of clauses/fused sentence
sn2  sentence  sentence fragment: subordinate clause as a sentence / any phrase as a sentence
sn3  sentence  dangling modifier: illogical adverbial modification of a clause
sn4  sentence  illogical comparison: error in the comparison of words or phrases in a sentence which cannot be compared
sn5  sentence  topic prominence: the co-occurrence of an initial noun phrase and its equivalent (usually a pronoun) in the same sentence
sn6  sentence  coordination: faulty parallelism of clauses (or words/phrases) in a sentence
sn7  sentence  subordination: faulty attachment of a subordinate clause to the main clause
sn8  sentence  structural deficiency: error in the grammatical construction of a sentence: improper splitting, pattern shifting, confusing structure, etc.
sn9  sentence  punctuation: overuse, absence, choice, apostrophe, comma splice, etc.

Normalized error frequencies and their proportions

Type       st2      st3      st4      st5      st6    Total      %
fm1     1928.8   2877.4   2112.6   1826.7   1686.7  10432.2  17.47
fm2      349.3    448.9    438.9    226.9    328.7   1792.7   3.00
fm3     1474.4    731.8    405.8    694.1    174.6   3480.7   5.83
vp1      259.4    325.9    498.4    103.4    200.8   1387.9   2.32
vp2      179.0    139.3     61.2    104.2     22.1    505.8   0.85
vp3      374.0    524.6    785.2    273.1    327.0   2283.9   3.82
vp4      140.8    159.1    110.8     63.9     51.6    526.2   0.88
vp5      140.0    118.7    107.4     89.9     46.7    502.7   0.84
vp6     1165.7    356.0    311.6    379.8    215.6   2428.7   4.07
vp7      172.7    104.1     98.4     63.9     46.7    485.8   0.81
vp8       27.1     16.3      8.3     25.2     11.5     88.4   0.15
vp9      111.4    274.3    278.5     42.9     86.1    793.2   1.33
np1       46.9     33.5     28.9     16.8     10.7    136.8   0.23
np2       24.7     22.4     17.4     19.3      2.5     86.3   0.14
np3      202.1    247.7    249.6    210.9    186.0   1096.3   1.84
np4       66.8     55.9     26.4     22.7     21.3    193.1   0.32
np5       58.9     98.0     71.9     60.5     84.4    373.7   0.63
np6      374.0    654.4    481.0    358.8    354.1   2222.3   3.72
np7      237.9    107.5     89.3    174.8     54.9    664.4   1.11
np8       35.0     65.4     47.9     13.4      7.4    169.1   0.28
np9        6.4     41.3     12.4      7.6      5.7     73.4   0.12
pr1       82.0    236.5    205.0     89.9     18.9    632.3   1.06
pr2       16.7     78.3     23.1      4.2      0.0    122.3   0.20
pr3       52.5     54.2    172.7     28.6     60.6    368.6   0.62
pr4       74.8     37.0     20.7     48.7     10.7    191.9   0.32
pr5       26.3     53.3     14.1      7.6     10.7    112.0   0.19
pr6        9.5      2.6      5.0      3.4      0.0     20.5   0.03
aj1        6.4     18.9     15.7      5.0      9.0     55.0   0.09
aj2        9.5      3.4      9.9      5.9      7.4     36.1   0.06
aj3       38.2     39.6     32.2     43.7     97.5    251.2   0.42
aj4       16.7      2.6     22.3     12.6      5.7     59.9   0.10
aj5        0.8      3.4      7.4      1.7      0.0     13.3   0.02
ad1       35.8     96.3     39.7     27.7     15.6    215.1   0.36
ad2       42.2     37.8     12.4      9.2      4.9    106.5   0.18
ad3        7.2     12.0      9.9      1.7      2.5     33.3   0.06
pp1      136.1     98.0     43.0    169.7     28.7    475.5   0.80
pp2       25.5    262.3    143.8     37.0     27.9    496.5   0.83
cj1       27.8     20.6     18.2     21.8     12.3    100.7   0.17
cj2        4.0      7.7     13.2      5.9      4.9     35.7   0.06
wd1       43.8    151.3    114.1     25.2     37.7    372.1   0.62
wd2      324.6    929.6    772.8    226.9    242.6   2496.5   4.18
wd3     1102.0   1634.7   1815.0    757.1    359.8   5668.6   9.49
wd4      585.6    829.8    443.8    403.3    427.0   2689.5   4.50
wd5      410.6    613.1    518.2    265.5    171.3   1978.7   3.31
wd6       27.1     37.0     22.3     34.5     29.5    150.4   0.25
wd7      261.8    430.8    261.2    228.6    209.8   1392.2   2.33
cc1       72.4     65.4     76.0     23.5     36.1    273.4   0.46
cc2       35.0    177.1     49.6      6.7     21.3    289.7   0.49
cc3      168.7    514.2    417.4     75.6    112.3   1288.2   2.16
cc4       64.5     94.6    134.7     42.0     39.3    375.1   0.63
cc5       23.9     40.4     29.8      5.0      4.1    103.2   0.17
cc6       17.5     12.0      6.6      2.5      1.6     40.2   0.07
sn1      419.3    596.8    576.9    118.5     42.6   1754.1   2.94
sn2      424.9    389.6    303.3    132.8     76.2   1326.8   2.22
sn3       10.3     20.6     17.4      2.5     10.7     61.5   0.10
sn4       17.5     24.9      6.6     20.2      4.9     74.1   0.12
sn5        9.5     14.6     17.4      2.5      4.9     48.9   0.08
sn6       84.3     41.3     39.7     41.2      1.6    208.1   0.35
sn7       49.3     55.9     63.6     23.5      3.3    195.6   0.33
sn8     1103.6    446.3    862.1    493.2    231.9   3137.1   5.25
sn9      861.7    573.6    337.2    649.5    322.9   2744.9   4.60
Total  14105.2  16160.6  13935.9   8883.4   6633.8  59718.9 100.00

Errors ranked by major category

Category               st2      st3      st4      st5      st6    Total       %  Cum. %
Word form (fm)      3752.5   4058.1   2957.3   2747.7   2190.0  15705.6  26.299  26.299
Vocabulary (wd)     2755.5   4626.3   3947.4   1941.1   1477.7  14748.0  24.696  50.995
Syntax (sn)         2980.4   2163.6   2224.2   1483.9    699.0   9551.1  15.993  66.988
Verb (vp)           2570.1   2018.3   2259.8   1146.3   1008.1   9002.6  15.075  82.063
Noun (np)           1052.7   1326.1   1024.8    884.8    727.0   5015.4   8.398  90.461
Collocation (cc)     382.0    903.7    714.1    155.3    214.7   2369.8   3.968  94.429
Pronoun (pr)         261.8    461.9    440.6    182.4    100.9   1447.6   2.424  96.853
Preposition (pp)     161.6    360.3    186.8    206.7     56.6    972.0   1.628  98.481
Adjective (aj)        71.6     67.9     87.5     68.9    119.6    415.5   0.696  99.177
Adverb (ad)           85.2    146.1     62.0     38.6     23.0    354.9   0.594  99.771
Conjunction (cj)      31.8     28.3     31.4     27.7     17.2    136.4   0.228  99.999
Total              14105.2  16160.6  13935.9   8883.4   6633.8  59718.9  99.999
Share of total        0.24     0.27     0.23     0.15     0.11

Most common errors of Chinese learners

Type       st2      st3      st4      st5      st6    Total      %
fm1     1928.8   2877.4   2112.6   1826.7   1686.7  10432.2  17.47
wd3     1102.0   1634.7   1815.0    757.1    359.8   5668.6   9.49
fm3     1474.4    731.8    405.8    694.1    174.6   3480.7   5.83
sn8     1103.6    446.3    862.1    493.2    231.9   3137.1   5.25
sn9      861.7    573.6    337.2    649.5    322.9   2744.9   4.60
wd4      585.6    829.8    443.8    403.3    427.0   2689.5   4.50
wd2      324.6    929.6    772.8    226.9    242.6   2496.5   4.18
vp6     1165.7    356.0    311.6    379.8    215.6   2428.7   4.07
vp3      374.0    524.6    785.2    273.1    327.0   2283.9   3.82
np6      374.0    654.4    481.0    358.8    354.1   2222.3   3.72
wd5      410.6    613.1    518.2    265.5    171.3   1978.7   3.31
fm2      349.3    448.9    438.9    226.9    328.7   1792.7   3.00
sn1      419.3    596.8    576.9    118.5     42.6   1754.1   2.94
wd7      261.8    430.8    261.2    228.6    209.8   1392.2   2.33
vp1      259.4    325.9    498.4    103.4    200.8   1387.9   2.32
sn2      424.9    389.6    303.3    132.8     76.2   1326.8   2.22
cc3      168.7    514.2    417.4     75.6    112.3   1288.2   2.16
np3      202.1    247.7    249.6    210.9    186.0   1096.3   1.84
vp9      111.4    274.3    278.5     42.9     86.1    793.2   1.33
np7      237.9    107.5     89.3    174.8     54.9    664.4   1.11
pr1       82.0    236.5    205.0     89.9     18.9    632.3   1.06

The table above shows the following:

1. All three word-form errors (spelling, word building, capitalization) are on the list. Spelling ranks first, accounting for 17.47% of all errors. The three together account for 26.30%.
2. Five of the seven vocabulary errors are on the list (substitution, absence, part of speech, redundancy, ambiguity), accounting for 23.81% of all errors.
3. Four of the nine sentence errors are on the list (structural deficiency, punctuation, run-on sentence, sentence fragment), accounting for 15.01% of all errors.
4. Four of the nine verb phrase errors are on the list (tense, subject-verb agreement, transitivity pattern, modal/auxiliary), accounting for 11.54% of all errors.
5. Three of the nine noun phrase errors are on the list (number, agreement, article), accounting for 6.67%.
6. The other errors on the list (verb/noun collocation, pronoun reference) account for 3.22%.

Words most frequently misspelled by Chinese learners

Freq Word           Freq Word           Freq Word           Freq Word
 379 MORTALITY        23 THEMSELVES       15 LIMITED          12 WRITING
 113 KNOWLEDGE        21 FESTIVAL         15 NOTICE           11 ARTICLE
  78 POLLUTION        20 BELIEVE          15 OURSELVES        11 CONTRARY
  76 ENVIRONMENT      20 COUNTRY          15 PERSONNEL        11 EXERCISE
  69 NOWADAYS         19 ESPECIALLY       15 STUDENTS         11 FAVORITE
  68 GOVERNMENT       19 FAMILIAR         14 CALENDAR         11 INSTEAD
  56 MODERN           19 REMEMBER         14 CAUGHT           11 MASTER
  44 PRACTICE         18 COURSE           14 CENTURY          11 PARENT
  44 SOMETHING        18 EXERCISES        14 COMPETITION      11 PRACTISE
  41 POLLUTED         18 HASTILY          14 FIRST            11 RESOURCE
  37 BEAUTIFUL        18 INDUSTRY         14 FURTHERMORE      11 TRAVEL
  36 COUNTRIES        18 OFTEN            14 MAGAZINES        10 CONDITION
  36 STUDYING         18 SEVERAL          14 MEDICINE         10 DECREASED
  35 CHALLENGE        18 TRADITIONAL      14 UNIVERSITY       10 ENERGY
  34 TECHNOLOGY       17 CREATE           13 FINANCIAL        10 HAPPINESS
  32 BENEFIT          17 GRAMMAR          13 GREAT            10 INDIVIDUALS
  32 EUTHANASIA       17 NECESSARY        13 MOREOVER         10 PURSUE
  30 BECAUSE          17 PEOPLE           13 OPPORTUNITY      10 RAISE
  28 LANTERNS         17 SATURDAY         13 PRACTICAL        10 SHOULD
  28 REALIZE          17 THEORETICAL      13 RECEIVED         10 SUCCESS
  27 COLLEGE          17 THOUGHT          13 YOURSELF         10 THEREFORE
  26 INTERESTING      16 CONTROL          12 EXPECTANCY       10 TRAVELING
  25 COMMODITIES      16 CONVENIENT       12 FACTORIES        10 WASTE
  25 LANTERN          16 POPULATION       12 OPPORTUNITIES    10 WHETHER
  25 SUDDENLY         16 WILLIAM          12 PRACTICES
  24 IMPORTANT        15 BEGINNING        12 TRANSPORTATION

Vocabulary errors of Chinese learners

Type       st2      st3      st4      st5      st6    Total      %
wd1       43.8    151.3    114.1     25.2     37.7    372.1   0.62
wd2      324.6    929.6    772.8    226.9    242.6   2496.5   4.18
wd3     1102.0   1634.7   1815.0    757.1    359.8   5668.6   9.49
wd4      585.6    829.8    443.8    403.3    427.0   2689.5   4.50
wd5      410.6    613.1    518.2    265.5    171.3   1978.7   3.31
wd6       27.1     37.0     22.3     34.5     29.5    150.4   0.25
wd7      261.8    430.8    261.2    228.6    209.8   1392.2   2.33
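
The tag format described in the annotation principles (an error code plus a scope, as in [vp6, 4-]) can be read mechanically. The Python sketch below is illustrative only. The regular expression is inferred from that single example, and the names ErrorTag, find_tags, and strip_tags are hypothetical, not part of any CLEC tooling. It assumes the scope, when present, always has the form "N-" (N words before the error); any other scope form would need a different pattern.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed tag shape, inferred from the example "[vp6, 4-]":
# a two-letter category, one or more digits, and an optional "N-" scope.
TAG_PATTERN = r"\[([a-z]{2}\d+)\s*(?:,\s*(\d+)-)?\s*\]"
TAG_RE = re.compile(TAG_PATTERN)


class ErrorTag(NamedTuple):
    code: str                    # full code, e.g. "vp6"
    category: str                # first-level category, e.g. "vp"
    subtype: int                 # second-level subtype, e.g. 6
    words_before: Optional[int]  # scope: words preceding the error, if given
    position: int                # character offset of the tag in the text


def find_tags(text: str) -> List[ErrorTag]:
    """Return every error tag found in an annotated sentence."""
    tags = []
    for match in TAG_RE.finditer(text):
        code, before = match.groups()
        tags.append(
            ErrorTag(
                code=code,
                category=code[:2],
                subtype=int(code[2]),
                words_before=int(before) if before else None,
                position=match.start(),
            )
        )
    return tags


def strip_tags(text: str) -> str:
    """Remove the tags (and the space before each) to recover the learner text."""
    return re.sub(r"\s*" + TAG_PATTERN, "", text)
```

Calling find_tags on the example sentence from principle 3 yields one tag with code vp6 and a scope of four words, and strip_tags returns the sentence as the learner wrote it. Third-level codes such as sn81 still match the pattern, but only their first two levels are split out.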
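
The major-category totals can be recomputed from the subtype rows by grouping error codes on their two-letter prefix. The Python sketch below is a minimal illustration that uses only the fm and wd totals and the grand total quoted in the frequency table. The function names are hypothetical, and because the source does not say how the frequencies were normalized, the sketch takes the published figures as given.

```python
from collections import defaultdict
from typing import Dict

# "Total" column of the frequency table, fm and wd rows only,
# to keep the sketch short; the other rows work the same way.
SUBTYPE_TOTALS = {
    "fm1": 10432.2, "fm2": 1792.7, "fm3": 3480.7,
    "wd1": 372.1, "wd2": 2496.5, "wd3": 5668.6, "wd4": 2689.5,
    "wd5": 1978.7, "wd6": 150.4, "wd7": 1392.2,
}

# "Total" row of the same table (all 61 codes).
GRAND_TOTAL = 59718.9


def major_category_totals(subtype_totals: Dict[str, float]) -> Dict[str, float]:
    """Sum subtype frequencies by their two-letter category prefix."""
    totals = defaultdict(float)
    for code, freq in subtype_totals.items():
        totals[code[:2].lower()] += freq
    return dict(totals)


def share_percent(freq: float, grand_total: float) -> float:
    """Percentage share of a frequency, rounded to three decimals."""
    return round(100.0 * freq / grand_total, 3)


if __name__ == "__main__":
    for category, total in major_category_totals(SUBTYPE_TOTALS).items():
        print(category, round(total, 1), share_percent(total, GRAND_TOTAL))
```

Grouping on the prefix keeps the sketch independent of how many subtypes a category has, which fits the open design of the scheme: a researcher-defined code such as sn81 would still fall under sn.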