Title: Written Language Online
1 Written Language Online
- Matt Garley
- University of Illinois at Urbana-Champaign
- LING 270 Language, Technology, and Society
- 23 April 2008
2 Outline of this talk
- Languages in use on the Internet
- Language change on the Internet
- Crystal's Language and the Internet
- Thinking in terms of genre
3 Languages in use on the Internet
- Obviously, English is not the only language on the Internet
- But English use is disproportionate to the number of native speakers online
- Do people generally use their native language online?
4 The Hardy Boys and the Case of the Missing Language
- http://www.glreach.com/globstats/index.php3
- Example: Hindi is by some counts the 3rd most commonly spoken language in the world, but
- Hindi does not appear anywhere on Global Reach's list of top 24 Internet languages
5 The Hardy Boys and the Case of the Missing Language
- So, how to account for this discrepancy?
- Possibility 1: Relatively few internet connections among speakers of Hindi.
- Well, let's figure this one out with some back-of-the-napkin math.
6 The Hardy Boys and the Case of the Missing Language
- CIA World Factbook reports 60 million internet users in India (not the only place that Hindi is spoken, but we'll have to work with it)
- CIA World Factbook also reports that 30% of India's population speaks Hindi as a 'primary language'
- So, we'll guesstimate about 18 million Hindi-dominant Internet users in India alone.
7 The Hardy Boys and the Case of the Missing Language
- So, Hindi: an estimated 18 million native speakers on the Internet (probably a gross underestimate)
- Listed on Global Reach's top internet languages: Catalan (2.9m online), Czech (4.2m online), Finnish (2.8m online)
- No Hindi!
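The back-of-the-napkin math from the last two slides can be written out as a short sketch. The figures are the ones quoted in the talk; the variable names are made up for illustration, and the calculation assumes (as the talk does) that Hindi speakers are online at the same rate as the rest of India's population.

```python
# Figures quoted in the talk (CIA World Factbook and Global Reach).
INTERNET_USERS_INDIA = 60_000_000
HINDI_PRIMARY_SHARE = 0.30  # share of India's population with Hindi as primary language

# Languages that DO appear on Global Reach's list, with users online.
listed = {"Catalan": 2_900_000, "Czech": 4_200_000, "Finnish": 2_800_000}

# Assumption: internet access is spread evenly across language groups.
hindi_estimate = round(INTERNET_USERS_INDIA * HINDI_PRIMARY_SHARE)
print(f"Estimated Hindi-dominant users online: {hindi_estimate:,}")

# How many times larger is the Hindi estimate than each listed language?
for language, online in listed.items():
    ratio = hindi_estimate / online
    print(f"{language}: Hindi estimate is {ratio:.1f}x larger")
```

Even if the even-access assumption is off by a wide margin, the estimate stays well above the listed languages, which is the point of the comparison.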
8 The Hardy Boys and the Case of the Missing Language
- OK, so maybe Global Reach just overlooked all the people who must be using Hindi online
- Let's try another informal measure of language participation online: How many articles in the Hindi Wikipedia?
- http://meta.wikimedia.org/wiki/List_of_Wikipedias
- There are fewer Wikipedia articles in Hindi than there are in Latin, Breton, and Azeri.
9 The Hardy Boys and the Case of the Missing Language
- Clearly, Hindi is grossly underrepresented online, but why?
- Encoding: Well, maybe in the early days, but with Unicode, this shouldn't be a problem
- Perception of English as modern or desirable to use on the Internet. Now we're on to something!
10 Social factors and language participation online
- India's relatively high rate of literacy in English as well as Hindi may have something to do with the lack of Hindi online
- If the most commonly used language on the Internet is English, and you speak English along with your native language, you might choose to use English online
- This can lead to a self-reinforcing system, a 'snowball effect'.
11 Other factors and language participation online
- So what if, instead of Hindi, you speak a very small minority language, perhaps even one with no written form or a very obscure written form with no encoding?
- You're out of luck as far as using your native language on most of the Internet, but you can still use Skype, YouTube, and other audio-based applications in your native language
12 Language participation online: Summary
- Using Hindi as an example, we looked at reasons why native languages, even very widely spoken languages, might not be used online
- Social factors: preference for English as modern, fashionable, or at least already in use online; this leads to a 'snowball effect'
- 'Structural' factors: no technological interface exists for some native languages.
13 Onward to another aspect
- Another issue involving language and the Internet is language change online: does the Internet change language?
- YOUTUBE BREAK!
- http://www.youtube.com/watch?v=6gmP4nk0EOE
14 Video: The web is us/ing us
- This video was really just about the Web (rather than the multitude of services referred to as the Internet), but it touched on a few issues involving language change
- Are we teaching the Machine, or is it teaching us? All quasi-Orwellian nonsense aside, there remains the following issue:
- Do certain technical restrictions (e.g. buffer size) play a role in the way people use language (Hint: IDK!)?
15 Acronyms
- One of the most commonly commented-on features of 'internet language' is the frequent use of acronyms (IDK, LOL, WTF, IMHO, etc.)
- This is a major indicator of a kind of code which David Crystal, in his book Language and the Internet, calls 'Netspeak'
16 Language and the Internet: Netspeak
- There is a widely held intuition that some sort of Netspeak exists--a type of language displaying features that are unique to the Internet, and encountered in all the above situations, arising out of its character as a medium which is electronic, global, and interactive. (Crystal, 18)
- On the next page, he goes on to claim that this 'Netspeak' is working its way into everyday conversations.
- http://www.youtube.com/watch?v=4nIUcRJX9-o
17 Netspeak in everyday speech?
- Consider the commercial we just saw: it's funny partially because it's absolutely absurd. This isn't the way people talk.
- Now, on to Chapter 2: Here, Crystal offers a very important insight:
- A set of characters on a keyboard determines productive linguistic capacity (the type of information that can be sent) and the size and configuration of the screen determines receptive linguistic capacity (the type of information that can be seen) (24)
18 Limitations of online media
- So we have one factor affecting language: the 'hard' limitations of the technology. A keyboard can only type certain characters (excepting keymapping and so forth) and a screen can only display a certain amount of text at once.
- This will come in handy later, when we talk about facets, but I want to talk more about Crystal's book first.
19 Distinctions between speech and writing
- An important distinction for Crystal is that between traditional speech and writing.
- For our talk about genres and facets later, I want to pull the following concepts out of his discussion: State (Transient/Permanent), Time (Synchronous/Asynchronous), Context (extralinguistic content), Formality, Primary functions, Errors, Unique features of the medium.
20 Netspeak: Speech or Writing?
- So, where does this 'Netspeak' fit in with regard to speech and writing?
- Table 2.3, Language and the Internet (41-42)
- Crystal talks about Netspeak as a 'third medium' or 'novel medium' combining spoken, written, and electronic properties (48)
21 Summary up to this point
- Language on the Internet, or 'Netspeak', is different from other forms of language.
- Language on the Internet can be categorized as a whole
- Netspeak is a sort of hybrid code which has features of both spoken and written language and features from the electronic medium
- This is a new thing ('novel', 48)
22 Well, maybe not...
- The Internet isn't the first communication medium which combines spoken and written properties, as well as factors like limited space.
- What are some others?
- Notes (written between friends in class)
- Telegrams
- License Plates
23 Chapter 3: Finding an Identity
- Talking about prescriptive and descriptive language use norms on the Internet, Crystal finds that style guides disagree! (surprising?)
- And then, Crystal decides that it's time for him to talk about hackers.
- Recap: This guy is going to talk about hackers.
24 Hackers (1995)
25 Anyway...
- Crystal talks about specific language hackers use, like 'suit' to refer to a business executive (specifically, a non-hacker) (p. 69)
- But the Oxford English Dictionary notes usage of this sense of 'suit' from 1979, and mentions it as business lingo, not hacker lingo...
- Well, what about the other features of Netspeak, like all the acronyms?
26 Acronyms
- Table 3.2, p. 85-86. I'm no hacker or anything, but I have been online for about 13 years now. Let's see if there are any I don't recognize in this big long list.
- atw, awhfy, bbfn, bfd, bg, cfc, cfv, cm, cul, dk,
dur?, eod, f?, fotcl, fwiw, fya, g, gal, gdr,
gmta, gsoh, hhok, hth, icwum, imi, imnsho, iow,
jam, j4f, kc, khuf, mtfbwu, na, nc, nwo, obtw,
o4u, pmji, ptmm, rotf, sc, smote, sohf, sol, t,
ta4n, tafn, tia, tot, tttt, t2ul, ttytt, tuvm,
wadr, w4u, wtfigo, wu, wuwh, X!, Y! (I'd be surprised if these have any real currency anywhere), yiu, 2bctnd, 2d4, 2g4u, 2l8, 4e, 4yeo.
- I'm not saying these aren't ever used, but I doubt all of them are as common as Crystal seems to think.
27 Acronyms, cont'd.
- Additionally, there are a few interesting inclusions in his acronym list, so I looked a few of these up in the OED to see when they are first attested.
- wrt: 'With respect/regard to', 1956
- fyi: 'For your information', 1941
- rip: 'Rest in Peace', 1816
- iou: 'I owe you', 1618 (!!)
- Again, I'm no hacker, but I don't think the Internet was around in 1618.
28 Acronyms, cont'd
- Well, so what?
- Basically, the fact that these sorts of acronyms were in use pre-Internet weakens the argument that the Internet has changed language dramatically. However, Crystal's entire argument should not be lost: these features are used more often and more dramatically than elsewhere; they're just not necessarily new.
29 Recap 2, Language and the Internet
- Language on the Internet, or 'Netspeak', is different from other forms of language.
- Language on the Internet can be categorized as a whole
- Netspeak is a sort of hybrid code which has features of both spoken and written language and features from the electronic medium
- This is a new thing ('novel')
- Netspeak is a new variety, and it will change the way we think about language, beginning a linguistic revolution (Ch. 8, p. 238)
30 Other Perspectives
- Christa Dürscheid, a German researcher, refers to this way of thinking about online language as 'Mythos Netzsprache', or the 'Netspeak Myth'. Jannis Androutsopoulos, citing Dürscheid, characterizes this concept of Netspeak as a common way of talking about language online in the 1990s, and considers it a sort of 'first wave' of studies.
- This line of research, which is what Crystal draws upon in his book, primarily relies on the synchronous/asynchronous distinction, and the unique features of online communication: emoticons and acronyms.
31 Toward Genre
- What about the particular restrictions of the Internet medium? Isn't there an argument to be made there with regard to similarities in Internet language?
- It depends on whether we're even talking about one medium. Crystal divides the Internet in four (while acknowledging that other modalities do exist):
- E-mail, chat, virtual worlds, and the Web. What else is there, really?
- Well, one good question is whether we can count the Web as a single category. Try comparing Wikipedia, a political blog, Nike.com, a fansite about Hannah Montana, and a Microsoft help site
32 Problems with broad strokes
- Wikipedia, as a subcategory, is reasonably
consistent, but this is in part because of the
restrictions and guidelines imposed by the
community. If you take blogs as a subgenre of
CMC, you might not find much in common between
the writing of news blogs, photo blogs, and
personal journals.
33 Problems with broad strokes, cont'd.
- This is the problem with the concept of 'Netspeak': The Internet is way too varied a place to characterize as one medium, or even four. Take the distinction between chatrooms and instant messaging: both are (near-) synchronous, and look pretty similar, but IM almost always involves only two users, while chatrooms are multi-party discussions. IM is generally conducted between friends, whereas chatrooms often contain groups of complete strangers. Consider the differences, also, within the category of chatrooms: the discourse in a chatroom for an online course in library science will be quite different from that of a romance-themed chatroom for people in their 50s.
34 Toward Genre
- So, given these problems, how do we study language online in a reasonable way?
- Answer: Susan Herring (at Indiana University) and her system of 'facets'
- To simplify a bit, it's important to talk about the Internet in terms of genres. A genre can be summarized as something like 'personal journal-type blogs' or 'wiki-style pages', but we need a more reliable classification.
35 Genres
- A nice way to think about genres:
- Each genre has certain conventions. You're likely to find a 'signature', like
- Mike Dude, Ph.D.
- Assistant Professor, Department of Linguistics
- University of Illinois at Urbana-Champaign
- ...
- at the end of an e-mail or maybe a forum post, but probably not at the end of an instant message or blog post.
36 Going back to limitations on the medium
- So, although it's not necessarily novel, and certain genres have a lot in common with, say, personal/want ads, license plates, etc., the limitations of the medium still shape language.
- Take the example of text messaging: there's a limited buffer size, so there's pressure on the user to make his/her messages shorter. One way of doing this is through acronyms (IDK, my BFF...)
- Thus, the form of the genre influences language
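To make the buffer-size pressure concrete, here is a rough sketch of how a length limit pushes a message toward acronyms. The 160-character limit is the classic single-SMS size; the phrase table and the function name `shorten` are invented for illustration and are not drawn from Crystal or Herring.

```python
import re

SMS_LIMIT = 160  # classic single-message limit; treat as an assumption for other systems

# Hypothetical phrase-to-acronym table, using acronyms mentioned in this talk.
ABBREVIATIONS = {
    "i don't know": "IDK",
    "in my humble opinion": "IMHO",
    "best friend forever": "BFF",
}

def shorten(message, limit=SMS_LIMIT):
    """Swap phrases for acronyms, one at a time, only while the message is too long."""
    for phrase, acronym in ABBREVIATIONS.items():
        if len(message) <= limit:
            break  # already fits the buffer, no pressure to abbreviate
        message = re.sub(re.escape(phrase), acronym, message, flags=re.IGNORECASE)
    return message

text = "I don't know, ask my best friend forever"
print(shorten(text))            # fits in 160 characters, so nothing changes
print(shorten(text, limit=20))  # a tighter buffer forces the acronyms
```

The same message comes out differently depending only on the limit, which is the sense in which the form of the genre influences language.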
37 Social factors: Paolillo (2001)
- Paolillo (2001) looks at the IRC chat room #india
- Most of the chat was in English, with occasional switches to Hindi or Punjabi. When?
- Greetings
- Jokes
- Situations of 'belonging' or 'cred'
- Also, 'Leetspeak' at one point (now it's mainly used for comedic effect) marked belonging to an 'in-group'.
- The things people want to accomplish with language affect the use of language
38 A faceted scheme for classification
- Take a genre of CMC, and look at two things:
- 1) What influence does the medium have?
- synchronicity
- buffer size
- other limitations due to the system
- 2) What social factors are there?
- norms of communication
- topic of chat
- And so forth: Herring calls these 'medium factors' and 'situation factors'
- http://www.languageatinternet.de/articles/2007/761/index_html/
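One way to picture the faceted scheme is as a small record per genre, split into medium factors and situation factors, so two genres can be compared facet by facet. This is only a sketch: the class, field names, and facet values below are illustrative choices based on the IM versus chatroom comparison earlier in the talk, not Herring's actual facet list.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Genre:
    name: str
    # Medium factors (hypothetical subset)
    synchronous: bool
    limited_buffer: bool
    # Situation factors (hypothetical subset)
    participants: str   # e.g. "one-to-one" or "many-to-many"
    relationship: str   # e.g. "friends" or "strangers"

def differing_facets(a, b):
    """Return the names of the facets on which two genres differ."""
    return [
        f.name
        for f in fields(Genre)
        if f.name != "name" and getattr(a, f.name) != getattr(b, f.name)
    ]

# Illustrative values only, following the IM vs. chatroom contrast in the talk.
im = Genre("instant messaging", synchronous=True, limited_buffer=False,
           participants="one-to-one", relationship="friends")
chat = Genre("chatroom", synchronous=True, limited_buffer=False,
             participants="many-to-many", relationship="strangers")

print(differing_facets(im, chat))
```

Here the two genres match on the medium factors and differ only on the situation factors, which a single 'Netspeak' label would hide.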
39 Benefits of the Faceted Scheme
- Able to compare two genres of language online with greater precision; avoid painting things too broadly
- Gets the researcher thinking about what specific factors might influence various features of language online
40 Questions?