Symbols, sounds, and the languages singers sing — a free interactive reference for singers, voice teachers, and choral directors.
The International Phonetic Alphabet is a system in which one symbol represents one sound. It was developed by linguists in the late nineteenth century to give scholars a way to discuss the sounds of any human language without relying on the inconsistent spelling conventions of individual languages.
For singers, the IPA does something specific and practical: it names the exact vowel or consonant the singer is aiming at, independent of how the word is spelled. English spelling is famously unreliable — though, through, cough, and bough each end with a different vowel or consonant sound despite looking similar on the page. IPA removes the ambiguity.
Italian, German, French, Spanish, Latin, Russian, Czech, and Hebrew each have their own spelling-to-sound patterns, and learning every one of them from scratch would be overwhelming. IPA gives the singer one system that works across all of them.
IPA does not replace the ear — it sharpens what the ear is already doing. It does not prescribe how a sound should be colored, shaped, or styled. That remains the singer's choice, the composer's intent, and the tradition's convention. What IPA does is name the target.
IPA symbols appear in square brackets […] or between slashes /…/. Slashes indicate a phonemic transcription; brackets a phonetic one. For most singing purposes the two are used interchangeably.
A primary stress mark ˈ appears before the stressed syllable, as in hello = [hɛˈloʊ]. A secondary stress mark ˌ indicates a lighter stress in longer words.
A period marks a syllable boundary: [ˈka.ro]. The length mark ː means the vowel is held longer — important in German, where [iː] and [ɪ] are distinct phonemes.
A tilde above a vowel ([ã]) means the vowel is nasalized — central to French. Small marks above or below modify a symbol in specific ways. A full list lives in the Reference section.
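The marks described above can be read mechanically. The following is a minimal sketch, not part of the guide's own tooling: the function name `describe` and the dictionary keys are hypothetical, and it assumes that a stress mark always opens a new syllable even when no period is written (as in [hɛˈloʊ]).

```python
import re
import unicodedata
from typing import Dict, List

PRIMARY, SECONDARY = "\u02c8", "\u02cc"   # stress marks ˈ and ˌ
LENGTH, NASAL_TILDE = "\u02d0", "\u0303"  # length mark ː and the combining tilde


def describe(ipa: str) -> List[Dict]:
    """Split a transcription into syllables and flag stress, length, nasalization."""
    # Drop the enclosing [ ] or / / and decompose, so a tilde is its own code point.
    body = unicodedata.normalize("NFD", ipa.strip().strip("[]/"))
    # Split on periods, and also just before a stress mark (assumption, see lead-in).
    chunks = re.split(rf"\.|(?=[{PRIMARY}{SECONDARY}])", body)
    result = []
    for chunk in chunks:
        if not chunk:
            continue
        stress = {PRIMARY: "primary", SECONDARY: "secondary"}.get(chunk[0])
        result.append({
            "syllable": chunk.lstrip(PRIMARY + SECONDARY),
            "stress": stress,
            "long": LENGTH in chunk,
            "nasal": NASAL_TILDE in chunk,
        })
    return result


for syllable in describe("[ˈka.ro]"):
    print(syllable)
```

The sketch only reports what the transcription marks. It makes no claim about how the vowel should be colored, which stays with the singer.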
IPA names sounds. How those sounds are colored — how bright or dark, how forward or back, how open or closed — is a matter of style, genre, language tradition, and artistic intent.
Several situations cause honest disagreement among singers and coaches. None has a single right answer; all benefit from awareness.
Italian has four vowels where English-speakers often hear two: closed [e] and open [ɛ]; closed [o] and open [ɔ]. The score does not usually mark which is which. Published transcriptions are the most reliable source. When working without one, a good Italian dictionary (or Castel's transcriptions) will show the distinction. Regional Italian accents vary; the standard operatic convention tends toward Tuscan pronunciation.
Singers often encounter three different ah vowels across languages: [a] (bright, front), [ɑ] (dark, back), and [ä] (central, between the two). Italian and Spanish casa uses [a]. English father uses [ɑ]. German Vater is often closer to [ä]. Classical choral tradition often defaults to [a] as a shared ensemble target regardless of language, but this is a convention, not a universal rule.
English can use any of four r-sounds depending on genre, tradition, and period. German can use any of three. French has two that often coexist in the same piece of repertoire. The choice is determined by style and tradition, not by the letter on the page.
Spanish differs substantially between Castilian and Latin American varieties. German pronunciation in Bach-era repertoire differs from modern German. French sung diction in classical repertoire retains features modern spoken French has abandoned. When a score or program identifies a specific tradition, follow it; when it does not, follow the mainstream convention for the genre.
Two charts, every vowel and consonant covered in this guide.
A map of tongue position. Horizontal axis shows front-to-back tongue placement; vertical axis shows how open the mouth and jaw are. Gold symbols are rounded (including the "mixed vowels" of French and German).
The single English word "ah" can refer to three distinct IPA targets. This is one of the most common sources of confusion in sung diction.
Some languages, most notably French and German, produce front rounded vowels, which combine a front tongue position with rounded lips. They are often called mixed vowels because they mix the tongue position of a front vowel with the lip rounding of a back vowel. Singers approaching them typically start by producing the vowel's tongue position first, then gradually rounding the lips without letting the tongue move backward.
A diphthong is two vowel qualities occurring within a single syllable, connected by a glide. One vowel carries the sustained pitch; the other appears as a quick release.
[ɑɪ]: sustain [ɑ], release to [ɪ].
[eɪ]: sustain [e], release to [ɪ].
[oʊ]: sustain [o], release to [ʊ].
[ɑʊ]: sustain [ɑ], release to [ʊ].
[ɔɪ]: sustain [ɔ], release to [ɪ].
Three further diphthongs occur in non-rhotic English.
Timing is a stylistic choice. Classical and choral traditions typically delay the second vowel until the very end of the note. Musical theater and pop often release earlier or blend the two vowels more smoothly. Barbershop and close-harmony ensembles unify diphthong timing precisely, often with a specific count or gesture.
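The sustain-and-release idea can be expressed as simple arithmetic on a note's length. This is a rough sketch under stated assumptions: `plan_diphthong`, `DiphthongPlan`, and the `release_fraction` parameter are hypothetical names, and the default fraction of 0.125 is purely illustrative, not a value any tradition prescribes. A smaller fraction models a late classical release; a larger one models an earlier release.

```python
from dataclasses import dataclass

# Sustain/release vowel pairs as listed in the guide's diphthong entries.
PAIRS = [("ɑ", "ɪ"), ("e", "ɪ"), ("o", "ʊ"), ("ɑ", "ʊ"), ("ɔ", "ɪ")]


@dataclass
class DiphthongPlan:
    sustain: str
    release: str
    sustain_beats: float
    release_beats: float


def plan_diphthong(sustain: str, release: str, note_beats: float,
                   release_fraction: float = 0.125) -> DiphthongPlan:
    """Divide a note between the sustained vowel and the quick release vowel."""
    if (sustain, release) not in PAIRS:
        raise ValueError(f"unknown diphthong pair: {sustain}{release}")
    if not 0 < release_fraction < 1:
        raise ValueError("release_fraction must be between 0 and 1")
    # The release vowel takes the final share of the note (illustrative value).
    release_beats = note_beats * release_fraction
    return DiphthongPlan(sustain, release, note_beats - release_beats, release_beats)


plan = plan_diphthong("ɑ", "ɪ", note_beats=4.0)
print(plan)
```

An ensemble that agrees on one `release_fraction` per style is, in effect, doing what the text describes: unifying the moment of release.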
Consonants are classified by manner of articulation (how the airstream is shaped) and place of articulation (where in the vocal tract the constriction occurs). Within each cell, the symbol on the left is unvoiced; the symbol on the right is voiced.
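One way to picture the chart is as a lookup keyed by manner and place, with voicing deciding which side of the cell a symbol sits on. The sketch below is illustrative only: the `Consonant` class, the `cell` function, and the small `SAMPLE` list are hypothetical and cover just a handful of common consonants, not the guide's full chart.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Consonant:
    symbol: str
    manner: str   # how the airstream is shaped
    place: str    # where in the vocal tract the constriction occurs
    voiced: bool


# A small sample for illustration, not the complete chart.
SAMPLE = [
    Consonant("p", "plosive", "bilabial", False),
    Consonant("b", "plosive", "bilabial", True),
    Consonant("t", "plosive", "alveolar", False),
    Consonant("d", "plosive", "alveolar", True),
    Consonant("k", "plosive", "velar", False),
    Consonant("ɡ", "plosive", "velar", True),
    Consonant("f", "fricative", "labiodental", False),
    Consonant("v", "fricative", "labiodental", True),
    Consonant("s", "fricative", "alveolar", False),
    Consonant("z", "fricative", "alveolar", True),
    Consonant("m", "nasal", "bilabial", True),
]


def cell(manner: str, place: str) -> Tuple[Optional[str], Optional[str]]:
    """Return one chart cell as (unvoiced, voiced): left symbol first."""
    unvoiced = voiced = None
    for c in SAMPLE:
        if c.manner == manner and c.place == place:
            if c.voiced:
                voiced = c.symbol
            else:
                unvoiced = c.symbol
    return unvoiced, voiced


print(cell("plosive", "bilabial"))
print(cell("nasal", "bilabial"))
```

A cell with only one occupant, such as the bilabial nasal, comes back with `None` on the empty side.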
The letter r represents strikingly different sounds across languages, styles, and historical periods. All four principal variants below are legitimate within their contexts. The choice is determined by the language, the repertoire, the ensemble, and the tradition — not by a universal rule.
A consonant marked with a small vertical line below ([l̩], [n̩], [m̩]) functions as the nucleus of a syllable — carrying the syllable's weight without a separate vowel. English: bottle = [ˈbɒt.l̩]; button = [ˈbʌt.n̩]. German uses these in -en endings.
The glottal stop [ʔ] is a complete closure of the vocal folds producing a brief silence, released into the following vowel. English uh-oh = [ˈʔʌʔoʊ]. In careful German sung diction it appears before every word-initial vowel: ich atme sung as [ɪç ˈʔaːt.mə], not [ɪˈçaːtmə]. Hebrew uses the glottal stop freely (the letter aleph).
Nine languages, from the cleanest sung diction (Italian, Spanish) to the most intricate (Russian, Czech, Hebrew). Each panel covers the vowels, consonants, distinctive conventions, and example transcriptions you'll meet in the repertoire.
English spelling is unreliable and English sounds vary widely across dialects and genres. What follows is the mainstream sung treatment, with notes where specific genres diverge.
Italian is the cleanest of the major sung languages for IPA purposes: spelling corresponds closely to sound, vowels are pure and stable, and the relationship between letters and sounds is nearly one-to-one.
German sung diction requires precision: vowel length is phonemic, final consonants devoice, and word-initial vowels receive a glottal onset. These details shape the characteristic clarity of German art song and opera.
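Final devoicing is regular enough to sketch as a substitution. The example below is a deliberate simplification: the function name `devoice_final` is hypothetical, it looks only at the last consonant of a word, and it assumes the input is an underlying form in IPA with the voiced consonant still written. Real German devoicing also affects syllable-final consonants and clusters inside words.

```python
# Voiced obstruents and their voiceless counterparts.
DEVOICE = {"b": "p", "d": "t", "ɡ": "k", "g": "k", "v": "f", "z": "s"}


def devoice_final(word_ipa: str) -> str:
    """Replace a word-final voiced obstruent with its voiceless partner."""
    if word_ipa and word_ipa[-1] in DEVOICE:
        return word_ipa[:-1] + DEVOICE[word_ipa[-1]]
    return word_ipa


# Tag and und, written here with their underlying voiced final consonants.
for underlying in ("taːɡ", "ʊnd", "ɪç"):
    print(underlying, "->", devoice_final(underlying))
```

Words that already end in a voiceless consonant or a vowel pass through unchanged, so the function can be applied to every word in a phrase.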
French sung diction emphasizes vowel purity, nasalization, and the smooth legato flow that distinguishes the French melodic tradition. Liaison and elision bind words into phrases; the mute e rises into voice where speech would drop it.
Spanish shares with Italian the virtue of a clean five-vowel system and highly regular spelling. The main complications are the regional differences between Castilian (Spain) and Latin American varieties.
Latin in singing uses one of three pronunciation traditions, each appropriate to different repertoire. Identify the tradition before marking the score; the same word is pronounced differently in each.
Russian appears in art song (Rachmaninoff, Tchaikovsky, Mussorgsky), opera, and Orthodox sacred choral repertoire. The Cyrillic alphabet requires transliteration, and the sound system introduces features new to most singers.
Note: This section reflects a general-reference pass. For high-stakes performance preparation, verification by a Russian diction specialist is recommended.
Czech appears in the art song, operatic, and choral repertoire of Dvořák, Smetana, Janáček, and Martinů. The stress pattern (always on the first syllable), length-distinctive vowels, and a few distinctive consonants define the sound.
Note: This section reflects a general-reference pass. For high-stakes performance preparation, verification by a Czech diction specialist is recommended.
Hebrew appears in sacred Jewish repertoire, cantorial music, art song (Bloch, Castelnuovo-Tedesco), Israeli art song, and contemporary choral music. Modern (Israeli) pronunciation differs from the older Ashkenazi tradition used in some historical cantorial repertoire.
Note: This section reflects a general-reference pass. For high-stakes performance preparation, verification by a Hebrew diction specialist is recommended.
Professional transcriptions (Castel, Adams, Wall, IPA Source) usually present a phrase in three parallel lines: the original text, the IPA, and a literal or poetic translation. Conventions vary: some sources mark every syllable division, some mark none; some use primary and secondary stress marks, some only primary; some mark closed/open vowels explicitly, some leave them to the singer. The first task with any published transcription is to note what it marks and what it leaves to interpretation.
When multiple voices sing the same word, their vowel targets must align closely. Small differences between [a] and [ä], or between [e] and [ɛ], become audible as chord color problems — the harmony sounds out of tune even when pitch is correct. Classical choirs typically unify toward bright, Italianate vowels. Barbershop and close-harmony ensembles unify to extremely tight tolerances.
When multiple voices sustain a diphthong, the release timing must match. A diphthong released at different moments creates an unintended ensemble blur. Classical ensembles typically delay the release until the end of the note. Barbershop ensembles specify the release to a precise beat or subdivision.
Final consonants must land together. An ensemble singing a final [t] with five different release points produces an obvious flaw even when everything else is right. Conductors commonly mark the exact beat on which final consonants release.
Search any symbol, scan the full alphabet, and check the diacritics you need.
Every IPA symbol used in this guide.
"IPA does not tell you how to sing. It tells you what you are aiming at."