CereVoice Cloud API Docs v2

Using Multiple Voices

The CereVoice Cloud allows the use of multiple voices during synthesis. By placing tags around the input text, the user can assign specific text to specific voices. For example, suppose the following call is sent to the CereVoice Cloud:

curl -X POST "{url}speak?voice=Stuart" -H "accept: application/json" -H "Content-Type: text/xml" -H "Authorization: Bearer <access_token>" -d "<doc>Hello. My name is Stuart. This is my CereProc sister, Heather.<voice name='Heather'>Hello, my name is Heather.</voice></doc>"

The text "Hello, my name is Heather." is spoken by Heather, while the rest is spoken by Stuart.
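
The request body above can also be assembled programmatically. The following is a minimal Python sketch rather than part of the CereVoice Cloud API: the helper name build_multi_voice_doc is hypothetical, and it only builds the XML body. Sending that body to the speak endpoint, with the headers shown in the curl call, is left to whichever HTTP client is in use.

```python
import xml.etree.ElementTree as ET


def build_multi_voice_doc(segments):
    """Build a <doc> body from (voice_name, text) pairs.

    A voice_name of None leaves the text to the default voice, i.e. the one
    named in the `voice` query parameter of the request.
    (Hypothetical helper, for illustration only.)
    """
    doc = ET.Element("doc")
    last = None  # most recently added <voice> element, if any
    for voice_name, text in segments:
        if voice_name is None:
            # Default-voice text sits directly inside <doc>: either before
            # the first child, or as the tail of the previous <voice>.
            if last is None:
                doc.text = (doc.text or "") + text
            else:
                last.tail = (last.tail or "") + text
        else:
            last = ET.SubElement(doc, "voice", {"name": voice_name})
            last.text = text
    # ElementTree escapes special characters and double-quotes attributes.
    return ET.tostring(doc, encoding="unicode")


body = build_multi_voice_doc([
    (None, "Hello. My name is Stuart. This is my CereProc sister, Heather."),
    ("Heather", "Hello, my name is Heather."),
])
print(body)
```

The serializer writes attributes with double quotes instead of the single quotes used in the curl call; the two forms are equivalent XML.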

CereProc Tag Set

CereProc has implemented additional TTS functionality that is not part of the SSML specification.

Variant Tags

The variant tag (a variant attribute on the usel tag) allows the user to request a different version of the synthesis for a particular section of speech. It is useful for making sections of speech sound more appropriate, or for varying otherwise repetitive content. Increasing the variant number produces different versions of the speech; the original version is equivalent to variant 0. For example, to change the version of the word "test" in "This is a test sentence.", use:

<s>
    This is a <usel variant="1">test</usel> sentence.
</s>

Setting variant="2" produces another version, and so on. The variant tag can be used to produce a bespoke rendering of a particular piece of speech; for example, an often-used speech prompt could be tuned to give a different rendering if desired. Please note that the variant tag should mainly be used for creating static prompts (i.e. audio files). The effect of a given variant number differs between voices, and may also change when a new version of the same voice is released, because the underlying speech engine is constantly being improved and the default rendering may change.
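
When tuning a static prompt, it can help to generate several candidate inputs and compare the resulting audio by ear. The sketch below is one way to produce those inputs in Python. The function name variant_sentence is hypothetical, and rejecting negative numbers is an assumption based on the original version being variant 0.

```python
from xml.sax.saxutils import escape, quoteattr


def variant_sentence(before, target, after, variant):
    """Wrap `target` in a <usel> tag carrying the given variant number.

    Hypothetical helper; only the tag and attribute names come from the
    documentation above.
    """
    if variant < 0:
        # Assumption: variants start at 0 (the original rendering).
        raise ValueError("variant must be 0 or greater")
    return "<s>{}<usel variant={}>{}</usel>{}</s>".format(
        escape(before), quoteattr(str(variant)), escape(target), escape(after)
    )


# Variants 0, 1 and 2 of the same sentence, ready to synthesize and compare.
candidates = [
    variant_sentence("This is a ", "test", " sentence.", n) for n in range(3)
]
for candidate in candidates:
    print(candidate)
```

Because the effect of a variant number can change between voice versions, the chosen rendering is best saved as an audio file, not regenerated on demand.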

Vocal Gestures

Non-speech sounds, such as laughter and coughing, can be inserted into the output speech. The <spurt> tag is used with an audio attribute to select a vocal gesture to include in the synthesis output, for example:

<speak>
    <spurt audio="g0001_004">cough</spurt>, excuse me, <spurt audio="g0001_018">err</spurt>, hello.
</speak>

The <spurt> tag cannot be empty. However, the text content of the tag is not read; it is replaced by the gesture.
See the List of vocal gesture IDs for the full list of available gestures.
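
The two rules above (a gesture ID in the audio attribute, and non-empty placeholder text) can be checked before a request is sent. This is a minimal Python sketch with a hypothetical helper named spurt. The ID pattern is an assumption inferred from the IDs listed in this document, not a published format.

```python
import re
from xml.sax.saxutils import escape

# Assumption: gesture IDs look like "g0001_004", as in the list of IDs.
GESTURE_ID = re.compile(r"g\d{4}_\d{3}")


def spurt(gesture_id, placeholder):
    """Return a <spurt> tag for the given gesture (hypothetical helper)."""
    if not GESTURE_ID.fullmatch(gesture_id):
        raise ValueError("unexpected gesture ID: {!r}".format(gesture_id))
    if not placeholder.strip():
        # The tag cannot be empty, even though its text is never spoken.
        raise ValueError("placeholder text must not be empty")
    return '<spurt audio="{}">{}</spurt>'.format(gesture_id, escape(placeholder))


text = "<speak>{}, excuse me, {}, hello.</speak>".format(
    spurt("g0001_004", "cough"), spurt("g0001_018", "err")
)
print(text)
```

The pattern check only catches malformed IDs; it does not confirm that a particular voice provides the gesture.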

Emotion Tags

Emotion tags are available in voices with emotional support (for example Adam, Caitlin, Heather, Isabella, Jack, Jess, Katherine, Kirsty, Laura, Sarah, Stuart, Suzanne, William).
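
The four emotion tags in this section (happy, sad, calm and cross) share one form: a voice tag with an emotion attribute around the affected text. The Python sketch below wraps a phrase accordingly. The helper name with_emotion is hypothetical, and the set of accepted values covers only the emotions documented here, so it is an assumption that no others exist.

```python
from xml.sax.saxutils import escape

# Assumption: only the four emotions documented in this section are valid.
EMOTIONS = {"happy", "sad", "calm", "cross"}


def with_emotion(text, emotion):
    """Wrap `text` in a <voice> tag with the given emotion (hypothetical)."""
    if emotion not in EMOTIONS:
        raise ValueError("unsupported emotion: {!r}".format(emotion))
    return "<voice emotion='{}'>{}</voice>".format(emotion, escape(text))


sentence = "<s>Today, {}</s>".format(with_emotion("the sun is shining.", "happy"))
print(sentence)
```

The helper does not check whether the selected voice has emotional support; that depends on the voice named in the request.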

Happy Emotion Tag

For example:

<s>
    Today, <voice emotion='happy'>the sun is shining.</voice>
</s>

Sad Emotion Tag

<s>
    The outbreak <voice emotion='sad'>cast a shadow</voice> over the former Victorian holiday resort.
</s>

Calm Emotion Tag

<s>
    The beautiful gardens have been restored to all their <voice emotion='calm'>eccentric Victorian splendour.</voice>
</s>

Cross Emotion Tag

<s>
    When people leave a tip they want to know it will <voice emotion='cross'>not be used</voice> to make up the minimum wage.
</s>

Support

CereProc offers support via email. There are two methods of contacting CereProc Support:

Support Request: The fastest way to contact CereProc Support is via a support request. First log in to the CereProc website. Registered users can then access the support request form. Please select the appropriate product from the list and submit the support request.
Direct Email: CereProc support can be emailed at support@cereproc.com. However, queries sent to this address may take longer to reach the appropriate technical support representative than requests sent using the support request form.

List of vocal gesture IDs

These IDs can be used to insert a 'vocal gesture' (non-speech sound) into synthesis.
Note that gesture g0001_035 is available in Scottish voices only.

Gesture ID	Gesture description
g0001_001	tut
g0001_002	tut tut
g0001_003	cough
g0001_004	cough
g0001_005	cough
g0001_006	clear throat
g0001_007	breath in
g0001_008	sharp intake of breath
g0001_009	breath in through teeth
g0001_010	sigh happy
g0001_011	sigh sad
g0001_012	hmm question
g0001_013	hmm yes
g0001_014	hmm thinking
g0001_015	umm
g0001_016	umm
g0001_017	err
g0001_018	err
g0001_019	giggle
g0001_020	giggle
g0001_021	laugh
g0001_022	laugh
g0001_023	laugh
g0001_024	laugh
g0001_025	ah positive
g0001_026	ah negative
g0001_027	yeah question
g0001_028	yeah positive
g0001_029	yeah resigned
g0001_030	sniff
g0001_031	sniff
g0001_032	argh
g0001_033	argh
g0001_034	ugh
g0001_035	ocht
g0001_036	yay
g0001_037	oh positive
g0001_038	oh negative
g0001_039	sarcastic noise
g0001_040	yawn
g0001_041	yawn
g0001_042	snore
g0001_043	snore phew
g0001_044	zzz
g0001_045	raspberry
g0001_046	raspberry
g0001_047	brrr cold
g0001_048	snort
g0001_050	ha ha (sarcastic)
g0001_051	doh
g0001_052	gasp
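
Several descriptions in the list map to more than one ID (for example, three coughs), so it can be convenient to group IDs by description when choosing a gesture. The following Python sketch parses tab-separated rows like those above. The function name group_by_description is hypothetical, and SAMPLE_ROWS holds only a few rows copied from the list, not the full table.

```python
from collections import defaultdict

# A few rows copied from the list above (tab-separated), not the full table.
SAMPLE_ROWS = (
    "g0001_003\tcough\n"
    "g0001_004\tcough\n"
    "g0001_005\tcough\n"
    "g0001_006\tclear throat\n"
)


def group_by_description(table_text):
    """Map each gesture description to the list of IDs that share it."""
    groups = defaultdict(list)
    for line in table_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between rows
        gesture_id, description = line.split("\t", 1)
        groups[description.strip()].append(gesture_id.strip())
    return dict(groups)


groups = group_by_description(SAMPLE_ROWS)
print(groups["cough"])
```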