How To Build a Text-to-Speech App with Web Speech API

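Introduction

There is a good chance you have interacted with an app that offers some form of voice experience. It could be an app with text-to-speech functionality, such as one that reads your text messages or notifications aloud. It could also be an app with voice recognition functionality, such as Siri or Google Assistant.

With the advent of HTML5, the number of APIs available on the web platform has grown very quickly. Several APIs, collectively known as the Web Speech API, have been developed to make it straightforward to build a variety of voice applications and experiences for the web. These APIs are still fairly experimental, although support for most of them is growing across modern browsers.

Animated GIF of the text-to-speech app loading, refreshing, and playing audio for a quote.

In this article, you will build an application that retrieves a random quote, displays it, and lets the user have the browser read the quote aloud using text-to-speech.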

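Prerequisites

To complete this tutorial, you will need:

Node.js installed locally, which you can do by following How to Install Node.js and Create a Local Development Environment.

This tutorial was verified with Node v14.4.0, npm v6.14.5, axios v0.19.2, cors v2.8.5, express v4.17.1, and jQuery v3.5.1.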

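Using the Web Speech API

The Web Speech API has two major interfaces:

SpeechSynthesis – For text-to-speech applications. This allows apps to read out their text content using the device's speech synthesizer. The available voice types are represented by a SpeechSynthesisVoice object, while the text to be uttered is represented by a SpeechSynthesisUtterance object. See the support table for the SpeechSynthesis interface to learn more about browser support.
SpeechRecognition – For applications that require asynchronous voice recognition. This allows apps to recognize voice context from an audio input. A SpeechRecognition object can be created using the constructor. The SpeechGrammar interface represents the set of grammars that the app should recognize. See the support table for the SpeechRecognition interface to learn more about browser support.

This tutorial will focus on SpeechSynthesis.

Getting a Reference

Getting a reference to a SpeechSynthesis object can be done with a single line of code: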

var synthesis = window.speechSynthesis;

The following code snippet shows how to check for browser support:

if ('speechSynthesis' in window) {
  var synthesis = window.speechSynthesis;
} else {
  console.log('Text-to-speech not supported.');
}

It is useful to check whether the browser supports SpeechSynthesis before using the functionality it provides.
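As a minimal sketch of the same check, the lookup can be wrapped in a small helper. The name getSynthesis and its globalObject parameter are illustrative assumptions and not part of the Web Speech API; passing the global object in makes the helper easy to exercise outside a browser.

```javascript
// Returns the SpeechSynthesis controller, or null when it is unavailable.
// `getSynthesis` is a hypothetical helper name used only for this sketch.
function getSynthesis(globalObject) {
  if (globalObject && 'speechSynthesis' in globalObject) {
    return globalObject.speechSynthesis;
  }
  return null;
}

// In a browser, pass `window`; elsewhere (for example Node.js) fall back to an empty object.
var synthesis = getSynthesis(typeof window !== 'undefined' ? window : {});

if (synthesis === null) {
  console.log('Text-to-speech not supported.');
}
```

Returning null instead of logging inside the helper leaves it to the caller to decide how to report missing support.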

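Getting Available Voices

In this step, you will build on the existing code to get the available speech voices. The getVoices() method returns a list of SpeechSynthesisVoice objects representing all the available voices on the device.

Take a look at the following code snippet: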

if ('speechSynthesis' in window) {
  var synthesis = window.speechSynthesis;

  // Regex to match all English language tags e.g en, en-US, en-GB
  var langRegex = /^en(-[a-z]{2})?$/i;

  // Get the available voices and filter the list to only have English speakers
  var voices = synthesis
    .getVoices()
    .filter((voice) => langRegex.test(voice.lang));

  // Log the properties of the voices in the list
  voices.forEach(function (voice) {
    console.log({
      name: voice.name,
      lang: voice.lang,
      uri: voice.voiceURI,
      local: voice.localService,
      default: voice.default,
    });
  });
} else {
  console.log('Text-to-speech not supported.');
}

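In this section of code, you get the list of voices available on the device and filter it with the langRegex regular expression so that only voices for English speakers remain. Finally, you loop through the voices in the list and log the properties of each one to the console.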
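To see what the filter keeps, here is a small sketch that applies the same regular expression to hand-written sample data. The sample objects and their names are made up for illustration; real SpeechSynthesisVoice objects carry more properties and depend on the device.

```javascript
// Same pattern as in the snippet above: matches en, en-US, en-GB, and so on
var langRegex = /^en(-[a-z]{2})?$/i;

// Hypothetical sample data shaped like SpeechSynthesisVoice objects
var sampleVoices = [
  { name: 'Voice A', lang: 'en-US' },
  { name: 'Voice B', lang: 'fr-FR' },
  { name: 'Voice C', lang: 'en' },
  { name: 'Voice D', lang: 'en-GB' },
  { name: 'Voice E', lang: 'de-DE' },
];

var englishVoices = sampleVoices.filter(function (voice) {
  return langRegex.test(voice.lang);
});

console.log(englishVoices.map(function (voice) { return voice.name; }));
// logs the names Voice A, Voice C, and Voice D
```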

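Constructing Speech Utterances

In this step, you will construct speech utterances by using the SpeechSynthesisUtterance constructor and setting values for the available properties.

The following code snippet creates a speech utterance for reading the text "Hello World":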

if ('speechSynthesis' in window) {
  var synthesis = window.speechSynthesis;

  // Get the first `en` language voice in the list.
  // voice.lang is usually a full tag such as en-US, so match on the prefix.
  var voice = synthesis.getVoices().filter(function (voice) {
    return /^en([-_]|$)/i.test(voice.lang);
  })[0];

  // Create an utterance object
  var utterance = new SpeechSynthesisUtterance('Hello World');

  // Set utterance properties
  utterance.voice = voice;
  utterance.pitch = 1.5;
  utterance.rate = 1.25;
  utterance.volume = 0.8;

  // Speak the utterance
  synthesis.speak(utterance);
} else {
  console.log('Text-to-speech not supported.');
}

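Here, you get the first en language voice from the list of available voices. Next, you create a new utterance using the SpeechSynthesisUtterance constructor. You then set some of the properties on the utterance object: voice, pitch, rate, and volume. Finally, you speak the utterance using the speak() method of SpeechSynthesis.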
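The pitch, rate, and volume properties each accept a limited range of values. The following is a minimal sketch of a helper that keeps a value inside its range before you assign it to an utterance. The ranges are the ones given in the Web Speech API specification; the LIMITS object and the clampSetting name are assumptions made for this example.

```javascript
// Ranges defined by the Web Speech API specification
var LIMITS = {
  pitch: { min: 0, max: 2 },
  rate: { min: 0.1, max: 10 },
  volume: { min: 0, max: 1 },
};

// `clampSetting` is a hypothetical helper, not part of the API
function clampSetting(name, value) {
  var limit = LIMITS[name];
  return Math.min(limit.max, Math.max(limit.min, value));
}

console.log(clampSetting('pitch', 1.5)); // 1.5 (already in range)
console.log(clampSetting('rate', 25)); // 10
console.log(clampSetting('volume', -1)); // 0
```

With this in place, an assignment such as utterance.rate = clampSetting('rate', userValue) would stay within what the synthesizer accepts.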

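Note: There is a limit to the size of text that can be spoken in an utterance. The maximum length of the text that can be spoken in each utterance is 32,767 characters.

Notice that you passed the text to be uttered to the constructor.

You can also set the text to be uttered by setting the text property of the utterance object.

Here is a simple example: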

var synthesis = window.speechSynthesis;
var utterance = new SpeechSynthesisUtterance("Hello World");

// This overrides the text "Hello World" and is uttered instead
utterance.text = "My name is Glad.";

synthesis.speak(utterance);

This overrides whatever text was passed to the constructor.
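Because of the length limit noted earlier, longer text has to be spread over several utterances. The following is one possible sketch of that idea; splitText is a hypothetical helper name, and breaking at the last space inside the limit is a design choice of this example rather than anything the API requires.

```javascript
// Splits text into pieces of at most maxLength characters, breaking at the
// last space inside the limit when there is one.
// `splitText` is a hypothetical helper; 32767 is the limit noted earlier.
function splitText(text, maxLength) {
  var chunks = [];
  var rest = text.trim();

  while (rest.length > maxLength) {
    var cut = rest.lastIndexOf(' ', maxLength);
    if (cut <= 0) cut = maxLength; // no space found: hard split
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }

  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

console.log(splitText('one two three four', 9));
// [ 'one two', 'three', 'four' ]

// In a browser, each piece can be queued as its own utterance
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  splitText('Hello World. My name is Glad.', 32767).forEach(function (piece) {
    window.speechSynthesis.speak(new SpeechSynthesisUtterance(piece));
  });
}
```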

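Speaking an Utterance

In the previous code snippets, you spoke utterances by calling the speak() method on the SpeechSynthesis instance. The speak() method takes a SpeechSynthesisUtterance instance as its argument, and you can call it several times to queue utterances that are spoken one after another.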

var synthesis = window.speechSynthesis;

var utterance1 = new SpeechSynthesisUtterance("Hello World");
var utterance2 = new SpeechSynthesisUtterance("My name is Glad.");
var utterance3 = new SpeechSynthesisUtterance("I'm a web developer from Nigeria.");

synthesis.speak(utterance1);
synthesis.speak(utterance2);
synthesis.speak(utterance3);

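You can also perform a few other operations on the SpeechSynthesis instance, such as pausing, resuming, and canceling utterances. The pause(), resume(), and cancel() methods are available on the SpeechSynthesis instance for this purpose.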
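As a rough sketch of how these methods fit together, the following helper pauses speech that is in progress and resumes speech that is paused, based on the paused and speaking properties of the SpeechSynthesis instance. The togglePause name and its return values are assumptions made for this example.

```javascript
// Pauses speech that is in progress, or resumes speech that is paused.
// `synthesis` is expected to be window.speechSynthesis; `togglePause` is a
// hypothetical helper name.
function togglePause(synthesis) {
  if (synthesis.paused) {
    synthesis.resume();
    return 'resumed';
  }
  if (synthesis.speaking) {
    synthesis.pause();
    return 'paused';
  }
  return 'idle'; // nothing is being spoken
}

// In a browser: togglePause(window.speechSynthesis);
```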

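Step 1 – Building the Text-to-Speech App

You have seen the basic aspects of the SpeechSynthesis interface. You will now start building the text-to-speech application. Before you begin, make sure that Node and npm are installed on your machine.

Run the following commands in your terminal to set up a project for the app and install the dependencies.

Create a new project directory: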

mkdir web-speech-app

Move into the newly created project directory:

cd web-speech-app

Initialize the project:

npm init -y

Install the dependencies needed for the project (express, cors, and axios):

npm install express cors axios

Modify the "scripts" section of the package.json file so that it looks like the following snippet:

package.json

"scripts": {
  "start": "node server.js"
}

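Now that you have initialized a project for the application, you will set up a server for the app using Express.

Create a new server.js file and add the following content to it:

server.js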

const cors = require('cors');
const path = require('path');
const axios = require('axios');
const express = require('express');

const app = express();
const PORT = process.env.PORT || 5000;

app.set('port', PORT);

// Enable CORS (Cross-Origin Resource Sharing)
app.use(cors());

// Serve static files from the /public directory
app.use('/', express.static(path.join(__dirname, 'public')));

// A simple endpoint for fetching a random quote from QuotesOnDesign
app.get('/api/quote', (req, res) => {
  axios
    .get(
      'https://quotesondesign.com/wp-json/wp/v2/posts/?orderby=rand'
    )
    .then((response) => {
      const [post] = response.data;
      const { title, content } = post || {};

      return title && content
        ? res.json({ status: 'success', data: { title, content } })
        : res
            .status(500)
            .json({ status: 'failed', message: 'Could not fetch quote.' });
    })
    .catch((err) =>
      res
        .status(500)
        .json({ status: 'failed', message: 'Could not fetch quote.' })
    );
});

app.listen(PORT, () => console.log(`> App server is running on port ${PORT}.`));

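Here, you set up a Node server using Express. You enabled CORS (Cross-Origin Resource Sharing) using the cors() middleware. You also used the express.static() middleware to serve static files from the /public directory in the project root. This will enable you to serve the index page that you will create soon.

Finally, you set up a GET /api/quote route that fetches a random quote from the QuotesOnDesign API service. You are using axios (a promise-based HTTP client library) to make the HTTP request.

Here is a sample response from the QuotesOnDesign API: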

Output
[
  {
    "title": { "rendered": "Victor Papanek" },
    "content": {
      "rendered": "<p>Any attempt to separate design, to make it a thing-by-itself, works counter to the inherent value of design as the primary, underlying matrix of life.</p>\n",
      "protected": false
    }
  }
]

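Note: For more information on the QuotesOnDesign API changes, see their page documenting the changes between 4.0 and 5.0.

When a quote is fetched successfully, the quote's title and content are returned in the data field of the JSON response. Otherwise, a failure JSON response with a 500 HTTP status code is returned.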
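The success-or-failure decision made inside the route can be looked at in isolation. The following is a minimal sketch, assuming a payload shaped like the sample response; extractQuote is a hypothetical helper and is not part of server.js, and the quote text in the sample is a placeholder.

```javascript
// Pulls the fields used by /api/quote out of a QuotesOnDesign-style payload.
// Returns null when the payload has no usable post.
// `extractQuote` is a hypothetical helper, not part of server.js.
function extractQuote(payload) {
  const [post] = Array.isArray(payload) ? payload : [];
  const { title, content } = post || {};
  return title && content ? { title, content } : null;
}

const sample = [
  {
    title: { rendered: 'Victor Papanek' },
    content: { rendered: '<p>Sample quote text.</p>\n', protected: false },
  },
];

console.log(extractQuote(sample).title.rendered); // Victor Papanek
console.log(extractQuote([])); // null
```

A null result corresponds to the branch where the route answers with the 500 status code.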

Next, you will create an index page for the app view.

First, create a new public folder in the root of your project:

mkdir public

Next, create a new index.html file in the newly created public folder and add the following content to it:

public/index.html

<html>

<head>
    <title>Daily Quotes</title>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.1/css/bootstrap.min.css" integrity="sha384-WskhaSGFgHYWDcbwN70/dfYBj47jz9qbsMId/iRN3ewGhXQFZCSftd1LZCfmhktB" crossorigin="anonymous">
</head>

<body class="position-absolute h-100 w-100">
    <div id="app" class="d-flex flex-wrap align-items-center align-content-center p-5 mx-auto w-50 position-relative"></div>

    <script src="https://unpkg.com/jquery/dist/jquery.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/feather-icons/dist/feather.min.js"></script>
    <script src="main.js"></script>
</body>

</html>

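This creates a basic index page for the app with a single <div id="app"> that will serve as the mount point for all of the app's dynamic content.

You also added a link to the Bootstrap CDN to get some default Bootstrap 4 styling for the app. You also included jQuery for DOM manipulation and AJAX requests, and Feather icons for elegant SVG icons.

Step 2 – Building the Main Script

Now, you are down to the last piece that powers the app: the main script. Create a new main.js file in the public directory of your app and add the following content to it:

public/main.js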

jQuery(function ($) {
  let app = $('#app');

  let SYNTHESIS = null;
  let VOICES = null;

  let QUOTE_TEXT = null;
  let QUOTE_PERSON = null;

  let VOICE_SPEAKING = false;
  let VOICE_PAUSED = false;
  let VOICE_COMPLETE = false;

  let iconProps = {
    'stroke-width': 1,
    'width': 48,
    'height': 48,
    'class': 'text-secondary d-none',
    'style': 'cursor: pointer'
  };

  function iconSVG(icon) {}

  function showControl(control) {}

  function hideControl(control) {}

  function getVoices() {}

  function resetVoice() {}

  function fetchNewQuote() {}

  function renderQuote(quote) {}

  function renderVoiceControls(synthesis, voice) {}

  function updateVoiceControls() {}

  function initialize() {}

  initialize();
});

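This code uses jQuery to execute a function when the DOM is loaded. You get a reference to the #app element and initialize some variables. You also declare a few empty functions that you will implement in the following sections. Finally, you call the initialize() function to initialize the application.

The iconProps variable contains a few properties that will be used for rendering Feather icons as SVG to the DOM.

With that code in place, you are ready to start implementing the functions. Modify the public/main.js file to implement the following functions:

public/main.js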

// Gets the SVG markup for a Feather icon
function iconSVG(icon) {
  let props = $.extend({}, iconProps, { id: icon });
  return feather.icons[icon].toSvg(props);
}

// Shows an element
function showControl(control) {
  control.addClass('d-inline-block').removeClass('d-none');
}

// Hides an element
function hideControl(control) {
  control.addClass('d-none').removeClass('d-inline-block');
}

// Get the available voices, filter the list to have only English voices
function getVoices() {
  // Regex to match all English language tags e.g en, en-US, en-GB
  let langRegex = /^en(-[a-z]{2})?$/i;

  // Get the available voices and filter the list to only have English speakers
  VOICES = SYNTHESIS.getVoices()
    .filter(function (voice) {
      return langRegex.test(voice.lang);
    })
    .map(function (voice) {
      return {
        voice: voice,
        name: voice.name,
        lang: voice.lang.toUpperCase(),
      };
    });
}

// Reset the voice variables to the defaults
function resetVoice() {
  VOICE_SPEAKING = false;
  VOICE_PAUSED = false;
  VOICE_COMPLETE = false;
}

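The iconSVG(icon) function takes a Feather icon name string as its argument (for example, 'play-circle') and returns the SVG markup for the icon. Check the Feather website to see the complete list of available Feather icons. Also check the Feather documentation to learn more about the API.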
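Inside iconSVG(), the shared icon properties are combined with an id for the requested icon. The following sketch shows that merge in plain JavaScript, assuming Object.assign in place of $.extend so that it runs without jQuery. The sharedProps object and the iconAttributes name are made up for this example.

```javascript
// Shared attributes, mirroring part of the iconProps object in main.js
const sharedProps = { 'stroke-width': 1, width: 48, height: 48 };

// Builds the attribute object for one icon without modifying sharedProps.
// `iconAttributes` is a hypothetical helper name.
function iconAttributes(icon) {
  return Object.assign({}, sharedProps, { id: icon });
}

console.log(iconAttributes('play-circle').id); // play-circle
console.log('id' in sharedProps); // false
```

Merging into a fresh object keeps the shared defaults unchanged between calls.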

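The getVoices() function uses the SYNTHESIS object to fetch the list of all the available voices on the device. Then, it filters the list using a regular expression to keep only the voices for English speakers.

Next, you will implement the functions for fetching and rendering quotes on the DOM. Modify the public/main.js file to implement the following functions:

public/main.js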

function fetchNewQuote() {
  // Clean up the #app element
  app.html('');

  // Reset the quote variables
  QUOTE_TEXT = null;
  QUOTE_PERSON = null;

  // Reset the voice variables
  resetVoice();

  // Pick a voice at random from the VOICES list
  let voice =
    VOICES && VOICES.length > 0
      ? VOICES[Math.floor(Math.random() * VOICES.length)]
      : null;

  // Fetch a quote from the API and render the quote and voice controls
  $.get('/api/quote', function (quote) {
    renderQuote(quote.data);
    SYNTHESIS && renderVoiceControls(SYNTHESIS, voice || null);
  });
}

function renderQuote(quote) {
  // Create some markup for the quote elements
  let quotePerson = $('<h1 id="quote-person" class="mb-2 w-100"></h1>');
  let quoteText = $('<div id="quote-text" class="h3 py-5 mb-4 w-100 font-weight-light text-secondary border-bottom border-gray"></div>');

  // Add the quote data to the markup
  quotePerson.html(quote.title.rendered);
  quoteText.html(quote.content.rendered);

  // Attach the quote elements to the DOM
  app.append(quotePerson);
  app.append(quoteText);

  // Update the quote variables with the new data
  QUOTE_TEXT = quoteText.text();
  QUOTE_PERSON = quotePerson.text();
}

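In the fetchNewQuote() function, you first reset the app element and variables. You then pick a voice at random, using Math.random(), from the list of voices stored in the VOICES variable. You use $.get() to make an AJAX request to the /api/quote endpoint, fetch a random quote, and render the quote data to the view alongside the voice controls.

The renderQuote(quote) function receives a quote object as its argument and adds the contents to the DOM. Finally, it updates the quote variables: QUOTE_TEXT and QUOTE_PERSON.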
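The random selection used in fetchNewQuote() can be sketched on its own. In the example below, pickRandom is a hypothetical helper name, and the random parameter is an addition made for this sketch so that the result can be made predictable.

```javascript
// Returns a random element of list, or null when the list is empty or missing.
// `random` defaults to Math.random and can be replaced for predictable results.
// `pickRandom` is a hypothetical helper name.
function pickRandom(list, random = Math.random) {
  if (!list || list.length === 0) return null;
  return list[Math.floor(random() * list.length)];
}

const voices = ['Voice A', 'Voice B', 'Voice C'];

console.log(pickRandom(voices, () => 0)); // Voice A
console.log(pickRandom(voices, () => 0.99)); // Voice C
console.log(pickRandom([])); // null
```

Since Math.random() returns a value below 1, the computed index always stays inside the list.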

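If you look at the fetchNewQuote() function, you will notice that it calls the renderVoiceControls() function. This function is responsible for rendering the controls for playing, pausing, and stopping the voice output. It also renders the current voice in use and its language.

Make the following modifications to the public/main.js file to implement the renderVoiceControls() function:

public/main.js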

function renderVoiceControls(synthesis, voice) {
  let controlsPane = $('<div id="voice-controls-pane" class="d-flex flex-wrap w-100 align-items-center align-content-center justify-content-between"></div>');

  let voiceControls = $('<div id="voice-controls"></div>');

  // Create the SVG elements for the voice control buttons
  let playButton = $(iconSVG('play-circle'));
  let pauseButton = $(iconSVG('pause-circle'));
  let stopButton = $(iconSVG('stop-circle'));

  // Helper function to enable pause state for the voice output
  let paused = function () {
    VOICE_PAUSED = true;
    updateVoiceControls();
  };

  // Helper function to disable pause state for the voice output
  let resumed = function () {
    VOICE_PAUSED = false;
    updateVoiceControls();
  };

  // Click event handler for the play button
  playButton.on('click', function (evt) {});

  // Click event handler for the pause button
  pauseButton.on('click', function (evt) {});

  // Click event handler for the stop button
  stopButton.on('click', function (evt) {});

  // Add the voice controls to their parent element
  voiceControls.append(playButton);
  voiceControls.append(pauseButton);
  voiceControls.append(stopButton);

  // Add the voice controls parent to the controlsPane element
  controlsPane.append(voiceControls);

  // If voice is available, add the voice info element to the controlsPane
  if (voice) {
    let currentVoice = $('<div class="text-secondary font-weight-normal"><span class="text-dark font-weight-bold">' + voice.name + '</span> (' + voice.lang + ')</div>');

    controlsPane.append(currentVoice);
  }

  // Add the controlsPane to the DOM
  app.append(controlsPane);

  // Show the play button
  showControl(playButton);
}

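Here, you create container elements for the voice controls and the controls pane. You use the iconSVG() function created earlier to get the SVG markup for the control buttons and create the button elements. You define the paused() and resumed() helper functions, which will be used when setting up the event handlers for the buttons.

Finally, you render the voice control buttons and the voice info to the DOM. The code is also configured so that only the Play button is shown initially.

Next, you will implement the click event handlers for the voice control buttons you defined in the previous section.

Set up the event handlers as shown in the following code snippet:

public/main.js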

// Click event handler for the play button
playButton.on('click', function (evt) {
  evt.preventDefault();

  if (VOICE_SPEAKING) {
    // If voice is paused, it is resumed when the playButton is clicked
    if (VOICE_PAUSED) synthesis.resume();
    return resumed();
  } else {
    // Create utterances for the quote and the person
    let quoteUtterance = new SpeechSynthesisUtterance(QUOTE_TEXT);
    let personUtterance = new SpeechSynthesisUtterance(QUOTE_PERSON);

    // Set the voice for the utterances if available
    if (voice) {
      quoteUtterance.voice = voice.voice;
      personUtterance.voice = voice.voice;
    }

    // Set event listeners for the quote utterance
    quoteUtterance.onpause = paused;
    quoteUtterance.onresume = resumed;
    quoteUtterance.onboundary = updateVoiceControls;

    // Set the listener to activate speaking state when the quote utterance starts
    quoteUtterance.onstart = function (evt) {
      VOICE_COMPLETE = false;
      VOICE_SPEAKING = true;
      updateVoiceControls();
    };

    // Set event listeners for the person utterance
    personUtterance.onpause = paused;
    personUtterance.onresume = resumed;
    personUtterance.onboundary = updateVoiceControls;

    // Refresh the app and fetch a new quote when the person utterance ends
    personUtterance.onend = fetchNewQuote;

    // Speak the utterances
    synthesis.speak(quoteUtterance);
    synthesis.speak(personUtterance);
  }
});

// Click event handler for the pause button
pauseButton.on('click', function (evt) {
  evt.preventDefault();

  // Pause the utterance if it is not in paused state
  if (VOICE_SPEAKING) synthesis.pause();
  return paused();
});

// Click event handler for the stop button
stopButton.on('click', function (evt) {
  evt.preventDefault();

  // Clear the utterances queue
  if (VOICE_SPEAKING) synthesis.cancel();
  resetVoice();

  // Set the complete status of the voice output
  VOICE_COMPLETE = true;
  updateVoiceControls();
});

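Here, you set up the click event listeners for the voice control buttons. When the Play button is clicked, it starts speaking the utterances, beginning with quoteUtterance and then personUtterance. However, if the voice output is in a paused state, it resumes it.

You set VOICE_SPEAKING to true in the onstart event handler for quoteUtterance. The app will also refresh and fetch a new quote when personUtterance ends.

The Pause button pauses the voice output, while the Stop button ends the voice output and removes all utterances from the queue using the cancel() method of the SpeechSynthesis interface. The code calls the updateVoiceControls() function each time to display the appropriate buttons.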
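One way to reason about these handlers is to view the three voice flags as a single state object that changes in response to events. The following is only a sketch of that view: nextVoiceState and the event labels are made up for this example, and main.js keeps using the separate VOICE_SPEAKING, VOICE_PAUSED, and VOICE_COMPLETE variables.

```javascript
// Pure description of how the voice flags change in response to events.
// The event names are illustrative labels, not DOM event types.
function nextVoiceState(state, event) {
  switch (event) {
    case 'start': // quoteUtterance.onstart
      return { speaking: true, paused: state.paused, complete: false };
    case 'pause': // pause button or utterance.onpause
      return { ...state, paused: true };
    case 'resume': // play button while paused, or utterance.onresume
      return { ...state, paused: false };
    case 'stop': // stop button: resetVoice() followed by VOICE_COMPLETE = true
      return { speaking: false, paused: false, complete: true };
    default:
      return state;
  }
}

let state = { speaking: false, paused: false, complete: false };
['start', 'pause', 'resume', 'stop'].forEach(function (event) {
  state = nextVoiceState(state, event);
});
console.log(state); // { speaking: false, paused: false, complete: true }
```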

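You have made several calls and references to the updateVoiceControls() function in the previous code snippets. This function is responsible for updating the voice controls to display the appropriate controls based on the voice state variables.

Make the following modifications to the public/main.js file to implement the updateVoiceControls() function:

public/main.js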

function updateVoiceControls() {
  // Get a reference to each control button
  let playButton = $('#play-circle');
  let pauseButton = $('#pause-circle');
  let stopButton = $('#stop-circle');

  if (VOICE_SPEAKING) {
    // Show the stop button if speaking is in progress
    showControl(stopButton);

    // Toggle the play and pause buttons based on paused state
    if (VOICE_PAUSED) {
      showControl(playButton);
      hideControl(pauseButton);
    } else {
      hideControl(playButton);
      showControl(pauseButton);
    }
  } else {
    // Show only the play button if no speaking is in progress
    showControl(playButton);
    hideControl(pauseButton);
    hideControl(stopButton);
  }
}

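In this section of code, you first get a reference to each of the voice control button elements. Then, you specify which voice control buttons should be visible in the different states of the voice output.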
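The visibility rules in updateVoiceControls() can be summarized as a small function from state to visible buttons. The following sketch assumes a helper named visibleControls, which is not part of main.js, and leaves out the DOM updates.

```javascript
// Returns which buttons should be visible for a given voice state.
// `visibleControls` is a hypothetical helper name.
function visibleControls(speaking, paused) {
  if (!speaking) {
    // Nothing is being spoken: only the play button is shown
    return { play: true, pause: false, stop: false };
  }
  // While speaking, stop is always shown; play and pause swap places
  return { play: paused, pause: !paused, stop: true };
}

console.log(visibleControls(false, false)); // { play: true, pause: false, stop: false }
console.log(visibleControls(true, false)); // { play: false, pause: true, stop: true }
console.log(visibleControls(true, true)); // { play: true, pause: false, stop: true }
```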

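You are now ready to implement the initialize() function. This function is responsible for initializing the application. Add the following code snippet to the public/main.js file to implement the initialize() function.

public/main.js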

function initialize() {
  if ('speechSynthesis' in window) {
    SYNTHESIS = window.speechSynthesis;

    let timer = setInterval(function () {
      let voices = SYNTHESIS.getVoices();

      if (voices.length > 0) {
        getVoices();
        fetchNewQuote();
        clearInterval(timer);
      }
    }, 200);
  } else {
    let message = 'Text-to-speech not supported by your browser.';

    // Create the browser notice element
    let notice = $('<div class="w-100 py-4 bg-danger font-weight-bold text-white position-absolute text-center" style="bottom:0; z-index:10">' + message + '</div>');

    fetchNewQuote();
    console.log(message);

    // Display non-support info on DOM
    $(document.body).append(notice);
  }
}

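This code first checks whether speechSynthesis is available on the window global object and, if it is, assigns it to the SYNTHESIS variable. Next, you set up an interval for fetching the list of available voices.

You are using an interval here because SpeechSynthesis.getVoices() has a known asynchronous behavior: it returns an empty array on the initial call because the voices have not been loaded yet. The interval ensures that you get the list of voices before fetching a random quote, and it is cleared once the list is available.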
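As an alternative to polling, the SpeechSynthesis interface also fires a voiceschanged event when the voice list changes. The following is a sketch of a promise-based loader built on that event, not the approach this tutorial uses. The loadVoices name and the timeoutMs fallback are assumptions of this example, and browser support for the event varies, which is why the sketch still falls back to a timer.

```javascript
// Resolves with the voice list as soon as the browser has loaded it.
// `synthesis` is expected to be window.speechSynthesis (or a compatible object).
// `loadVoices` and `timeoutMs` are hypothetical names used for this sketch.
function loadVoices(synthesis, timeoutMs = 2000) {
  return new Promise(function (resolve) {
    const initial = synthesis.getVoices();
    if (initial.length > 0) {
      resolve(initial);
      return;
    }

    let timer = null;

    function done() {
      clearTimeout(timer);
      synthesis.removeEventListener('voiceschanged', done);
      resolve(synthesis.getVoices());
    }

    // Fall back after timeoutMs in case the event never fires
    timer = setTimeout(done, timeoutMs);
    synthesis.addEventListener('voiceschanged', done);
  });
}

// Usage in a browser
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  loadVoices(window.speechSynthesis).then(function (voices) {
    console.log('Loaded ' + voices.length + ' voices.');
  });
}
```

If the timer fires before any voices are loaded, the promise resolves with an empty list, so callers still need to handle that case.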

You have now successfully completed the text-to-speech app. You can start the app by running the following command in your terminal:

npm start

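The app will run on port 5000 if it is available.

Visit localhost:5000 in your browser to observe the app.

Final screenshot of the text-to-speech app

Now, interact with the Play button to hear the quote spoken.

Conclusion

In this tutorial, you used the Web Speech API to build a text-to-speech app for the web. You can learn more about the Web Speech API and also find some helpful resources in the MDN Web Docs.

If you would like to continue refining your app, there are still some interesting features you can implement and experiment with, such as volume controls, voice pitch controls, speed/rate controls, percentage of text uttered, and so on.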
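As a starting point for the percentage feature, the boundary events already used in main.js carry a charIndex property that gives the position reached in the text. The following is a rough sketch under that assumption; progressPercent is a hypothetical helper name, and not every voice fires boundary events, so the value may update unevenly or not at all.

```javascript
// Converts a boundary event's charIndex into a whole-number percentage.
// `progressPercent` is a hypothetical helper name.
function progressPercent(charIndex, textLength) {
  if (textLength <= 0) return 0;
  var ratio = Math.min(Math.max(charIndex / textLength, 0), 1);
  return Math.round(ratio * 100);
}

console.log(progressPercent(0, 40)); // 0
console.log(progressPercent(10, 40)); // 25
console.log(progressPercent(50, 40)); // 100

// In a browser, it could be wired to an utterance like this:
// utterance.onboundary = function (event) {
//   console.log(progressPercent(event.charIndex, utterance.text.length) + '%');
// };
```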

The complete source code for this tutorial is available on GitHub.
