This commit is contained in:
annie
2024-06-28 16:56:33 -07:00
parent 331f6633ca
commit 386ad24104
7 changed files with 139 additions and 67 deletions

View File

@@ -1,53 +1,78 @@
- # Next.js & NextUI Template
+ # HeyGen Streaming Avatar NextJS Demo

- This is a template for creating applications using Next.js 14 (app directory) and NextUI (v2).
- [Try it on CodeSandbox](https://githubbox.com/nextui-org/next-app-template)
+ This is a sample project and was bootstrapped using [NextJS](https://nextjs.org/).
+ Feel free to play around with the existing code and please leave any feedback for the SDK [here](https://github.com/HeyGen-Official/StreamingAvatarSDK/discussions).

- ## Technologies Used
- - [Next.js 14](https://nextjs.org/docs/getting-started)
- - [NextUI v2](https://nextui.org/)
- - [Tailwind CSS](https://tailwindcss.com/)
- - [Tailwind Variants](https://tailwind-variants.org)
- - [TypeScript](https://www.typescriptlang.org/)
- - [Framer Motion](https://www.framer.com/motion/)
- - [next-themes](https://github.com/pacocoursey/next-themes)
- ## How to Use
- ### Use the template with create-next-app
- To create a new project based on this template using `create-next-app`, run the following command:
- ```bash
- npx create-next-app -e https://github.com/nextui-org/next-app-template
- ```
- ### Install dependencies
- You can use one of `npm`, `yarn`, `pnpm`, or `bun`. Example using `npm`:
- ```bash
- npm install
- ```
+ ## Getting Started FAQ
+ ### Setting up the demo
+ 1. Clone this repo
+ 2. Navigate to the repo folder in your terminal
+ 3. Run `npm install` (assuming you have npm installed; if not, please follow these instructions: https://docs.npmjs.com/downloading-and-installing-node-js-and-npm/)
+ 4. Enter your HeyGen Enterprise API Token or Trial Token in the `.env` file. Replace `PLACEHOLDER-API-KEY` with your API key. This will allow the Client app to generate secure Access Tokens with which to create streaming sessions.
+ You can retrieve either the API Key or Trial Token by logging in to HeyGen and navigating to this page in your settings: [https://app.heygen.com/settings?nav=API]. NOTE: use the trial token if you don't have an enterprise API token yet.
+ 5. (Optional) If you would like to use the OpenAI features, enter your OpenAI API Key in the `.env` file.
+ 6. Run `npm run dev`
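The `.env` setup described in step 4 might end up looking like the sketch below. The `HEYGEN_API_KEY` variable name is an assumption for illustration (confirm it against the `.env` file bundled with the repo); the two OpenAI variable names are the ones this README mentions later.

```bash
# .env — sketch only; HEYGEN_API_KEY is an assumed variable name,
# confirm it against the .env file shipped with this repo.
HEYGEN_API_KEY=PLACEHOLDER-API-KEY

# Optional, only needed for the OpenAI (Whisper / ChatGPT) features:
OPENAI_API_KEY=PLACEHOLDER-API-KEY
NEXT_PUBLIC_OPENAI_API_KEY=PLACEHOLDER-API-KEY
```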
- ### Run the development server
- ```bash
- npm run dev
- ```
- ### Setup pnpm (optional)
- If you are using `pnpm`, you need to add the following code to your `.npmrc` file:
- ```bash
- public-hoist-pattern[]=*@nextui-org/*
- ```
- After modifying the `.npmrc` file, you need to run `pnpm install` again to ensure that the dependencies are installed correctly.
- ## License
- Licensed under the [MIT license](https://github.com/nextui-org/next-app-template/blob/main/LICENSE).
+ ### Difference between Trial Token and Enterprise API Token
+ The HeyGen Trial Token is available to all users, not just Enterprise users, and allows for testing of the Streaming API, as well as other HeyGen API endpoints.
+ Each Trial Token is limited to 3 concurrent streaming sessions. However, every streaming session you create with the Trial Token is free of charge, no matter how many tasks are sent to the avatar. Please note that streaming sessions will automatically close after 10 minutes with no tasks sent.
+ If you do not 'close' the streaming sessions and try to open more than 3, you will encounter errors, including stuttering and freezing of the Streaming Avatar. Please endeavor to have only 3 sessions open at any time while you are testing the Streaming Avatar API with your Trial Token.
+ ### Starting sessions
+ NOTE: Make sure you have entered your token into the `.env` file and run `npm run dev`.
+ To start your 'session' with a Streaming Avatar, first click the 'start' button. If your HeyGen API key is entered into the Server's `.env` file, then you should see our demo Streaming Avatar (Monica!) appear.
+ After you see Monica appear on the screen, you can enter text into the input labeled 'Repeat' and then hit Enter. The Streaming Avatar will say the text you enter.
+ If you want to see a different Avatar or try a different voice, you can close the session, enter the IDs, and then 'start' the session again. Please see below for information on where to retrieve different Avatar and voice IDs that you can use.
+ ### Connecting to OpenAI
+ A common use case for a Streaming Avatar is to use it as the 'face' of an LLM that users can interact with. In this demo we have included functionality to showcase this by both accepting user input via voice (using OpenAI's Whisper library) and also sending that input to an OpenAI LLM model (using their Chat Completions endpoint).
+ Both of these features of this demo require an OpenAI API Key. If you do not have a paid OpenAI account, you can learn more on their website: [https://openai.com/index/openai-api/]
+ Without an OpenAI API Key, this functionality will not work, and the Streaming Avatar will only be able to repeat text input that you provide, rather than acting as the 'face' of an LLM. Regardless, this demo is meant to demonstrate what kinds of apps and experiences you can build with our Streaming Avatar SDK, so you can code your own connection to a different LLM if you so choose.
+ To add your OpenAI API Key, copy it into the `OPENAI_API_KEY` and `NEXT_PUBLIC_OPENAI_API_KEY` variables in the `.env` file.
+ ### How does the integration with OpenAI / ChatGPT work?
+ In this demo, we call the Chat Completions API from OpenAI to generate responses to user input. You can see the relevant code in `components/StreamingAvatar.tsx`.
+ In the `initialMessages` parameter, you can replace the content of the 'system' message with whatever 'knowledge base' or context you would like the GPT-4o model to use when replying to the user's input.
+ You can explore this API and the different parameters and models available here: [https://platform.openai.com/docs/guides/text-generation/chat-completions-api]
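To make the system-message customization above concrete, here is a minimal sketch of the request shape the Chat Completions endpoint expects. The `buildChatRequest` helper and the knowledge-base string are hypothetical, not part of the demo code; the demo itself wires this up through `useChat` in `components/StreamingAvatar.tsx`.

```typescript
// Sketch: shaping a Chat Completions request around a custom system message.
// buildChatRequest and the knowledge-base text are illustrative only.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function buildChatRequest(knowledgeBase: string, userInput: string) {
  const messages: ChatMessage[] = [
    // Replace this content with your own context / knowledge base.
    { role: "system", content: knowledgeBase },
    { role: "user", content: userInput },
  ];
  return { model: "gpt-4o", messages };
}

const request = buildChatRequest("You are a helpful assistant.", "Hello!");
console.log(request.messages[0].role); // "system"
```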
+ ### Which Avatars can I use with this project?
+ By default, there are several Public Avatars that can be used in Streaming (a.k.a. Streaming Avatars). You can find the Avatar IDs for these Public Avatars by navigating to [app.heygen.com/streaming-avatar](https://app.heygen.com/streaming-avatar), clicking 'Select Avatar', and copying the Avatar ID.
+ In order to use a private Avatar created under your own account in Streaming, it must be upgraded to be a Streaming Avatar. Only Finetune Instant Avatars and Studio Avatars can be upgraded to Streaming Avatars. This upgrade is a one-time fee and can be purchased by navigating to [app.heygen.com/streaming-avatar] and clicking 'Select Avatar'.
+ Please note that Photo Avatars are not compatible with Streaming and cannot be used.
+ ### Which voices can I use with my Streaming Avatar?
+ Most of HeyGen's AI Voices can be used with the Streaming API. To find the Voice IDs that you can use, please use the List Voices v2 endpoint from HeyGen: [https://docs.heygen.com/reference/list-voices-v2]
+ Please note that for voices that support Emotions, such as Christine and Tarquin, you need to pass in the Emotion string in the Voice Setting parameter: [https://docs.heygen.com/reference/new-session-copy#voicesetting]
+ You can also set the speed at which the Streaming Avatar speaks by passing in a Rate in the Voice Setting.
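As a rough sketch of the emotion and rate options just described, a voice setting might look like the fragment below. The field names are assumptions for illustration — confirm the exact shape against the VoiceSetting reference linked above.

```json
{
  "voice": {
    "voice_id": "<id-from-list-voices-v2>",
    "rate": 1.0,
    "emotion": "Excited"
  }
}
```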
+ ### Where can I read more about enterprise-level usage of the Streaming API?
+ Please read our Streaming Avatar 101 article for more information on pricing and how to increase your concurrent session limit: https://help.heygen.com/en/articles/9182113-streaming-avatar-101-your-ultimate-guide

View File

@@ -48,7 +48,7 @@ export default function RootLayout({
<head />
<body className={clsx("min-h-screen bg-background antialiased")}>
<Providers themeProps={{ attribute: "class", defaultTheme: "dark" }}>
- <main className="relative flex flex-col h-screen">
+ <main className="relative flex flex-col h-screen w-screen">
<NavBar />
{children}
</main>

View File

@@ -1,6 +1,5 @@
"use client";
import NavBar from "@/components/NavBar";
import StreamingAvatar from "@/components/StreamingAvatar";
import StreamingAvatarCode from "@/components/StreamingAvatarCode";
import { Tab, Tabs } from "@nextui-org/react";

View File

@@ -1,3 +1,5 @@
"use client";
import {
Link,
Navbar,
@@ -10,7 +12,7 @@ import { ThemeSwitch } from "./ThemeSwitch";
export default function NavBar() {
return (
- <Navbar>
+ <Navbar className="w-full">
<NavbarBrand>
<Link isExternal aria-label="HeyGen" href="https://app.heygen.com/">
<HeyGenLogo />
@@ -22,21 +24,39 @@ export default function NavBar() {
</div>
</NavbarBrand>
<NavbarContent justify="center">
- <NavbarItem className="flex flex-row items-center gap-10">
+ <NavbarItem className="flex flex-row items-center gap-4">
<Link
color="foreground"
href="https://app.heygen.com/streaming-avatar"
>
Avatars
</Link>
<Link
color="foreground"
href="https://docs.heygen.com/reference/list-voices-v2"
>
Voices
</Link>
<Link
color="foreground"
href="https://docs.heygen.com/reference/new-session-copy"
>
API Docs
</Link>
<Link
color="foreground"
href="https://help.heygen.com/en/articles/9182113-streaming-avatar-101-your-ultimate-guide"
>
Guide
</Link>
<Link
isExternal
aria-label="Github"
href="https://github.com/HeyGen-Official/StreamingAvatarSDK"
- className="flex flex-row justify-center gap-2 text-foreground"
+ className="flex flex-row justify-center gap-1 text-foreground"
>
<GithubIcon className="text-default-500" />
- SDK Github
+ SDK
</Link>
<ThemeSwitch />
</NavbarItem>

View File

@@ -13,18 +13,12 @@ import {
Spinner,
Tooltip,
} from "@nextui-org/react";
+ import { Microphone, MicrophoneStage } from "@phosphor-icons/react";
import { useChat } from "ai/react";
+ import clsx from "clsx";
import OpenAI from "openai";
import { useEffect, useRef, useState } from "react";
import StreamingAvatarTextInput from "./StreamingAvatarTextInput";
- import {
-   Camera,
-   Microphone,
-   MicrophoneSlash,
-   MicrophoneStage,
-   Record,
- } from "@phosphor-icons/react";
- import clsx from "clsx";
const openai = new OpenAI({
apiKey: process.env.NEXT_PUBLIC_OPENAI_API_KEY,
@@ -32,7 +26,9 @@ const openai = new OpenAI({
});
export default function StreamingAvatar() {
- const [loading, setLoading] = useState(false);
+ const [isLoadingSession, setIsLoadingSession] = useState(false);
+ const [isLoadingRepeat, setIsLoadingRepeat] = useState(false);
+ const [isLoadingChat, setIsLoadingChat] = useState(false);
const [stream, setStream] = useState<MediaStream>();
const [debug, setDebug] = useState<string>();
const [avatarId, setAvatarId] = useState<string>("");
@@ -45,7 +41,7 @@ export default function StreamingAvatar() {
const avatar = useRef<StreamingAvatarApi | null>(null);
const mediaRecorder = useRef<MediaRecorder | null>(null);
const audioChunks = useRef<Blob[]>([]);
- const { input, setInput, isLoading, handleSubmit } = useChat({
+ const { input, setInput, handleSubmit } = useChat({
onFinish: async (message) => {
console.log("ChatGPT Response:", message);
@@ -62,7 +58,15 @@ export default function StreamingAvatar() {
.catch((e) => {
setDebug(e.message);
});
setIsLoadingChat(false);
},
initialMessages: [
{
id: "1",
role: "system",
content: "You are a helpful assistant.",
},
],
});
async function fetchAccessToken() {
@@ -80,7 +84,7 @@ export default function StreamingAvatar() {
}
async function start() {
- setLoading(true);
+ setIsLoadingSession(true);
await updateToken();
if (!avatar.current) {
setDebug("Avatar API is not initialized");
@@ -99,7 +103,7 @@ export default function StreamingAvatar() {
);
setData(res);
setStream(avatar.current.mediaStream);
- setLoading(false);
+ setIsLoadingSession(false);
} catch (error) {
console.error("Error starting avatar session:", error);
}
@@ -136,9 +140,11 @@ export default function StreamingAvatar() {
{ stopSessionRequest: { sessionId: data?.sessionId } },
setDebug
);
setStream(undefined);
}
async function handleSpeak() {
setIsLoadingRepeat(true);
if (!initialized || !avatar.current) {
setDebug("Avatar API not initialized");
return;
@@ -148,6 +154,7 @@ export default function StreamingAvatar() {
.catch((e) => {
setDebug(e.message);
});
setIsLoadingRepeat(false);
}
useEffect(() => {
@@ -242,8 +249,16 @@ export default function StreamingAvatar() {
>
<track kind="captions" />
</video>
<Button
size="md"
onClick={stop}
className="bg-gradient-to-tr from-indigo-500 to-indigo-300 absolute bottom-3 right-3 text-white rounded-lg"
variant="shadow"
>
Stop session
</Button>
</div>
- ) : !loading ? (
+ ) : !isLoadingSession ? (
<div className="h-full justify-center items-center flex flex-col gap-4 w-96 self-center">
<Input
value={avatarId}
@@ -277,12 +292,14 @@ export default function StreamingAvatar() {
onSubmit={handleSpeak}
setInput={setText}
disabled={!stream}
loading={isLoadingRepeat}
/>
<StreamingAvatarTextInput
label="Chat"
placeholder="Chat with the avatar (uses ChatGPT)"
input={input}
onSubmit={() => {
setIsLoadingChat(true);
if (!input) {
setDebug("Please enter text to send to ChatGPT");
return;
@@ -290,6 +307,7 @@ export default function StreamingAvatar() {
handleSubmit();
}}
setInput={setInput}
loading={isLoadingChat}
endContent={
<Tooltip
content={!recording ? "Start recording" : "Stop recording"}

View File

@@ -1,4 +1,4 @@
- import { Input, Tooltip } from "@nextui-org/react";
+ import { Input, Spinner, Tooltip } from "@nextui-org/react";
import { Airplane, ArrowRight, PaperPlaneRight } from "@phosphor-icons/react";
import clsx from "clsx";
@@ -10,6 +10,7 @@ interface StreamingAvatarTextInputProps {
setInput: (value: string) => void;
endContent?: React.ReactNode;
disabled?: boolean;
loading?: boolean;
}
export default function StreamingAvatarTextInput({
@@ -20,6 +21,7 @@ export default function StreamingAvatarTextInput({
setInput,
endContent,
disabled = false,
loading = false,
}: StreamingAvatarTextInputProps) {
function handleSubmit() {
if (input.trim() === "") {
@@ -35,19 +37,27 @@ export default function StreamingAvatarTextInput({
<div className="flex flex-row items-center h-full">
{endContent}
<Tooltip content="Send message">
- <button
-   type="submit"
-   className="focus:outline-none"
-   onClick={handleSubmit}
- >
-   <PaperPlaneRight
-     className={clsx(
-       "text-indigo-300 hover:text-indigo-200",
-       disabled && "opacity-50"
-     )}
-     size={24}
-   />
- </button>
+ {loading ? (
+   <Spinner
+     className="text-indigo-300 hover:text-indigo-200"
+     size="sm"
+     color="default"
+   />
+ ) : (
+   <button
+     type="submit"
+     className="focus:outline-none"
+     onClick={handleSubmit}
+   >
+     <PaperPlaneRight
+       className={clsx(
+         "text-indigo-300 hover:text-indigo-200",
+         disabled && "opacity-50"
+       )}
+       size={24}
+     />
+   </button>
+ )}
</Tooltip>
</div>
}

View File

@@ -3,7 +3,7 @@
"version": "0.0.1",
"private": true,
"scripts": {
"dev": "next dev -p 3234",
"dev": "next dev --turbo",
"build": "next build",
"start": "next start",
"lint": "eslint . --ext .ts,.tsx -c .eslintrc.json --fix"