# Compare commits

17 commits: `feat/inter...` → `feat/task-...`
Commits (short SHAs; author and date were not captured):

- 9cce0600e5
- 274a307e83
- 03ef24b031
- 21f6c6d468
- d7a7e3174c
- e653fa74c4
- 5dd784d63e
- efb98f612b
- befb6228f5
- 2454a4729d
- 935b10279b
- 052c2b3ad1
- b0a98ea95e
- ab85e604ef
- 5d0cf3821c
- 47522ddc97
- 03aa74fb3b
### `.gitignore` (vendored, 1 change)

```diff
@@ -33,3 +33,4 @@ yarn-error.log*
 # typescript
 *.tsbuildinfo
 next-env.d.ts
+.idea
```
### `README.md` (40 changes)

```diff
@@ -1,6 +1,6 @@
-# HeyGen Streaming Avatar NextJS Demo
+# HeyGen Interactive Avatar NextJS Demo

 [screenshot]

 This is a sample project and was bootstrapped using [NextJS](https://nextjs.org/).
 Feel free to play around with the existing code and please leave any feedback for the SDK [here](https://github.com/HeyGen-Official/StreamingAvatarSDK/discussions).
@@ -15,7 +15,7 @@ Feel free to play around with the existing code and please leave any feedback fo

 3. Run `npm install` (assuming you have npm installed. If not, please follow these instructions: https://docs.npmjs.com/downloading-and-installing-node-js-and-npm/)

-4. Enter your HeyGen Enterprise API Token or Trial Token in the `.env` file. Replace `PLACEHOLDER-API-KEY` with your API key. This will allow the Client app to generate secure Access Tokens with which to create streaming sessions.
+4. Enter your HeyGen Enterprise API Token or Trial Token in the `.env` file. Replace `HEYGEN_API_KEY` with your API key. This will allow the Client app to generate secure Access Tokens with which to create interactive sessions.

 You can retrieve either the API Key or Trial Token by logging in to HeyGen and navigating to this page in your settings: [https://app.heygen.com/settings?nav=API]. NOTE: use the trial token if you don't have an enterprise API token yet.
@@ -25,35 +25,35 @@ Feel free to play around with the existing code and please leave any feedback fo

 ### Difference between Trial Token and Enterprise API Token

-The HeyGen Trial Token is available to all users, not just Enterprise users, and allows for testing of the Streaming API, as well as other HeyGen API endpoints.
+The HeyGen Trial Token is available to all users, not just Enterprise users, and allows for testing of the Interactive Avatar API, as well as other HeyGen API endpoints.

-Each Trial Token is limited to 3 concurrent streaming sessions. However, every streaming session you create with the Trial Token is free of charge, no matter how many tasks are sent to the avatar. Please note that streaming sessions will automatically close after 10 minutes of no tasks sent.
+Each Trial Token is limited to 3 concurrent interactive sessions. However, every interactive session you create with the Trial Token is free of charge, no matter how many tasks are sent to the avatar. Please note that interactive sessions will automatically close after 10 minutes of no tasks sent.

-If you do not 'close' the streaming sessions and try to open more than 3, you will encounter errors including stuttering and freezing of the Streaming Avatar. Please endeavor to only have 3 sessions open at any time while you are testing the Streaming Avatar API with your Trial Token.
+If you do not 'close' the interactive sessions and try to open more than 3, you will encounter errors including stuttering and freezing of the Interactive Avatar. Please endeavor to only have 3 sessions open at any time while you are testing the Interactive Avatar API with your Trial Token.

 ### Starting sessions

 NOTE: Make sure you have enter your token into the `.env` file and run `npm run dev`.

-To start your 'session' with a Streaming Avatar, first click the 'start' button. If your HeyGen API key is entered into the Server's .env file, then you should see our demo Streaming Avatar (Monica!) appear.
+To start your 'session' with a Interactive Avatar, first click the 'start' button. If your HeyGen API key is entered into the Server's .env file, then you should see our demo Interactive Avatar (Monica!) appear.

-After you see Monica appear on the screen, you can enter text into the input labeled 'Repeat', and then hit Enter. The Streaming Avatar will say the text you enter.
+After you see Monica appear on the screen, you can enter text into the input labeled 'Repeat', and then hit Enter. The Interactive Avatar will say the text you enter.

 If you want to see a different Avatar or try a different voice, you can close the session and enter the IDs and then 'start' the session again. Please see below for information on where to retrieve different Avatar and voice IDs that you can use.

 ### Connecting to OpenAI

-A common use case for a Streaming Avatar is to use it as the 'face' of an LLM that users can interact with. In this demo we have included functionality to showcase this by both accepting user input via voice (using OpenAI's Whisper library) and also sending that input to an OpenAI LLM model (using their Chat Completions endpoint).
+A common use case for a Interactive Avatar is to use it as the 'face' of an LLM that users can interact with. In this demo we have included functionality to showcase this by both accepting user input via voice (using OpenAI's Whisper library) and also sending that input to an OpenAI LLM model (using their Chat Completions endpoint).

 Both of these features of this demo require an OpenAI API Key. If you do not have a paid OpenAI account, you can learn more on their website: [https://openai.com/index/openai-api/]

-Without an OpenAI API Key, this functionality will not work, and the Streaming Avatar will only be able to repeat text input that you provide, and not demonstrate being the 'face' of an LLM. Regardless, this demo is meant to demonstrate what kinds of apps and experiences you can build with our Streaming Avatar SDK, so you can code your own connection to a different LLM if you so choose.
+Without an OpenAI API Key, this functionality will not work, and the Interactive Avatar will only be able to repeat text input that you provide, and not demonstrate being the 'face' of an LLM. Regardless, this demo is meant to demonstrate what kinds of apps and experiences you can build with our Interactive Avatar SDK, so you can code your own connection to a different LLM if you so choose.

 To add your Open AI API Key, fill copy it to the `OPENAI_API_KEY` and `NEXT_PUBLIC_OPENAI_API_KEY` variables in the `.env` file.

 ### How does the integration with OpenAI / ChatGPT work?

-In this demo, we are calling the Chat Completions API from OpenAI in order to come up with some response to user input. You can see the relevant code in components/StreamingAvatar.tsx.
+In this demo, we are calling the Chat Completions API from OpenAI in order to come up with some response to user input. You can see the relevant code in components/InteractiveAvatar.tsx.

 In the initialMessages parameter, you can replace the content of the 'system' message with whatever 'knowledge base' or context that you would like the GPT-4o model to reply to the user's input with.

@@ -61,20 +61,12 @@ You can explore this API and the different parameters and models available here:

 ### Which Avatars can I use with this project?

-By default, there are several Public Avatars that can be used in Streaming. (AKA Streaming Avatars.) You can find the Avatar IDs for these Public Avatars by navigating to [app.heygen.com/streaming-avatar](https://app.heygen.com/streaming-avatar) and clicking 'Select Avatar' and copying the avatar id.
+By default, there are several Public Avatars that can be used in Interactive Avatar. (AKA Interactive Avatars.) You can find the Avatar IDs for these Public Avatars by navigating to [app.heygen.com/interactive-avatar](https://app.heygen.com/interactive-avatar) and clicking 'Select Avatar' and copying the avatar id.

-In order to use a private Avatar created under your own account in Streaming, it must be upgraded to be a Streaming Avatar. Only 1. Finetune Instant Avatars and 2. Studio Avatars are able to be upgraded to Streaming Avatars. This upgrade is a one-time fee and can be purchased by navigating to [app.heygen.com/streaming-avatar] and clicking 'Select Avatar'.
+In order to use a private Avatar created under your own account in Interactive Avatar, it must be upgraded to be a Interactive Avatar. Only 1. Finetune Instant Avatars and 2. Studio Avatars are able to be upgraded to Interactive Avatars. This upgrade is a one-time fee and can be purchased by navigating to [app.heygen.com/interactive-avatar] and clicking 'Select Avatar'.

-Please note that Photo Avatars are not compatible with Streaming and cannot be used.
+Please note that Photo Avatars are not compatible with Interactive Avatar and cannot be used.

-### Which voices can I use with my Streaming Avatar?
-
-Most of HeyGen's AI Voices can be used with the Streaming API. To find the Voice IDs that you can use, please use the List Voices v2 endpoint from HeyGen: [https://docs.heygen.com/reference/list-voices-v2]
-
-Please note that for voices that support Emotions, such as Christine and Tarquin, you need to pass in the Emotion string in the Voice Setting parameter: [https://docs.heygen.com/reference/new-session-copy#voicesetting]
-
-You can also set the speed at which the Streaming Avatar speaks by passing in a Rate in the Voice Setting.
-
-### Where can I read more about enterprise-level usage of the Streaming API?
+### Where can I read more about enterprise-level usage of the Interactive Avatar API?

-Please read our Streaming Avatar 101 article for more information on pricing and how to increase your concurrent session limit: https://help.heygen.com/en/articles/9182113-streaming-avatar-101-your-ultimate-guide
+Please read our Interactive Avatar 101 article for more information on pricing and how to increase your concurrent session limit: https://help.heygen.com/en/articles/9182113-interactive-avatar-101-your-ultimate-guide
```
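The README's note about `initialMessages` says the 'system' message carries the knowledge base the LLM answers from. As a minimal sketch (the `ChatMessage` type and `buildMessages` helper are illustrative, not the demo's exact code):

```typescript
// Sketch of a Chat Completions message list where the 'system' entry is the
// 'knowledge base' / context the README describes.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function buildMessages(context: string, userInput: string): ChatMessage[] {
  return [
    { role: "system", content: context }, // replace with your own context
    { role: "user", content: userInput },
  ];
}

const messages = buildMessages("You are a helpful assistant.", "Hello!");
console.log(messages.length); // → 2
```

The resulting array is what would be passed as the request's messages; swapping the `context` string changes what the model grounds its replies in.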
### API route handler (file name not captured)

```diff
@@ -13,7 +13,7 @@ export async function POST() {
       headers: {
         "x-api-key": HEYGEN_API_KEY,
       },
-    }
+    },
   );
   const data = await res.json();
```
### Root layout metadata (file name not captured)

```diff
@@ -19,8 +19,8 @@ const fontMono = FontMono({

 export const metadata: Metadata = {
   title: {
-    default: "HeyGen Streaming Avatar SDK Demo",
-    template: `%s - HeyGen Streaming Avatar SDK Demo`,
+    default: "HeyGen Interactive Avatar SDK Demo",
+    template: `%s - HeyGen Interactive Avatar SDK Demo`,
   },
   icons: {
     icon: "/heygen-logo.png",
```
### `app/lib/constants.ts` (new file, 53 lines)

```ts
export const AVATARS = [
  {
    avatar_id: "Eric_public_pro2_20230608",
    name: "Edward in Blue Shirt",
  },
  {
    avatar_id: "Tyler-incasualsuit-20220721",
    name: "Tyler in Casual Suit",
  },
  {
    avatar_id: "Anna_public_3_20240108",
    name: "Anna in Brown T-shirt",
  },
  {
    avatar_id: "Susan_public_2_20240328",
    name: "Susan in Black Shirt",
  },
  {
    avatar_id: "josh_lite3_20230714",
    name: "Joshua Heygen CEO",
  },
];

export const STT_LANGUAGE_LIST = [
  { label: 'Bulgarian', value: 'bg', key: 'bg' },
  { label: 'Chinese', value: 'zh', key: 'zh' },
  { label: 'Czech', value: 'cs', key: 'cs' },
  { label: 'Danish', value: 'da', key: 'da' },
  { label: 'Dutch', value: 'nl', key: 'nl' },
  { label: 'English', value: 'en', key: 'en' },
  { label: 'Finnish', value: 'fi', key: 'fi' },
  { label: 'French', value: 'fr', key: 'fr' },
  { label: 'German', value: 'de', key: 'de' },
  { label: 'Greek', value: 'el', key: 'el' },
  { label: 'Hindi', value: 'hi', key: 'hi' },
  { label: 'Hungarian', value: 'hu', key: 'hu' },
  { label: 'Indonesian', value: 'id', key: 'id' },
  { label: 'Italian', value: 'it', key: 'it' },
  { label: 'Japanese', value: 'ja', key: 'ja' },
  { label: 'Korean', value: 'ko', key: 'ko' },
  { label: 'Malay', value: 'ms', key: 'ms' },
  { label: 'Norwegian', value: 'no', key: 'no' },
  { label: 'Polish', value: 'pl', key: 'pl' },
  { label: 'Portuguese', value: 'pt', key: 'pt' },
  { label: 'Romanian', value: 'ro', key: 'ro' },
  { label: 'Russian', value: 'ru', key: 'ru' },
  { label: 'Slovak', value: 'sk', key: 'sk' },
  { label: 'Spanish', value: 'es', key: 'es' },
  { label: 'Swedish', value: 'sv', key: 'sv' },
  { label: 'Turkish', value: 'tr', key: 'tr' },
  { label: 'Ukrainian', value: 'uk', key: 'uk' },
  { label: 'Vietnamese', value: 'vi', key: 'vi' },
];
```
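The new constants file pairs ISO language codes with display labels. A small helper (hypothetical, not part of the diff) shows how a UI might resolve a label from a stored code, falling back to the raw code for unknown entries:

```typescript
// Subset of STT_LANGUAGE_LIST from app/lib/constants.ts.
const STT_LANGUAGE_LIST = [
  { label: "English", value: "en", key: "en" },
  { label: "French", value: "fr", key: "fr" },
  { label: "Japanese", value: "ja", key: "ja" },
];

// Hypothetical helper: resolve a display label, or fall back to the code.
function labelFor(code: string): string {
  return STT_LANGUAGE_LIST.find((l) => l.value === code)?.label ?? code;
}

console.log(labelFor("fr")); // → French
```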
### `app/page.tsx` (25 changes)

```diff
@@ -1,34 +1,13 @@
 "use client";

-import StreamingAvatar from "@/components/StreamingAvatar";
-import StreamingAvatarCode from "@/components/StreamingAvatarCode";
-import { Tab, Tabs } from "@nextui-org/react";
+import InteractiveAvatar from "@/components/InteractiveAvatar";

 export default function App() {
-  const tabs = [
-    {
-      id: "demo",
-      label: "Demo",
-      content: <StreamingAvatar />,
-    },
-    {
-      id: "code",
-      label: "Code",
-      content: <StreamingAvatarCode />,
-    },
-  ];
-
   return (
     <div className="w-screen h-screen flex flex-col">
       <div className="w-[900px] flex flex-col items-start justify-start gap-5 mx-auto pt-4 pb-20">
         <div className="w-full">
-          <Tabs items={tabs}>
-            {(items) => (
-              <Tab key={items.id} title={items.label}>
-                {items.content}
-              </Tab>
-            )}
-          </Tabs>
+          <InteractiveAvatar />
         </div>
       </div>
     </div>
```
### `components/InteractiveAvatar.tsx` (new file, 328 lines)

```tsx
import type { StartAvatarResponse } from "@heygen/streaming-avatar";

import StreamingAvatar, {
  AvatarQuality,
  StreamingEvents, TaskMode, TaskType, VoiceEmotion,
} from "@heygen/streaming-avatar";
import {
  Button,
  Card,
  CardBody,
  CardFooter,
  Divider,
  Input,
  Select,
  SelectItem,
  Spinner,
  Chip,
  Tabs,
  Tab,
} from "@nextui-org/react";
import { useEffect, useRef, useState } from "react";
import { useMemoizedFn, usePrevious } from "ahooks";

import InteractiveAvatarTextInput from "./InteractiveAvatarTextInput";

import {AVATARS, STT_LANGUAGE_LIST} from "@/app/lib/constants";

export default function InteractiveAvatar() {
  const [isLoadingSession, setIsLoadingSession] = useState(false);
  const [isLoadingRepeat, setIsLoadingRepeat] = useState(false);
  const [stream, setStream] = useState<MediaStream>();
  const [debug, setDebug] = useState<string>();
  const [knowledgeId, setKnowledgeId] = useState<string>("");
  const [avatarId, setAvatarId] = useState<string>("");
  const [language, setLanguage] = useState<string>('en');

  const [data, setData] = useState<StartAvatarResponse>();
  const [text, setText] = useState<string>("");
  const mediaStream = useRef<HTMLVideoElement>(null);
  const avatar = useRef<StreamingAvatar | null>(null);
  const [chatMode, setChatMode] = useState("text_mode");
  const [isUserTalking, setIsUserTalking] = useState(false);

  async function fetchAccessToken() {
    try {
      const response = await fetch("/api/get-access-token", {
        method: "POST",
      });
      const token = await response.text();

      console.log("Access Token:", token); // Log the token to verify

      return token;
    } catch (error) {
      console.error("Error fetching access token:", error);
    }

    return "";
  }

  async function startSession() {
    setIsLoadingSession(true);
    const newToken = await fetchAccessToken();

    avatar.current = new StreamingAvatar({
      token: newToken,
    });
    avatar.current.on(StreamingEvents.AVATAR_START_TALKING, (e) => {
      console.log("Avatar started talking", e);
    });
    avatar.current.on(StreamingEvents.AVATAR_STOP_TALKING, (e) => {
      console.log("Avatar stopped talking", e);
    });
    avatar.current.on(StreamingEvents.STREAM_DISCONNECTED, () => {
      console.log("Stream disconnected");
      endSession();
    });
    avatar.current?.on(StreamingEvents.STREAM_READY, (event) => {
      console.log(">>>>> Stream ready:", event.detail);
      setStream(event.detail);
    });
    avatar.current?.on(StreamingEvents.USER_START, (event) => {
      console.log(">>>>> User started talking:", event);
      setIsUserTalking(true);
    });
    avatar.current?.on(StreamingEvents.USER_STOP, (event) => {
      console.log(">>>>> User stopped talking:", event);
      setIsUserTalking(false);
    });
    try {
      const res = await avatar.current.createStartAvatar({
        quality: AvatarQuality.Low,
        avatarName: avatarId,
        knowledgeId: knowledgeId, // Or use a custom `knowledgeBase`.
        voice: {
          rate: 1.5, // 0.5 ~ 1.5
          emotion: VoiceEmotion.EXCITED,
        },
        language: language,
      });

      setData(res);
      // default to voice mode
      await avatar.current?.startVoiceChat();
      setChatMode("voice_mode");
    } catch (error) {
      console.error("Error starting avatar session:", error);
    } finally {
      setIsLoadingSession(false);
    }
  }

  async function handleSpeak() {
    setIsLoadingRepeat(true);
    if (!avatar.current) {
      setDebug("Avatar API not initialized");

      return;
    }
    // speak({ text: text, task_type: TaskType.REPEAT })
    await avatar.current.speak({ text: text, taskType: TaskType.REPEAT, taskMode: TaskMode.SYNC }).catch((e) => {
      setDebug(e.message);
    });
    setIsLoadingRepeat(false);
  }

  async function handleInterrupt() {
    if (!avatar.current) {
      setDebug("Avatar API not initialized");

      return;
    }
    await avatar.current
      .interrupt()
      .catch((e) => {
        setDebug(e.message);
      });
  }

  async function endSession() {
    await avatar.current?.stopAvatar();
    setStream(undefined);
  }

  const handleChangeChatMode = useMemoizedFn(async (v) => {
    if (v === chatMode) {
      return;
    }
    if (v === "text_mode") {
      avatar.current?.closeVoiceChat();
    } else {
      await avatar.current?.startVoiceChat();
    }
    setChatMode(v);
  });

  const previousText = usePrevious(text);
  useEffect(() => {
    if (!previousText && text) {
      avatar.current?.startListening();
    } else if (previousText && !text) {
      avatar?.current?.stopListening();
    }
  }, [text, previousText]);

  useEffect(() => {
    return () => {
      endSession();
    };
  }, []);

  useEffect(() => {
    if (stream && mediaStream.current) {
      mediaStream.current.srcObject = stream;
      mediaStream.current.onloadedmetadata = () => {
        mediaStream.current!.play();
        setDebug("Playing");
      };
    }
  }, [mediaStream, stream]);

  return (
    <div className="w-full flex flex-col gap-4">
      <Card>
        <CardBody className="h-[500px] flex flex-col justify-center items-center">
          {stream ? (
            <div className="h-[500px] w-[900px] justify-center items-center flex rounded-lg overflow-hidden">
              <video
                ref={mediaStream}
                autoPlay
                playsInline
                style={{
                  width: "100%",
                  height: "100%",
                  objectFit: "contain",
                }}
              >
                <track kind="captions" />
              </video>
              <div className="flex flex-col gap-2 absolute bottom-3 right-3">
                <Button
                  className="bg-gradient-to-tr from-indigo-500 to-indigo-300 text-white rounded-lg"
                  size="md"
                  variant="shadow"
                  onClick={handleInterrupt}
                >
                  Interrupt task
                </Button>
                <Button
                  className="bg-gradient-to-tr from-indigo-500 to-indigo-300 text-white rounded-lg"
                  size="md"
                  variant="shadow"
                  onClick={endSession}
                >
                  End session
                </Button>
              </div>
            </div>
          ) : !isLoadingSession ? (
            <div className="h-full justify-center items-center flex flex-col gap-8 w-[500px] self-center">
              <div className="flex flex-col gap-2 w-full">
                <p className="text-sm font-medium leading-none">
                  Custom Knowledge ID (optional)
                </p>
                <Input
                  placeholder="Enter a custom knowledge ID"
                  value={knowledgeId}
                  onChange={(e) => setKnowledgeId(e.target.value)}
                />
                <p className="text-sm font-medium leading-none">
                  Custom Avatar ID (optional)
                </p>
                <Input
                  placeholder="Enter a custom avatar ID"
                  value={avatarId}
                  onChange={(e) => setAvatarId(e.target.value)}
                />
                <Select
                  placeholder="Or select one from these example avatars"
                  size="md"
                  onChange={(e) => {
                    setAvatarId(e.target.value);
                  }}
                >
                  {AVATARS.map((avatar) => (
                    <SelectItem
                      key={avatar.avatar_id}
                      textValue={avatar.avatar_id}
                    >
                      {avatar.name}
                    </SelectItem>
                  ))}
                </Select>
                <Select
                  label="Select language"
                  placeholder="Select language"
                  className="max-w-xs"
                  selectedKeys={[language]}
                  onChange={(e) => {
                    setLanguage(e.target.value);
                  }}
                >
                  {STT_LANGUAGE_LIST.map((lang) => (
                    <SelectItem key={lang.key}>
                      {lang.label}
                    </SelectItem>
                  ))}
                </Select>
              </div>
              <Button
                className="bg-gradient-to-tr from-indigo-500 to-indigo-300 w-full text-white"
                size="md"
                variant="shadow"
                onClick={startSession}
              >
                Start session
              </Button>
            </div>
          ) : (
            <Spinner color="default" size="lg" />
          )}
        </CardBody>
        <Divider />
        <CardFooter className="flex flex-col gap-3 relative">
          <Tabs
            aria-label="Options"
            selectedKey={chatMode}
            onSelectionChange={(v) => {
              handleChangeChatMode(v);
            }}
          >
            <Tab key="text_mode" title="Text mode" />
            <Tab key="voice_mode" title="Voice mode" />
          </Tabs>
          {chatMode === "text_mode" ? (
            <div className="w-full flex relative">
              <InteractiveAvatarTextInput
                disabled={!stream}
                input={text}
                label="Chat"
                loading={isLoadingRepeat}
                placeholder="Type something for the avatar to respond"
                setInput={setText}
                onSubmit={handleSpeak}
              />
              {text && (
                <Chip className="absolute right-16 top-3">Listening</Chip>
              )}
            </div>
          ) : (
            <div className="w-full text-center">
              <Button
                isDisabled={!isUserTalking}
                className="bg-gradient-to-tr from-indigo-500 to-indigo-300 text-white"
                size="md"
                variant="shadow"
              >
                {isUserTalking ? "Listening" : "Voice chat"}
              </Button>
            </div>
          )}
        </CardFooter>
      </Card>
      <p className="font-mono text-right">
        <span className="font-bold">Console:</span>
        <br />
        {debug}
      </p>
    </div>
  );
}
```
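The `handleChangeChatMode` callback in `InteractiveAvatar.tsx` only acts when the mode actually changes. Stripped of React and the SDK, the logic reduces to this sketch (the callbacks stand in for `closeVoiceChat`/`startVoiceChat`):

```typescript
type ChatMode = "text_mode" | "voice_mode";

// Sketch of the mode switch: no-op on the same mode, otherwise invoke the
// matching transition callback and return the new mode.
function changeChatMode(
  current: ChatMode,
  next: ChatMode,
  onText: () => void,  // stands in for avatar.closeVoiceChat()
  onVoice: () => void, // stands in for avatar.startVoiceChat()
): ChatMode {
  if (next === current) return current;
  if (next === "text_mode") onText();
  else onVoice();
  return next;
}
```

The early return matters: re-selecting the current tab must not re-open or close the voice channel mid-session.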
### `StreamingAvatarCode` component (file name not captured)

```diff
@@ -2,7 +2,7 @@ import { Card, CardBody } from "@nextui-org/react";
 import { langs } from "@uiw/codemirror-extensions-langs";
 import ReactCodeMirror from "@uiw/react-codemirror";

-export default function StreamingAvatarCode() {
+export default function InteractiveAvatarCode() {
   return (
     <div className="w-full flex flex-col gap-2">
       <p>This SDK supports the following behavior:</p>
@@ -10,13 +10,13 @@ export default function StreamingAvatarCode() {
       <li>
         <div className="flex flex-row gap-2">
           <p className="text-indigo-400 font-semibold">Start:</p> Start the
-          streaming avatar session
+          Interactive Avatar session
         </div>
       </li>
       <li>
         <div className="flex flex-row gap-2">
           <p className="text-indigo-400 font-semibold">Close:</p> Close the
-          streaming avatar session
+          Interactive Avatar session
         </div>
       </li>
       <li>
@@ -47,15 +47,15 @@ const TEXT = `
 const [stream, setStream] = useState<MediaStream>();
 const mediaStream = useRef<HTMLVideoElement>(null);

-// Instantiate the streaming avatar api using your access token
+// Instantiate the Interactive Avatar api using your access token
 const avatar = useRef(new StreamingAvatarApi(
   new Configuration({accessToken: '<REPLACE_WITH_ACCESS_TOKEN>'})
 ));

-// State holding streaming avatar session data
+// State holding Interactive Avatar session data
 const [sessionData, setSessionData] = useState<NewSessionData>();

-// Function to start the streaming avatar session
+// Function to start the Interactive Avatar session
 async function start(){
   const res = await avatar.current.createStartAvatar(
     { newSessionRequest:
@@ -70,7 +70,7 @@ const TEXT = `
   setSessionData(res);
 }

-// Function to stop the streaming avatar session
+// Function to stop the Interactive Avatar session
 async function stop(){
   await avatar.current.stopAvatar({stopSessionRequest: {sessionId: sessionData?.sessionId}});
 }
@@ -82,7 +82,7 @@ const TEXT = `
 }

 useEffect(()=>{
-  // Handles the display of the streaming avatar
+  // Handles the display of the Interactive Avatar
   if(stream && mediaStream.current){
     mediaStream.current.srcObject = stream;
     mediaStream.current.onloadedmetadata = () => {
@@ -95,4 +95,5 @@ const TEXT = `
   <div className="w-full">
     <video playsInline autoPlay width={500} ref={mediaStream}/>
   </div>
 )
 }`;
```
### Text-input component (file name not captured)

```diff
@@ -13,7 +13,7 @@ interface StreamingAvatarTextInputProps {
   loading?: boolean;
 }

-export default function StreamingAvatarTextInput({
+export default function InteractiveAvatarTextInput({
   label,
   placeholder,
   input,
```
### NavBar component (file name not captured)

```diff
@@ -19,7 +19,7 @@ export default function NavBar() {
         </Link>
         <div className="bg-gradient-to-br from-sky-300 to-indigo-500 bg-clip-text ml-4">
           <p className="text-xl font-semibold text-transparent">
-            HeyGen Streaming Avatar SDK NextJS Demo
+            HeyGen Interactive Avatar SDK NextJS Demo
           </p>
         </div>
       </NavbarBrand>
@@ -28,7 +28,7 @@ export default function NavBar() {
       <Link
         isExternal
         color="foreground"
-        href="https://app.heygen.com/streaming-avatar"
+        href="https://app.heygen.com/interactive-avatar"
       >
         Avatars
       </Link>
@@ -49,7 +49,7 @@ export default function NavBar() {
       <Link
         isExternal
         color="foreground"
-        href="https://help.heygen.com/en/articles/9182113-streaming-avatar-101-your-ultimate-guide"
+        href="https://help.heygen.com/en/articles/9182113-interactive-avatar-101-your-ultimate-guide"
       >
         Guide
       </Link>
```
@@ -1,350 +0,0 @@
|
||||
import {
|
||||
Configuration,
|
||||
NewSessionData,
|
||||
StreamingAvatarApi,
|
||||
} from "@heygen/streaming-avatar";
|
||||
import {
|
||||
Button,
|
||||
Card,
|
||||
CardBody,
|
||||
CardFooter,
|
||||
Divider,
|
||||
Input,
|
||||
Spinner,
|
||||
Tooltip,
|
||||
} from "@nextui-org/react";
|
||||
import { Microphone, MicrophoneStage } from "@phosphor-icons/react";
|
||||
import { useChat } from "ai/react";
|
||||
import clsx from "clsx";
|
||||
import OpenAI from "openai";
|
||||
import { useEffect, useRef, useState } from "react";
|
||||
import StreamingAvatarTextInput from "./StreamingAvatarTextInput";
|
||||
|
||||
const openai = new OpenAI({
|
||||
apiKey: process.env.NEXT_PUBLIC_OPENAI_API_KEY,
|
||||
dangerouslyAllowBrowser: true,
|
||||
});
|
||||
|
||||
export default function StreamingAvatar() {
|
||||
const [isLoadingSession, setIsLoadingSession] = useState(false);
|
||||
const [isLoadingRepeat, setIsLoadingRepeat] = useState(false);
|
||||
const [isLoadingChat, setIsLoadingChat] = useState(false);
|
||||
const [stream, setStream] = useState<MediaStream>();
|
||||
const [debug, setDebug] = useState<string>();
|
||||
const [avatarId, setAvatarId] = useState<string>("");
|
||||
const [voiceId, setVoiceId] = useState<string>("");
|
||||
const [data, setData] = useState<NewSessionData>();
|
||||
const [text, setText] = useState<string>("");
|
||||
const [initialized, setInitialized] = useState(false); // Track initialization
|
||||
const [recording, setRecording] = useState(false); // Track recording state
|
||||
const mediaStream = useRef<HTMLVideoElement>(null);
|
||||
const avatar = useRef<StreamingAvatarApi | null>(null);
|
||||
const mediaRecorder = useRef<MediaRecorder | null>(null);
|
||||
const audioChunks = useRef<Blob[]>([]);
|
||||
const { input, setInput, handleSubmit } = useChat({
|
||||
onFinish: async (message) => {
|
||||
console.log("ChatGPT Response:", message);
|
||||
|
||||
if (!initialized || !avatar.current) {
|
||||
setDebug("Avatar API not initialized");
|
||||
return;
|
||||
}
|
||||
|
||||
//send the ChatGPT response to the Streaming Avatar
|
||||
await avatar.current
|
||||
.speak({
|
||||
taskRequest: { text: message.content, sessionId: data?.sessionId },
|
||||
})
|
||||
.catch((e) => {
|
||||
setDebug(e.message);
|
||||
});
|
||||
setIsLoadingChat(false);
|
||||
},
|
||||
initialMessages: [
|
||||
{
|
||||
id: "1",
|
||||
role: "system",
|
||||
content: "You are a helpful assistant.",
|
||||
},
|
||||
],
|
||||
});
|
||||
|
||||
async function fetchAccessToken() {
|
||||
try {
|
||||
const response = await fetch("/api/get-access-token", {
|
||||
method: "POST",
|
||||
});
|
||||
const token = await response.text();
|
||||
console.log("Access Token:", token); // Log the token to verify
|
||||
return token;
|
||||
} catch (error) {
|
||||
console.error("Error fetching access token:", error);
|
||||
return "";
|
||||
}
|
||||
}
|
||||
|
||||
  async function startSession() {
    setIsLoadingSession(true);
    await updateToken();
    if (!avatar.current) {
      setDebug("Avatar API is not initialized");
      setIsLoadingSession(false);
      return;
    }
    try {
      const res = await avatar.current.createStartAvatar(
        {
          newSessionRequest: {
            quality: "low",
            avatarName: avatarId,
            voice: { voiceId: voiceId },
          },
        },
        setDebug
      );
      setData(res);
      setStream(avatar.current.mediaStream);
    } catch (error) {
      console.error("Error starting avatar session:", error);
    } finally {
      // Reset the loading flag on both success and failure,
      // so the spinner never gets stuck after an error.
      setIsLoadingSession(false);
    }
  }
  async function updateToken() {
    const newToken = await fetchAccessToken();
    console.log("Updating Access Token:", newToken); // Log token for debugging
    avatar.current = new StreamingAvatarApi(
      new Configuration({ accessToken: newToken })
    );

    const startTalkCallback = (e: any) => {
      console.log("Avatar started talking", e);
    };

    const stopTalkCallback = (e: any) => {
      console.log("Avatar stopped talking", e);
    };

    console.log("Adding event handlers:", avatar.current);
    avatar.current.addEventHandler("avatar_start_talking", startTalkCallback);
    avatar.current.addEventHandler("avatar_stop_talking", stopTalkCallback);

    setInitialized(true);
  }
  async function endSession() {
    if (!initialized || !avatar.current) {
      setDebug("Avatar API not initialized");
      return;
    }
    await avatar.current.stopAvatar(
      { stopSessionRequest: { sessionId: data?.sessionId } },
      setDebug
    );
    setStream(undefined);
  }
  async function handleSpeak() {
    // Check initialization before setting the loading flag,
    // so an early return cannot leave the button stuck in a loading state.
    if (!initialized || !avatar.current) {
      setDebug("Avatar API not initialized");
      return;
    }
    setIsLoadingRepeat(true);
    await avatar.current
      .speak({ taskRequest: { text: text, sessionId: data?.sessionId } })
      .catch((e) => {
        setDebug(e.message);
      });
    setIsLoadingRepeat(false);
  }
  useEffect(() => {
    async function init() {
      const newToken = await fetchAccessToken();
      console.log("Initializing with Access Token:", newToken); // Log token for debugging
      avatar.current = new StreamingAvatarApi(
        new Configuration({ accessToken: newToken, jitterBuffer: 200 })
      );
      setInitialized(true); // Set initialized to true
    }
    init();

    return () => {
      endSession();
    };
  }, []);
  useEffect(() => {
    if (stream && mediaStream.current) {
      mediaStream.current.srcObject = stream;
      mediaStream.current.onloadedmetadata = () => {
        mediaStream.current!.play();
        setDebug("Playing");
      };
    }
  }, [mediaStream, stream]);
  function startRecording() {
    navigator.mediaDevices
      .getUserMedia({ audio: true })
      .then((stream) => {
        mediaRecorder.current = new MediaRecorder(stream);
        mediaRecorder.current.ondataavailable = (event) => {
          audioChunks.current.push(event.data);
        };
        mediaRecorder.current.onstop = () => {
          const audioBlob = new Blob(audioChunks.current, {
            type: "audio/wav",
          });
          audioChunks.current = [];
          transcribeAudio(audioBlob);
        };
        mediaRecorder.current.start();
        setRecording(true);
      })
      .catch((error) => {
        console.error("Error accessing microphone:", error);
      });
  }
  function stopRecording() {
    if (mediaRecorder.current) {
      mediaRecorder.current.stop();
      setRecording(false);
    }
  }
  async function transcribeAudio(audioBlob: Blob) {
    try {
      // Convert the recorded Blob to a File for the Whisper API
      const audioFile = new File([audioBlob], "recording.wav", {
        type: "audio/wav",
      });
      const response = await openai.audio.transcriptions.create({
        model: "whisper-1",
        file: audioFile,
      });
      const transcription = response.text;
      console.log("Transcription:", transcription);
      setInput(transcription);
    } catch (error) {
      console.error("Error transcribing audio:", error);
    }
  }
  return (
    <div className="w-full flex flex-col gap-4">
      <Card>
        <CardBody className="h-[500px] flex flex-col justify-center items-center">
          {stream ? (
            <div className="h-[500px] w-[900px] justify-center items-center flex rounded-lg overflow-hidden">
              <video
                ref={mediaStream}
                autoPlay
                playsInline
                style={{
                  width: "100%",
                  height: "100%",
                  objectFit: "contain",
                }}
              >
                <track kind="captions" />
              </video>
              <Button
                size="md"
                onClick={endSession}
                className="bg-gradient-to-tr from-indigo-500 to-indigo-300 absolute bottom-3 right-3 text-white rounded-lg"
                variant="shadow"
              >
                End session
              </Button>
            </div>
          ) : !isLoadingSession ? (
            <div className="h-full justify-center items-center flex flex-col gap-4 w-96 self-center">
              <Input
                value={avatarId}
                onChange={(e) => setAvatarId(e.target.value)}
                placeholder="Custom Avatar ID (optional)"
              />
              <Input
                value={voiceId}
                onChange={(e) => setVoiceId(e.target.value)}
                placeholder="Custom Voice ID (optional)"
              />
              <Button
                size="md"
                onClick={startSession}
                className="bg-gradient-to-tr from-indigo-500 to-indigo-300 w-full text-white"
                variant="shadow"
              >
                Start session
              </Button>
            </div>
          ) : (
            <Spinner size="lg" color="default" />
          )}
        </CardBody>
        <Divider />
        <CardFooter className="flex flex-col gap-3">
          <StreamingAvatarTextInput
            label="Repeat"
            placeholder="Type something for the avatar to repeat"
            input={text}
            onSubmit={handleSpeak}
            setInput={setText}
            disabled={!stream}
            loading={isLoadingRepeat}
          />
          <StreamingAvatarTextInput
            label="Chat"
            placeholder="Chat with the avatar (uses ChatGPT)"
            input={input}
            onSubmit={() => {
              // Validate the input before setting the loading flag,
              // so an empty submission cannot leave the chat stuck loading.
              if (!input) {
                setDebug("Please enter text to send to ChatGPT");
                return;
              }
              setIsLoadingChat(true);
              handleSubmit();
            }}
            setInput={setInput}
            loading={isLoadingChat}
            endContent={
              <Tooltip
                content={!recording ? "Start recording" : "Stop recording"}
              >
                <Button
                  onClick={!recording ? startRecording : stopRecording}
                  isDisabled={!stream}
                  isIconOnly
                  className={clsx(
                    "mr-4 text-white",
                    !recording
                      ? "bg-gradient-to-tr from-indigo-500 to-indigo-300"
                      : ""
                  )}
                  size="sm"
                  variant="shadow"
                >
                  {!recording ? (
                    <Microphone size={20} />
                  ) : (
                    <>
                      <div className="absolute h-full w-full bg-gradient-to-tr from-indigo-500 to-indigo-300 animate-pulse -z-10"></div>
                      <MicrophoneStage size={20} />
                    </>
                  )}
                </Button>
              </Tooltip>
            }
            disabled={!stream}
          />
        </CardFooter>
      </Card>
      <p className="font-mono text-right">
        <span className="font-bold">Console:</span>
        <br />
        {debug}
      </p>
    </div>
  );
}
@@ -3,15 +3,16 @@
   "version": "0.0.1",
   "private": true,
   "scripts": {
-    "dev": "next dev --turbo",
+    "dev": "node_modules/next/dist/bin/next dev",
     "build": "next build",
     "start": "next start",
     "lint": "eslint . --ext .ts,.tsx -c .eslintrc.json --fix"
   },
   "dependencies": {
     "@ai-sdk/openai": "^0.0.34",
-    "@heygen/streaming-avatar": "^1.0.10",
+    "@heygen/streaming-avatar": "^2.0.7",
     "@nextui-org/button": "2.0.34",
     "@nextui-org/chip": "^2.0.32",
     "@nextui-org/code": "2.0.29",
     "@nextui-org/input": "2.2.2",
     "@nextui-org/kbd": "2.0.30",
@@ -28,6 +29,7 @@
     "@react-aria/visually-hidden": "3.8.12",
     "@uiw/codemirror-extensions-langs": "^4.22.1",
     "@uiw/react-codemirror": "^4.22.1",
+    "ahooks": "^3.8.1",
     "ai": "^3.2.15",
     "clsx": "2.1.1",
     "framer-motion": "~11.1.1",