
Commit 1d5c49a

AssemblyAI authored and Soheyl committed
Project import generated by Copybara.
GitOrigin-RevId: ea71d0332ba79dc276e93de36daf33ebb6066493
1 parent b9d9e38 commit 1d5c49a

8 files changed: +381 -48 lines changed


README.md

Lines changed: 53 additions & 33 deletions
@@ -23,7 +23,7 @@ With a single API call, get access to AI models built on the latest AI breakthro
2323
- [Example](#examples)
2424
- [Core Examples](#core-examples)
2525
- [LeMUR Examples](#lemur-examples)
26-
- [Audio Intelligence+ Examples](#audio-intelligence-examples)
26+
- [Audio Intelligence Examples](#audio-intelligence-examples)
2727
- [Playgrounds](#playgrounds)
2828
- [Advanced](#advanced-todo)
2929

@@ -159,35 +159,6 @@ print(transcript.text)
159159

160160
</details>
161161

162-
<details>
163-
<summary>Summarize the content of a transcript</summary>
164-
165-
```python
166-
import assemblyai as aai
167-
168-
transcriber = aai.Transcriber()
169-
transcript = transcriber.transcribe(
170-
"https://example.org/audio.mp3",
171-
config=aai.TranscriptionConfig(summarize=True)
172-
)
173-
174-
print(transcript.summary)
175-
```
176-
177-
By default, the summarization model will be `informative` and the summarization type will be `bullets`. [Read more about summarization models and types here](https://www.assemblyai.com/docs/Models/summarization#types-and-models).
178-
179-
To change the model and/or type, pass additional parameters to the `TranscriptionConfig`:
180-
181-
```python
182-
config=aai.TranscriptionConfig(
183-
summarize=True,
184-
summary_model=aai.SummarizationModel.catchy,
185-
summary_type=aai.Summarizationtype.headline
186-
)
187-
```
188-
189-
</details>
190-
191162
---
192163

193164
### **LeMUR Examples**
@@ -260,7 +231,7 @@ for result in result:
260231

261232
---
262233

263-
### **Audio Intelligence+ Examples**
234+
### **Audio Intelligence Examples**
264235

265236
<details>
266237
<summary>PII Redact a Transcript</summary>
@@ -286,6 +257,57 @@ transcriber = aai.Transcriber()
286257
transcript = transcriber.transcribe("https://example.org/audio.mp3", config)
287258
```
288259

260+
</details>
261+
<details>
262+
<summary>Summarize the content of a transcript over time</summary>
263+
264+
```python
265+
import assemblyai as aai
266+
267+
transcriber = aai.Transcriber()
268+
transcript = transcriber.transcribe(
269+
"https://example.org/audio.mp3",
270+
config=aai.TranscriptionConfig(auto_chapters=True)
271+
)
272+
273+
for chapter in transcript.chapters:
274+
print(f"Summary: {chapter.summary}") # A one paragraph summary of the content spoken during this timeframe
275+
print(f"Start: {chapter.start}, End: {chapter.end}") # Timestamps (in milliseconds) of the chapter
276+
print(f"Headline: {chapter.headline}") # A single sentence summary of the content spoken during this timeframe
277+
print(f"Gist: {chapter.gist}") # An ultra-short summary, just a few words, of the content spoken during this timeframe
278+
```
279+
280+
[Read more about auto chapters here.](https://www.assemblyai.com/docs/Models/auto_chapters)
281+
282+
</details>
283+
284+
<details>
285+
<summary>Summarize the content of a transcript</summary>
286+
287+
```python
288+
import assemblyai as aai
289+
290+
transcriber = aai.Transcriber()
291+
transcript = transcriber.transcribe(
292+
"https://example.org/audio.mp3",
293+
config=aai.TranscriptionConfig(summarization=True)
294+
)
295+
296+
print(transcript.summary)
297+
```
298+
299+
By default, the summarization model will be `informative` and the summarization type will be `bullets`. [Read more about summarization models and types here](https://www.assemblyai.com/docs/Models/summarization#types-and-models).
300+
301+
To change the model and/or type, pass additional parameters to the `TranscriptionConfig`:
302+
303+
```python
304+
config=aai.TranscriptionConfig(
305+
summarization=True,
306+
summary_model=aai.SummarizationModel.catchy,
307+
summary_type=aai.SummarizationType.headline
308+
)
309+
```
310+
289311
</details>
290312

291313
---
@@ -297,7 +319,6 @@ Visit one of our Playgrounds:
297319
- [LeMUR Playground](https://www.assemblyai.com/playground/v2/source)
298320
- [Transcription Playground](https://www.assemblyai.com/playground)
299321

300-
301322
# Advanced
302323

303324
## How the SDK handles Default Configurations
@@ -329,7 +350,6 @@ transcriber = aai.Transcriber()
329350
transcriber.config = aai.TranscriptionConfig(punctuate=False, format_text=False)
330351
```
331352

332-
333353
In case you want to override the `Transcriber`'s configuration for a specific operation with a different one, you can do so via the `config` parameter of a `.transcribe*(...)` method:
334354

335355
```python

assemblyai/transcriber.py

Lines changed: 4 additions & 0 deletions
@@ -210,6 +210,10 @@ def summary(self) -> Optional[str]:
210210

211211
return self._impl.transcript.summary
212212

213+
@property
214+
def chapters(self) -> Optional[List[types.Chapter]]:
215+
return self._impl.transcript.chapters
216+
213217
@property
214218
def status(self) -> types.TranscriptStatus:
215219
"The current status of the transcript"

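Because the new `chapters` property is typed `Optional[List[types.Chapter]]`, callers may want to guard against `None` before iterating. A minimal sketch, assuming a stand-in `Chapter` dataclass (hypothetical, with field names taken from the README example) and a hypothetical helper `format_chapters` that is not part of the SDK:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chapter:
    # Stand-in for `types.Chapter`; field names follow the README example.
    summary: str
    headline: str
    gist: str
    start: int  # milliseconds
    end: int  # milliseconds


def format_chapters(chapters: Optional[List[Chapter]]) -> List[str]:
    """Render one line per chapter; return an empty list when chapters is None."""
    if chapters is None:
        return []
    return [f"[{c.start}-{c.end} ms] {c.headline}" for c in chapters]


demo = [Chapter("A longer summary.", "Intro to the show", "Intro", 0, 1500)]
lines = format_chapters(demo)
```

With the real SDK, the same guard would presumably apply to `transcript.chapters` when Auto Chapters was not requested.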
assemblyai/types.py

Lines changed: 21 additions & 15 deletions
@@ -354,8 +354,8 @@ class RawTranscriptionConfig(BaseModel):
354354
# sentiment_analysis: bool = False
355355
# "Enable Sentiment Analysis."
356356

357-
# auto_chapters: bool = False
358-
# "Enable Auto Chapters."
357+
auto_chapters: Optional[bool]
358+
"Enable Auto Chapters."
359359

360360
# entity_detection: bool = False
361361
# "Enable Entity Detection."
@@ -415,7 +415,7 @@ def __init__(
415415
custom_spelling: Optional[Dict[str, Union[str, Sequence[str]]]] = None,
416416
disfluencies: Optional[bool] = None,
417417
# sentiment_analysis: bool = False,
418-
# auto_chapters: bool = False,
418+
auto_chapters: Optional[bool] = None,
419419
# entity_detection: bool = False,
420420
summarization: Optional[bool] = None,
421421
summary_model: Optional[SummarizationModel] = None,
@@ -491,7 +491,7 @@ def __init__(
491491
self.set_custom_spelling(custom_spelling, override=True)
492492
self.disfluencies = disfluencies
493493
# self.sentiment_analysis = sentiment_analysis
494-
# self.auto_chapters = auto_chapters
494+
self.auto_chapters = auto_chapters
495495
# self.entity_detection = entity_detection
496496
self.set_summarize(
497497
summarization,
@@ -707,17 +707,23 @@ def disfluencies(self, enable: Optional[bool]) -> None:
707707

708708
# self._raw_transcription_config.sentiment_analysis = enable
709709

710-
# @property
711-
# def auto_chapters(self) -> bool:
712-
# "Returns the status of the Auto Chapters feature."
710+
@property
711+
def auto_chapters(self) -> bool:
712+
"Returns the status of the Auto Chapters feature."
713+
714+
return self._raw_transcription_config.auto_chapters
713715

714-
# return self._raw_transcription_config.auto_chapters
716+
@auto_chapters.setter
717+
def auto_chapters(self, enable: bool) -> None:
718+
"Enable Auto Chapters."
715719

716-
# @auto_chapters.setter
717-
# def auto_chapters(self, enable: bool) -> None:
718-
# "Enable Auto Chapters."
720+
# Validate required params are also set
721+
if enable and self.punctuate == False:
722+
raise ValueError(
723+
"If `auto_chapters` is enabled, then `punctuate` must not be disabled"
724+
)
719725

720-
# self._raw_transcription_config.auto_chapters = enable
726+
self._raw_transcription_config.auto_chapters = enable
721727

722728
# @property
723729
# def entity_detection(self) -> bool:
@@ -1317,8 +1323,8 @@ class BaseTranscript(BaseModel):
13171323
# sentiment_analysis: bool = False
13181324
# "Enable Sentiment Analysis."
13191325

1320-
# auto_chapters: bool = False
1321-
# "Enable Auto Chapters."
1326+
auto_chapters: Optional[bool]
1327+
"Enable Auto Chapters."
13221328

13231329
# entity_detection: bool = False
13241330
# "Enable Entity Detection."
@@ -1401,7 +1407,7 @@ class TranscriptResponse(BaseTranscript):
14011407
# iab_categories_result: Optional[IABResponse] = None
14021408
# "The list of results when Topic Detection is enabled"
14031409

1404-
# chapters: Optional[List[Chapter]] = None
1410+
chapters: Optional[List[Chapter]]
14051411
# "When Auto Chapters is enabled, the list of Auto Chapters results"
14061412

14071413
# sentiment_analysis_results: Optional[List[Sentiment]] = None
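The `auto_chapters` setter in this file validates a dependency on `punctuate`. The pattern can be sketched in isolation as follows. `ConfigSketch` is a hypothetical, heavily simplified stand-in for `TranscriptionConfig`, not the SDK class. The sketch checks `enable` before raising, so that a config with `punctuate=False` and Auto Chapters left unset stays valid.

```python
from typing import Optional


class ConfigSketch:
    """Hypothetical, simplified stand-in for `TranscriptionConfig`."""

    def __init__(
        self,
        punctuate: Optional[bool] = None,
        auto_chapters: Optional[bool] = None,
    ) -> None:
        self.punctuate = punctuate
        self._auto_chapters: Optional[bool] = None
        # Route the constructor argument through the validating setter.
        self.auto_chapters = auto_chapters

    @property
    def auto_chapters(self) -> Optional[bool]:
        return self._auto_chapters

    @auto_chapters.setter
    def auto_chapters(self, enable: Optional[bool]) -> None:
        # Only reject the combination when the feature is being enabled
        # and punctuation was explicitly turned off (None means "default").
        if enable and self.punctuate is False:
            raise ValueError(
                "If `auto_chapters` is enabled, then `punctuate` must not be disabled"
            )
        self._auto_chapters = enable
```

Routing the constructor argument through the setter keeps the validation in one place, at the cost of making attribute assignment order matter inside `__init__`.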

tests/e2e/__init__.py

Whitespace-only changes.

tests/e2e/test_auto_chapters_e2e.py

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
1+
import pytest
2+
3+
import assemblyai as aai
4+
5+
6+
def test_auto_chapters_disabled_by_default():
7+
"""
8+
Tests that excluding `auto_chapters` from the `TranscriptionConfig` will
9+
result in the default behavior of it being disabled
10+
"""
11+
transcript = aai.Transcriber().transcribe(
12+
data="https://assemblyai-test.s3.us-west-2.amazonaws.com/sdk/rogan-1min.mp3",
13+
config=aai.TranscriptionConfig(),
14+
)
15+
16+
assert transcript.status == aai.TranscriptStatus.completed
17+
assert transcript.error is None
18+
assert transcript.config.auto_chapters in (None, False)
19+
assert transcript.chapters is None
20+
21+
22+
def test_auto_chapters_enabled():
23+
"""
24+
Tests that including `auto_chapters=True` in the `TranscriptionConfig`
25+
will enable the auto_chapters feature with a meaningful response
26+
"""
27+
transcript = aai.Transcriber().transcribe(
28+
data="https://assemblyai-test.s3.us-west-2.amazonaws.com/sdk/rogan-1min.mp3",
29+
config=aai.TranscriptionConfig(auto_chapters=True),
30+
)
31+
32+
assert transcript.status == aai.TranscriptStatus.completed
33+
assert transcript.error is None
34+
35+
assert transcript.config.auto_chapters == True
36+
37+
assert transcript.chapters is not None
38+
assert isinstance(transcript.chapters, list)
39+
assert len(transcript.chapters) > 0
40+
41+
last_end_timestamp = 0
42+
for chapter in transcript.chapters:
43+
assert isinstance(chapter, aai.types.Chapter)
44+
assert len(chapter.summary.strip()) > 0
45+
assert len(chapter.headline.strip()) > 0
46+
assert len(chapter.gist.strip()) > 0
47+
48+
assert chapter.start >= last_end_timestamp
49+
assert chapter.end > chapter.start
50+
51+
last_end_timestamp = chapter.end
52+
53+
54+
def test_auto_chapters_failed():
55+
"""
56+
Test that failure to produce auto_chapters will result in an error that
57+
is properly wrapped by the `Transcript` object. In this case, the error
58+
is that the French language model does not support auto_chapters.
59+
"""
60+
transcript = aai.Transcriber().transcribe(
61+
data="https://assemblyai-test.s3.us-west-2.amazonaws.com/sdk/rogan-1min.mp3",
62+
config=aai.TranscriptionConfig(
63+
auto_chapters=True, language_code=aai.LanguageCode.fr
64+
),
65+
)
66+
67+
assert transcript.status == aai.TranscriptStatus.error
68+
assert transcript.error is not None
69+
assert "auto_chapters" in transcript.error
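The timestamp assertions in `test_auto_chapters_enabled` amount to an ordering invariant: each chapter is non-empty and starts at or after the previous chapter's end. A minimal sketch of that check as a standalone function, where `chapters_are_ordered` is a hypothetical helper that takes plain `(start, end)` tuples in milliseconds:

```python
from typing import List, Tuple


def chapters_are_ordered(spans: List[Tuple[int, int]]) -> bool:
    """Return True if every span is non-empty and starts at or after the previous end."""
    last_end = 0
    for start, end in spans:
        # Overlap with the previous chapter, or an empty/negative span.
        if start < last_end or end <= start:
            return False
        last_end = end
    return True


ok = chapters_are_ordered([(0, 1000), (1000, 2500)])
overlapping = chapters_are_ordered([(0, 1000), (900, 2000)])
```

Here `ok` is `True` and `overlapping` is `False`, since the second chapter starts before the first one ends.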

tests/e2e/test_summarization_e2e.py

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
1+
import pytest
2+
3+
import assemblyai as aai
4+
5+
6+
@pytest.mark.summarization
7+
@pytest.mark.parametrize(
8+
"summary_model, summary_type",
9+
[
10+
(None, None), # default is ("informative", "bullets")
11+
(
12+
aai.SummarizationModel.conversational,
13+
aai.SummarizationType.paragraph,
14+
),
15+
(
16+
aai.SummarizationModel.conversational,
17+
aai.SummarizationType.headline,
18+
),
19+
(
20+
aai.SummarizationModel.conversational,
21+
aai.SummarizationType.bullets,
22+
),
23+
(
24+
aai.SummarizationModel.conversational,
25+
aai.SummarizationType.bullets_verbose,
26+
),
27+
(aai.SummarizationModel.catchy, aai.SummarizationType.headline),
28+
(aai.SummarizationModel.catchy, aai.SummarizationType.gist),
29+
(aai.SummarizationModel.informative, aai.SummarizationType.paragraph),
30+
(aai.SummarizationModel.informative, aai.SummarizationType.headline),
31+
(aai.SummarizationModel.informative, aai.SummarizationType.bullets),
32+
(
33+
aai.SummarizationModel.informative,
34+
aai.SummarizationType.bullets_verbose,
35+
),
36+
],
37+
)
38+
def test_summarization_e2e(
39+
summary_model: aai.SummarizationModel,
40+
summary_type: aai.SummarizationType,
41+
):
42+
"""Test all combinations of transcription with summarization."""
43+
config = aai.TranscriptionConfig(
44+
summarization=True, summary_model=summary_model, summary_type=summary_type
45+
)
46+
if summary_model == aai.SummarizationModel.conversational:
47+
config.set_speaker_diarization(True)
48+
49+
transcript = aai.Transcriber().transcribe(
50+
data="https://assemblyai-test.s3.us-west-2.amazonaws.com/e2e_tests/summarization/kelley.wav",
51+
config=config,
52+
)
53+
54+
# Assign defaults
55+
if not summary_type:
56+
summary_type = aai.SummarizationType.bullets
57+
if not summary_model:
58+
summary_model = aai.SummarizationModel.informative
59+
60+
# Check that summarization was enabled
61+
assert transcript.config.summarization
62+
63+
# Check that the response has a successful status
64+
assert transcript.status == aai.TranscriptStatus.completed
65+
66+
# Check that the summary model and type match the request
67+
assert transcript.config.summary_model == summary_model
68+
assert transcript.config.summary_type == summary_type
69+
70+
# Check that there is no error message
71+
assert not transcript.error
72+
73+
# Check that a summary exists on the transcript
74+
assert transcript.summary
75+
76+
# Check that bulleted summaries start with dashes
77+
if summary_type in [
78+
aai.SummarizationType.bullets,
79+
aai.SummarizationType.bullets_verbose,
80+
]:
81+
assert transcript.summary.startswith("- ")
82+
else:
83+
assert not transcript.summary.startswith("- ")
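The default-filling and bullet-format logic in this test can be sketched without network access. The enums below are stand-ins that mirror only the member names used in the parametrization above, and `resolve_defaults` and `expects_bullets` are hypothetical helpers, not SDK functions:

```python
from enum import Enum
from typing import Optional, Tuple


class SummarizationModel(str, Enum):
    # Stand-in enum; member names follow the test parameters above.
    informative = "informative"
    conversational = "conversational"
    catchy = "catchy"


class SummarizationType(str, Enum):
    # Stand-in enum; member names follow the test parameters above.
    bullets = "bullets"
    bullets_verbose = "bullets_verbose"
    gist = "gist"
    headline = "headline"
    paragraph = "paragraph"


def resolve_defaults(
    model: Optional[SummarizationModel],
    summary_type: Optional[SummarizationType],
) -> Tuple[SummarizationModel, SummarizationType]:
    """Fill in the documented defaults: model `informative`, type `bullets`."""
    return (
        model or SummarizationModel.informative,
        summary_type or SummarizationType.bullets,
    )


def expects_bullets(summary_type: SummarizationType) -> bool:
    """Only the two bullet types are expected to start with a dash."""
    return summary_type in (
        SummarizationType.bullets,
        SummarizationType.bullets_verbose,
    )
```

Keeping the expectation in a small pure function makes it possible to unit-test the rule separately from the end-to-end transcription call.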

tests/unit/factories.py

Lines changed: 11 additions & 0 deletions
@@ -37,6 +37,17 @@ class Meta:
3737
words = factory.List([factory.SubFactory(UtteranceWordFactory)])
3838

3939

40+
class ChapterFactory(factory.Factory):
41+
class Meta:
42+
model = types.Chapter
43+
44+
summary = factory.Faker("sentence")
45+
headline = factory.Faker("sentence")
46+
gist = factory.Faker("sentence")
47+
start = factory.Faker("pyint")
48+
end = factory.Faker("pyint")
49+
50+
4051
class BaseTranscriptFactory(factory.Factory):
4152
class Meta:
4253
model = types.BaseTranscript
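`ChapterFactory` draws `start` and `end` independently, so a generated chapter is not guaranteed to have `end > start`. If a unit test ever relies on that ordering, one option is to derive `end` from `start`. A minimal standard-library sketch, where `ordered_span` and its bounds are hypothetical and not part of the test suite:

```python
import random
from typing import Tuple


def ordered_span(
    rng: random.Random, max_start: int = 60_000, max_length: int = 30_000
) -> Tuple[int, int]:
    """Return (start, end) in milliseconds with end strictly greater than start."""
    start = rng.randint(0, max_start)
    # Adding a length of at least 1 ms guarantees a non-empty span.
    end = start + rng.randint(1, max_length)
    return start, end


rng = random.Random(0)  # seeded so the generated data is reproducible
spans = [ordered_span(rng) for _ in range(100)]
```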
