Manipulating textual data in sound file formats

Conrad Parker
Last modified: Thu Dec 19 2002
http://www.vergenet.net/~conrad/sounds/textual_data.txt


Introduction
============

This information has been assembled in order to aid creators of sound
data in applying copyright attribution and licensing information to
sound files. In particular, this document is intended for creators of
free sound files, and was assembled after a suggestion from Dan Mueth
for the Gnome Sound Project.

This document examines some popular sound file formats to see which
are able to usefully store textual data. It also surveys some
available free sound tools for the manipulation of these data.


1. Support for textual data in sound file formats
=================================================

Much of this information has been extracted from the Audio File
Formats FAQ [1].

Notes:

  1. the following large constants [2] are used:

       USHRT_MAX = 65535
       LONG_MAX  = 2147483647
       ULONG_MAX = 4294967295

  2. a "null-terminated string" is a block of text which can be of
     any length. Typically this can be no longer than LONG_MAX
     characters due to programming conventions.

1.1 AIFF and AIFF-C
-------------------

AIFF and AIFF-C [3] are chunk-based formats. Various types of chunk
exist for holding textual information. The standard text chunks are
for Name, Author, Copyright, and Annotation. These can each store up
to LONG_MAX characters. A COMT chunk can store up to USHRT_MAX
comments, each of which contains a timestamp and up to USHRT_MAX
characters of text.

1.2 NeXT/Sun audio
------------------

An "optional text information" field is included as the last element
of the header and can be of arbitrary size (between 4 and LONG_MAX-24
characters).
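As an illustration of the NeXT/Sun header layout, the following Python
sketch reads the optional text information field. It assumes the usual
24-byte fixed header of six big-endian 32-bit fields (magic ".snd",
data offset, data size, encoding, sample rate, channels), with the text
occupying the bytes between the fixed header and the data offset. The
function names, the Latin-1 decoding, and the padding of the text to a
multiple of four bytes are assumptions of this sketch and are not taken
from the format description above.

```python
import struct

AU_MAGIC = b".snd"
AU_FIXED_HEADER = 24  # six big-endian 32-bit fields


def read_au_annotation(data: bytes) -> str:
    """Return the optional text information field of a NeXT/Sun file."""
    if len(data) < AU_FIXED_HEADER:
        raise ValueError("too short to be a NeXT/Sun audio file")
    magic, data_offset, _size, _encoding, _rate, _channels = struct.unpack(
        ">4s5I", data[:AU_FIXED_HEADER])
    if magic != AU_MAGIC:
        raise ValueError("missing .snd magic number")
    if data_offset < AU_FIXED_HEADER or data_offset > len(data):
        raise ValueError("invalid data offset")
    raw = data[AU_FIXED_HEADER:data_offset]
    # The field is NUL-padded; keep only the text before the first NUL.
    return raw.split(b"\0", 1)[0].decode("latin-1")


def build_au(annotation: str, samples: bytes = b"") -> bytes:
    """Build a small file image for testing (hypothetical helper)."""
    text = annotation.encode("latin-1") + b"\0"
    text += b"\0" * (-len(text) % 4)  # assumed: pad to a multiple of 4
    header = struct.pack(">4s5I", AU_MAGIC, AU_FIXED_HEADER + len(text),
                         len(samples), 1, 8000, 1)  # 8-bit mu-law, mono
    return header + text + samples


if __name__ == "__main__":
    blob = build_au("Copyright 2002 Example Author", b"\x7f\x7f")
    print(read_au_annotation(blob))
```

To read a real file, pass the contents of the file opened in binary
mode to read_au_annotation().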
1.3 RIFF (WAV)
--------------

RIFF audio files [4] can contain a chunk of "associated data", which
can be one of:

  * a 'label' or 'note' chunk which holds only a null-terminated
    string

  * a 'text with data length' chunk which can contain a "Purpose",
    a "Country Code", "Language" and "Dialect" information, and the
    "CodePage" (which names a Windows character set mapping) to use.

  * an embedded file, of any other RIFF-compliant type, including
    text.

1.4 IFF/8SVX
------------

This is an IFF chunk-based format with support for an Annotation
chunk.

1.5 Creative Voice (VOC)
------------------------

Has "data blocks" of various types, one of which is for text up to a
length of 16K characters.

1.6 SampleVision
----------------

Can contain up to 60 characters of text.

1.7 MPEG Audio (MP3)
--------------------

The MPEG Audio format is a streaming data format containing minimal
header information per frame. (The common term MP3 refers to the
storage of MPEG Audio Layer III data). No textual data is stored in
an MPEG Audio stream.

However, a conventional annotation format known as ID3 exists for MP3
data files. This allows for information including Songname, Artist,
and Album, each up to 30 characters, a Year (4 characters), and a
Comment of up to 28 characters to be stored.

1.8 Ogg Vorbis, Speex
---------------------

Ogg Vorbis [5] is an audio compression format designed for "mid to
high quality" recordings of audio and music. Speex [6] is a low
bitrate voice codec. The normal file storage method for these codecs
is an Ogg bitstream, for which both make use of "Vorbis comment"[7]
packets.

Vorbis comments are designed for "short, text comments, not arbitrary
metadata"[7], and consist of name and value pairs. Although any
comment names can be used, the specification proposes a minimal list
of standard field names including TITLE, ARTIST, COPYRIGHT and
LICENSE [7].

Up to (2^32 - 1) comments may be stored in a Vorbis comment packet.
Each comment vector consists of a "name=value" field of maximum total
length (2^32 - 1) bytes [7].


2. Support for manipulating textual data in free sound file tools
=================================================================

2.1 Sox
-------

Sox [8] is a commandline tool for playing, converting and applying
effects to sound files. As of version 12.17 it does not allow the
user to edit textual comments within a sound file; however, it does
contain the code required for writing comments to various types of
sound file.

2.2 Audio File Library
----------------------

libaudiofile [9] is an implementation of SGI's Audio File Library
API [10]. The API provides a means for accessing "Miscellaneous Data
Chunks", which can be of various types including copyright, author,
name and annotation. As of version 0.1.10 libaudiofile supports
miscellaneous data chunks for the AIFF and AIFF-C formats only.

The 'sfinfo' tool which comes with libaudiofile is able to print out
the Copyright chunk if present in a file. I am not aware of any other
sound tools using libaudiofile which are able to manipulate
Miscellaneous Data Chunks.

2.3 libsndfile
--------------

libsndfile [11] is "a C library for reading and writing files
containing sampled sound". As of version 1.0.3, the API does not
provide any explicit way to view or modify textual comments in sound
files. However, file comments are stored internally during file reads
and may be extracted using the SFC_GET_LOG_INFO command.

The 'sndfile-info' tool which comes with libsndfile displays all
logged information, including any textual comments found.

2.4 ID3 manipulation tools
--------------------------

Many tools exist which can manipulate ID3 tags. A number of these are
listed at Freshmeat.net [12].
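For readers who want to see what such tools operate on, here is a
minimal Python sketch that reads an ID3 tag of the kind described in
section 1.7. It assumes the ID3v1 layout: a 128-byte trailer at the
end of the file, starting with "TAG", followed by fixed-width fields
for title (30), artist (30), album (30), year (4), comment (30) and a
one-byte genre. It also assumes the ID3v1.1 convention in which the
last two comment bytes hold a zero byte and a track number, which is
what leaves 28 characters for the comment. The function names and the
Latin-1 decoding are choices made for this sketch only.

```python
import struct

ID3V1_LAYOUT = "<3s30s30s30s4s30sB"  # 128 bytes in total


def _text(field: bytes) -> str:
    """Decode a fixed-width field, dropping NUL/space padding."""
    return field.split(b"\0", 1)[0].decode("latin-1").rstrip()


def parse_id3v1(data: bytes):
    """Return the ID3v1 fields found at the end of data, or None."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    _tag, title, artist, album, year, comment, genre = struct.unpack(
        ID3V1_LAYOUT, data[-128:])
    fields = {"title": _text(title), "artist": _text(artist),
              "album": _text(album), "year": _text(year), "genre": genre}
    # ID3v1.1: a zero byte followed by a non-zero byte is a track number.
    if comment[28] == 0 and comment[29] != 0:
        fields["track"] = comment[29]
        comment = comment[:28]
    fields["comment"] = _text(comment)
    return fields


def make_id3v1(title, artist, album, year, comment, track=0, genre=255):
    """Build a 128-byte tag for testing (hypothetical helper)."""
    def enc(s):
        return s.encode("latin-1")
    comment_field = enc(comment)[:28].ljust(28, b"\0") + bytes([0, track])
    # struct pads each "s" field with NUL bytes and truncates long input.
    return struct.pack(ID3V1_LAYOUT, b"TAG", enc(title), enc(artist),
                       enc(album), enc(year), comment_field, genre)


if __name__ == "__main__":
    tag = make_id3v1("Example Title", "Example Artist", "Example Album",
                     "2002", "Example comment", track=3)
    print(parse_id3v1(b"\xff\xfb\x90\x00" + tag))
```

Because the tag is a fixed-size trailer, editing it amounts to
rewriting the last 128 bytes of the file, or appending them if no
"TAG" marker is present.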


Document History
================

Thu Oct 5 2000: first release
Thu Dec 19 2002: corrected Ogg Vorbis information, added Speex,
                 updated libsndfile information to version 1.0.3


References
==========

[1] Audio File Formats FAQ, Section 11 "File Formats"
    v4.0, 14 Nov 1998, Chris Bagwell
    http://home.sprynet.com/~cbagwell/AudioFormats-11.html

[2] ISO C Standard: 4.14/2.2.4.2 Limits of integral types
    See /usr/include/limits.h on most systems. Note that as we are
    talking about file formats, these limits are _not_ processor
    specific. Eg. the limits.h shipped with GCC has a different
    definition for ULONG_MAX on Alpha (64-bit) processors, which
    should be ignored when reading these file format specs.

[3] AIFF-C specification, Apple Computer, Inc. 1991
    available at http://home.sprynet.com/~cbagwell/aiff-c.txt

[4] ftp://ftp.cwi.nl/pub/audio/RIFF-format
    excerpted from "Multimedia Programming Interface and Data
    Specification v1.0"

[5] Ogg Vorbis, Xiph.org Foundation
    http://xiph.org/ogg/vorbis/index.html

[6] Speex, Jean-Marc Valin, Xiph.org Foundation
    http://www.speex.org/

[7] Ogg Vorbis I format specification: comment field and header
    specification, Christopher Montgomery (Monty), Xiph.org Foundation
    http://xiph.org/ogg/vorbis/doc/v-comment.html

[8] SoX (Sound eXchange), Chris Bagwell
    http://home.sprynet.com/~cbagwell/sox.html

[9] Audio File Library, Michael Pruett
    http://www.68k.org/~michael/audiofile/

[10] Audio File Library specification, SGI
     available at
     http://ask.ii.uib.no/ebt-bin/nph-dweb/dynaweb/SGI_Developer/DMediaDev_PG/@Generic__BookTextView/11986

[11] libsndfile, Erik de Castro Lopo
     http://www.zip.com.au/~erikd/libsndfile/

[12] Freshmeat.net
     http://freshmeat.net/search/?q=id3