Production Assets: Data Management
The verification and back-up of data, including original camera files (OCF) and audio, throughout a workflow is crucial. Data loss can mean losing significant investment both in planning and shooting. Our requirements are designed to help minimize costly incidents during production.
There are multiple ways to manage your data backup and verification workflow. Below are critical requirements.
3 : 2 : 1 Back-Up Rule
- Hold at least 3 copies of all Original Camera Footage (OCF) and audio at all times.
- Store the copies on at least 2 different types of media.
- Maintain at least 1 backup offsite.
- 3 copies:
  - Camera Media
  - RAID 1, 5, 6 or 10
- 2 types of media:
  - RAID 1, 5, 6 or 10
- 1 different location (geographic separation):
  - One copy of media at the Production House and one copy of media at the Post House.
Please Note: The RAID must be connected to the computer through a hardware RAID controller. Software-based RAID should not be used.
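The 3:2:1 rule above can be expressed as a simple check over an inventory of copies. The following is a minimal sketch, not a Netflix tool: the `Copy` class, the media-type labels, and the location names are all hypothetical and exist only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the OCF/audio. Both fields use hypothetical labels."""
    media_type: str  # e.g. "camera_media", "raid", "lto"
    location: str    # e.g. "production_house", "post_house"


def satisfies_3_2_1(copies: list[Copy], primary_location: str) -> bool:
    """Return True when the copies meet the 3:2:1 rule."""
    enough_copies = len(copies) >= 3
    enough_media_types = len({c.media_type for c in copies}) >= 2
    # "Offsite" is assumed to mean any location other than the primary one.
    has_offsite = any(c.location != primary_location for c in copies)
    return enough_copies and enough_media_types and has_offsite


inventory = [
    Copy("camera_media", "production_house"),
    Copy("raid", "production_house"),
    Copy("raid", "post_house"),
]
print(satisfies_3_2_1(inventory, "production_house"))  # True
```

A check like this only counts what the inventory claims; it does not confirm that each copy has actually passed checksum verification.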
Acceptable Final Archival Formats
- LTO tapes (LTO6, LTO7, LTO8; written in LTFS format).
- Netflix's Content Hub (cloud storage).
- High-Speed RAID(1/5/6/10)-protected external hard drive. (Pending Netflix Approval)
Please Note: RAID 0 is only acceptable for temporary transfer/shuttle drives, and not for backup purposes.
Visual Verification
- Camera footage must be visually inspected for possible recording or data transfer issues, referencing camera reports with scene/take information and script notes.
- Ideally, the visual check is performed at real-time playback speed.
- Visual QC should be performed at a minimum image resolution of 3840x2160 in order to best identify visual artifacts (e.g. dead pixels, moiré).
- When editorial proxies created from the original camera footage are available, a visual verification is required to ensure that all scenes/takes are transferred and accounted for.
What are checksums?
A checksum is like a digital fingerprint of a file: a short value derived from the file's contents that can be used to identify that exact data, whatever the type of file.
To generate a checksum, software reads the data from the file and pushes it through an algorithm. This process looks at the actual digital bits (1’s and 0’s) making up the file and spits out a value, typically referred to as a hash. Hashes are represented as a string of numbers and letters. For example, the MD5 hash of the word “checksum” is:
Checksums are incredibly valuable for detecting the slightest change in a file, which can occur at any point during transfer or storage. For example, if a movie file is corrupted during an upload from a hard drive to a server, the checksum of the copied file would differ from the checksum of the original movie file -- even if a single bit is off. This comparison is referred to as checksum verification.
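The sensitivity to a single changed bit can be demonstrated with a few lines of Python. This is a minimal sketch using the standard `hashlib` module; the helper names `md5_hex` and `flip_lowest_bit` are made up for this illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 hash of the data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


def flip_lowest_bit(data: bytes) -> bytes:
    """Return a copy of the data with the lowest bit of the first byte flipped."""
    return bytes([data[0] ^ 0x01]) + data[1:]


original = "checksum".encode("utf-8")
corrupted = flip_lowest_bit(original)  # simulates one bit going wrong in transit

print(md5_hex(original))
print(md5_hex(corrupted))
print(md5_hex(original) == md5_hex(corrupted))  # False
```

Although the two inputs differ by one bit, the two printed hashes have nothing visibly in common, which is what makes the comparison a reliable corruption check.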
Checksum Verification Overview
In filmmaking practice, checksums are used to ensure the original camera media is completely unaltered in its journey from the camera all the way to final delivery and archival. The “hero checksums” are initially generated for the footage directly recorded by the camera. Whenever the media is transferred, such as from a camera card to a hard drive, a verification process runs the same checksum algorithm on the copied files and cross-checks the results against the hero checksums. If they all match, the copied media is considered to be successfully verified.
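The verification step described above can be sketched as follows. This is an illustrative example only, assuming MD5 as the algorithm; `file_md5` and `verify_copy` are hypothetical helper names, and real offload software typically handles this internally.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large camera files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(hero_checksum: str, copied_file: Path) -> bool:
    """Re-run the same algorithm on the copy and compare with the hero checksum."""
    return file_md5(copied_file) == hero_checksum
```

In use, the hero checksum would be generated once from the file on the original recording media, then passed to `verify_copy` for every later copy of that file.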
Netflix-Supported Checksum Types
Netflix currently supports a few checksum types for delivery. Different checksum algorithms, or hash functions, generate different types of checksums. A hash function reads in data of any size and calculates the hash based on the file’s data.
Currently, the Netflix delivery pipeline supports the following checksum types:
- MD5
- xxHash64BE
MD5 is a cryptographic hash function, which means it was originally designed for security purposes such as detecting tampering; it does not encrypt data. It has been widely adopted as a checksum method because its 128-bit output makes it extremely unlikely that two different files produce the same hash by accident.
The name xxHash64BE breaks down as follows:
- 64 → 64-bit implementation of xxHash
- BE → Big Endian (defines how the data in the checksum is stored at the byte level)
Compared to MD5, xxHash64BE is not cryptographic, and it is built for speed. On a 64-bit machine, xxHash64BE will compute over an order of magnitude faster than MD5, and over twice as fast as 32-bit xxHash. The encoding of the hash itself is also more compatible with existing tools in production environments.
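To illustrate what "Big Endian" means for a 64-bit hash, the sketch below encodes the same 64-bit number in both byte orders. The sample value is an arbitrary number chosen for readability, not a real xxHash result, and `encode_hash64` is a hypothetical helper name.

```python
def encode_hash64(value: int, byteorder: str = "big") -> str:
    """Encode a 64-bit integer as 8 bytes in the given byte order, shown as hex."""
    return value.to_bytes(8, byteorder).hex()


sample = 0x0123456789ABCDEF  # arbitrary 64-bit value, not a real hash

print(encode_hash64(sample, "big"))     # 0123456789abcdef
print(encode_hash64(sample, "little"))  # efcdab8967452301
```

Two tools that compute the same 64-bit value but write it in different byte orders will produce strings that do not match, so the byte order has to be agreed on when checksums are exchanged.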
Best Practices for Checksum Management
- Checksums are to be generated and verified for all original camera footage (OCF). Additionally, checksum manifests accompany all original camera files from on-set to final delivery. Checksum manifests are text files (e.g. MHL) that contain the filename and corresponding checksum of each file in a directory.
- The checksum verification method is the same as the method used to generate the original checksums on the original recording media. For example, if the original checksums were generated using MD5, they can only be verified with MD5 checksum validation.
- OCF is to be verified with checksums every time a copy is made, up to and including the final delivery to Netflix.
- When offloading original camera footage from the physical recording media, checksum generation and verification may use either the MD5 or the xxHash64BE algorithm. Every transfer of the original camera footage, up to and including final delivery, uses checksum verification.
- A checksum manifest, such as MHL, accompanies the files through all the transfers, including final delivery.
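The idea of a checksum manifest can be sketched with a simple text file that lists one checksum and one relative filename per line. This is an illustration only: the two-space line format below is an assumption made for this example and is not the MHL format, and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks and return the MD5 hex digest."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(directory: Path, manifest_path: Path) -> None:
    """Write '<checksum>  <relative path>' for every file under the directory."""
    lines = []
    for path in sorted(p for p in directory.rglob("*") if p.is_file()):
        if path.resolve() == manifest_path.resolve():
            continue  # do not list the manifest inside itself
        relative = path.relative_to(directory).as_posix()
        lines.append(f"{file_md5(path)}  {relative}")
    manifest_path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(directory: Path, manifest_path: Path) -> list[str]:
    """Return the relative paths that are missing or whose checksum differs."""
    failures = []
    for line in manifest_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, relative = line.split("  ", 1)
        target = directory / relative
        if not target.is_file() or file_md5(target) != expected:
            failures.append(relative)
    return failures
```

Because the manifest travels with the files, the receiving side can run the verification without access to the original recording media. An empty list from `verify_manifest` means every listed file is present and unchanged.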
On Set Media Re-purpose
Original Camera Media may only be reformatted after both:
- Visual QC of all footage has been performed against camera reports/script notes, with all footage accounted for and signed-off on.
- OCFs have been copied and reside in a minimum of three storage mediums (LTO and/or RAID1, 5, 6 or 10).
On set RAID1, 5, 6 or 10 may only be reformatted after both:
- Visual QC of all footage has been accounted for per camera reports and script notes and signed off by editorial.
- OCFs have been copied and reside in a minimum of three storage mediums, including their Final Archival format (e.g. LTO, Content Hub or RAID(1, 5, 6 or 10)-protected external hard drive). (Pending Netflix Approval)
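The two sets of release conditions above can be written as guard functions that a data-management script might call before wiping anything. This is a minimal sketch: the storage labels and function names are hypothetical, and how the copies are counted (for example, whether the medium being reformatted counts as one of the three) is an assumption left to the caller.

```python
from __future__ import annotations

# Hypothetical labels for the storage types named in the requirements.
BACKUP_STORAGE = {"lto", "raid1", "raid5", "raid6", "raid10"}
FINAL_ARCHIVAL = {"lto", "content_hub", "raid_external_drive"}


def camera_media_can_be_reformatted(qc_signed_off: bool, copies: list[str]) -> bool:
    """Camera media: visual QC signed off and at least three accepted copies."""
    accepted = [c for c in copies if c in BACKUP_STORAGE]
    return qc_signed_off and len(accepted) >= 3


def on_set_raid_can_be_reformatted(editorial_signed_off: bool, copies: list[str]) -> bool:
    """On-set RAID: editorial sign-off, three copies, one in a final archival format."""
    in_final_format = any(c in FINAL_ARCHIVAL for c in copies)
    return editorial_signed_off and len(copies) >= 3 and in_final_format


print(camera_media_can_be_reformatted(True, ["raid5", "raid6", "lto"]))   # True
print(camera_media_can_be_reformatted(False, ["raid5", "raid6", "lto"]))  # False
```

Both conditions must hold in each case; a complete set of copies does not compensate for a missing QC sign-off, and vice versa.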
Example Data Workflow
There are multiple ways to achieve a compliant image verification workflow, so discussion with Netflix prior to shooting is recommended.
- Finish recording the footage.
- Original Camera Footage (OCF) is sent to a digital imaging technician (DIT).
- Generate checksums for the OCF on the original recording media (e.g. SxS card).
- Copy OCF from recording media to on-set RAID and shuttle drive concurrently. Both copies are run through checksum verification, either during the copy or after the copy has completed.
- Perform a visual verification of all the footage on the RAID.
- Transport the Shuttle Drive to the Lab via encrypted media or locked container.
- Ingest the Shuttle Drive to the lab's storage while verifying the checksums.
- Perform a visual verification of any editorial proxies rendered directly from the OCF, referencing camera reports and/or script notes.
- Final Archival Format:
  - OPTION 1 - Create 2x LTO archives, geographically separated in secure locations such as the post facility and the production facility.
  - OPTION 2 - Create 1x LTO archive and upload OCF to Content Hub (Netflix Cloud Server).
- Once LTO creation and visual verification are complete, notify all necessary teams that Recording Media can be released back to production.
- Once LTOs have been verified and Dailies have been checked for completeness and integrity by editorial, on-set RAID5 may have its storage space released for newer OCFs.
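The offload step in the workflow above (copying from the recording media to both the on-set RAID and the shuttle drive, with checksum verification) can be sketched as follows. This is an illustration under stated assumptions, not a replacement for dedicated offload software: it uses MD5, copies to each destination one after the other rather than concurrently, and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks and return the MD5 hex digest."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def offload(source_dir: Path, destinations: list[Path]) -> dict[str, str]:
    """Copy every file to each destination and verify it against the source checksum.

    Returns the source ("hero") checksums keyed by relative path.
    Raises OSError on the first checksum mismatch.
    """
    hero: dict[str, str] = {}
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = src.relative_to(source_dir)
        checksum = file_md5(src)
        hero[relative.as_posix()] = checksum
        for dest_root in destinations:
            target = dest_root / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)
            if file_md5(target) != checksum:
                raise OSError(f"Checksum mismatch for {target}")
    return hero
```

One caveat with a sketch like this: reading a file back immediately after writing it may be served from the operating system's cache rather than from the destination disk, which is one reason purpose-built offload tools are generally preferred on set.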
- Dropped LTO5 from list of LTO formats and included LTO8.
- Visual Verification: Update language to indicate that QC should be performed at a minimum image resolution of 3840x2160.
- Checksum Verification: New section introducing the reader to checksums.
- Checksum Verification: Explicitly stated the two checksum types Netflix currently supports.
- Checksum Verification: Improved wording of the "Best Practices" sub-section.
- Example Data Workflow: Renamed from "Sample Data Flow". Reformatted/reworded this section for clarity.
- Additional editorial edits to improve consistency of terminology throughout article.
- Verification: Reformatted/reworded this section for clarity.
- Sample Data Flow: Reformatted/reworded this section for clarity.
- RAID 1 has been added as an acceptable configuration throughout article.
- Added note that RAIDs must be connected via a hardware RAID controller.
- Article rewritten to provide additional clarity and context.
- Image added showing example workflow.
- File and folder naming information has been moved to "Production Assets: File & Folder Naming"