Access to conflict zones in order to investigate and report on human rights violations is very restricted and dangerous for independent journalists, international news agencies, UN investigation bodies, and international human rights organisations. This is the principal reason Mnemonic and other documentation groups depend on verified user-generated content to assist in criminal case-building as well as human rights research.
Mnemonic strives for transparency in its tools, findings, and methodologies, and works to ensure that verified content is publicly available and accessible to journalists, human rights defenders, and lawyers for reporting, advocacy, and accountability purposes.
To achieve transparency, software developed by Mnemonic is released in free and open-source formats. This is done to ensure trust is built and maintained with our partners and collaborators, as well as allowing software to be reused and customised by other groups outside of Mnemonic. Technical integration with existing open-source investigative tools ensures that work is not duplicated. Mnemonic works alongside technologists to develop our open-source tools. Our methodology is developed in collaboration with other archival groups, as well as lawyers and journalists.
Mnemonic’s Digital Evidence Workflow is based on the Electronic Discovery Reference Model developed by Duke University School of Law.
Our collection process involved establishing a standardised metadata schema alongside a database of credible sources for digital content. Sources can be direct submissions from individuals and organisations, publicly available social media accounts and channels, as well as other publicly available information.
1) Establish a database of credible sources for content
Before any collection, archival, or verification of digital materials was possible, we had to establish a database of credible sources for visual content. We have identified over 10.00 credible sources, including individual journalists and field reporters, larger media houses (e.g. local and international news agencies), human rights organisations, local field clinics and hospitals.
Visual content is primarily accessed through social media channels (Twitter, Facebook, YouTube, websites, Telegram), submitted files (videos, photos, PDFs), and external and collaborators’ data sets. Changes in these data sets are tracked, meaning that all versions are saved.
Credibility is determined by analysing whether the source is familiar to us, or our existing network, as well as checking that the source’s content and reportage has been reliable in the past. This might include evaluating how long the source has been reporting and how active they are.
To identify where the source is based, social media channels might be evaluated to determine if videos uploaded are consistently from a specific location, or whether locations differ significantly. Channels and accounts might be analysed to determine whether they use a logo and whether this logo is consistently used across videos. Channels and accounts might be additionally analysed for original content to determine whether the uploader aggregates videos from other news organisations or accounts, or whether the source appears to be the primary uploader.
2) Establish a database of credible sources for verification
Mnemonic established a database of credible sources for verification. These sources provide additional information used to verify content found on social media platforms or received from sources directly. Content verifiers include citizen journalists, human rights defenders and humanitarian workers based in conflict zones and abroad. To preserve data integrity, sources used for content acquisition do not form part of the database for verification.
3) Establish standardised metadata scheme
Before we can preserve or verify any content, we must define a system through which content can be managed and organised; this is done through metadata. Establishing a data ontology or metadata scheme helps us organise and manage content, and helps users identify and understand what happened, when, and where.
Whilst recognising the need for a data ontology, or standardised metadata scheme, we also recognise that the implementation of any metadata scheme is a highly political choice. Given that there are no universally accepted, legally admissible metadata standards, efforts were made to develop a framework in consultation with a variety of international investigative bodies. These include consultations with members of the Office of the United Nations High Commissioner for Human Rights, with other archival institutes, and with human rights and research organisations.
Adding metadata happens after content is preserved, but it is crucial to define a metadata scheme before collecting and processing content.
Mnemonic’s collection and secure preservation workflow ensures that original content is not lost due to its removal from corporate platforms. This is achieved by collecting and securely storing digital content on external backend servers before it is taken offline and prior to basic verification procedures. Content is then backed up securely on servers throughout the world. We use Sugarcube for this process, free and open-source software developed for human rights investigations using online user-generated content.
Sugarcube is a tool designed to support journalists, non-profits, academic researchers, human rights organisations and others with investigations using online, publicly available sources (e.g. tweets, videos, public databases, websites).
In this preservation pipeline we detect the spoken language and standardise the data format (whilst preserving the original format). We screenshot and download the web page hosting the content. Files in our database are hashed with SHA-256 and time-stamped with Enigio Time, a third-party collaborator. We hash and timestamp in order to ensure and prove data integrity, meaning that data has not been changed or manipulated since it was archived.
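The hashing step can be illustrated with a minimal sketch using Python's standard library. The function names and the record layout are hypothetical, and the locally generated UTC timestamp is only a stand-in: it does not reproduce the third-party time-stamping performed with Enigio Time.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_integrity_record(path: Path) -> dict:
    """Build a hypothetical integrity record for an archived file.

    The timestamp is a local UTC clock reading, used here only as a
    placeholder for an independent third-party timestamp.
    """
    return {
        "filename": path.name,
        "sha256": sha256_of_file(path),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def is_unchanged(path: Path, record: dict) -> bool:
    """True if the file's current digest matches the digest stored at archival time."""
    return sha256_of_file(path) == record["sha256"]
```

Recomputing the digest later and comparing it with the stored value shows whether the file's bytes have changed since archiving; any modification, however small, produces a different digest.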
Once content has been safely preserved, metadata is extracted from visual content; it is parsed and aggregated automatically using our predefined and standardised metadata schema. Location and source details might be included in the parsed metadata, which can be useful for geolocating where content originates.
Metadata is added both automatically and manually, depending on how it was collected, e.g. open source or closed source. A detailed description and full list of metadata field types are provided on our website. Metadata we collect includes a description of the visual object as given (e.g. YouTube title); the source of the visual content; the original link where the footage was first published; identifiable landmarks; weather (which may be useful for geolocation or time identification); specific languages or regional dialects spoken; identifiable clothes or uniforms; weapons or munitions; device used to record the footage; and media content type.
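As an illustration only, the metadata categories listed above could be modelled as a simple record type. The class name, field names, and the split between required and optional fields are assumptions made for this sketch and do not reflect Mnemonic's published schema.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class VisualContentMetadata:
    """Hypothetical record mirroring the metadata categories described in the text."""

    description: str          # description as given, e.g. a YouTube title
    source: str               # source of the visual content
    original_link: str        # link where the footage was first published
    media_type: str           # media content type, e.g. "video" or "photo"
    landmarks: List[str] = field(default_factory=list)
    weather: Optional[str] = None
    languages: List[str] = field(default_factory=list)   # languages or dialects spoken
    clothing_or_uniforms: List[str] = field(default_factory=list)
    weapons_or_munitions: List[str] = field(default_factory=list)
    recording_device: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Names of fields that are still empty and may need manual annotation."""
        return [
            name
            for name, value in asdict(self).items()
            if value in (None, "", [])
        ]
```

A record created from automatically parsed fields can then report which categories remain to be filled in manually, for example `VisualContentMetadata("Example title", "example-channel", "https://example.org/v/1", "video").missing_fields()`.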
The processing pipeline also splits video files into keyframes and runs the machine learning software VFRAME. VFRAME is a collection of open-source computer vision software tools designed for human rights investigations relying on large datasets of visual media. It utilises object detection algorithms that can automatically flag video content depicting predefined objects, such as cluster munitions.
Our data pipeline prepares visual content for initial verification. All possible additional tags and chain of custody information are recorded. This is done to assist users in identifying and understanding what happened in a specific incident, and when and where.
Verification consists of three steps: 1) Verify the source of the video uploader or publisher; 2) Verify the location where the video was filmed; 3) Verify the dates on which the video was filmed and uploaded.
- Verify the source of the video publisher
Firstly we establish whether the source of the video is on our list of credible sources. If not, we determine the new source’s credibility by going through the above procedure.
In some cases, near-duplicate content may be published. For example, if a 10-minute video includes all of a second, 30-second video, both videos would be preserved as long as they can be verified. Similarly, videos from news organisations or media houses featuring parts of other videos are also preserved, as long as verification is possible. We also preserve duplications if they are from different sources and the original uploader is unidentifiable.
The video-upload source may differ from the camera operator. In most of the video footage which we verify, only the video uploader and not the camera operator can be identified. In advanced verification of priority cases, the analysis phase includes identifying the camera operator.
- Verify the location where the video was filmed
Each video goes through basic geolocation to verify that it was captured in the area claimed. A more detailed geolocation process is applied to priority content in order to pinpoint its origin more precisely. This is done by comparing visual references (e.g. buildings, mountain ranges, trees, minarets) with satellite imagery from Google Earth, Microsoft Bing, Maxar, OpenStreetMap, as well as geolocated photographs from Google Maps. Satellite imagery is also used to assess damage and destruction whilst investigating attacks targeting civilians and civilian infrastructure.
In addition to this, Mnemonic compares the language spoken in videos against known regional accents and dialects within countries to further verify the location of videos. When possible, we contact sources directly, and consult our existing network of journalists operating inside and outside conflict zones to confirm the locations of specific incidents.
- Verify the dates and times on which the video was filmed and uploaded
We use time and date metadata embedded in videos we directly receive in order to corroborate the date and time of a specific incident. Date and time are extracted using ExifTool.
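A minimal sketch of how such a date might be read programmatically, assuming the `exiftool` command-line program is installed and on the system path. The helper names are hypothetical, and `CreateDate` is only one of several date tags a file may carry; this is not a description of Mnemonic's own extraction code.

```python
import json
import subprocess
from datetime import datetime
from typing import Optional

# ExifTool prints dates as "YYYY:MM:DD HH:MM:SS" by default.
EXIF_DATE_FORMAT = "%Y:%m:%d %H:%M:%S"


def parse_exif_datetime(value: Optional[str]) -> Optional[datetime]:
    """Parse an ExifTool date string, or return None if it is missing or invalid.

    Only the first 19 characters are used, so sub-seconds and any
    timezone offset are ignored in this simplified sketch.
    """
    try:
        return datetime.strptime(value[:19], EXIF_DATE_FORMAT)
    except (TypeError, ValueError):
        return None


def read_create_date(path: str) -> Optional[datetime]:
    """Ask the exiftool CLI for the CreateDate tag of one file, as JSON."""
    result = subprocess.run(
        ["exiftool", "-json", "-CreateDate", path],
        capture_output=True,
        text=True,
        check=True,
    )
    entries = json.loads(result.stdout)  # a list with one object per file
    if not entries:
        return None
    return parse_exif_datetime(entries[0].get("CreateDate"))
```

Embedded dates depend on the recording device's clock and can be absent or wrong, which is why they are treated as corroborating information rather than proof.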
We verify the capture date of videos by cross-referencing their publishing date on social media platforms (e.g. YouTube, Twitter, Facebook, and Telegram) with dates from reports concerning the same incident. Visual content collected directly from sources is also cross-referenced with reports concerning the incident featured in the video.
Those reports include:
- Reports from international and local media outlets;
- Human rights reports published by international and local organisations, including Human Rights Watch, Amnesty International, and Physicians for Human Rights;
- Incident reports shared by Mnemonic’s network of citizen reporters on Twitter, Facebook, and Telegram.
Additional tools such as Google reverse image search and SunCalc can also be used to confirm the capture time and date of the visual content.
Investigation and further analysis
In some cases, we conduct in-depth open-source investigations. Time and capacity limitations mean not all incidents can be analysed in-depth. However, by developing a replicable workflow, it is hoped that others will assist in these efforts and investigate other incidents using similar methods. A detailed overview of our in-depth incident analysis is provided in the investigations pages of our archival websites.
For some incidents, our team of researchers collect witness statements or partner with organisations that do. In the past, this has included working with organisations whose role is collecting accounts of survivors, the injured, family members, or eyewitnesses (e.g. medical staff, managers of hospitals).
Flight Observation Data
In cases of alleged aerial bombings, Mnemonic analyses flight observation data and cross-references it with visual content to confirm whether flights were observed near locations where aerial bombings were alleged. We use Tableau Public to conduct this type of analysis.
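The cross-referencing idea can also be sketched in code, although the analysis described above is carried out in Tableau Public. The following is an illustrative stand-in only: the `Observation` record, the 30 km radius, and the one-hour window are hypothetical choices, not values taken from Mnemonic's methodology.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Observation:
    """Hypothetical flight observation: when and where an aircraft was seen."""

    observed_at: datetime
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flights_near_incident(
    observations: List[Observation],
    incident_at: datetime,
    lat: float,
    lon: float,
    max_km: float = 30.0,                    # assumed search radius
    window: timedelta = timedelta(hours=1),  # assumed time tolerance
) -> List[Observation]:
    """Return observations close to the alleged incident in both time and space."""
    return [
        obs
        for obs in observations
        if abs(obs.observed_at - incident_at) <= window
        and haversine_km(obs.lat, obs.lon, lat, lon) <= max_km
    ]
```

A non-empty result indicates only that a flight was observed nearby around the alleged time; it supports, but does not by itself establish, the allegation.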
Once content has been processed, verified, and analysed, it is reviewed for accuracy. In the event of a discrepancy, content is fed back into the digital evidence workflow for further verification. If content is deemed accurate it moves to the publishing stage of the digital evidence workflow.