The MAVEN project aims to develop a suite of tools for multimedia data processing, by including and combining different functionalities and capabilities. These tools are described below:

Forensic analysis tools: This set of tools deals with verifying the integrity and authenticity of digital visual content, i.e. images and video sequences.

Image Source Identification refers to determining which camera device generated a given digital photo. To this end, watermarking technology is applied to embed in the digital content some information (i.e. the digital watermark) describing the camera device; at a later time, it is possible to check whether an image found on the web contains the watermark matching a specific camera's information.
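
The embed-then-check workflow can be illustrated with a deliberately simple sketch. It assumes a least-significant-bit (LSB) scheme on a flat list of 8-bit grayscale pixels and a 16-bit camera identifier; both are illustrative choices, not the watermarking technique used in MAVEN, and all function names are hypothetical.

```python
def camera_id_to_bits(camera_id, width=16):
    """Encode a (hypothetical) numeric camera identifier as a list of bits."""
    return [(camera_id >> i) & 1 for i in range(width)]


def embed_watermark(pixels, bits):
    """Write each watermark bit into the least significant bit of one pixel."""
    if len(bits) > len(pixels):
        raise ValueError("image too small for this watermark")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the watermark bit.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_watermark(pixels, length):
    """Read back the lowest bit of the first `length` pixels."""
    return [p & 1 for p in pixels[:length]]


def matches_camera(pixels, camera_bits):
    """Check whether the image carries the watermark of a given camera."""
    return extract_watermark(pixels, len(camera_bits)) == camera_bits
```

An LSB mark changes each pixel by at most one grey level but does not survive recompression or resizing, so a scheme intended for images found on the web would need a more robust embedding.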

There will be two different types of integrity verification tools for images:

  • Informed Image Integrity Verification – will determine whether a known set of retouching operations has been applied to an image, by comparing the image with the corresponding original. Because the original images are available, the comparison will rely on dissimilarity metrics between the processed and the original image, drawn from the discipline known as Change Detection.
  • Blind Image Integrity Verification – will determine whether a digital image is authentic or has been manipulated with photo editing software. In this case original images are not available, so image forensics technologies, which can capture important information about the image history without any a priori knowledge, will be exploited. Since the analyst cannot know in advance which tool is most appropriate for a suspect image, a set of different and complementary forensic tools will be developed and their outputs intelligently merged.
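
For the informed case, a dissimilarity metric can be as simple as the following sketch. It assumes grayscale images stored as equally sized lists of rows, and uses the mean absolute difference plus a thresholded change mask; the metric, the threshold value and the function names are assumptions chosen for illustration, since the text does not specify which Change Detection metrics are used.

```python
def _check_same_shape(original, processed):
    """Raise if the two images do not have identical dimensions."""
    if len(original) != len(processed) or any(
        len(ro) != len(rp) for ro, rp in zip(original, processed)
    ):
        raise ValueError("images must have the same dimensions")


def mean_absolute_difference(original, processed):
    """Average per-pixel absolute difference; 0 means identical images."""
    _check_same_shape(original, processed)
    total = 0
    count = 0
    for row_o, row_p in zip(original, processed):
        for o, p in zip(row_o, row_p):
            total += abs(o - p)
            count += 1
    return total / count


def change_mask(original, processed, threshold=10):
    """Mark pixels whose difference exceeds the (assumed) threshold."""
    _check_same_shape(original, processed)
    return [
        [abs(o - p) > threshold for o, p in zip(row_o, row_p)]
        for row_o, row_p in zip(original, processed)
    ]
```

The scalar score indicates how much the image changed overall, while the mask shows where the changes are located.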

Finally, the Video Integrity Verification tool will determine whether a video has been recompressed, and whether frames have been removed from or added to the sequence. In this case only the video under analysis is available, so video forensics technologies are exploited. In particular, starting from a video double encoding detection algorithm, we designed a module that detects whether a misalignment in the frame structure of the video occurred between two successive encodings.

Objects and scene analysis tools: The Objects and Scene Analysis component consists of two tools: the first provides text localization and extraction functionalities, whereas the second provides the capability to detect particular content within image and video galleries. In more detail, this second tool can categorize a scene (e.g. say whether the scene represents a kitchen, a garden, a sky), detect the presence of a particular kind of object (e.g. a sofa, a bed), and detect the presence of a particular company logo.

The text localization and extraction module consists of two sub-modules, one focused on the localization and the other on the extraction of the text. The devised text detection approach is geared to the specific case of “superimposed text” on video frames, in order to take advantage of the specific constraints of this kind of text (such as colour, size, position, and above all its invariance across multiple consecutive frames).
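
The invariance constraint can be sketched as a temporal-stability filter. The example assumes grayscale frames stored as equally sized lists of rows and a hypothetical tolerance on intensity variation; it only selects candidate pixels and is not the project's detection algorithm.

```python
def stable_pixel_mask(frames, tolerance=5):
    """Mark pixels whose intensity stays within `tolerance` across all frames.

    Superimposed text tends to remain unchanged over consecutive frames
    while the underlying scene moves, so stable pixels are text candidates.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    height = len(frames[0])
    width = len(frames[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [frame[y][x] for frame in frames]
            row.append(max(values) - min(values) <= tolerance)
        mask.append(row)
    return mask
```

Static backgrounds are also stable, so a mask of this kind would have to be combined with the other constraints mentioned above (colour, size, position) before it could isolate text.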

Human-trait analysis tools: Two of the tools developed in the MAVEN project are devoted to the analysis of human traits, namely the spoken keyword detection module and the face detection and recognition module.

The spoken keyword detection tool is based on Hidden Markov Models (HMM). It also features an enhanced scoring algorithm that increases the accuracy on short words, which are usually more difficult to find. In addition, it is able to search for a keyword several times faster than real time.
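
As background on HMM-based scoring, the sketch below shows generic Viterbi decoding in the log domain: it returns the log-probability of the best state path for an observation sequence. It is a textbook illustration over a discrete toy model, not the enhanced scoring algorithm mentioned above (which the text does not describe); the function names and the model layout as nested dictionaries are assumptions.

```python
import math


def log_probs(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {
        key: log_probs(value) if isinstance(value, dict)
        else (math.log(value) if value > 0 else float("-inf"))
        for key, value in table.items()
    }


def viterbi_log_score(observations, states, log_start, log_trans, log_emit):
    """Return (best log-probability, best state path) for the observations."""
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    back = []  # one dict of back-pointers per time step after the first
    for obs in observations[1:]:
        prev = best
        best = {}
        pointers = {}
        for s in states:
            # Most likely predecessor of state s at this time step.
            source = max(states, key=lambda r: prev[r] + log_trans[r][s])
            best[s] = prev[source] + log_trans[source][s] + log_emit[s][obs]
            pointers[s] = source
        back.append(pointers)
    # Trace the best path backwards from the most likely final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path
```

In a keyword-spotting setting, a score of this kind for a keyword model would typically be compared against the score of a background model over the same audio segment, with a detection declared when the difference exceeds a threshold.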

The face detection and recognition tool comprises a face detection module and a face recognition module. The face detection module aims to find all the faces in an image or video. The face recognition module is responsible for assigning each detected face an identity drawn from a database of known identities, thus complementing the face detection module.
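
The identity-assignment step can be sketched as a nearest-neighbour search. The example assumes that each face has already been converted to a numeric feature vector by some upstream extractor, and that similarity is measured with the cosine; the representation, the threshold and the function names are assumptions, as the text does not state how faces are represented or compared.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is null)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(face_vector, database, threshold=0.8):
    """Return the best-matching known identity, or None if nothing is close enough.

    `database` maps identity names to reference feature vectors.
    """
    best_name, best_score = None, threshold
    for name, reference in database.items():
        score = cosine_similarity(face_vector, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Returning None below the threshold lets the module report an unknown face instead of forcing a match to the closest database entry.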