Writing and using Queries
The following sections are covered:
- Content Migrator Object Model
- Hibernate Query Language
- Executing Queries in Content Migrator
- Viewing Content
- Repository Selector Queries
- Summarizing Content
- Further Reading
Content Migrator can be, and often is, used to capture source content and store it in the Content Migrator repository. When you want to understand more about the content objects stored in the repository, the query screen can be used for this purpose. As well as letting you view stored content, queries are the main selection mechanism for further processing of this content in preparation for the final step of the migration: loading the content into the target platform.
You can access the query screen using the main navigation tabs on the top right of your screen.
It is worth pointing out that all queries will produce empty results unless the 'store content' task has been added to your initial capture content pipeline.
Typically the query screen is the next port of call after a task pipeline has executed successfully. It gives you the chance to review what the pipeline did to your content, whether it modified metadata or the HTML content itself.
Using the query screen you can select individual projects (master or sub projects) that you wish to query and, depending on the syntax of your query, you can choose to view the metadata that has been applied to each individual content descriptor object. When executing a query from the query screen a project must be selected.

The layout of the query screen consists of a number of query libraries and their individual queries down the left hand side of the screen and a main area for creating/editing and executing the individual queries, which will also display the query results in a table grid.
There are 2 types of query that exist within Content Migrator, NORMAL queries and SELECTOR queries.
Normal queries are executed in the query tool. They return results from the Content Migrator repository and display the relevant information in the query results panel. Selector queries cannot be executed within the query tool. They are used in task pipelines, specifically within the Repository Selector task, to select the content objects that are to be processed in the associated task pipeline. Each query type is covered in more detail later.
Content Migrator Object Model

The above diagram shows the most important objects from the Vamosa Content Migrator object model.
A Programme is the highest level 'container' object. Programmes have 0 or more Project objects. A programme has the concept of a Translation Table. In the later stages of a migration project, this translation table is used to re-factor all of your links so that they point to the 'new' location of migrated content. All projects held within a programme will have their links translated using this table; links do not cross the translation tables of multiple programmes. With this in mind it is good practice to hold all content nominated for migration within the same programme, regardless of the number of projects that this may require. This will ensure all links are maintained across each of the sites that you are migrating.
A Project is either a MasterProject or a SubProject. Both share common properties such as name and description. A SubProject needs to be associated with a MasterProject. Typically a master project will map to a site that you have nominated for migration. Each site that is within the migration should be held in the same programme, thus maintaining links between sites. Master projects do not need to be a whole site; they can form a site collection, with one master project representing each of the high level sub domains of a single website. Sub projects are used to contain a copy of your master project content. This allows sub project content to be modified and restored from the master without the need to capture the content from stage one again.
A Project will have 0, 1 or many ContentDescriptors, one for each URL that is to be migrated. These ContentDescriptors will have 0, 1 or many Content objects and OutboundLink objects. The content objects represent the source content of each content descriptor, be it HTML, XML or binary content. Outbound links belong to HTML content, with a direct mapping to each outbound link of the source content. ContentDescriptors may also contain Metadata, which is not an object itself, but rather a property of a ContentDescriptor. This metadata provides additional information on the content descriptor, including source metadata, migration processing metadata and target platform metadata.
The properties of the objects can all be used in HQL queries within Content Migrator.
There are many classes that make up the complete object model, however not all of these classes are relevant when writing HQL queries. With this in mind, the following classes will be discussed to assist you in creating HQL queries and referencing the correct object properties and package structure. For a complete reference of classes and properties see the API documentation.
Programme
A Programme is the high level container for master projects and
their relevant sub projects, if any exist. The Programme class is
accessible via com.vamosa.projects.Programme.
A Programme contains the following properties that would be referenced from HQL:
- id: String
- name: String
- customerName: String
- inDevelopment: Boolean
- description: String
The following sample query can be used to count all content descriptor objects that exist in the same programme as the current project. This example also shows the use of a sub query.
SELECT count(*)
FROM com.vamosa.content.ContentDescriptor cd
WHERE cd.project.programme.id IN
(SELECT project.programme.id
FROM com.vamosa.projects.Project project
WHERE project.id = :projectId
)
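As a further illustration of the Programme properties listed above, the following sketch returns the name and customer name of the programme that holds the currently selected project. It is not taken from the original query libraries and has not been run against a live repository; it assumes only the programme property of Project used in the sub query above and the name and customerName properties listed for Programme. The PROGRAMME and CUSTOMER aliases are illustrative.

```hql
SELECT project.programme.name as PROGRAMME, project.programme.customerName as CUSTOMER
FROM com.vamosa.projects.Project project
WHERE project.id = :projectId
```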
Project
Projects, which can be either master or sub projects, contain
content descriptor objects. Both kinds reside in the
com.vamosa.projects package, and Project is the superclass of
MasterProject and SubProject. It is unlikely that a query would be
written to access a project directly using its package structure;
in general a project is reached through a ContentDescriptor
reference. If you do need to access a project directly, use the
class path com.vamosa.projects.Project.
Project contains the following properties that would be referenced from HQL:
- id: Number
- name: String
- description: String
- state: String
The following query will select the name and the state of the currently selected project.
SELECT project.name as NAME, project.state as STATE
FROM com.vamosa.projects.Project project
WHERE project.id = :projectId
MasterProjects can be queried using
com.vamosa.projects.MasterProject.
Sub projects are referenced using
com.vamosa.projects.SubProject. Sub projects contain
the following additional properties:
- subprojecttype: String
- masterProject: MasterProject
The following query selects the name of the master project and of each of its sub projects, if the currently selected project has any.
SELECT project.masterProject.name as master, project.name as subproject
FROM com.vamosa.projects.SubProject project
WHERE project.masterProject.id = :projectId
ContentDescriptor
A content descriptor is a reference to a piece of content that has been imported into the Content Migrator repository, typically through a Web Selector task. A ContentDescriptor refers to a piece of content identified within a project by a specific URL. The object provides access to the content data, the acquired metadata and any outbound links to other objects if the content-type is "text/html" or "application/xhtml+xml".
ContentDescriptor is accessible via the
com.vamosa.content.ContentDescriptor package path.
A ContentDescriptor has the following properties available when creating HQL queries.
- id: Long
- url: String
- project: Project
- content: List of Content objects
- outboundLinks: Set of OutboundLink objects
- metadata: Map of String-based key-value pairs
The following query will find all HTML content that exists
within the currently selected project. It uses a further condition
in the where clause to select only objects that have an
Identify Metadata.Content-Type metadata attribute with a value
like %text/html%. By default the URL of each content descriptor
is listed in the results.
FROM com.vamosa.content.ContentDescriptor cd
WHERE cd.project.id=:projectId
AND cd.metadata['Identify Metadata.Content-Type'] LIKE '%text/html%'
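The same pattern can be inverted to list everything that is not HTML. The sketch below is an untested variation on the query above, using only the properties already described; the ORDER BY on cd.url is added purely to make the results easier to scan. Content descriptors that have no Identify Metadata.Content-Type attribute at all are likely to be left out of the results, because the condition is evaluated against an existing map entry.

```hql
FROM com.vamosa.content.ContentDescriptor cd
WHERE cd.project.id=:projectId
AND cd.metadata['Identify Metadata.Content-Type'] NOT LIKE '%text/html%'
ORDER BY cd.url
```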
Content
A content object stores the data described by its partner content descriptor. In most cases there is only one version of content related to a content descriptor, but there may be more than one. Bear this in mind when writing queries that target specific content versions.
Typically all content, other than text-based content stored with
content-type text/% or application/xhtml+xml, is stored as
Base64-encoded data in the contentData property of this object.
Content is accessible from the following package path
com.vamosa.content.Content.
Content has the following properties available to it when writing HQL queries.
- id: Long
- contentData: String
- contentDescriptor: ContentDescriptor
The contentData property is often a large string of raw HTML, so avoid returning it in the select clause of an HQL query. You should also refrain from querying the contentData of binary objects on the query screen.
The example below queries the HTML content of content descriptor objects. It lists all content that has an empty keywords meta tag.
FROM com.vamosa.content.ContentDescriptor cd
WHERE cd.project.id=:projectId
AND cd.content[0].contentData LIKE '%meta name="keywords" content=""%'
In HQL queries, Content objects are more often reached through ContentDescriptor objects than queried directly.
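Because contentData for binary objects is Base64-encoded, it can help to restrict a content search to HTML objects first. The following is a sketch only, combining the metadata condition and the contentData condition shown earlier; the search string for an empty title element is an illustrative assumption, and the query has not been run against a live repository.

```hql
FROM com.vamosa.content.ContentDescriptor cd
WHERE cd.project.id=:projectId
AND cd.metadata['Identify Metadata.Content-Type'] LIKE '%text/html%'
AND cd.content[0].contentData LIKE '%<title></title>%'
```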
Metadata
Metadata relating to content is stored in a key-value pair map belonging to each content descriptor.
We have seen earlier examples of metadata being accessed in an HQL query; for completeness, a further example is given below.
This example will select all ContentDescriptor objects of the
selected project that have an Identify Metadata.Status-Code
metadata attribute with a value of 404. Assuming the content has
been captured from the web, this produces a result set of all
ContentDescriptors/URLs that could not be found during the web
selector task.
FROM com.vamosa.content.ContentDescriptor cd
WHERE cd.project.id=:projectId
AND cd.metadata['Identify Metadata.Status-Code'] = '404'
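A broader check on a capture is to list everything that did not come back with a 200 status. This is an untested sketch that assumes, as in the example above, that the status code is stored as a string under the Identify Metadata.Status-Code key.

```hql
FROM com.vamosa.content.ContentDescriptor cd
WHERE cd.project.id=:projectId
AND cd.metadata['Identify Metadata.Status-Code'] <> '200'
```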
OutboundLink
An outbound link is a link found on a web page, described by a content descriptor, that refers to another web page or asset. Outbound links found by the web selector task will include links to websites outside the crawl URL patterns that limit the scope of the web selector task.
OutboundLink is accessible from the following package path
com.vamosa.content.OutboundLink.
OutboundLink has the following properties available to it when writing HQL queries.
- id: Long
- url: String
- contentDescriptor: ContentDescriptor
In HQL queries, OutboundLink objects are more often reached through ContentDescriptor objects than queried directly.
The following example counts all ContentDescriptor objects that contain a specific URL in their outbound links. It creates a join to outbound links and returns the number of content descriptors that have an outbound link to 'http://www.vamosa.com'.
SELECT count(*)
FROM com.vamosa.content.ContentDescriptor as cd
JOIN cd.outboundLinks as ol
WHERE ol.url = 'http://www.vamosa.com'
AND cd.project.id = :projectId
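To see which pages hold such links rather than just how many there are, the count can be swapped for the content descriptor URL. The sketch below is a hypothetical variation on the query above and has not been run against a live repository. It uses LIKE with a trailing wildcard so that any link beginning with the given address matches, and DISTINCT so that a page with several matching links is listed once.

```hql
SELECT DISTINCT cd.url
FROM com.vamosa.content.ContentDescriptor as cd
JOIN cd.outboundLinks as ol
WHERE ol.url LIKE 'http://www.vamosa.com%'
AND cd.project.id = :projectId
ORDER BY cd.url
```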
Hibernate Query Language
Content Migrator does not use SQL, as SQL syntax is not fully portable between database platforms. Instead it uses Hibernate Query Language (HQL), an object-oriented derivative.
HQL is much like SQL in its syntax and, like SQL, is used to execute queries against a database. The difference is that HQL is object oriented: it works with Java classes and properties instead of database tables and columns. Hibernate automatically generates the SQL required for the underlying database, whichever of the database platforms supported by Content Migrator is in use.
The following example will return a list of all content descriptor objects from a project:
FROM com.vamosa.content.ContentDescriptor cd
WHERE cd.project.id=:projectId
Note the reference to the ContentDescriptor object using the full path through the Vamosa package structure. We also alias the ContentDescriptor class, in the same way that SQL would alias a table, and access its associated Project class and that class's id property in much the same way that SQL would reference table columns.
The :projectId is prefixed by a : as
it is an input parameter into the query, taken from the selected
project in the 'Project' drop-down box of the query screen. This
value is required in all Content Migrator queries.
We do not need to explicitly state the 'SELECT' part of the query; it is assumed, but can be overridden.
HQL supports the familiar clauses of SQL, such as
where, order by and group by
and can be used in Content Migrator as expected.
Common aggregate functions are also supported, such as
avg(...), sum(...),
min(...), max(...) and
count(...).
For more information on HQL, see The Hibernate Query Language.
With the exception of names of Java classes and properties, queries are case-insensitive. So select is the same as SELECT, but com.vamosa.content.ContentDescriptor is not the same as com.vamosa.content.contentdescriptor.
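As a small illustration of both points, the sketch below writes the keywords in lower case while keeping the class and property names in their exact case, and uses three of the aggregate functions on the id property of ContentDescriptor. It is an untested example built only from the properties listed earlier.

```hql
select count(cd.id), min(cd.id), max(cd.id)
from com.vamosa.content.ContentDescriptor cd
where cd.project.id = :projectId
```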
Executing Queries in Content Migrator
In order to execute a query you must first select a query from the query libraries on the left hand side of the screen. Once a query is selected you are presented with the screen shown below.

The form at the top of the screen allows you to select which project you wish to execute a query against from the Project drop-down list.
Once a project is selected you must then decide what metadata
you want returned in your query results. The URL of content
descriptors is listed with the results in the case that the query
returns content descriptors, e.g. FROM
com.vamosa.content.ContentDescriptor. Queries that begin
with 'SELECT' will return the columns defined in the select clause.
To select the metadata that you wish to view simply select the
metadata attributes listed in the Columns list box on the
right hand side of the form.
By default all metadata available for the content descriptors of the project is selected. To select a subset of metadata, select the attribute you require, holding down the 'control' key while clicking on further attributes to expand your selection. If you select a project and only the URL is listed in the Columns list box, the project that you have selected is likely to have 0 content descriptors stored against it and, as such, no metadata.
Now that you have selected the project and the metadata to list in the results you are ready to execute your query. Simply click on the Execute button to execute your query and the query results will slide up from the query results panel at the foot of the browser.
These results will list the URL of each content descriptor returned in the leftmost column, followed by a column for each of the metadata attributes that you selected. You will notice a small eyeglass icon beside each URL that is returned. Clicking on this icon will open the content data for that content descriptor in a new browser tab. We will cover this window later.
Occasionally you will run a summarise query that will list a high level summary of project data. We will cover these queries later, however it is worth pointing out that these results will be presented differently in that they will provide a count of objects for a given condition.
Clicking on the Close Query Results button will hide the query results that have appeared.
Viewing Content
Occasionally you will want to not only query the metadata of the content descriptors but will be required to view the content data of each content descriptor. The eyeglass icon beside each URL in the query results panel can be clicked to view the HTML content. Clicking the eyeglass icon will open a new browser tab or window, displaying the content descriptor content data.

The content window screen provides a number of additional areas of information, with different options available for viewing the content data.
Content data can be viewed in both Browser view and Edit view. Browser view shows the content rendered in the browser, as it would appear to most users. Edit view shows the content source, the HTML markup. Using Edit view you can manually edit the content and save the changes to individual content objects. These changes will be reflected in the Browser view.
When viewing the content data in Browser view any underlying JavaScript will be running, as it would in any browser. This JavaScript can be toggled on and off using the 'Toggle Javascript' button.
To the right of the browser view we have 3 additional information panels: Metadata, Outbound Links and Inbound Links.
Metadata lists each metadata attribute and its value for the content descriptor object. Each metadata value can be edited here, either updated or deleted. You cannot create new metadata or edit the metadata attribute name here; this must be done using an enhance script.
Outbound Links and Inbound Links are not editable. Outbound Links shows the links that are outbound from this content descriptor, regardless of whether a corresponding content descriptor exists in the Content Migrator repository. Inbound Links shows only the URLs of other content descriptors that link to this content descriptor, so there must be a corresponding content descriptor in this project for each inbound link URL.
Be aware that when you click a link within the content viewer, the browser view will show the live content of the target location. This will not update the source content in the Edit view, nor will it update the metadata and links panels. Clicking the link is effectively moving away from the content descriptor object and out of the Content Migrator environment.
Repository Selector Queries
As mentioned earlier, Content Migrator contains two query types. Normal queries have existed since version 1 of Vamosa Content Migrator, whereas repository selector queries were introduced in version 3.
The main difference between the two query types is that normal queries are predominantly for the query screen while repository selector queries are used solely in the Repository Selector task.
The repository selector task requires, as an input parameter, a selector query. This query will be used to select the content descriptors from the repository that are to be processed in the task pipeline.
Repository selector queries must include the relevant objects related to a content descriptor if they are to be available to the tasks in the task pipeline. These objects include Content, Metadata and OutboundLinks.
The following example will create a repository selector for all content descriptors of a project, whilst also making metadata available to the tasks in the task pipeline. Content and ContentDescriptor are required objects for enhance 'per object' tasks and must be included in your selector queries, because they are passed from the query results into the scripting runtime environment.
FROM com.vamosa.content.ContentDescriptor cd
JOIN FETCH cd.content
JOIN FETCH cd.metadata
WHERE cd.project.id=:projectId
If the task pipeline contains a reference to outbound links, the
pipeline will fail at this point and throw a 'lazy loading' error.
To resolve this problem the query should be modified to include a
JOIN FETCH to OutboundLink.
FROM com.vamosa.content.ContentDescriptor cd
JOIN FETCH cd.content
JOIN FETCH cd.metadata
JOIN FETCH cd.outboundLinks
WHERE cd.project.id=:projectId
The main point to take note of when creating repository selectors is to include a JOIN FETCH to fetch the associated objects into the content descriptors and make them available to the scripting runtime environment.
Repository selectors must, as a minimum, include references for ContentDescriptor and Content. These selectors can be extended to contain conditions for limiting the content descriptors returned, for example only objects that contain a specified metadata attribute and value.
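As a sketch of such an extended selector, the query below keeps the required JOIN FETCH clauses for content and metadata and adds the Content-Type condition used earlier, so that only HTML content descriptors are passed to the task pipeline. It is an illustrative combination of the examples in this section and has not been run in a Repository Selector task.

```hql
FROM com.vamosa.content.ContentDescriptor cd
JOIN FETCH cd.content
JOIN FETCH cd.metadata
WHERE cd.project.id=:projectId
AND cd.metadata['Identify Metadata.Content-Type'] LIKE '%text/html%'
```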
Summarizing Content
HQL in Content Migrator allows you to create and execute queries that will return lists of content descriptors and their associated metadata values. We can also use HQL to summarise information held against a project.
The following query will return a summary of the content type of a content descriptor and the total number of objects that exist in the selected project for that type.
SELECT metadata, count(*)
FROM com.vamosa.content.ContentDescriptor cd
JOIN cd.metadata as metadata
WHERE cd.project.id=:projectId
AND index(metadata)='Identify Metadata.Content-Type'
GROUP BY metadata
ORDER BY count(*) DESC
There are some new clauses in this query that are worth reviewing.
In the select clause we make use of the count()
aggregate function. We use * as a wildcard for the
count in this case; as in SQL, we could just as easily name the
column (property) that we want to count. We also use the group
by and order by clauses, which will be familiar if you know SQL.
In effect we are asking for the results to be grouped by
metadata value (the content type), with a count for each group,
and ordered by descending object count.
From the example given we are summarising based on a specific metadata value.
The following example lists the URL of each content descriptor together with the total number of outbound links that it has.
SELECT cd.url, count(ol)
FROM com.vamosa.content.ContentDescriptor cd
JOIN cd.outboundLinks as ol
WHERE cd.project.id=:projectId
GROUP BY cd.url
ORDER BY count(ol) DESC
Similar queries can be created to summarise the available properties of the Content Migrator Object Model for your projects.
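For instance, the content type summary above can be pointed at a different metadata key. The sketch below counts the content descriptors of the selected project by status code; it assumes the Identify Metadata.Status-Code attribute shown in the Metadata section and is otherwise identical in shape to the earlier summary query. It has not been run against a live repository.

```hql
SELECT metadata, count(*)
FROM com.vamosa.content.ContentDescriptor cd
JOIN cd.metadata as metadata
WHERE cd.project.id=:projectId
AND index(metadata)='Identify Metadata.Status-Code'
GROUP BY metadata
ORDER BY count(*) DESC
```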
Further Reading
We have covered the basics of using the Content Migrator query screen, touching on HQL and the fundamentals of creating and executing queries and the difference between normal queries and selector queries.
There are many resources available on the web on HQL, the following are particularly good.
Viewing Content Migrator Concepts is a good way of understanding the Content Migrator object model.