This manual covers generic usage of the Shepherd Project Framework.
The Shepherd Project is a capture-mark-recapture (CMR) software application for storing and analyzing data about animals in a study population. Its ultimate goal is to make it easy for a CMR project to quickly go from data collection to population analysis.
A single software instance of the Shepherd Project application is deployed into a web server and represents a distinct mark-recapture study. We will call this distinct instance of the application for a particular study a “library”.
The Shepherd Project divides its managed mark-recapture data into these distinct types:
An Encounter is an individual sighting of a member of a target population of a single species. An encounter report is submitted to the framework via a web interface and may represent (if enough data is present for identification) a “mark” (first sighting) or “recapture” (subsequent re-sighting) of an individual from a study population. Each encounter contains photos and data that represent one individual at one point in time. An encounter can be added to a previously identified marked individual in the database, representing a re-sighting of that animal, or it can be allocated as a new marked individual and given a name or tag number (e.g., “A-001”), representing a new animal previously undocumented in the library. An encounter may also remain “Unassigned”, indicating that the encounter does not contain enough data to be identified as a new or previously seen individual at the current time, though it may be matched to other encounters in the future. In the default Shepherd Project configuration, encounters can also be placed in an “Unidentifiable” state, indicating that they do not contain enough data to ever be assigned to a marked individual (most often because no distinguishing marks or “tags” are visible in submitted photos).
A Marked Individual is a uniquely identified member of a population and includes one or more reported encounters. It is up to each library and its research staff to determine the minimum amount of data and procedures required for a unique identification (e.g., a distinct ear tag, a visual photo-identification, digital extraction of spot patterning, a distinct DNA pattern, etc.). As the study acquires more encounters for each individual in the library, it will be able to build up robust metrics for population analysis, allowing its research team to better understand population trends.
An Occurrence represents an observation of multiple individuals together and includes one or more encounters over a short duration of time. The purpose of an occurrence is to provide a hierarchical category to represent groups of individuals and potentially the relationships among them at a point in time and space. For example, an occurrence might represent a pod of baleen whales, which is typically fluid in membership even over brief periods of observation (e.g., less than an hour), or a more stable group, such as a group of sperm whales, which might remain relatively stable over a longer period of continuous observation (e.g., several hours).
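The relationship between encounters and marked individuals described above can be sketched in a few lines of Python. This is only an illustration, not the framework's own code: the `Encounter` dataclass, its field names, and the `label_encounters` function are hypothetical, and an encounter with no individual ID is assumed to be “Unassigned”.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional


@dataclass
class Encounter:
    """Hypothetical, simplified stand-in for an encounter record."""
    catalog_number: str
    sighted_on: date
    individual_id: Optional[str] = None  # None is treated as "Unassigned"


def label_encounters(encounters: Iterable[Encounter]) -> Dict[str, str]:
    """Label each encounter as a mark, a recapture, or unassigned.

    Encounters are processed in date order. The first encounter assigned
    to an individual is its "mark"; later ones are "recaptures".
    """
    labels: Dict[str, str] = {}
    already_marked = set()
    for enc in sorted(encounters, key=lambda e: e.sighted_on):
        if enc.individual_id is None:
            labels[enc.catalog_number] = "unassigned"
        elif enc.individual_id in already_marked:
            labels[enc.catalog_number] = "recapture"
        else:
            already_marked.add(enc.individual_id)
            labels[enc.catalog_number] = "mark"
    return labels
```

Sorting by date before labeling matters because encounters are often entered into a library out of chronological order.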
A Biological Sample (a.k.a. “Tissue Sample” but not strictly a tissue) represents the retrieval of a small amount of biological material from an animal. For example, this may be a direct biopsy, a fecal sample, a blood sample, or a mucus sample. Because a biological sample is collected at a location and point in time, it is added to an Encounter, representing an additional part of an animal sighting record. The Shepherd Project currently allows you to add data for these types of subsequent analyses upon a Biological Sample:
- One or more haplotype analyses and determinations
- One or more genotype analyses (microsatellite markers)
- One or more genetic sex determinations
- One or more biological/chemical measurements. For example, you could record a stable isotope determination of “-2.4 ppm for 13C” or record a pollutant measurement detected in the sample.
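As a rough illustration of how one sample can carry several analyses of each type, here is a minimal Python sketch. The class and field names (`BiologicalSample`, `Measurement`, `analysis_counts`, and so on) are assumptions made for this example and are not the framework's actual classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Measurement:
    """Hypothetical biological/chemical measurement on a sample."""
    name: str     # what was measured, e.g. "13C"
    value: float  # measured value, e.g. -2.4
    units: str    # units of the value, e.g. "ppm"


@dataclass
class BiologicalSample:
    """Hypothetical sample record attached to one encounter."""
    sample_id: str
    catalog_number: str  # catalogNumber of the encounter it belongs to
    haplotypes: List[str] = field(default_factory=list)
    # Each genotype analysis maps a microsatellite locus to its two alleles.
    genotypes: List[Dict[str, Tuple[int, int]]] = field(default_factory=list)
    genetic_sexes: List[str] = field(default_factory=list)
    measurements: List[Measurement] = field(default_factory=list)

    def analysis_counts(self) -> Dict[str, int]:
        """Count how many analyses of each type have been recorded."""
        return {
            "haplotype": len(self.haplotypes),
            "genotype": len(self.genotypes),
            "genetic_sex": len(self.genetic_sexes),
            "measurement": len(self.measurements),
        }


sample = BiologicalSample(sample_id="S-1", catalog_number="E1")
sample.measurements.append(Measurement(name="13C", value=-2.4, units="ppm"))
```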
Wherever possible, the data attributes recorded for an Encounter or a Marked Individual are named according to their Darwin Core equivalents. A definition of the Darwin Core can be found on the TDWG web site:
“The Darwin Core is body of standards. It includes a glossary of terms (in other contexts these might be called properties, elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity by providing reference definitions, examples, and commentaries. The Darwin Core is primarily based on taxa, their occurrence in nature as documented by observations, specimens, and samples, and related information.”
This is not a clean one-to-one mapping (the Darwin Core does not specifically address mark-recapture or its terminology), but our implementation does make it fairly easy to map a TapirLink provider or IPT web application to the Encounter database table if you're using the Shepherd Project Framework with a relational database. This allows mark-recapture data to serve a dual purpose: local population analysis and broader biodiversity analysis in frameworks such as GBIF and OBIS.
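To make the dual-purpose idea concrete, the sketch below writes a few encounter attributes to a delimited text table whose column headers are Darwin Core term names. It does not show TapirLink or IPT configuration. The function name, the choice of columns, and the representation of encounters as dictionaries are all assumptions made for illustration.

```python
import csv
import io
from typing import Dict, Iterable

# Darwin Core terms used as column headers (a subset chosen for this example).
DWC_COLUMNS = [
    "catalogNumber", "individualID", "decimalLatitude", "decimalLongitude",
    "year", "month", "day", "genus", "specificEpithet",
]


def to_darwin_core_csv(encounters: Iterable[Dict[str, object]]) -> str:
    """Serialize encounters to CSV, keeping only the Darwin Core columns.

    Project-specific attributes (e.g. submitter contact details) are
    dropped, and missing Darwin Core values are written as empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for enc in encounters:
        writer.writerow({column: enc.get(column, "") for column in DWC_COLUMNS})
    return buffer.getvalue()
```

Filtering to the Darwin Core columns also keeps personal data, such as submitter contact details, out of anything shared with a broader biodiversity network.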
Let's take a look at the fields recorded for Encounter and Marked Individual records. If you're using a relational database (e.g., the Shepherd Project framework ships with an Apache Derby database that is created on first startup), these fields (or “attributes”) map to columns in the appropriately named database tables (i.e., “Encounter” and “MarkedIndividual”).
The following attributes are described in the Darwin Core quick reference at: http://rs.tdwg.org/dwc/terms/#dcterms:type
Wherever possible, the Encounter class will be extended with additional Darwin Core attributes for greater adoption of the standard.
- catalogNumber - “An identifier (preferably unique) for the record within the data set or collection.” In the Shepherd Project, the catalogNumber is the primary key for encounters in the database.
- otherCatalogNumbers - “A list (concatenated and separated) of previous or alternate fully qualified catalog numbers or other human-used identifiers for the same Occurrence, whether in the current or any other data set or collection.” Researchers often assign an encounter number to a sighting in the field, and that number will differ from the catalogNumber assigned when the sighting is added to the database in the Shepherd Project. This field allows multiple catalog numbers to be recorded to account for this.
- individualID - “An identifier for an individual or named group of individual organisms represented in the Occurrence. Meant to accommodate resampling of the same individual or group for monitoring purposes. May be a global unique identifier or an identifier specific to a data set.”
- locationID - “An identifier for the set of location information (data associated with dcterms:Location). May be a global unique identifier or an identifier specific to the data set.”
- decimalLatitude - “The geographic latitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic center of a Location. Positive values are north of the Equator, negative values are south of it. Legal values lie between -90 and 90, inclusive.”
- decimalLongitude - “The geographic longitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic center of a Location. Positive values are east of the Greenwich Meridian, negative values are west of it. Legal values lie between -180 and 180, inclusive.”
- verbatimLocality - “The original textual description of the place.”
- maximumDepthInMeters - “The greater depth of a range of depth below the local surface, in meters.”
- maximumElevationInMeters - “The upper limit of the range of elevation (altitude, usually above sea level), in meters.”
- sex - “The sex of the biological individual(s) represented in the Occurrence. Recommended best practice is to use a controlled vocabulary.”
- day - “The integer day of the month on which the Event occurred.”
- month - “The ordinal month in which the Event occurred.”
- year - “The four-digit year in which the Event occurred, according to the Common Era Calendar.”
- verbatimEventDate - “The verbatim original representation of the date and time information for an Event.”
- occurrenceRemarks - “Comments or notes about the Occurrence.”
- modified - “The most recent date-time on which the resource was changed. For Darwin Core, recommended best practice is to use an encoding scheme, such as ISO 8601:2004(E).”
- occurrenceID - “An identifier for the Occurrence (as opposed to a particular digital record of the occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique.”
- recordedBy - “A list (concatenated and separated) of names of people, groups, or organizations responsible for recording the original Occurrence. The primary collector or observer, especially one who applies a personal identifier (recordNumber), should be listed first.”
- behavior - “A description of the behavior shown by the subject at the time the Occurrence was recorded. Recommended best practice is to use a controlled vocabulary.”
- eventID - “An identifier for the set of information associated with an Event (something that occurs at a place and time). May be a global unique identifier or an identifier specific to the data set.”
- dynamicProperties - “A list (concatenated and separated) of additional measurements, facts, characteristics, or assertions about the record. Meant to provide a mechanism for structured content such as key-value pairs.”
- identificationRemarks - “Comments or notes about the Identification.”
- genus - “The full scientific name of the genus in which the taxon is classified.”
- specificEpithet - “The name of the first or species epithet of the scientificName.”
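The legal ranges quoted above for decimalLatitude and decimalLongitude are easy to check before a record is saved. The following is a minimal sketch in Python; the function name `validate_coordinates` is hypothetical and not part of the framework.

```python
from typing import List


def validate_coordinates(decimal_latitude: float, decimal_longitude: float) -> List[str]:
    """Return a list of problems; an empty list means both values are legal.

    Ranges follow the Darwin Core definitions: latitude lies in [-90, 90]
    and longitude in [-180, 180], both inclusive.
    """
    problems = []
    if not -90 <= decimal_latitude <= 90:
        problems.append(f"decimalLatitude {decimal_latitude} is outside -90..90")
    if not -180 <= decimal_longitude <= 180:
        problems.append(f"decimalLongitude {decimal_longitude} is outside -180..180")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a submission form report every problem at once.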
The following fields are specific to this mark-recapture project and do not map easily to a Darwin Core equivalent.
- dwcImageURL - A URL to a thumbnail image representing the encounter.
- measurements - An array of physical Measurement objects that define point-in-time measurements on the animal (e.g., height, weight, etc.)
- livingStatus - Defines whether the sighting represents a living or deceased individual. Currently supported values are: “alive” and “dead”.
- dwcDateAdded - Date the encounter was added to the library.
- researcherComments - Additional comments added by library users
- submitterID - Username of the logged in researcher assigned to the encounter.
- submitterName, submitterEmail, submitterPhone, submitterAddress - name, email, phone, address of the encounter submitter.
- informothers - other email addresses to inform of status changes to this encounter (e.g., other members of a tour group reporting this encounter)
- photographerName, photographerEmail, photographerPhone, photographerAddress - name, email, phone, address of the encounter photographer.
- additionalImageNames - names and relative paths of the photos submitted for this encounter.
- interestedResearchers - researcher email addresses to notify when data for this encounter is modified.
- hour - hour of the sighting
- minutes - minutes of the sighting
- patterningCode - an open-ended string that allows a type of patterning to be identified. As an example, see the use of color codes at splashcatalog.org, allowing pre-defined fluke patterning types to be used to help narrow the search for a marked individual.
- images - an array of the SinglePhotoVideo objects that represent individual photos and videos gathered during this encounter.
- tissueSamples - an array of the TissueSample objects that represent biological samples (biopsies, blood samples, etc.) gathered during this encounter.
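Because the sighting time is split across the Darwin Core fields (year, month, day) and the project-specific fields (hour, minutes), it can be handy to combine them into a single ISO 8601 string. The sketch below assumes all five values are integers and that unknown parts are passed as `None`. How a given library actually stores missing values is not specified here, and the function name is hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def encounter_timestamp(year: int,
                        month: Optional[int] = None,
                        day: Optional[int] = None,
                        hour: Optional[int] = None,
                        minutes: Optional[int] = None) -> str:
    """Build an ISO 8601 string that is only as precise as the known fields."""
    if month is None:
        return f"{year:04d}"
    if day is None:
        date(year, month, 1)  # raises ValueError if the month is invalid
        return f"{year:04d}-{month:02d}"
    if hour is None:
        return date(year, month, day).isoformat()
    # Unknown minutes are assumed to be zero (top of the hour).
    moment = datetime(year, month, day, hour, minutes or 0)
    return moment.isoformat(timespec="minutes")
```

Stopping at the last known field avoids implying more precision than the sighting report actually contains.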
The following data are recorded for marked individuals.
- individualID - unique identification string of the MarkedIndividual, such as “A-109”. This is the same intended value as Encounter.individualID.
- alternateid - alternate ID for the MarkedIndividual, such as a physical tag number or a reference in another database.
- encounters - an array of the encounters of this marked individual.
- comments - additional comments added by researchers.
- sex - overall determined sex of the MarkedIndividual.
- nickName - nickname for the MarkedIndividual.
- nickNamer - individual (person) who nicknamed this marked individual.
- dataFiles - an array of filenames of additional data files added to the MarkedIndividual.
- interestedResearchers - a list of email addresses to notify when this MarkedIndividual is modified.
- dateTimeCreated - creation time of this data object.
- dynamicProperties - from the Darwin Core: “A list (concatenated and separated) of additional measurements, facts, characteristics, or assertions about the record. Meant to provide a mechanism for structured content such as key-value pairs.”
- patterningCode - an open-ended string that allows a type of patterning to be identified. As an example, see the use of color codes at splashcatalog.org, allowing pre-defined fluke patterning types to be used to help narrow the search for a marked individual.
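The dynamicProperties attribute, available on both encounters and marked individuals, holds a concatenated list of key-value pairs. The sketch below shows one way such a string could be read and written. It assumes pairs of the form `key=value` separated by semicolons; the delimiter your library actually uses may differ, and both function names are hypothetical.

```python
from typing import Dict


def parse_dynamic_properties(raw: str) -> Dict[str, str]:
    """Parse 'key=value;key=value' text into a dictionary.

    Assumes ';' separates pairs and '=' separates key from value.
    Empty chunks and chunks without '=' are skipped.
    """
    properties: Dict[str, str] = {}
    for chunk in raw.split(";"):
        key, separator, value = chunk.partition("=")
        if not separator or not key.strip():
            continue  # not a usable key-value pair
        properties[key.strip()] = value.strip()
    return properties


def format_dynamic_properties(properties: Dict[str, str]) -> str:
    """Serialize a dictionary back into the assumed 'key=value;...' form."""
    return ";".join(f"{key}={value}" for key, value in properties.items())
```

Only the first `=` in each pair is treated as the separator, so values that themselves contain `=` survive a round trip; values containing `;` would not.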
The following data are recorded for occurrences.
- encounters - an array of the encounters comprising this occurrence.
- comments - additional comments added by researchers.
- groupBehavior - description of the aggregate behaviors of the individuals in the group represented by this occurrence.
- individualCount - estimated number of total individuals in this occurrence, which may be greater than the number “captured” and recorded.
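Since individualCount may exceed the number of animals actually captured, a small helper can make the difference explicit. This is an illustrative Python sketch only: the function name and its inputs (a list of individual IDs taken from the occurrence's encounters, with `None` for encounters not assigned to a marked individual) are assumptions.

```python
from typing import Dict, Iterable, Optional


def capture_summary(encounter_individual_ids: Iterable[Optional[str]],
                    individual_count: int) -> Dict[str, int]:
    """Compare identified individuals with the estimated group size.

    Encounters whose individual ID is None are not counted as captured,
    and repeated IDs are counted once.
    """
    identified = {i for i in encounter_individual_ids if i is not None}
    captured = len(identified)
    return {
        "captured": captured,
        "estimated": individual_count,
        # Never report a negative number if the estimate was too low.
        "uncaptured": max(individual_count - captured, 0),
    }
```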