Wikidata:WikiProject Performing arts/Data structure/Data modelling issues

This page gives an overview of the data modelling issues encountered on Wikidata in the area of the performing arts. Diverging modelling approaches are documented as far as possible in the form of SPARQL queries. The overview serves as a basis for community discussions about how to approach these issues. The modelling approaches that are retained will then serve as a basis for more user-friendly documentation of data modelling practices and for maintenance lists.

For a full list of issues that were identified in the context of data ingests, see the following Google Doc.

For a list of issues that were identified as part of CAPACOA's Wikidata project, see this Google Doc.

For issues regarding performing arts organizations, there is a dedicated project page: Wikidata:WikiProject_Performing_arts/Data_structure/Data_modelling_issues/organizations

Many items confound architectural structures and organizations

Description of the issue

Many Wikidata items confound architectural structures (buildings, venues) and organizations. This is a problem when we add statements that may apply to either of them, such as inception (P571). Items having this issue may display one or several of the following characteristics:

  • The presence of several instance of (P31) claims that relate both to architectural structures and organizations;
  • The presence of properties that relate to the architectural structure and to the organization;
  • The presence of external identifiers that relate to architectural structures and to organizations.

Rationale to resolve the issue

Create separate items for the architectural structure and the organization. Assign each property and external identifier to the right item.

Example: Teatro alla Scala

Remarks / Observations

  • Some external reference databases (authority files) do not specify whether they are referencing a corporate body or an architectural structure (example).
  • The opera company may change its name over time, or it may be replaced by a new company altogether. Depending on how the history of the company name (or of the successive companies) is handled, this may cause issues when ingesting data from external databases that do not reflect these changes (all entries for the opera company may point to "La Scala" or similar). A similar issue exists with the entries in the authority files, as most of them treat the given company as a single entity even if it has changed its name over time. Therefore, we should avoid creating a new item for each name change of a corporate body. Instead, use official name (P1448) with start time / end time qualifiers.
  • The disentanglement of the Wikidata items raises the question of which Wikidata item the corresponding Wikipedia article should be linked to. Depending on the main focus of each article, the various articles in different languages that originally pointed to the same Wikidata item may in the future point to different Wikidata items. As a consequence, some language links on Wikipedia will be broken. This issue may be resolved by introducing a new property used to indicate, for each language, the Wikipedia article in which the item is mainly treated. Eventually, the language links on Wikipedia should rely on these statements (the hard 1:1 link between Wikipedia article and Wikidata item being a fiction that cannot be maintained over time).
  • External identifiers should not be blindly matched to Wikidata items based on VIAF, GND and the like, as there are still many wrongly matched items. In the longer term, links to authority files should be validated with regard to the class of the item.
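
The recommendation above to record name changes with official name (P1448) statements can be checked with a query along the following lines. This is only a sketch: it assumes that start time (P580) and end time (P582) are the qualifiers in use, and it takes organization (Q43229) as the class, which is very broad. The query may need to be narrowed to a more specific class to run within the time limit of the query service.

```sparql
# Sketch: official names (P1448) of organizations, with their optional
# start time (P580) and end time (P582) qualifiers.
# Assumption: Q43229 (organization) is used as the class; narrow it if the query times out.
SELECT ?item ?itemLabel ?officialName ?startTime ?endTime
WHERE
{
  ?item p:P1448 ?statement.
  ?statement ps:P1448 ?officialName.
  OPTIONAL { ?statement pq:P580 ?startTime. }
  OPTIONAL { ?statement pq:P582 ?endTime. }
  ?item wdt:P31/wdt:P279* wd:Q43229.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?item ?startTime
LIMIT 500
```

Items that list several official names without any time qualifiers are candidates for clean-up.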

SPARQL Queries

List of theatres (venues), with their various type statements (in descending order of the number of types)

Maintenance tasks:

SELECT   ?item 
         ?itemLabel
         (replace(group_concat(distinct ?type;separator="; "), "http://www.wikidata.org/entity/", "") as ?types)  #Strip the path in order to get only the Q-number.
         (group_concat(distinct ?typeLabel_en;separator="; ") as ?typeLabels_en)
WHERE 
{
  ?item wdt:P31/wdt:P279* wd:Q24354.
  ?item wdt:P31 ?type.
  OPTIONAL { ?item wdt:P31/rdfs:label ?typeLabel_en . FILTER (lang(?typeLabel_en) = "en") }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
group by ?item                          #List all the variables for which the values are not concatenated!
         ?itemLabel
order by desc (COUNT(distinct ?type))   #Count each type only once; the OPTIONAL label join would otherwise inflate the count.

List of items for performing arts buildings which are at the same time defined as organizations

Maintenance tasks:

  • Remove any instance of (P31) statements that refer to an organization and create a separate item for the organization. Sometimes you may want to proceed the other way round and create a new item for the theatre building; you may also inspect the list of reverse statements to see which type the incoming statements predominantly refer to.
SELECT   ?item ?itemLabel
WHERE 
{
  ?item wdt:P31/wdt:P279* wd:Q57660343.
  ?item wdt:P31/wdt:P279* wd:Q43229.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
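
To inspect the reverse statements mentioned in the maintenance task above, a query like the following may help. It is a sketch that counts, per property, the statements pointing to a given item. The item Q5471 is assumed here to be the Teatro alla Scala item and serves only as an illustration; replace it with the item under review.

```sparql
# Sketch: count incoming (reverse) statements per property for one item.
# Assumption: wd:Q5471 is the item for Teatro alla Scala; substitute the item to be checked.
SELECT ?property ?propertyLabel (COUNT(?source) AS ?count)
WHERE
{
  VALUES ?target { wd:Q5471 }
  ?source ?directClaim ?target.
  ?property wikibase:directClaim ?directClaim.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?property ?propertyLabel
ORDER BY DESC(?count)
```

If most incoming statements use properties that refer to a place (for example as a location), the existing item is probably best kept for the building; if they mostly refer to an employer or a producer, it is probably best kept for the organization.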

List of items with both a Flanders Arts Institute organization ID and a Flanders Arts Institute venue ID

Issue is documented at the following page: [[1]]

List of properties used on theatre (venue) items

To be used to identify problematic properties used in combination with theatre venues (e.g. properties relating to organizations).

SELECT ?property ?propertyLabel ?count WITH {
  SELECT ?property (COUNT(DISTINCT ?statement) AS ?count) WHERE {
    ?item wdt:P31/wdt:P279* wd:Q24354;
          ?p ?statement.
    ?property a wikibase:Property;
              wikibase:claim ?p.
    FILTER(?property != wd:P31)
  }
  GROUP BY ?property
  ORDER BY DESC(?count)
  LIMIT 1000
} AS %results WHERE {
  INCLUDE %results.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de-CH,en,fr-CH,it-CH,en-US,en". }
}
ORDER BY DESC(?count)

List of theatre (venue) items with a VIAF identifier

Many (but not all) VIAF identifiers relate to corporate bodies and should be removed from the venue items (create separate items for the corporate bodies).

SELECT   ?item 
         ?itemLabel
WHERE 
{
  ?item wdt:P31/wdt:P279* wd:Q24354.
  ?item wdt:P214 ?value.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

List of theatre (venue) items with a GND identifier

Many (but not all) GND identifiers relate to corporate bodies and should be removed from the venue items (create separate items for the corporate bodies).

SELECT   ?item 
         ?itemLabel
WHERE 
{
  ?item wdt:P31/wdt:P279* wd:Q24354.
  ?item wdt:P227 ?value.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

List of theatre (venue) items with a Library of Congress identifier

Many (but probably not all) LoC identifiers relate to corporate bodies and should be removed from the venue items (create separate items for the corporate bodies).

SELECT   ?item 
         ?itemLabel
WHERE 
{
  ?item wdt:P31/wdt:P279* wd:Q24354.
  ?item wdt:P244 ?value.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

List of theatre (venue) items with a BnF identifier

Many (but probably not all) BnF identifiers relate to corporate bodies and should be removed from the venue items (create separate items for the corporate bodies).

SELECT   ?item 
         ?itemLabel
WHERE 
{
  ?item wdt:P31/wdt:P279* wd:Q24354.
  ?item wdt:P268 ?value.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

List of theatre (venue) items with a SUDOC identifier

Many (but probably not all) SUDOC identifiers relate to corporate bodies and should be removed from the venue items (create separate items for the corporate bodies).

SELECT   ?item 
         ?itemLabel
WHERE 
{
  ?item wdt:P31/wdt:P279* wd:Q24354.
  ?item wdt:P269 ?value.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

List of theatre (venue) items with an Open Corporates identifier

All Open Corporates identifiers relate to corporate bodies and should be removed from the venue items (create separate items for the corporate bodies).

SELECT   ?item 
         ?itemLabel
WHERE 
{
  ?item wdt:P31/wdt:P279* wd:Q24354.
  ?item wdt:P1320 ?value.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

List of theatre (venue) items with a field of work (P101) statement

All field of work (P101) statements should be removed from the venue items. They may be applied to the related corporate body items instead.

SELECT   ?item 
         ?itemLabel
WHERE 
{
  ?item wdt:P31/wdt:P279* wd:Q24354.
  ?item wdt:P101 ?value.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

List of theatre (venue) items with a Twitter username (P2002) statement

All Twitter username (P2002) statements should be removed from the venue items. They may be applied to the related corporate body items instead.

SELECT   ?item 
         ?itemLabel
WHERE 
{
  ?item wdt:P31/wdt:P279* wd:Q24354.
  ?item wdt:P2002 ?value.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

List of theatre (venue) items with a Facebook ID (P2013) statement

All Facebook ID (P2013) statements should be removed from the venue items. They may be applied to the related corporate body items instead.

SELECT   ?item 
         ?itemLabel
WHERE 
{
  ?item wdt:P31/wdt:P279* wd:Q24354.
  ?item wdt:P2013 ?value.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

List of theatre (venue) items with an award received (P166) statement

Most award received (P166) statements should probably be removed from the venue items and be applied to the related corporate body items instead.

SELECT   ?item 
         ?itemLabel
WHERE 
{
  ?item wdt:P31/wdt:P279* wd:Q24354.
  ?item wdt:P166 ?value.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

List of theatre (venue) items with a director / manager (P1037) statement

Issue is documented on the following page: [[2]]
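
The query for this list is kept on the page linked above. By analogy with the other venue lists in this section, a minimal sketch could look as follows. It assumes the same selection pattern (theatre (Q24354) and its subclasses) and is not necessarily identical to the query on the linked page.

```sparql
# Sketch: theatre (venue) items that carry a director / manager (P1037) statement.
SELECT   ?item
         ?itemLabel
         ?value
         ?valueLabel
WHERE
{
  ?item wdt:P31/wdt:P279* wd:Q24354.
  ?item wdt:P1037 ?value.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```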

Modelling world premieres

Description of the issue

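Two different modelling approaches have been used to describe world premieres:

  • Statements on the work item itself: date of first performance (P1191), sometimes with qualifiers such as location (P276) or cast member (P161), and location of first performance (P4647);
  • A separate item for the first production of the work (or for its very first performance), linked to the work item and carrying a premiere type (P4634) statement.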

Proposed rationale to resolve the issue

The first modelling approach is less onerous when it comes to describing just a few characteristics of the world premiere. The second approach, in contrast, is much more versatile when it comes to adding a lot of information about the specific theatre production. We therefore suggest that date of first performance (P1191) and location of first performance (P4647) statements may be used on work items to provide some minimal information about the world premiere. However, whenever further information is to be provided about the world premiere, a separate item is to be created for the first production of the play or even for its very first performance.
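
The minimal form of the first approach, with both statements placed directly on the work item, can be illustrated with a query such as the one below. It is a sketch that reuses the class play (Q25379) from the queries in this section; the LIMIT is arbitrary and only keeps the result list short.

```sparql
# Sketch: plays with a date of first performance (P1191) on the work item,
# together with the location of first performance (P4647) where one is given.
SELECT ?play ?playLabel ?date ?location ?locationLabel
WHERE {
  ?play (wdt:P31/wdt:P279*) wd:Q25379;
        wdt:P1191 ?date.
  OPTIONAL { ?play wdt:P4647 ?location. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 200
```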

SPARQL Queries

Plays with location (P276) or cast member (P161) qualifiers on the date of first performance (P1191) statement

Proposed maintenance task: Use location of first performance (P4647) to indicate the location of the world premiere. If further information about the world premiere is provided, create a separate item for the first production of the play and provide it there. date of first performance (P1191) statements should not have any qualifiers.

SELECT DISTINCT ?item ?itemLabel
WHERE {
  ?item (wdt:P31/wdt:P279*) wd:Q25379;
        p:P1191 [
          pq:P161|pq:P276 ?value  
        ].
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
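
Since date of first performance (P1191) statements are not supposed to carry any qualifiers, it may also be useful to see which qualifier properties are actually in use beyond the two checked above. The following sketch counts them; it relies on the wikibase:qualifier predicate to map each qualifier predicate back to its property.

```sparql
# Sketch: qualifier properties used on date of first performance (P1191)
# statements of plays, with the number of statements for each.
SELECT ?qualifier ?qualifierLabel (COUNT(?statement) AS ?count)
WHERE {
  ?item (wdt:P31/wdt:P279*) wd:Q25379;
        p:P1191 ?statement.
  ?statement ?pq ?value.
  ?qualifier wikibase:qualifier ?pq.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?qualifier ?qualifierLabel
ORDER BY DESC(?count)
```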

Plays with a related performing arts production that has a premiere type (P4634) statement

The query illustrates the proposed approach for modelling world premieres and other premiere types.

SELECT DISTINCT ?play ?playLabel ?production ?premiereTypeLabel
WHERE {
  ?play (wdt:P31/wdt:P279*) wd:Q25379.
  ?production wdt:P144 ?play;
              wdt:P31/wdt:P279* wd:Q43099500;
              wdt:P4634 ?premiereType.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}