Alphabetical order of entries in categories

From Clavis Canonum
Revision as of 13:23, 31 July 2024 by Christof Rolker (talk | contribs)

Is there a way to ignore certain words ("The", "Collectio") in the alphabetical order of pages in categories? (The issue is found in all categories.) Also, can the automatic alphabetical order handle Roman and/or Arabic numerals? So, would "LX" or "XL" come first, and "54" or "154"? (The issue appeared in the List of Manuscript; I fixed it manually.)

Sort order for MSS

There is indeed a way to add a sort key to pages in the category listing (cf. https://www.mediawiki.org/wiki/Help:Categories#Sort_key). I have done it manually for Dionysiana adaucta in its entry for the category Category:Collection not in Clavis database. The sort key is created like this: [[Category:Collection not in Clavis database|dionysiana adaucta]]
I have moved the page to Collectio Dionysiana adaucta using a standardised title. See my comments on standardised titles. --Christof Rolker (talk) 21:36, 21 August 2022 (CEST)
NB: Creating good sort keys is tricky: Which words at the beginning are to be eliminated? "The" surely, but also "Collectio" and "Collection". But if we eliminate "The Collection" then "The Collection in Clm 22289" will be sorted under "in Clm 22289". Also, numbers (Roman and Arabic) will have to be padded with leading zeros to make them properly sortable. We could try to do this with an automated bot that updates the sort keys every night (or whatever). Or we'd have to include the sort key in each and every category tag. At the moment, I am not sure about the best approach. Clemens Radl (talk) 19:56, 27 July 2022 (CEST)
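To make the pitfall concrete, here is a minimal Python sketch of the naive prefix-stripping approach. The function name naive_sort_key and the list of prefixes are hypothetical, chosen only for illustration; nothing like this is installed on the wiki.

```python
# Hypothetical illustration only: strip one leading phrase, then lowercase.
# Longer phrases come first so "The Collection" wins over "The".
LEADING_PHRASES = ("The Collection", "The", "Collectio", "Collection")

def naive_sort_key(title):
    """Return a sort key with at most one leading phrase removed."""
    for phrase in LEADING_PHRASES:
        if title.startswith(phrase + " "):
            return title[len(phrase) + 1:].lower()
    return title.lower()

print(naive_sort_key("Collectio Dionysiana adaucta"))  # dionysiana adaucta
print(naive_sort_key("The Collection in Clm 22289"))   # in clm 22289
```

The second result is exactly the problem: the page would be filed under "i" for "in", which is unlikely to be where anyone looks for it.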
Thanks. My feeling is that we can deal with the collections manually but may need something for the manuscripts. Christof Rolker (talk) 06:50, 28 July 2022 (CEST)
For the collections, see now my suggestions for standardised titles. We keep "Collectio" in all cases where it is part of an established title, and "Collectio canonum in..." for all collections known by the shelf mark of a copy, and get rid of the preceding "The..." in titles. That should make sortkey/DEFAULTSORT unnecessary for most cases (except numerals). --Christof Rolker (talk) 21:36, 21 August 2022 (CEST)
OK. This sounds good. I'll look into manuscript sorting. The algorithm for the key will be quite simple: all Arabic numbers are left-padded with zeros, and Roman numerals are replaced with their appropriately padded Arabic equivalent. Thus Modena, Bibliotheca Capitolare, O.II.2 will get a sort key such as: modena, bibliotheca capitolare, o.00000002.00000002. --Clemens Radl (talk) 10:02, 28 July 2022 (CEST)
OK, I re-investigated the problem and there is a better solution which I initially missed: you can define a default sort key for a page using the magic word {{DEFAULTSORT:...}} in the source code, on a line of its own, best placed at the beginning of the section where all the categories are entered (see the documentation here and, in somewhat more detail, here). I have manually created this for some Paris manuscripts (lat. 13656, lat. 3858, lat. 4282), all part of the Category:Manuscript of IT. Notice how the three mss. are sorted nicely. So my suggestion above to use the category tag itself is obsolete. I think it will be best if we stick to Christof's proposal: I try to create sort keys for mss. automatically, whereas the sort keys for the collections have to be added by hand. --Clemens Radl (talk) 14:17, 3 August 2022 (CEST)
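For anyone who wants to script this placement, the following is a minimal sketch in Python that works on the raw page source as a plain string. The function name add_defaultsort is hypothetical, and the choice to leave a page untouched when it already has a DEFAULTSORT is an assumption, not a description of any existing tool.

```python
import re

# Matches the start of a category tag such as [[Category:Manuscript of IT]].
CATEGORY_TAG = re.compile(r"\[\[Category:", re.IGNORECASE)
# Matches an existing default sort key such as {{DEFAULTSORT:some key}}.
DEFAULTSORT_TAG = re.compile(r"\{\{DEFAULTSORT:[^}]*\}\}")

def add_defaultsort(wikitext, sort_key):
    """Insert {{DEFAULTSORT:...}} on its own line before the first category tag."""
    if DEFAULTSORT_TAG.search(wikitext):
        return wikitext  # assumption: never overwrite an existing key
    line = "{{DEFAULTSORT:" + sort_key + "}}\n"
    match = CATEGORY_TAG.search(wikitext)
    if match is None:
        # No categories yet: append the key at the end of the page.
        return wikitext.rstrip("\n") + "\n" + line
    # Start of the line that holds the first category tag.
    line_start = wikitext.rfind("\n", 0, match.start()) + 1
    return wikitext[:line_start] + line + wikitext[line_start:]

page = "Some text.\n\n[[Category:Manuscript of IT]]\n[[Category:Manuscript]]\n"
print(add_defaultsort(page, "paris, bnf, lat. 03858"))
```

Running the function a second time on its own output changes nothing, so it would be safe to apply repeatedly.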
Looks good! How many zeros do we really need? The highest numbers in shelf marks are five-figure numbers, IIRC: Brussels, London (Add. MSS only), Munich (Clm), Paris (lat.), and Vatican (lat.). So, I think "Paris, BnF, lat. 03858" should be enough. - For the Roman numerals we would have to invent keys, right? So "Ivrea, BC, 00094" for Ivrea, BC, XCIV, for example? Christof Rolker (talk) 14:38, 3 August 2022 (CEST)
Yes, you are right, five digits should be sufficient (Brussels being another candidate, but they also do not exceed five digits). (With these things I tend to err on the side of caution.) And yes, you're also right about the Roman numerals, that's exactly what I plan to do (substitute them with their Arabic equivalent). --Clemens Radl (talk) 14:45, 3 August 2022 (CEST)
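The rule agreed here (zero-pad Arabic numbers, replace Roman numerals with their padded Arabic equivalent, lowercase the result) can be sketched in Python as follows. This is an illustration under stated assumptions, not the program actually used: the names make_sort_key and roman_to_int are hypothetical, and treating only stand-alone, uppercase, well-formed Roman numerals as numbers is a guess at a reasonable heuristic.

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
# Well-formed Roman numerals up to 3999 (uppercase only).
STRICT_ROMAN = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
# Stand-alone runs of Roman letters, e.g. "II" in "O.II.2" but not the "M" in "Modena".
ROMAN_TOKEN = re.compile(r"\b[IVXLCDM]+\b")
ARABIC_TOKEN = re.compile(r"\d+")

def roman_to_int(numeral):
    """Convert a well-formed uppercase Roman numeral to an integer."""
    total, previous = 0, 0
    for ch in reversed(numeral):
        value = ROMAN_VALUES[ch]
        total += value if value >= previous else -value
        previous = max(previous, value)
    return total

def make_sort_key(shelfmark, width=5):
    """Build a lowercase sort key with zero-padded numbers."""
    def replace_roman(match):
        token = match.group(0)
        # Known limitation (assumption): a lone "C", "D" etc. used as a
        # letter in a shelf mark would also be converted to a number.
        if STRICT_ROMAN.fullmatch(token):
            return str(roman_to_int(token))
        return token
    key = ROMAN_TOKEN.sub(replace_roman, shelfmark)
    key = ARABIC_TOKEN.sub(lambda m: m.group(0).zfill(width), key)
    return key.lower()

print(make_sort_key("Ivrea, BC, XCIV"))        # ivrea, bc, 00094
print(make_sort_key("Paris, BnF, lat. 3858"))  # paris, bnf, lat. 03858
```

With width=8 the same function gives modena, bibliotheca capitolare, o.00000002.00000002 for Modena, Bibliotheca Capitolare, O.II.2. Numbers longer than the chosen width are left as they are rather than truncated.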
For the collections, I think adding sortkeys manually using {{DEFAULTSORT:...}} is best; I'm testing it for the early collections (already in the categories saec. VI, VII, VIII). Christof Rolker (talk) 11:20, 13 August 2022 (CEST)
Just a heads up: I have now written a small program that could operate as a wiki bot and automatically add keys to all manuscript pages. I think it works fairly well, but there are some minor issues. The list of current keys can be viewed on a separate page. My proposal: whoever wants to have a look at the data, please move over to that page and feel free to add comments, suggestions, etc. After a couple of days (on Tuesday next week at the latest, as I'll be on vacation afterwards) I'd run the bot once. If we are happy with the results, I'd install the bot so that it patrols the manuscript pages on a regular basis and adds keys as needed. --Clemens Radl (talk) 13:46, 4 August 2022 (CEST)
You are great. Thanks! Christof Rolker (talk) 17:46, 4 August 2022 (CEST)
I have just now let the bot run once (started by hand, not automatically). As far as I can see, no damage has been done. All existing mss. pages now have a default sort key. The program runs quite fast, so I think I'll pick a suitable time in the night (European time) and let it run daily. This means: whenever you add a new ms. page, a default sortkey will be present and active on the next day.
For all practical purposes, this is enough. For minute issues, see the discussion page of "list of MSS keys". Christof Rolker (talk) 11:39, 13 August 2022 (CEST)

Sort order for collections

For the collections, the best way is to have standardised page titles based on the most commonly used title of the respective collection (as opposed to the sometimes rather long titles based on the 2005 book). These page titles should not contain articles, a rule also followed by other wikis. For collection titles containing numbers (e.g. 4L, 74T and the like), DEFAULTSORT should be used. So, the page on 183T should be (and now is) Collectio CLXXXIII titulorum rather than "The Liber canonum diversorum sanctorum patrum or Collectio CLXXXIII titulorum or the collection of S. Maria Novella", and the end of its "categories" section contains DEFAULTSORT:Collectio 183 titulorum. I have moved a handful of pages accordingly, and added DEFAULTSORT to all collections containing "librorum", "titulorum", "partium", and "capitulorum". Christof Rolker (talk) 10:59, 21 August 2022 (CEST)

For a list of all collections, see this Google Spreadsheet. You can copy and paste the standardised page title from column E, the suggested DEFAULTSORT from column F, and suggested categories from column V. Be careful with the latter in particular, as all information here is based on the 2005 Clavis manual and may contain some errors; also, when processing empty cells, the formula produces some nonsensical category tags which you should simply delete. Christof Rolker (talk) 16:39, 21 August 2022 (CEST)