In the educational and social sciences, it is a common practice to share research instruments such as questionnaires or tests. For example, the Open Test Archive provides researchers and practitioners access to a large number of open access instruments. aifeducation assumes AI-based classifiers should be shareable, similarly to research instruments, to empower educational and social science researchers and to support the application of AI for educational purposes. Thus, aifeducation aims to make the sharing process as easy as possible.
For this aim, every object generated with aifeducation can be prepared for publication in a few basic steps. In this vignette, we would like to show you how to make your AI ready for publication and how to use models from other persons.
Now we will start with a guide on preparing text embedding models.
Every object of class
TextEmbeddingModel comes with
several methods allowing you to provide important information for
potential users of your model.
First, every model needs a clear description how it was developed,
modified and how it can be used. You can add a description via the
This method allows you to provide a description in English and in the native language of your model to make the distribution of your model easier. You can write your description in HTML which allows you to add links to other sources or publications, to add tables or to highlight important aspects of your model.
We would like to recommend that you write at least an English description to allow a wider community to recognize your work. Furthermore, your description should include:
can provide a summary of your description. This is very important if you
would like to share your work on a repository. With
keywords_native you can set a
vector of keywords which help to find your work with search engines. We
would like to recommend that you at least provide this information in
You can access a model’s description by using the method
Besides a description of your work it is necessary to provide
information about other people who were involved in creating a model.
This can be done with the method
First of all you have to decide the type of information you would
like to add. You have two choices: “developer”, and “modifier”, which
you set with
type="developer" stores all information about the people
involved and the process of developing the model. If you use a
transformer model from Hugging
Face, the people and their description of the model should be
entered as developers. In all other cases you can use this type for
providing a description of how you developed the model.
In some cases you might wish to modify an existing model. This might
be the case if you use a transformer model and you adapt the model to a
specific domain or task. In this case you rely on the work of other
people and modify their work. In these cases you can describe your
modifications by setting
For every type of person you can add the relevant individuals via
authors. Please use the R’s function
personList() for this. With
provide a free text how to cite the work of the different persons. With
url you can provide a link to relevant sites of the
You can access the information by using
Finally, you must provide a license for using your model. This can be
Please note, that in most cases the license for your model must be “GPL-3” since some of the software used to create your model are licensed under “GPL-3”. Thus, derivative work must also be licensed under “GPL-3”.
The documentation of your work is not part of the software. Here you can set other licenses such as Creative Common (CC) or Free Documentation License (FDL). You can set the license for your documentation by using the method ‘set_documentation_license’.
Now you are able to share your work. Please remember to save your now fully described object as described in the following section 2.2.
Saving a created text embedding model is very easy by using the
save_ai_model. This function provides a unique
interface for all text embedding models. For saving your work you can
pass your model to
model and the directory where to save
the model to
model_dir. Please do only pass the path of a
directory and not the path of a file to this function. Internally the
function creates a new folder in the directory where all files belonging
to a model are stored.
As you can see all three text embedding models are saved within the
same directory named “text_embedding_models”. Within this directory the
function creates a unique folder for every model. The name of this
folder is specified with
If you set
append_ID=FALSE the the name of the folder is created by
using the models’ names. If you change the argument
append_ID=TRUE and set
dir_name=NULL the unique ID of the model is added to the
directory. The ID is added automatically to ensure that every model has
a unique name. This is important if you would like to share your work
with other persons.
If you want to load your model, just call the function
load_ai_model and you can continue using your model. The
following code assumes that you have specified
In the case you set
append_ID=TRUE loading the models may look as shown in the
following code snippet:
Please note that you have to add the name of the model to the directory path. In our example we have stored three models in the directory “text_embedding_models”. Each model is saved within its own folder. The folder’s name is created automatically with the help of the name of the model. Thus, for loading a model you must specify which model you want to load by adding the model’s name to the directory path as shown above.
At this point you may wonder why there is an ID in model’s name
although you did not enter an ID on model’s creation. The ID is added
automatically to ensure that every model has a unique name. This is
important if you would like to share your work with other persons.
During saving the ID is appended automatically by setting
Now you are ready to share your work. Just provide all files within
your model folder. For the BERT model in the example above this would be
depending on how you saved the model.
Adding the model description of a classifier is similar to
TextEmbeddingModels. With the methods
get_model_description you can provide a detailed
native) of your
classifier in English and the native language of your classifier. With
abstract_native you can
provide the corresponding abstract of your descriptions while
keywords_native take a vector
with the corresponding keywords.
eng="This classifier targets the realization of the need for competence from
the self-determination theory of motivation by Deci and Ryan in lesson plans
and materials. It describes a learner’s need to perceive themselves as capable.
In this classifier, the need for competence can take on the values 0 to 2.
A value of 0 indicates that the learners have no space in the lesson plan to
perceive their own learning progress and that there is no possibility for
self-comparison. At level 1, competence growth is made visible implicitly,
e.g. by demonstrating the ability to carry out complex exercises or peer
control. At level 2, the increase in competence is made explicit by giving
each learner insights into their progress towards the competence goal. For
example, a comparison between the target vs. actual development towards the
learning objectives of the lesson can be made, or the learners receive
explicit feedback on their competence growth from the teacher. Self-assessment
is also possible. The classifier was trained using 790 lesson plans, 298
materials and up to 1,400 textbook tasks. Two people who received coding
training were involved in the coding and the inter-coder reliability for the
need for competence increased from a dynamic iota value of 0.615 to 0.646 over
two rounds of training. The Krippendorffs alpha value, on the other hand,
decreased from 0.516 to 0.484. The classifier is suitable for use in all
settings in which lesson plans and materials are to be reviewed with regard
to their implementation of the need for competence.",
native="Dieser Classifier bewertet Unterrichtsentwürfe und Lernmaterial danach,
ob sie das Bedürfnis nach Kompetenzerleben aus der Selbstbestimmungstheorie
der Motivation nach Deci und Ryan unterstützen. Das Kompetenzerleben stellt
das Bedürfnis dar, sich als wirksam zu erleben. Der Classifer unterteilt es
in drei Stufen, wobei 0 bedeutet, dass die Lernenden im Unterrichtsentwurf
bzw. Material keinen Raum haben, ihren eigenen Lernfortschritt wahrzunehmen
und auch keine Möglichkeit zum Selbstvergleich besteht. Bei einer Ausprägung
von 1 wird der Kompetenzzuwachs implizit, also z.B. durch die Durchführung
komplexer Übungen oder einer Peer-Kontrolle ermöglicht. Auf Stufe 2 wird der
Kompetenzzuwachs explizit aufgezeigt, indem jede:r Lernende einen objektiven
Einblick erhält. So kann hier bspw. ein Soll-Ist-Vergleich mit den Lernzielen
der Stunde erfolgen oder die Lernenden erhalten dezidiertes Feedback zu ihrem
Kompetenzzuwachs durch die Lehrkraft. Auch eine Selbstbewertung ist möglich.
Der Classifier wurde anhand von 790 Unterrichtsentwürfen, 298 Materialien und
bis zu 1400 Schulbuchaufgaben traniert. Es waren an der Kodierung zwei Personen
beteiligt, die eine Kodierschulung erhalten haben und die
Inter-Coder-Reliabilität für das Kompetenzerleben würde über zwei
Trainingsrunden von einem dynamischen Iota-Wert von 0,615 auf 0,646 gesteigert.
Der Krippendorffs Alpha-Wert sank hingegen von 0,516 auf 0,484. Er eignet sich
zum Einsatz in allen Settings, in denen Unterrichtsentwürfe und Lernmaterial
hinsichtlich ihrer Umsetzung des Kompetenzerlebens überprüft werden sollen.",
abstract_eng="This classifier targets the realization of the need for
competence from Deci and Ryan’s self-determination theory of motivation in l
esson plans and materials. It describes a learner’s need to perceive themselves
as capable. The variable need for competence is assessed by a scale of 0-2.
The classifier was developed using 790 lesson plans, 298 materials and up to
1,400 textbook tasks. A coding training was conducted and the inter-coder
reliabilities of different measures
(i.e. Krippendorff’s Alpha and Dynamic Iota Index) of the individual categories
were calculated at different points in time.",
abstract_native="Dieser Classifier bewertet Unterrichtsentwürfe und
Lernmaterial danach, ob sie das Bedürfnis nach Kompetenzerleben aus der
Selbstbestimmungstheorie der Motivation nach Deci & Ryan unterstützen. Das
Kompetenzerleben stellt das Bedürfnis dar, sich als wirksam zu erleben. Der
Classifer unterteilt es in drei Stufen und wurde anhand von 790
Unterrichtsentwürfen, 298 Materialien und bis zu 1400 Schulbuchaufgaben
entwickelt. Es wurden stets Kodierschulungen durchgeführt und die
Inter-Coder-Reliabilitäten der einzelnen Kategorien zu verschiedenen
keywords_eng=c("Self-determination theory", "motivation", "lesson planning", "business didactics"),
keywords_native=c("Selbstbestimmungstheorie", "Motivation", "Unterrichtsplanung", "Wirtschaftsdidaktik")
In the case of a classifier, the description should include:
Again, you can provide your description in HTML to include tables (e.g. for reporting the reliability of the initial coding process) or links to other sources and publications.
Please do not report the performance values of your
classifier in the description. These can be accessed directly
With the methods
get_publication_info you can provide the bibliographic
information of your classifier.
In contrast to TextEmbeddingModels there are not different types of author groups.
Finally, you can manage the license for using your classifier via
Similar to TextEmbeddingModels your classifier has to be licensed via “GPL-3” since some of the software used for creating a classifier applies this license. For the documentation you can choose between further license since the documentation is not part of the software. For setting and receivinf the license you can call the methods ‘set_documentation_license’ and ‘get_documentation_license’
Now you are ready for sharing your classifier. Please remember to save your changes as described in the following section 3.2.
If you have created a classifier, saving and loading is very easy due
to the functions
load_ai_model. The process for saving a model is similar to
the process for text embedding models. You only have to pass the model
and a directory path to the function
folder name is set with
In contrast to text embedding models you can specify the additional
save_format. In the case of pytorch models this
arguments allows you to choose between
save_format = "safetensors" and
save_format = "pt". We recommend to chose
save_format = "safetensors" since this is a safer method to
save your models. In the case of tensorflow models this arguments allows
you to choose between
save_format = "keras",
save_format = "tf" and
save_format = "h5". We
recommend to chose
save_format = "keras" since this is the
recommended format by keras. If you set
save_format = "default" .safetensors is used for pytorch
models and .keras is used for tensorflow models.
If you would like to load a model you can call the function
Note: Classifiers depend on the framework which was used during creation. Thus, a classifier is always initalized with its original framework. The argument
ml_frameworkhas no effect.
In the case you would like to share your classifier with a broader
audience we recommend to set
append_ID = TRUE. This will create a folder name
automatically using the classifier’s name and unique ID. Similar to text
embedding models ID is added to the name during the creation of the
classifier ensuring a unique name for your model. With this options the
folder name may look like
If you would like to share your classifier with other persons you
have to provide all files within this folder
Since all files are stored with a specific structure do
not change or edit the files manually.
Please note that you need the
was used for training in order to predict new data with the classifier.
You can request the name, label, and configuration of that model with
if you would like to share your classifier, ensure that you also share
the corresponding text embedding model.
If you would like to apply your classifier to new data, two steps are
necessary. First, you must transform the raw text into a numerical
expression by using exactly the same text embedding model that
was used to train your classifier. The resulting object can be passed to
predict and you will receive the predictions
together with an estimate of certainty for each class/category.