Understanding the Originality Index Calculation

The Originality Index reflects the level of originality concerns identified within a document. It highlights the extent to which content may require further review by presenting results as either a percentage or a risk classification, such as Low, Medium, or High. This approach enables educators to quickly and clearly assess the originality status of a submission.

This article explains how the Originality Index is calculated for checked submissions and how to interpret the results. It is intended to help readers understand how originality is evaluated and how the outcomes support effective review and decision-making.

How are sentences flagged for Similarity?

Inspera Originality examines the following components when assessing document text for potential similarity to matching sources:

  • Textual Similarity - Examines sentence structure and formulation to assess the degree of similarity to a sentence in an external source.
  • Contextual Similarity - Analyzes a sentence in the document to assess the degree of similarity in overall meaning and idea to a sentence in a matching source. This includes identifying the use of synonyms and paraphrasing, ensuring broader coverage when the original meaning is preserved despite changes in how it is expressed.
  • Word Proximity - Analyzes sentences to determine the average closeness between similar words in a sentence from the document and in a sentence from an external source.

The approach goes beyond basic text analysis to identify the underlying meaning and context within the content.
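The difference between textual and contextual similarity can be made concrete with a small sketch. It is not Inspera Originality's algorithm: it swaps in Python's standard `difflib.SequenceMatcher` as a stand-in for a word-level textual comparison, and the helper names `tokens` and `textual_similarity` are hypothetical. It only shows why a purely textual score drops sharply for a paraphrase, which is the gap a meaning-based (contextual) comparison is meant to cover.

```python
import re
from difflib import SequenceMatcher


def tokens(sentence: str) -> list[str]:
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def textual_similarity(a: str, b: str) -> float:
    """Word-level similarity ratio between 0.0 and 1.0 (stand-in measure, not the product's)."""
    return SequenceMatcher(None, tokens(a), tokens(b)).ratio()


source = "The cat sat on the mat."
copied = "The cat sat on the mat."
paraphrased = "A feline rested upon the rug."

# An identical sentence scores 1.0; the paraphrase keeps the meaning
# but shares almost no wording, so its textual score is much lower.
print(textual_similarity(source, copied))
print(textual_similarity(source, paraphrased))
```

A word-level comparison like this one would miss the paraphrased sentence, whereas a comparison of overall meaning could still relate it to the source.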

In addition to the elements outlined above:

  • Inspera Originality also applies advanced lemmatization techniques to ensure accurate identification of the root forms of words. This is essential for precise and meaningful language analysis. 
  • Inspera Originality performs fuzziness analysis to evaluate the degree of ambiguity and clarity within the text. This helps assess the precision of the language used and supports a more nuanced understanding of the content.

Understanding the Calculation Logic

The Originality Index is calculated based on the number of sentences highlighted in the Originality Report. 

To illustrate, consider a document composed of 10 sentences, each weighted equally at 10%. The sentence weight refers to the impact a sentence has on the Originality Index, which is determined by the sentence’s assigned:

  • Similarity percentage
  • Contextual similarity percentage
  • Word proximity
  • Possible inclusion of manipulations
  • Overall classification (Exact Match, Possibly Altered Text, Contextual Similarity).

For example, in a document consisting of 10 sentences, each sentence represents 10% of the total Originality Index. If a sentence is flagged or highlighted due to any issue identified during the originality check, it contributes its full weight (10%) to the document’s overall score. When a sentence is flagged for multiple issues, it is counted only once and does not increase its contribution.
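The 10-sentence example can be sketched in a few lines of Python. This is a simplified illustration of the equal-weight model described above, not the product's implementation; the function name `originality_index` and the shape of the `flags` mapping are assumptions made for this example.

```python
def originality_index(total_sentences: int, flags: dict[int, list[str]]) -> float:
    """Return the Originality Index (in percent) under the equal-weight model.

    `flags` maps a 1-based sentence number to the issues found in that sentence.
    """
    if total_sentences <= 0:
        raise ValueError("total_sentences must be positive")
    weight = 100.0 / total_sentences  # every sentence carries the same weight
    # A sentence counts once, however many issues it was flagged for.
    flagged = {number for number, issues in flags.items() if issues}
    return round(weight * len(flagged), 2)


flags = {
    2: ["Exact Match"],
    5: ["Contextual Similarity", "Character replacement"],  # two issues, one sentence
    9: ["Possibly Altered Text"],
}
print(originality_index(10, flags))  # 30.0
```

Sentence 5 is flagged for two issues but still contributes only its single 10% share, so three flagged sentences out of ten give an index of 30%.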

Certain elements are excluded from the Originality Index calculation. These include non-textual content, such as images used in place of text, as well as results derived from the metadata analysis.

Figure: Logic Behind Originality Index Calculation

In summary, the Originality Index is calculated from the flagged sentences: the proportional weights of all flagged sentences add up to the Originality Index assigned to a submitted document.

Impacting Elements in Originality Index Calculation

  1. Threshold Configurations

    By default, Inspera Originality applies predefined percentage thresholds to classify a document’s originality score into one of three risk levels: Low, Medium, or High.

    • 0–20% Document classified as Low Risk
    • 20–80% Document classified as Medium Risk
    • 80–100% Document classified as High Risk

    These default thresholds provide a standardized and well-tested framework. 

    However, administrators have the flexibility to override these settings and define custom thresholds within the institutional configuration to better align with specific needs and originality assessment practices. 

  2. Sentence Threshold Configurations

    In addition to configuring thresholds for document classification, Inspera Originality also provides threshold configuration at the sentence level. These thresholds determine whether individual sentences are flagged as Exact Match, Possibly Altered Text, or Contextual Similarity.

  3. Original Language Similarity Flags

    Sentences flagged for Original Language Similarity contribute to the attributed Originality Index for a submitted document. A flagged sentence can be classified as Exact Match, Possibly Altered Text, or Contextual Similarity. A sentence's contribution to the Originality Index varies based on the level of similarity detected for the sentence and the classification assigned to it.

    Conversely, exclude actions related to Original Language Similarity also affect the attributed Originality Index.

    • Excluding a matching source in the Summary View for Original Language Similarity will affect the attributed Originality Index.
    • Excluding a matching source in the In-Depth View for Original Language Similarity will affect the attributed Originality Index.
    • Excluding a matching sentence flagged for Original Language Similarity will affect the attributed Originality Index.

    The percentage at which a flagged sentence is classified as Exact Match, Possibly Altered Text, or Contextual Similarity depends on the configuration set by the administrator at the institutional level.

  4. Translated Language Similarity Flags

    Sentences flagged for Translated Language Similarity contribute to the attributed Originality Index for a submitted document. A flagged sentence can be classified as Exact Match, Possibly Altered Text, or Contextual Similarity. A sentence's contribution to the Originality Index varies based on the level of similarity detected for the sentence and the classification assigned to it.

    Conversely, exclude actions related to Translated Language Similarity also affect the attributed Originality Index.

    • Excluding a matching source in the Summary View for Translated Language Similarity will affect the attributed Originality Index.
    • Excluding a matching source in the In-Depth View for any of the languages included in Translated Language Similarity will affect the attributed Originality Index.
    • Excluding a matching sentence flagged for any of the languages included in Translated Language Similarity will affect the attributed Originality Index.

    The percentage at which a flagged sentence is classified as Exact Match, Possibly Altered Text, or Contextual Similarity depends on the configuration set by the administrator at the institutional level.

  5. AI Prediction Flags

    It is important to note that the AI Prediction feature in Inspera Originality also contributes to the Originality Index assigned to a submitted document. 

    Sentences flagged by AI Prediction cannot be excluded, so their contribution to the Originality Index cannot be changed through exclude actions. The size of that contribution depends on the extent of AI Prediction detected in the document.

  6. Manipulation Flags

    In addition to Original and Translated Language Similarity flags, sentences can also be flagged for manipulations, namely character replacements and hidden text. Such manipulations also affect the attributed Originality Index. Conversely, excluding manipulations also changes the document’s Originality Index.
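The interaction between flags, exclusions, and risk thresholds can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the `FlaggedSentence` class, the `flag_type` labels, and the function names are hypothetical, each sentence carries a single flag type for simplicity, and a value that falls exactly on a threshold (20% or 80%) is assigned to the higher band because the default ranges above share their boundary values.

```python
from dataclasses import dataclass


@dataclass
class FlaggedSentence:
    weight: float          # percentage points this sentence contributes
    flag_type: str         # "original", "translated", "manipulation" or "ai"
    excluded: bool = False


def exclude(sentence: FlaggedSentence) -> None:
    """Mark a flagged sentence as excluded; AI Prediction flags stay in place."""
    if sentence.flag_type == "ai":
        raise ValueError("AI Prediction flags cannot be excluded")
    sentence.excluded = True


def originality_index(sentences: list[FlaggedSentence]) -> float:
    """Sum the weights of all flagged sentences that are not excluded."""
    return sum(s.weight for s in sentences if not s.excluded)


def classify_risk(index: float, low_max: float = 20.0, high_min: float = 80.0) -> str:
    """Map an index to a risk level; boundary values go to the higher band (assumption)."""
    if index < low_max:
        return "Low"
    if index < high_min:
        return "Medium"
    return "High"


# Nine of ten equally weighted sentences are flagged.
report = (
    [FlaggedSentence(10.0, "original") for _ in range(5)]
    + [FlaggedSentence(10.0, "translated") for _ in range(2)]
    + [FlaggedSentence(10.0, "manipulation"), FlaggedSentence(10.0, "ai")]
)
before = originality_index(report)
exclude(report[0])   # exclude one Original Language Similarity match
exclude(report[5])   # exclude one Translated Language Similarity match
after = originality_index(report)
print(before, classify_risk(before))  # 90.0 High
print(after, classify_risk(after))    # 70.0 Medium
```

Passing custom values for `low_max` and `high_min` mirrors an administrator overriding the default thresholds, and the two exclusions are enough to move this sample document from High to Medium.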

FAQ

  1. When a sentence is flagged as Possibly Altered Text, it indicates that the sentence was likely paraphrased but still displays similarity in overall formulation and structure to the sentence it was matched against. 

    In contrast, Contextual Similarity signifies that the sentence shares a similar overall meaning or idea with the matched sentence, even if its formulation and structure differ.

  2. Unlike the Originality Report, the Offline Report displays all similarity matches, including those that account for a very small percentage of similarity (e.g., 0.97%). These matching sources are available in the Originality Report only under a filtered view. As a result, it may appear in some cases that the Offline Report contains more matching sources than the Originality Report itself. 

    However, this issue can be mitigated by using the Excluded Source Criteria feature at the assignment level. This allows Educators to define a percentage threshold that excludes matching sources below a specified similarity percentage from consideration entirely.

  3. These types of originality score variations do not occur frequently and are caused by minor differences that arise during the analysis process. For example, if the same document is submitted multiple times, the analyzer may occasionally identify one additional source in one instance and one fewer source in another. 

    Additionally, shorter sentences or those of lesser relevance are not taken into account, which also contributes to the occasional discrepancy. This can result in a slight variation in the originality score for a particular submission. Nonetheless, any variation in the overall originality score is typically minimal.

  4. Regardless of the originality issues they are flagged for, sentences each carry a different weight, meaning they affect the attributed Originality Index to different degrees. For example, a flagged sentence containing 8 words may contribute 6%, whereas another one with 22 words may contribute 13%. 

    In general, longer sentences or more substantial matches tend to have a greater impact on the Originality Index. Nonetheless, it’s important to highlight that impact does not rely solely on sentence word count, as matches are affected by additional factors during the originality check.

  5. Sentences flagged for similarity in the Originality Report may have more than one matching source, meaning that similarity has been detected against multiple sources. 

    As a result, excluding the primary source does not always remove the sentence from consideration, as the applicable secondary source effectively takes the place of the primary source in the Originality Report for the flagged sentence.
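The behaviour described in the last answer, where a secondary source takes over after the primary source is excluded, can be sketched like this. The data layout (a mapping from sentence number to an ordered list of matching sources, primary first) and the function name `remaining_flags` are assumptions made for illustration only.

```python
def remaining_flags(
    matches: dict[int, list[str]], excluded_sources: set[str]
) -> dict[int, list[str]]:
    """Return the sentences that stay flagged after sources are excluded.

    The first source left in each list acts as the new primary source.
    """
    result: dict[int, list[str]] = {}
    for sentence, sources in matches.items():
        kept = [source for source in sources if source not in excluded_sources]
        if kept:  # at least one matching source remains, so the flag stays
            result[sentence] = kept
    return result


matches = {
    1: ["source_a", "source_b"],  # primary source_a, secondary source_b
    2: ["source_a"],              # only one matching source
}
print(remaining_flags(matches, {"source_a"}))  # {1: ['source_b']}
```

Sentence 1 remains flagged because `source_b` moves into the primary position, while sentence 2 drops out because its only matching source was excluded.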
