A fused forensic text comparison system using lexical features, word and character N-grams

Shunichi Ishihara

doi:10.1109/icacci.2014.6968504

JOURNAL ARTICLE

Year: 2014 Vol: 2665 Pages: 2762-2768

Abstract

This study investigates the degree to which the performance of a likelihood ratio (LR)-based forensic text comparison (FTC) system improves when logistic-regression fusion is applied to LRs that were estimated separately by three different procedures, involving lexical features, word-based N-grams and character-based N-grams. The study uses predatory chatlog messages, and each group of messages is modelled with 500 words. The performance of the FTC system is assessed in terms of its validity (= accuracy) and reliability (= precision) using the log-likelihood-ratio cost (Cllr) and 95% credible intervals (CI), respectively. It is demonstrated that 1) of the three procedures, the lexical features procedure performed best in terms of Cllr; and that 2) the fused system outperformed all three of the single procedures. The Cllr value of the fused system is better than that of the lexical features procedure by 0.14. It is also reported that the validity and reliability of a system are negatively correlated: the fused system that yielded the best result in terms of Cllr has the worst CI value.
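The two technical ingredients named in the abstract, the log-likelihood-ratio cost and logistic-regression fusion, can be illustrated with a minimal sketch. It is not the author's implementation: the function names (`cllr`, `fit_fusion`, `fuse`), the use of natural-log LRs, the equal weighting of the same-author and different-author classes, the plain gradient-descent fit and the toy numbers are all assumptions made for illustration. The cost is computed as Cllr = 1/2 [ mean over same-author comparisons of log2(1 + 1/LR) + mean over different-author comparisons of log2(1 + LR) ], and the fused log LR is an offset plus a weighted sum of the per-procedure log LRs.

```python
import math


def _softplus(x):
    # Numerically stable ln(1 + e^x).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cllr(same_author_llrs, diff_author_llrs):
    """Log-likelihood-ratio cost; inputs are natural-log LRs (assumption)."""
    # Same-author comparisons are penalised when the LR is small ...
    ss = sum(_softplus(-l) for l in same_author_llrs) / len(same_author_llrs)
    # ... and different-author comparisons when the LR is large.
    ds = sum(_softplus(l) for l in diff_author_llrs) / len(diff_author_llrs)
    return 0.5 * (ss + ds) / math.log(2)  # convert nats to bits


def fit_fusion(scores, labels, lr=0.1, epochs=2000):
    """Fit fusion weights by class-weighted logistic regression.

    scores: one tuple of per-procedure log LRs per comparison.
    labels: 1 for same-author, 0 for different-author comparisons.
    Returns [offset, weight_1, ..., weight_k].
    """
    n_feat = len(scores[0])
    n_ss = sum(labels)
    n_ds = len(labels) - n_ss
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for x, y in zip(scores, labels):
            p = _sigmoid(fuse(w, x))
            # Each class contributes half of the objective, so the fitted
            # output can be read as a log LR rather than log posterior odds.
            weight = 0.5 / n_ss if y == 1 else 0.5 / n_ds
            err = weight * (p - y)
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def fuse(w, x):
    # Fused log LR: offset plus weighted sum of the individual log LRs.
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


if __name__ == "__main__":
    # Made-up log LRs from three hypothetical procedures (illustration only,
    # not data from the study).
    same = [(2.0, 1.0, 0.5), (1.0, -0.5, 1.5), (-0.5, 1.0, 1.0), (-1.0, -0.5, -0.25)]
    diff = [(-2.0, -1.0, -0.5), (-1.0, 0.5, -1.5), (0.5, -1.0, -1.0), (1.0, 0.5, 0.25)]
    w = fit_fusion(same + diff, [1] * len(same) + [0] * len(diff))
    fused_same = [fuse(w, x) for x in same]
    fused_diff = [fuse(w, x) for x in diff]
    print("fused Cllr:", cllr(fused_same, fused_diff))
```

A system that always outputs LR = 1 scores Cllr = 1 under this definition, and lower values indicate better validity. In practice the fusion weights would be fitted on separate development data rather than on the comparisons being evaluated.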

Keywords:

Character (mathematics); Computer science; Reliability (semiconductor); Artificial intelligence; Word (group theory); Value (mathematics); Natural language processing; Mathematics; Machine learning; Power (physics)

Metrics

FWCI (Field Weighted Citation Impact): 0.48

Citation Normalized Percentile: 0.74

Topics

Authorship Attribution and Profiling

Physical Sciences → Computer Science → Artificial Intelligence

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

A Comparative Study of Likelihood Ratio Based Forensic Text Comparison Procedures: Multivariate Kernel Density with Lexical Features vs. Word N-grams vs. Character N-grams

Using Word N-Grams as Features in Arabic Text Classification

Complex Word Identification Using Character n-grams

UCD : Diachronic Text Classification with Character, Word, and Syntactic N-grams

Empirical Evaluations Using Character and Word N-Grams on Authorship Attribution for Telugu Text