Basic Stylometry Beta (early access)

Discussion about the New Testament, apocrypha, gnostics, church fathers, Christian origins, historical Jesus or otherwise, etc.
Peter Kirby
Site Admin
Posts: 8021
Joined: Fri Oct 04, 2013 2:13 pm
Location: Santa Clara

Re: Basic Stylometry Beta (early access)

Post by Peter Kirby »

Very good feedback. Thanks. Glad just to know anyone's actually interested in using it, other than me. Those all sound like good features. And 'automated word discovery' could itself lead to increased accuracy and/or decreased subjectivity. Thanks again for this valuable feedback.

Also, yes: if I loaded the Greek on the back end, you could (a) skip the wait on the upload and (b) use a GET request with all the data referenced in the URL, meaning you could share results by URL.
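To make the shareable-URL idea concrete, here is a rough sketch of how such a link might be built and read back, assuming the Greek texts were stored on the server under short identifiers. The base URL, the parameter names (`known`, `disputed`, `words`), and the identifiers are all hypothetical; nothing here reflects the program's actual interface.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: the program's real address and parameters
# are not specified anywhere in this thread.
BASE_URL = "https://example.org/stylometry"


def build_share_url(known, disputed, words):
    """Build a GET URL that refers to server-side texts by identifier."""
    query = urlencode({
        "known": ",".join(known),      # texts of known authorship
        "disputed": disputed,          # the text being tested
        "words": ",".join(words),      # common words to count
    })
    return f"{BASE_URL}?{query}"


def parse_share_url(url):
    """Recover the selections from a URL made by build_share_url."""
    params = parse_qs(urlsplit(url).query)
    return {
        "known": params["known"][0].split(","),
        "disputed": params["disputed"][0],
        "words": params["words"][0].split(","),
    }


url = build_share_url(["josephus", "origen"], "mark", ["και", "δε"])
print(url)
print(parse_share_url(url))
```

Because only identifiers and word lists travel in the query string, the link stays short enough to paste into a forum post, and nothing has to be uploaded again.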
"... almost every critical biblical position was earlier advanced by skeptics." - Raymond Brown

Post by Peter Kirby »

The program right now is very raw. It has about 600 lines of Perl, written over 3 to 4 days or so. Point is, I am by no means opposed to putting more work into it.

Of course the biggest problems are more 'theoretical' or scientific than technical.... I.e., finding which techniques offer increased accuracy, better detection of unreliable results, and/or allow results using less data.
Aleph One
Posts: 95
Joined: Sun Nov 02, 2014 12:13 am

Post by Aleph One »

Hey this is a tip that I'd venture other users could find useful:

Selected (highlighted) text in a program like Notepad/WordPad can be unproblematically dragged right into any of the boxes in Peter's program.

To quickly select a single, entire line of text in (e.g.) WordPad, triple-click anywhere within it. (A single click in the empty margin to the left of the line's start works too.) This is especially helpful when adding the 'word formulas' you want into the program. Download and open Peter's "greek.txt" from this thread, then simply triple-click and drag each common word formula you want to use right into the appropriate boxes in the program. This method works especially well because each word formula (i.e., all the grammatical permutations of a single base word) is on its own (non-word-wrapped) line, so even though a huge word formula may appear to occupy many lines of text, triple-clicking will still select the entire word formula without problem.

And, as all should have learned in kindergarten: [Ctrl]+[A] is the keyboard shortcut for Select All! So adding a document into the program (when correctly formatted as Peter specified in this thread's original post, like "justin.txt" from the collection provided earlier in this thread, for example) is as easy as opening the text file, pressing [Ctrl]+[A], and dragging the highlighted body into your box-of-choice in the program.

If you narrow the browser window running the program to just the left or right half of your monitor, and open your text files in a Notepad/WordPad window occupying the other half, simply clicking and dragging selections right into the program's boxes makes the whole process much faster and easier than you might expect!

This was just the way that worked best in my own case, of course, so YMMV!

Jeff

P.S.: So, what's this "Mac" word I keep hearing mean, again? :D
Last edited by Aleph One on Sun Jun 07, 2015 10:08 pm, edited 1 time in total.

Post by Peter Kirby »

An advanced text editor like EditPlus is recommended for the time being. It has useful features such as "join"/"split" lines and the ability to 'pre-process' text in order to remove quotations, for example, with regular expressions. (I can share the regular expressions I'm using to remove quotes.)

https://www.editplus.com/

(IMO the best feature is really that EditPlus has never stalled, hung, or stuttered no matter how many MB of text I throw into it...)

I agree with the general idea that Jeff is talking about. What I ended up doing is putting each author (or each sample) on a single line in my text files. When 'word wrap' is turned off in EditPlus, it becomes very easy to scroll through them and copy them.
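For anyone who would rather script the "one sample per line" layout than use an editor's "join lines" command, a minimal sketch follows. It assumes each sample is already available as a separate string; the function names are made up for illustration and are not part of the program.

```python
import re


def to_single_line(text):
    """Collapse every run of whitespace, line breaks included, into one space."""
    return re.sub(r"\s+", " ", text).strip()


def build_one_per_line(samples):
    """Return a single string holding one whitespace-normalized sample per line."""
    return "\n".join(to_single_line(sample) for sample in samples)


# Two short multi-line samples, standing in for whole documents.
samples = [
    "Ἐν ἀρχῇ ἦν ὁ λόγος,\nκαὶ ὁ λόγος ἦν πρὸς τὸν θεόν",
    "πάντα δι' αὐτοῦ\n\nἐγένετο",
]
print(build_one_per_line(samples))
```

With word wrap turned off, each line of the result is one complete sample, so a triple-click or a single margin click selects the whole thing.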

And I haven't tried them personally, but there are various Chrome extensions that give forms a "memory" (they save and restore what you typed into the boxes):

https://chrome.google.com/webstore/deta ... d?hl=en-US
https://chrome.google.com/webstore/deta ... fgno?hl=en

Thanks for sharing what you've learned here, Jeff.

Please let me know if you think of anything else.

(If the program is popular, I will want to spin up a new server--my blog becomes inaccessible if the program is busy churning through text!)

Post by Aleph One »

Peter Kirby wrote:I agree with the general idea that Jeff is talking about. What I ended up doing is putting each author (or each sample) on a single line in my text files.
Hell yea heh! :idea: That's a great idea! Put each document on one long line in your text file (along with the common word formulas and all, if you wish), and shuffling them in and out of the program to compare possibilities and different combinations becomes quite easy.

Another potentially helpful tip I've found is that (at least in Chrome), you can right-click on the tab-header and use "Duplicate tab" in order to make an additional copy of whatever state you have the program in right then (in other words, any text within the boxes IS duly duplicated, along with the program tab). And I don't think this causes any problems for the program's functioning (as far as I can tell).

Post by Peter Kirby »

Peter Kirby wrote:(I can share the regular expressions I'm using to remove quotes.)
Put everything on a single line first!

This removes anything in fancy curly double quotes.

“[^”]*”
This removes anything in fancy curly single quotes.

‘[^’]*’
This removes anything in Greek-style / German-style quotes (the most common type in the TLG). Apply the two patterns in this order.

«[^»]*»

»[^»]*»
This removes anything in <diamond-shaped brackets>.

<[^>]*>
This removes anything in [square brackets]. Maybe I could have done something simpler that works in EditPlus but IDK...

\[[ A-Za-z0-9\\\/\(\)\*\|\.\:\;\,\–\=\+\'\{\}\"\<\>\“\”\«\»\?\!\&\†\#]*\]
(If you understand regular expressions and are wondering why I don't just use "lazy" regular expressions, EditPlus doesn't seem to have that feature.)

This removes any stray non-Greek text that is occasionally found in the TLG (with lower case letters). Enable 'case sensitive' first!

[A-Z]*[a-z][A-Za-z]*
If it's the TLG/Diogenes that you don't have, the link is in the OP....
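If you would rather script this clean-up than run each search-and-replace by hand, the same idea can be sketched in Python. This is only an approximation under stated assumptions: it covers a subset of the patterns above (the second guillemet pattern is left out), and for square brackets it swaps in a simpler negated character class, which removes bracketed Greek as well, something the long character class above would not do. The function name is made up for illustration.

```python
import re

# Applied in order. Python's re module is case-sensitive by default,
# which the last pattern relies on.
PATTERNS = [
    r"“[^”]*”",               # anything in curly double quotes
    r"‘[^’]*’",               # anything in curly single quotes
    r"«[^»]*»",               # anything in « » quotes
    r"<[^>]*>",               # anything in <angle brackets>
    r"\[[^\]]*\]",            # ASSUMPTION: simplified square-bracket pattern
    r"[A-Z]*[a-z][A-Za-z]*",  # stray Latin words containing a lower-case letter
]


def strip_quotations(text):
    """Remove quoted or bracketed spans and stray Latin words from one-line text."""
    for pattern in PATTERNS:
        text = re.sub(pattern, "", text)
    # Tidy the gaps left behind by the removals.
    return re.sub(r" {2,}", " ", text).strip()


sample = "λόγος «ἐν ἀρχῇ» ἦν [cf. 1.2] <x> note καὶ"
print(strip_quotations(sample))
```

As with the EditPlus version, put each text on a single line first, so that a quotation spanning several lines is still caught by one match.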
Aleph One wrote:Another potentially helpful tip I've found is that (at least in Chrome), you can right-click on the tab-header and use "Duplicate tab" in order to make an additional copy of whatever state you have the program in right then (in other words, any text within the boxes IS duly duplicated, along with the program tab). And I don't think this causes any problems for the program's functioning (as far as I can tell).
Good one. I didn't know that.
Ben C. Smith
Posts: 8994
Joined: Wed Apr 08, 2015 2:18 pm
Location: USA

Post by Ben C. Smith »

Peter, do you happen to have more text files available with Greek samples, like those samples you provided up the thread of Josephus, Origen, Mark, and Justin?
ΤΙ ΕΣΤΙΝ ΑΛΗΘΕΙΑ

Post by Peter Kirby »

Ben C. Smith wrote:Peter, do you happen to have more text files available with Greek samples, like those samples you provided up the thread of Josephus, Origen, Mark, and Justin?
Yes. I will prepare a nice set of them, in a single text file formatted one to a line.

Post by Ben C. Smith »

Peter Kirby wrote:Yes. I will prepare a nice set of them, in a single text file formatted one to a line.
That would be most helpful! Thanks.

Ben.

Post by Peter Kirby »

Ben C. Smith wrote:That would be most helpful! Thanks.
Okay, here it is. :thumbup:
Attachments
greekcompendium.txt
(15.6 MiB) Downloaded 665 times