I want to create a `sed` command that removes all of these strange characters from a given document:
sed -n 's/\|®MD-IT¯\|®MD\+BO¯\|®MDNM¯®LL\.8LI,0LI¯\|®LL0LI,0LI¯\|®MD\+IT¯\|®LL.8LI,0LI¯®MDIT¯\|®MDNM¯®FL¯®LL.8LI,0LI¯\|®FL¯®MD-BO¯\|®FL¯®MD-BO¯\|®MD-BO¯\|¯®OF1IN,1IN¯®FC¯®LL1LI,0LI¯\|\|®SF1,1¯\|®FM1FT=0LI,LR=1;\|®MDSU¯®FN1¯\|®MDNM¯¯\|®IV-RTF\|\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\|¯®BF0¯\|®FS1\|-------------------------------------\|¯®FW1\|\|//gp'
These codes were all created by another application, Nota Bene, and I have many files containing them that I would like to convert to plain text, and possibly even to Markdown.
The problem is that the characters are not replaced. I tried this in Sublime Text and managed to clean up the document using find-and-replace (regex), but for this task I would rather write a `sed` script than use Sublime.
I also tried `ed`, but it did not pick up the substitutions either.
Here is a sample .nb file as it looks when opened in Sublime Text:
®SSDEFAULTS¯®LR1¯®JU¯®MD+BO¯®UFTimes New Roman¯®SZ12Pt¯Glossary®MD+BO¯®TS.5IN,1IN,1.5IN,2IN,2.5IN,3IN,3.5IN,4IN,4.5IN,5IN,5.5IN,6IN¯ ®MD-BO¯
®NJ¯®LR1¯®LL.5LI,0LI¯®MD+BO¯®LL0LI,0LI¯®MDNM¯®LR1¯®LL.5LI,0LI¯A fortiori proposition: If X is true, then how much greater is Y true? To move logically from a stronger argument to establish a weaker argument. The weaker argument is sometimes presented by the speaker as the stronger argument.
®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Accusative of motion/direction - Indicates movement to the noun marked by the accusative and is to be distinguished from the accusative of local determination which indicates location without motion (Joüon and Muraoka 2006, 428).
Anadiplosis - A figure of speech in which the word that a colon ends with, or a like sounding word, is the word that begins the next colon ®GC|CI:R#=47;AU=Brown, Raymond E.;YR=1990;TI=New Jerome biblical commentary;PG=245;XT=;F[=;F]=;F#=;ID=;XX=Print;CT=;FL=¯(Brown, Fitzmyer, Murphy, et al. 1990, 245)®GC¯.
®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Anaphoric use of the article - When the article is used to indicate that the word to which it is attached is the one previously mentioned (Williams and Beckman 2007, 36).
®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Anaptyxis - The insertion of a vowel into a word to avoid a consonant cluster.
®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Aoristic perfect - I use the phrase 'aoristic perfect' to refer to one of the ways the qatal form can be rendered into English. Aoristic perfect denotes a past situation the implications of which are no longer felt in the present. The situation may have extended over a period of time and it may have occurred more than once. It may have occurred in the recent or distant past but from the standpoint of the speaker it is to be regarded as a fact having occurred and hence as a fact belonging to the past (Joüon and Muraoka 2006, 337; Driver 1998, 12). The term 'aoristic perfect' and indeed the other categorizations of perfect in this grammar, all relate to the interpretation of qatal verbs in their given contexts. The qatal form in and of itself does not convey these meanings.
®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Beth essentiae - ®LAHebrew¯ÿHá®LAEnglish¯ that is used to indicate the predicate of a clause or a word used predicatively (Joüon and Muraoka 2006, 458).
This is how I would like the text to read:
Glossary
A fortiori proposition: If X is true, then how much greater is Y true? To move logically from a stronger argument to establish a weaker argument. The weaker argument is sometimes presented by the speaker as the stronger argument.
Accusative of motion/direction - Indicates movement to the noun marked by the accusative and is to be distinguished from the accusative of local determination which indicates location without motion (Joüon and Muraoka 2006, 428).
Anadiplosis - A figure of speech in which the word that a colon ends with, or a like sounding word, is the word that begins the next colon (Brown, Fitzmyer, Murphy, et al. 1990, 245).
Anaphoric use of the article - When the article is used to indicate that the word to which it is attached is the one previously mentioned (Williams and Beckman 2007, 36).
Anaptyxis - The insertion of a vowel into a word to avoid a consonant cluster.
Aoristic perfect - I use the phrase 'aoristic perfect' to refer to one of the ways the qatal form can be rendered into English. Aoristic perfect denotes a past situation the implications of which are no longer felt in the present. The situation may have extended over a period of time and it may have occurred more than once. It may have occurred in the recent or distant past but from the standpoint of the speaker it is to be regarded as a fact having occurred and hence as a fact belonging to the past (Joüon and Muraoka 2006, 337; Driver 1998, 12). The term 'aoristic perfect' and indeed the other categorizations of perfect in this grammar, all relate to the interpretation of qatal verbs in their given contexts. The qatal form in and of itself does not convey these meanings.
|> sed -n l Glossary.NB
\256SSDEFAULTS\257\256LR1\257\256JU\257\256MD+BO\257\256UFTimes New R\
oman\257\256SZ12Pt\257Glossary\256MD+BO\257\256TS.5IN,1IN,1.5IN,2IN,2\
.5IN,3IN,3.5IN,4IN,4.5IN,5IN,5.5IN,6IN\257\t\256MD-BO\257\r$
\256NJ\257\256LR1\257\256LL.5LI,0LI\257\256MD+BO\257\256LL0LI,0LI\257\
\256MDNM\257\256LR1\257\256LL.5LI,0LI\257A fortiori proposition: If X\
is true, then how much greater is Y true? To move logically from a s\
tronger argument to establish a weaker argument. The weaker argument \
is sometimes presented by the speaker as the stronger argument.\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Accusative of motion/direction - Indicates mov\
ement to the noun marked by the accusative and is to be distinguished\
from the accusative of local determination which indicates location \
without motion (Jo\374on and Muraoka 2006, 428).\r$
Anadiplosis - A figure of speech in which the word that a colon ends \
with, or a like sounding word, is the word that begins the next colon\
\256GC|CI:R#=47;AU=Brown, Raymond E.;YR=1990;TI=New Jerome biblical \
commentary;PG=245;XT=;F[=;F]=;F#=;ID=;XX=Print;CT=;FL=\257(Brown, Fit\
zmyer, Murphy, et al. 1990,\240245)\256GC\257.\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Anaphoric use of the article - When the articl\
e is used to indicate that the word to which it is attached is the on\
e previously mentioned (Williams and Beckman 2007, 36). \r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Anaptyxis - The insertion of a vowel into a wo\
rd to avoid a consonant cluster.\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Aoristic perfect - I use the phrase 'aoristic \
perfect' to refer to one of the ways the qatal form can be rendered i\
nto English. Aoristic perfect denotes a past situation the implicatio\
ns of which are no longer felt in the present. The situation may have\
extended over a period of time and it may have occurred more than on\
ce. It may have occurred in the recent or distant past but from the s\
tandpoint of the speaker it is to be regarded as a fact having occurr\
ed and hence as a fact belonging to the past (Jo\374on and Muraoka 20\
06, 337; Driver 1998, 12). The term 'aoristic perfect' and indeed the\
other categorizations of perfect in this grammar, all relate to the \
interpretation of qatal verbs in their given contexts. The qatal form\
in and of itself does not convey these meanings. \r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Beth essentiae - \256LAHebrew\257\377H\341\256\
LAEnglish\257 that is used to indicate the predicate of a clause or a\
word used predicatively (Jo\374on and Muraoka 2006, 458).\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Classic perfect - I use the phrase 'classic pe\
rfect' to refer to one of the ways the qatal form can be rendered int\
o English. Classic perfect refers to the continuing present relevance\
of a past situation from the perspective of the speaker (Comrie 1976\
, 52). By perfect I do not necessarily imply that a previous situatio\
n has resulted in a state but that the situation has implications rel\
evant to the present. The situation is not merely past and over but s\
omehow persists and continues to intrude into the present. Such verbs\
are usually translated into English using the perfect or present ten\
se. I have included under this definition quasi-stative verbs which r\
efer to attributes which were acquired before, but which are assumed \
to continue in some way up to the present moment (Driver 1998, 11; Jo\
\374on and Muraoka 2006, 333; Waltke and O'Connor 1990, 487). In some\
grammars these are treated separately. However, that creates too man\
y functions for the one perfect form. The term 'classic perfect' and \
indeed the other categorizations of perfect in this grammar all relat\
e to the \256MD+IT\257interpretation \256MD-IT\257of qatal verbs in t\
heir given contexts. The qatal form by itself does not convey these m\
eanings.\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Cohortative of praise. The cohortative is ofte\
n used in Psalms to indicate that praise, freely undertaken, has begu\
n. This usage is close to the cohortative of resolve but not identica\
l with it. The emphasis falls not on what the writer is intending to \
do, but what he has already undertaken. \r$
Cohortative of resolve - The cohortative mood normally expresses the \
will of the speaker, but when the speaker has the ability to carry ou\
t what he wants it takes on the coloring of resolve (Van der Merwe et\
al. 1997, 152; Waltke and O'Connor 1990, 573).\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Concluding \256LAHebrew\257\377h\353\377H\351\
\256LAEnglish\257 - A special use of the word \256LAHebrew\257\377h\
\353\377H\351\256LAEnglish\257 found towards the end of several Psalm\
s and approximating in meaning to: the conclusion of the matter is th\
at\205\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Conjunctive waw - Waw used to connect clauses \
`sed` can also be used as a script (easier to develop): create a file "nb2txt" with
and:
Your regular expression uses `\|` (alternation in GNU `sed`, a literal `|` in most other `sed` implementations) and `\+` (one or more occurrences in GNU `sed`, a literal `+` in most other `sed` implementations). With GNU `sed`, the pattern would delete strings such as `®MD-IT¯` or even `®MDDDDDBO¯` (because `D\+` means "one or more `D`"), but not a literal `®MD+BO¯`. If you are using a different `sed` implementation, you will probably not get any match at all.
It is better to use extended regular expressions (`sed -E`), which most `sed` versions have supported for years. In an extended regex, alternation is a plain `|` and `\+` is a literal `+`.
I also suggest removing the empty alternatives (the `\|` at the beginning and at the end of the pattern), although they do no harm in this case.
The endless `\.\.\.\.\.\.\.\.\.\.\.\.` and `----` should be replaced by `\.{42}` and `-{23}`, using the actual number of dots or dashes. Or perhaps `\.{10,}` to get rid of any run of 10 or more dots (and `-{10,}` for dashes).
From the `sed -n l` listing it is clear that your file contains many bytes 174 (decimal), or 256 (octal), and 175 (decimal), or 257 (octal), listed as `\256` and `\257`. Read as single-byte characters, these are `\xae` (hex `ae`), i.e. `®`, and `\xaf` (hex `af`), i.e. `¯`. If you use UTF-8 as the default encoding (the usual on Linux), the `®` and `¯` you type into a pattern are two-byte sequences (`\xc2\xae` and `\xc2\xaf`), so they cannot match the single bytes stored in the file.
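A minimal sketch that reproduces both problems on a single made-up line, assuming GNU `sed` (for the `\+` behaviour) and a POSIX shell whose `printf` understands octal escapes. The sample string and the variable names `sample`, `open`, `close` and `utf8_open` are hypothetical; `LC_ALL=C` makes `sed` treat every byte as one character.

```shell
#!/bin/sh
# Hypothetical one-line sample: the code "®MD+BO¯" stored as single bytes
# (0xAE, "MD+BO", 0xAF), followed by the word "Glossary".
sample=$(printf '\256MD+BO\257Glossary')

open=$(printf '\256')           # "®" as stored in the .nb file (one byte)
close=$(printf '\257')          # "¯" as stored in the .nb file (one byte)
utf8_open=$(printf '\302\256')  # "®" as typed in a UTF-8 editor (two bytes)

# 1. Encoding mismatch: the two-byte UTF-8 "®" never matches the single byte
#    0xAE, so the line is printed unchanged.
printf '%s\n' "$sample" | LC_ALL=C sed -E "s/${utf8_open}MD\+BO${close}//"

# 2. GNU basic regex: "D\+" means "one or more D", so the literal "+" is not
#    matched and the line is again printed unchanged.
printf '%s\n' "$sample" | LC_ALL=C sed "s/${open}MD\+BO${close}//"

# 3. Extended regex with the real bytes: "\+" is a literal "+", so the code
#    is removed and only "Glossary" is printed.
printf '%s\n' "$sample" | LC_ALL=C sed -E "s/${open}MD\+BO${close}//"
```

Only the third command removes the code; other `sed` implementations may treat the basic-regex `\+` in the second command differently.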
These bytes seem to be the start and end markers of some internal coding of `.nb` files. Removing the strings that start with `\xae` and end with `\xaf` seems to bring us one step closer to what you asked for:
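A minimal sketch of that step as a reusable shell function. The function name `nb_to_text` is hypothetical, and several details are assumptions rather than part of the answer: the Windows-1252 input encoding (suggested by `\374` for `ü` and `\205` for `…` in the listing), codes that never span lines or contain `\xaf` inside them, and the extra clean-ups for carriage returns, non-breaking spaces, trailing blanks and long runs of dots or dashes.

```shell
#!/bin/sh
# nb_to_text (hypothetical name): read a Nota Bene .nb file on stdin and write
# plain UTF-8 text on stdout.
# Assumptions: Windows-1252 input, every code starts with byte 0xAE ("®")
# and ends with byte 0xAF ("¯"), and no code spans more than one line.
nb_to_text() {
    open=$(printf '\256')   # code start marker
    close=$(printf '\257')  # code end marker
    nbsp=$(printf '\240')   # non-breaking space, as in "1990,\240245"

    # tr drops the DOS line endings ("\r$" in the sed -n l listing).
    # LC_ALL=C lets sed match the single marker bytes in any terminal locale.
    # iconv re-encodes what is left (e.g. "\374" -> "ü") as UTF-8.
    tr -d '\r' |
    LC_ALL=C sed -E \
        -e "s/${open}[^${close}]*${close}//g" \
        -e "s/${nbsp}/ /g" \
        -e 's/\.{10,}//g' \
        -e 's/-{10,}//g' \
        -e 's/[[:blank:]]+$//' |
    iconv -f WINDOWS-1252 -t UTF-8
}
```

Usage: `nb_to_text < Glossary.NB > Glossary.txt`, or for many files `for f in *.NB; do nb_to_text < "$f" > "${f%.NB}.txt"; done`. Text set in the Hebrew font (such as the Beth essentiae entry) will still come out as the placeholder characters seen in Sublime Text, for example `ÿHá`.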