
Java regex: capturing groups nested in or enclosed by special characters

Time: 2021-08-26 21:33:11


I am trying to find words that are enclosed in tilde (~) characters.

e.g. ~albert~ is a ~good~ boy.

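I know this can be done with ~.+?~, and that already works for me. But there are special cases where I need to match nested tilde-delimited phrases.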

e.g. ~The ~spectacle~~ was ~broken~
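For reference, here is a minimal sketch of the flat case in Java, assuming the expression meant above is the lazy pattern `~.+?~` (written here with a capturing group so the delimiters are dropped). The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlatTildeWords {
    // Lazy match between two tildes; group 1 is the text without the delimiters.
    private static final Pattern FLAT = Pattern.compile("~(.+?)~");

    static List<String> find(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = FLAT.matcher(text);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(find("~albert~ is a ~good~ boy.")); // prints [albert, good]
    }
}
```

On the nested example the same pattern pairs the tildes strictly left to right, so it returns "The " and "~ was " rather than the intended groups, which is why a different approach is needed.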

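In the example above, I have to capture 'The spectacle', 'spectacle' and 'broken' separately. Each of these will be translated either on its own or together with its accompanying article (An, The, whatever). The reason is that in my system: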

1) 'The spectacle' requires a separate translation in specific cases.

2) 'spectacle' also needs translation in specific cases.

3) IF a translation exists for 'The spectacle', we will use that, ELSE

we will use

Another example that illustrates this:

~The ~spectacle~~ was ~broken~, but that was not the same ~spectacle~ that was given to ~me~.

In the example above, I would translate:

1) 'The spectacle' (because a translation exists for 'The spectacle'; otherwise I would have translated only 'spectacle' on its own)

2) 'broken'

3) 'spectacle'

4) me
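The selection rule described above (prefer the translation of the whole phrase, fall back to the inner word) can be sketched as a dictionary lookup. This is only an illustration: the `dictionary` map, the `pick` method and the placeholder translation values are hypothetical and do not come from the asker's system.

```java
import java.util.HashMap;
import java.util.Map;

class TranslationFallback {
    // Returns the phrase translation when one exists, otherwise the
    // translation of the inner word (null if neither is known).
    static String pick(Map<String, String> dictionary, String phrase, String innerWord) {
        String phraseTranslation = dictionary.get(phrase);
        return phraseTranslation != null ? phraseTranslation : dictionary.get(innerWord);
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("spectacle", "<translation of spectacle>"); // placeholder value
        // Only the inner word is known, so its translation is used.
        System.out.println(pick(dictionary, "The spectacle", "spectacle"));

        dictionary.put("The spectacle", "<translation of The spectacle>"); // placeholder value
        // Now the whole phrase is known, so it takes precedence.
        System.out.println(pick(dictionary, "The spectacle", "spectacle"));
    }
}
```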

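I am having trouble composing an expression that reliably captures these groups. So far, what I have managed to get working is '~.+?~'. I know that some form of lookahead or lookbehind could make this work. Can someone help me?

The most important aspect of this is regression proofing, to make sure that what already works does not break. If I manage to work it out myself, I will post it.

Note: if it helps, for now I will only have instances with a single level of nesting that needs to be broken down. So ~~spectacle~~ will be the deepest level (until I need more!!!!!)

Answer:

I wrote something like this a while back, though I have not tested it: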

(~(?(?=.*?~~.*?~).*?~.*?~.*?~|[^~]+?~))

or

(~(?(?=.*?~[A-Za-z]*?~.*?~).*?~.*?~.*?~|[^~]+?~))
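One caveat for Java users: the two patterns above rely on the conditional construct `(?(condition)yes|no)`, which `java.util.regex.Pattern` does not support, so they will not compile there as written. A small check, with an illustrative helper name, makes this visible:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class ConditionalCheck {
    // Reports whether java.util.regex accepts the given pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Uses a conditional group, which Java rejects.
        System.out.println(compiles("(~(?(?=.*?~~.*?~).*?~.*?~.*?~|[^~]+?~))")); // prints false
        // Uses only constructs Java supports.
        System.out.println(compiles("(~(?:.*?~.*?~){0,2}.*?~)")); // prints true
    }
}
```

These patterns may still be usable in engines that implement conditionals, such as PCRE.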

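Another option:

(~(?:.*?~.*?~){0,2}.*?~)

Change the 2 in {0,2} to the maximum nesting depth.

Use whichever works best.

To add more levels, add an extra .*?~ in each of the two places where you see a run of them.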
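To try the quantifier-based option in Java, the pattern can be built from the depth limit. This is a sketch only: `regexForDepth` and `matches` are illustrative names, and the pattern is the untested one from this answer, not a verified solution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DepthLimitedTildes {
    // Builds the answer's pattern with the {0,n} upper bound set to maxDepth.
    static String regexForDepth(int maxDepth) {
        return "(~(?:.*?~.*?~){0," + maxDepth + "}.*?~)";
    }

    // Returns every match of the pattern, delimiters included.
    static List<String> matches(String text, int maxDepth) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regexForDepth(maxDepth)).matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [~The ~spectacle~~, ~broken~]
        System.out.println(matches("~The ~spectacle~~ was ~broken~", 1));
        // prints [~albert~ is a ~good~]
        System.out.println(matches("~albert~ is a ~good~ boy.", 1));
    }
}
```

The second call shows the limitation: because the repeated group is greedy, two unrelated flat groups are swallowed as if they were one nested group. The inner 'spectacle' is also not captured separately, so a second pass over each match would still be needed.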

The main problem

If we allow unlimited nesting, how do we know where each group begins and ends? A clumsy diagram:

~This text could be nested ~ so could this~ and this~ this ~Also this~
|                          |              |_________|      |         |
|                          |_______________________________|         |
|____________________________________________________________________|

Or:

~This text could be nested ~ so could this~ and this~ this ~Also this~
|                          |              |         |      |_________|
|                          |______________|         |
|___________________________________________________|

The regex engine has no way to know which one to choose.

For your sentence:

~The ~spectacle~~ was ~broken~, but that was not the same ~spectacle~ that was given to ~me~.
|    |         ||_____|      |                            |         |
|    |         |_____________|                            |         |
|    |____________________________________________________|         |
|___________________________________________________________________|

Or:

~The ~spectacle~~ was ~broken~, but that was not the same ~spectacle~ that was given to ~me~.
|    |_________||     |______|                            |_________|                   |__|
|_______________|

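What should I do?

Use distinct opening and closing characters (as @tbraun suggested), so that the parser knows where each group starts and ends:

{This text can be {properly {nested}} without problems} because {the compiler {can {see {the}}} start and end points} easily.

Or write a small parser that tracks the nesting depth:

Note: I don't write much Java, so some of this code may be incorrect.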

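import java.util.ArrayList;
import java.util.List;

class BraceGroups {

    // Collects every top-level {...} group, braces included.
    // Nested groups stay inside their enclosing top-level group.
    static List<String> topLevelGroups(String myString) {
        List<String> results = new ArrayList<>();
        int depth = 0;
        int lastIndex = 0; // position of the '{' that opened the current top-level group

        for (int i = 0; i < myString.length(); i += 1) {
            char c = myString.charAt(i);
            if (c == '{') {
                depth += 1;
                if (depth == 1) {
                    lastIndex = i;
                }
            }
            if (c == '}') {
                depth -= 1;
                if (depth == 0) {
                    results.add(myString.substring(lastIndex, i + 1));
                }
                if (depth < 0) {
                    // Balancing problem: a '}' with no matching '{'.
                    // Throwing is one way to handle it; adapt as needed.
                    throw new IllegalArgumentException("Unbalanced '}' at index " + i);
                }
            }
        }
        return results;
    }
}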
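The depth counter above only yields the outermost groups. Since the question also needs the inner phrases on their own, one possible variation (an assumption on my part, not something from the original answer) is to keep a stack of opening positions and emit a group every time a brace closes. The sample sentence below is the asker's example rewritten with braces in place of tildes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class NestedBraceGroups {
    // Returns the text of every {...} group, innermost first, with any
    // nested braces stripped from the enclosing groups.
    static List<String> allGroups(String text) {
        List<String> groups = new ArrayList<>();
        Deque<Integer> openPositions = new ArrayDeque<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                openPositions.push(i);
            } else if (c == '}') {
                if (openPositions.isEmpty()) {
                    throw new IllegalArgumentException("Unbalanced '}' at index " + i);
                }
                int start = openPositions.pop();
                String inner = text.substring(start + 1, i);
                groups.add(inner.replace("{", "").replace("}", ""));
            }
        }
        if (!openPositions.isEmpty()) {
            throw new IllegalArgumentException("Unclosed '{' at index " + openPositions.peek());
        }
        return groups;
    }

    public static void main(String[] args) {
        // prints [spectacle, The spectacle, broken]
        System.out.println(allGroups("{The {spectacle}} was {broken}"));
    }
}
```

Each group is emitted when its closing brace is reached, so inner groups appear before the groups that contain them.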

Tags: java, regex

Source: https://codeday.me/bug/0708/1400600.html
