Java - RegEx for ["abc", ["123", "cba"]]
I'm not strong in regex, so any help is appreciated.

I need to parse strings such as:

    ["text", "text", ["text",["text"]],"text"]

and the output should be 4 strings:

    text, text, ["text",["text"]], text

I've tried the pattern (\\[[^\\[,^\\]]*\\])|(\"([^\"]*)\"):

    String data = "\"aa\", \"aaa\", [\"bb\", [\"1\",\"2\"]], [cc]";
    Pattern p = Pattern.compile("(\\[[^\\[,^\\]]*\\])|(\"([^\"]*)\")");

But the output is (quotes in the output are not critical):

    "aa", "aaa", "bb", "1", "2", [cc]

How can I improve my regex?
I'm not sure regex can do this kind of thing on its own. Here is a way to do it, though:

    import java.util.LinkedList;
    import java.util.List;

    // the data string
    String input = "\"aa\", \"a, aa\", [\"bb\", [\"1\", \"2\"]], [cc], [\"dd\", [\"5\"]]";
    System.out.println(input);

    // a char that can't ever occur within the data string
    char tempReplacement = '#';

    // escape strings containing commas, e.g. "hello, world", ["x, y", 42]
    while (input.matches(".*\"[^\"\\[\\]]+,[^\"\\[\\]]+\".*")) {
        input = input.replaceAll("(\"[^\"\\[\\]]+),([^\"\\[\\]]+\")", "$1" + tempReplacement + "$2");
    }

    // while there are "[*,*]" substrings
    while (input.matches(".*\\[[^\\]]+,[^\\]]+\\].*")) {
        // replace the nested "," chars by the replacement char
        input = input.replaceAll("(\\[[^\\]]+),([^\\]]+\\])", "$1" + tempReplacement + "$2");
    }

    // split the string by the remaining "," (i.e. those not nested)
    String[] split = input.split(",");
    List<String> output = new LinkedList<String>();
    for (String s : split) {
        // replace all the replacement chars by a ","
        s = s.replaceAll(tempReplacement + "", ",");
        s = s.trim();
        output.add(s);
    }

    // syso
    System.out.println("split:");
    for (String s : output) {
        System.out.println("\t" + s);
    }

Output:

    "aa", "a, aa", ["bb", ["1", "2"]], [cc], ["dd", ["5"]]
    split:
        "aa"
        "a, aa"
        ["bb", ["1", "2"]]
        [cc]
        ["dd", ["5"]]

P.S. The code seems complex only because it's commented. Here is a more concise version:
    public static List<String> split(String input, char tempReplacement) {
        // escape commas inside quoted strings
        while (input.matches(".*\"[^\"\\[\\]]+,[^\"\\[\\]]+\".*")) {
            input = input.replaceAll("(\"[^\"\\[\\]]+),([^\"\\[\\]]+\")", "$1" + tempReplacement + "$2");
        }
        // escape commas nested inside brackets
        while (input.matches(".*\\[[^\\]]+,[^\\]]+\\].*")) {
            input = input.replaceAll("(\\[[^\\]]+),([^\\]]+\\])", "$1" + tempReplacement + "$2");
        }
        // split on the remaining commas and restore the escaped ones
        String[] split = input.split(",");
        List<String> output = new LinkedList<String>();
        for (String s : split) {
            output.add(s.replaceAll(tempReplacement + "", ",").trim());
        }
        return output;
    }

Call:

    String input = "\"aa\", \"a, aa\", [\"bb\", [\"1\", \"2\"]], [cc], [\"dd\", [\"5\"]]";
    List<String> output = split(input, '#');
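The approach above needs a placeholder character that never occurs in the data, and as far as I can tell it misses a comma that follows a nested closing bracket, e.g. in ["a", ["b"], "c"]. A different technique, not part of the answer above, is to scan the string once and track bracket depth and quote state, splitting only on commas at depth 0 outside quotes. Below is a minimal sketch of that idea. The class name TopLevelSplitter is hypothetical, and the sketch assumes brackets are balanced and that quoted strings contain no escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the original answer): splits on top-level commas only.
// Assumes balanced brackets and no escaped quotes inside "..." strings.
final class TopLevelSplitter {

    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // number of '[' currently open
        boolean inQuotes = false; // true while inside a "..." string

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (!inQuotes) {
                if (c == '[') {
                    depth++;
                } else if (c == ']') {
                    depth--;
                } else if (c == ',' && depth == 0) {
                    // top-level separator: finish the current item
                    parts.add(current.toString().trim());
                    current.setLength(0);
                    continue; // do not copy the separator itself
                }
            }
            current.append(c);
        }
        // the last item has no trailing comma
        parts.add(current.toString().trim());
        return parts;
    }

    public static void main(String[] args) {
        String input = "\"aa\", \"a, aa\", [\"bb\", [\"1\", \"2\"]], [cc], [\"dd\", [\"5\"]]";
        for (String s : split(input)) {
            System.out.println(s);
        }
    }
}
```

Because brackets and quotes are only counted, never rewritten, the items come back exactly as they appear in the input (apart from trimming). Note that an empty input yields a list containing one empty string.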