Ruby: extract data from a string using a regex
I'm doing some web scraping. The data has this format:

    sr.no. course_code course_name credit grade attendance_grade

The actual string I receive has the following form:

    1 ca727 principles of compiler design 3 a m

The things I'm interested in are the course_code, course_name, and grade. In this example the values would be:

    course_code : ca727
    course_name : principles of compiler design
    grade : a

Is there a way for me to use a regular expression or some other technique to extract this information, instead of manually parsing through the string? I'm using JRuby in 1.9 mode.
Let's use Ruby's named captures and a self-describing regex!
    course_line = /
      ^                   # starting at the front of the string
      (?<srno>\d+)        # capture one or more digits; call the result "srno"
      \s+                 # eat some whitespace
      (?<code>\S+)        # capture all the non-whitespace you can; call it "code"
      \s+                 # eat some whitespace
      (?<name>.+\S)       # capture as much as you can
                          # (while letting the rest of the regex still work);
                          # make sure you end with a non-whitespace character.
                          # call this "name"
      \s+                 # eat some whitespace
      (?<credit>\S+)      # capture all the non-whitespace you can; call it "credit"
      \s+                 # eat some whitespace
      (?<grade>\S+)       # capture all the non-whitespace you can; call it "grade"
      \s+                 # eat some whitespace
      (?<attendance>\S+)  # capture all the non-whitespace; call it "attendance"
      $                   # make sure we're at the end of the line
    /x

    str = "1 ca727 principles of compiler design 3 a m"
    parts = str.match(course_line)

    puts "course code: #{parts['code']}"
    puts "course name: #{parts['name']}"
    puts "grade: #{parts['grade']}"

    #=> course code: ca727
    #=> course name: principles of compiler design
    #=> grade: a
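When scraping, some rows (a header row, for instance) will not fit the pattern, and `match` returns nil for those. Below is a minimal sketch of one way to deal with that, assuming every data row carries all six columns, including a grade between the credit and the attendance grade. The names `COURSE_LINE` and `parse_course_line` are made up for illustration, and the regex is a compact, uncommented version of the same named-capture pattern.

```ruby
# Hypothetical constant: the six-field pattern, written without /x comments.
COURSE_LINE = /^(?<srno>\d+)\s+(?<code>\S+)\s+(?<name>.+\S)\s+(?<credit>\S+)\s+(?<grade>\S+)\s+(?<attendance>\S+)$/

# Hypothetical helper: returns a Hash of capture name => value,
# or nil when the line does not match the pattern.
def parse_course_line(line)
  m = COURSE_LINE.match(line.strip)
  m && Hash[m.names.zip(m.captures)]
end

lines = [
  "1 ca727 principles of compiler design 3 a m",
  "sr.no. course_code course_name credit grade attendance_grade" # header row, no leading digits
]

# Parse every line and drop the ones that did not match (nil).
courses = lines.map { |l| parse_course_line(l) }.compact

courses.each do |c|
  puts "#{c['code']}: #{c['name']} (grade #{c['grade']})"
end
```

`MatchData#names` and `MatchData#captures` are both available in Ruby 1.9, so this should also run under JRuby in 1.9 mode. Keeping the result as a Hash makes it easy to pick out only the fields you care about.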