How to find formatting of a subset of text in an Excel document cell

Question

Using Python, I need to find all substrings in a given Excel worksheet cell that are formatted in bold or italics.

My problem is similar to this:

Using an XLRD module and Python to determine the font style of a cell (italic or not)

...but that solution does not apply in my case, since I cannot assume that the same formatting is used for all content in the cell. A value in one cell might look like this:

1. Some bold fonts Some normal texts. Some italic texts.

Is there a way to find the formatting of character ranges within a cell using xlrd (or any other Python Excel module)?

+5

python xlrd

westmark Sep 11 '12 at 14:15

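xlrd can do this. You must call open_workbook() with the kwarg formatting_info=True; sheet objects will then have an attribute rich_text_runlist_map, a dictionary that maps cell coordinates ((row, col) tuples) to the runlist for that cell. A runlist is a sequence of (offset, font_index) pairs, where offset tells you where in the cell the font begins, and font_index is an index into the font_list attribute of the workbook object (the workbook object is what open_workbook() returns). That gives you a Font object describing the properties of the font, including bold, italic, typeface, size, etc.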
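To make the runlist idea concrete, here is a minimal sketch that splits a cell's text into pieces using (offset, font_index) pairs of the shape described above. It does not call xlrd at all; the function name split_runs, the parameter cell_font_index, and the sample runlist and font indices are made up for illustration. It assumes that text before the first run uses the cell's own font.

```python
def split_runs(text, runlist, cell_font_index):
    """Split text into (substring, font_index) pairs from an xlrd-style runlist.

    runlist is a sequence of (offset, font_index) pairs sorted by offset.
    cell_font_index is a hypothetical stand-in for the cell's default font.
    """
    if not runlist:
        # No rich-text runs: the whole cell uses the cell's own font.
        return [(text, cell_font_index)]

    pieces = []
    first_offset = runlist[0][0]
    if first_offset > 0:
        # Assumption: text before the first run is styled like the cell itself.
        pieces.append((text[:first_offset], cell_font_index))

    for i, (offset, font_index) in enumerate(runlist):
        # Each run ends where the next one starts; the last run ends with the text.
        end = runlist[i + 1][0] if i + 1 < len(runlist) else len(text)
        pieces.append((text[offset:end], font_index))
    return pieces


# Made-up sample data: font indices 5, 6 and 7 are arbitrary.
runs = split_runs("Bold plain italic", [(4, 6), (11, 7)], cell_font_index=5)
print(runs)
```

With real data, each font index in the result would be looked up in the workbook's font_list to check the bold and italic attributes.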

+4

Vyassa Baratham 10 . '16 16:56

I don't know if you can do this using xlrd, but since you are asking about any other Python Excel module: openpyxl cannot do this in version 1.6.1.

Rich text is flattened to plain text in the function get_string() in openpyxl/reader/strings.py. It would be relatively easy to set up a second table with the raw strings in that module.

+2

Anthon Mar 19 '13 at 15:46

Greg Sadetsky · Accepted Answer · 2016-06-12T15:35:26+0000

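Thanks to @Vyassa for the right pointers. The following code iterates over the rows of an XLS file and prints style information for cells with a "single" style (e.g., the whole cell is italic) and for cells with style "segments" (e.g., part of the cell is italic, part of it is not).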

import xlrd

# accessing Column 'C' in this example
COL_IDX = 2

# formatting_info=True is needed to get the rich-text run lists (.xls files only)
book = xlrd.open_workbook('your-file.xls', formatting_info=True)
first_sheet = book.sheet_by_index(0)

for row_idx in range(first_sheet.nrows):
    text_cell = first_sheet.cell_value(row_idx, COL_IDX)
    text_cell_xf = book.xf_list[first_sheet.cell_xf_index(row_idx, COL_IDX)]

    # skip rows where cell is empty
    if not text_cell:
        continue
    print(text_cell, end=' ')

    # list of (offset, font_index) pairs, or None if the cell has a single style
    text_cell_runlist = first_sheet.rich_text_runlist_map.get((row_idx, COL_IDX))
    if text_cell_runlist:
        print('(cell multi style) SEGMENTS:')
        segments = []
        for segment_idx, (start, font_index) in enumerate(text_cell_runlist):
            # the last segment starts at 'start' and ends at the end of the string
            end = None
            if segment_idx != len(text_cell_runlist) - 1:
                end = text_cell_runlist[segment_idx + 1][0]
            segments.append({
                'text': text_cell[start:end],
                'font': book.font_list[font_index]
            })
        # segments did not start at beginning, assume cell starts with text styled as the cell
        if text_cell_runlist[0][0] != 0:
            segments.insert(0, {
                'text': text_cell[:text_cell_runlist[0][0]],
                'font': book.font_list[text_cell_xf.font_index]
            })

        for segment in segments:
            print(segment['text'],
                  'italic:', segment['font'].italic,
                  'bold:', segment['font'].bold)

    else:
        cell_font = book.font_list[text_cell_xf.font_index]
        print('(cell single style)',
              'italic:', cell_font.italic,
              'bold:', cell_font.bold)
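
To get only the bold or italic substrings the question asks for, the segments list built above could be filtered along these lines. This is a sketch, not part of the original answer: styled_substrings is a hypothetical helper, and the demo uses SimpleNamespace objects as stand-ins for xlrd Font objects, relying only on their bold and italic attributes.

```python
from types import SimpleNamespace


def styled_substrings(segments):
    """Return the segments whose font is bold or italic.

    Each segment is a dict with 'text' and 'font' keys, as in the code above;
    'font' only needs .bold and .italic attributes (truthy when set).
    """
    return [
        {
            'text': segment['text'],
            'bold': bool(segment['font'].bold),
            'italic': bool(segment['font'].italic),
        }
        for segment in segments
        if segment['font'].bold or segment['font'].italic
    ]


# Stand-in data for illustration only; real segments would come from the loop above.
demo_segments = [
    {'text': 'Some bold text ', 'font': SimpleNamespace(bold=1, italic=0)},
    {'text': 'Some normal text. ', 'font': SimpleNamespace(bold=0, italic=0)},
    {'text': 'Some italic text.', 'font': SimpleNamespace(bold=0, italic=1)},
]

hits = styled_substrings(demo_segments)
for hit in hits:
    print(repr(hit['text']), 'bold:', hit['bold'], 'italic:', hit['italic'])
```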
